The Slot Machine
The render was black. It should have been a fractal. I asked the AI to fix it.
It changed a threshold. Still black. It changed a sign. Still black. It reordered two folds. Still black. It added an epsilon. Still black. Each diff was clean. Each one had a confident sentence attached. None of them touched the actual problem. I watched it pull the lever eleven times in a row and call each pull a fix.
This is the most common way AI-assisted debugging fails. Not bad code. Bad search.
Operating Conditions
I was building a GPU fractal renderer. One formula family, a hybrid that folds space before estimating distance, produced a black frame. No crash. No error. Just nothing where geometry should be. The hardest class of bug: silent, visual, and buried in math the model half-understands.
The AI was eager. That was the problem.
Failure Modes
A human debugging this bug runs maybe three experiments an hour. Reading the math is slow. Changing a line, recompiling the shader, re-rendering, and squinting at a black square is slow. That slowness is not a defect. It is a tax on guessing, and the tax forces you to think before you spend.
The model pays no such tax. A fix costs it a few seconds and zero embarrassment. So it does the thing that is locally rational and globally useless: it guesses, fast, forever. Change something plausible, observe failure, change something else. No hypothesis. No mechanism. Just a hand on the lever and an infinite bankroll of other people's time.
Eleven pulls in, the codebase had drifted. Three of the "fixes" were still in the tree, interacting. Now I had the original bug plus a small pile of plausible noise on top of it. The search had made the problem harder to see.
Root Cause
The model has the wrong cost function. It optimizes for producing a next action, not for reducing uncertainty. To the model, a failed attempt and a successful diagnosis look almost identical: both are a turn that ends. There is no internal penalty for thrashing. You have to impose one from the outside.
Speed is the trap. The faster the tool, the cheaper the guess, and the more guessing replaces thinking. A slow human and a fast model fail on the same bug in opposite ways. The human gives up. The model never does, and that is worse.
Proposed Fix
I gave it a rule. Two failed fixes, then stop.
After the second attempt that does not work, no third attempt is allowed. Instead, write one page. Three hypotheses for the mechanism. For each, the observation that would confirm or kill it. Then go read the reference source and find out which one survives. Only then are you allowed to touch code again.
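The rule is simple enough to write down as a guard. What follows is a minimal sketch in Python, not the tooling I actually used: the names FixBudget, Hypothesis, and WriteUpRequired are hypothetical, and resetting the counter after an accepted write-up is my assumption about how the cycle restarts. It cannot check that anyone really read the reference source. It only refuses a third edit until three hypotheses, each with its confirming-or-killing observation, are on the page and one has been named as the survivor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    mechanism: str    # what might be causing the bug
    observation: str  # what you would see that confirms or kills it


class WriteUpRequired(Exception):
    """Raised when an edit is requested after the cap, before a write-up."""


class FixBudget:
    """Hypothetical guard: cap failed fixes, then demand a write-up."""

    def __init__(self, max_failed: int = 2, required_hypotheses: int = 3):
        self.max_failed = max_failed
        self.required_hypotheses = required_hypotheses
        self.failed = 0

    def request_edit(self) -> None:
        """Call before touching code. Blocks once the cap is reached."""
        if self.failed >= self.max_failed:
            raise WriteUpRequired(
                f"{self.failed} failed fixes: write up "
                f"{self.required_hypotheses} hypotheses before editing again"
            )

    def record_failure(self) -> None:
        """Call when a fix was tried and the bug is still there."""
        self.failed += 1

    def submit_write_up(self, hypotheses: list[Hypothesis], survivor: int) -> Hypothesis:
        """Accept the one-page write-up and unlock editing.

        `survivor` is the index of the hypothesis that held up after
        reading the reference source.
        """
        if len(hypotheses) < self.required_hypotheses:
            raise ValueError(f"need at least {self.required_hypotheses} hypotheses")
        for h in hypotheses:
            if not h.mechanism.strip() or not h.observation.strip():
                raise ValueError("every hypothesis needs a mechanism and an observation")
        if not 0 <= survivor < len(hypotheses):
            raise ValueError("survivor must index one of the hypotheses")
        self.failed = 0  # assumption: an accepted write-up resets the budget
        return hypotheses[survivor]
```

Used in a loop, the agent calls request_edit() before every change and record_failure() after every render that is still black. The third request_edit() raises, and the only way forward is submit_write_up().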
The rule does nothing clever. It just reintroduces the tax. It makes the model spend the expensive resource, attention, before the cheap one, edits. The first time I enforced it on the black render, the one-page write-up found the cause in a single pass: a fold path with a negative option was poisoning the distance estimator, sending it down a numeric branch the geometry never intended. The reference renderer skipped that branch by design. We were not skipping it. No threshold, no sign, no epsilon was ever going to find that. Reading was.
Eleven guesses found nothing. One forced hypothesis found everything.
System Status
The lever is not the enemy. A fast tool is a gift, right up until it lets you avoid thinking. Your job is to put the tax back. Cap the guesses. Force the write-up. Make the machine say what it believes is true and how it would know, before it is allowed to change one more line.
An AI will pull the lever as many times as you let it. The number you set is the whole game.
Set it to two.