Building with Claude
03Chapter · Building with ClaudeNew
2 min read

Reading the Output: Trust, but Verify

The skill that separates people who get burned from people who ship isn't prompting. It's knowing how to read what comes back — and how to make Claude check its own work.

Last updated ·
What changed
  • New chapter — added in the June 2026 restructure.

The loop has a step everyone rushes: read. Claude stops when the work looks done — not when it is. Without a check, you are the verification loop.

Here is the uncomfortable truth underneath every impressive demo. A model produces text that looks finished, confident, and correct — and looking correct is not the same as being correct. When it writes “Done, everything's working,” it isn't lying; it genuinely can't tell the difference between a result it verified and one it merely produced. So either you check every answer by hand, forever, or you give the loop something it can check itself against. That second move is the whole skill of this lesson.

act produce check? ship it pass fail — back to act, automatically
Give the loop a check that returns pass or fail — a test, a build, a screenshot diff — and it closes on its own: it sees red, fixes, sees green, without you in the middle.

Demand evidence, not assertion

The cheapest habit that changed my results: when Claude claims success, ask to see the proof, not the claim. Watch the difference play out:

Done — I fixed the bug and everything's working now.
Show me the test output.
Ran npm test — 1 failing: checkout total. Let me look again…

↳ the gap was real and one question surfaced it. “It's working” is an opinion; the test output is a fact.

So make the evidence part of the ask: “change this, then run the tests and show me the result,” not just “change this.” If you can't verify it, don't ship it — and scale your scrutiny to the stakes. A throwaway draft you'll read anyway needs only a glance; a database migration or anything shared earns the full proof before you trust it.

Where it breaks

Plausible-but-wrong output that passes a glance — clean-looking code with a subtle off-by-one, a summary that reads perfectly and quietly dropped the one fact that mattered. It's the most dangerous failure because nothing looks wrong. And when a thread gets stuck re-failing, don't argue with it — after a couple of bad corrections, clear and re-prompt with a clean desk.

Try it yourself

On your next real task, before you accept the answer, ask yourself one question: what evidence would prove this is right? A passing test, a matching screenshot, the command and its actual output. Then make Claude produce that evidence rather than just claim success. The habit is small; the number of quietly-broken things it catches is not.

Grounded in Anthropic's Claude Code best practices — “give Claude a way to verify its work,” and the trust-then-verify gap.

New chapters land here as I learn them. Want the next one?