I have read comments about this on X, here, and other places, yet I have ever se...

wg0 · 2026-03-15T21:10:50 1773609050

I agree. The code despite detailed spec reveals bugs and edge cases upon inspection.

I'm talking Claude Opus 4.6 here.

dude250711 · 2026-03-15T22:29:04 1773613744

For all we know, some important clients might just be getting better service out of Anthropic's/OpenAI's "black boxes".

cluckindan · 2026-03-15T22:53:34 1773615214

The spec needs to be explicit about edge and corner cases.

layer8 · 2026-03-15T23:24:25 1773617065

At some point such a spec converges to the actual code you’d have written.

fy20 · 2026-03-16T02:36:38 1773628598

I've actually tried that and it helps. First I create a PRD type doc, then I have the AI break it down in a task doc, including code snippets where relevant. This helps it to think through edge cases before it starts implementing (oh we need X now, but that means we should have done task 3 differently to allow that).

NinjaTrance · 2026-03-15T21:37:37 1773610657

> I use Claude Opus (4.5, 4.6) all the time and catch it making making subtle mistakes, all the time.

Didn't we make subtle mistakes without AI?

Why did we spend so much time debugging and doing code reviews?

> Are you really being more productive (let’s say 3x times more)

At least 2x more productive, and that's huge.

csto12 · 2026-03-15T21:57:21 1773611841

I think you’ve forgotten about the context of OP’s post. He said he uninstalled vscode and uses a dashboard for managing his agents. How are you going to be able to do code review well when you don’t even know what’s going on in your own project? I catch subtle bugs Claude emits because I know exactly what’s happening because I’m actively working with Claude, not letting Claude do everything.

turlockmike · 2026-03-15T22:01:54 1773612114

The code is still visible if i want to review it.

But since I have a strong rule about always writing unit tests before code, my confidence is a lot higher.

https://simonwillison.net/2025/Dec/18/code-proven-to-work/

csto12 · 2026-03-15T22:56:39 1773615399

>The code is still visible if i want to review it.

I agree that the test harness is the most important part, which is only possible to create successfully if you are very familiar with exactly how your code works and how it should work. How would you reach this point using a dashboard and just reviewing PRs?

jplusequalt · 2026-03-15T21:45:17 1773611117

Are you getting paid 2x more?

cdelsolar · 2026-03-15T22:17:49 1773613069

i really don't understand why people keep thinking this. i'm easily 10x more productive since Claude Code came out. it's insane how much stuff you can build quickly, especially on personal projects.

csto12 · 2026-03-15T22:57:49 1773615469

Of course personal projects are much quicker because usually personal projects don't have high code standards... I'm talking about production code.

baq · 2026-03-15T21:19:15 1773609555

typical experience when only using one foundational model TBH. results are much better if you let different models review each other.

the bottleneck now is testing. that isn't going away anytime soon, it'll get much worse for a bit while models are good at churning code out that's slightly wrong or technically correct, but solving a different problem than intended; it's going to be a relatively short lived situation I'm afraid until the industry switches to most code being written for serving agents instead of humans.

turlockmike · 2026-03-15T22:03:20 1773612200

The way LLMs work, different tokens can activate different parts of the network. I generally have 2-3 different agents review it from different perspectives. I give them identities, like Martin Fowler, or Uncle Bob, or whatever I think is relevant.

baq · 2026-03-16T09:28:53 1773653333

true - but the way LLMs are trained, google's RLVR is different from anthropic's is different from openai's. you'll get very good results sending the same 'review this change' prompt (literally) to different models.