I love using Claude, to the point where I cancelled ChatGPT, Perplexity, and Gemini because Claude did everything I needed. But if you’re like me and have been using Claude for a while, you would’ve noticed the responses feeling sloppier. The model seemed to forget what it was doing mid-task, and code quality dropped massively.
When I went looking for answers, none came from Anthropic. Frustrated users kept complaining across platform, and the company behind Claude remained silent—unitl one AMD executive made that silence impossible to maintain.
Someone left Claude Code running overnight, and it cost $6,000
Claude Code worked overtime and billed like a senior consultant.
Users noticed the change before Anthropic did
Benchmark drops, strange behavior, and mounting complaints
The complaints about Claude’s degrading quality started trickling in around early March 2026. Developers on Reddit, GitHub, and Hacker News reported that Claude Code—Anthropic’s AI-powered coding tool—had gotten noticeably worse. The model was reading through code less carefully before making changes, leaving tasks halfway, and producing fixes that were technically correct but an architectural nightmare.
Among these frustrated users was Stella Laurenzo, senior director of AMD’s AI group and the engineer who previously built Google’s OpenXLA infrastructure. On April 2, she filed a detailed GitHub issue that ended up becoming the starting point for the entire controversy. Her complaint wasn’t based on gut feeling either.
She and her team had analyzed 6,852 Claude Code sessions covering 17,871 thinking blocks and 234,760 tool calls. What they found was that Claude’s median thinking depth had collapsed by roughly 73% since early February. The read-to-edit ratio, a measure of how much Claude studies code before touching it also fell from 6.6 reads per edit to just 2. Edits made without reading any code first jumped from 6.2% to 33.7%. The conclusion: Claude cannot be trusted to perform complex engineering tasks.
|
Period |
Thinking Visible |
Thinking Redacted |
|---|---|---|
|
Jan 30 – Mar 4 |
100% |
0% |
|
Mar 5 |
98.5% |
1.5% |
|
Mar 7 |
75.3% |
24.7% |
|
Mar 8 |
41.6% |
58.4% |
|
Mar 10-11 |
>99% |
|
|
Mar 12+ |
0% |
100% |
Every senior engineer on her team had independently noticed the same pattern, which made it especially hard to dismiss. She also noticed a clear behavioral shift from Claude being research-first and cautious to being edit-first and hasty. The GitHub issue explains this by saying that when thinking is shallow, the model defaults to the cheapest action available: edit without reading, stop without thinking, dodge responsibility for failure.
- Developer
-
Anthropic PBC
- Price model
-
Free, subscription available
The silence became part of the story
Why the lack of communication frustrated users even more
The degraded output is one thing, but Anthropic’s response, or rather the lack of it, made things worse. For weeks, the company offered no blog post, no status page update, no email to subscribers, and no formal acknowledgment of the problem. Individual engineers made informal comments on social media, but the company as a whole said nothing meaningful while charging customers $20 to $200 per month for a tool that had suddenly become significantly worse.
In the meantime, third-party benchmark data kept piling on. BridgeMind reported that Claude Opus 4.6’s accuracy on their hallucination benchmark had dropped from 88.3% to 68.3%, sending it from second place all the way down to tenth on the leaderboard. The exact methodology behind the test is contested, but it was consistent with the broader narrative that Claude had suddenly become worse, and no one knew why.
One month, multiple problems
The bugs, regressions, and odd behavior that piled up fast
Anthropic finally released a detailed report on what was going on on April 23. Long story short, the problems were being caused by three separate product-layer changes that had stacked on top of each other between March and April, each affecting a different part of the user base on a different schedule. The weights themselves never changed, but the infrastructure around them was affected.
The first change happened on March 4, when Anthropic changed Claude Code’s default reasoning effort from high to medium. Anthropic claims it was done because high-effort mode was causing the UI to appear forzen during long thinking periods. What actually ended up happening is that a lot of users suddenly started noticing the intelligence drop but didn’t know why, and Anthropic had shipped no formal warning in advance. It took until April 7, over a month later, for the company to revert the change.
Second issue was a caching bug that came March 26. Anthropic built an optimization to clear Claude’s older reasoning history from sessions that were idle for over an hour. However, a bug in this routine caused the cache clearning to fire every single turn for the rest of the session, not just once. This meant that every follow-up question you asked, reduced Claude’s reasoning history and over time, longer chats started showing the forgetfulness and bizzare tool choices user had been reporting. It also cause a separate wave of complaints about usage limits draining faster than usual.
Last but not least, the third change shipped on April 16 alongside Opus 4.7. Anthropic added a verbisoty instruction to Claude Code’s system prompt that kept tool calls to less than 25 words and final responses to less than 100 words. It seemed safe during weeks of internal testing, but later investigation showed a 3% quality drop across both Opus 4.6 and Opus 4.7. It was reverted four days later on April 20.
Because each change hit different users at different times, the overall effect looked more like a subtle, inconsistent degradation.
The internet did the investigation
How users, benchmarks, and community testing exposed the issues
Stella Laurenzo’s GitHub post might the be most significant factor behind Anthropic not only being caught, but also forced to investigate and fix the issue. She didn’t just complaint—she built an analysis pipeline, intrumented her sessions, and produced a data-backed report that Anthropic’s internal teams couldn’t ignore. It was also detailed and repoducable enough that other engineers could look at their own session logs and recognize the same patterns.
The Hacker News thread debated whether Anthropic’s stated rational for the changes was real or just a cover for cost-cutting. Anthropic’s internal staff had also been using a different build of Claude Code than what shipped to paying customers, meaning the dogfooding that’s supposed to catch these issues never caught them. I love Claude, but these mistakes were painful.
Why Claude feels more human to talk to than ChatGPT, and what that actually means
It’s not magic. Here’s what’s actually going on.
In its response, Anthropic has committed to fixing both problems. A larger share of internal staff will now use teh exact public build. The company will run broader per-model evaluation suites for every system prompt change. And as a gesture of goodwill, Anthropic reset usage limtis for all subscribers on April 23.
Regardless, it took a senior AMD director building a custom analysis pipeline to force the conversation. The lesson for anyone depending on a black-box AI service for professional work is clear: if you don’t measure your sessions, you may never find out if the tool got quitely worse.


