Anthropic made Claude worse for a month — this is how they got caught

I love using Claude, to the point where I cancelled ChatGPT, Perplexity, and Gemini because Claude did everything I needed. But if you’re like me and have been using Claude for a while, you would’ve noticed the responses feeling sloppier. The model seemed to forget what it was doing mid-task, and code quality dropped massively.

When I went looking for answers, none came from Anthropic. Frustrated users kept complaining across platform, and the company behind Claude remained silent—unitl one AMD executive made that silence impossible to maintain.

Someone left Claude Code running overnight, and it cost $6,000

Claude Code worked overtime and billed like a senior consultant.

Users noticed the change before Anthropic did

Benchmark drops, strange behavior, and mounting complaints

The complaints about Claude’s degrading quality started trickling in around early March 2026. Developers on Reddit, GitHub, and Hacker News reported that Claude Code—Anthropic’s AI-powered coding tool—had gotten noticeably worse. The model was reading through code less carefully before making changes, leaving tasks halfway, and producing fixes that were technically correct but an architectural nightmare.

Among these frustrated users was Stella Laurenzo, senior director of AMD’s AI group and the engineer who previously built Google’s OpenXLA infrastructure. On April 2, she filed a detailed GitHub issue that ended up becoming the starting point for the entire controversy. Her complaint wasn’t based on gut feeling either.

She and her team had analyzed 6,852 Claude Code sessions covering 17,871 thinking blocks and 234,760 tool calls. What they found was that Claude’s median thinking depth had collapsed by roughly 73% since early February. The read-to-edit ratio, a measure of how much Claude studies code before touching it also fell from 6.6 reads per edit to just 2. Edits made without reading any code first jumped from 6.2% to 33.7%. The conclusion: Claude cannot be trusted to perform complex engineering tasks.

Period	Thinking Visible	Thinking Redacted
Jan 30 – Mar 4	100%	0%
Mar 5	98.5%	1.5%
Mar 7	75.3%	24.7%
Mar 8	41.6%	58.4%
Mar 10-11		>99%
Mar 12+	0%	100%

Every senior engineer on her team had independently noticed the same pattern, which made it especially hard to dismiss. She also noticed a clear behavioral shift from Claude being research-first and cautious to being edit-first and hasty. The GitHub issue explains this by saying that when thinking is shallow, the model defaults to the cheapest action available: edit without reading, stop without thinking, dodge responsibility for failure.

Developer: Anthropic PBC
Price model: Free, subscription available

The silence became part of the story

Why the lack of communication frustrated users even more

Claude Code editing photos using Darktable. — Yadullah Abidi / MakeUseOf

The degraded output is one thing, but Anthropic’s response, or rather the lack of it, made things worse. For weeks, the company offered no blog post, no status page update, no email to subscribers, and no formal acknowledgment of the problem. Individual engineers made informal comments on social media, but the company as a whole said nothing meaningful while charging customers $20 to $200 per month for a tool that had suddenly become significantly worse.

In the meantime, third-party benchmark data kept piling on. BridgeMind reported that Claude Opus 4.6’s accuracy on their hallucination benchmark had dropped from 88.3% to 68.3%, sending it from second place all the way down to tenth on the leaderboard. The exact methodology behind the test is contested, but it was consistent with the broader narrative that Claude had suddenly become worse, and no one knew why.

One month, multiple problems

The bugs, regressions, and odd behavior that piled up fast

Claude Code open on multiple monitors on a desktop computer running Windows — Amir Bohlooli / MUO

Anthropic finally released a detailed report on what was going on on April 23. Long story short, the problems were being caused by three separate product-layer changes that had stacked on top of each other between March and April, each affecting a different part of the user base on a different schedule. The weights themselves never changed, but the infrastructure around them was affected.

The first change happened on March 4, when Anthropic changed Claude Code’s default reasoning effort from high to medium. Anthropic claims it was done because high-effort mode was causing the UI to appear forzen during long thinking periods. What actually ended up happening is that a lot of users suddenly started noticing the intelligence drop but didn’t know why, and Anthropic had shipped no formal warning in advance. It took until April 7, over a month later, for the company to revert the change.

Second issue was a caching bug that came March 26. Anthropic built an optimization to clear Claude’s older reasoning history from sessions that were idle for over an hour. However, a bug in this routine caused the cache clearning to fire every single turn for the rest of the session, not just once. This meant that every follow-up question you asked, reduced Claude’s reasoning history and over time, longer chats started showing the forgetfulness and bizzare tool choices user had been reporting. It also cause a separate wave of complaints about usage limits draining faster than usual.

ChatGPT, Claude, Gemini, and Perplexity open on Windows laptop and Android phone. — Image taken by Yadullah Abidi | No attribution required.

Last but not least, the third change shipped on April 16 alongside Opus 4.7. Anthropic added a verbisoty instruction to Claude Code’s system prompt that kept tool calls to less than 25 words and final responses to less than 100 words. It seemed safe during weeks of internal testing, but later investigation showed a 3% quality drop across both Opus 4.6 and Opus 4.7. It was reverted four days later on April 20.

Because each change hit different users at different times, the overall effect looked more like a subtle, inconsistent degradation.

The internet did the investigation

How users, benchmarks, and community testing exposed the issues

Stella Laurenzo’s GitHub post might the be most significant factor behind Anthropic not only being caught, but also forced to investigate and fix the issue. She didn’t just complaint—she built an analysis pipeline, intrumented her sessions, and produced a data-backed report that Anthropic’s internal teams couldn’t ignore. It was also detailed and repoducable enough that other engineers could look at their own session logs and recognize the same patterns.

The Hacker News thread debated whether Anthropic’s stated rational for the changes was real or just a cover for cost-cutting. Anthropic’s internal staff had also been using a different build of Claude Code than what shipped to paying customers, meaning the dogfooding that’s supposed to catch these issues never caught them. I love Claude, but these mistakes were painful.

Why Claude feels more human to talk to than ChatGPT, and what that actually means

It’s not magic. Here’s what’s actually going on.

In its response, Anthropic has committed to fixing both problems. A larger share of internal staff will now use teh exact public build. The company will run broader per-model evaluation suites for every system prompt change. And as a gesture of goodwill, Anthropic reset usage limtis for all subscribers on April 23.

Regardless, it took a senior AMD director building a custom analysis pipeline to force the conversation. The lesson for anyone depending on a black-box AI service for professional work is clear: if you don’t measure your sessions, you may never find out if the tool got quitely worse.

What's Hot

ThreatLocker Raises $190 Million in Series F Funding

Hack “Writers” Fuming After ChatGPT Starts Refusing Prompts to Copy a Specific Author’s Style

This is what the Samsung Galaxy Buds On look like

Anthropic made Claude worse for a month — this is how they got caught

Winamp aims for a comeback with a new music player powered by Deezer

How to configure Heartbeats and Automations in Microsoft Scout

These Samsung features are quietly eating your storage

Unlocking Bing Search Hidden Features you didn’t know existed

6 things you’re doing that are ruining your Roku

My three-year-old Android feels new again after these tweaks

You don’t need a NAS to self-host — I proved it with hardware from my closet

Spotify is giving one of its best playlists a big visual upgrade to give subscribers ‘a closer connection’ to its New Music Friday curators — and I think it could be the update it’s always needed

The iPad Air brand makes no sense – it needs a rethink

Our Picks

ThreatLocker Raises $190 Million in Series F Funding

Hack “Writers” Fuming After ChatGPT Starts Refusing Prompts to Copy a Specific Author’s Style

This is what the Samsung Galaxy Buds On look like

Subscribe to Updates

What's Hot

Anthropic made Claude worse for a month — this is how they got caught

Users noticed the change before Anthropic did

Benchmark drops, strange behavior, and mounting complaints

The silence became part of the story

Why the lack of communication frustrated users even more

One month, multiple problems

The bugs, regressions, and odd behavior that piled up fast

The internet did the investigation

How users, benchmarks, and community testing exposed the issues

Related Posts