Blog · 2026-05-16 · 6 min read

Why developers spent spring 2026 trying to cancel Claude

A rate-limit revolt, a $20 plan that vanished for a day, an officially admitted quality regression — and what an LLM API actually costs once you stop paying frontier rates for everything.

llmdeal.me

The timeline nobody at Anthropic wanted

Events ran fast enough that even developers paying close attention missed some of it.

March 23, 2026. Anthropic throttled Claude Code during peak hours. Documented 5-hour Max 20x sessions — that's the $200/month plan — collapsed to roughly 19 minutes under sustained coding load. In the same announcement, Anthropic acknowledged that approximately 7% of users would newly hit rate limits they hadn't hit before. For agentic workloads that run long multi-step loops, 19 minutes is not a session. It's a warm-up.

Then, in the days that followed: a caching bug started draining sessions abnormally fast. Reports surfaced of a single prompt taking a session from 21% used to 100% in one shot. If you were unlucky enough to hit that on the first call of the morning, your session was gone before you'd done anything.

April 4, 2026. Anthropic blocked third-party tools — OpenClaw, Cline, Aider, OpenCode — from using subscription plans. The rule change was quiet: keep using those tools and you pay per-token API rates on top of your existing subscription. For developers who had built their entire workflow around Cline or Aider precisely because those tools weren't the Claude.ai interface, this amounted to a forced repricing of that workflow overnight. The Hacker News thread hit 1,099 points and 827 comments. That's not typical HN engagement for a pricing tweak.

April 21–22, 2026. Claude Code was quietly removed from the $20 Pro plan for new users, then restored roughly 24 hours later after the backlash hit critical mass. A second HN thread: 683 points, 642 comments. By this point developers weren't debating the change — they were debating whether they trusted that the plan terms meant anything stable.

April 23, 2026. Anthropic published a postmortem. It admitted three engineering missteps that had been quietly degrading output for weeks: reasoning effort had been silently lowered, a context-caching bug was misfiring, and a response-length cap had been introduced that measurably cost quality. This was, to Anthropic's credit, an unusually candid document. But it also confirmed what developers had been suspecting: the model that people were using in April was materially different from the one they'd calibrated against in January — and they'd found out from usage patterns, not from any announcement.

The clearest data point came from Stella Laurenzo, a Senior Director of AI at AMD, who had audited 6,852 of her own Claude Code sessions. Her analysis found that median visible reasoning had fallen 73% — from roughly 2,200 characters to roughly 600. Fortune and TechRadar both covered it. It went viral in developer circles because it was empirical rather than anecdotal, and the magnitude was hard to explain away.

It was never really about Claude

The model itself wasn't the core problem. The billing model was.

The anger wasn't directed at Claude as a model — most developers who were loudest in the threads still considered it one of the best coding models available. What broke the trust was the subscription structure itself: a flat fee with limits that move underneath you, and you learn about those changes from the usage meter, not from an email.

That structure has a fundamental misalignment baked into it. A subscription gives the vendor every incentive to quietly tighten limits when compute is scarce — and Anthropic has been publicly candid that compute is scarce. When your provider is capacity-constrained and you're on a flat subscription, you're the variable in their equation. One viral post put it bluntly: "Claude is just like a person — it only works 8 hours a day." That framing stuck because it was accurate enough to sting.

The API-versus-subscription math is worth being honest about. A heavy month of agentic coding work — long context windows, multi-step tool calls, the kind of sessions Claude Code was marketed around — can burn well over $1,000 of tokens at raw API rates. That's why a $200/month flat subscription felt like an obvious buy: 5–10x savings, easy. Until the limits turned it into a $200/month cap on a shrinking bucket. At that point the math stops working in either direction. You're not getting the unlimited throughput of the API, and you're not getting the predictability of the subscription you thought you were buying.
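The "well over $1,000" claim is easy to sanity-check. A minimal back-of-envelope sketch, using this article's Opus 4.7 rates; the token volumes are hypothetical, chosen to represent a heavy agentic month where context replay dominates:

```python
# Back-of-envelope: what a heavy agentic month costs at raw API rates.
# Rates are Claude Opus 4.7's headline prices from this article
# ($5/M input, $25/M output); token volumes are hypothetical.

INPUT_RATE = 5.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 25.00 / 1_000_000  # USD per output token

# Agent loops re-send long contexts on every step, so input dominates.
input_tokens = 150_000_000   # ~150M input tokens over the month
output_tokens = 20_000_000   # ~20M output tokens

api_cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"Raw API cost: ${api_cost:,.0f}")  # prints "Raw API cost: $1,250"

subscription = 200  # Max 20x flat fee
print(f"Flat-fee multiple: {api_cost / subscription:.2f}x")  # 6.25x
```

At those volumes the $200 subscription is a 6x discount — which is exactly why a moving cap on it stings: the discount only exists if the bucket holds.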

What an LLM API actually costs in 2026

Headline rates, per million tokens, USD.

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| Frontier | | |
| Claude Opus 4.7 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| OpenAI o3 | $2.00 | $8.00 |
| Google Gemini 3.1 Pro | $2.00 | $12.00 |
| xAI Grok 4.1 Fast | $0.20 | $0.50 |
| Value / open-weight | | |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 |
| DeepSeek V4 Flash (open-weight) | $0.14 | $0.28 |
| Llama 3.3 70B · DeepInfra (open-weight) | $0.10 | $0.32 |

The spread here is substantial. The most expensive output token — Opus 4.7 at $25/M — costs roughly 90× the cheapest credible option — DeepSeek V4 Flash at $0.28/M. A year ago that gap existed but the cheap end was meaningfully worse. Today, open-weight models trade blows with frontier models on real coding benchmarks. That changes the routing calculus significantly.

Rates checked against providers' own pricing pages, May 2026. Prompt caching (50–80% off input on supported models) and batch inference cut these further for asynchronous workloads.

Where everyone went

Spoiler: it wasn't a single destination.

A 500-developer Reddit sample taken after the April incidents found OpenAI's Codex preferred over Claude Code roughly 65% to 35% — and the dominant reason wasn't model quality, it was that Codex "doesn't hit limits." When your primary complaint is capacity, you vote with your feet toward the option that doesn't ration you, even if the underlying model is comparable.

The more interesting move, though, is hybrid routing. The premise: keep your toolchain and your existing IDE integrations, but swap the backend. Route the easy 60–80% of calls — autocomplete, short edits, docstring generation, simple refactors — to a cheap or self-hosted model. Escalate only the genuinely hard ones — architectural reasoning, multi-file refactors, complex debugging — to frontier. The tooling for this has matured: serious agentic setups increasingly put every call through an OpenAI-compatible proxy layer, which turns model selection into a config change rather than a migration.
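The core of that setup fits in a few lines. A minimal sketch, assuming an OpenAI-compatible payload shape; the model names and the task taxonomy are illustrative choices, not a recommendation:

```python
# Hybrid routing sketch: one OpenAI-compatible interface, model chosen per task.
# Model identifiers and the ESCALATE set below are illustrative assumptions.

CHEAP = "deepseek-v4-flash"    # open-weight tier from the table above
FRONTIER = "claude-opus-4.7"   # escalation target for hard tasks

# Task types worth frontier rates; everything else stays on the cheap tier.
ESCALATE = {"architecture", "multi_file_refactor", "complex_debug"}

def pick_model(task_type: str) -> str:
    """Route the easy majority to the cheap tier, escalate the rest."""
    return FRONTIER if task_type in ESCALATE else CHEAP

def build_request(task_type: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat payload; only `model` varies."""
    return {
        "model": pick_model(task_type),
        "messages": [{"role": "user", "content": prompt}],
    }

print(pick_model("autocomplete"))         # deepseek-v4-flash
print(pick_model("multi_file_refactor"))  # claude-opus-4.7
```

The design point is that nothing downstream of `build_request` knows or cares which vendor serves the call — swapping a model is a one-line change to the routing table.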

What does this actually save? Independent benchmarking from RouterArena and a Rice University study published in October 2025 put realistic routing savings at around 35% with under 2% accuracy loss compared to routing everything to the frontier. Combine routing with prompt caching and deliberate model selection per task type, and 70–85% cost reduction is reachable — but that upper range requires real engineering effort and depends heavily on your workload's distribution. It's not a slider you set to 85% and collect savings. It's a system you build and tune. The 35% figure is more defensible as a floor for most workloads with minimal effort.
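The blended-cost arithmetic behind those numbers is straightforward. A sketch with a hypothetical 60/40 split, using the Haiku and Sonnet output rates from the table above (your workload's actual split and model pair will move the result):

```python
# Blended-cost sketch for the routing savings above.
# The 60/40 split is a hypothetical workload; rates are from the table
# ($/M output tokens: Sonnet 4.6 at $15, Haiku 4.5 at $5).

frontier_rate = 15.00  # all-frontier baseline
cheap_rate = 5.00      # cheap tier
p_cheap = 0.60         # fraction of calls routed to the cheap tier

blended = p_cheap * cheap_rate + (1 - p_cheap) * frontier_rate
savings = 1 - blended / frontier_rate
print(f"Savings vs. all-frontier: {savings:.0%}")  # prints "Savings vs. all-frontier: 40%"
```

Stacking prompt caching and per-task model selection on top of this is what pushes toward the 70–85% range — but each layer has to be tuned against your own call distribution, which is why the 35% figure is the honest default.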

The boring lesson: don't bet the stack on one model

Spring 2026's lesson isn't "Claude bad." Claude is still an excellent model — the postmortem and the subsequent fixes largely restored what was lost, and Anthropic's engineering team is among the most capable in the space. The lesson is structural: wiring your entire stack to one vendor's subscription creates a single point of failure for cost, capacity, and quality, all simultaneously. The limit can move. The model can change. The plan terms can change. And you'll find out from your usage meter.

The fix is to treat models as interchangeable backends behind a single interface. That's what llmdeal.me is — one OpenAI-compatible key, smart routing across cheap and frontier models, EU-resident inference, crypto checkout, no KYC. A bad week at any one provider becomes a routing decision instead of an outage. See the pricing · Read the docs.
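What "one OpenAI-compatible key" means in practice: the request shape never changes, only the base URL and the model string. A stdlib-only sketch — the endpoint URL below is a hypothetical placeholder, not a documented llmdeal.me address:

```python
# "Model as a config change": any OpenAI-compatible endpoint takes the same
# chat-completions payload. BASE_URL here is a hypothetical placeholder.
import json
import urllib.request

BASE_URL = "https://api.example-proxy.invalid/v1"  # hypothetical endpoint
API_KEY = "sk-..."                                  # one key for every model

def chat(model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request; switching providers = switching BASE_URL."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
```

Because every serious provider now speaks this wire format, a bad week at one of them really is just a different `model` string in the next request.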

References

  1. Hacker News — Anthropic blocks third-party Claude Code tools from subscription plans (1,099 pts, 827 comments; accessed 2026-05-16)
  2. Hacker News — Claude Code removed from the $20 Pro plan (683 pts, 642 comments; accessed 2026-05-16)
  3. The Register — Anthropic removes Claude Code from Pro plan (2026-04-22)
  4. Fortune — Anthropic's engineering missteps and Claude Code performance decline (2026-04-24)

Article published 2026-05-16.