
What every major LLM API costs in 2026

A complete, sourced pricing reference — frontier models, value models, and open-weight inference hosts — all in one place. Per 1M tokens, input and output, as of May 2026.

llmdeal.me

The spread in 2026

The spread between the cheapest and most expensive production API tokens now exceeds 1,000×. DeepInfra's Qwen3-235B-A22B runs at $0.071/$0.10 per million tokens; Anthropic's Claude Mythos Preview (invite-only, critical infrastructure only) is priced at $25/$125 per million (third-party sourced; not on the official pricing page). Most developers will live somewhere between those extremes, but knowing the full range matters when you are deciding which model to default to, which to cache, and which to batch.

The other defining trend is the open-weight wave. DeepSeek V4 Flash, Kimi K2.6, MiniMax M2.7, GLM-5.1, and Qwen3-235B-A22B are all open-weight and available on multiple inference hosts simultaneously. That creates genuine competition on hosting price in a way that proprietary models cannot. When the weights are public, every inference provider can undercut the originator.

Frontier models — Table 1

Closed, proprietary models from the major Western labs. Prices are for the standard synchronous API unless noted. All figures are per 1M tokens.

| Model | Provider | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| Claude Opus 4.7 / 4.6 / 4.5 | Anthropic | $5.00 | $25.00 | 1M |
| Claude Sonnet 4.6 / 4.5 | Anthropic | $3.00 | $15.00 | 1M |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 1M |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 1M (272K short) |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 | 400K |
| GPT-5.4 nano | OpenAI | $0.20 | $1.25 | 400K |
| o3 (reasoning) | OpenAI | $2.00 | $8.00 | 200K |
| o4-mini (reasoning) | OpenAI | $0.55 | $2.20 | 200K |
| Gemini 3.1 Pro | Google | $2.00 / $4.00* | $12.00 / $18.00* | 2M |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | — |
| Gemini 2.5 Pro | Google | $1.25 / $2.50* | $10.00 / $15.00* | 1M |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M |
| Grok 4.1 Fast / 4 Fast | xAI | $0.20 | $0.50 | 2M |
| Grok 4.3 | xAI | $1.25 | $2.50 | 1M |
| Grok 4 | xAI | $3.00 | $15.00 | 256K |
| Cohere Command A | Cohere | $2.50 | $10.00 | 256K |
| Cohere Command R7B | Cohere | $0.037 | $0.150 | 128K |

* Google Pro models: lower price applies ≤200K tokens; higher price applies above 200K tokens.

o-series models bill internal reasoning tokens at output rates — actual cost can be 3–10× the headline price for complex tasks.

Grok 4.3 pricing from third-party aggregators only (x.ai API returned HTTP 403 during research).
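The tiered Google pricing is easy to get wrong in a budget estimate. A minimal sketch of the rule, using Table 1's Gemini 3.1 Pro rates and assuming the higher rate applies to the entire request once the prompt crosses 200K tokens (the footnote above does not spell out marginal vs. whole-request billing, so treat that as an assumption):

```python
def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a Gemini 3.1 Pro request cost in USD (rates from Table 1).

    Rates are per 1M tokens; the higher tier is assumed to apply to the
    whole request when the prompt exceeds 200K tokens (see the * footnote).
    """
    long_context = input_tokens > 200_000
    input_rate = 4.00 if long_context else 2.00     # $/1M input
    output_rate = 18.00 if long_context else 12.00  # $/1M output
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 150K-token prompt with a 2K-token reply stays in the cheap tier:
print(f"${gemini_pro_cost(150_000, 2_000):.4f}")  # → $0.3240
# A 500K-token prompt pays the higher rate across the board:
print(f"${gemini_pro_cost(500_000, 2_000):.4f}")  # → $2.0360
```

The jump at the tier boundary is worth modeling explicitly: a prompt just over 200K tokens costs twice as much per input token as one just under it.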

Value / open-weight API providers — Table 2

Models from Chinese labs and European providers, mostly open-weight (weights publicly available). Prices are the provider's own hosted API unless noted. All figures per 1M tokens, standard tier, cache-miss input.

| Model | Provider | Input $/1M | Output $/1M | Notes |
|---|---|---|---|---|
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | MIT; cache hit $0.0028 |
| DeepSeek V4 Pro | DeepSeek | $0.435 | $0.87 | 75% promo until 2026-05-31; full price $1.74/$3.48 |
| Qwen-Turbo | Alibaba Cloud | $0.05 | $0.20 | Cheapest production Qwen |
| Qwen3-235B-A22B | Alibaba Cloud | $0.70 | $2.80 | Open-weight Apache 2.0; 128K ctx |
| Qwen3-Max | Alibaba Cloud | $1.20–$2.40 | $6.00 | Tiered by ctx length |
| Mistral Nemo | Mistral AI | $0.02 | $0.03 | Cheapest in lineup |
| Mistral Small 3.2 24B | Mistral AI | $0.075 | $0.20 | Best value in Mistral range |
| Mistral Large 3 2512 | Mistral AI | $0.50 | $1.50 | No prompt caching |
| Codestral 2508 | Mistral AI | $0.30 | $0.90 | Code-specialist; 256K ctx |
| GLM-4.5-Flash / GLM-4.7-Flash | Zhipu / Z.ai | Free | Free | No quota cap |
| GLM-5 | Zhipu / Z.ai | $1.00 | $3.20 | 203K ctx |
| GLM-5.1 | Zhipu / Z.ai | $1.40 | $4.40 | Open-sourced 2026-04-08 |
| Kimi K2.6 | Moonshot AI | $0.73–$0.75 | $3.49–$3.50 | MIT+; 262K ctx; cache hit $0.15 |
| MiniMax M2.7 / M2.5 | MiniMax | $0.28–$0.30 | $1.20 | Open-weight; 1M ctx (M2.5) |

Mistral does not offer prompt-caching discounts on any models as of May 2026, unlike Anthropic (10% of base on cache hits) and OpenAI (75–90% on cache hits).

DeepSeek V4 Flash cache hit: $0.0028/M — a 98% discount vs standard input. DeepSeek V4 Pro's promo expires 2026-05-31; full price will be $1.74/$3.48 thereafter.

Open-weight inference hosts — Table 3

These hosts run open-weight models on their own GPU infrastructure. You get the same weights, different SLAs, speeds, and prices. All figures per 1M tokens.

| Provider | Model | Input $/1M | Output $/1M | Speed (tok/s) / notes |
|---|---|---|---|---|
| Groq | Llama 3.3 70B | $0.59 | $0.79 | ~325 |
| Groq | Llama 4 Scout (17Bx16E) | $0.11 | $0.34 | — |
| Groq | Qwen3 32B | $0.29 | $0.59 | — |
| Groq | Llama 3.1 8B Instant | $0.05 | $0.08 | 651 |
| Cerebras | Llama 3.3 70B | $0.60 | $0.60 | 1,800+ |
| Cerebras | Llama 3.1 8B | $0.10 | $0.10 | 2,360 |
| Together AI | Llama 3.3 70B | $0.88 | $0.88 | ~150 |
| Together AI | DeepSeek R1 | $3.00 | $7.00 | — |
| Fireworks AI | GPT OSS 20B | $0.07 | $0.30 | — |
| Fireworks AI | Kimi K2.6 | $0.95 | $4.00 | $0.16 cached |
| DeepInfra | Llama 3.3 70B Turbo | $0.10 | $0.32 | ~19 (FP8) |
| DeepInfra | Qwen3-235B-A22B | $0.071 | $0.10 | — |
| DeepInfra | DeepSeek R1-0528 | $0.50 | $2.15 | $0.35 cached |
| Novita AI | Llama 3.3 70B | $0.135 | $0.40 | — |

Cerebras uses wafer-scale silicon, not GPU clusters — hence the extreme throughput. Groq uses LPU silicon. Both offer free tiers: Cerebras ~1M tokens/day on Llama 3.3 70B; Groq free tier at 30 req/min. Lepton AI was acquired by NVIDIA (May 2025) and has been sunset.

What changed in 2026

Anthropic cut Opus pricing 67%. Claude Opus 4.5, 4.6, and 4.7 are all priced at $5/$25 per million tokens — down from $15/$75 for the earlier Opus 4.1. The effective saving is real, but read the small print: Opus 4.7 ships with a new tokenizer that may use up to 35% more tokens for the same text. Same per-token price, potentially higher per-request cost.
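The tokenizer effect is easy to quantify. A back-of-envelope sketch, assuming an illustrative 10K-prompt/1K-completion request and the worst-case 35% token inflation (both assumptions, not Anthropic figures):

```python
# Opus 4.1 vs Opus 4.7 on the same request, per the rates above.
old_rate_in, old_rate_out = 15.00, 75.00  # Opus 4.1, $/1M tokens
new_rate_in, new_rate_out = 5.00, 25.00   # Opus 4.7, $/1M tokens
tokens_in, tokens_out = 10_000, 1_000     # illustrative request size
inflation = 1.35                          # worst-case 4.7 tokenizer overhead

old_cost = (tokens_in * old_rate_in + tokens_out * old_rate_out) / 1e6
new_cost = (tokens_in * inflation * new_rate_in +
            tokens_out * inflation * new_rate_out) / 1e6
print(f"Opus 4.1: ${old_cost:.4f}  Opus 4.7: ${new_cost:.4f}")
# → Opus 4.1: $0.2250  Opus 4.7: $0.1013
```

Even at full inflation, 4.7 still comes out about 55% cheaper per request, not the 67% the per-token cut alone would suggest.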

OpenAI's o-series got dramatically cheaper. o3 at $2/$8 replaced o1 at $15/$60 — roughly an 87% input cut. o4-mini at $0.55/$2.20 is cheaper still. The hidden cost remains: reasoning tokens on o-series are billed at output rates, and for complex tasks the model generates a lot of them. Headline price and actual bill can diverge 3–10×.

Google Flash-tier is genuinely cheap. Gemini 2.5 Flash-Lite at $0.10/$0.40 matches the cheapest open-weight inference on major hosts. The tradeoff is Google's April 2026 free-tier cuts — no migration period, no advance warning — and mandatory monthly spending caps ($250/mo Tier 1, $2,000/mo Tier 2) that suspend your API automatically when hit.

xAI entered with surprisingly competitive fast-tier pricing. Grok 4.1 Fast at $0.20/$0.50 with a 2M context window undercuts Claude Sonnet, GPT-5 mini, and Gemini Flash. The maturity gap is real — shorter deployment track record, and one third-party source notes a knowledge cutoff of November 2024 without search tools — but the price-per-context-token on fast-tier Grok is hard to beat among proprietary models.

DeepSeek's cache pricing is extreme. V4 Flash cache hits: $0.0028/M — 98% below the standard input rate. If your workload has a large, reusable system prompt, the effective cost can fall well below any frontier provider's headline rate. The V4 Pro 75% promo expires 2026-05-31; budget accordingly.
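The blended input rate drops fast as the cache hit rate climbs. A sketch using the V4 Flash figures above; the hit rates are illustrative, since they depend entirely on how much of each prompt is a reused prefix:

```python
def effective_input_rate(hit_rate: float,
                         miss_price: float = 0.14,      # V4 Flash cache miss, $/1M
                         hit_price: float = 0.0028) -> float:  # cache hit, $/1M
    """Blended $/1M input cost for DeepSeek V4 Flash at a given cache hit rate."""
    return hit_rate * hit_price + (1 - hit_rate) * miss_price

for hr in (0.0, 0.5, 0.9):
    print(f"{hr:.0%} hits -> ${effective_input_rate(hr):.4f}/1M input")
```

At a 90% hit rate the blended input cost is about $0.0165/1M, an order of magnitude below the already-cheap miss price; a large reusable system prompt is what makes that hit rate achievable.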

Open-weight models are credibly competitive. MiniMax M2.7 scores 56.22% on SWE-Pro, matching GPT-5.3-Codex. Kimi K2.6 reportedly ties GPT-5.5 on SWE-Bench Pro coding at ~80% lower cost, at $0.73–$0.75/$3.49–$3.50. These are MoE architectures (many parameters, few active per token), which is why they can be hosted cheaply.

How to read the numbers

Headline price vs effective price. Several mechanisms drive a wedge between the two:

The managed cloud premium. AWS Bedrock prices Claude at parity with Anthropic direct, but prices Llama 70B at a significant markup vs specialist inference hosts. Azure OpenAI prices tokens identically to OpenAI direct, but production deployments add support plans ($100–$1,000+/month) and infrastructure overhead, typically 20–40% above listed rates. Both are worth it for specific compliance requirements; neither saves money on raw token cost.

A multi-key router — something like llmdeal.me — lets you hit the cheapest backend for each request class without managing a dozen API keys and billing dashboards yourself.
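The routing idea reduces to a per-tier argmin over a price catalog. A minimal sketch of that core; the catalog rows are pulled loosely from Table 1, the tier labels are invented for illustration, and a real router (llmdeal.me or otherwise) would also weigh latency, rate limits, and quality:

```python
# model: (input $/1M, output $/1M, tier) — illustrative rates from Table 1
CATALOG = {
    "gpt-5.4-nano":          (0.20, 1.25, "cheap"),
    "gemini-2.5-flash-lite": (0.10, 0.40, "cheap"),
    "claude-sonnet-4.6":     (3.00, 15.00, "frontier"),
    "gpt-5.5":               (5.00, 30.00, "frontier"),
}

def cheapest(tier: str, in_toks: int, out_toks: int) -> tuple[str, float]:
    """Return (model, estimated USD cost) for the cheapest model in a tier."""
    costs = {
        model: (in_toks * p_in + out_toks * p_out) / 1e6
        for model, (p_in, p_out, t) in CATALOG.items()
        if t == tier
    }
    winner = min(costs, key=costs.get)
    return winner, costs[winner]

print(cheapest("cheap", 50_000, 2_000))     # flash-lite wins the cheap tier
print(cheapest("frontier", 50_000, 2_000))  # sonnet wins the frontier tier
```

The interesting engineering is everything around this function: classifying requests into tiers, keeping the catalog current as promos expire, and failing over when a backend rate-limits you.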

References

  1. Anthropic · Official API Pricing — accessed 2026-05-16
  2. Anthropic · Subscription Plans — accessed 2026-05-16
  3. Anthropic · Claude Opus 4.7 Announcement — 2026-04-16
  4. OpenAI · AI Pricing Guru: OpenAI Pricing — updated 2026-05-15
  5. OpenAI · MetaCTO: True Cost of OpenAI API — accessed 2026-05-16
  6. OpenAI · CloudZero: OpenAI Pricing Guide 2026 — accessed 2026-05-16
  7. OpenAI · CostGoat: OpenAI o-series pricing — accessed 2026-05-16
  8. Google · Official Gemini Developer API Pricing — accessed 2026-05-16
  9. Google · FindSkill: Gemini API Pricing Guide (April 2026 free-tier changes) — accessed 2026-05-16
  10. Google · TokenMix: Gemini 3.1 Pro pricing deep-dive — accessed 2026-05-16
  11. xAI · FelloAI: Grok Pricing — accessed 2026-05-16
  12. xAI · Mem0: xAI Grok API Pricing — 2026-05-15
  13. xAI · PricePerToken: xAI Model Catalogue — accessed 2026-05-16
  14. DeepSeek · DeepSeek Official Pricing — accessed 2026-05-16
  15. DeepSeek · Ofox: DeepSeek V4 Flash vs Pro breakdown — accessed 2026-05-16
  16. Alibaba / Qwen · Alibaba Cloud Model Studio Pricing — accessed 2026-05-16
  17. Mistral AI · PricePerToken: Mistral AI Model Catalogue — accessed 2026-05-16
  18. Mistral AI · DevTk: Mistral API Pricing Guide 2026 — accessed 2026-05-16
  19. Zhipu / GLM · VibeCoding: Zhipu AI GLM Pricing 2026 — accessed 2026-05-16
  20. Zhipu / GLM · CnTechPost: GLM-5.1 open-source launch — 2026-04-08
  21. Moonshot / Kimi · OpenRouter: Kimi K2.6 — accessed 2026-05-16
  22. Moonshot / Kimi · DeepInfra: Kimi K2.6 Pricing Guide — accessed 2026-05-16
  23. MiniMax · MiniMax Official API Pricing — accessed 2026-05-16
  24. MiniMax · MarkTechPost: MiniMax M2.7 launch — 2026-04-12
  25. Groq · Groq Official Pricing — accessed 2026-05-16
  26. Cerebras · TokenMix: Cerebras Pricing and Speed Tests — accessed 2026-05-16
  27. Together AI · Together AI Official Pricing — accessed 2026-05-16
  28. Fireworks AI · Fireworks AI Serverless Pricing — accessed 2026-05-16
  29. DeepInfra · DeepInfra Official Pricing — accessed 2026-05-16
  30. Novita AI · Novita AI Official Pricing — accessed 2026-05-16
  31. Cohere · Cohere Official Pricing — accessed 2026-05-16
  32. Perplexity · Perplexity Sonar API Pricing (Official) — accessed 2026-05-16
  33. AWS Bedrock · AWS Bedrock Pricing (Official) — accessed 2026-05-16
  34. AWS Bedrock · TokenMix: AWS Bedrock Pricing Guide — updated 2026-04-29
  35. Azure OpenAI / AI Foundry · Azure OpenAI Pricing (Official) — accessed 2026-05-16
  36. Azure OpenAI / AI Foundry · CloudZero: Azure OpenAI Total Cost Analysis — 2026-05-04
  37. Cross-provider · PECollective: LLM Pricing Comparison 2026 — April 2026
  38. Self-hosting · DevTk: Self-Hosting vs API Cost 2026 — February 2026

Rates checked against providers' own pricing pages, May 2026. Article published 2026-05-16.