🚀 LIVE · llama-3.3-70b-self-hosted is now serving on our EEA GPU.
Preorder bonus through Wed 20 May · Launch Mon 18 May

The LLM API gateway that can't get your stack subpoenaed.

EU-hosted inference. Paid in BTC. No KYC. No subscription. Cancel by not topping up: there's nothing to unsubscribe from.

+30% bonus locked through Wed 20 May · 100% refundable in fiat (you cover exchange-rate delta) · Credits never expire.

Who this is for

Built for founders and engineers who can't afford to have their stack subpoenaed.

We don't compete on the cheap end. If your monthly spend is under $100, Together, Hyperbolic, or DeepInfra will serve you better and faster. Our customer is the founding engineer at a stealth startup, the AI consultant under NDA, the team running production inference where the billing trail is its own threat model.

Why we exist

On May 13 2026, Anthropic raised Max-tier token caps and announced a $200/mo Agent SDK credit (effective 15 June) because users like our operator were burning through the prior limits: 40+ million tokens in 17 of the past 21 days. That isn't abuse; that's what production inference looks like in 2026. The credit closes the raw-throughput gap for Max customers, but the structural friction remains. No US card on file. No prompts in a US discovery surface. No passport handed to a reseller. If any of that describes your situation, you're our customer.

We target production-grade developers on large and x-large projects: teams who have hit the ceiling on Anthropic, Cursor, Windsurf, and Cline, and who need an independent path on jurisdiction, payment rails, and data retention, not just on tokens-per-minute. We're the viable option for the workload they can't (or won't) run through the default stack: privacy, jurisdiction, and billing trail first; smart-route cost savings second.

Pay in seconds with crypto. Cancel by not topping up. We never read your prompts. We never charge a card.

Starter

EU GPU inference · lowest per-token rate on the platform
$0.60 / $1.20
per 1M tokens (input / output)
  • Qwen2.5-Coder-32B on EEA GPU
  • OpenAI-compatible API - drop-in replacement, no SDK changes
  • 12k context (upgrading to 24-32k post-launch)
  • EU-resident inference - your data never leaves the EEA
  • BTC / XMR / LTC checkout - no card, no KYC
  • Per-key budget caps - spending stays predictable
Preorder Starter credits
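Switching over is a one-line change in any OpenAI-compatible client: point it at the gateway base URL (https://api.llmdeal.me/v1, from the FAQ below) and keep the same request shape. A minimal sketch; the model id and API key below are placeholders, not confirmed identifiers - check /v1/models for live names.

```python
import json

BASE_URL = "https://api.llmdeal.me/v1"  # gateway base from the pre-launch FAQ

def chat_request(api_key: str, model: str, messages: list) -> tuple:
    """Build an OpenAI-style /v1/chat/completions request.

    Returns (url, headers, body) exactly as an OpenAI SDK would send them;
    no SDK changes are needed beyond pointing base_url at the gateway.
    """
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = chat_request(
    "sk-example",            # placeholder key
    "qwen2.5-coder-32b",     # placeholder model id; query /v1/models for real ones
    [{"role": "user", "content": "Write a haiku about GPUs."}],
)
print(url)  # https://api.llmdeal.me/v1/chat/completions
```

Any client that lets you override `base_url` (official OpenAI SDKs do) sends exactly this shape, so no code changes are needed beyond the URL and key.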

Elite

70B EU model · 128k context · launches June 2026 · 100% refundable until live
$4.00 / $9.00
per 1M tokens (weighted avg)
  • 128k context window
  • EEA GPU + EU-resident open-weight models only โ€” your prompts never leave the EU
  • GDPR Article 28 DPA available โ€” sign before you send a single token
  • Dedicated rate limit pool โ€” no contention with other tiers
  • Direct DM support (Matrix / Telegram) โ€” reach a human, not a ticket queue
  • Extended EU model catalogue โ€” additional 70B+ open-weight models available on request
  • ๐Ÿ” YubiKey hardware key included FREE โ€” same as Pro. Not priced into per-token rates.
  • All Pro features
Reserve Elite credits

100% refundable in fiat until the 70B EU model goes live · YubiKey included free

Commitment plans

Built for teams at 40M+ tokens/month. Reserve capacity and lock in a lower per-token rate.

Production

$499 / month base
  • −20% off Pro and Elite per-token rates
  • Reserved capacity during peak hours - no queue, guaranteed throughput
  • Priority Discord DM support
  • Monthly usage report broken down by model
  • Cancel anytime - no annual lock-in
  • Annual prepay option: $5,489/yr (1 month free)
Preorder Production

Scale

$1,999 / month base
  • −30% off Pro and Elite per-token rates
  • Dedicated Matrix/Slack channel with founder - direct line, no queue
  • Reserved capacity + 99.5% SLA
  • Weekly usage + cost breakdown by model
  • Quarterly architecture review
  • Annual prepay: $19,990/yr (2 months free)
Talk to founder

Sovereign

custom · annual
  • Dedicated routing pool - your own GPU slice on the EU box
  • GDPR Article 28 DPA + sub-processor list signed before go-live
  • Annual commit, invoiced - no card required
  • Direct operator phone / Signal line
  • Custom retention + audit terms
  • From $4,999 / year, scoped to your workload
Request quote

The cost case, plainly stated

Direct comparison against routing every token to Sonnet 4.6 at retail.

Daily workload                          All-Sonnet 4.6   llmdeal Starter   llmdeal Pro    Pro savings
6M tok/day · light agent                $1,260/mo        $144/mo           $540/mo        $720/mo · 57%
30M tok/day · steady prod               $6,300/mo        $720/mo           $2,700/mo      $3,600/mo · 57%
100M tok/day · heavy agent fleet        $21,000/mo       $2,400/mo         $9,000/mo      $12,000/mo · 57%
Same Pro on Production commit (−20%)    -                -                 + $499 base    $1.60/$4.00 net per 1M
Per-bucket breakdown →

Assumes avg 800 input / 400 output tokens per request. Pro routing (~50% Qwen-Coder, 25% Llama-3.3-70B, 15% DeepSeek-V3.2, 10% Codestral/Qwen3-235B/GLM-5) bills at a flat $2/$5 per 1M weighted average - we absorb model-to-model cost variance internally. Starter is single-model (our self-hosted Qwen-Coder-32B), no routing overhead.

Pro per-bucket on 6M tok/day workload (4M input + 2M output):
  50% to Qwen-Coder-32B (EU GPU):       2.0M in + 1.0M out  → $9.00/day
  25% to llama-3.3-70b-self-hosted:     1.0M in + 0.5M out  → $4.50/day
  15% to DeepSeek-V3.2 (reasoning):     0.6M in + 0.3M out  → $2.70/day
  10% to Codestral/Qwen3-235B (heavy):  0.4M in + 0.2M out  → $1.80/day
                                                            TOTAL: $18.00/day · ~$540/month
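The breakdown above can be reproduced directly. A minimal sketch, using the flat $2/$5 Pro weighted rate and the routing split stated above:

```python
# Pro bills a flat weighted rate regardless of which model fires.
RATE_IN, RATE_OUT = 2.00, 5.00           # $ per 1M tokens (Pro weighted avg)
SPLIT = {                                 # routing shares from the Pro mix
    "Qwen-Coder-32B": 0.50,
    "llama-3.3-70b-self-hosted": 0.25,
    "DeepSeek-V3.2": 0.15,
    "Codestral/Qwen3-235B": 0.10,
}

def daily_cost(m_in: float, m_out: float) -> float:
    """Cost in $ for m_in / m_out million tokens per day at the flat Pro rate."""
    return m_in * RATE_IN + m_out * RATE_OUT

# 6M tok/day workload: 4M input + 2M output, split across the routing buckets.
per_bucket = {name: round(daily_cost(4 * s, 2 * s), 2) for name, s in SPLIT.items()}
total = round(sum(per_bucket.values()), 2)
print(per_bucket["Qwen-Coder-32B"])  # 9.0
print(total, total * 30)             # 18.0 540.0
```

Because the billed rate is flat, the split only matters for transparency; the monthly figure depends solely on total input/output volume.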

Pro routes exclusively across our self-hosted + open-weight stack (Llama, DeepSeek, Mistral, Qwen, GLM). Median savings versus a single frontier provider: ~55–65% vs Sonnet, up to 80%+ vs Opus-tier, depending on workload mix.

Founding members

Early builders who claimed a founder seat and pushed the product into shape before launch.

khuur.dev Founding member · seat 01

Software company shipping developer tools and AI workflow products: Snitch (security auditing), Jeremy (AI context layer), Scribe (voice-to-text), and a deep catalogue of focused developer and creative apps.

khuur.dev - precision software for developers and technical teams →

Pre-purchase FAQ

Real objections, straight answers - no sales spin.

What happens if credits hit zero mid-request?

The in-flight request completes; we absorb the overrun. Every subsequent request returns a 402 with an explicit "out of credits" body. Top up; service resumes immediately. No silent throttling, no surprise invoices.
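Client-side, that contract is easy to handle: treat 402 as a typed "pause the queue" signal rather than a retryable error. A hedged sketch; only the 402 status is guaranteed above, the exact error-body shape is an assumption.

```python
class OutOfCredits(Exception):
    """Raised when the gateway returns HTTP 402 (credits exhausted)."""

def handle_response(status: int, body: str) -> str:
    """Map a gateway response to its body or a typed error.

    402 means every request fails until you top up - no silent throttling -
    so the client should pause its job queue instead of retrying blindly.
    """
    if status == 402:
        raise OutOfCredits(body)
    if status != 200:
        raise RuntimeError(f"unexpected status {status}: {body}")
    return body

# Usage: pause on OutOfCredits, resume after topping up.
paused = False
try:
    handle_response(402, '{"error": "out of credits"}')
except OutOfCredits:
    paused = True
print(paused)  # True
```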

Can I bring my own Anthropic / OpenAI key?

Not today. Smart routing works because we hold the upstream contracts; that's what lets us route to the cheapest qualified model per request. BYO-key support is on the Sovereign tier roadmap, but it undercuts the routing margin, so it will be priced to reflect that.

How does Pro routing decide which model fires?

A small open-source classifier (RouteLLM-style) scores each prompt on complexity, latency-sensitivity, code vs prose, and reasoning depth. Easy → Qwen-Coder-32B (our EU GPU). Fast workhorse → llama-3.3-70b-self-hosted (our EU GPU). Reasoning → DeepSeek V3.2 or Qwen3-Next 80B Thinking. Code-heavy → Codestral. Hardest queries → Qwen3 235B or GLM-5. Per-request telemetry shows exactly which model fired. The router is open-source and pinned in our repo - audit the logic yourself.
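The tiering can be illustrated with a toy heuristic. To be clear, this is not the production classifier (that one is RouteLLM-style and pinned in the repo); the keyword-and-length scoring below is purely illustrative of the decision shape.

```python
def route(prompt: str) -> str:
    """Toy stand-in for the router: score a prompt, pick a tier.

    The real router scores complexity, latency-sensitivity, code vs prose,
    and reasoning depth; here we approximate with keywords and length.
    """
    p = prompt.lower()
    if any(k in p for k in ("prove", "step by step", "derive")):
        return "DeepSeek-V3.2"             # reasoning tier
    if any(k in p for k in ("refactor", "function", "class", "bug")):
        return "Codestral"                 # code-heavy tier
    if len(prompt) > 2000:
        return "Qwen3-235B"                # hardest / longest queries
    return "Qwen2.5-Coder-32B"             # easy default on the EU GPU

print(route("Fix this bug in my function"))  # Codestral
print(route("Summarise this paragraph"))     # Qwen2.5-Coder-32B
```

The production router returns the chosen model in per-request telemetry, so the same decision is always inspectable after the fact.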

What does "EU-only routing by default" actually mean?

On Elite, every request is served from our EEA GPU and EU-resident open-weight model providers; no US-hosted frontier models are provisioned or available on this tier, so your prompts never enter US discovery scope. The default: nothing leaves the EU.

How does the refund actually work in BTC?

Refunds are paid in fiat (USD / EUR / SEK / NOK), not BTC. You receive the fiat value your crypto was worth on the day we recorded the inbound payment, minus per-second prorated usage. BTC price movement between purchase and refund is your exposure; we don't hedge FX. Refund window: cumulative usage < 3 hours across all orders ever (not calendar time). Fees shown in plain text before we send.
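The refund arithmetic, as a sketch. It assumes `usage_cost` is the already-metered dollar figure (actual proration is per-second) and that the fiat value was fixed on the day the payment was recorded:

```python
def refund(fiat_value_at_receipt: float, usage_cost: float,
           cumulative_usage_hours: float) -> float:
    """Fiat refund under the stated policy.

    Eligible only while cumulative usage across all orders ever is under
    3 hours; the payout is the recorded fiat value minus metered usage,
    paid in fiat, never in BTC.
    """
    if cumulative_usage_hours >= 3:
        raise ValueError("refund window closed: >= 3 hours cumulative usage")
    return round(fiat_value_at_receipt - usage_cost, 2)

# Paid $100 worth of BTC at receipt, used $1.75 of inference in 40 minutes:
print(refund(100.00, 1.75, 40 / 60))  # 98.25
```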

Why is a FIDO2 hardware key required on Pro+? Do I pay for it?

Pro+ accounts hold real spending power. A compromised account can drain credits faster than detection allows. FIDO2 (YubiKey, SoloKey, Apple/Google Passkeys) eliminates the phishing and credential-stuffing attack surface. Starter is FIDO2-optional; Pro / Elite require it at sign-in.

The YubiKey is FREE on Pro+ accounts. The cost is on us, not baked into per-token rates. Pick your delivery option at preorder:

  • Mail-drop address of your choice (preserves the no-KYC story - operator-relay shipping)
  • Cost reimbursed in credits ($50 cap) - buy your own from yubico.com or a local reseller, expense it against your llmdeal balance
  • Use an existing FIDO2 key you already own - no shipping, no reimbursement, just enrol from account settings

Details in privacy §8a.

What's the $3,500 preorder threshold?

When public preorder volume crosses $3,500, we fund a second EU GPU node, expanding capacity and adding larger open-weight models to the Pro routing pool. Pro has always routed exclusively across our self-hosted + open-weight stack (Llama, DeepSeek, Mistral, Qwen, GLM); the threshold unlocks more GPU headroom. Progress is tracked on the homepage public counter.

Why no fiat - no cards, Stripe, or bank transfer?

Every fiat rail (Visa, Stripe, ACH, SEPA, SWIFT) puts a KYC-bearing intermediary between you and us. The no-KYC promise becomes structurally unenforceable the moment a single fiat payment clears our books. Crypto keeps that chain broken.

Longer take: see the Why we don't accept fiat callout above.

Can I test the API before paying?

Preorder $20 → $26 in credits and you get a working key the moment the gateway opens (Mon 18 May 2026). Pre-launch, the API base is https://api.llmdeal.me/v1 - run curl /v1/models to see which models are live now. Model availability is public.