llama-3.3-70b-self-hosted is now serving on our EEA GPU.
EU-hosted inference. Paid in BTC. No KYC. No subscription. Cancel by not topping up – there's nothing to unsubscribe from.
+30% bonus locked through Wed 20 May · 100% refundable in fiat (you cover exchange-rate delta) · Credits never expire.
Built for founders and engineers who can't afford to have their stack subpoenaed.
We don't compete on the cheap end. If your monthly spend is under $100, Together, Hyperbolic, or DeepInfra will serve you better and faster. Our customer is the founding engineer at a stealth startup, the AI consultant under NDA, the team running production inference where the billing trail is its own threat model.
On May 13 2026, Anthropic raised Max-tier token caps and announced a $200/mo Agent SDK credit (effective 15 June), because users like our operator were burning through the prior limits: 40+ million tokens in 17 of the past 21 days. That isn't abuse; that's what production inference looks like in 2026. The credit closes the raw-throughput gap for Max customers, but the structural friction remains. No US card on file. No prompts in a US discovery surface. No passport handed to a reseller. If any of that describes your situation, you're our customer.
We target production-grade developers on large and x-large projects: teams who have hit the ceiling on Anthropic, Cursor, Windsurf, and Cline, and who need an independent path on jurisdiction, payment rails, and data retention, not just on tokens-per-minute. We're the viable option for the workload they can't (or won't) run through the default stack: privacy, jurisdiction, and billing trail first; smart-route cost savings second.
Pay in seconds with crypto. Cancel by not topping up. We never read your prompts. We never charge a card.
FIDO2 required at sign-in · YubiKey ships free · why
100% refundable in fiat until 70B EU model goes live · YubiKey included free
Built for teams at 40M+ tokens/month. Reserve capacity and lock in a lower per-token rate.
Direct comparison against routing every token to Sonnet 4.6 at retail.
| Daily workload | All-Sonnet 4.6 | llmdeal Starter | llmdeal Pro | Pro savings |
|---|---|---|---|---|
| 6M tok/day · light agent | $1,260/mo | $144/mo | $540/mo | $720/mo · 57% |
| 30M tok/day · steady prod | $6,300/mo | $720/mo | $2,700/mo | $3,600/mo · 57% |
| 100M tok/day · heavy agent fleet | $21,000/mo | $2,400/mo | $9,000/mo | $12,000/mo · 57% |
| Same Pro on Production commit (−20%) | – | – | + $499 base | $1.60/$4.00 net per 1M |
Assumes avg 800 input / 400 output tokens per request. Pro routing (~50% Qwen-Coder, 25% Llama-3.3-70B, 15% DeepSeek-V3.2, 10% Codestral/Qwen3-235B/GLM-5) bills at a flat $2/$5 per 1M weighted average – we absorb model-to-model cost variance internally. Starter is single-model (our self-hosted Qwen-Coder-32B), no routing overhead.
Pro per-bucket on 6M tok/day workload (4M input + 2M output):
50% to Qwen-Coder-32B (EU GPU): 2.0M in + 1.0M out → $9.00/day
25% to llama-3.3-70b-self-hosted: 1.0M in + 0.5M out → $4.50/day
15% to DeepSeek-V3.2 (reasoning): 0.6M in + 0.3M out → $2.70/day
10% to Codestral/Qwen3-235B (heavy): 0.4M in + 0.2M out → $1.80/day
TOTAL: $18.00/day · ~$540/month
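The per-bucket arithmetic above is easy to check yourself. A minimal sketch, assuming only the flat Pro rate of $2 in / $5 out per 1M tokens stated in the pricing note:

```python
IN_RATE, OUT_RATE = 2.00, 5.00  # USD per 1M tokens, flat Pro rate

def bucket_cost(share: float, daily_in_m: float, daily_out_m: float) -> float:
    """Daily USD cost of one routing bucket at the flat Pro rate."""
    return share * (daily_in_m * IN_RATE + daily_out_m * OUT_RATE)

# Traffic shares from the Pro breakdown above
buckets = {
    "qwen-coder-32b": 0.50,
    "llama-3.3-70b-self-hosted": 0.25,
    "deepseek-v3.2": 0.15,
    "codestral/qwen3-235b": 0.10,
}
daily_in_m, daily_out_m = 4.0, 2.0  # 6M tok/day workload (4M in + 2M out)

daily = sum(bucket_cost(s, daily_in_m, daily_out_m) for s in buckets.values())
print(f"${daily:.2f}/day ~ ${daily * 30:.0f}/month")  # $18.00/day ~ $540/month
```

Because Pro bills every bucket at the same flat weighted rate, the bucket split changes which model serves a request, not what you pay.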
Pro routes exclusively across our self-hosted + open-weight stack (Llama, DeepSeek, Mistral, Qwen, GLM). Median savings versus a single frontier provider: ~55–65% vs Sonnet, up to 80%+ vs Opus-tier depending on workload mix.
Early builders who claimed a founder seat and pushed the product into shape before launch.
Software company shipping developer tools and AI workflow products – Snitch (security auditing), Jeremy (AI context layer), Scribe (voice-to-text), and a deep catalogue of focused developer and creative apps.
khuur.dev · precision software for developers and technical teams
Real objections, straight answers – no sales spin.
The in-flight request completes – we absorb the overrun. Every subsequent request returns a 402 with an explicit "out of credits" body. Top up; service resumes immediately. No silent throttling, no surprise invoices.
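Client-side, that contract is simple to code against. A minimal retry sketch, not an official SDK: `send` here is any callable returning `(status_code, body)`, so the HTTP client and endpoint are yours to plug in.

```python
import time

def call_with_topup(send, payload, max_waits: int = 3, wait_s: float = 30):
    """Retry on 402 ("out of credits"); anything else returns immediately.

    `send` is any callable returning (status_code, body). Per the billing
    contract, service resumes on the first request after a top-up.
    """
    for attempt in range(max_waits + 1):
        status, body = send(payload)
        if status != 402:
            return status, body        # success, or a non-billing error
        if attempt < max_waits:
            time.sleep(wait_s)         # window to top up before retrying
    raise RuntimeError("still out of credits after the top-up window")
```

Because every post-limit request is an explicit 402 rather than a silent throttle, the loop never has to guess why a call failed.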
Not today. Smart routing works because we hold the upstream contracts – that's what lets us route to the cheapest qualified model per request. BYO-key support is on the Sovereign tier roadmap, but it undercuts the routing margin, so it will be priced to reflect that.
A small open-source classifier (RouteLLM-style) scores each prompt on complexity, latency-sensitivity, code vs prose, and reasoning depth. Easy → Qwen-Coder-32B (our EU GPU). Fast workhorse → llama-3.3-70b-self-hosted (our EU GPU). Reasoning → DeepSeek V3.2 or Qwen3-Next 80B Thinking. Code-heavy → Codestral. Hardest queries → Qwen3 235B or GLM-5. Per-request telemetry shows exactly which model fired. The router is open-source and pinned in our repo – audit the logic yourself.
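The pinned repo is the source of truth for the actual classifier; the sketch below is only a toy heuristic with made-up thresholds, to illustrate the decision order described above (cheapest qualified pool first, escalating by difficulty):

```python
def route(prompt: str) -> str:
    """Toy stand-in for the routing decision; thresholds are illustrative."""
    code_hint = any(t in prompt for t in ("def ", "{", "```", "import "))
    reasoning_hint = any(
        w in prompt.lower() for w in ("prove", "step by step", "why")
    )
    long_prompt = len(prompt) > 2000  # arbitrary cutoff for "hard"

    if reasoning_hint:
        return "deepseek-v3.2"             # reasoning pool
    if code_hint and long_prompt:
        return "codestral"                 # heavy code pool
    if long_prompt:
        return "qwen3-235b"                # hardest general queries
    if code_hint:
        return "qwen-coder-32b"            # easy code, EU GPU
    return "llama-3.3-70b-self-hosted"     # fast workhorse, EU GPU
```

The real router replaces these string checks with a trained scorer, but the shape is the same: each request gets exactly one pool, and the choice is loggable per request.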
On Elite, every request is served from our EEA GPU and EU-resident open-weight model providers – your prompts never enter US discovery scope. No US-hosted frontier models are provisioned or available on this tier. The default: nothing leaves the EU.
Refunds are paid in fiat (USD / EUR / SEK / NOK), not BTC. You receive the fiat value your crypto was worth on the day we recorded the inbound payment, minus per-second prorated usage. BTC price movement between purchase and refund is your exposure – we don't hedge FX. Refund window: cumulative usage < 3 hours across all orders ever (not calendar time). Fees shown in plain text before we send.
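A worked example of that exposure, with made-up numbers; `usage_cost_fiat` stands in for the per-second prorated usage charge, and the field names are ours, not the billing system's:

```python
def refund_fiat(btc_paid: float, btc_price_at_payment: float,
                usage_cost_fiat: float, cumulative_usage_hours: float) -> float:
    """Refund = fiat value recorded on payment day, minus prorated usage.

    Today's BTC price never enters the calculation - that delta is yours.
    """
    if cumulative_usage_hours >= 3:
        raise ValueError("refund window closed: cumulative usage >= 3 hours")
    recorded_fiat = btc_paid * btc_price_at_payment
    return round(max(recorded_fiat - usage_cost_fiat, 0.0), 2)

# Paid 0.001 BTC when BTC was $65,000 -> $65.00 recorded.
# Used $4.50 of inference at 1.2h cumulative -> $60.50 back, whatever BTC does next.
print(refund_fiat(0.001, 65_000, 4.50, 1.2))
```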
Pro+ accounts hold real spending power. A compromised account can drain credits faster than detection allows. FIDO2 (YubiKey, SoloKey, Apple/Google Passkeys) eliminates the phishing and credential-stuffing attack surface. Starter is FIDO2-optional; Pro / Elite require it at sign-in.
The YubiKey is FREE on Pro+ accounts. The cost is on us – not baked into per-token rates. Pick your delivery option at preorder:
Details in privacy §8a.
When public preorder volume crosses $3,500, we fund a second EU GPU node – expanding capacity and adding larger open-weight models to the Pro routing pool. Pro has always routed exclusively across our self-hosted + open-weight stack (Llama, DeepSeek, Mistral, Qwen, GLM); the threshold unlocks more GPU headroom. Progress is tracked on the homepage public counter.
Every fiat rail (Visa, Stripe, ACH, SEPA, SWIFT) puts a KYC-bearing intermediary between you and us. The no-KYC promise becomes structurally unenforceable the moment a single fiat payment clears our books. Crypto keeps that chain broken.
Longer take: see the Why we don't accept fiat callout above.
Preorder $20 → $26 in credits and you get a working key the moment the gateway opens (Mon 18 May 2026).
Pre-launch, the API base is https://api.llmdeal.me/v1 – run `curl https://api.llmdeal.me/v1/models` to see which models are live now. Model availability is public.
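If you'd rather script it, here is a sketch that lists the live model ids. Only the base URL comes from this page; the OpenAI-style `{"data": [{"id": ...}]}` response shape is an assumption, so verify it against the real endpoint:

```python
import json
import urllib.request

BASE = "https://api.llmdeal.me/v1"

def parse_models(payload: dict) -> list[str]:
    """Extract model ids from an assumed OpenAI-style model listing."""
    return sorted(m["id"] for m in payload.get("data", []))

def live_models(base: str = BASE) -> list[str]:
    """Fetch and parse the public model list from the gateway."""
    with urllib.request.urlopen(f"{base}/models") as resp:
        return parse_models(json.load(resp))
```

Splitting fetch from parse keeps the response-shape assumption in one small, testable function.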