llmdeal.me — EU LLM API Gateway, No KYC, Crypto

Who this is for

Built for founders and engineers who can't afford to have their stack subpoenaed.

We don't compete on the cheap end. If your monthly spend is under $100, Together, Hyperbolic, or DeepInfra will serve you better and faster. Our customer is the founding engineer at a stealth startup, the AI consultant under NDA, the team running production inference where the billing trail is its own threat model.

Why we exist

On May 13 2026, Anthropic raised Max-tier token caps and announced a $200/mo Agent SDK credit (effective 15 June) — because users like our operator were burning through the prior limits: 40+ million tokens in 17 of the past 21 days. That isn't abuse; that's what production inference looks like in 2026. The credit closes the raw-throughput gap for Max customers — but the structural friction remains. No US card on file. No prompts in a US discovery surface. No passport handed to a reseller. If any of that describes your situation, you're our customer.

We target production-grade developers on large and x-large projects: teams who have hit the ceiling on Anthropic, Cursor, Windsurf, and Cline, and who need an independent path on jurisdiction, payment rails, and data retention — not just on tokens-per-minute. We're the viable option for the workload they can't (or won't) run through the default stack — privacy, jurisdiction, and billing trail first; smart-route cost savings second.

Native BTC, XMR, LTC — zero KYC, on either side, ever.
EU jurisdiction — EEA GPU, GDPR-compliant, your prompts never enter US discovery scope.
Smart routing across self-hosted + open-weight models — ~55–65% cheaper than all-Sonnet on the same workload.
Three tiers, flat per-token pricing — no per-region matrix, no surprise overages, no PO required.

Pay in seconds with crypto. Cancel by not topping up. We never read your prompts. We never charge a card.

Starter

EU GPU inference · lowest per-token rate on the platform

$0.60 / $1.20

per 1M tokens (input / output)

Qwen2.5-Coder-32B on EEA GPU
OpenAI-compatible API — drop-in replacement, no SDK changes
12k context (upgrading to 24-32k post-launch)
EU-resident inference — your data never leaves EEA
BTC / XMR / LTC checkout — no card, no KYC
Per-key budget caps — spending stays predictable

Preorder Starter credits

Pro

Smart-routed across 6 models · ~55–65% cheaper than Sonnet

$2.00 / $5.00

per 1M tokens (weighted avg)

Smart routing across 6 models — GLM-5, Qwen3-Next 80B, DeepSeek V3.2, Llama 3.3 70B, Codestral, Qwen3 235B
ATLAS ~0.77 — matches Opus 4-6, beats Sonnet 4.6 + GPT-5.5
32k context window
Per-model response caching — repeat-query cache hits billed at a reduced rate
Per-request cost telemetry — see exactly what each call costs
Priority routing during peak load — your requests skip the queue
🔐 YubiKey hardware key included FREE — not priced into per-token rates. Worth $50-100 retail; we cover it.
All Starter features

Preorder Pro credits

FIDO2 required at sign-in · YubiKey ships free · why

Reserve · Launches June

Elite

70B EU model · 128k context · launches June 2026 · 100% refundable until live

$4.00 / $9.00

per 1M tokens (weighted avg)

128k context window
EEA GPU + EU-resident open-weight models only — your prompts never leave the EU
GDPR Article 28 DPA available — sign before you send a single token
Dedicated rate limit pool — no contention with other tiers
Direct DM support (Matrix / Telegram) — reach a human, not a ticket queue
Extended EU model catalogue — additional 70B+ open-weight models available on request
🔐 YubiKey hardware key included FREE — same as Pro. Not priced into per-token rates.
All Pro features

Reserve Elite credits

100% refundable in fiat until 70B EU model goes live · YubiKey included free

Commitment plans

Built for teams at 40M+ tokens/month. Reserve capacity and lock in a lower per-token rate.

Production

$499 / month base

−20% off Pro and Elite per-token rates
Reserved capacity during peak hours — no queue, guaranteed throughput
Priority Discord DM support
Monthly usage report broken down by model
Cancel anytime — no annual lock-in
Annual prepay option: $4,989/yr (1 month free)

Preorder Production

Scale

$1,999 / month base

−30% off Pro and Elite per-token rates
Dedicated Matrix/Slack channel with founder — direct line, no queue
Reserved capacity + 99.5% SLA
Weekly usage + cost breakdown by model
Quarterly architecture review
Annual prepay: $19,990/yr (2 months free)

Talk to founder

Sovereign

custom · annual

Dedicated routing pool — your own GPU slice on the EU box
GDPR Article 28 DPA + sub-processor list signed before go-live
Annual commit, invoiced — no card required
Direct operator phone / Signal line
Custom retention + audit terms
From $4,999 / year, scoped to your workload

Request quote

The cost case, plainly stated

Direct comparison against routing every token to Sonnet 4.6 at retail.

Daily workload	All-Sonnet 4.6	llmdeal Starter	llmdeal Pro	Pro savings
6M tok/day · light agent	$1,260/mo	$144/mo	$540/mo	$720/mo · 57%
30M tok/day · steady prod	$6,300/mo	$720/mo	$2,700/mo	$3,600/mo · 57%
100M tok/day · heavy agent fleet	$21,000/mo	$2,400/mo	$9,000/mo	$12,000/mo · 57%
Same Pro on Production commit (−20%)	—	—	+ $499 base	$1.60/$4.00 net per 1M

Per-bucket breakdown →

Assumes avg 800 input / 400 output tokens per request. Pro routing (~50% Qwen-Coder, 25% Llama-3.3-70B, 15% DeepSeek-V3.2, 10% Codestral/Qwen3-235B/GLM-5) bills at a flat $2/$5 per 1M weighted average — we absorb model-to-model cost variance internally. Starter is single-model (our self-hosted Qwen-Coder-32B), no routing overhead.

Pro per-bucket on 6M tok/day workload (4M input + 2M output):
  50% to Qwen-Coder-32B (EU GPU):       2.0M in + 1.0M out  → $9.00/day
  25% to llama-3.3-70b-self-hosted:     1.0M in + 0.5M out  → $4.50/day
  15% to DeepSeek-V3.2 (reasoning):     0.6M in + 0.3M out  → $2.70/day
  10% to Codestral/Qwen3-235B (heavy):  0.4M in + 0.2M out  → $1.80/day
                                                            TOTAL: $18.00/day · ~$540/month

Pro routes exclusively across our self-hosted + open-weight stack (Llama, DeepSeek, Mistral, Qwen, GLM). Median savings versus a single frontier provider: ~55–65% vs Sonnet, up to 80%+ vs Opus-tier depending on workload mix.

Pre-purchase FAQ

Real objections, straight answers — no sales spin.

What happens if credits hit zero mid-request?

The in-flight request completes — we absorb the overrun. Every subsequent request returns a 402 with an explicit "out of credits" body. Top up; service resumes immediately. No silent throttling, no surprise invoices.

Can I bring my own Anthropic / OpenAI key?

Not today. Smart routing works because we hold the upstream contracts — that's what lets us route to the cheapest qualified model per request. BYO-key support is on the Sovereign tier roadmap, but it undercuts the routing margin, so it will be priced to reflect that.

How does Pro routing decide which model fires?

A small open-source classifier (RouteLLM-style) scores each prompt on complexity, latency-sensitivity, code vs prose, and reasoning depth. Easy → Qwen-Coder-32B (our EU GPU). Fast workhorse → llama-3.3-70b-self-hosted (our EU GPU). Reasoning → DeepSeek V3.2 or Qwen3-Next 80B Thinking. Code-heavy → Codestral. Hardest queries → Qwen3 235B or GLM-5. Per-request telemetry shows exactly which model fired. The router is open-source and pinned in our repo — audit the logic yourself.

What does "EU-only routing by default" actually mean?

On Elite, every request is served from our EEA GPU and EU-resident model providers — your prompts never enter US discovery scope. Elite routes exclusively across EEA-GPU and EU-resident open-weight models. No US-hosted frontier models are provisioned or available on this tier. The default: nothing leaves the EU.

How does the refund actually work in BTC?

Refunds are paid in fiat (USD / EUR / SEK / NOK), not BTC. You receive the fiat value your crypto was worth on the day we recorded the inbound payment, minus per-second prorated usage. BTC price movement between purchase and refund is your exposure — we don't hedge FX. Refund window: cumulative usage < 3 hours across all orders ever (not calendar time). Fees shown in plain text before we send.

Why is a FIDO2 hardware key required on Pro+? Do I pay for it?

Pro+ accounts hold real spending power. A compromised account can drain credits faster than detection allows. FIDO2 (YubiKey, SoloKey, Apple/Google Passkeys) eliminates the phishing and credential-stuffing attack surface. Starter is FIDO2-optional; Pro / Elite require it at sign-in.

The YubiKey is FREE on Pro+ accounts. The cost is on us — not baked into per-token rates. Pick your delivery option at preorder:

Mail-drop address of your choice (preserves the no-KYC story — operator-relay shipping)
Cost reimbursed in credits ($50 cap) — buy your own from yubico.com or a local reseller, expense it against your llmdeal balance
Use an existing FIDO2 key you already own — no shipping, no reimbursement, just enrol from account settings

Details in privacy §8a.

What's the $3,500 preorder threshold?

When public preorder volume crosses $3,500, we fund a second EU GPU node — expanding capacity and adding larger open-weight models to the Pro routing pool. Pro has always routed exclusively across our self-hosted + open-weight stack (Llama, DeepSeek, Mistral, Qwen, GLM); the threshold unlocks more GPU headroom. Progress is tracked on the homepage public counter.

Why no fiat — no cards, Stripe, or bank transfer?

Every fiat rail (Visa, Stripe, ACH, SEPA, SWIFT) puts a KYC-bearing intermediary between you and us. The no-KYC promise becomes structurally unenforceable the moment a single fiat payment clears our books. Crypto keeps that chain broken.

Longer take: see the Why we don't accept fiat callout above.

Can I test the API before paying?

Preorder $20 → $26 in credits and you get a working key the moment the gateway opens (Mon 18 May 2026). Pre-launch, the API base is https://api.llmdeal.me/v1 — run curl /v1/models to see which models are live now. Model availability is public.

The LLM API gateway your stack can't get subpoenaed.