llmdeal.me is an OpenAI-compatible LLM API gateway. Drop in your key, set model to
smart-route, and each request lands on the cheapest model that can handle it:
Qwen-Coder-32B for boilerplate, Groq Llama and DeepSeek-V3 for mid-complexity, Mistral
Codestral for code, Qwen3-235B for reasoning. Frontier access (Claude / GPT-4o) unlocks
at the $3,500 preorder threshold. EU-resident infrastructure. Crypto checkout. No KYC.
Preorder in BTC — +30% bonus credits through Wed 20 May 2026. $20 buys $26 in credits. $100 buys $130. Credits activate at launch (Mon 18 May).
$ curl https://api.llmdeal.me/v1/chat/completions \
  -H "Authorization: Bearer $LLMDEAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-route",
    "messages": [{"role": "user", "content": "refactor this function"}]
  }'
# Routed to qwen-coder-32b — $0.0004 instead of $0.012 on Sonnet
Most teams send every query to Sonnet or GPT-4o. A third of those are formatting, syntax questions, regex explanations, or boilerplate generation — tasks a 32B coding model handles just as well at <5% of the cost.
Every request hits Claude Sonnet 4.6 at $3 input / $15 output per 1M tokens. You pay frontier rates for the long tail of work that doesn't need it.
Boilerplate → our Qwen-Coder-32B in the EU (~$0.80/$1.50). Fast queries → llama-3.3-70b-self-hosted on our EU GPU. Reasoning → DeepSeek-V3 / Qwen3-235B on Cerebras. Code → Mistral Codestral. Frontier (Claude / GPT-4o) unlocks at the $3,500 stretch-goal threshold.
No magic, no VC hand-waving. Just boring infrastructure decisions that add up.
An open-source classifier scores each request by difficulty and routes it to the cheapest model that can handle it. Thresholds are tunable per workload.
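What that routing logic boils down to can be sketched in a few lines. This is an illustrative toy, not the production classifier — the scoring heuristic, thresholds, and band boundaries are placeholders; only the model names come from the mix described above:

```python
# Toy difficulty-based router. Scorer and thresholds are illustrative
# placeholders, NOT llmdeal's actual classifier.

def difficulty_score(prompt: str) -> float:
    """Toy scorer: longer, reasoning-flavored prompts score higher."""
    score = min(len(prompt) / 2000, 1.0)
    if any(kw in prompt.lower() for kw in ("prove", "why", "design", "architect")):
        score = max(score, 0.7)
    return score

# Cheapest model that clears each difficulty band (tunable per workload).
ROUTES = [
    (0.25, "qwen-coder-32b"),             # boilerplate, syntax, regex
    (0.50, "llama-3.3-70b-self-hosted"),  # fast workhorse queries
    (0.75, "deepseek-chat"),              # mid-complexity reasoning
    (1.00, "qwen3-235b"),                 # hardest requests
]

def route(prompt: str) -> str:
    s = difficulty_score(prompt)
    for ceiling, model in ROUTES:
        if s <= ceiling:
            return model
    return ROUTES[-1][1]
```

Tuning per workload just means moving the ceilings: a team that mostly ships boilerplate widens the first band; a research-heavy team narrows it.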
Qwen2.5-Coder-32B runs on our own EU GPU. Flat hardware costs, no per-token margin to a cloud provider — so we undercut every commodity Llama-70B reseller.
llama-3.3-70b-self-hosted — LIVE on our EEA GPU
Now live: llama-3.3-70b-self-hosted running on our own EEA GPU — no upstream fees, same API key as the rest of the gateway, EU jurisdiction end-to-end. In the Pro routing mix today.
Inference runs in the EU. GDPR jurisdiction applies by default. Open to developers worldwide — we put the compute here so privacy-sensitive workloads never have to cross the border.
Pay in BTC, XMR, or LTC. No identity check, no card decline, no chargeback exposure. We don't know who you are and don't need to.
We keep only what billing requires: contact handle, order ID, dollar amount, token counts. We do not store prompt content, response content, or IPs past settlement. GDPR-compliant by default — same rules apply worldwide, not just for EU users.
Pay-as-you-go in crypto. No subscription. No credits that expire. Rates lock in at public launch Mon 18 May 2026 (GMT+2).
llama-3.3-70b-self-hosted + DeepSeek + Codestral + Qwen3-235B
Pay in crypto now. Lock in 30% extra credits, get a beta key the day the gateway opens. No KYC. No subscription. Credits don't expire.
After Mon 18 May 2026, these prices are gone as a standing offer — the same model mix will only be available through consortium-tier deals with added profit margin. The honest version: if preorder volume doesn't fund the GPU, the window closes and that's it.
$50 pack on the buy page. Need a custom amount? DM me.
Larger commit, larger perks. Each tier directly funds the next GPU node.
router_logic.py when it open-sources
25 slots remaining
10 slots remaining
Every dollar is earmarked against a specific threshold. Hit it, that infra deploys. Public counter, no spin.
Counter updates on each page load. No polling loop — we're not here to burn your battery.
Three channels. Matrix is preferred (E2EE, federated, no phone number required). Telegram works. Discord works now but the invite rotates before 18 May — preorder backers get the new link DM'd directly.
<owner-matrix-handle>
— DM'd to backers + published here before 18 May.
<owner-telegram-handle>
— same: DM'd to backers + published here before 18 May.
Drop your contact. I'll reach out once when the gateway opens — no newsletter, no drip sequence, one message.
One DM when the gateway opens. Nothing before that, nothing after.
We operate under GDPR (Norway / EEA) and apply the same rules to every customer, regardless of where you are. No legal boilerplate — just the facts.
Order record. Order ID, SKU, currency, amount, status, timestamps. Append-only ledger — needed to credit your account at launch.
Contact handle. The email or messaging handle you provide at checkout. Used only to deliver your API key and billing notifications.
Token counts. Once live, we log input + output token counts per request for billing. The prompt content is never stored.
Prompt content. Prompts and responses are discarded when the request completes. No training. No content audit log.
KYC / identity data. No name, address, government ID, or card details — ever. Crypto only.
IP addresses. Held transiently for fraud and rate-limit checks, deleted within 24 h of order settlement.
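Put together, the retained footprint per customer is small. A sketch of what the two records described above might contain — field names are our own illustration, not the actual schema:

```python
# Illustrative shape of the only data retained, per the list above.
# Field names and values are hypothetical, not the real ledger schema.

order_record = {
    "order_id": "ord_000123",            # append-only ledger key
    "contact_handle": "dev@example.com", # delivery + billing notices only
    "sku": "pro-50",
    "currency": "BTC",
    "amount_usd": 50.00,
    "status": "settled",
    "created_at": "2026-05-18T09:00:00+02:00",
}

usage_record = {
    "order_id": "ord_000123",
    "input_tokens": 812,    # billing only
    "output_tokens": 1440,  # billing only
    # no prompt, no response, no IP past settlement
}
```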
Want your data deleted? DM us from the handle you signed up with. We remove the order record within 48 hours and send you a deletion timestamp. GDPR Article 17 ("right to be forgotten") — extended to every customer, EU resident or not. Full policy: /privacy.html.
Straight answers. More added as beta progresses.
The gateway isn't fully live yet — the GPU node is being provisioned in the EU. Preorders fund the GPU deposit and signal real demand. In return: +30% bonus credits baked into every preorder pack, plus priority onboarding when the gateway opens.
Pay $X in BTC today, get $X × 1.30 in llmdeal credits at launch. Credits never expire. No subscription follows.
Target: Monday 18 May 2026, GMT+2 (Central European Summer Time). GPU is being provisioned this week; smart routing and gateway go live on that date.
If launch slips past 2026-06-01, every preorder is refundable in BTC to the address you paid from. No questions — DM me on Telegram / Signal.
Status updates go to your DMs (whatever handle you give at checkout) — at least one before launch day.
Full refund. BTC value back to your payment address, on request, any time before launch.
After launch, credits stay refundable too — gated by recorded usage. See the money-back guarantee below.
The tier prices on /pricing.html are the preorder window. After public launch on 18 May, the same model mix will likely only be reachable through consortium-tier deals with added profit margin — not as standing retail pricing on this page.
The honest version: if significant preorder volume doesn't arrive, the GPU doesn't get funded, the gateway doesn't open at these prices, and the window closes. No preorders → no llmdeal at this price level → you keep paying frontier providers whatever they ask. That's the deal.
Full refund available on every order ever placed, gated only by cumulative token-usage time recorded on your account. Under 3 hours total of recorded usage across all orders? Refundable on request — DM the founder. Past 3 hours, the service is considered consumed.
Refunds are per-second prorated against recorded usage. For BTC refunds you cover the on-chain network fee in fiat upfront — it's never deducted from the refunded principal, so the full BTC value you paid comes back to your payment address. XMR and LTC refunds net the network fee on-chain.
This isn't a 14-day trial gimmick — the clock runs on actual usage, not calendar time. Preorder, hold credits, never call the API? You can refund a year later.
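As arithmetic, the rule above can be sketched like this — a minimal reading of the policy, assuming linear per-second proration against the 3-hour ceiling (the function and formula are our interpretation, not llmdeal's billing code):

```python
# Sketch of the refund rule: per-second proration against a 3-hour
# recorded-usage ceiling. Illustrative only, not the billing engine.

USAGE_CEILING_S = 3 * 3600  # 3 hours of recorded usage, in seconds

def refundable(paid_usd: float, used_seconds: int) -> float:
    """Refundable principal after per-second proration."""
    if used_seconds >= USAGE_CEILING_S:
        return 0.0  # service considered consumed
    return paid_usd * (1 - used_seconds / USAGE_CEILING_S)
```

So a $100 pack with zero recorded usage refunds in full, and one with 90 minutes of usage refunds half — regardless of how many calendar months have passed.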
Yes — check before you preorder, not after. Every model in our Pro mix has a public per-token price and an independent benchmark score on third-party surfaces with no financial stake in llmdeal.me:
If our Pro mix isn't on those leaderboards at the prices we quote, the refund guarantee applies — see above.
Within 7 days of the public counter crossing $1k. The marketing milestone flips the moment it's crossed; the A6000 spins up shortly after.
Honest caveat: we apply a small internal safety buffer — slightly above the public threshold plus a check that real customers, not a single whale, drove the number — before wiring up the A6000. That prevents one $500 Founder order followed by a refund from triggering a month of GPU rental we can't sustain. The public milestone still flips green the moment the public threshold is met.
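The trigger described above — public threshold, small buffer, whale check — can be expressed as a simple predicate. The buffer size and concentration limit here are illustrative numbers, not the real internal values:

```python
# Sketch of the A6000 spin-up check: public threshold plus a buffer,
# plus a guard that no single order dominates the total.
# SAFETY_BUFFER and MAX_SINGLE_SHARE are illustrative, not the real values.

PUBLIC_THRESHOLD_USD = 1_000
SAFETY_BUFFER = 1.10     # "slightly above" the public number
MAX_SINGLE_SHARE = 0.5   # one whale order can't trip the rental

def a6000_go(orders_usd: list[float]) -> bool:
    total = sum(orders_usd)
    if total < PUBLIC_THRESHOLD_USD * SAFETY_BUFFER:
        return False
    return max(orders_usd) / total <= MAX_SINGLE_SHARE
```

Note the asymmetry this preserves: the public milestone flips green at $1k flat, while the spend decision waits for the stricter internal check.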
No. We run our own model (Qwen2.5-Coder-32B) on our own GPU. We smart-route to OpenAI, Anthropic, Groq, Together, and DeepSeek when a query needs more horsepower than our model can give.
The goal isn't to replace frontier models. It's to not pay frontier prices for queries that don't need them.
Every request hits a small open-source classifier (RouteLLM-style) that scores difficulty. Easy queries (formatting, simple regex, syntax fixes) → our Qwen-Coder-32B in the EU. Fast workhorse queries → llama-3.3-70b-self-hosted on our EU GPU. Reasoning queries → DeepSeek-V3. Coding-heavy queries → Mistral Codestral. Highest-difficulty queries → Qwen3-235B on Cerebras (frontier OSS-class).
Direct frontier API routing (Claude, GPT-4o) unlocks at the $3,500 preorder stretch goal — once preorders fund the operator credit at Anthropic + OpenAI. Until then, Claude is reachable via the openrouter/anthropic/claude-* path (billed against our OpenRouter prepay, not your llmdeal credit budget directly — pass-through pricing).
Override per-request by passing any specific model name (e.g. model: "deepseek-chat"). The router only activates when you set model: "smart-route".
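For clients that build requests programmatically, the override is just a different value in the `model` field of the same OpenAI-compatible payload the curl example uses. A stdlib-only sketch (endpoint and schema are the gateway's advertised interface; nothing is sent until you uncomment the last line):

```python
# Builds the same request as the curl example, with a per-request model
# override. The router only runs when model == "smart-route".
import json
import os
import urllib.request

API = "https://api.llmdeal.me/v1/chat/completions"

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(API, data=body, headers={
        "Authorization": f"Bearer {os.environ.get('LLMDEAL_KEY', '')}",
        "Content-Type": "application/json",
    })

pinned = chat_request("deepseek-chat", "Explain the CAP theorem.")  # bypasses router
routed = chat_request("smart-route", "Convert this JSON to YAML.")  # router decides
# urllib.request.urlopen(pinned)  # send once the gateway is live
```

Any OpenAI-compatible SDK works the same way: point the base URL at api.llmdeal.me/v1 and swap the model string per request.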
Short version: contact handle and order record are stored. Prompts and responses are not retained. Full breakdown in the Privacy & data section above.
Our own model runs in the EU — GDPR jurisdiction. We don't log prompt content. We keep token counts for billing only.
When you route to a frontier provider (OpenAI, Anthropic, Groq), the request goes to them under their data policy — the same as if you called them directly with your own key. Anthropic doesn't train on API data. OpenAI lets you opt out via the dashboard.
The Elite tier pins routing to EU-only by default — requests never leave the EU even when escalated to frontier models. US-hosted models are not in the default Elite pool; they're provisioned on demand, per customer, and only when that customer explicitly opts in.
Yes. We serve developers worldwide. "EU residency" refers to where our infrastructure runs (EEA GPU + EEA-based operator) — not who can use the service. No geographic restrictions on signup.
We apply GDPR-level handling to every customer regardless of location. Varying it per-jurisdiction is operational overhead we don't want.
Three reasons. One: credit cards require KYC and we won't ask for it. Two: chargebacks on usage-based products are a nightmare. Three: devs paying for an API shouldn't need to identify themselves.
We accept BTC (auto-checkout), XMR and LTC (semi-manual — DM us, we send a one-time address, you pay, we credit your account within 1-4 hours).
The router classifier and gateway code will be open-sourced once the gateway is stable post-launch (target Mon 18 May 2026). The marketing site and inference stack are private.
The underlying model (Qwen2.5-Coder-32B) is Apache 2.0 — Alibaba's open release. We didn't train it; we serve it.
An EEA-based independent operator on owned bare-metal infrastructure. No VC, no team, no roadmap deck. Pricing is honest because the cost structure is honest.
Public launch Monday 18 May 2026 (GMT+2). Preorders are open now — every BTC backer locks in +30% bonus credits and a beta key delivered on launch day.