llmdeal.me is an OpenAI-compatible LLM API gateway. Drop in your key, set model to
smart-route, and each request lands on the cheapest model that can handle it:
Qwen-Coder-32B for boilerplate, Groq Llama and DeepSeek-V3 for mid-complexity, Mistral
Codestral for code, Qwen3-235B for reasoning. GPU capacity in the US and the EU — EU-resident routing available as a privacy option. Crypto checkout. No KYC.
Preorder in BTC — +30% bonus credits through Wed 20 May 2026. $20 buys $26 in credits. $100 buys $130. Credits activate at launch (Mon 18 May).
$ curl https://api.llmdeal.me/v1/chat/completions \
  -H "Authorization: Bearer $LLMDEAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-route",
    "messages": [{"role": "user", "content": "refactor this function"}]
  }'
# Routed to qwen-coder-32b — $0.0004 instead of $0.012 on Sonnet
Most teams send every query to Sonnet or GPT-4o. A third of those are formatting, syntax questions, regex explanations, or boilerplate generation — tasks a 32B coding model handles just as well at <5% of the cost.
Every request hits Claude Sonnet 4.6 at $3 input / $15 output per 1M tokens. You pay frontier rates for the long tail of work that doesn't need it.
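Back-of-the-envelope, the savings claim above works out like this. A sketch with normalized costs, assuming exactly one third of traffic routes cheap at 5% of the frontier price (both figures from the paragraph above; real traffic mixes will vary):

```python
# Illustrative blended-cost estimate using the figures above.
# Assumption: 1/3 of queries route to a cheap model at 5% of frontier
# price; the remaining 2/3 stay at frontier rates.
frontier = 1.0          # normalized frontier cost per query
cheap_share = 1 / 3
cheap_cost = 0.05 * frontier

blended = (1 - cheap_share) * frontier + cheap_share * cheap_cost
savings = 1 - blended
print(f"blended cost: {blended:.3f}x frontier, savings about {savings:.1%}")
```

Roughly a third off the bill without touching the two-thirds of queries that genuinely need a frontier model.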
Boilerplate → our Qwen-Coder-32B (~$0.80 in / $1.50 out per 1M tokens) on our own GPU. Fast queries → llama-3.3-70b-self-hosted, also on our own GPU. Reasoning → DeepSeek-V3 / Qwen3-235B on Cerebras. Code → Mistral Codestral. Choose EU-resident routing and all inference stays in the EEA — or use our US capacity; your call.
No magic, no VC hand-waving. Just boring infrastructure decisions that add up.
An open-source classifier scores each request by difficulty and routes it to the cheapest model that can handle it. Thresholds are tunable per workload.
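A hypothetical sketch of what threshold-based routing looks like. The model names match the gateway's public mix, but `score_difficulty` and the threshold values are invented placeholders, not llmdeal's actual classifier:

```python
# Hypothetical sketch of difficulty-threshold routing. The scorer and
# thresholds are placeholders; a real RouteLLM-style classifier is a
# trained model, not a keyword check.
def score_difficulty(prompt: str) -> float:
    """Stand-in scorer returning 0.0-1.0 (higher = harder)."""
    hard_markers = ("prove", "architecture", "optimize", "concurrency")
    return min(1.0, 0.2 + 0.2 * sum(m in prompt.lower() for m in hard_markers))

# (max difficulty, target model) pairs, tunable per workload.
ROUTES = [
    (0.3, "qwen-coder-32b"),
    (0.6, "llama-3.3-70b-self-hosted"),
    (0.8, "deepseek-chat"),
    (1.0, "qwen3-235b"),
]

def route(prompt: str) -> str:
    score = score_difficulty(prompt)
    return next(model for cap, model in ROUTES if score <= cap)

print(route("fix this regex"))  # low difficulty, lands on the cheap model
```

Tightening a threshold shifts more traffic to the cheaper tier; loosening it buys headroom for borderline queries.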
Qwen2.5-Coder-32B runs on our own GPU hardware. Flat hardware costs, no per-token margin to a cloud provider — so we undercut every commodity Llama-70B reseller.
llama-3.3-70b-self-hosted — LIVE on our GPU
Now live: llama-3.3-70b-self-hosted running on our own GPU — no upstream fees, same API key as the rest of the gateway. Choose EU-resident routing and it stays EEA end-to-end. In the Pro routing mix today.
We run GPU capacity in both the US and the EU. Open to developers worldwide. Want your inference in the EEA? Choose EU-resident routing and GDPR jurisdiction applies end-to-end. Prefer lower-latency US capacity? That's available too.
Pay in BTC, XMR, or LTC. No identity check, no card decline, no chargeback exposure. We don't know who you are and don't need to.
We keep only what billing requires: contact handle, order ID, dollar amount, token counts. We do not store prompt content, response content, or IPs past settlement. GDPR-compliant by default — same rules apply worldwide, not just for EU users.
Pay-as-you-go in crypto. No subscription. No credits that expire. Rates lock in at public launch Mon 18 May 2026 (GMT+2).
llama-3.3-70b-self-hosted + DeepSeek + Codestral + Qwen3-235B
Pay in crypto now. Lock in 30% extra credits, get a beta key the day the gateway opens. No KYC. No subscription. Credits don't expire.
After Mon 18 May 2026, these prices are gone as a standing offer — the same model mix will only be available through consortium-tier deals with added profit margin. The honest version: if preorder volume doesn't fund the GPU, the window closes and that's it.
$50 pack on the buy page. Need a custom amount? DM me.
Larger commit, larger perks. Each tier directly funds the next GPU node.
router_logic.py when it open-sources
25 slots remaining
10 slots remaining
Every dollar is earmarked against a specific threshold. Hit it, that infra deploys. Public counter, no spin.
Counter updates on each page load. No polling loop — we're not here to burn your battery.
Three channels. Matrix is preferred (E2EE, federated, no phone number required). Telegram works. Discord works now but the invite rotates before 18 May — preorder backers get the new link DM'd directly.
<owner-matrix-handle>
— DM'd to backers + published here before 18 May.
<owner-telegram-handle>
— same: DM'd to backers + published here before 18 May.
Drop your contact. I'll reach out once when the gateway opens — no newsletter, no drip sequence, one message.
One DM when the gateway opens. Nothing before that, nothing after.
We operate under GDPR (Norway / EEA) and apply the same rules to every customer, regardless of where you are. No legal boilerplate — just the facts.
Order record. Order ID, SKU, currency, amount, status, timestamps. Append-only ledger — needed to credit your account at launch.
Contact handle. The email or messaging handle you provide at checkout. Used only to deliver your API key and billing notifications.
Token counts. Once live, we log input + output token counts per request for billing. The prompt content is never stored.
Prompt content. Prompts and responses are discarded when the request completes. No training. No content audit log.
KYC / identity data. No name, address, government ID, or card details — ever. Crypto only.
IP addresses. Held transiently for fraud and rate-limit checks, deleted within 24 h of order settlement.
Want your data deleted? DM us from the handle you signed up with. We remove the order record within 48 hours and send you a deletion timestamp. GDPR Article 17 ("right to be forgotten") — extended to every customer, EU resident or not. Full policy: /privacy.html.
Straight answers. More added as beta progresses.
The gateway isn't fully live yet — GPU nodes are being provisioned. Preorders fund the hardware deposit and signal real demand. In return: +30% bonus credits baked into every preorder pack, plus priority onboarding when the gateway opens.
Pay $X in BTC today, get $X × 1.30 in llmdeal credits at launch. Credits never expire. No subscription follows.
Target: Monday 18 May 2026, GMT+2 (Central European Summer Time). GPU is being provisioned this week; smart routing and gateway go live on that date.
If launch slips past 2026-06-01, every preorder is refundable in BTC to the address you paid from. No questions — DM me on Telegram / Signal.
Status updates go to your DMs (whatever handle you give at checkout) — at least one before launch day.
Full refund. BTC value back to your payment address, on request, any time before launch.
After launch, credits stay refundable too — gated by recorded usage. See the money-back guarantee below.
The tier prices on /pricing.html are the preorder window. After public launch on 18 May, the same model mix will likely only be reachable through consortium-tier deals with added profit margin — not as standing retail pricing on this page.
The honest version: if significant preorder volume doesn't arrive, the GPU doesn't get funded, the gateway doesn't open at these prices, and the window closes. No preorders → no llmdeal at this price level, and you're welcome to keep paying frontier providers what they ask. That's the deal.
Full refund available on every order ever placed, gated only by the cumulative usage time recorded on your account. Under 3 hours of total recorded usage across all orders? Refundable on request — DM the founder. Past 3 hours, the service is considered consumed.
Refunds are per-second prorated against recorded usage, minus the on-chain crypto network fee. For BTC refunds you cover the network fee in fiat upfront — we don't deduct it from the refunded principal; the BTC value you paid comes back to your payment address. XMR + LTC refunds net the fee on-chain.
This isn't a 14-day trial gimmick — the clock runs on actual usage, not calendar time. Preorder, hold credits, never call the API? You can refund a year later.
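One plausible reading of the proration rule above, sketched with invented function names. The 3-hour gate and the full-principal BTC refund come from the page; the linear per-second schedule across that window is our assumption:

```python
# Illustrative refund sketch. Assumption: the refund decays linearly,
# per second, across the stated 3-hour usage window. The actual
# proration schedule is not specified on the page.
USAGE_CAP_S = 3 * 3600  # past this, the service counts as consumed

def refund_amount(paid: float, used_seconds: int) -> float:
    """Prorated refund in payment-currency value. For BTC the network
    fee is paid separately in fiat, so principal is not reduced here."""
    if used_seconds >= USAGE_CAP_S:
        return 0.0
    return paid * (1 - used_seconds / USAGE_CAP_S)

print(refund_amount(100.0, 0))      # credits held, API never called: full refund
print(refund_amount(100.0, 5400))   # 1.5 h of recorded usage: half back
```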
Yes — check before you preorder, not after. Every model in our Pro mix has a public per-token price and an independent benchmark score on third-party leaderboards with no financial stake in llmdeal.me:
If our Pro mix isn't on those leaderboards at the prices we quote, the refund guarantee applies — see above.
Within 7 days of the public counter crossing $1k. The marketing milestone flips the moment it's crossed; the A6000 spins up shortly after.
Honest caveat: we apply a small internal safety buffer — slightly above the public threshold plus a check that real customers, not a single whale, drove the number — before wiring up the A6000. That prevents one $500 Founder order followed by a refund from triggering a month of GPU rental we can't sustain. The public milestone still flips green the moment the public threshold is met.
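A hypothetical version of that deploy check. The 10% buffer and the "no single order over half the total" rule are invented numbers for illustration; the page only commits to "slightly above the threshold" and "not a single whale":

```python
# Hypothetical deploy gate. Buffer size (10%) and whale rule (no single
# order above half the total) are invented; the page does not publish
# the internal values.
def ready_to_deploy(orders: list[float], public_threshold: float) -> bool:
    total = sum(orders)
    buffered = total >= public_threshold * 1.10      # small safety margin
    no_whale = bool(orders) and max(orders) <= total / 2
    return buffered and no_whale

print(ready_to_deploy([400.0, 400.0, 400.0], 1000.0))  # broad demand: deploy
print(ready_to_deploy([1200.0], 1000.0))               # one whale: hold off
```

The point of the second condition: a single $1.2k order followed by a refund shouldn't commit a month of GPU rental.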
No. We run our own model (Qwen2.5-Coder-32B) on our own GPU. We smart-route to Groq, Together, Cerebras, and DeepSeek when a query needs more horsepower than our model can give.
The goal isn't to replace frontier models. It's to not pay frontier prices for queries that don't need them.
Every request hits a small open-source classifier (RouteLLM-style) that scores difficulty.
Easy queries (formatting, simple regex, syntax fixes) → our Qwen-Coder-32B on our own GPU.
Fast workhorse queries → llama-3.3-70b-self-hosted on our own GPU.
Reasoning queries → DeepSeek-V3.
Coding-heavy queries → Mistral Codestral.
Highest-difficulty queries → Qwen3-235B on Cerebras (frontier OSS-class).
Override per-request by passing any specific model name (e.g. model: "deepseek-chat"). The router only activates when you set model: "smart-route".
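Because the gateway is OpenAI-compatible, an override is just a different "model" string in the same request body. A minimal sketch of the two payloads (the helper function is ours, not part of the API):

```python
# Same OpenAI-style request body; only the "model" field changes.
# chat_payload is a local convenience helper, not a gateway API.
def chat_payload(model: str, prompt: str) -> dict:
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

routed = chat_payload("smart-route", "refactor this function")    # router decides
pinned = chat_payload("deepseek-chat", "refactor this function")  # bypasses router
print(pinned["model"])
```

Any OpenAI-compatible client can send either payload; nothing else in the request changes.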
Short version: contact handle and order record are stored. Prompts and responses are not retained. Full breakdown in the Privacy & data section above.
Our own models run on our GPU hardware — GDPR compliance applies. We don't log prompt content. We keep token counts for billing only.
When queries route to open-weight providers (Groq, Cerebras, Together, DeepSeek's own API), those requests go to them under their respective data policies — the same as if you called them directly. None of these providers train on API traffic by policy.
The Elite tier uses EU-resident routing by default — when that's active, requests stay within EEA jurisdiction end-to-end. US capacity is switchable on request.
Yes. We serve developers worldwide — no geographic restrictions on signup. We run GPU capacity in both the US and the EU. "EU-resident inference" is a privacy option (the default on Elite), not a limit on who can sign up or where compute runs.
We apply GDPR-level handling to every customer regardless of location. Varying it per-jurisdiction is operational overhead we don't want.
Three reasons. One: credit cards require KYC and we won't ask for it. Two: chargebacks on usage-based products are a nightmare. Three: devs paying for an API shouldn't need to identify themselves.
We accept BTC (auto-checkout), XMR and LTC (semi-manual — DM us, we send a one-time address, you pay, we credit your account within 1-4 hours).
The router classifier and gateway code will be open-sourced once the gateway is stable after launch (launch target: Mon 18 May 2026). The marketing site and inference stack are private.
The underlying model (Qwen2.5-Coder-32B) is Apache 2.0 — Alibaba's open release. We didn't train it; we serve it.
An EEA-based independent operator on owned bare-metal infrastructure. No VC, no team, no roadmap deck. Pricing is honest because the cost structure is honest.
Public launch Monday 18 May 2026 (GMT+2). Preorders are open now — every BTC backer locks in +30% bonus credits and a beta key delivered on launch day.