llmdeal.me is a smart LLM gateway. One API key routes each request to the cheapest model that can actually answer it — our own Qwen-Coder-32B for trivial work, Groq Llama and DeepSeek-V3 for the medium range, Mistral Codestral for code, Qwen3-235B for reasoning. Frontier API access (Claude / GPT-4o) unlocks once preorders fund the operator credit. Open to developers worldwide; EU-resident infrastructure for the privacy-conscious. Crypto checkout. No KYC.
Preorder now in BTC — +30% bonus credits through Wed 20 May 2026. $20 buys you $26 in credits. $100 buys $130. Use them the moment the gateway opens (Mon 18 May).
$ curl https://api.llmdeal.me/v1/chat/completions \
    -H "Authorization: Bearer $LLMDEAL_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "smart-route",
      "messages": [{"role": "user", "content": "refactor this function"}]
    }'
# Routed to qwen-coder-32b — $0.0004 instead of $0.012 on Sonnet
Most teams default every query to Sonnet or GPT-4o. But a third of those queries are formatting, syntax, "what does this regex do", or generating boilerplate — work a 32B coding model handles identically at <5% of the cost.
All queries → Claude Sonnet 4.6 at $3 input / $15 output per 1M tokens. You're paying frontier prices for the long tail of trivial requests.
Trivial → our Qwen-Coder-32B in the EU (~$0.80/$1.50). Fast queries → llama-3.3-70b-self-hosted on our EU GPU. Reasoning → DeepSeek-V3 / Qwen3-235B on Cerebras. Coding → Mistral Codestral. Frontier (Claude / GPT-4o) unlocks at the $3,500 stretch-goal threshold.
No "AI startup" magic. Boring engineering moves that compound.
Open-source classifier (RouteLLM-style) scores each request by difficulty, then routes to the cheapest model that can actually answer. Tune the thresholds per workload.
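A minimal sketch of what that looks like in practice. The scoring function and threshold values here are illustrative stand-ins, not our production classifier (the real one is a small learned model, RouteLLM-style); the model names match the tiers above.

```python
def score_difficulty(prompt: str) -> float:
    """Return a difficulty score in [0, 1]; 0 = trivial, 1 = hardest.
    Keyword matching is a stand-in for the real learned classifier."""
    hard_markers = ("prove", "derive", "architecture", "deadlock")
    hits = sum(marker in prompt.lower() for marker in hard_markers)
    return min(1.0, 0.2 + 0.25 * hits)

# Tunable per workload: (upper score bound, target model).
THRESHOLDS = [
    (0.30, "qwen-coder-32b"),             # trivial: formatting, regex, syntax
    (0.60, "llama-3.3-70b-self-hosted"),  # fast workhorse
    (0.85, "deepseek-chat"),              # reasoning
    (1.00, "qwen3-235b"),                 # hardest
]

def route(prompt: str) -> str:
    """Pick the cheapest model whose tier covers the scored difficulty."""
    score = score_difficulty(prompt)
    for bound, model in THRESHOLDS:
        if score <= bound:
            return model
    return THRESHOLDS[-1][1]
```

The tuning knob is the list of bounds: lower the first cutoff and more traffic escalates past the cheap tier, raise it and more stays on the 32B model.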
Qwen2.5-Coder-32B running on our GPU in the EU for the cheap tier. We pay flat hardware costs, so we can price below every commodity Llama-70B reseller.
llama-3.3-70b-self-hosted — LIVE on our EEA GPU
Now live: llama-3.3-70b-self-hosted serving from our own EEA GPU box — no upstream-provider fees, same auth as the rest of the gateway, EU jurisdiction end-to-end. Available in the Pro smart-routing mix today.
Inference runs in the EU — GDPR jurisdiction. Open to developers worldwide; we just put the compute in the EU so privacy-conscious workloads don't have to leave it.
Pay in BTC, XMR, or LTC. No identity check, no card decline, no chargeback risk. We don't ask who you are.
We store only what we need to bill you and reach you: contact handle, order ID, dollar amount, token counts. We do not store prompt content, response content, or IPs past order settlement. GDPR-compliant by default — we apply the same handling worldwide.
Pay-as-you-go in crypto. No subscription. No "credits" that expire. Final pricing locks in at public launch Mon 18 May 2026 (GMT+2) — preview shown.
llama-3.3-70b-self-hosted + DeepSeek + Codestral + Qwen3-235B
Back the build now in crypto, lock in a 30% credit bonus, get a beta key as soon as the gateway opens. No KYC. No subscription. No expiry.
After Mon 18 May 2026, these prices stop being a standing retail offer — same model mix may then only be reachable via consortium-tier deals with added profit margin. If the preorder window doesn't bring significant volume, the GPU doesn't get funded and the window simply closes.
$50 pack also available on the buy page. Want a custom amount? DM me.
Bigger commit = bigger perks. Funds the next-tier GPU nodes directly.
router_logic.py when it open-sources
25 slots remaining
10 slots remaining
Every preorder dollar is earmarked. Hit a threshold, that infra deploys. Public counter, no spin.
Refresh-rate: each page load fetches the live counter. No client-side polling — we don't want to wake your laptop.
Three channels. Matrix is preferred (E2EE, federated, no phone number). Telegram works. Discord works now but the invite is being rotated soon and may not stay reachable past 18 May — preorder backers will get the new one DM'd directly.
<owner-matrix-handle> — handle DM'd to backers + posted here before 18 May.
<owner-telegram-handle> — same: DM'd to backers + posted here before 18 May.
If you're not buying yet, drop your contact and I'll ping you once at launch. No newsletter, no drip, one message.
No spam. No newsletter. One DM when the gateway opens.
We operate under GDPR (Norway / EEA) and apply the same handling to every customer regardless of where you live. Plain English, no privacy-policy-lawyer-speak.
Order record. Order ID, SKU, currency, dollar amount, status, timestamps. Append-only ledger so we can credit your account at launch.
Contact handle. The email / Telegram / Discord / Signal handle you give us at checkout. Used only to deliver your API key and bill-readiness DMs.
Token counts. Once the gateway is live, we log the number of input + output tokens per request — for billing only. Never the prompt content.
Prompt content. Your prompts and responses are not retained after the request completes. No training, no audit log of content.
KYC / identity data. We don't ask for your name, address, ID, or payment-card details. Crypto only.
IP addresses. Captured transiently for fraud/rate-limit, deleted within 24 h of order settlement.
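Roughly, the entire stored record looks like this. Field names are illustrative, not our actual schema; the point is what's in it and what isn't.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderRecord:
    """Everything we keep about a preorder, and nothing else."""
    order_id: str        # append-only ledger key
    sku: str             # which pack was bought
    currency: str        # BTC / XMR / LTC
    usd_amount: float    # dollar amount at checkout
    status: str          # pending / settled / refunded
    created_at: str      # ISO-8601 timestamp
    contact_handle: str  # email / Telegram / Matrix handle from checkout
    # Deliberately absent: name, address, ID, card details, IPs, prompts.
```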
Want your data deleted? DM the contact handle you signed up with. We remove the order record within 48 hours and email you the deletion timestamp. GDPR Article 17 ("right to be forgotten") — extended to every customer, EU resident or not. Full policy: /privacy.html.
Honest answers. We'll add more as the beta progresses.
The gateway isn't fully live yet — the GPU node is being provisioned in the EU. Preorders fund the GPU deposit and signal demand. In return: +30% bonus credits baked into every preorder pack, plus priority onboarding when the gateway opens.
You pay $X in BTC today, you get $X × 1.30 in llmdeal credits when the gateway goes live. Credits never expire. No subscription kicks in afterwards.
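The credit math, spelled out. The 1.30 multiplier is the preorder bonus above; the function name is just for illustration.

```python
BONUS_MULTIPLIER = 1.30  # +30% preorder bonus

def preorder_credits(usd_paid: float) -> float:
    """Credits delivered at launch for a preorder of usd_paid dollars."""
    return round(usd_paid * BONUS_MULTIPLIER, 2)
```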
Target: Monday 18 May 2026, GMT+2 (Central European Summer Time). GPU is being provisioned this week; smart routing + gateway go live on the date above.
If launch slips past 2026-06-01, every preorder is refundable in BTC at the address you paid from. No questions, just DM me on Telegram / Signal.
Status updates land in your DMs (whatever handle you give us at checkout) — at least one before launch day.
Refund. Full BTC value back to your payment address, on request, any time before launch.
After launch, credits remain refundable too — gated by recorded usage. See the money-back guarantee below.
The tier prices on /pricing.html are the preorder window. After public launch on 18 May, the same model mix may only be reachable through consortium-tier deals with added profit margin — not as standing retail pricing on this page.
And the honest version: if significant preorder volume doesn't arrive, the GPU doesn't get funded, the gateway doesn't open at these prices, and the window just closes. No preorders, no llmdeal at this price level; you're welcome to keep paying frontier providers what they ask. That's the deal.
Zylo API ("Next-Generation AI Inference Platform") is a frontier API gateway with direct access to Claude Opus 4.6, GPT-5.2, Gemini 3.1 Pro, DeepSeek V3.2 Thinking, Qwen 3.5, MiniMax 2.5, Kimi K2.5 — 26+ models total.
Our collaboration: Pro and Elite tier customers get passthrough access to Zylo's frontier lineup at 40% below Zylo's public retail rate, billed against your llmdeal credit balance. Same llmdeal API key, opt-in per request.
Headline rates (vs Zylo retail published at zyloai.net):
Why this is real: Zylo has direct upstream provider relationships and bulk pricing we don't. The partnership lets us pass their bulk rate through to our Pro+ customers with a meaningful discount on top. They get a distribution channel; our customers get frontier at non-frontier prices.
Crosscheck the retail numbers yourself at zyloai.net — that's part of why this partnership is visible on the page instead of buried.
Full refund available on every order ever placed, gated only by the cumulative usage time recorded on your account. Under 3 hours of total recorded usage across all your orders ever? Refundable on request — DM founder. Past 3 hours, the service is considered consumed.
Refunds are per-second prorated against the usage actually recorded, minus the on-chain crypto network fee required to send the refund. For BTC refunds you cover the network fee in fiat upfront — we don't deduct it from the refunded principal; the BTC value you paid in comes back to your payment address. XMR + LTC refunds net the fee on-chain.
This isn't a 14-day trial gimmick — the clock is on actual usage, not calendar time. If you preorder, hold credits, and never touch the API, you can refund a year later.
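A sketch of how the per-second proration plays out under the 3-hour gate. The gate and the proration rule come from the policy above; the exact accounting here is illustrative, and network fees are handled separately as described.

```python
USAGE_CAP_SECONDS = 3 * 3600  # past 3 h of recorded usage, fully consumed

def refundable_usd(paid_usd: float, used_seconds: int) -> float:
    """Per-second proration against recorded usage, before network fees."""
    if used_seconds >= USAGE_CAP_SECONDS:
        return 0.0  # service considered consumed
    unused_fraction = 1 - used_seconds / USAGE_CAP_SECONDS
    return round(paid_usd * unused_fraction, 2)
```

Hold credits and never call the API, and `used_seconds` stays 0 — the full amount stays refundable indefinitely.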
Yes — and we'd rather you check before you preorder than after. Every model in our Pro mix has a public per-token price and an independent benchmark score on high-trust third-party surfaces that have no financial stake in llmdeal.me:
If our Pro mix isn't on those leaderboards at the prices we quote, the refund clock applies — see above.
Within 7 days of the public counter crossing $1k. We announce the threshold in marketing the moment it's crossed; the rented A6000 spins up shortly after.
Honest caveat: we also apply a small internal safety buffer (slightly above the public threshold + a check that real customers, not one whale, drove the number) before we wire up the A6000. This is so a single $500 Founder buying then asking for a refund doesn't trigger a month of GPU rental we can't sustain. The public milestone still flips green the moment the public threshold is met.
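Roughly what that internal check looks like. The buffer and single-backer share numbers here are made up for illustration — only the shape of the check is real.

```python
def gpu_funding_ready(orders_usd: list[float],
                      public_threshold: float = 1000.0,
                      buffer: float = 1.10,
                      max_single_share: float = 0.5) -> bool:
    """Threshold check plus a guard that one whale didn't drive the number."""
    total = sum(orders_usd)
    if total < public_threshold * buffer:
        return False  # internal buffer sits slightly above the public milestone
    # "Real customers, not one whale": cap any single order's share of the total.
    return max(orders_usd) / total <= max_single_share
```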
No. We run our own model (Qwen2.5-Coder-32B) on our own GPU. We also smart-route to OpenAI, Anthropic, Groq, Together, and DeepSeek when your query needs more horsepower than our model can give.
The point isn't to replace frontier models. The point is to not pay frontier prices for queries that don't need them.
Each request hits a small classifier (open-source, RouteLLM-style) that scores difficulty. Easy queries (formatting, simple regex, syntax fixes) → our Qwen-Coder-32B in the EU. Fast workhorse queries → llama-3.3-70b-self-hosted on our EU GPU. Reasoning queries → DeepSeek-V3. Coding-heavy queries → Mistral Codestral. Highest-difficulty queries → Qwen3-235B on Cerebras (frontier OSS-class).
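The same tiers as a lookup table. Tier names are ours; a model identifier like codestral-latest is illustrative, the mix is as listed above.

```python
ROUTING_TIERS = {
    "easy":      "qwen-coder-32b",             # formatting, regex, syntax fixes
    "fast":      "llama-3.3-70b-self-hosted",  # workhorse queries, EU GPU
    "reasoning": "deepseek-chat",              # DeepSeek-V3
    "coding":    "codestral-latest",           # Mistral Codestral (illustrative id)
    "hardest":   "qwen3-235b",                 # Qwen3-235B on Cerebras
}

def model_for(tier: str) -> str:
    """Fall back to the fast workhorse when the tier label is unknown."""
    return ROUTING_TIERS.get(tier, ROUTING_TIERS["fast"])
```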
Direct frontier API routing (Claude, GPT-4o) unlocks at the $3,500 preorder stretch goal — once preorders fund the operator credit at Anthropic + OpenAI. Until then, you can reach Claude via the openrouter/anthropic/claude-* path (billed against our OpenRouter prepay, not against your llmdeal credit budget directly — pass-through pricing).
You can override per-request by passing any specific model name (e.g. model: "deepseek-chat") directly. The router only kicks in when you ask for model: "smart-route".
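For example, a request body that bypasses the router — built here in Python; endpoint and auth are the same as in the curl example near the top of the page.

```python
import json

def chat_body(model: str, prompt: str) -> str:
    """Build the JSON request body; any concrete model name skips the router."""
    return json.dumps({
        "model": model,  # e.g. "deepseek-chat" instead of "smart-route"
        "messages": [{"role": "user", "content": prompt}],
    })
```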
Short version: contact handle + order record stored. Prompts and responses are not retained. Full breakdown in the Privacy & data section above.
Our own model runs in the EU — GDPR jurisdiction. We don't log prompt content. We keep token counts for billing only.
When you route to a frontier provider (OpenAI, Anthropic, Groq), the request goes to them under their data policy — exactly as if you called them directly with your own key. Anthropic doesn't train on API data. OpenAI lets you opt out via the dashboard.
The Elite tier pins routing to EU-only by default — your requests never leave the EU even when escalated to frontier models. US-hosted models are not in the default Elite pool; they're provisioned on-demand, first-come-first-serve, tailored per end user only when a customer explicitly opts in.
Yes. We serve developers worldwide. The "EU residency" piece is about where our infrastructure runs (EEA GPU + EEA-based operator) — not about who can use the service. No geographic restrictions on signup.
We apply GDPR-level handling to every customer regardless of where you live, because doing it differently per-jurisdiction is operational overhead we don't want.
Three reasons. One: credit cards require KYC and we don't want to ask. Two: chargebacks on usage-based products are a nightmare. Three: we genuinely think devs paying for an API shouldn't have to identify themselves.
We accept BTC (auto-checkout), XMR and LTC (semi-manual — DM us, we send a one-time address, you pay, we credit your account within 1-4 hours).
The router classifier and gateway code will be open-source once the gateway is stable post-launch (target Mon 18 May 2026). The marketing site and inference stack are private.
Our underlying model (Qwen2.5-Coder-32B) is Apache 2.0 — Alibaba's open release. We didn't train it; we serve it.
llmdeal.me is run by an EEA-based independent operator on owned bare-metal infrastructure. No VC, no team, no roadmap deck. Pricing is honest because the cost structure is honest.
Public launch Monday 18 May 2026 (GMT+2). Preorders open right now — every BTC backer locks in +30% bonus credits and a beta key delivered on launch day.