1. Quickstart
One Bearer header. One POST. Standard OpenAI body. smart-route
selects the cheapest capable model automatically.
export LLMDEAL_KEY=sk-... # received via DM at launch
curl https://api.llmdeal.me/v1/chat/completions \
-H "Authorization: Bearer $LLMDEAL_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "smart-route",
"messages": [{"role": "user", "content": "Hello!"}]
}'
The response is standard OpenAI chat-completion shape — see
§6. Response format. To pin a model, replace
"smart-route" with any ID from §4. Models.
2. Authentication
Every request requires a Bearer token in the Authorization header:
Authorization: Bearer sk-<your-key>
- Keys start with sk- and are 48 characters long.
- Keys are delivered via DM at launch — to whichever contact handle (email / Telegram / Signal / Discord) you used at preorder.
- Lost your key? DM the owner from the same handle you preordered with. We revoke the old key and DM you a fresh one.
- Missing or invalid key → 401 invalid_api_key.
3. Base URL & endpoints
All endpoints live under a single base URL:
https://api.llmdeal.me/v1

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /v1/chat/completions | Main inference endpoint. |
| GET | /v1/models | List models available to your tier. |
| GET | /v1/usage | Current balance + month-to-date token usage. |

The path structure mirrors OpenAI exactly — any OpenAI-compatible client works
unchanged with just a base URL swap.
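The two GET endpoints need nothing beyond the Bearer header. A minimal stdlib-only sketch — note the response shapes are assumptions: /models is assumed to follow OpenAI's list shape, and /usage is assumed to return a flat JSON object:

```python
import json
import os
from urllib.request import Request, urlopen

BASE = "https://api.llmdeal.me/v1"

def endpoint_url(path: str) -> str:
    # All endpoints hang off the same base URL.
    return f"{BASE}{path}"

def api_get(path: str, key: str) -> dict:
    # Minimal GET helper for the read-only endpoints (/models, /usage).
    req = Request(endpoint_url(path), headers={"Authorization": f"Bearer {key}"})
    with urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    key = os.environ["LLMDEAL_KEY"]
    # Assumed OpenAI list shape: {"object": "list", "data": [{"id": ...}, ...]}
    print([m["id"] for m in api_get("/models", key)["data"]])
    print(api_get("/usage", key))  # balance + month-to-date token usage
```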
4. Models
Set "model" to any ID below. Tier badges indicate the minimum subscription
level required to access that model.
Live availability — updated 2026-05-14
- qwen-coder-32b — live (our EU 4090, fp16, 12k ctx)
- llama-3.3-70b-self-hosted — live on our EEA GPU
- llama-3.1-8b-instant — live
- codestral — live (Mistral)
- qwen3-235b-cerebras — live (Cerebras)
- gemini-2.5-flash — live (Google)
- openrouter/* — live (~100 models)
- deepseek-chat / deepseek-reasoner — wired, awaiting operator prepay top-up

Pre-launch, the API serves noop-test only; full routing goes live Mon 18 May 2026 (GMT+2).
| Model ID | Tier | Notes |
| --- | --- | --- |
| smart-route | Starter+ | Default. Routes to the cheapest model in your tier that can handle the request. |
| qwen-coder-32b | Starter+ | Our own Qwen2.5-Coder-32B hosted in the EU. 12k context (upgrading post-launch). |
| llama-3.3-70b-self-hosted | Pro+ | Llama-3.3-70B self-hosted on our EEA A6000 (unlocked at the $1,000 preorder stretch goal). No upstream-provider fees, EU jurisdiction end-to-end. Live. |
| llama-3.1-8b-instant | Starter+ | Llama-3.1-8B fast tier (small-prompt workhorse). Cheapest fast routing. |
| deepseek-chat | Pro+ | DeepSeek-V3 (general-purpose, strong reasoning). Wholesale: $0.27/$1.10 per 1M tokens. |
| deepseek-reasoner | Pro+ | DeepSeek-R1 (deep reasoning). Wholesale: $0.55/$2.19 per 1M tokens. |
| codestral | Pro+ | Mistral Codestral — coding specialist. Wholesale: $0.30/$0.90 per 1M tokens. |
| qwen3-235b-cerebras | Pro+ | Qwen3 235B MoE on Cerebras (~2,000 tok/s). Wholesale: $0.85/$1.20 per 1M tokens. |
| gemini-2.5-flash | Pro+ | Google Gemini 2.5 Flash. Reasoning-mode default; pass max_tokens: 300+ for visible output. |
| openrouter/<provider>/<model> | Pro+ | ~100 OpenRouter-aggregated models. Billed against our OpenRouter prepay; pass-through pricing applies. |
Call GET /v1/models at runtime to get the exact list available to your key.
Tier upgrades reflect immediately — no config changes needed.
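Since tiers gate models, a client can fetch the runtime list once and fall back to smart-route whenever a pinned model is not visible to its key. A small sketch (the model IDs come from the table above; fetching the list itself is left to GET /v1/models):

```python
def pick_model(available: list[str], preferred: str, fallback: str = "smart-route") -> str:
    """Pin `preferred` only if this key's tier exposes it; otherwise
    fall back to smart-route, which every tier can use."""
    return preferred if preferred in available else fallback

# Example: a Starter key that cannot see the Pro+ DeepSeek models.
starter_models = ["smart-route", "qwen-coder-32b", "llama-3.1-8b-instant"]
print(pick_model(starter_models, "deepseek-reasoner"))  # smart-route
print(pick_model(starter_models, "qwen-coder-32b"))     # qwen-coder-32b
```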
5. Request format
Standard OpenAI chat/completions body. JSON only, Content-Type: application/json required.
{
"model": "smart-route",
"messages": [
{"role": "system", "content": "You are a terse assistant."},
{"role": "user", "content": "Summarise this PR in one sentence."}
],
"temperature": 0.2,
"max_tokens": 512,
"top_p": 1,
"stream": false
}
Supported parameters
- model — required. See §4. Models.
- messages — required. Array of { role, content }. role is system, user, assistant, or tool.
- temperature, top_p, max_tokens — standard OpenAI semantics.
- stream — when true, returns SSE chunks; see §7. Streaming.
- tools — supported on most upstream providers (OpenAI-compatible tools/tool_choice wire format). OpenRouter routes inherit whatever the underlying provider supports.
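Building the body programmatically makes it easy to reject an invalid role before the request ever leaves the client. A minimal sketch (the helper name and validation are illustrative, not part of the API):

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}

def chat_body(messages: list[dict], model: str = "smart-route", **params) -> dict:
    # Assemble a /v1/chat/completions body, failing fast on unknown
    # roles rather than waiting for a server-side 4xx.
    for m in messages:
        if m.get("role") not in VALID_ROLES:
            raise ValueError(f"unknown role: {m.get('role')!r}")
    return {"model": model, "messages": messages, **params}

body = chat_body(
    [{"role": "user", "content": "Summarise this PR in one sentence."}],
    temperature=0.2,
    max_tokens=512,
)
```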
6. Response format
Standard OpenAI chat-completion shape — with two extra usage fields
that tell you exactly which model ran and what it cost.
{
"id": "chatcmpl-9f2c…",
"object": "chat.completion",
"created": 1747200000,
"model": "smart-route",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Refactored the loop into a single-pass reduce."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 412,
"completion_tokens": 87,
"total_tokens": 499,
"model_used": "qwen-coder-32b",
"credits_used_usd": 0.000293
}
}
llmdeal-specific fields
- usage.model_used — the underlying model smart-route selected. Matches your pinned model when you specify one directly.
- usage.credits_used_usd — exact USD deducted from your balance for this request, covering input + output tokens at the per-model rate.
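Because every response carries these two fields, cost accounting is a pure client-side fold. A sketch of a roll-up over many responses (function name is illustrative; the first sample usage object is the one from the response above, the second is invented for the example):

```python
def summarise_costs(usages: list[dict]) -> dict:
    # Roll up the llmdeal-specific usage fields across many responses:
    # total spend plus a per-model breakdown keyed on usage.model_used.
    by_model: dict[str, float] = {}
    for u in usages:
        by_model[u["model_used"]] = by_model.get(u["model_used"], 0.0) + u["credits_used_usd"]
    return {
        "total_tokens": sum(u["total_tokens"] for u in usages),
        "total_usd": sum(u["credits_used_usd"] for u in usages),
        "by_model": by_model,
    }

summary = summarise_costs([
    {"total_tokens": 499, "model_used": "qwen-coder-32b", "credits_used_usd": 0.000293},
    {"total_tokens": 120, "model_used": "llama-3.1-8b-instant", "credits_used_usd": 0.000051},
])
```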
7. Streaming
Set "stream": true to receive Server-Sent Events. The format is wire-identical
to OpenAI's streaming response: data: { ... } chunks, terminated by
data: [DONE].
data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Refactored"},"finish_reason":null}]}
data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" the loop"},"finish_reason":null}]}
data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":412,"completion_tokens":87,"total_tokens":499,"model_used":"qwen-coder-32b","credits_used_usd":0.000293}}
data: [DONE]
The final chunk carries the complete usage object, including
model_used and credits_used_usd. The OpenAI Python and Node
SDKs parse this shape natively — no custom parser required.
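If you are not on an SDK, the stream can be consumed line by line. A minimal stdlib parser for the chunk shape shown above (it returns the text delta plus the usage object, which is only present on the final chunk):

```python
import json

def parse_sse_line(line: str):
    """One SSE line -> (delta_text, usage_or_None), or None for
    non-data lines and the terminating "data: [DONE]" sentinel."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {}).get("content") or ""
    return delta, chunk.get("usage")

# The two interesting cases from the transcript above:
first = parse_sse_line('data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Refactored"},"finish_reason":null}]}')
done = parse_sse_line("data: [DONE]")
```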
8. Error codes
All errors return JSON in the standard OpenAI { "error": { "code", "message" } } envelope.
| HTTP | Code | Meaning |
| --- | --- | --- |
| 401 | invalid_api_key | Auth header missing, malformed, or key revoked. |
| 402 | insufficient_credits | Your balance is less than the estimated cost of this request. Top up at /buy.html. |
| 403 | tier_does_not_allow_model | Your tier can't select this model (e.g. Starter requesting deepseek-reasoner). |
| 429 | rate_limited | Per-key rate limit exceeded. See §10. Rate limits. Retry after Retry-After seconds. |
| 500 | internal_error | Unexpected. Safe to retry once. |
| 503 | model_unavailable | The specific upstream model is down. Either retry with smart-route or pin a fallback. |

Example 402 response:
{
  "error": {
    "code": "insufficient_credits",
    "message": "Balance $0.0021 is below the estimated cost $0.012 for this request."
  }
}
9. Code samples
One request, three runtimes. The Python and Node examples use the official
openai SDK with the llmdeal base URL — a one-line swap for any project
already on OpenAI's client libraries.
curl
curl https://api.llmdeal.me/v1/chat/completions \
-H "Authorization: Bearer $LLMDEAL_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "smart-route",
"messages": [
{"role": "user", "content": "Write a Python one-liner to flatten a list of lists."}
]
}'
Python (openai SDK)
from openai import OpenAI
client = OpenAI(
base_url="https://api.llmdeal.me/v1",
api_key="sk-...", # your llmdeal key
)
resp = client.chat.completions.create(
model="smart-route",
messages=[
{"role": "user", "content": "Write a Python one-liner to flatten a list of lists."}
],
)
print(resp.choices[0].message.content)
print("routed to:", resp.usage.model_extra["model_used"])
print("cost USD:", resp.usage.model_extra["credits_used_usd"])
Node / TypeScript (openai SDK)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.llmdeal.me/v1",
apiKey: process.env.LLMDEAL_KEY,
});
const resp = await client.chat.completions.create({
model: "smart-route",
messages: [
{ role: "user", content: "Write a Python one-liner to flatten a list of lists." },
],
});
console.log(resp.choices[0]?.message.content);
// Cast to access llmdeal-specific usage fields:
const usage = resp.usage as typeof resp.usage & {
model_used: string;
credits_used_usd: number;
};
console.log("routed to:", usage.model_used);
console.log("cost USD:", usage.credits_used_usd);
Streaming (Node)
const stream = await client.chat.completions.create({
model: "smart-route",
messages: [{ role: "user", content: "Stream a haiku about refactoring." }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
10. Rate limits
Rate limits apply per API key — sustained throughput plus a short burst allowance:
| Tier | Sustained | Burst (30 s) |
| --- | --- | --- |
| Starter | 60 req/min | 120 req/min |
| Pro | 200 req/min | 400 req/min |
| Elite | 500 req/min | 1000 req/min |
Exceeding the burst window returns 429 rate_limited with a
Retry-After header in seconds. Token throughput is uncapped — only
request count is limited.
11. SDK compatibility
Works out of the box
Any client that lets you override the base URL on an OpenAI-style interface:
- openai (Python) — set base_url.
- openai (Node / TypeScript) — set baseURL.
- instructor (structured-output wrapper) — wraps the openai client.
- LangChain — pass openai_api_base="https://api.llmdeal.me/v1" to ChatOpenAI.
- aider — --openai-api-base https://api.llmdeal.me/v1.
- cursor — add as a custom OpenAI provider in settings.
- Any other client that speaks the OpenAI /chat/completions wire format — point its api_base at https://api.llmdeal.me/v1.
Not compatible
- The Anthropic-native SDK (@anthropic-ai/sdk, anthropic Python) — uses a different wire format. llmdeal does not offer Claude models; use the OpenAI-compatible path with any of the available open-weight models instead.
12. Stability
- Everything documented above is the launch spec. Model-serving endpoints go live on Monday 18 May 2026, GMT+2.
- Until launch, https://api.llmdeal.me/v1 serves a noop-test model — useful for client-library smoke tests, nothing else.
- Preorder credits at /buy.html convert to live balance at launch. Credits never expire. Full BTC refund available before launch.
- Material changes to this API are logged in the project CHANGELOG. Endpoint shape (/v1/chat/completions body) is frozen — additive fields only between now and v1 GA.
- Versioning: /v1 stays stable for at least 12 months past launch. Breaking changes ship under /v2 with a 6-month deprecation window.
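A stdlib smoke test against the pre-launch stub can verify client wiring end to end. Note one assumption: whether "noop-test" is directly pinnable as a model ID is not stated above, which only says the API serves a noop-test model; smart-route should resolve pre-launch either way:

```python
import json
import os
from urllib.request import Request, urlopen

def smoke_body(prompt: str = "ping") -> dict:
    # "noop-test" as a pinnable model ID is an assumption (see lead-in).
    return {"model": "noop-test", "messages": [{"role": "user", "content": prompt}]}

def smoke_test(key: str) -> dict:
    req = Request(
        "https://api.llmdeal.me/v1/chat/completions",
        data=json.dumps(smoke_body()).encode(),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(smoke_test(os.environ["LLMDEAL_KEY"]))
```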
Ready to lock in your credits?
Preorder credits →