API Docs — llmdeal.me LLM Gateway

1. Quickstart

One Bearer header. One POST. Standard OpenAI body. smart-route selects the cheapest capable model automatically.

export LLMDEAL_KEY=sk-...   # received via DM at launch

curl https://api.llmdeal.me/v1/chat/completions \
  -H "Authorization: Bearer $LLMDEAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-route",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

The response is standard OpenAI chat-completion shape — see §6. Response format. To pin a model, replace "smart-route" with any ID from §4. Models.

2. Authentication

Every request requires a Bearer token in the Authorization header:

Authorization: Bearer sk-<your-key>

Keys start with sk- and are 48 characters long.
Keys are delivered via DM at launch — to whichever contact handle (email / Telegram / Signal / Discord) you used at preorder.
Lost your key? DM the owner from the same handle you preordered with. We revoke the old key and DM you a fresh one.
Missing or invalid key → 401 invalid_api_key.
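
A minimal client-side sketch of the rules above: it checks the documented key format before any request is sent and builds the required headers. The helper names are hypothetical, and it assumes the 48-character length includes the sk- prefix, which the text does not spell out. The server remains the only real authority on whether a key is valid.

```python
KEY_PREFIX = "sk-"
KEY_LENGTH = 48  # assumption: total length, prefix included


def looks_like_llmdeal_key(key: str) -> bool:
    """Cheap local sanity check on the documented key format."""
    return key.startswith(KEY_PREFIX) and len(key) == KEY_LENGTH


def auth_headers(key: str) -> dict[str, str]:
    """Build the headers every request needs, failing fast on an obviously bad key."""
    if not looks_like_llmdeal_key(key):
        raise ValueError("key does not match the documented sk- / 48-character format")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```

Failing locally avoids a round trip that would only end in 401 invalid_api_key.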

3. Base URL & endpoints

All endpoints live under a single base URL:

https://api.llmdeal.me/v1

Method	Path	Purpose
POST	/v1/chat/completions	Main inference endpoint
GET	/v1/models	List models available to your tier
GET	/v1/usage	Current balance + month-to-date token usage

The path structure mirrors OpenAI exactly — any OpenAI-compatible client works unchanged with just a base URL swap.
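
For the two GET endpoints, a standard-library sketch might look like the following. The build_get helper is hypothetical; it takes paths relative to the base URL (so "models", not "/v1/models") and only prepares the request, leaving the actual send to the caller.

```python
import urllib.request

BASE_URL = "https://api.llmdeal.me/v1"


def build_get(path: str, key: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET for a path under the base URL."""
    url = BASE_URL + "/" + path.lstrip("/")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {key}"}, method="GET"
    )


models_req = build_get("models", "sk-...")  # placeholder key
usage_req = build_get("/usage", "sk-...")   # a leading slash is tolerated
# To send: urllib.request.urlopen(models_req), then decode the JSON body.
```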

4. Models

Set "model" to any ID below. Tier badges indicate the minimum subscription level required to access that model.

Live availability — updated 2026-05-14

● qwen-coder-32b — live (the EU 4090, fp16, 12k ctx)
● llama-3.3-70b-self-hosted — LIVE on our EEA GPU
● llama-3.1-8b-instant — live
● codestral — live (Mistral)
● qwen3-235b-cerebras — live (Cerebras)
● gemini-2.5-flash — live (Google)
● openrouter/* — live (~100 models)
● deepseek-chat / deepseek-reasoner — wired, awaiting operator prepay top-up

● live · ● wired, pending unblock. Pre-launch, the API serves noop-test only; full routing goes live Mon 18 May 2026 (GMT+2).

Model ID	Tier	Notes
`smart-route`	Starter+	Default. Routes to the cheapest model in your tier that can handle the request.
`qwen-coder-32b`	Starter+	Our own Qwen2.5-Coder-32B, hosted in the EU. 12k context (upgrading post-launch).
`llama-3.3-70b-self-hosted`	Pro+	Llama-3.3-70B self-hosted on our EEA GPU. No upstream-provider fees, EU jurisdiction end-to-end. LIVE.
`llama-3.1-8b-instant`	Starter+	Llama-3.1-8B fast tier (small-prompt workhorse). Cheapest fast routing.
`deepseek-chat`	Pro+	DeepSeek-V3 (general-purpose, strong reasoning). Wholesale: $0.27/$1.10 per 1M.
`deepseek-reasoner`	Pro+	DeepSeek-R1 (deep reasoning). Wholesale: $0.55/$2.19 per 1M.
`codestral`	Pro+	Mistral Codestral — coding specialist. Wholesale: $0.30/$0.90 per 1M.
`qwen3-235b-cerebras`	Pro+	Qwen3 235B MoE on Cerebras (~2000 tok/s). Wholesale: $0.85/$1.20 per 1M.
`gemini-2.5-flash`	Pro+	Google Gemini 2.5 Flash. Reasoning-mode default; pass `max_tokens: 300+` for visible output.
`openrouter/<provider>/<model>`	Pro+	~100 OpenRouter-aggregated models. Billed against our OpenRouter prepay; pass-through pricing applies.
`llama-3.3-70b-self-hosted`	$1,000 stretch goal	Self-hosted Llama-3.3-70B on an EEA A6000 (no upstream-provider fees). Unlocks at the $1k preorder threshold.

Call GET /v1/models at runtime to get the exact list available to your key. Tier upgrades reflect immediately — no config changes needed.
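
One way to use that runtime list is to fall back to smart-route when a preferred model is not available to the key. The sketch below assumes GET /v1/models returns the OpenAI list shape ({"object": "list", "data": [{"id": ...}]}); this page does not show the payload, so treat that shape and the pick_model helper as assumptions.

```python
def pick_model(preferred: str, models_payload: dict, fallback: str = "smart-route") -> str:
    """Return the preferred model if the key can use it, else the fallback."""
    # Assumption: OpenAI-style list payload with one {"id": ...} entry per model.
    available = {entry["id"] for entry in models_payload.get("data", [])}
    return preferred if preferred in available else fallback


# Hypothetical payload for a Starter key:
starter_models = {
    "object": "list",
    "data": [{"id": "smart-route"}, {"id": "qwen-coder-32b"}, {"id": "llama-3.1-8b-instant"}],
}
chosen = pick_model("deepseek-reasoner", starter_models)
```

Checking up front avoids a 403 tier_does_not_allow_model on the first real request.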

5. Request format

Standard OpenAI chat/completions body. JSON only, Content-Type: application/json required.

{
  "model": "smart-route",
  "messages": [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user",   "content": "Summarise this PR in one sentence."}
  ],
  "temperature": 0.2,
  "max_tokens": 512,
  "top_p": 1,
  "stream": false
}

Supported parameters

model — required. See §4. Models.
messages — required. Array of { role, content }. role is system, user, assistant, or tool.
temperature, top_p, max_tokens — standard OpenAI semantics.
stream — when true, returns SSE chunks; see §7. Streaming.
tools — supported on most upstream providers (OpenAI-compatible tools/tool_choice wire format). Specific OpenRouter routes inherit whatever the underlying provider supports.
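
As an illustration of the parameter list above, a small body builder could validate roles and drop unset optional parameters before serialising. The function name is hypothetical, and rejecting an empty messages array is an assumption: the text only says the field is required.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def build_chat_body(messages: list, model: str = "smart-route", **params) -> str:
    """Serialise a chat/completions body, rejecting roles the docs do not list."""
    if not messages:
        raise ValueError("messages is required")  # assumption: empty list is rejected too
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    body = {"model": model, "messages": messages}
    # Optional parameters (temperature, top_p, max_tokens, stream, tools) pass through
    # unchanged; anything left as None is omitted from the body.
    body.update({name: value for name, value in params.items() if value is not None})
    return json.dumps(body)
```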

6. Response format

Standard OpenAI chat-completion shape — with two extra usage fields that tell you exactly which model ran and what it cost.

{
  "id": "chatcmpl-9f2c…",
  "object": "chat.completion",
  "created": 1747200000,
  "model": "smart-route",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Refactored the loop into a single-pass reduce."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens":     412,
    "completion_tokens": 87,
    "total_tokens":      499,
    "model_used":        "qwen-coder-32b",
    "credits_used_usd":  0.000293
  }
}

llmdeal-specific fields

usage.model_used — the underlying model smart-route selected. Matches your pinned model when you specify one directly.
usage.credits_used_usd — exact USD deducted from your balance for this request, covering input + output tokens at the per-model rate.
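
A short sketch of reading these two fields from a decoded response, for logging or cost tracking. The summarize_usage helper and the derived usd_per_1k_tokens figure are illustrative names, not part of the API.

```python
def summarize_usage(response: dict) -> dict:
    """Extract routing and cost details from a decoded chat-completion response."""
    usage = response.get("usage", {})
    total = usage.get("total_tokens", 0)
    cost = usage.get("credits_used_usd")
    return {
        "requested_model": response.get("model"),
        "model_used": usage.get("model_used"),
        "cost_usd": cost,
        # Derived convenience figure; None when cost or token count is missing.
        "usd_per_1k_tokens": (cost / total * 1000) if cost is not None and total else None,
    }
```

Comparing requested_model with model_used shows which model smart-route picked for a given request.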

7. Streaming

Set "stream": true to receive Server-Sent Events. The format is wire-identical to OpenAI's streaming response: data: { ... } chunks, terminated by data: [DONE].

data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Refactored"},"finish_reason":null}]}

data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" the loop"},"finish_reason":null}]}

data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":412,"completion_tokens":87,"total_tokens":499,"model_used":"qwen-coder-32b","credits_used_usd":0.000293}}

data: [DONE]

The final chunk carries the complete usage object, including model_used and credits_used_usd. The OpenAI Python and Node SDKs parse this shape natively — no custom parser required.
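
For clients that do not use an SDK, the stream can be consumed by hand. The sketch below is a minimal reader for the chunk format shown above, not a full SSE implementation: it handles only data: lines, and the collect_stream name is hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join streamed content deltas and return them with the final usage object."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content") or "")
        usage = chunk.get("usage") or usage  # only the final chunk carries usage
    return "".join(text_parts), usage
```

Any iterable of decoded text lines works as input, for example the lines of an HTTP response body read incrementally.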

8. Error codes

All errors return JSON in the standard OpenAI { "error": { "code", "message" } } envelope.

HTTP	Code	Meaning
`401`	`invalid_api_key`	Auth header missing, malformed, or key revoked.
`402`	`insufficient_credits`	Your balance is less than the estimated cost of this request. Top up at /buy.html.
`403`	`tier_does_not_allow_model`	Your tier can't select this model (e.g. Starter requesting `deepseek-reasoner`).
`429`	`rate_limited`	Per-key rate limit exceeded. See §10. Rate limits. Retry after `Retry-After` seconds.
`503`	`model_unavailable`	The specific upstream model is down. Either retry with `smart-route` or pin a fallback.
`500`	`internal_error`	Unexpected. Safe to retry once.

{
  "error": {
    "code":    "insufficient_credits",
    "message": "Balance $0.0021 is below the estimated cost $0.012 for this request."
  }
}
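
The table suggests a simple retry policy, sketched below. The plan_retry helper, the three-attempt cap, the 1-second default for a missing Retry-After, and the 0.5-second pause before retrying a 500 are all assumptions chosen for illustration; only the per-status behaviour comes from the table.

```python
from typing import Optional, Tuple


def plan_retry(
    status: int,
    attempt: int,
    model: str,
    retry_after: Optional[str] = None,
    max_attempts: int = 3,  # assumption: not specified by the docs
) -> Optional[Tuple[float, str]]:
    """Return (delay_seconds, model_to_use), or None when retrying cannot help.

    attempt is the number of tries already made (1 after the first failure).
    """
    if attempt >= max_attempts:
        return None
    if status == 429:
        # The docs say Retry-After is given in seconds.
        return (float(retry_after) if retry_after else 1.0), model
    if status == 503 and model != "smart-route":
        return 0.0, "smart-route"  # let the router pick a healthy model
    if status == 500 and attempt == 1:
        return 0.5, model  # "safe to retry once"
    return None  # 401, 402 and 403 need a key, balance or tier change
```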

9. Code samples

One request, three runtimes. The Python and Node examples use the official openai SDK with the llmdeal base URL — a one-line swap for any project already on OpenAI's client libraries.

curl

curl https://api.llmdeal.me/v1/chat/completions \
  -H "Authorization: Bearer $LLMDEAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-route",
    "messages": [
      {"role": "user", "content": "Write a Python one-liner to flatten a list of lists."}
    ]
  }'

Python (`openai` SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmdeal.me/v1",
    api_key="sk-...",   # your llmdeal key
)

resp = client.chat.completions.create(
    model="smart-route",
    messages=[
        {"role": "user", "content": "Write a Python one-liner to flatten a list of lists."}
    ],
)

print(resp.choices[0].message.content)
print("routed to:", resp.usage.model_extra["model_used"])
print("cost USD:", resp.usage.model_extra["credits_used_usd"])

Node / TypeScript (`openai` SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmdeal.me/v1",
  apiKey:  process.env.LLMDEAL_KEY,
});

const resp = await client.chat.completions.create({
  model: "smart-route",
  messages: [
    { role: "user", content: "Write a Python one-liner to flatten a list of lists." },
  ],
});

console.log(resp.choices[0]?.message.content);
// Cast to access llmdeal-specific usage fields:
const usage = resp.usage as typeof resp.usage & {
  model_used: string;
  credits_used_usd: number;
};
console.log("routed to:", usage.model_used);
console.log("cost USD:", usage.credits_used_usd);

Streaming (Node)

const stream = await client.chat.completions.create({
  model: "smart-route",
  messages: [{ role: "user", content: "Stream a haiku about refactoring." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

10. Rate limits

Rate limits apply per API key — sustained throughput plus a short burst allowance:

Tier	Sustained	Burst (30 s)
Starter	60 req/min	120 req/min
Pro	200 req/min	400 req/min
Elite	500 req/min	1000 req/min

Exceeding the burst allowance returns 429 rate_limited with a Retry-After header (in seconds). Token throughput is uncapped; only request count is limited.
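
To stay clear of 429s, a client can throttle itself. The sketch below is a conservative sliding-window limiter that keeps to the sustained rate and ignores the burst allowance, since the exact interaction between the two is not specified here. The class name and the injectable clock are illustrative choices.

```python
import time
from collections import deque

SUSTAINED_PER_MIN = {"Starter": 60, "Pro": 200, "Elite": 500}  # from the table above


class RequestThrottle:
    """Allow at most per_minute requests in any rolling 60-second window."""

    def __init__(self, per_minute: int, clock=time.monotonic):
        self.per_minute = per_minute
        self.clock = clock
        self.sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is within the limit."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # forget requests older than the window
        if len(self.sent) < self.per_minute:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

Typical use is to sleep for wait_time() seconds, send the request, then call record(); for example, RequestThrottle(SUSTAINED_PER_MIN["Pro"]) for a Pro key.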

11. SDK compatibility

Works out of the box

Any client that lets you override the base URL on an OpenAI-style interface:

openai (Python) — set base_url.
openai (Node, TypeScript) — set baseURL.
instructor (structured output wrapper) — wraps the openai client.
LangChain — pass openai_api_base="https://api.llmdeal.me/v1" to ChatOpenAI.
Any OpenAI-compatible client library — point api_base at https://api.llmdeal.me/v1.
aider — --openai-api-base https://api.llmdeal.me/v1.
cursor — add as a custom OpenAI provider in settings.
Any client that speaks the OpenAI /chat/completions wire format.

Not compatible

The Anthropic-native SDKs (@anthropic-ai/sdk, anthropic Python) use a different wire format. llmdeal does not offer Claude models; use the OpenAI-compatible path with any of the models in §4. Models instead.

12. Stability

Everything documented above is the launch spec. Model-serving endpoints go live on Monday 18 May 2026, GMT+2.
Until launch, https://api.llmdeal.me/v1 serves a noop-test model — useful for client-library smoke tests, nothing else.
Preorder credits at /buy.html convert to live balance at launch. Credits never expire. Full BTC refund available before launch.
Material changes to this API are logged in the project CHANGELOG. Endpoint shape (/v1/chat/completions body) is frozen — additive fields only between now and v1 GA.
Versioning: /v1 stays stable for at least 12 months past launch. Breaking changes ship under /v2 with a 6-month deprecation window.
