API reference · v1 · OpenAI-compatible

llmdeal API reference.

Drop-in OpenAI replacement. Same request shape, same response shape, same client libraries. Bearer auth, JSON in, JSON or SSE out. Let smart-route pick the cheapest model, or pin one yourself.

Launch: Monday 18 May 2026, GMT+2. The endpoints documented below go live on that date. Preorder credits at /buy.html convert to live balance at launch. Until then, https://api.llmdeal.me/v1 serves a noop-test model — useful only for client-library smoke testing (auth + base URL + JSON shape).

1. Quickstart

One Bearer header. One POST. Standard OpenAI body. smart-route selects the cheapest capable model automatically.

export LLMDEAL_KEY=sk-...   # received via DM at launch

curl https://api.llmdeal.me/v1/chat/completions \
  -H "Authorization: Bearer $LLMDEAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-route",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

The response is standard OpenAI chat-completion shape — see §6. Response format. To pin a model, replace "smart-route" with any ID from §4. Models.

2. Authentication

Every request requires a Bearer token in the Authorization header:

Authorization: Bearer sk-<your-key>
  • Keys start with sk- and are 48 characters long.
  • Keys are delivered via DM at launch — to whichever contact handle (email / Telegram / Signal / Discord) you used at preorder.
  • Lost your key? DM the owner from the same handle you preordered with. We revoke the old key and DM you a fresh one.
  • Missing or invalid key → 401 invalid_api_key.
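Given the documented key shape, a key can be sanity-checked locally before it is ever sent — a minimal sketch (only the sk- prefix and 48-character length are documented; this catches truncated copy-pastes before they turn into a 401):

```python
def looks_like_llmdeal_key(key: str) -> bool:
    """Cheap local check against the documented key shape:
    keys start with "sk-" and are exactly 48 characters long."""
    return key.startswith("sk-") and len(key) == 48
```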

3. Base URL & endpoints

All endpoints live under a single base URL:

https://api.llmdeal.me/v1
  • POST /v1/chat/completions — main inference endpoint
  • GET /v1/models — list models available to your tier
  • GET /v1/usage — current balance + month-to-date token usage

The path structure mirrors OpenAI exactly — any OpenAI-compatible client works unchanged with just a base URL swap.

4. Models

Set "model" to any ID below. The tier listed for each model is the minimum subscription level required to access it.

Live availability — updated 2026-05-14
  • qwen-coder-32b — live (self-hosted EU 4090, fp16, 12k context)
  • llama-3.3-70b-self-hosted — live (self-hosted on our EEA GPU)
  • llama-3.1-8b-instant — live
  • codestral — live (Mistral)
  • qwen3-235b-cerebras — live (Cerebras)
  • gemini-2.5-flash — live (Google)
  • openrouter/* — live (~100 models)
  • deepseek-chat / deepseek-reasoner — wired, awaiting operator prepay top-up

Pre-launch, the API serves only noop-test; full routing to the models above goes live Monday 18 May 2026 (GMT+2).
Model ID — Tier — Notes
  • smart-route — Starter+ — Default. Routes to the cheapest model in your tier that can handle the request.
  • qwen-coder-32b — Starter+ — Our own Qwen2.5-Coder-32B, hosted in the EU. 12k context (upgrading post-launch).
  • llama-3.3-70b-self-hosted — Pro+ — Llama-3.3-70B self-hosted on our EEA GPU; no upstream-provider fees, EU jurisdiction end-to-end. Originally the $1,000 preorder stretch goal (an EEA A6000), now live.
  • llama-3.1-8b-instant — Starter+ — Llama-3.1-8B fast tier (small-prompt workhorse). Cheapest fast routing.
  • deepseek-chat — Pro+ — DeepSeek-V3 (general-purpose, strong reasoning). Wholesale: $0.27/$1.10 per 1M tokens.
  • deepseek-reasoner — Pro+ — DeepSeek-R1 (deep reasoning). Wholesale: $0.55/$2.19 per 1M tokens.
  • codestral — Pro+ — Mistral Codestral, coding specialist. Wholesale: $0.30/$0.90 per 1M tokens.
  • qwen3-235b-cerebras — Pro+ — Qwen3 235B MoE on Cerebras (~2000 tok/s). Wholesale: $0.85/$1.20 per 1M tokens.
  • gemini-2.5-flash — Pro+ — Google Gemini 2.5 Flash. Reasoning mode is on by default; pass max_tokens: 300+ for visible output.
  • openrouter/<provider>/<model> — Pro+ — ~100 OpenRouter-aggregated models. Billed against our OpenRouter prepay; pass-through pricing applies.

Call GET /v1/models at runtime to get the exact list available to your key. Tier upgrades reflect immediately — no config changes needed.
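Since GET /v1/models follows the standard OpenAI list shape ({"object": "list", "data": [{"id": ...}, ...]} — an assumption based on the compatibility promise above), extracting the IDs available to your key is a one-liner:

```python
def list_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from a GET /v1/models response.
    Assumes the standard OpenAI list shape:
    {"object": "list", "data": [{"id": "...", ...}, ...]}."""
    return [m["id"] for m in payload.get("data", [])]

# Fetching the payload itself (requires a live key):
#   import json, urllib.request
#   req = urllib.request.Request(
#       "https://api.llmdeal.me/v1/models",
#       headers={"Authorization": f"Bearer {key}"},
#   )
#   payload = json.load(urllib.request.urlopen(req))
```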

5. Request format

Standard OpenAI chat/completions body. JSON only, Content-Type: application/json required.

{
  "model": "smart-route",
  "messages": [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user",   "content": "Summarise this PR in one sentence."}
  ],
  "temperature": 0.2,
  "max_tokens": 512,
  "top_p": 1,
  "stream": false
}

Supported parameters

  • model — required. See §4. Models.
  • messages — required. Array of { role, content }. role is system, user, assistant, or tool.
  • temperature, top_p, max_tokens — standard OpenAI semantics.
  • stream — when true, returns SSE chunks; see §7. Streaming.
  • tools — supported on most upstream providers (OpenAI-compatible tools/tool_choice wire format). Individual OpenRouter routes inherit whatever the underlying provider supports.
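Since tools/tool_choice follow the OpenAI-compatible wire format, a request body with a single function tool looks like this — a sketch only; the get_weather tool is invented for illustration and is not part of the API:

```python
body = {
    "model": "smart-route",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                # Hypothetical tool, for illustration only.
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
```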

6. Response format

Standard OpenAI chat-completion shape — with two extra usage fields that tell you exactly which model ran and what it cost.

{
  "id": "chatcmpl-9f2c…",
  "object": "chat.completion",
  "created": 1747200000,
  "model": "smart-route",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Refactored the loop into a single-pass reduce."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens":     412,
    "completion_tokens": 87,
    "total_tokens":      499,
    "model_used":        "qwen-coder-32b",
    "credits_used_usd":  0.000293
  }
}

llmdeal-specific fields

  • usage.model_used — the underlying model smart-route selected. Matches your pinned model when you specify one directly.
  • usage.credits_used_usd — exact USD deducted from your balance for this request, covering input + output tokens at the per-model rate.
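Because every response carries usage.model_used and usage.credits_used_usd, client-side spend tracking is a simple fold over responses — a minimal sketch (the accumulator itself is our own convention, not part of the API):

```python
def accumulate_cost(totals: dict, usage: dict) -> dict:
    """Fold one response's usage block into a running per-model cost tally.
    `usage` is the response["usage"] dict documented above."""
    model = usage["model_used"]
    totals[model] = totals.get(model, 0.0) + usage["credits_used_usd"]
    return totals
```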

7. Streaming

Set "stream": true to receive Server-Sent Events. The format is wire-identical to OpenAI's streaming response: data: { ... } chunks, terminated by data: [DONE].

data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Refactored"},"finish_reason":null}]}

data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" the loop"},"finish_reason":null}]}

data: {"id":"chatcmpl-9f2c…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":412,"completion_tokens":87,"total_tokens":499,"model_used":"qwen-coder-32b","credits_used_usd":0.000293}}

data: [DONE]

The final chunk carries the complete usage object, including model_used and credits_used_usd. The OpenAI Python and Node SDKs parse this shape natively — no custom parser required.
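If you are not on an SDK, the frames above can be parsed by hand in a few lines — a sketch that collects delta content and returns the final usage block, assuming exactly the chunk shape shown above:

```python
import json

def parse_sse_lines(lines):
    """Parse `data: {...}` SSE lines into (full_text, final_usage).
    Stops at the `data: [DONE]` sentinel; skips blank keep-alive lines."""
    parts, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
        if "usage" in chunk:  # only the final chunk carries usage
            usage = chunk["usage"]
    return "".join(parts), usage
```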

8. Error codes

All errors return JSON in the standard OpenAI { "error": { "code", "message" } } envelope.

HTTP — Code — Meaning
  • 401 invalid_api_key — Auth header missing, malformed, or key revoked.
  • 402 insufficient_credits — Your balance is less than the estimated cost of this request. Top up at /buy.html.
  • 403 tier_does_not_allow_model — Your tier can't select this model (e.g. Starter requesting deepseek-reasoner).
  • 429 rate_limited — Per-key rate limit exceeded. See §10. Rate limits. Retry after Retry-After seconds.
  • 500 internal_error — Unexpected. Safe to retry once.
  • 503 model_unavailable — The specific upstream model is down. Either retry with smart-route or pin a fallback.

Example 402 body:

{
  "error": {
    "code":    "insufficient_credits",
    "message": "Balance $0.0021 is below the estimated cost $0.012 for this request."
  }
}
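The error table implies a simple retry policy: 429 → wait Retry-After, 500 → one quick retry, everything else → fail. A sketch (the 0.5 s delay for 500s is our own choice, not documented; 503 is excluded because it only makes sense to retry after switching model):

```python
from typing import Optional

def retry_delay(status: int, headers: dict, attempt: int) -> Optional[float]:
    """Return seconds to wait before retrying, or None for 'do not retry'."""
    if status == 429:
        # Honor the Retry-After header; fall back to 1 s if absent.
        return float(headers.get("Retry-After", 1))
    if status == 500 and attempt == 0:
        return 0.5  # "safe to retry once"
    return None
```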

9. Code samples

One request, three runtimes. The Python and Node examples use the official openai SDK with the llmdeal base URL — a one-line swap for any project already on OpenAI's client libraries.

curl

curl https://api.llmdeal.me/v1/chat/completions \
  -H "Authorization: Bearer $LLMDEAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-route",
    "messages": [
      {"role": "user", "content": "Write a Python one-liner to flatten a list of lists."}
    ]
  }'

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmdeal.me/v1",
    api_key="sk-...",   # your llmdeal key
)

resp = client.chat.completions.create(
    model="smart-route",
    messages=[
        {"role": "user", "content": "Write a Python one-liner to flatten a list of lists."}
    ],
)

print(resp.choices[0].message.content)
print("routed to:", resp.usage.model_extra["model_used"])
print("cost USD:", resp.usage.model_extra["credits_used_usd"])

Node / TypeScript (openai SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmdeal.me/v1",
  apiKey:  process.env.LLMDEAL_KEY,
});

const resp = await client.chat.completions.create({
  model: "smart-route",
  messages: [
    { role: "user", content: "Write a Python one-liner to flatten a list of lists." },
  ],
});

console.log(resp.choices[0]?.message.content);
// Cast to access llmdeal-specific usage fields:
const usage = resp.usage as typeof resp.usage & {
  model_used: string;
  credits_used_usd: number;
};
console.log("routed to:", usage.model_used);
console.log("cost USD:", usage.credits_used_usd);

Streaming (Node)

const stream = await client.chat.completions.create({
  model: "smart-route",
  messages: [{ role: "user", content: "Stream a haiku about refactoring." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

10. Rate limits

Rate limits apply per API key — sustained throughput plus a short burst allowance:

Tier — Sustained — Burst (30 s)
  • Starter — 60 req/min — 120 req/min
  • Pro — 200 req/min — 400 req/min
  • Elite — 500 req/min — 1000 req/min

Exceeding the burst window returns 429 rate_limited with a Retry-After header in seconds. Token throughput is uncapped — only request count is limited.
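The sustained limits translate directly into a minimum spacing between requests — e.g. Starter's 60 req/min means at most one request per second. A tiny pacing helper (tier names and rates are from the table above; the helper itself is our own convention):

```python
SUSTAINED_RPM = {"starter": 60, "pro": 200, "elite": 500}

def min_spacing_seconds(tier: str) -> float:
    """Minimum delay between requests to stay under the sustained limit."""
    return 60.0 / SUSTAINED_RPM[tier.lower()]
```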

11. SDK compatibility

Works out of the box

Any client that lets you override the base URL on an OpenAI-style interface:

  • openai (Python) — set base_url.
  • openai (Node / TypeScript) — set baseURL.
  • instructor (structured output wrapper) — wraps the openai client.
  • LangChain — pass openai_api_base="https://api.llmdeal.me/v1" to ChatOpenAI.
  • Any OpenAI-compatible client library — point api_base at https://api.llmdeal.me/v1.
  • aider — launch with --openai-api-base https://api.llmdeal.me/v1.
  • cursor — add as a custom OpenAI provider in settings.
  • Any client that speaks the OpenAI /chat/completions wire format.

Not compatible

  • The Anthropic-native SDK (@anthropic-ai/sdk, anthropic Python) — uses a different wire format. llmdeal does not offer Claude models; use the OpenAI-compatible path with any of the available open-weight models instead.

12. Stability

  • Everything documented above is the launch spec. Model-serving endpoints go live on Monday 18 May 2026, GMT+2.
  • Until launch, https://api.llmdeal.me/v1 serves a noop-test model — useful for client-library smoke tests, nothing else.
  • Preorder credits at /buy.html convert to live balance at launch. Credits never expire. Full BTC refund available before launch.
  • Material changes to this API are logged in the project CHANGELOG. Endpoint shape (/v1/chat/completions body) is frozen — additive fields only between now and v1 GA.
  • Versioning: /v1 stays stable for at least 12 months past launch. Breaking changes ship under /v2 with a 6-month deprecation window.

Ready to lock in your credits? Preorder credits →