The 95% gross margin nobody talks about
If you have ever wondered how OpenRouter, AIMLAPI, Bytez, or any of the dozens of AI gateways stay in business while charging "only" 5 to 60 percent over the upstream rate, the answer is that the spread is 5 to 60 percent on top of 95 percent gross margin on the upstream side. The upstream wholesale GPU providers (Groq, Cerebras, NVIDIA NIM, Together, Hyperbolic) are themselves running at 70 to 85 percent gross margin against bare-metal H100 cost. The "reseller markup" you see is real, but the upstream margin is the bigger number.
This is not a critique. It is the structure of the business. Inference is a high-fixed-cost, high-variable-margin product and the entire stack is priced accordingly. Every hosted endpoint you talk to is a margin stack and every layer of that stack is taking a cut.
The trick is that the marginal cost of routing a request you brought from your own Anthropic key is zero. There is no token cost to llmdeal because the tokens are billed against your key. There is a tiny compute cost for the routing decision and the proxy hop, which is what the $19/mo BYO Keys add-on covers (or which is included free on the $100/mo and up tiers).
The hosted gateways all want to sell you tokens at a markup. We will too, if you want bundled tokens. But we will also let you bring your own keys and pay us only for the routing layer. That is the part that is actually new.
What "bring your own keys" actually means
You sign up for llmdeal at any tier. In the dashboard, you paste in:
- Your existing Anthropic API key (if you have one).
- Your existing OpenAI API key (if you have one).
- Your existing OpenRouter, Together, Groq, Cerebras, or DeepInfra key (whichever you use).
- Your local Ollama or vLLM endpoint URL (if you run one).
You leave any of those blank if you don't have them. Then in your code, you call llmdeal's OpenAI-compatible endpoint exactly as you would call OpenAI's:
from openai import OpenAI
client = OpenAI(
base_url="https://api.llmdeal.me/v1",
api_key="ll_live_..."
)
response = client.chat.completions.create(
model="smart-route",
messages=[{"role": "user", "content": "Refactor this function..."}]
)
When that request hits llmdeal, the router looks at the prompt shape, the model you requested (or "smart-route" if you let us pick), and decides which lane to use. It might decide:
- "This is a short code-completion request, send it to your Groq key for Llama 3.3 70B at 700 tok/sec."
- "This is a long-context retrieval prompt, send it to your Together key for Qwen3-Next 80B at 128k context."
- "This is a hard reasoning prompt, send it to your Anthropic key for Claude Opus 4.6."
- "This is a vision prompt, send it to your OpenAI key for GPT-4o-vision."
The token cost lands on whichever upstream key handled the request. llmdeal bills you only for the subscription tier (which covers the routing layer, the dashboard, the per-key budgets, and the per-request cost telemetry).
The walkthrough
Here is what a customer typically does on day 1:
- Sign up for llmdeal Pro at $150/mo. Pay in BTC. Get an API key in the dashboard.
- Open the BYO Keys panel. Paste in your Anthropic key (you already had one with $40 of unused credit). Paste in your OpenAI key. Paste in your OpenRouter key.
- Set per-key monthly budgets so the router will not blow past them: "Anthropic, max $60/mo. OpenAI, max $40/mo. OpenRouter, max $25/mo. After that, fall back to llmdeal's hosted Llama 3.3 70B."
- Change your code's base URL from
https://api.anthropic.com(orhttps://api.openai.com/v1) tohttps://api.llmdeal.me/v1. Change the model name tosmart-route. - Start sending requests. The dashboard shows you, per request, which upstream lane was chosen and what it cost.
You stop having three separate billing relationships. You stop having three separate dashboards. You stop having three separate rate-limit pools to remember. You have one OpenAI-compatible endpoint, one dashboard, one bill from us for the routing layer, and N provider relationships consolidated into a single per-request decision.
Why the smart-route layer matters
Token pricing across providers is now wildly heterogeneous. Llama 3.3 70B via Groq costs you roughly $0.59/1M input and $0.79/1M output. The same model via Together is $0.88/1M in/out. The same model via DeepInfra is $0.27/1M input and $0.40/1M output. Claude Sonnet via Anthropic is $3/$15. GPT-5.5 via OpenAI is $1.25/$10.
If you statically pick one provider for everything, you are overpaying by 30 to 80 percent on most of your requests. If you route per-request, you can land on the cheapest provider that meets the latency and quality bar for that specific prompt. The smart-route layer is the thing that picks. It is a classifier plus a price-aware load balancer. It is the part of the business that is not commodity.
Who this is for
- Indie developers who already pay 3 or more AI services. Consolidate them. One bill from us, your existing keys stay with the upstream providers, your relationship with them stays in your name.
- Small teams sharing an OpenAI account across 4 engineers because team plans are expensive. Move the shared key under llmdeal Pro, set per-user budgets, get per-call telemetry, stop fighting about who burned the rate limit.
- Teams with EU compliance requirements who already have a Mistral or local-Ollama deployment for the sensitive prompts and want a router that can mix the sensitive lane with the convenience of Claude for the non-sensitive lane.
- Anyone tired of Cursor's 500-request cap who wants to keep using Cursor's editor experience but route the requests through their own keys.
Who this is not for
If you are signing up for AI for the first time, do not start here. The mental model of "bring your own keys" assumes you already have keys to bring. If you are starting from zero, the Hobbyist tier at $9 (which includes bundled tokens, no BYO required) is the right entry point. Get comfortable with the OpenAI-compatible endpoint, decide whether smart-routing meets your bar, then add your own keys later if you grow into it.
If you only use one provider and you are happy with their rate-limit ceiling, BYO does not save you anything. The unlock is consolidation. If there is nothing to consolidate, you should not pay for the consolidation layer.
The honest pitch
We are not the cheapest gateway. OpenRouter at 5.5% markup will undercut us on hosted tokens. We are not the most flexible. AIMLAPI carries more model SKUs.
We are the gateway that makes BYO the headline product, runs on crypto without KYC, gives you EU or US routing as a per-request toggle, and ships an OpenAI-compatible endpoint that drops into any SDK you already wrote code against. Those four things together describe a customer that does not exist on the other gateways, and that customer is who this product is for.
200K tokens, no card, no KYC. Add your existing Anthropic or OpenAI key during setup if you want to test the routing layer specifically.
Start the Free Trial →Related: the cost breakdown across 4 workloads, the migration guide from Cursor and OpenAI, and the comparison table against Claude Pro, Cursor, Copilot, ChatGPT Plus, and Codeium.