Architecture · 5 min read

BYO keys is the real product.

Every hosted AI gateway runs at roughly 95 percent gross margin. We do too. Here is why we made that the headline feature instead of hiding it, and how the smart-route layer turns three provider accounts into one OpenAI-compatible endpoint.

· llmdeal.me

The 95% gross margin nobody talks about

If you have ever wondered how OpenRouter, AIMLAPI, Bytez, or any of the dozens of AI gateways stay in business while charging "only" 5 to 60 percent over the upstream rate, the answer is that the spread is 5 to 60 percent on top of 95 percent gross margin on the upstream side. The upstream wholesale GPU providers (Groq, Cerebras, NVIDIA NIM, Together, Hyperbolic) are themselves running at 70 to 85 percent gross margin against bare-metal H100 cost. The "reseller markup" you see is real, but the upstream margin is the bigger number.

This is not a critique. It is the structure of the business. Inference is a high-fixed-cost, high-variable-margin product and the entire stack is priced accordingly. Every hosted endpoint you talk to is a margin stack and every layer of that stack is taking a cut.

The trick is that the marginal token cost of routing a request through your own Anthropic key is zero. llmdeal pays nothing for tokens because they are billed against your key. There is a small compute cost for the routing decision and the proxy hop, which is what the $19/mo BYO Keys add-on covers (the add-on is included at no extra charge on the $100/mo and higher tiers).

The hosted gateways all want to sell you tokens at a markup. We will too, if you want bundled tokens. But we will also let you bring your own keys and pay us only for the routing layer. That is the part that is actually new.
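To make the flat-fee versus markup trade-off concrete, here is a small arithmetic sketch. It assumes the only two costs being compared are the $19/mo add-on and a percentage markup on upstream spend; the function name and the sample rates (5%, 5.5%, 60%, taken from the ranges quoted in this post) are illustrative, and subscription tier fees are deliberately left out.

```python
BYO_ADDON_USD_PER_MONTH = 19.0  # flat add-on price quoted in this post


def break_even_upstream_spend(markup_rate: float) -> float:
    """Monthly upstream spend at which a percentage markup costs as much as the flat add-on.

    markup_rate is a fraction, e.g. 0.055 for a 5.5 percent markup.
    """
    if markup_rate <= 0:
        raise ValueError("markup_rate must be positive")
    return BYO_ADDON_USD_PER_MONTH / markup_rate


# Sample markups drawn from the ranges mentioned in this post.
for rate in (0.05, 0.055, 0.60):
    spend = break_even_upstream_spend(rate)
    print(f"markup {rate:.1%}: break-even at ${spend:,.2f}/mo of upstream spend")
```

Below the break-even spend a percentage markup is the cheaper way to pay for routing; above it, the flat fee is.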

What "bring your own keys" actually means

You sign up for llmdeal at any tier. In the dashboard, you paste in the provider API keys you already have, for example Anthropic, OpenAI, and OpenRouter.

You leave any of those blank if you don't have them. Then in your code, you call llmdeal's OpenAI-compatible endpoint exactly as you would call OpenAI's:

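from openai import OpenAI

# Point the standard OpenAI SDK at llmdeal instead of api.openai.com.
client = OpenAI(
    base_url="https://api.llmdeal.me/v1",
    api_key="ll_live_..."  # your llmdeal key, not a provider key
)

# "smart-route" lets the router pick the upstream lane for this request.
response = client.chat.completions.create(
    model="smart-route",
    messages=[{"role": "user", "content": "Refactor this function..."}]
)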

When that request hits llmdeal, the router looks at the prompt shape and the model you requested (or "smart-route" if you let us pick), and decides which upstream lane handles it.

The token cost lands on whichever upstream key handled the request. llmdeal bills you only for the subscription tier (which covers the routing layer, the dashboard, the per-key budgets, and the per-request cost telemetry).
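A minimal sketch of that decision, not the production router. The prefix-to-provider table, the `smart_pick` argument (standing in for whatever the classifier would choose), and the rule that a blank key falls back to llmdeal's hosted tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping from model-name prefix to the provider that serves it.
MODEL_PREFIX_TO_PROVIDER = {
    "claude": "anthropic",
    "gpt": "openai",
}


@dataclass
class RouteDecision:
    provider: str   # which upstream lane handles the request
    billed_to: str  # who pays for the tokens


def choose_lane(model: str, byo_keys: dict[str, str], smart_pick: str) -> RouteDecision:
    """Pick a lane for one request.

    byo_keys maps provider name to the key pasted in the dashboard ("" if left blank).
    smart_pick is a placeholder for the classifier's choice when model == "smart-route".
    """
    if model == "smart-route":
        provider = smart_pick
    else:
        provider = next(
            (p for prefix, p in MODEL_PREFIX_TO_PROVIDER.items() if model.startswith(prefix)),
            "hosted",
        )
    # Assumption: a blank or missing key makes that lane unavailable as BYO.
    if byo_keys.get(provider):
        return RouteDecision(provider, "your key")
    return RouteDecision("hosted", "llmdeal bundled tokens")


keys = {"anthropic": "sk-ant-example", "openai": ""}
print(choose_lane("smart-route", keys, smart_pick="anthropic"))
print(choose_lane("gpt-4o", keys, smart_pick="anthropic"))
```

The point of the sketch is the billing split: whenever a BYO key handles the request, the tokens land on that key rather than on llmdeal.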

The walkthrough

Here is what a customer typically does on day 1:

  1. Sign up for llmdeal Pro at $150/mo. Pay in BTC. Get an API key in the dashboard.
  2. Open the BYO Keys panel. Paste in your Anthropic key (you already had one with $40 of unused credit). Paste in your OpenAI key. Paste in your OpenRouter key.
  3. Set per-key monthly budgets so the router will not blow past them: "Anthropic, max $60/mo. OpenAI, max $40/mo. OpenRouter, max $25/mo. After that, fall back to llmdeal's hosted Llama 3.3 70B."
  4. Change your code's base URL from https://api.anthropic.com (or https://api.openai.com/v1) to https://api.llmdeal.me/v1. Change the model name to smart-route.
  5. Start sending requests. The dashboard shows you, per request, which upstream lane was chosen and what it cost.
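The budget rule in step 3 can be sketched as a small in-memory tracker. This is an illustration under assumptions: the caps come from the walkthrough, but the class, the function names, the fallback lane identifier, and the idea of checking an estimated cost before sending are hypothetical, not llmdeal's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class KeyBudget:
    monthly_cap_usd: float
    spent_usd: float = 0.0

    def can_afford(self, estimated_cost_usd: float) -> bool:
        # A request is allowed only if it would not push spend past the cap.
        return self.spent_usd + estimated_cost_usd <= self.monthly_cap_usd


# Caps from the walkthrough above.
budgets = {
    "anthropic": KeyBudget(60.0),
    "openai": KeyBudget(40.0),
    "openrouter": KeyBudget(25.0),
}

FALLBACK_LANE = "llmdeal-hosted-llama-3.3-70b"  # hypothetical lane identifier


def pick_within_budget(preferred: list[str], estimated_cost_usd: float) -> str:
    """Return the first preferred provider with budget left, else the hosted fallback."""
    for provider in preferred:
        budget = budgets.get(provider)
        if budget is not None and budget.can_afford(estimated_cost_usd):
            return provider
    return FALLBACK_LANE


def record_spend(provider: str, cost_usd: float) -> None:
    """Add the actual cost of a completed request to that provider's running total."""
    if provider in budgets:
        budgets[provider].spent_usd += cost_usd


print(pick_within_budget(["anthropic", "openai", "openrouter"], 0.05))
```

A real tracker would also need to reset totals each month and survive restarts; both are omitted here.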

You stop having three separate billing relationships. You stop having three separate dashboards. You stop having three separate rate-limit pools to remember. You have one OpenAI-compatible endpoint, one dashboard, one bill from us for the routing layer, and N provider relationships consolidated into a single per-request decision.

Why the smart-route layer matters

Token pricing across providers is now wildly heterogeneous. Llama 3.3 70B via Groq costs roughly $0.59/1M input tokens and $0.79/1M output tokens. The same model via Together is $0.88/1M for both input and output. Via DeepInfra it is $0.27/1M input and $0.40/1M output. Claude Sonnet via Anthropic is $3/$15 per 1M input/output tokens. GPT-5.5 via OpenAI is $1.25/$10 on the same basis.
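To see what that spread means per request, here is the arithmetic for the three Llama 3.3 70B prices quoted above. The lane labels and the 2,000-input / 500-output request size are made up for the example.

```python
# (input $ per 1M tokens, output $ per 1M tokens), copied from the figures above.
PRICES_PER_MILLION = {
    "groq/llama-3.3-70b": (0.59, 0.79),
    "together/llama-3.3-70b": (0.88, 0.88),
    "deepinfra/llama-3.3-70b": (0.27, 0.40),
}


def request_cost_usd(lane: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request on a lane, given per-million-token prices."""
    in_price, out_price = PRICES_PER_MILLION[lane]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical request: 2,000 input tokens, 500 output tokens.
for lane in PRICES_PER_MILLION:
    print(f"{lane}: ${request_cost_usd(lane, 2000, 500):.6f}")
```

For this request size the DeepInfra lane comes to about $0.00074 against about $0.0022 on Together, a difference of roughly 66 percent for the same model.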

If you statically pick one provider for everything, you are overpaying by 30 to 80 percent on most of your requests. If you route per-request, you can land on the cheapest provider that meets the latency and quality bar for that specific prompt. The smart-route layer is the thing that picks. It is a classifier plus a price-aware load balancer. It is the part of the business that is not commodity.
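The price-aware half of that can be sketched as "filter by the bar, then take the cheapest". This is an assumption-laden illustration: the prices are the ones quoted in this post, but the latency and quality numbers are invented placeholders (not measurements), and the classifier that would set the latency and quality bar for each prompt is out of scope, so the bar is passed in directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lane:
    name: str
    input_price: float     # $ per 1M input tokens
    output_price: float    # $ per 1M output tokens
    p50_latency_ms: float  # placeholder value, not a measurement
    quality: float         # placeholder score in [0, 1], not a benchmark


def cheapest_eligible(
    lanes: list[Lane],
    input_tokens: int,
    output_tokens: int,
    max_latency_ms: float,
    min_quality: float,
) -> Optional[Lane]:
    """Return the cheapest lane that meets the latency and quality bar, or None."""
    eligible = [
        lane for lane in lanes
        if lane.p50_latency_ms <= max_latency_ms and lane.quality >= min_quality
    ]
    if not eligible:
        return None
    # Per-million prices are compared directly; the common 1e6 divisor does not change the ranking.
    return min(
        eligible,
        key=lambda lane: input_tokens * lane.input_price + output_tokens * lane.output_price,
    )


LANES = [
    Lane("deepinfra/llama-3.3-70b", 0.27, 0.40, p50_latency_ms=900, quality=0.70),
    Lane("groq/llama-3.3-70b", 0.59, 0.79, p50_latency_ms=250, quality=0.70),
    Lane("anthropic/claude-sonnet", 3.00, 15.00, p50_latency_ms=1200, quality=0.90),
]

choice = cheapest_eligible(LANES, 2000, 500, max_latency_ms=500, min_quality=0.60)
print(choice.name if choice else "no lane meets the bar")
```

Tightening the latency bar moves the pick to a faster, pricier lane, and raising the quality bar moves it to a stronger model. That per-request movement is what a static provider choice cannot do.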

Who this is for

Who this is not for

If you are signing up for AI for the first time, do not start here. The mental model of "bring your own keys" assumes you already have keys to bring. If you are starting from zero, the Hobbyist tier at $9 (which includes bundled tokens, no BYO required) is the right entry point. Get comfortable with the OpenAI-compatible endpoint, decide whether smart-routing meets your bar, then add your own keys later if you grow into it.

If you only use one provider and you are happy with their rate-limit ceiling, BYO does not save you anything. The unlock is consolidation. If there is nothing to consolidate, you should not pay for the consolidation layer.

The honest pitch

We are not the cheapest gateway. OpenRouter at 5.5% markup will undercut us on hosted tokens. We are not the most flexible. AIMLAPI carries more model SKUs.

We are the gateway that makes BYO the headline product, runs on crypto without KYC, gives you EU or US routing as a per-request toggle, and ships an OpenAI-compatible endpoint that drops into any SDK you already wrote code against. Those four things together describe a customer that does not exist on the other gateways, and that customer is who this product is for.

Try BYO Keys on the Free Trial

200K tokens, no card, no KYC. Add your existing Anthropic or OpenAI key during setup if you want to test the routing layer specifically.

