Engineering · 8 min read

How to switch LLM providers without rewriting your app

OpenAI-compatible APIs are now the de facto standard for LLM integration. Here is what that phrase actually means, how a provider switch works in practice, where compatibility leaks, and how to architect your app so the switch is a config change rather than a rewrite.

llmdeal.me

What "OpenAI-compatible API" actually means

The phrase is shorthand for: this service speaks the same HTTP request and response format as OpenAI's Chat Completions endpoint. That endpoint — POST /v1/chat/completions — accepts a JSON body with a model field and a messages array, and returns a choices array with the model's reply.
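To make the shape concrete, here is a minimal sketch that builds a request body and reads a reply using only the standard library. The sample response is hand-written for illustration rather than captured from a real API call, and extract_reply is a hypothetical helper name.

```python
import json

# Minimal request body for POST /v1/chat/completions.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Hand-written sample of the response shape (not a real API capture).
sample_response = json.loads("""
{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hi there!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 8, "completion_tokens": 3, "total_tokens": 11}
}
""")


def extract_reply(response: dict) -> str:
    """Return the text of the first choice in a Chat Completions response."""
    return response["choices"][0]["message"]["content"]


print(json.dumps(request_body))
print(extract_reply(sample_response))
```

Any provider that accepts this body and returns this structure can sit behind the same parsing code.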

Because OpenAI published this shape early and attracted massive developer adoption, the rest of the industry converged on it. Anthropic, Google, Mistral, Cohere, and hundreds of inference providers now expose some form of it. Aggregator services like OpenRouter and llmdeal use it as their outward-facing interface. By 2026, over 80% of new AI API providers implement it, according to TokenMix's 2026 guide to OpenAI-compatible APIs. The OpenAI Chat Completions format has become the HTTP of LLM integration: not a specification anyone governs, just a shape everyone copied until it became the default.

The practical consequence: if you write your app against this shape rather than against OpenAI specifically, you can point it at any compatible provider without changing your logic. Only the base URL, API key, and model name change.

Why it matters: the lock-in trap

Vendor lock-in in LLM APIs is real and it has already bitten developers in at least three distinct ways.

Price hikes. OpenAI's GPT-4 launched in March 2023 at $30 per million input tokens and $60 per million output tokens. In mid-2024, GPT-4o launched at $5/$15, a 6× drop on input and a 4× drop on output. That sounds good, but the direction can reverse. Providers burning money hand over fist have started stabilising and raising prices as their investor patience runs out. If your unit economics are built around a specific provider at today's price and they raise it 40%, you need to be able to leave quickly.
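A rough sketch of what a 40% increase does to a monthly bill. The prices are the GPT-4o launch prices quoted above; the token volumes are hypothetical and chosen only for illustration.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars, with prices quoted per million tokens."""
    return (input_tokens / 1_000_000) * input_price \
        + (output_tokens / 1_000_000) * output_price


# Hypothetical monthly volume (assumption, not from any real workload).
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 50_000_000

baseline = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 5.0, 15.0)
after_hike = baseline * 1.40  # the same traffic after a 40% price increase

print(f"baseline:   ${baseline:,.2f}")
print(f"after +40%: ${after_hike:,.2f}")
print(f"extra:      ${after_hike - baseline:,.2f} per month")
```

Plugging in your own volumes shows how much a price change is worth relative to the cost of migrating.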

Model deprecations. OpenAI has run forced migrations on at least five model versions since 2023. gpt-3.5-turbo-0613 was sunset in June 2024, and gpt-4-32k followed in 2024. gpt-4.5-preview was deprecated in April 2025. GPT-4o itself was scheduled for removal in February 2026. Each time, developers who had hardcoded model names and implicit assumptions about behaviour had to scramble. If you can swap providers cheaply, a deprecation is one afternoon of work instead of a crisis.

Outages. Single-provider dependency means a provider incident is your incident. If your code can be pointed at a backup in seconds, an outage becomes a minor ops event.
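One way to sketch that failover, assuming each provider is wrapped in a plain callable. The function name complete_with_fallback and the stand-in providers are hypothetical; real callables would wrap client.chat.completions.create for a primary and a backup client.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, reply)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in production, catch the SDK's specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in callables used only to demonstrate the control flow.
def primary(prompt: str) -> str:
    raise ConnectionError("simulated outage")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


print(complete_with_fallback("Hello", [("primary", primary), ("backup", backup)]))
```

The returned provider name is worth keeping, since it tells you afterwards which requests were served by the backup.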

None of this is specific to OpenAI — it applies to every provider. The mitigation is the same regardless of where you start: write your integration so the provider is a config value, not a structural dependency.

How a switch actually works in practice

The mechanics are simple. The OpenAI Python SDK and JavaScript SDK both accept a base URL parameter (base_url in Python, baseURL in JavaScript), and both fall back to the OPENAI_BASE_URL environment variable when it is not passed. Change that value and the same SDK speaks to a different provider.

Python — before (OpenAI directly)

# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-..."      # OpenAI key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Python — after (any OpenAI-compatible provider)

# Same package, nothing else changes
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmdeal.me/v1",   # ← only this line
    api_key="your-llmdeal-key"             # ← and this
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",            # model name changes too
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

JavaScript — same pattern

// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: process.env.LLM_BASE_URL,   // env var — no code change on switch
  apiKey:  process.env.LLM_API_KEY,
});

const response = await client.chat.completions.create({
  model:    process.env.LLM_MODEL,      // model also from env
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(response.choices[0].message.content);

The JavaScript example above shows the fully portable pattern: all three provider-specific values (baseURL, apiKey, model) come from environment variables. Your application code is unchanged — switching providers is a deploy-time config change, not a code change.

Architecting for provider portability

Beyond the SDK call itself, a few structural choices determine whether swapping providers stays cheap or becomes expensive:

Isolate your LLM client in one place. If you initialise OpenAI() in twenty files, you have twenty places to update. Create one module — call it llm.js or llm_client.py — that instantiates the client from env vars and exports a single instance. Everything else imports from there.
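A minimal sketch of such a module in Python. The environment variable names are borrowed from the JavaScript example; LLMSettings, load_settings, and get_client are hypothetical names, not part of any SDK.

```python
# llm_client.py
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    api_key: str


def load_settings(env=os.environ) -> LLMSettings:
    """Read provider settings from the environment; fail loudly if one is missing."""
    try:
        return LLMSettings(base_url=env["LLM_BASE_URL"], api_key=env["LLM_API_KEY"])
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable: {exc.args[0]}") from exc


@lru_cache(maxsize=1)
def get_client():
    """Build the shared client once; every other module calls this function."""
    from openai import OpenAI  # imported lazily so settings code needs no SDK
    settings = load_settings()
    return OpenAI(base_url=settings.base_url, api_key=settings.api_key)
```

Other modules would then call get_client() instead of constructing OpenAI() themselves, so a provider switch touches only the environment.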

Never hardcode model names in business logic. The model string is provider-specific. gpt-4o is meaningless to Anthropic; claude-sonnet-4-5 is meaningless to OpenAI. Keep a config object that maps logical roles (FAST_MODEL, QUALITY_MODEL) to concrete model strings, and populate those from env vars.
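The role mapping might look like the sketch below. FAST_MODEL and QUALITY_MODEL come from the text above; resolve_model and the example values are illustrative assumptions.

```python
import os

# Logical roles used by business logic, mapped to the env var that names the model.
ROLE_ENV_VARS = {
    "fast": "FAST_MODEL",
    "quality": "QUALITY_MODEL",
}


def resolve_model(role: str, env=os.environ) -> str:
    """Map a logical role to the model string configured for this deployment."""
    if role not in ROLE_ENV_VARS:
        raise ValueError(f"unknown model role: {role!r}")
    var = ROLE_ENV_VARS[role]
    model = env.get(var)
    if not model:
        raise RuntimeError(f"{var} is not set")
    return model


# Example configuration (values are illustrative only).
example_env = {"FAST_MODEL": "gpt-4o", "QUALITY_MODEL": "claude-sonnet-4-5"}
print(resolve_model("quality", example_env))
```

Business logic asks for "fast" or "quality" and never sees a provider-specific string.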

Avoid provider-specific features in the critical path. OpenAI has Assistants threads, file search, and vision parameters that don't exist elsewhere. Features like structured JSON output using response_format: { type: "json_object" } exist in most providers but not all. If you use these, wrap them in a thin adapter layer that can be swapped out; don't scatter provider-specific logic through your prompt builders.
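As one possible shape for that adapter, the sketch below keeps JSON mode behind two small functions. The supports_json_mode flag is an assumption: something you would set per provider in config after testing, not a value any SDK reports.

```python
import json


def build_json_request(model: str, messages: list[dict],
                       supports_json_mode: bool) -> dict:
    """Build create() kwargs, using response_format only where it is supported."""
    if supports_json_mode:
        return {
            "model": model,
            "messages": list(messages),
            "response_format": {"type": "json_object"},
        }
    # Fallback: ask for JSON in the prompt instead of via a provider feature.
    instruction = {"role": "system",
                   "content": "Reply with a single valid JSON object only."}
    return {"model": model, "messages": [instruction, *messages]}


def parse_json_reply(content: str) -> dict:
    """Validate the reply ourselves, since a provider may ignore response_format."""
    parsed = json.loads(content)  # raises ValueError on invalid JSON
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed
```

The kwargs can be passed as client.chat.completions.create(**kwargs). Prompt builders stay unaware of which branch was taken.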

Log model and provider with every request. When you switch providers, you need to be able to tell from your logs whether behaviour differences are caused by the model, the provider, or your prompts. Log the full model identifier and the base URL per request. It costs almost nothing and saves hours of debugging.
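A small wrapper is usually enough for this. The sketch below assumes the call argument is something like client.chat.completions.create; logged_completion and the log format are hypothetical choices.

```python
import logging
import time

logger = logging.getLogger("llm")


def logged_completion(call, *, base_url: str, model: str, messages: list[dict]):
    """Invoke call(model=..., messages=...) and log provider, model, and latency."""
    started = time.perf_counter()
    try:
        return call(model=model, messages=messages)
    finally:
        # Runs on success and on failure, so failed requests are logged too.
        elapsed_ms = (time.perf_counter() - started) * 1000
        logger.info("llm_request base_url=%s model=%s latency_ms=%.0f",
                    base_url, model, elapsed_ms)
```

Because the log line is written in the finally block, it also records requests that raise, which is often when you most need to know which provider was involved.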

Where compatibility leaks — honest caveats

The "just change the URL" story is true enough for basic completions. It breaks down at the edges. Here is where to watch out:

Feature | Status across providers
Basic completions | Universally consistent; the core shape works everywhere
Streaming | Works on all major providers; minor delta formatting differences exist. Confirmed issues: Ollama with Gemma 4 drops tool_call chunks in streaming mode
Tool / function calling | Supported on most capable providers, but parser behaviour varies in streaming. Some providers only partially implement the tool schema; test before relying on it
JSON mode (response_format) | Common but not universal; some providers silently ignore it, others throw
Model names | Always provider-specific; there is no cross-provider model alias standard. You must remap model names when switching
Token counting | Reported in the same field (usage.prompt_tokens etc.) but tokenisers differ; the same prompt costs different tokens on different models
System prompt handling | Works universally; instruction-following strength varies significantly by model
Vision / multimodal | Uses the same content-parts format in theory; many providers don't support it at all
Provider-specific APIs | OpenAI Assistants, Anthropic Files API, etc. are not portable at all. Avoid in the critical path if portability matters

Streaming tool-call behaviour sourced from open GitHub issues and provider documentation, accessed 2026-05-17 — see references.

The practical advice: treat basic completions + streaming as reliably portable. Treat tool calling as mostly portable — test it. Treat everything else as provider-specific until you've confirmed otherwise on your specific stack.
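One way to test streamed tool calling is to reassemble the fragments yourself and check that the arguments parse. The sketch below works on plain dicts in the shape the OpenAI streaming format uses for delta.tool_calls; the sample deltas are hand-written, and merge_tool_call_deltas is a hypothetical helper.

```python
import json


def merge_tool_call_deltas(deltas: list[dict]) -> list[dict]:
    """Reassemble complete tool calls from streamed delta fragments, keyed by index."""
    calls: dict[int, dict] = {}
    for delta in deltas:
        for fragment in delta.get("tool_calls") or []:
            call = calls.setdefault(
                fragment["index"], {"id": None, "name": "", "arguments": ""}
            )
            if fragment.get("id"):
                call["id"] = fragment["id"]
            function = fragment.get("function") or {}
            call["name"] += function.get("name") or ""
            call["arguments"] += function.get("arguments") or ""
    merged = [calls[i] for i in sorted(calls)]
    for call in merged:
        # Raises ValueError if chunks were dropped or the arguments were truncated.
        json.loads(call["arguments"])
    return merged


# Hand-written deltas for illustration (not captured from a real stream).
sample = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "{\"city\": "}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"Oslo\"}"}}]},
    {"content": None},
]
print(merge_tool_call_deltas(sample))
```

Running a check like this against each candidate provider is a cheap way to catch dropped tool_call chunks before they reach production.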

A practical migration checklist

If you are doing this for real, here is the sequence that tends to work:

  1. Centralise your LLM client initialisation into a single module
  2. Move base_url, api_key, and model name strings into environment variables
  3. Run your existing test suite against the new provider — note any failures
  4. Check tool calling and JSON mode explicitly if you use them
  5. Map your logical model roles to the new provider's model names in a config file
  6. Shadow-traffic a small percentage of real requests to the new provider, compare outputs
  7. Monitor token usage for the first week — tokenisers differ and your cost estimates may need revising
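Steps 6 and 7 can be sketched as follows, assuming each request carries a stable identifier. in_shadow_sample and prompt_token_ratio are hypothetical helpers; the usage dicts mirror the usage.prompt_tokens field mentioned earlier.

```python
import hashlib


def in_shadow_sample(request_id: str, percent: float) -> bool:
    """Deterministically select roughly `percent`% of requests by hashing their id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def prompt_token_ratio(primary_usage: dict, shadow_usage: dict) -> float:
    """Ratio of shadow to primary prompt tokens for the same request."""
    return shadow_usage["prompt_tokens"] / primary_usage["prompt_tokens"]


if in_shadow_sample("req-123", percent=5):
    print("also send this request to the new provider and compare outputs")
print(prompt_token_ratio({"prompt_tokens": 100}, {"prompt_tokens": 120}))
```

Hashing the request id rather than rolling a random number means a retried request lands in the same bucket, which keeps comparisons consistent.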

The one thing to get right

The real protection against any provider's price hikes, deprecations, or outages is not finding the perfect provider up front. It is making the switch cheap enough that it never matters which one you're on.

An app wired to one provider by base URL and API key, and nothing else, can be redirected in under an hour. An app that has embedded provider-specific abstractions, fine-tuning pipelines, and monitoring tied to a provider's dashboard faces a migration measured in engineering months. The difference is almost entirely architectural, and most of the decisions behind it can be made at the start for near-zero cost.

llmdeal is an OpenAI-compatible gateway — you point the OpenAI SDK at our endpoint and the switch is exactly the two lines shown above. But more importantly: because it is OpenAI-compatible, leaving is just as easy. No-lock-in is a property of the interface, not a promise from us. Build for that.


References

  1. OpenAI — Chat Completions API reference — developers.openai.com — accessed 2026-05-17
  2. OpenAI — Model deprecations list, including gpt-3.5-turbo-0613 (sunset June 2024), gpt-4-32k (sunset 2024), gpt-4.5-preview (deprecated April 2025), gpt-4o (scheduled February 2026) — developers.openai.com/api/docs/deprecations — accessed 2026-05-17
  3. TokenMix — "OpenAI-Compatible API Guide 2026" — notes 80%+ of new AI API providers implement OpenAI SDK compatibility — tokenmix.ai — accessed 2026-05-17
  4. Portkey — OpenAI Model Deprecation Guide — historical pricing and migration timeline — portkey.ai — accessed 2026-05-17
  5. Remio — "OpenAI Retiring GPT-4o, GPT-4.1, and o4-mini: The 2026 Transition Guide" — GPT-4o removal date February 2026 — remio.ai — accessed 2026-05-17
  6. GitHub issue — Gemma 4 tool calling fails via Ollama OpenAI-compatible API in streaming mode — github.com/anomalyco/opencode/issues/20995 — accessed 2026-05-17
  7. The Register — "Locked, stocked, and losing budget: AI vendor lock-in bites" — structural lock-in beyond API calls — theregister.com — accessed 2026-05-17
  8. OpenAI — GPT-4 API general availability and legacy Completions deprecation announcement — openai.com — accessed 2026-05-17
  9. Medium / Percolation Labs — "Comparing the streaming response structure for different LLM APIs" — medium.com/percolation-labs — accessed 2026-05-17

Technical claims verified against provider documentation and open-source issue trackers as of 2026-05-17. Provider compatibility behaviour evolves; test against your specific stack before relying on any particular feature across providers. Article published 2026-05-17.