OpenAI-compatible APIs are now the de facto standard for LLM integration. Here is what that phrase actually means, how a provider switch works in practice, where compatibility leaks, and how to architect your app so the switch is always one line of code.
· llmdeal.me
The phrase is shorthand for: this service speaks the same HTTP request and response format as OpenAI's Chat Completions endpoint. That endpoint — POST /v1/chat/completions — accepts a JSON body with a model field and a messages array, and returns a choices array with the model's reply.
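As a rough illustration of that shape, the sketch below builds a minimal request body and pulls the reply out of a response. The helper names are hypothetical, and the sample response is hand-written for illustration rather than captured from a real provider; real responses carry more fields.

```python
import json


def build_chat_request(model: str, user_text: str) -> dict:
    """Build the minimal JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }


def extract_reply(response_body: dict) -> str:
    """Return the text of the first entry in the choices array."""
    return response_body["choices"][0]["message"]["content"]


# Hand-written sample response (illustrative only).
sample_response = json.loads("""
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hi there"},
      "finish_reason": "stop"
    }
  ]
}
""")

request_body = build_chat_request("gpt-4o", "Hello")
print(json.dumps(request_body))
print(extract_reply(sample_response))
```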
Because OpenAI published this shape early and attracted massive developer adoption, the rest of the industry converged on it. Anthropic, Google, Mistral, Cohere, and hundreds of inference providers now expose some form of it. Aggregator services like OpenRouter and llmdeal use it as their outward-facing interface. By 2026, over 80% of new AI API providers implement it, according to API aggregator data. The OpenAI Chat Completions format has become the HTTP of LLM integration — not a specification anyone governs, just a shape everyone copied until it became the default.
The practical consequence: if you write your app against this shape rather than against OpenAI specifically, you can point it at any compatible provider without changing your logic — only your base URL and API key change.
Vendor lock-in in LLM APIs is real and it has already bitten developers in at least three distinct ways.
Price hikes. OpenAI's GPT-4 launched in March 2023 at $30 per million input tokens and $60 per million output tokens. In May 2024, GPT-4o launched at $5/$15, a 6× drop on input and a 4× drop on output. That sounds good, but the direction can reverse. Providers burning money hand over fist have started stabilising and raising prices as investor patience runs out. If your unit economics are built around a specific provider at today's price and they raise it 40%, you need to be able to leave quickly.
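To put a number on that exposure, here is a small worked calculation. The workload (200M input and 50M output tokens per month) is a made-up assumption; the $5/$15 prices and the 40% increase are the figures quoted above.

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD, given token counts and per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


def apply_increase(cost: float, percent: int) -> float:
    """Return the cost after a price increase of the given percentage."""
    return cost * (100 + percent) / 100


# Hypothetical workload: 200M input and 50M output tokens per month.
base = monthly_cost_usd(200_000_000, 50_000_000, 5.0, 15.0)
raised = apply_increase(base, 40)
print(f"base: ${base:,.2f}, after a 40% increase: ${raised:,.2f}")
```

Under these assumptions the bill goes from $1,750 to $2,450 a month with no change in usage.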
Model deprecations. OpenAI has run forced migrations on at least five model versions since 2023. gpt-3.5-turbo-0613 sunset June 2024. gpt-4-32k sunset in 2024. gpt-4.5-preview deprecated April 2025. GPT-4o itself is scheduled for removal in February 2026. Each time, developers who had hardcoded model names and implicit assumptions about behaviour had to scramble. If you can swap providers cheaply, a deprecation is one afternoon of work instead of a crisis.
Outages. Single-provider dependency means a provider incident is your incident. If your code can be pointed at a backup in seconds, an outage becomes a minor ops event.
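One way the "backup in seconds" idea could look in code is an ordered fallback. This is a minimal sketch, not a production failover strategy: the Provider type, the complete_with_fallback helper, and the example URLs are all hypothetical, and the send callable stands in for whatever actually performs the request (for instance a function wrapping the SDK call).

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Provider:
    name: str
    base_url: str
    model: str


def complete_with_fallback(
    providers: Sequence[Provider],
    send: Callable[[Provider, str], str],
    prompt: str,
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, reply) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider.name, send(provider, prompt)
        except Exception as exc:  # real code would catch only connection/timeout errors
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def fake_send(provider: Provider, prompt: str) -> str:
    """Stand-in for a real request; simulates an outage on the primary."""
    if provider.name == "primary":
        raise ConnectionError("simulated outage")
    return f"reply from {provider.model}"


providers = [
    Provider("primary", "https://primary.example/v1", "model-a"),
    Provider("backup", "https://backup.example/v1", "model-b"),
]
result = complete_with_fallback(providers, fake_send, "Hello")
print(result)
```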
None of this is specific to OpenAI — it applies to every provider. The mitigation is the same regardless of where you start: write your integration so the provider is a config value, not a structural dependency.
The mechanics are simple. The OpenAI SDKs accept a base URL when you construct the client: base_url in Python, baseURL in JavaScript. Both also read the OPENAI_BASE_URL environment variable if you don't pass one. Change that value and the same SDK speaks to a different provider.
Python — before (OpenAI directly)
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # OpenAI key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
Python — after (any OpenAI-compatible provider)
# Same package; only the client config and the model name change
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmdeal.me/v1",  # ← only this line
    api_key="your-llmdeal-key",            # ← and this
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",  # model name changes too
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
JavaScript — same pattern
// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: process.env.LLM_BASE_URL, // env var; no code change on switch
  apiKey: process.env.LLM_API_KEY,
});

const response = await client.chat.completions.create({
  model: process.env.LLM_MODEL, // model also from env
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(response.choices[0].message.content);
The JavaScript example above shows the fully portable pattern: all three provider-specific values (baseURL, apiKey, model) come from environment variables. Your application code is unchanged — switching providers is a deploy-time config change, not a code change.
Beyond the SDK call itself, a few structural choices determine whether swapping providers stays cheap or becomes expensive:
Isolate your LLM client in one place. If you initialise OpenAI() in twenty files, you have twenty places to update. Create one module — call it llm.js or llm_client.py — that instantiates the client from env vars and exports a single instance. Everything else imports from there.
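A minimal sketch of what such an llm_client.py might contain, assuming the same LLM_BASE_URL, LLM_API_KEY, and LLM_MODEL variables as the JavaScript example. The LLMSettings, load_settings, and get_client names are hypothetical.

```python
# llm_client.py: the one place that knows which provider is in use.
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    api_key: str
    model: str


def load_settings(env=os.environ) -> LLMSettings:
    """Read provider settings from the environment; fail loudly if any is missing."""
    required = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return LLMSettings(env["LLM_BASE_URL"], env["LLM_API_KEY"], env["LLM_MODEL"])


@lru_cache(maxsize=1)
def get_client():
    """Create the shared SDK client once; other modules call this instead of OpenAI()."""
    # Imported lazily so the settings logic can be tested without the SDK installed.
    from openai import OpenAI

    settings = load_settings()
    return OpenAI(base_url=settings.base_url, api_key=settings.api_key)
```

Everything else in the codebase would then call get_client() and load_settings().model rather than constructing its own client.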
Never hardcode model names in business logic. The model string is provider-specific. gpt-4o is meaningless to Anthropic; claude-sonnet-4-5 is meaningless to OpenAI. Keep a config object that maps logical roles (FAST_MODEL, QUALITY_MODEL) to concrete model strings, and populate those from env vars.
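The role mapping could be as small as the sketch below. FAST_MODEL and QUALITY_MODEL are the role names from the paragraph above; the resolve_model helper and the lowercase role keys are assumptions made for illustration.

```python
import os

# Logical roles used by business logic; concrete model names come from the environment.
ROLE_ENV_VARS = {
    "fast": "FAST_MODEL",
    "quality": "QUALITY_MODEL",
}


def resolve_model(role: str, env=os.environ) -> str:
    """Map a logical role to the model string configured for this deployment."""
    try:
        var = ROLE_ENV_VARS[role]
    except KeyError:
        raise ValueError(f"unknown model role: {role!r}") from None
    model = env.get(var)
    if not model:
        raise RuntimeError(f"{var} is not set")
    return model


# Example with an explicit mapping instead of the real environment.
example_env = {"FAST_MODEL": "gpt-4o", "QUALITY_MODEL": "claude-sonnet-4-5"}
print(resolve_model("quality", example_env))
```

Business logic asks for resolve_model("fast") or resolve_model("quality") and never sees a provider-specific string.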
Avoid provider-specific features in the critical path. OpenAI's Assistants API (threads, file search) and some of its vision parameters have no equivalent elsewhere. Structured JSON output via response_format: { type: "json_object" } exists on most providers but not all. If you use these, wrap them in a thin adapter layer that can be swapped out; don't scatter provider-specific logic through your prompt builders.
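For JSON output, a thin adapter might look something like this. It is a sketch under assumptions: the supports_json_mode flag is something you would set per provider after testing, both function names are hypothetical, and the prompt-based fallback is one possible strategy rather than a recommendation from any provider.

```python
import json


def build_json_request(model: str, messages: list, supports_json_mode: bool) -> dict:
    """Build request kwargs, adding response_format only where it is known to work."""
    kwargs = {"model": model, "messages": list(messages)}
    if supports_json_mode:
        kwargs["response_format"] = {"type": "json_object"}
    else:
        # Fallback: ask for JSON in the prompt instead of relying on the parameter.
        kwargs["messages"].insert(0, {
            "role": "system",
            "content": "Respond with a single valid JSON object and nothing else.",
        })
    return kwargs


def parse_json_reply(text: str) -> dict:
    """Validate the reply ourselves, since the parameter may have been ignored."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

Prompt builders call build_json_request and parse_json_reply and never mention response_format directly, so a provider that lacks the feature only changes one flag.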
Log model and provider with every request. When you switch providers, you need to be able to tell from your logs whether behaviour differences are caused by the model, the provider, or your prompts. Log the full model identifier and the base URL per request. It costs almost nothing and saves hours of debugging.
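A per-request log line can be added with a small wrapper like the one below. The logged_call helper and the log field names are hypothetical; send is assumed to be any callable that accepts model and messages keyword arguments, such as client.chat.completions.create.

```python
import logging
import time

logger = logging.getLogger("llm")


def logged_call(send, *, base_url: str, model: str, messages: list):
    """Call send(model=..., messages=...) and log provider, model, status, and latency."""
    started = time.monotonic()
    status = "error"
    try:
        result = send(model=model, messages=messages)
        status = "ok"
        return result
    finally:
        # Runs on success and on failure, so failed requests are logged too.
        elapsed_ms = (time.monotonic() - started) * 1000
        logger.info(
            "llm_request base_url=%s model=%s status=%s latency_ms=%.0f",
            base_url, model, status, elapsed_ms,
        )
```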
The "just change the URL" story is true enough for basic completions. It breaks down at the edges. Here is where to watch out:
| Feature | Status across providers |
|---|---|
| Basic completions | Universally consistent — the core shape works everywhere |
| Streaming | Works on all major providers; minor delta formatting differences exist. Confirmed issues: Ollama with Gemma 4 drops tool_call chunks in streaming mode |
| Tool / function calling | Supported on most capable providers, but parser behaviour varies in streaming. Some providers only partially implement the tool schema; test before relying on it |
| JSON mode (response_format) | Common but not universal; some providers silently ignore it, others throw |
| Model names | Always provider-specific — there is no cross-provider model alias standard. You must remap model names when switching |
| Token counting | Reported in the same fields (usage.prompt_tokens etc.), but tokenisers differ, so the same prompt produces different token counts on different models |
| System prompt handling | Works universally; instruction-following strength varies significantly by model |
| Vision / multimodal | Uses the same content-parts format in theory; many providers don't support it at all |
| Provider-specific APIs | OpenAI Assistants, Anthropic Files API, etc. — not portable at all. Avoid in the critical path if portability matters |
Streaming tool-call behaviour sourced from open GitHub issues and provider documentation, accessed 2026-05-17 — see references.
The practical advice: treat basic completions + streaming as reliably portable. Treat tool calling as mostly portable — test it. Treat everything else as provider-specific until you've confirmed otherwise on your specific stack.
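One way to test streaming tool calls is to reassemble the streamed deltas yourself and compare the result across providers. The sketch below works on plain dicts shaped like OpenAI's streaming chunks; the sample chunks are hand-written for illustration, the accumulate_tool_calls name is hypothetical, and the default of 0 for a missing index is an assumption about how a less strict provider might behave.

```python
import json


def accumulate_tool_calls(chunks):
    """Merge streamed tool-call deltas into complete calls, ordered by index."""
    calls = {}
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        for part in delta.get("tool_calls") or []:
            index = part.get("index", 0)  # assumption: treat a missing index as 0
            call = calls.setdefault(index, {"id": None, "name": "", "arguments": ""})
            if part.get("id"):
                call["id"] = part["id"]
            fn = part.get("function") or {}
            if fn.get("name"):
                call["name"] = fn["name"]  # the name normally arrives whole in one delta
            if fn.get("arguments"):
                call["arguments"] += fn["arguments"]  # arguments arrive as string fragments
    return [calls[i] for i in sorted(calls)]


# Hand-written sample stream (illustrative only).
sample_chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city": '}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Oslo"}'}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]

calls = accumulate_tool_calls(sample_chunks)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
```

If the reassembled arguments fail to parse as JSON on one provider but not another, that is a sign chunks are being dropped or formatted differently.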
If you are doing this for real, here is the sequence that tends to work:
Move base_url, api_key, and model name strings into environment variables.

The real protection against any provider's price hikes, deprecations, or outages is not finding the perfect provider up front. It is making the switch cheap enough that it never matters which one you're on.
An app wired to one provider by base URL and API key, and nothing else, can be redirected in under an hour. An app that has embedded provider-specific abstractions, fine-tuning pipelines, and monitoring tied to a provider's dashboard faces a migration measured in engineering months. The difference is almost entirely architectural, and most of the decisions behind it can be made at the start for near-zero cost.
llmdeal is an OpenAI-compatible gateway — you point the OpenAI SDK at our endpoint and the switch is exactly the two lines shown above. But more importantly: because it is OpenAI-compatible, leaving is just as easy. No-lock-in is a property of the interface, not a promise from us. Build for that.
Technical claims verified against provider documentation and open-source issue trackers as of 2026-05-17. Provider compatibility behaviour evolves; test against your specific stack before relying on any particular feature across providers. Article published 2026-05-17.