OpenAI compatibility

RunInfra’s HTTP API follows OpenAI’s request and response shapes for the endpoints it supports. If your code is already written against the OpenAI SDK, you change the base URL and API key, then pass the model ID your RunInfra deployment serves.

Use the generated snippet in Settings > API Keys or the Deploy tab for the exact production base URL. New snippets default to https://api.runinfra.ai/v1. If your workspace has an API-domain alias, keep the generated value.

Two-line migration

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runinfra.ai/v1",   # <- change
    api_key="YOUR_RUNINFRA_API_KEY",           # <- change
)

response = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Hello"}],
)

Node

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.runinfra.ai/v1",   // <- change
  apiKey: "YOUR_RUNINFRA_API_KEY",          // <- change
});

const response = await client.chat.completions.create({
  model: "your-model-id",
  messages: [{ role: "user", content: "Hello" }],
});

curl

curl https://api.runinfra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_RUNINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"your-model-id","messages":[{"role":"user","content":"Hello"}]}'

Streaming, tools, structured outputs, and async OpenAI clients use the same request and response shapes where the selected deployment supports those features. Retry behavior still belongs to your client configuration. Do not blindly retry charge-bearing requests once a partial stream or binary response may have reached your app.
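One way to apply the retry caution to streaming is to retry only while nothing has been handed to your application. The following is a minimal sketch, not RunInfra's or the OpenAI SDK's behavior: `open_stream` is a hypothetical zero-argument callable that issues a fresh request and returns an iterator of chunks, and `max_attempts` is an illustrative knob.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def stream_with_guard(
    open_stream: Callable[[], Iterable[T]],
    max_attempts: int = 3,
) -> Iterator[T]:
    """Yield chunks from a stream, retrying only before the first chunk.

    open_stream is a hypothetical factory that starts a new request each
    time it is called. Once any chunk has been yielded, a failure is
    re-raised instead of retried, because the caller may already have
    consumed partial output from a charge-bearing request.
    """
    attempt = 0
    while True:
        attempt += 1
        delivered = False
        try:
            for chunk in open_stream():
                delivered = True
                yield chunk
            return
        except Exception:
            # Retry only if nothing reached the caller and attempts remain.
            # A production client would also sleep with backoff here.
            if delivered or attempt >= max_attempts:
                raise
```

With the OpenAI Python client, `open_stream` could be a lambda that calls `client.chat.completions.create(..., stream=True)`. If you use a guard like this, consider disabling the SDK's own automatic retries so the two layers do not multiply.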

Endpoints supported today

OpenAI endpoint	Supported	Notes
`POST /v1/chat/completions`	Yes	Streaming, tools, `response_format`
`POST /v1/responses`	Yes	Chat-completions compatibility adapter for compatible LLM and vision-language deployments
`POST /v1/embeddings`	Yes	Single string or array input
`POST /v1/images/generations`	Yes	Image generation deployments return OpenAI-shaped image data
`POST /v1/audio/transcriptions`	Yes	Multipart file upload, Whisper-class models
`POST /v1/audio/speech`	Yes	Text-to-speech, XTTS / Bark / Qwen3-TTS
`GET /v1/models`	Yes	Lists verified deployed models in your workspace

Anything not in this table is not currently supported. That includes /v1/completions (legacy, non-chat), /v1/files, /v1/assistants, /v1/threads, /v1/batches.

What to change in your code

Three lines at most:

Change the base URL

Replace https://api.openai.com/v1 with the RunInfra base URL shown in your dashboard snippet, usually https://api.runinfra.ai/v1.

Change the API key

Use your RunInfra API key from Settings > API Keys instead of an OpenAI sk-... key.

Change the model id

OpenAI model names (e.g. gpt-4o) won’t resolve on RunInfra. Pass the model id your RunInfra pipeline serves. GET /v1/models returns the list.
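To catch a leftover OpenAI model name before the first request fails, you can check the configured ID against the `GET /v1/models` result. This sketch assumes the response uses OpenAI's list shape (`{"object": "list", "data": [{"id": ...}]}`), which the compatibility claim implies but this page does not spell out. The helper names and sample IDs are illustrative.

```python
import json
from typing import List


def deployed_model_ids(models_response: dict) -> List[str]:
    """Extract model ids from an OpenAI-shaped model list (assumed shape)."""
    return [entry["id"] for entry in models_response.get("data", [])]


def require_model(model_id: str, models_response: dict) -> str:
    """Return model_id if it is listed, otherwise raise a descriptive error."""
    ids = deployed_model_ids(models_response)
    if model_id not in ids:
        raise ValueError(f"{model_id!r} is not deployed; available: {ids}")
    return model_id


# Illustrative payload; in practice this is the parsed body of GET /v1/models.
sample = json.loads(
    """
    {
      "object": "list",
      "data": [
        {"id": "your-model-id", "object": "model"},
        {"id": "your-embedding-model-id", "object": "model"}
      ]
    }
    """
)

model = require_model("your-model-id", sample)
```

With the OpenAI Python client, `client.models.list()` issues the same request.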

Response parity

For the supported endpoints, RunInfra returns the same JSON shapes OpenAI does:

Chat completions return id, object, created, model, choices[], usage.
Responses return OpenAI-shaped response events and non-streaming JSON for the supported adapter fields.
Streaming deltas match OpenAI’s SSE format, terminated by data: [DONE].
Image generation returns OpenAI-shaped image data for verified image deployments.
Tool calls return tool_calls on the assistant message with function.name / function.arguments.
Structured output accepts response_format: { type: "json_object" } and response_format: { type: "json_schema", json_schema: {...} }.
Usage metadata follows the endpoint shape: token usage for LLMs, embedding counts for vector calls, and modality-native units for image and audio routes.
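If you consume the stream without an SDK, the SSE format described above can be handled with a few lines. This is a minimal sketch that assumes a single choice per chunk and only text deltas. The function name and sample lines are illustrative, and tool-call deltas are not handled.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator described in the list above
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Hand-written sample lines in OpenAI's chunk shape.
sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]

print(collect_stream_text(sample_lines))  # prints "Hello"
```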

Verified libraries and clients

These clients are verified with RunInfra’s supported OpenAI-shaped endpoints:

OpenAI SDKs

The official openai packages for Python and Node, exactly as OpenAI publishes them.

RunInfra SDK

Native RunInfra helpers for scoped keys, pipeline IDs, typed errors, request IDs, audio, images, and webhook verification.

LangChain

ChatOpenAI(openai_api_base=..., openai_api_key=...)

LlamaIndex

OpenAI(api_base=..., api_key=...)

Vercel AI SDK

createOpenAICompatible({ name, baseURL, apiKey })

Instructor

Use it through the OpenAI Python client for supported structured-output chat flows.

curl / fetch

Plain HTTP works too. No SDK required.
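As an illustration of the no-SDK path, the curl call from the migration section can be built with Python's standard library alone. This is a sketch rather than an official client. `build_chat_request` is a hypothetical helper name, and the key and model ID are the same placeholders used earlier.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) a POST request to /chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    "https://api.runinfra.ai/v1",
    "YOUR_RUNINFRA_API_KEY",
    "your-model-id",
    [{"role": "user", "content": "Hello"}],
)

# Sending it is a network call, so it is left commented out here:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```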

Known differences

Only the endpoints in the table above are available. Features that depend on /v1/assistants, /v1/threads, or /v1/batches will not work until those surfaces ship.

OpenAI model names (gpt-4o, gpt-4.1-mini) don’t alias to RunInfra models. Pass the model id your deployment serves.
/v1/responses is a chat-completions compatibility adapter. It does not implement state, include, reasoning, hosted tools, conversation-item, or background-job semantics.
Model-specific sampling parameters (logit_bias, logprobs, seed determinism) depend on the serving backend behind your deployment.

Unsupported parameters

The following OpenAI parameters are ignored, rejected, or handled differently when passed to RunInfra:

Parameter	Behavior	Why
`service_tier`	Ignored	RunInfra has its own deployment-mode model (Flex / Active); no service tier concept
`store`	Ignored	RunInfra does not persist completions for replay; use the request id header to correlate logs
`metadata`	Echoed back in the response unchanged	Surfaced in audit logs (Enterprise); has no semantic effect
`parallel_tool_calls=false`	Honored by vLLM and SGLang backends; otherwise depends on the serving backend	Some backends always issue tool calls in parallel
`prediction` (speculative decoding hint)	Ignored	RunInfra picks the draft model itself; see Speculation
`audio` (output format on chat)	Rejected	Use `/v1/audio/speech` for TTS instead
`web_search`, `file_search`, `computer_use` tools	Rejected	These are OpenAI-hosted tools, not function calls; bring your own implementation
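If you share request-building code between OpenAI and RunInfra, a client-side pre-flight check can make the table above explicit instead of relying on silent drops. The sketch below is an assumption-laden illustration. The function and constant names are hypothetical, the tool type strings are taken verbatim from the table, and `metadata` and `parallel_tool_calls` pass through untouched.

```python
from typing import Any, Dict

# Parameters the table lists as ignored: drop them before sending.
IGNORED = {"service_tier", "store", "prediction"}
# Parameters the table lists as rejected on chat: fail early and locally.
REJECTED = {"audio"}
# Hosted tool types the table lists as rejected.
HOSTED_TOOL_TYPES = {"web_search", "file_search", "computer_use"}


def prepare_chat_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params that is safe to send to a RunInfra chat route."""
    for name in sorted(REJECTED):
        if name in params:
            raise ValueError(f"parameter '{name}' is rejected on chat requests")
    for tool in params.get("tools", []):
        if tool.get("type") in HOSTED_TOOL_TYPES:
            raise ValueError(f"hosted tool '{tool['type']}' is not available")
    return {key: value for key, value in params.items() if key not in IGNORED}


cleaned = prepare_chat_params(
    {
        "model": "your-model-id",
        "messages": [{"role": "user", "content": "Hello"}],
        "store": True,
        "service_tier": "auto",
        "metadata": {"trace": "abc"},
    }
)
```

Function tools (`{"type": "function", ...}`) are unaffected by the hosted-tool check.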

Error code mapping

RunInfra returns OpenAI-shaped error envelopes ({ error: { message, type, code } }) and uses HTTP status codes consistently:

HTTP	OpenAI `error.type`	When it fires
400	`invalid_request_error`	Schema mismatch, missing field, unsupported parameter
401	`authentication_error`	Bad or missing API key
403	`permission_error`	Key lacks access to the requested pipeline
404	`not_found_error`	Model id does not exist or is not deployed
422	`invalid_request_error`	Input too long, malformed JSON schema, bad image bytes, or idempotent chat/Responses replay unavailable
429	`rate_limit_error`	Per-key budget exceeded; see Rate limits
500	`server_error`	RunInfra internal error; retryable with exponential backoff
502	`server_error`	Upstream serving backend transient failure; retryable
503	`server_error`	All replicas busy, queue full; respect `Retry-After`

The X-Request-Id response header carries a UUID you can quote when filing a support ticket. Always include it.
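The table translates fairly directly into a retry policy. The sketch below treats 500, 502, and 503 as retryable, as the table states, and additionally treats 429 as retryable after waiting. That last choice is an assumption, so check the Rate limits page. The function names, backoff constants, and the decision to parse only the numeric form of `Retry-After` are illustrative.

```python
from typing import Mapping, Optional

# 500/502/503 are marked retryable in the table; 429 is an assumption.
RETRYABLE_STATUSES = {429, 500, 502, 503}


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get(name.lower())


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not retryable.

    attempt is zero-based. A numeric Retry-After wins over backoff; the
    HTTP-date form of Retry-After is not handled in this sketch.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = _header(headers, "Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass
    return min(cap, base * (2 ** attempt))


def request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Request-Id value to quote in support tickets, if present."""
    return _header(headers, "X-Request-Id")
```

Logging `request_id(...)` alongside every non-2xx response keeps the ID available when you need to file a ticket.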

Fallback behavior

When an upstream serving backend (vLLM, SGLang, TRT-LLM) does not support a parameter the client sent:

Numeric out-of-range (e.g. top_p=1.5): clamped to the legal range and recorded in the X-Param-Adjustments response header.
Unsupported feature (e.g. logit_bias on a backend that does not support it): the parameter may be ignored by the serving backend, and X-Param-Adjustments lists what was dropped when the gateway can detect it.
Unsupported model capability (e.g. tool calling on a non-instruction-tuned model): returns 400 invalid_request_error with a message naming the missing capability.
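Because clamping and dropped parameters are reported only through a response header, it can help to validate values before sending and to surface the header afterwards. This sketch assumes a legal `top_p` range of 0.0 to 1.0 and returns the `X-Param-Adjustments` value as a raw string, since its format is not specified on this page. Both helper names are hypothetical.

```python
from typing import Mapping, Optional


def clamp_top_p(top_p: float) -> float:
    """Clamp top_p into the assumed legal range [0.0, 1.0] before sending."""
    return min(1.0, max(0.0, top_p))


def param_adjustments(headers: Mapping[str, str]) -> Optional[str]:
    """Return the raw X-Param-Adjustments header value, or None if absent."""
    for name, value in headers.items():
        if name.lower() == "x-param-adjustments":
            return value
    return None
```

Logging a non-empty `param_adjustments(...)` result makes silent adjustments visible during testing.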

Next steps

API reference

Endpoint-by-endpoint parameters and response fields.

RunInfra SDK

Native SDK setup for optimized deployment access.

Tool calling cookbook

Function calling with OpenAI tool schemas.

Rate limits

Per-key limits and the Retry-After header.

Integrations

Framework-specific setup.