RunInfra’s HTTP API follows OpenAI’s request and response shapes for the endpoints it supports. If your code is already written against the OpenAI SDK, you change the base URL and API key, then pass the model ID your RunInfra deployment serves.
Use the generated snippet in Settings > API Keys or the Deploy tab for the exact production base URL. New snippets default to https://api.runinfra.ai/v1. If your workspace has an API-domain alias, keep the generated value.
Two-line migration
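
A minimal sketch of the migration with the OpenAI Python SDK, assuming the default base URL quoted above. The environment variable names RUNINFRA_API_KEY and RUNINFRA_BASE_URL and both helper functions are hypothetical, chosen for illustration; prefer the generated snippet from your dashboard where it differs.

```python
import os

# Default quoted in the text above; the dashboard snippet is authoritative.
DEFAULT_BASE_URL = "https://api.runinfra.ai/v1"


def runinfra_client_kwargs(env=None):
    """Return the two constructor arguments that differ from a stock OpenAI setup."""
    env = os.environ if env is None else env
    return {
        # Hypothetical environment variable names, used only in this sketch.
        "base_url": env.get("RUNINFRA_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env["RUNINFRA_API_KEY"],
    }


def make_client(env=None):
    """Build an OpenAI SDK client that points at RunInfra."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI

    return OpenAI(**runinfra_client_kwargs(env))
```

With a client built this way, existing calls such as client.chat.completions.create(...) should need no other change beyond passing the model ID your deployment serves.
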
Endpoints supported today
| OpenAI endpoint | Supported | Notes |
|---|---|---|
| POST /v1/chat/completions | Yes | Streaming, tools, response_format |
| POST /v1/responses | Yes | Responses API event model for compatible LLM and vision-language deployments |
| POST /v1/embeddings | Yes | Single string or array input |
| POST /v1/images/generations | Yes | Image generation deployments return OpenAI-shaped image data |
| POST /v1/audio/transcriptions | Yes | Multipart file upload, Whisper-class models |
| POST /v1/audio/speech | Yes | Text-to-speech, XTTS / Bark / Qwen3-TTS |
| GET /v1/models | Yes | Lists verified deployed models in your workspace |

Not supported today: /v1/completions (legacy, non-chat), /v1/files, /v1/assistants, /v1/threads, /v1/batches.
What to change in your code
Three lines at most:
- Change the base URL: replace https://api.openai.com/v1 with the RunInfra base URL shown in your dashboard snippet, usually https://api.runinfra.ai/v1.
- Change the API key: use your RunInfra API key from Settings > API Keys instead of an OpenAI sk-... key.
- Change the model: pass the model ID your RunInfra deployment serves.

Response parity
For the supported endpoints, RunInfra returns the same JSON shapes OpenAI does:
- Chat completions return id, object, created, model, choices[], usage.
- Responses return OpenAI-shaped response events and non-streaming JSON when supported by the deployment.
- Streaming deltas match OpenAI’s SSE format, terminated by data: [DONE].
- Image generation returns OpenAI-shaped image data for verified image deployments.
- Tool calls return tool_calls on the assistant message with function.name / function.arguments.
- Structured output accepts response_format: { type: "json_object" } and response_format: { type: "json_schema", json_schema: {...} }.
- Usage metadata follows the endpoint shape: token usage for LLMs, embedding counts for vector calls, and modality-native units for image and audio routes.
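
The streaming item above can be illustrated with a small parser. This is a sketch, assuming the stream arrives as already-decoded text lines and that chunks carry the usual OpenAI choices[0].delta.content field; the function names are illustrative and not part of any SDK.

```python
import json


def iter_sse_payloads(lines):
    """Yield decoded JSON chunks from SSE lines, stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream; anything after the sentinel is not read
        yield json.loads(data)


def collect_text(lines):
    """Concatenate the content deltas of the first choice into one string."""
    parts = []
    for chunk in iter_sse_payloads(lines):
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a usage-only chunk with no choices
        delta = choices[0].get("delta") or {}
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

In practice the OpenAI SDKs do this parsing for you when stream=True is set; a hand-written parser like this is mainly useful with plain HTTP clients.
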
Libraries that work out of the box
Because the contract is OpenAI’s, anything that speaks OpenAI speaks RunInfra. The ones we’ve verified:
- OpenAI SDKs: openai on Python and Node. The exact SDK OpenAI publishes.
- RunInfra SDK: Native RunInfra helpers for scoped keys, pipeline IDs, typed errors, request IDs, audio, images, and webhook verification.
- LangChain: ChatOpenAI(openai_api_base=..., openai_api_key=...)
- LlamaIndex: OpenAI(api_base=..., api_key=...)
- Vercel AI SDK: createOpenAI({ baseURL, apiKey })
- Instructor: Wraps any OpenAI client; works unchanged.
- curl / fetch: Plain HTTP works too. No SDK required.
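
For the plain-HTTP route, a request can be assembled with the Python standard library alone. The sketch below assumes Bearer-token authentication, which is what the OpenAI SDKs send; build_chat_request and send are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages, **params):
    """Build (but do not send) a POST /chat/completions request."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping construction separate from sending makes the request easy to inspect or log before it leaves the process.
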
Known differences
- OpenAI model names (gpt-4o, gpt-4.1-mini) don’t alias to RunInfra models. Pass the model id your deployment serves.
- Model-specific sampling parameters (logit_bias, logprobs, seed determinism) depend on the serving backend behind your deployment.
Unsupported parameters
The following OpenAI parameters are ignored, rejected, or otherwise handled differently when passed to RunInfra:

| Parameter | Behavior | Why |
|---|---|---|
| service_tier | Ignored | RunInfra has its own deployment-mode model (Flex / Active); no service tier concept |
| store | Ignored | RunInfra does not persist completions for replay; use the request id header to correlate logs |
| metadata | Echoed back in the response unchanged | Surfaced in audit logs (Team+); has no semantic effect |
| parallel_tool_calls=false | Honored by vLLM / SGLang backends; serving-backend dependent | Some backends always parallel-call by default |
| prediction (speculative decoding hint) | Ignored | RunInfra picks the draft model itself; see Speculation |
| audio (output format on chat) | Rejected | Use /v1/audio/speech for TTS instead |
| web_search, file_search, computer_use tools | Rejected | These are OpenAI-hosted tools, not function calls; bring your own implementation |
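
A client-side pre-flight check can surface these cases before a request is sent. The sketch below hard-codes the names from the table; the preflight helper is hypothetical, and the hosted tool type strings are taken as written above, so the exact strings your client sends may differ.

```python
# Parameter names copied from the table above.
IGNORED = {"service_tier", "store", "prediction"}
REJECTED = {"audio"}
# Assumption: hosted tools are identified by these "type" values.
HOSTED_TOOL_TYPES = {"web_search", "file_search", "computer_use"}


def preflight(payload):
    """Return (ignored, rejected) name lists for a chat request payload."""
    ignored = sorted(k for k in payload if k in IGNORED)
    rejected = sorted(k for k in payload if k in REJECTED)
    for tool in payload.get("tools") or []:
        if tool.get("type") in HOSTED_TOOL_TYPES:
            rejected.append(f"tools:{tool['type']}")
    return ignored, rejected
```

Logging the ignored list and failing fast on the rejected list keeps surprises out of production traffic.
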
Error code mapping
RunInfra returns OpenAI-shaped error envelopes ({ error: { message, type, code } }) and uses HTTP status codes consistently:
| HTTP | OpenAI error.type | When it fires |
|---|---|---|
| 400 | invalid_request_error | Schema mismatch, missing field, unsupported parameter |
| 401 | authentication_error | Bad or missing API key |
| 403 | permission_error | Key lacks access to the requested pipeline |
| 404 | not_found_error | Model id does not exist or is not deployed |
| 422 | invalid_request_error | Input too long, malformed JSON schema, bad image bytes |
| 429 | rate_limit_error | Per-key budget exceeded; see Rate limits |
| 500 | server_error | RunInfra internal error; retryable with exponential backoff |
| 502 | server_error | Upstream serving backend transient failure; retryable |
| 503 | server_error | All replicas busy, queue full; respect Retry-After |

The X-Request-Id response header carries a UUID you can quote when filing a support ticket. Always include it.
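
The retry guidance in the table can be sketched as a small policy. The 500, 502, and 503 statuses are treated as retryable because the table says so; 429 is left out because the table does not state whether it is safe to retry. The base delay, the cap, and the use of full jitter are assumptions of this sketch, not documented RunInfra values.

```python
import random

# Statuses the table above marks as retryable.
RETRYABLE_STATUSES = {500, 502, 503}


def is_retryable(status):
    """True if the HTTP status is one the table marks as retryable."""
    return status in RETRYABLE_STATUSES


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After given as a number of seconds: respect it as-is.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form of Retry-After; fall back to backoff
    # Exponential backoff with full jitter, bounded by `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)
```

When a retry loop gives up, logging the last X-Request-Id alongside the status makes the failure easy to report.
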
Fallback behavior
When an upstream serving backend (vLLM, SGLang, TRT-LLM) does not support a parameter the client sent:
- Numeric out-of-range (e.g. top_p=1.5): clamped to the legal range, logged in the backend response header X-Param-Adjustments.
- Unsupported feature (e.g. logit_bias on a backend that does not support it): the parameter is silently ignored, and X-Param-Adjustments lists what was dropped.
- Unsupported model capability (e.g. tool calling on a non-instruction-tuned model): returns 400 invalid_request_error with a message naming the missing capability.

Set strict_params=true in the request body to escalate silent drops into 400s. This is the recommended setting for code that depends on a specific parameter being honored.
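
As a sketch of how a client might opt in to strict mode and notice adjustments, the helpers below add strict_params to a request body and read X-Param-Adjustments from a response header mapping. Both helper names are hypothetical, and the header value is returned raw because its format is not described here.

```python
def with_strict_params(payload):
    """Return a copy of a request body with strict_params enabled."""
    return {**payload, "strict_params": True}


def param_adjustments(headers):
    """Return the raw X-Param-Adjustments value, or None if the header is absent."""
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "x-param-adjustments":
            return value
    return None
```

With the OpenAI Python SDK, a non-standard body field like this one can be passed through the extra_body argument, for example extra_body={"strict_params": True}.
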
Next steps
- API reference: Endpoint-by-endpoint parameters and response fields.
- RunInfra SDK: Native SDK setup for optimized deployment access.
- Tool calling cookbook: Function calling with OpenAI tool schemas.
- Rate limits: Per-key limits and the Retry-After header.
- Integrations: Framework-specific setup.