RunInfra’s HTTP API follows OpenAI’s request and response shapes for the endpoints it supports. If your code is already written against the OpenAI SDK, you change the base URL and API key, then pass the model ID your RunInfra deployment serves.
Use the generated snippet in Settings > API Keys or the Deploy tab for the exact production base URL. New snippets default to https://api.runinfra.ai/v1. If your workspace has an API-domain alias, keep the generated value.

Two-line migration

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runinfra.ai/v1",   # <- change
    api_key="YOUR_RUNINFRA_API_KEY",           # <- change
)

response = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Hello"}],
)
Streaming, tools, structured outputs, async clients, and retries work unchanged on the supported endpoints because the request and response shapes match.

Endpoints supported today

| OpenAI endpoint | Supported | Notes |
| --- | --- | --- |
| POST /v1/chat/completions | Yes | Streaming, tools, response_format |
| POST /v1/responses | Yes | Responses API event model for compatible LLM and vision-language deployments |
| POST /v1/embeddings | Yes | Single string or array input |
| POST /v1/images/generations | Yes | Image generation deployments return OpenAI-shaped image data |
| POST /v1/audio/transcriptions | Yes | Multipart file upload, Whisper-class models |
| POST /v1/audio/speech | Yes | Text-to-speech, XTTS / Bark / Qwen3-TTS |
| GET /v1/models | Yes | Lists verified deployed models in your workspace |

Anything not in this table is not currently supported. That includes /v1/completions (legacy, non-chat), /v1/files, /v1/assistants, /v1/threads, /v1/batches.

What to change in your code

Three lines at most:

1. Change the base URL
   Replace https://api.openai.com/v1 with the RunInfra base URL shown in your dashboard snippet, usually https://api.runinfra.ai/v1.

2. Change the API key
   Use your RunInfra API key from Settings > API Keys instead of an OpenAI sk-... key.

3. Change the model id
   OpenAI model names (e.g. gpt-4o) won’t resolve on RunInfra. Pass the model id your RunInfra pipeline serves. GET /v1/models returns the list.
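
Since step 3 depends on knowing which model ids your workspace serves, here is a minimal standard-library sketch for listing them. It assumes the key is sent as a Bearer token (as the OpenAI SDK does) and that GET /v1/models returns an OpenAI-shaped list with a data array of objects carrying an id field. The function names are hypothetical helpers, not part of any RunInfra SDK.

```python
import json
import urllib.request

# Default base URL from the docs; prefer the value in your dashboard snippet.
BASE_URL = "https://api.runinfra.ai/v1"


def build_models_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models request."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed Bearer auth
        method="GET",
    )


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model ids out of an OpenAI-shaped list response (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(api_key: str, base_url: str = BASE_URL) -> list[str]:
    """Send the request and return the ids. This performs a network call."""
    request = build_models_request(api_key, base_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return extract_model_ids(json.load(resp))
```

Calling list_model_ids("YOUR_RUNINFRA_API_KEY") should return the ids you can pass as model in the migration snippet.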

Response parity

For the supported endpoints, RunInfra returns the same JSON shapes OpenAI does:
  • Chat completions return id, object, created, model, choices[], usage.
  • Responses return OpenAI-shaped response events and non-streaming JSON when supported by the deployment.
  • Streaming deltas match OpenAI’s SSE format, terminated by data: [DONE].
  • Image generation returns OpenAI-shaped image data for verified image deployments.
  • Tool calls return tool_calls on the assistant message with function.name / function.arguments.
  • Structured output accepts response_format: { type: "json_object" } and response_format: { type: "json_schema", json_schema: {...} }.
  • Usage metadata follows the endpoint shape: token usage for LLMs, embedding counts for vector calls, and modality-native units for image and audio routes.
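
To make the streaming bullet concrete, the sketch below parses an OpenAI-style SSE stream by hand: it reads data: lines, stops at data: [DONE], and collects the content fragments from each chunk's choices[].delta. The sample chunks are illustrative and trimmed to the fields used here; real chunks carry more fields, and the OpenAI SDK does this parsing for you when you pass stream=True.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style chat completion SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # stream terminator
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Illustrative stream, trimmed to the fields this parser reads.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_content_deltas(sample_stream)))  # prints "Hello"
```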

Libraries that work out of the box

Because the contract is OpenAI’s, anything that speaks OpenAI speaks RunInfra. The ones we’ve verified:

OpenAI SDKs

openai on Python and Node. The exact SDK OpenAI publishes.

RunInfra SDK

Native RunInfra helpers for scoped keys, pipeline IDs, typed errors, request IDs, audio, images, and webhook verification.

LangChain

ChatOpenAI(openai_api_base=..., openai_api_key=...)

LlamaIndex

OpenAI(api_base=..., api_key=...)

Vercel AI SDK

createOpenAI({ baseURL, apiKey })

Instructor

Wraps any OpenAI client; works unchanged.

curl / fetch

Plain HTTP works too. No SDK required.
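
As a sketch of the plain-HTTP route, the following builds a chat completion request with only the Python standard library. It assumes Bearer authentication and a JSON body with model and messages, mirroring what the OpenAI SDK sends. The helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, messages: list
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed Bearer auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response (network call)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```

For example, send(build_chat_request("https://api.runinfra.ai/v1", "YOUR_RUNINFRA_API_KEY", "your-model-id", [{"role": "user", "content": "Hello"}])) should return the same JSON shape as the SDK example, with the reply under choices[0].message.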

Known differences

  • Only the endpoints in the table above are available. Features that depend on /v1/assistants, /v1/threads, or /v1/batches will not work until those surfaces ship.
  • OpenAI model names (gpt-4o, gpt-4.1-mini) don’t alias to RunInfra models. Pass the model id your deployment serves.
  • Model-specific sampling parameters (logit_bias, logprobs, seed determinism) depend on the serving backend behind your deployment.

Unsupported parameters

The following OpenAI parameters are ignored, rejected, or otherwise handled differently when passed to RunInfra:

| Parameter | Behavior | Why |
| --- | --- | --- |
| service_tier | Ignored | RunInfra has its own deployment-mode model (Flex / Active) and no service-tier concept |
| store | Ignored | RunInfra does not persist completions for replay; use the request id header to correlate logs |
| metadata | Echoed back in the response unchanged | Surfaced in audit logs (Team+); has no semantic effect |
| parallel_tool_calls=false | Honored by vLLM / SGLang backends; otherwise depends on the serving backend | Some backends always make parallel tool calls by default |
| prediction (speculative decoding hint) | Ignored | RunInfra picks the draft model itself; see Speculation |
| audio (output format on chat) | Rejected | Use /v1/audio/speech for TTS instead |
| web_search, file_search, computer_use tools | Rejected | These are OpenAI-hosted tools, not function calls; bring your own implementation |
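
One way to act on this table client-side is to pre-check keyword arguments before sending them. The sketch below is a hypothetical helper, not a RunInfra feature: it drops the parameters listed as ignored and reports those listed as rejected. It assumes hosted tools appear in tools with a type that starts with one of the names above.

```python
# Parameter names taken from the table above.
IGNORED_PARAMS = {"service_tier", "store", "prediction"}
REJECTED_PARAMS = {"audio"}
# Assumption: hosted tools are identified by a "type" with one of these prefixes.
HOSTED_TOOL_PREFIXES = ("web_search", "file_search", "computer_use")


def check_params(params: dict) -> tuple[dict, list[str], list[str]]:
    """Return (cleaned params, dropped names, rejected names) for a request."""
    rejected = sorted(name for name in params if name in REJECTED_PARAMS)
    for tool in params.get("tools", []):
        tool_type = str(tool.get("type", ""))
        if tool_type.startswith(HOSTED_TOOL_PREFIXES):
            rejected.append(f"tools:{tool_type}")
    dropped = sorted(name for name in params if name in IGNORED_PARAMS)
    cleaned = {k: v for k, v in params.items() if k not in IGNORED_PARAMS}
    return cleaned, dropped, rejected


cleaned, dropped, rejected = check_params(
    {"model": "your-model-id", "store": True, "service_tier": "auto", "temperature": 0.2}
)
print(dropped)   # prints ['service_tier', 'store']
print(rejected)  # prints []
```

A non-empty rejected list is a signal to change the request before sending it, since the API would reject it anyway.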

Error code mapping

RunInfra returns OpenAI-shaped error envelopes ({ error: { message, type, code } }) and uses HTTP status codes consistently:
| HTTP | OpenAI error.type | When it fires |
| --- | --- | --- |
| 400 | invalid_request_error | Schema mismatch, missing field, unsupported parameter |
| 401 | authentication_error | Bad or missing API key |
| 403 | permission_error | Key lacks access to the requested pipeline |
| 404 | not_found_error | Model id does not exist or is not deployed |
| 422 | invalid_request_error | Input too long, malformed JSON schema, bad image bytes |
| 429 | rate_limit_error | Per-key budget exceeded; see Rate limits |
| 500 | server_error | RunInfra internal error; retryable with exponential backoff |
| 502 | server_error | Upstream serving backend transient failure; retryable |
| 503 | server_error | All replicas busy, queue full; respect Retry-After |

The X-Request-Id response header carries a UUID you can quote when filing a support ticket. Always include it.
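
The retry guidance in the table can be sketched as a small decision function. It treats 500, 502, and 503 as retryable as documented, and additionally retries 429, which is an assumption based on the rate-limit entry rather than something the table states. A Retry-After value in seconds takes precedence; otherwise the delay grows exponentially up to a cap. The base and cap values are illustrative, and header lookup here is case-sensitive because plain dicts are used.

```python
from typing import Mapping, Optional

# 500/502/503 are documented as retryable; 429 is included as an assumption.
RETRYABLE_STATUSES = {429, 500, 502, 503}


def retry_delay(
    status: int,
    attempt: int,
    headers: Mapping[str, str],
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form; fall back to exponential backoff
    return min(cap, base * (2 ** attempt))


def summarize_error(body: Mapping, headers: Mapping[str, str]) -> str:
    """Format an OpenAI-shaped error envelope together with the request id."""
    err = body.get("error", {})
    request_id = headers.get("X-Request-Id", "unknown")
    return f"{err.get('type')}: {err.get('message')} (request id: {request_id})"
```

In production code, adding random jitter to the computed delay helps avoid synchronized retries from many clients.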

Fallback behavior

When an upstream serving backend (vLLM, SGLang, TRT-LLM) cannot honor a parameter or value the client sent, the behavior depends on the case:
  • Numeric out-of-range (e.g. top_p=1.5): the value is clamped to the legal range, and the adjustment is reported in the X-Param-Adjustments response header.
  • Unsupported feature (e.g. logit_bias on a backend that does not support it): the parameter is silently ignored, and X-Param-Adjustments lists what was dropped.
  • Unsupported model capability (e.g. tool calling on a non-instruction-tuned model): the request returns 400 invalid_request_error with a message naming the missing capability.
Set strict_params=true in the request body to escalate silent drops into 400s. This is the recommended setting for code that depends on a specific parameter being honored.
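
A minimal sketch of opting in to strict parameter handling when building the request body by hand. The strict_params field and the X-Param-Adjustments header come from the text above. The header's value format is not specified here, so the helper returns it as a raw string without parsing. Both function names are hypothetical.

```python
import json
from typing import Mapping, Optional


def build_strict_chat_body(model: str, messages: list, **params) -> bytes:
    """Encode a chat completion body with strict_params enabled."""
    body = {"model": model, "messages": messages, **params, "strict_params": True}
    return json.dumps(body).encode("utf-8")


def param_adjustments(headers: Mapping[str, str]) -> Optional[str]:
    """Return the raw X-Param-Adjustments header value, if the server sent one."""
    return headers.get("X-Param-Adjustments")
```

With the OpenAI Python SDK, the same field can be sent through the extra_body argument, for example client.chat.completions.create(..., extra_body={"strict_params": True}), which merges extra keys into the JSON request body.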

Next steps

API reference

Endpoint-by-endpoint parameters and response fields.

RunInfra SDK

Native SDK setup for optimized deployment access.

Tool calling cookbook

Function calling with OpenAI tool schemas.

Rate limits

Per-key limits and the Retry-After header.

Integrations

Framework-specific setup.