RunInfra exposes an OpenAI-compatible HTTP API for verified deployments. Point the OpenAI Python or JavaScript SDK, the native RunInfra SDK, or any OpenAI-compatible client like LangChain, LlamaIndex, or the Vercel AI SDK at the dashboard-generated base URL with a RunInfra API key.Documentation Index
Fetch the complete documentation index at: https://runinfra.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Base URL
https://api.runinfra.ai/v1Auth scheme
Authorization: Bearer YOUR_RUNINFRA_API_KEYKey scopes
RunInfra supports two types of API keys, serving different integration shapes. Most customers should use workspace-scoped keys.Workspace-scoped (recommended)
One key reaches verified deployed models in your workspace. The
model field in the request body selects the target. Matches the OpenAI SDK convention exactly.Pipeline-scoped
One key is bound to a single optimized pipeline. The pipeline ID sits in the URL path:
/v1/{pipelineId}/chat/completions.Supported endpoints
POST /v1/chat/completions
Chat completions with streaming, tools, and structured output.
POST /v1/responses
Responses API for compatible LLM and vision-language deployments.
POST /v1/embeddings
Vector embeddings for semantic search and RAG.
POST /v1/images/generations
Image generation from verified image deployments.
POST /v1/audio/speech
Text-to-speech. Binary audio response.
POST /v1/audio/transcriptions
Speech-to-text. Multipart audio upload.
GET /v1/models
List verified deployed models in your workspace.
Drop-in usage
If your dashboard snippet shows a different production base URL, keep the generated value. Workspace-scoped keys use
/api/v1; pipeline-scoped snippets may include /api/v1/{pipelineId}.Authentication
Bearer token. Keys start with
rp_live_ and are hashed at rest with SHA-256. Rotation and expiration are built in. See Key management.Rate limits by plan
Every API key carries a per-minute request budget. The default budget is set by your workspace’s plan tier; you can lower it when creating a key. The ceiling is the plan’s maximum. Pro cannot set a key beyond 1000/min, and Team cannot exceed 10,000/min.| Plan | Default (req/min) | Ceiling (req/min) |
|---|---|---|
| Starter | Not available | Not available |
| Pro | 500 | 1,000 |
| Team | 5,000 | 10,000 |
| Enterprise | 50,000 | 100,000 (custom contracts override) |
Status codes
| Code | Meaning | Action |
|---|---|---|
| 200 | Success | Read the response body |
| 400 | Malformed request / missing field | Check JSON shape and required fields |
| 401 | Invalid or missing API key | Re-check the Bearer token |
| 402 | Insufficient credits | Top up at Settings > Billing |
| 403 | Plan tier blocks the feature | Upgrade the workspace plan |
| 404 | Model not deployed in workspace | GET /v1/models to list what is available |
| 409 | Idempotency conflict | Reuse the same idempotency key only for the same request body |
| 429 | Rate limit exceeded | Back off per Retry-After header |
| 502 | Upstream GPU error | Retry once; otherwise check Deployments |
| 503 | GPU worker unavailable | Retry after 30s. This is usually a cold start. |
SDK compatibility matrix
RunInfra TypeScript
npm install @runinfra/sdk. Native pipeline IDs, typed errors, request IDs, audio, images, and webhook helpers.RunInfra Python
pip install runinfra. Native helpers for optimized deployment access.OpenAI Python
pip install openai. Change base_url and api_key.OpenAI JS/TS
npm i openai. Change baseURL and apiKey.LangChain
ChatOpenAI(base_url=..., api_key=...).LlamaIndex
OpenAI(api_base=..., api_key=...).Vercel AI SDK
createOpenAI({ baseURL, apiKey }).Instructor
Works out of the box over any OpenAI client.
A full OpenAPI 3 spec ships at
https://api.runinfra.ai/v1/openapi.json for codegen and integration-testing tools.Ready to build?
Start building on RunInfra
Free tier, no credit card. 3 pipelines and 3 optimization sessions per month.