RunInfra exposes an OpenAI-compatible HTTP API for verified deployments. Point the OpenAI Python or JavaScript SDK, the native RunInfra SDK, or any OpenAI-compatible client like LangChain, LlamaIndex, or the Vercel AI SDK at the dashboard-generated base URL with a RunInfra API key.

Base URL

https://api.runinfra.ai/v1

Auth scheme

Authorization: Bearer YOUR_RUNINFRA_API_KEY

Key scopes

RunInfra supports two types of API keys, serving different integration shapes. Most customers should use workspace-scoped keys.

Workspace-scoped (recommended)

One key reaches verified deployed models in your workspace. The model field in the request body selects the target. Matches the OpenAI SDK convention exactly.

Pipeline-scoped

One key is bound to a single optimized pipeline. The pipeline ID sits in the URL path: /v1/{pipelineId}/chat/completions.

Create either key type at Settings > API Keys.
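
The difference between the two scopes comes down to where the target is named: in the request body for workspace-scoped keys, in the URL path for pipeline-scoped keys. The following is a minimal sketch of that URL rule, assuming the base URL shown on this page. The helper name `chat_completions_url` and the pipeline ID `pl_123` are hypothetical and only serve as illustration.

```python
from typing import Optional
from urllib.parse import quote

# Base URL as shown on this page; prefer the value from your dashboard snippet.
BASE_URL = "https://api.runinfra.ai/v1"


def chat_completions_url(pipeline_id: Optional[str] = None,
                         base_url: str = BASE_URL) -> str:
    """Return the chat completions URL for either key scope.

    Workspace-scoped keys omit the pipeline ID (the model goes in the body).
    Pipeline-scoped keys put the pipeline ID in the path.
    """
    base = base_url.rstrip("/")
    if pipeline_id is None:
        return f"{base}/chat/completions"
    # Percent-encode the ID so it cannot alter the path structure.
    return f"{base}/{quote(pipeline_id, safe='')}/chat/completions"


# Workspace-scoped: the "model" field in the request body selects the target.
print(chat_completions_url())
# Pipeline-scoped: "pl_123" is a made-up ID for illustration.
print(chat_completions_url("pl_123"))
```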

Supported endpoints

POST /v1/chat/completions

Chat completions with streaming, tools, and structured output.

POST /v1/responses

Responses API for compatible LLM and vision-language deployments.

POST /v1/embeddings

Vector embeddings for semantic search and RAG.

POST /v1/images/generations

Image generation from verified image deployments.

POST /v1/audio/speech

Text-to-speech. Binary audio response.

POST /v1/audio/transcriptions

Speech-to-text. Multipart audio upload.

GET /v1/models

List verified deployed models in your workspace.
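
Any of the POST endpoints above can also be called without an SDK. The sketch below builds, but does not send, an embeddings request with the Python standard library. It assumes the request body follows the OpenAI embeddings shape (`model` plus `input`), which this page implies but does not spell out. The helper `build_post` and the model ID `my-embedding-model` are placeholders.

```python
import json
import urllib.request

# Base URL as shown on this page; prefer the value from your dashboard snippet.
BASE_URL = "https://api.runinfra.ai/v1"


def build_post(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "my-embedding-model" is a placeholder; use an ID returned by GET /v1/models.
request = build_post(
    "/embeddings",
    {"model": "my-embedding-model", "input": ["semantic search", "RAG"]},
    api_key="YOUR_RUNINFRA_API_KEY",
)
print(request.get_method(), request.full_url)

# To send it: urllib.request.urlopen(request), then json.load() the response.
```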

Drop-in usage

```python
from openai import OpenAI

# Workspace-scoped key reaches any verified model in your workspace.
client = OpenAI(
    base_url="https://api.runinfra.ai/v1",
    api_key="YOUR_RUNINFRA_API_KEY",
)

# List models
for model in client.models.list().data:
    print(model.id)

# Chat completion
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
If your dashboard snippet shows a different production base URL, keep the generated value. Workspace-scoped keys use /api/v1; pipeline-scoped snippets may include /api/v1/{pipelineId}.

Authentication

Authorization (string, required)

Bearer token. Keys start with rp_live_ and are hashed at rest with SHA-256. Rotation and expiration are built in. See Key management.

Never embed API keys in client-side code. They grant access to your workspace’s inference budget. Use a backend proxy for browser-originated requests, or a Vercel Edge Function / Cloudflare Worker as a thin gateway.
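
On the server side, one common way to keep the key out of source code is to read it from the environment and fail fast if it looks wrong. This is a minimal sketch under stated assumptions: the environment variable name `RUNINFRA_API_KEY` and the helper `auth_headers` are hypothetical, and the prefix check relies only on the `rp_live_` prefix documented above.

```python
import os
from typing import Mapping

KEY_PREFIX = "rp_live_"          # documented key prefix
ENV_VAR = "RUNINFRA_API_KEY"     # hypothetical variable name; pick your own


def auth_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build the Authorization header from an environment-provided key."""
    key = env.get(ENV_VAR, "").strip()
    if not key.startswith(KEY_PREFIX):
        # Fail before any request is made rather than surfacing a 401 later.
        raise ValueError(
            f"{ENV_VAR} is missing or does not start with {KEY_PREFIX!r}"
        )
    return {"Authorization": f"Bearer {key}"}
```

Passing the environment as a parameter keeps the function easy to test, for example `auth_headers({"RUNINFRA_API_KEY": "rp_live_example"})`.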

Rate limits by plan

Every API key carries a per-minute request budget. The default budget is set by your workspace’s plan tier; you can lower it when creating a key. No key can exceed the plan’s ceiling: a Pro key cannot be set above 1,000/min, and a Team key cannot exceed 10,000/min.

| Plan | Default (req/min) | Ceiling (req/min) |
| --- | --- | --- |
| Starter | Not available | Not available |
| Pro | 500 | 1,000 |
| Team | 5,000 | 10,000 |
| Enterprise | 50,000 | 100,000 (custom contracts override) |

Responses include rate-limit metadata:
X-RateLimit-Limit: 500
X-RateLimit-Remaining: 498
X-RateLimit-Tier: pro
Retry-After: 12   (only on 429)
Plan upgrades take effect immediately. The gateway re-evaluates the ceiling on every authentication, so there is no need to rotate keys after upgrading.
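
A client can read the rate-limit headers listed above to decide when to slow down. The sketch below parses them into a small record. It assumes `Retry-After` carries a number of seconds, as in the example above; any other format is treated as absent. The names `RateLimitInfo` and `parse_rate_limit` are illustrative and not part of any RunInfra SDK.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    limit: Optional[int]
    remaining: Optional[int]
    tier: Optional[str]
    retry_after: Optional[float]  # seconds; only expected on 429


def _parse(value: Optional[str], cast):
    """Convert a header value with `cast`; return None if absent or malformed."""
    if value is None:
        return None
    try:
        return cast(value.strip())
    except ValueError:
        return None


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=_parse(h.get("x-ratelimit-limit"), int),
        remaining=_parse(h.get("x-ratelimit-remaining"), int),
        tier=_parse(h.get("x-ratelimit-tier"), str),
        retry_after=_parse(h.get("retry-after"), float),
    )


info = parse_rate_limit({
    "X-RateLimit-Limit": "500",
    "X-RateLimit-Remaining": "498",
    "X-RateLimit-Tier": "pro",
})
print(info)
```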

Status codes

| Code | Meaning | Action |
| --- | --- | --- |
| 200 | Success | Read the response body |
| 400 | Malformed request / missing field | Check JSON shape and required fields |
| 401 | Invalid or missing API key | Re-check the Bearer token |
| 402 | Insufficient credits | Top up at Settings > Billing |
| 403 | Plan tier blocks the feature | Upgrade the workspace plan |
| 404 | Model not deployed in workspace | GET /v1/models to list what is available |
| 409 | Idempotency conflict | Reuse the same idempotency key only for the same request body |
| 429 | Rate limit exceeded | Back off per Retry-After header |
| 502 | Upstream GPU error | Retry once; otherwise check Deployments |
| 503 | GPU worker unavailable | Retry after 30s. This is usually a cold start. |
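
The retryable rows of the table can be folded into one decision function. This is a sketch of one possible policy, not RunInfra's prescribed client behavior: the overall cap of three attempts and the 1-second fallback when `Retry-After` is missing are assumptions added here, and `retry_delay` is a hypothetical helper name.

```python
from typing import Optional

MAX_ATTEMPTS = 3            # assumption: overall cap, not stated in the docs
FALLBACK_429_DELAY = 1.0    # assumption: used only if Retry-After is missing
COLD_START_DELAY = 30.0     # from the table: retry 503 after 30s


def retry_delay(status: int, attempt: int,
                retry_after: Optional[float] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None to give up.

    `attempt` is the number of tries already made (1 after the first failure).
    """
    if attempt >= MAX_ATTEMPTS:
        return None
    if status == 429:
        # Back off for as long as the server asks.
        return retry_after if retry_after is not None else FALLBACK_429_DELAY
    if status == 502:
        # Retry once immediately; a second 502 needs a look at Deployments.
        return 0.0 if attempt == 1 else None
    if status == 503:
        # Usually a cold start, so wait for the worker to come up.
        return COLD_START_DELAY
    # 4xx codes such as 400, 401, 402, 403, 404, 409 need a fix, not a retry.
    return None
```

A calling loop would sleep for the returned number of seconds and resend the request, stopping as soon as the function returns `None`.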

SDK compatibility matrix

RunInfra TypeScript

npm install @runinfra/sdk. Native pipeline IDs, typed errors, request IDs, audio, images, and webhook helpers.

RunInfra Python

pip install runinfra. Native helpers for optimized deployment access.

OpenAI Python

pip install openai. Change base_url and api_key.

OpenAI JS/TS

npm i openai. Change baseURL and apiKey.

LangChain

ChatOpenAI(base_url=..., api_key=...).

LlamaIndex

OpenAI(api_base=..., api_key=...).

Vercel AI SDK

createOpenAI({ baseURL, apiKey }).

Instructor

Works out of the box over any OpenAI client.
A full OpenAPI 3 spec ships at https://api.runinfra.ai/v1/openapi.json for codegen and integration-testing tools.
