API reference - RunInfra

RunInfra exposes an OpenAI-compatible HTTP API for verified deployments. Point the OpenAI Python or JavaScript SDK, the native RunInfra SDKs, or documented integrations such as LangChain, LlamaIndex, and the Vercel AI SDK at the dashboard-generated base URL with a RunInfra API key.

Base URL

https://api.runinfra.ai/v1

Auth scheme

Authorization: Bearer YOUR_RUNINFRA_API_KEY

Key scopes

RunInfra supports two types of API keys for different integration patterns. Most customers should use workspace-scoped keys.

Workspace-scoped (recommended)

One key reaches every verified deployed model in your workspace. The model field in the request body selects the target deployment, so you keep the standard OpenAI SDK base URL pattern.

Pipeline-scoped

One key is bound to a single optimized pipeline. The pipeline ID sits in the URL path: /v1/{pipelineId}/chat/completions.

Create either at Settings > API Keys.
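The two key scopes differ only in where the target is named: in the request body (workspace-scoped) or in the URL path (pipeline-scoped). A minimal sketch of the resulting URLs, assuming the default base URL above; `build_base_url`, `build_chat_url`, and the `pipeline_id` parameter are hypothetical names for illustration, not part of any RunInfra SDK.

```python
from typing import Optional
from urllib.parse import quote

# Assumed default base URL; prefer the value from your dashboard snippet.
BASE_URL = "https://api.runinfra.ai/v1"


def build_base_url(pipeline_id: Optional[str] = None) -> str:
    """Return the SDK base URL for a workspace- or pipeline-scoped key."""
    if pipeline_id is None:
        # Workspace-scoped: the `model` field in the body picks the deployment.
        return BASE_URL
    # Pipeline-scoped: the pipeline ID is a path segment right after /v1.
    return f"{BASE_URL}/{quote(pipeline_id, safe='')}"


def build_chat_url(pipeline_id: Optional[str] = None) -> str:
    """Full chat-completions endpoint for either key scope."""
    return f"{build_base_url(pipeline_id)}/chat/completions"


print(build_chat_url())                    # workspace-scoped
print(build_chat_url("YOUR_PIPELINE_ID"))  # pipeline-scoped
```

Passing `build_base_url(pipeline_id)` as `base_url` to `OpenAI(...)` lets the SDK append `/chat/completions` itself, which matches the `/v1/{pipelineId}` form that pipeline-scoped dashboard snippets may show.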

Supported endpoints

POST /v1/chat/completions

Chat completions with streaming, tools, and structured output.

POST /v1/responses

Responses-shaped adapter over compatible chat-completions deployments.

POST /v1/embeddings

Vector embeddings for semantic search and RAG.

POST /v1/rerank

Text reranking for TEI deployments and multimodal document reranking for compatible vLLM vision rerank deployments.

POST /v1/images/generations

Image generation from verified image deployments.

POST /v1/audio/speech

Text-to-speech. Binary audio response.

POST /v1/audio/transcriptions

Speech-to-text. Multipart audio upload.

GET /v1/models

List verified deployed models in your workspace.
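Outside an SDK, every JSON endpoint above takes the same Bearer header and a JSON body. A minimal standard-library sketch that prepares (but does not send) an embeddings request, assuming the OpenAI-style `model` and `input` fields; `build_json_request` is a hypothetical helper and `YOUR_EMBEDDING_MODEL` is a placeholder. It does not cover `/v1/audio/transcriptions`, which expects a multipart upload, or per-endpoint fields beyond the OpenAI shape.

```python
import json
import urllib.request

BASE_URL = "https://api.runinfra.ai/v1"  # assumed default; use your dashboard value


def build_json_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare a JSON POST to a RunInfra endpoint without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# OpenAI-style embeddings body; the model name is a placeholder.
req = build_json_request(
    "/embeddings",
    {"model": "YOUR_EMBEDDING_MODEL", "input": ["semantic search", "RAG"]},
    api_key="YOUR_RUNINFRA_API_KEY",
)
# To send it: with urllib.request.urlopen(req) as resp: body = json.load(resp)
```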

Drop-in usage

Python

from openai import OpenAI

# Workspace-scoped key reaches any verified model in your workspace.
client = OpenAI(
    base_url="https://api.runinfra.ai/v1",
    api_key="YOUR_RUNINFRA_API_KEY",
)

# List models
for model in client.models.list().data:
    print(model.id)

# Chat completion
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.runinfra.ai/v1",
  apiKey: "YOUR_RUNINFRA_API_KEY",
});

// List models
const models = await client.models.list();
console.log(models.data.map((m) => m.id));

// Chat completion
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);

cURL

curl https://api.runinfra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_RUNINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

If your dashboard snippet shows a different production base URL, keep the generated value. Workspace-scoped keys use /v1; pipeline-scoped snippets may include /v1/{pipelineId}.

Authentication

Authorization (string, required)

Bearer token. Keys start with rp_live_ and are hashed at rest with SHA-256. Rotation and expiration are built in. See Key management.

Never embed API keys in client-side code. They grant access to your workspace’s inference budget. Use a backend proxy for browser-originated requests, or a Vercel Edge Function / Cloudflare Worker as a thin gateway.
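Keeping the key out of client code usually means reading it from the server environment. A minimal sketch, assuming a hypothetical `RUNINFRA_API_KEY` environment variable (not a RunInfra convention); the `rp_live_` prefix check comes from the Authentication description above and only catches obvious mix-ups.

```python
import os


def load_runinfra_key(env_var: str = "RUNINFRA_API_KEY") -> str:
    """Read the API key from the server environment, never from client code.

    RUNINFRA_API_KEY is an assumed variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set on the server")
    # Documented keys start with rp_live_; fail fast on an obvious mix-up.
    if not key.startswith("rp_live_"):
        raise ValueError(f"{env_var} does not look like a RunInfra key")
    return key
```

A backend proxy would call `load_runinfra_key()` once at startup and attach the value as the Bearer token on outbound requests, so the browser only ever talks to your own endpoint.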

Rate limits by plan

Every API key carries a per-minute request budget. Its default comes from your workspace's plan, and you can adjust it when creating a key, but never above the plan's ceiling. Core keys default to 5,000/min and can be raised to at most 10,000/min; Enterprise ceilings are set by contract.

Plan	Default (req/min)	Ceiling (req/min)
Core	5,000	10,000
Enterprise	50,000	100,000 (custom contracts override)
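The relationship between default and ceiling can be expressed as a small rule. This is a hypothetical illustration using the figures in the table above, not the gateway's actual logic: it clamps an over-ceiling request, whereas the real key-creation form may reject it instead, and it does not model Enterprise contract overrides.

```python
from typing import Optional

# Per-minute request budgets from the table above (standard figures only;
# Enterprise contracts can override them).
PLAN_LIMITS = {
    "core": {"default": 5_000, "ceiling": 10_000},
    "enterprise": {"default": 50_000, "ceiling": 100_000},
}


def effective_key_budget(plan: str, requested: Optional[int] = None) -> int:
    """Hypothetical helper: req/min budget a key would end up with."""
    limits = PLAN_LIMITS[plan]
    if requested is None:
        # No custom value: the key inherits the plan default.
        return limits["default"]
    if requested <= 0:
        raise ValueError("budget must be positive")
    # A key can never exceed its plan's ceiling.
    return min(requested, limits["ceiling"])
```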

Responses include rate-limit metadata headers (values below are illustrative):

X-RateLimit-Limit: 500
X-RateLimit-Remaining: 498
X-RateLimit-Tier: core
Retry-After: 12   (only on 429)

Plan upgrades take effect immediately. The gateway re-evaluates the ceiling on every authentication, so there is no need to rotate keys after upgrading.
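Client code can read these headers to slow down before hitting a 429. A minimal sketch that parses them case-insensitively into a small record; `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names, and `Retry-After` is assumed to use the delta-seconds form shown above (the HTTP-date form is not handled).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]
    remaining: Optional[int]
    tier: Optional[str]
    retry_after: Optional[float]  # seconds; only present on 429


def _as_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract the rate-limit headers, ignoring header-name case."""
    lowered = {k.lower(): v for k, v in headers.items()}
    retry = lowered.get("retry-after")
    return RateLimitInfo(
        limit=_as_int(lowered.get("x-ratelimit-limit")),
        remaining=_as_int(lowered.get("x-ratelimit-remaining")),
        tier=lowered.get("x-ratelimit-tier"),
        retry_after=float(retry) if retry is not None else None,
    )


info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "500",
    "X-RateLimit-Remaining": "498",
    "X-RateLimit-Tier": "core",
})
print(info)
```

With the OpenAI Python SDK, `client.chat.completions.with_raw_response.create(...)` returns a response whose `.headers` mapping can be passed in directly.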

Status codes

Code	Meaning	Action
200	Success	Read the response body
400	Malformed request / missing field	Check JSON shape and required fields
401	Invalid or missing API key	Re-check the Bearer token
402	Insufficient credits	Top up at Settings > Cost
403	Plan blocks the feature	Upgrade the workspace plan (Core or Enterprise)
404	Model not deployed in workspace	`GET /v1/models` to list what is available
409	Idempotency conflict	Reuse the same idempotency key only for the same request body
429	Rate limit exceeded	Back off per `Retry-After` header
502	Upstream GPU error	Retry once; otherwise check Deployments
503	GPU worker unavailable	Retry after 30s. This is usually a cold start.
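The Action column translates naturally into a retry policy. A minimal sketch, assuming a hypothetical `retry_delay` helper where `attempt` counts retries already made; the 1-second wait before the single 502 retry, the exponential fallback when `Retry-After` is missing, and the `max_attempts` cap are assumptions of this sketch, not documented values.

```python
from typing import Mapping, Optional


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    max_attempts: int = 3,
) -> Optional[float]:
    """Seconds to wait before retrying, or None to stop and surface the error."""
    if attempt >= max_attempts:
        return None
    if status == 429:
        # Back off per Retry-After (delta-seconds form); else exponential fallback.
        lowered = {k.lower(): v for k, v in headers.items()}
        value = lowered.get("retry-after")
        return float(value) if value is not None else float(2 ** attempt)
    if status == 502:
        # Retry once; after that, check Deployments instead of retrying.
        return 1.0 if attempt == 0 else None
    if status == 503:
        # Usually a GPU cold start: wait about 30 seconds.
        return 30.0
    # 400, 401, 402, 403, 404 and 409 need a fix on the caller's side.
    return None
```

The OpenAI SDKs already retry some of these statuses on their own (configurable through `max_retries`), so disable or lower that setting if you layer a policy like this on top.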

SDK compatibility matrix

RunInfra TypeScript

npm install @runinfra/sdk. Native pipeline-ID support, typed errors, request IDs, and helpers for audio, images, and webhooks.

RunInfra Python

pip install runinfra. Native helpers for optimized deployment access.

OpenAI Python

pip install openai. Change base_url and api_key.

OpenAI JS/TS

npm i openai. Change baseURL and apiKey.

LangChain

ChatOpenAI(base_url=..., api_key=...).

LlamaIndex

OpenAI(api_base=..., api_key=...).

Vercel AI SDK

createOpenAICompatible({ baseURL, apiKey }).

Instructor

Use it through the OpenAI Python client for supported structured-output chat flows.

Ready to build?

Start building on RunInfra

$10 in free credits, no credit card required.
