The RunInfra SDK is the native access layer for optimized deployments. It keeps the same OpenAI-compatible routes available, then adds RunInfra-specific safety around pipeline IDs, scoped API keys, request IDs, replay-safe idempotency keys, typed errors, audio binaries, native co-located voice pipelines, and local webhook signature verification.Documentation Index
Fetch the complete documentation index at: https://runinfra.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Package publication is gated by non-bypassable strict live canaries against verified deployments. Until your package registry release is enabled, use the copy-paste snippets in Settings > API Keys or the Deploy tab. Those snippets use the same API surface shown here.
When to use it
Use the RunInfra SDK
You want scoped RunInfra keys, pipeline IDs, typed errors, request IDs, retries, streaming helpers, binary audio handling, and webhook verification helpers in one client.
Use the OpenAI SDK
You already have OpenAI-compatible code and only need to swap
base_url, API key, and model ID.Install
Base URL and pipeline ID
Use a workspace-scoped key when possible. It reaches every verified deployment in your workspace and selects the target withmodel.
Use a pipeline-scoped key or pipelineId when you want one client locked to one optimized pipeline.
Supported operations
| Modality | SDK operation | OpenAI-compatible route | Notes |
|---|---|---|---|
| LLM and vision-language | chat.completions.create() | POST /v1/chat/completions | Streaming, tools, structured output. |
| LLM and vision-language | responses.create() | POST /v1/responses | Use when your app prefers the Responses API event model. |
| Embeddings | embeddings.create() | POST /v1/embeddings | Replay-safe JSON request with idempotency keys. |
| Text to speech | audio.speech.create() | POST /v1/audio/speech | Returns binary audio. Configure the deployment-supported voice or reference-audio mode. Send an idempotency key for manual retry protection. |
| Speech to text | audio.transcriptions.create() | POST /v1/audio/transcriptions | Multipart audio upload. Send an idempotency key for manual retry protection. |
| Voice pipeline | voice.pipeline.create() | Native /pipeline helper | Pipeline-scoped helper for co-located audio-to-response deployments. Not an OpenAI-compatible route. |
| Image generation | images.generate() | POST /v1/images/generations | Returns OpenAI-shaped image data from verified image deployments. |
| Discovery | models.list() / models.retrieve() | GET /v1/models / GET /v1/models/{model} | Free model discovery for verified active deployments. |
| Webhooks | webhooks.verify_signature() / construct_event() | Local helper only | Delivery routes are not public yet. Verification helpers are available now. |
Streaming chat
Replay-safe JSON requests
Embeddings and image generation are replay-safe JSON operations. Send both a client request ID and an idempotency key.Audio and images
- Text to speech
- Speech to text
- Images
TypeScript
Errors and retries
SDK errors carrystatus, type, and requestId when the gateway provides one.
Automatic retries are limited to transient failures and replay-safe requests. The SDK does not blindly retry streaming requests, binary TTS uploads, or multipart ASR uploads.
The gateway still binds idempotency keys for TTS and ASR. If a binary or multipart request completes and your client loses the response, a manual retry with the same key will not run or charge a second inference after the first request settles.
Webhook verification
Public webhook delivery is not enabled yet, but local verification helpers are available so your handlers can be written before delivery is turned on.Next steps
OpenAI compatibility
Use the OpenAI SDK against the same gateway.
Authentication
Create scoped keys and understand workspace versus pipeline access.
API reference
Endpoint-by-endpoint parameters and response fields.
Rate limits
Per-key limits and retry headers.