The RunInfra SDK is the native access layer for optimized deployments. It exposes the same OpenAI-compatible routes and adds RunInfra-specific support for pipeline IDs, scoped API keys, request IDs, replay-safe idempotency keys, typed errors, binary audio, native co-located voice pipelines, and local webhook signature verification.
Package releases are gated by strict live canary tests against verified deployments, and that gate cannot be bypassed. Until the package is available from your package registry, use the copy-paste snippets in Settings > API Keys or on the Deploy tab. Those snippets use the same API surface shown here.

When to use it

Use the RunInfra SDK

You want scoped RunInfra keys, pipeline IDs, typed errors, request IDs, retries, streaming helpers, binary audio handling, and webhook verification helpers in one client.

Use the OpenAI SDK

You already have OpenAI-compatible code and only need to swap base_url, API key, and model ID.
Both paths call the same verified public gateway, and the dashboard shows only the operations your deployment supports. After a runbook finishes in RunPipe, choose Open Deploy from the runbook handoff, then copy the native or OpenAI-compatible snippet from the Deploy tab instead of guessing a route.

Install

npm install @runinfra/sdk

Base URL and pipeline ID

Use a workspace-scoped key when possible. It reaches every verified deployment in your workspace, and the model field of each request selects the target deployment. Use a pipeline-scoped key, or set pipelineId, when you want a client locked to a single optimized pipeline.
import { RunInfra } from "@runinfra/sdk";

const apiKey = process.env.RUNINFRA_API_KEY;
if (!apiKey) throw new Error("Set RUNINFRA_API_KEY.");

const client = new RunInfra({
  apiKey,
  baseURL: "https://api.runinfra.ai/v1",
  pipelineId: "your-optimized-pipeline-id",
});
If your dashboard snippet shows a different production base URL, keep the generated value. The pipeline ID should be the optimized inference pipeline ID from RunInfra, not the Hugging Face model ID.
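
The routes are OpenAI-compatible, so the same chat call can also be made without any SDK. The sketch below only builds the request for POST /v1/chat/completions so you can inspect it. It assumes the gateway accepts a standard `Authorization: Bearer` header and a JSON body, as OpenAI-compatible servers usually do. `buildChatRequest` is a hypothetical helper and is not part of `@runinfra/sdk`.

```typescript
// Hypothetical helper: builds (but does not send) an OpenAI-compatible
// chat completion request. Assumes Bearer auth, as in the OpenAI API.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseURL: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): BuiltRequest {
  // Trim trailing slashes so ".../v1" and ".../v1/" produce the same URL.
  const root = baseURL.replace(/\/+$/, "");
  return {
    url: `${root}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // With a workspace-scoped key, `model` selects the target deployment.
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

To send the request, pass the result to `fetch(url, init)` with the base URL and model ID from your dashboard snippet.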

Supported operations

| Modality | SDK operation | OpenAI-compatible route | Notes |
|---|---|---|---|
| LLM and vision-language | chat.completions.create() | POST /v1/chat/completions | Streaming, tools, structured output. |
| LLM and vision-language | responses.create() | POST /v1/responses | Use when your app prefers the Responses API event model. |
| Embeddings | embeddings.create() | POST /v1/embeddings | Replay-safe JSON request with idempotency keys. |
| Text to speech | audio.speech.create() | POST /v1/audio/speech | Returns binary audio. Configure the deployment-supported voice or reference-audio mode. Send an idempotency key for manual retry protection. |
| Speech to text | audio.transcriptions.create() | POST /v1/audio/transcriptions | Multipart audio upload. Send an idempotency key for manual retry protection. |
| Voice pipeline | voice.pipeline.create() | Native /pipeline helper | Pipeline-scoped helper for co-located audio-to-response deployments. Not an OpenAI-compatible route. |
| Image generation | images.generate() | POST /v1/images/generations | Returns OpenAI-shaped image data from verified image deployments. |
| Discovery | models.list() / models.retrieve() | GET /v1/models / GET /v1/models/{model} | Free model discovery for verified active deployments. |
| Webhooks | webhooks.verify_signature() / construct_event() | Local helper only | Delivery routes are not public yet. Verification helpers are available now. |
Operations that a deployment does not support are hidden from its snippets or shown with the reason they are unavailable. The SDK should not make a network call for a helper that has not shipped.
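
The idea of failing locally instead of calling an unshipped route can be sketched as a guard that runs before any request is made. The operation names mirror the table of supported operations. The `supported` set stands in for whatever your deployment's Deploy tab lists. `UnsupportedOperationError` and `assertSupported` are hypothetical names, not SDK exports.

```typescript
// Hypothetical operation names, mirroring the SDK operations table.
type Operation =
  | "chat.completions.create"
  | "responses.create"
  | "embeddings.create"
  | "audio.speech.create"
  | "audio.transcriptions.create"
  | "voice.pipeline.create"
  | "images.generate"
  | "models.list";

class UnsupportedOperationError extends Error {
  readonly operation: Operation;
  constructor(operation: Operation, reason: string) {
    super(`${operation} is not supported by this deployment: ${reason}`);
    this.name = "UnsupportedOperationError";
    this.operation = operation;
  }
}

// Throws before any network I/O if the deployment does not expose the operation.
function assertSupported(
  operation: Operation,
  supported: ReadonlySet<Operation>,
  reason = "not listed for this verified endpoint",
): void {
  if (!supported.has(operation)) {
    throw new UnsupportedOperationError(operation, reason);
  }
}
```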

Streaming chat

const stream = await client.chat.completions.create({
  model: "your-model-id",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 512,
  stream: true,
});

console.error("request id:", stream.requestId);
for await (const event of stream) {
  process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Streaming POST requests are not retried automatically because the partial stream may already have reached your app. Use non-streaming JSON with an idempotency key when you need replay-safe retries.
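
OpenAI-compatible streaming responses are conventionally sent as Server-Sent Events. Each `data:` line carries one JSON chunk, and `data: [DONE]` ends the stream. The sketch below assumes the gateway follows that convention. It shows how deltas are reassembled from raw text, and why a stream that is cut off leaves your app with partial output that a blind retry would duplicate. `collectDeltas` is a hypothetical helper.

```typescript
// Minimal shape of an OpenAI-style chat completion chunk (assumed).
interface ChatChunk {
  choices?: { delta?: { content?: string } }[];
}

// Reassembles streamed text from raw SSE text, assuming one JSON object per
// data line. Also reports whether the [DONE] terminator was seen.
function collectDeltas(sse: string): { text: string; done: boolean } {
  let text = "";
  for (const rawLine of sse.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blanks, comments, event names
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") return { text, done: true };
    const chunk = JSON.parse(data) as ChatChunk;
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  // No terminator: the stream was cut off and `text` is only partial.
  return { text, done: false };
}
```

If `done` is false, the caller already holds partial text. That is why a streaming request is not safe to replay automatically.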

Replay-safe JSON requests

Embeddings and image generation are replay-safe JSON operations. Send both a client request ID and an idempotency key.
import { randomUUID } from "node:crypto";

const result = await client.embeddings.create(
  {
    model: "bge-m3",
    input: ["first document", "second document"],
  },
  {
    clientRequestId: randomUUID(),
    idempotencyKey: randomUUID(),
  },
);

console.log(result._request_id, result.data[0].embedding);
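
The usual pattern behind idempotency keys is that every attempt of one logical request carries the same key, so the server can recognize a repeat of work it has already handled. The sketch below shows that pattern with an injected `send` function. `sendWithRetries`, `HttpError`, and the set of transient status codes are assumptions for illustration, not the SDK's internals.

```typescript
import { randomUUID } from "node:crypto";

interface AttemptIds {
  clientRequestId: string;
  idempotencyKey: string;
}

// Statuses treated as transient in this sketch (an assumption, not the SDK's list).
const TRANSIENT = new Set([408, 429, 500, 502, 503, 504]);

// Hypothetical error type carrying the HTTP status of a failed attempt.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.name = "HttpError";
    this.status = status;
  }
}

// Retries a replay-safe JSON call. Both IDs are generated once and reused on
// every attempt. Only HttpErrors with a transient status are retried here;
// any other error (including network errors) is rethrown immediately.
async function sendWithRetries<T>(
  send: (ids: AttemptIds) => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 250 } = {},
): Promise<T> {
  const ids: AttemptIds = { clientRequestId: randomUUID(), idempotencyKey: randomUUID() };
  for (let attempt = 1; ; attempt++) {
    try {
      return await send(ids);
    } catch (err) {
      const transient = err instanceof HttpError && TRANSIENT.has(err.status);
      if (!transient || attempt >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With the SDK, the equivalent is to generate clientRequestId and idempotencyKey once and pass the same values again if you retry by hand.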

Audio and images

import { randomUUID } from "node:crypto";
import { writeFile } from "node:fs/promises";

const voice = process.env.RUNINFRA_TTS_VOICE?.trim();
const refAudio = process.env.RUNINFRA_TTS_REF_AUDIO?.trim();
const refText = process.env.RUNINFRA_TTS_REF_TEXT?.trim();
const taskType = process.env.RUNINFRA_TTS_TASK_TYPE?.trim() || "Base";
const speechVoice = voice
  ? { voice }
  : refAudio && refText
    ? { ref_audio: refAudio, ref_text: refText, task_type: taskType }
    : null;

if (!speechVoice) {
  throw new Error("Set RUNINFRA_TTS_VOICE, or RUNINFRA_TTS_REF_AUDIO and RUNINFRA_TTS_REF_TEXT.");
}

const audio = await client.audio.speech.create(
  {
    model: "your-tts-model-id",
    input: "Hello from your optimized RunInfra endpoint.",
    ...speechVoice,
  },
  {
    clientRequestId: randomUUID(),
    idempotencyKey: randomUUID(),
  },
);

await writeFile("output.wav", Buffer.from(await audio.arrayBuffer()));
console.log(audio.requestId, audio.contentType);
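
Speech to text is a multipart upload to POST /v1/audio/transcriptions. The sketch below assumes the gateway accepts the same form fields as the OpenAI transcription API (`file` and `model`), and builds the body with the standard `FormData` and `Blob` globals available in Node 18 and later. `buildTranscriptionForm` is a hypothetical helper. With the SDK, you would call audio.transcriptions.create() and pass an idempotency key as in the speech example.

```typescript
// Hypothetical helper: builds a multipart body for an OpenAI-style
// transcription request. Field names `file` and `model` follow the OpenAI API.
function buildTranscriptionForm(
  audio: ArrayBuffer,
  filename: string,
  model: string,
  mimeType = "audio/wav",
): FormData {
  const form = new FormData();
  form.append("model", model);
  // Wrapping the bytes in a Blob makes FormData send them as a file part.
  form.append("file", new Blob([audio], { type: mimeType }), filename);
  return form;
}
```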

Errors and retries

SDK errors carry status, type, and requestId when the gateway provides them. Automatic retries apply only to replay-safe requests that hit transient failures, so the SDK does not blindly retry streaming requests, binary TTS requests, or multipart ASR uploads. The gateway still binds idempotency keys for TTS and ASR. If a binary or multipart request completes but your client loses the response, a manual retry with the same key will not run or bill a second inference once the first request has settled.
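
The retry rules above amount to a small decision table. The sketch below encodes them as a pure function. The request kinds, the transient status codes, the requirement for an idempotency key, and the name `mayAutoRetry` are all assumptions chosen to mirror the prose. They are not the SDK's actual implementation.

```typescript
// Request kinds from the prose above (names are illustrative).
type RequestKind = "json" | "stream" | "binary-tts" | "multipart-asr";

interface FailedAttempt {
  kind: RequestKind;
  hasIdempotencyKey: boolean;
  status?: number; // undefined for network errors with no HTTP response
  attempt: number; // 1-based number of the attempt that just failed
  maxAttempts: number;
}

// Transient failures in this sketch: network errors plus these statuses (assumed).
const TRANSIENT_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

function mayAutoRetry(f: FailedAttempt): boolean {
  if (f.attempt >= f.maxAttempts) return false;
  // Streams may already have delivered partial output. Binary and multipart
  // requests are left to a manual retry with the same idempotency key.
  if (f.kind !== "json") return false;
  // Only replay-safe JSON requests (sent with an idempotency key) are retried.
  if (!f.hasIdempotencyKey) return false;
  return f.status === undefined || TRANSIENT_STATUSES.has(f.status);
}
```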

Webhook verification

Public webhook delivery is not enabled yet, but local verification helpers are available so your handlers can be written before delivery is turned on.
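import { constructWebhookEvent } from "@runinfra/sdk";

const secret = process.env.RUNINFRA_WEBHOOK_SECRET ?? "";
if (!secret) throw new Error("Set RUNINFRA_WEBHOOK_SECRET.");

// Fetch-style handler (Request/Response). Pass the raw body text unmodified,
// not a parsed and re-serialized JSON object.
export async function handleWebhook(request: Request): Promise<Response> {
  const rawBody = await request.text();
  const event = constructWebhookEvent({
    payload: rawBody,
    signatureHeader: request.headers.get("runinfra-signature") ?? "",
    secret,
  });
  // Handle `event` here, then acknowledge receipt.
  return new Response(null, { status: 200 });
}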
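
Webhook signature verification generally works by recomputing an HMAC over the raw request body with the shared secret, then comparing the result to the header in constant time. The exact RunInfra header format is not documented on this page. For illustration only, the sketch below assumes a Stripe-style header of the form `t=<unix seconds>,v1=<hex HMAC-SHA256 of "t.body">`. `sign`, `verifySignature`, and the header format are hypothetical. In a real handler, use constructWebhookEvent rather than reimplementing it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical header format: "t=<unix seconds>,v1=<hex HMAC-SHA256>".
function sign(payload: string, secret: string, timestamp: number): string {
  const mac = createHmac("sha256", secret).update(`${timestamp}.${payload}`).digest("hex");
  return `t=${timestamp},v1=${mac}`;
}

function verifySignature(
  payload: string,
  header: string,
  secret: string,
  toleranceSec = 300,
  nowSec = Math.floor(Date.now() / 1000),
): boolean {
  const parts = new Map(
    header.split(",").map((kv) => {
      const i = kv.indexOf("=");
      return [kv.slice(0, i).trim(), kv.slice(i + 1).trim()] as [string, string];
    }),
  );
  const t = Number(parts.get("t"));
  const v1 = parts.get("v1");
  if (!Number.isFinite(t) || !v1) return false;
  // Reject stale or future-dated deliveries to limit replay attacks.
  if (Math.abs(nowSec - t) > toleranceSec) return false;
  const expected = createHmac("sha256", secret).update(`${t}.${payload}`).digest();
  const received = Buffer.from(v1, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The `sign` helper is also handy for generating signed test payloads while you write handlers before delivery is enabled.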

Next steps

OpenAI compatibility

Use the OpenAI SDK against the same gateway.

Authentication

Create scoped keys and understand workspace versus pipeline access.

API reference

Endpoint-by-endpoint parameters and response fields.

Rate limits

Per-key limits and retry headers.