Rate limits - RunInfra

RunInfra rate-limits per API key. Responses carry headers that tell your client exactly how much budget remains so you can back off correctly.

Response headers

Responses carry the following headers (`Retry-After` applies to 429 and 503 responses):

X-RateLimit-Limit: 500
X-RateLimit-Remaining: 498
X-RateLimit-Reset: 1714502460
Retry-After: 12

Header	Meaning
`X-RateLimit-Limit`	Total requests allowed in the current window
`X-RateLimit-Remaining`	How many you have left before 429
`X-RateLimit-Reset`	Unix timestamp when the window resets
`Retry-After`	Seconds to wait before retrying (on 429 and 503)
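If you want these values as typed fields rather than raw strings, a small parser is enough. The sketch below is illustrative only: `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names, not part of any RunInfra SDK, and it assumes `Retry-After` is expressed in seconds as described in the table above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    limit: int                    # X-RateLimit-Limit
    remaining: int                # X-RateLimit-Remaining
    reset_epoch: int              # X-RateLimit-Reset (Unix seconds)
    retry_after: Optional[float]  # Retry-After in seconds, None if absent


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Read the rate-limit headers from a response's header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    return RateLimitInfo(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_epoch=int(lowered["x-ratelimit-reset"]),
        # Assumption: the value is a number of seconds, not an HTTP date.
        retry_after=float(retry_after) if retry_after is not None else None,
    )


info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "500",
    "X-RateLimit-Remaining": "498",
    "X-RateLimit-Reset": "1714502460",
    "Retry-After": "12",
})
print(info.remaining, info.retry_after)  # prints: 498 12.0
```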

Defaults by plan

Plan	API key minting	Default per-key limit
Free (trial)	No API keys, playground only	N/A
Core	Allowed	5,000 requests/min (max 10,000)
Enterprise	Allowed	50,000 requests/min (custom)

Free (trial) workspaces do not mint API keys. You can build, optimize, and test in the playground, but deploying a live endpoint and calling it over HTTP requires a paid plan (Core or Enterprise).

Handling 429

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runinfra.ai/v1",
    api_key=os.environ["RUNINFRA_GATEWAY_KEY"],  # read the key from the environment, as in the TypeScript sample
    max_retries=5,  # handles 429 with exponential backoff
)
response = client.chat.completions.create(
    model=os.environ["RUNINFRA_MODEL"],
    messages=[{"role": "user", "content": "Health check"}],
)
print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.runinfra.ai/v1",
  apiKey: process.env.RUNINFRA_GATEWAY_KEY,
  maxRetries: 5,
});
const response = await client.chat.completions.create({
  model: process.env.RUNINFRA_MODEL,
  messages: [{ role: "user", content: "Health check" }],
});
console.log(response.choices[0]?.message?.content);

The OpenAI SDK retries 429 responses with exponential backoff by default. Setting max_retries (maxRetries in the TypeScript SDK) is usually enough for chat requests. For calls that incur charges, such as binary, multipart, embedding, or image requests, use an explicit retry policy and an idempotency plan instead of broad automatic retries.
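One possible shape for an explicit retry policy is sketched below. It is a minimal illustration under stated assumptions, not RunInfra's recommended implementation: `RetryableError`, `call_with_retry`, and the `send` callable are hypothetical names, and you would raise `RetryableError` yourself from whatever HTTP layer you use when it sees a 429 or 503. The sketch waits for the server's `Retry-After` value when one is present and falls back to jittered exponential backoff otherwise.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a 429 or 503 raised by your HTTP layer."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # parsed Retry-After seconds, if any


def call_with_retry(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() until it succeeds or max_attempts is used up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RetryableError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            if err.retry_after is not None:
                delay = err.retry_after  # the server's hint wins
            else:
                # Fallback: exponential backoff with full jitter.
                cap = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay = random.uniform(0.0, cap)
            sleep(delay)
    raise AssertionError("unreachable")
```

Because `send` may run more than once, it should be safe to repeat. For requests that incur charges, that usually means deciding up front how a duplicate is detected or tolerated before wrapping the call in a loop like this.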

Raise a limit

You can raise a key's limit up to the per-plan ceiling from Settings > API Keys: click the key, edit the per-minute budget, and save. Raising it above the ceiling requires a plan upgrade or an Enterprise agreement.

Burst behavior

The per-minute limit is enforced with a leaky-bucket model, not a flat per-minute counter. You get a small burst allowance above the steady-state rate, then drain back to the limit over the next ~60 seconds. Concretely, at a 500 req/min limit:

You can fire ~50 requests in the first second without triggering 429.
After that, the bucket drains at ~8.3 req/sec (500/60).
If you sustain more than 8.3 req/sec, the bucket fills up and the next request returns 429.

This means short bursts of slightly-over-limit traffic are forgiven; sustained over-limit traffic is not.
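The arithmetic above can be checked with a small client-side model. This is a sketch of a generic leaky bucket used as a meter, not RunInfra's server implementation: the `LeakyBucket` class is hypothetical, and the capacity of 50 is an assumption taken from the "~50 requests" figure in the example.

```python
class LeakyBucket:
    """Each accepted request adds 1 to the level; the level leaks at a fixed rate."""

    def __init__(self, limit_per_min: float, burst: float):
        self.leak_per_sec = limit_per_min / 60.0  # 500/60 is about 8.3
        self.capacity = burst                     # assumed burst allowance
        self.level = 0.0
        self.last = 0.0                           # time of the previous call

    def allow(self, now: float) -> bool:
        # Drain whatever leaked out since the previous call.
        elapsed = now - self.last
        self.level = max(0.0, self.level - elapsed * self.leak_per_sec)
        self.last = now
        if self.level + 1.0 > self.capacity:
            return False  # bucket is full: this request would get a 429
        self.level += 1.0
        return True


bucket = LeakyBucket(limit_per_min=500, burst=50)

# Fire 60 requests at t=0: only the burst allowance gets through.
burst_ok = sum(bucket.allow(0.0) for _ in range(60))

# One second later about 8.3 requests' worth of room has drained free.
later_ok = sum(bucket.allow(1.0) for _ in range(20))

print(burst_ok, later_ok)  # prints: 50 8
```

A model like this can also be used to pace outgoing traffic on the client so that requests rarely reach the point of a 429.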

Workspace vs per-key

Rate limits are per API key. Two keys in the same workspace each get their own budget. There is no separate workspace-level cap, so a workspace's effective ceiling is the sum of its keys' limits.

Scope	Limit applies	Notes
Per-key	Yes	Default; what `X-RateLimit-Limit` reports
Per workspace	No	Workspaces can fan out across many keys
Per pipeline	No (today)	All keys can hit any pipeline they have access to

If you need a workspace-wide spend ceiling, your prepaid credit balance is the cap. Manage it from Settings > Cost, not through rate limits.

Best practices

Always respect Retry-After. Custom backoff schedules that ignore the header produce thundering-herd retries.
Spread keys across client instances. One key per pod or process is fine; one key per user request is wasteful.
Monitor X-RateLimit-Remaining proactively. If it consistently dips below 20 percent of the limit, raise the limit before you start dropping traffic.
Mint a separate key per environment. Production, staging, and CI keys with different limits stop a runaway staging job from eating your prod budget.
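The monitoring advice above can be turned into a simple check. The sketch below is one possible approach with hypothetical names (`UtilizationMonitor`, `record`); reading "consistently" as "every one of the last few samples" and using a window of five samples are assumptions you would tune for your own traffic.

```python
from collections import deque


class UtilizationMonitor:
    """Flags a key whose remaining budget stays below a threshold."""

    def __init__(self, threshold: float = 0.20, window: int = 5):
        self.threshold = threshold           # fraction of the limit
        self.samples = deque(maxlen=window)  # most recent remaining/limit ratios

    def record(self, limit: int, remaining: int) -> bool:
        """Record one response; return True once the whole window is low."""
        if limit <= 0:
            return False  # nothing meaningful to compare against
        self.samples.append(remaining / limit)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(r < self.threshold for r in self.samples)


monitor = UtilizationMonitor(threshold=0.20, window=3)
for remaining in (400, 90, 80, 50):
    low = monitor.record(limit=500, remaining=remaining)
print(low)  # prints: True
```

Feed `record` the `X-RateLimit-Limit` and `X-RateLimit-Remaining` values from each response, and alert or raise the key's limit when it returns True.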

Next steps

Errors

Full list of error codes and bodies.

Authentication

Key scopes and rotation.

Autoscaling

Replica budget, not request budget.

Monitoring

Watch rate-limit utilization.