RunInfra rate-limits per API key. Responses carry headers that tell your client exactly how much budget remains so you can back off correctly.

Response headers

Responses carry these headers (Retry-After is only sent on 429 and 503 responses):
X-RateLimit-Limit: 500
X-RateLimit-Remaining: 498
X-RateLimit-Reset: 1714502460
Retry-After: 12
Header | Meaning
--- | ---
X-RateLimit-Limit | Total requests allowed in the current window
X-RateLimit-Remaining | Requests left in the current window before you receive a 429
X-RateLimit-Reset | Unix timestamp (seconds) when the window resets
Retry-After | Seconds to wait before retrying (sent on 429 and 503 only)
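If you want to act on these headers in your own client code, a minimal sketch in Python is to parse them into one small object. The class and function names below are hypothetical, and the sketch assumes Retry-After is sent as a number of seconds (as documented above) rather than as an HTTP date.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int                     # X-RateLimit-Limit
    remaining: int                 # X-RateLimit-Remaining
    reset_at: int                  # X-RateLimit-Reset, Unix time in seconds
    retry_after: Optional[float]   # Retry-After, only present on 429 and 503

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets; never negative."""
        now = time.time() if now is None else now
        return max(0.0, float(self.reset_at) - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    retry_after = h.get("retry-after")
    return RateLimitInfo(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=int(h["x-ratelimit-reset"]),
        # Assumes Retry-After is delta-seconds, as documented, not an HTTP date.
        retry_after=float(retry_after) if retry_after is not None else None,
    )
```

With the OpenAI Python SDK, client.chat.completions.with_raw_response.create(...) returns an object whose headers attribute you can pass to this function; call parse() on that object to get the usual completion.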

Defaults by plan

Plan | API key minting | Default per-key limit
--- | --- | ---
Starter | Not available (dashboard only) | N/A
Pro | Allowed | 500 requests/min (max 1,000)
Team | Allowed | 5,000 requests/min (max 10,000)
Enterprise | Allowed | Custom
Starter does not mint API keys today. You can build, optimize, and test in the playground, but deploying a live endpoint and calling it over HTTP requires Pro or higher.

Handling 429

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runinfra.ai/v1",
    api_key="YOUR_RUNINFRA_API_KEY",
    max_retries=5,  # handles 429 with exponential backoff
)
response = client.chat.completions.create(...)
The OpenAI SDK already retries 429 responses with exponential backoff (2 retries by default), so raising max_retries is usually enough. Write manual retry logic only when you need behavior the SDK does not expose, such as custom jitter.
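If you do need custom retry logic, the sketch below shows one way to structure it under stated assumptions: send is a hypothetical callable you supply that performs a single request and returns (status_code, headers), and the delay defaults are illustrative, not RunInfra recommendations. It honors Retry-After when present, adds a small random jitter on top, and falls back to exponential backoff with full jitter when the header is missing.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# One HTTP attempt, modeled as (status_code, headers). "send" is a hypothetical
# callable you supply that performs a single request against RunInfra.
Response = Tuple[int, Mapping[str, str]]

RETRYABLE_STATUSES = {429, 503}


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 0.5,   # seconds; illustrative default
    max_delay: float = 30.0,   # cap on the exponential fallback
    jitter: float = 0.5,       # extra random wait added on top of Retry-After
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    for attempt in range(max_retries + 1):
        status, headers = send()
        # Return success, non-retryable errors, or the last failed attempt.
        if status not in RETRYABLE_STATUSES or attempt == max_retries:
            return status, headers
        h = {k.lower(): v for k, v in headers.items()}
        if "retry-after" in h:
            # Never retry sooner than the server asked; add a little jitter so
            # many clients told the same value do not retry in lockstep.
            delay = float(h["retry-after"]) + random.uniform(0, jitter)
        else:
            # No header: exponential backoff with "full jitter".
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    raise RuntimeError("max_retries must be >= 0")
```

If you wrap the OpenAI client this way, construct it with max_retries=0 so the SDK's own retries do not stack on top of these. The sleep parameter is injectable so the loop can be tested without real waiting.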

Raise a limit

Up to the per-plan ceiling, you can raise the default from Settings > API Keys: click the key, edit the per-minute budget, and save. Raising a key above the ceiling requires a plan upgrade or an Enterprise agreement.

Burst behavior

The per-minute limit is enforced with a leaky-bucket model rather than a fixed per-minute counter. Each request adds to the bucket, the bucket drains at the steady-state rate, and a request that arrives when the bucket is full returns 429. The bucket's capacity gives you a small burst allowance above the steady-state rate. Concretely, at a 500 req/min limit:
  • You can fire ~50 requests in the first second without triggering 429.
  • The bucket drains at ~8.3 req/sec (500 / 60), so a full bucket empties in about 6 seconds.
  • If you sustain more than ~8.3 req/sec, the bucket fills and the next request returns 429.
Short bursts of slightly-over-limit traffic are therefore absorbed; sustained over-limit traffic is not.
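To make the numbers above concrete, here is a small client-side simulation of a leaky bucket (the "meter" variant) with a drain rate of 500/60 requests per second and a capacity of 50. The capacity is taken from the ~50-request burst quoted above; RunInfra's exact server-side parameters are not documented here, so treat this as an approximation. The same class can also pace your own traffic so it stays under the limit.

```python
class LeakyBucket:
    """Leaky bucket as a meter: each request adds one unit, the bucket leaks at
    rate_per_sec units per second, and a request that would overflow
    capacity is rejected (where the server would answer 429)."""

    def __init__(self, rate_per_sec: float, capacity: float) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.level = 0.0   # current fill level, in requests
        self.last = 0.0    # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Drain whatever has leaked out since the previous call.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False   # bucket full: this request would get a 429
        self.level += 1
        return True


bucket = LeakyBucket(rate_per_sec=500 / 60, capacity=50)
burst = [bucket.allow(0.0) for _ in range(51)]
# The first 50 requests at t=0 fit in the bucket; the 51st overflows it.
```

allow() takes the current time as an argument instead of reading the clock itself, which keeps the simulation deterministic; in a real client you would pass time.monotonic().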

Workspace vs per-key

Rate limits are per API key, so two keys in the same workspace each get their own budget. There is no separate workspace-level cap; a workspace's effective ceiling is the sum of its keys' limits.
Scope | Limit applies | Notes
--- | --- | ---
Per key | Yes | Default; the value X-RateLimit-Limit reports
Per workspace | No | A workspace can fan out across many keys
Per pipeline | No (today) | Any key can call any pipeline it has access to
If you need a workspace-wide cap (for example, to enforce a spend ceiling), use the spend limits under Settings > Billing rather than rate limits.

Best practices

  • Always respect Retry-After. Custom backoff schedules that ignore the header produce thundering-herd retries.
  • Spread keys across client instances. One key per pod or process is fine; minting a key per user request is wasteful.
  • Monitor X-RateLimit-Remaining proactively. If it dips below 20 percent consistently, raise the limit before you start dropping traffic.
  • Mint a separate key per environment. Production, staging, and CI keys with different limits stop a runaway staging job from eating your prod budget.
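The "below 20 percent consistently" guidance can be automated on the client. The sketch below is a hypothetical helper, not part of any RunInfra SDK: it records whether each response's X-RateLimit-Remaining is below a fraction of X-RateLimit-Limit and reports True only when every one of the last window responses was low. The window size of 50 is an assumption; pick whatever "consistently" means for your traffic.

```python
from collections import deque
from typing import Deque, Mapping


class HeadroomMonitor:
    """Flags when X-RateLimit-Remaining stays below threshold * X-RateLimit-Limit
    for `window` consecutive responses (hypothetical helper)."""

    def __init__(self, threshold: float = 0.20, window: int = 50) -> None:
        self.threshold = threshold
        self.samples: Deque[bool] = deque(maxlen=window)

    def record(self, headers: Mapping[str, str]) -> bool:
        # Header names are case-insensitive, so normalize before lookup.
        h = {k.lower(): v for k, v in headers.items()}
        limit = int(h["x-ratelimit-limit"])
        remaining = int(h["x-ratelimit-remaining"])
        low = limit > 0 and remaining / limit < self.threshold
        self.samples.append(low)
        # Alert only once the window is full and every sample in it was low.
        return len(self.samples) == self.samples.maxlen and all(self.samples)
```

Feed it the headers of every response and raise the key's limit (or add a key) when it returns True, instead of waiting for 429s to show up.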

Next steps

  • Errors: full list of error codes and response bodies.
  • Authentication: key scopes and rotation.
  • Autoscaling: replica budget, not request budget.
  • Monitoring: watch rate-limit utilization.