RunInfra rate-limits per API key. Responses carry headers that tell your client exactly how much budget remains so you can back off correctly.
Response headers
Every response returns:

| Header | Meaning |
|---|---|
| `X-RateLimit-Limit` | Total requests allowed in the current window |
| `X-RateLimit-Remaining` | How many requests you have left before a 429 |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
| `Retry-After` | Seconds to wait before retrying (on 429 and 503) |
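
As an illustration, the headers above could be read into a small structure on the client. This is a minimal sketch, not an official client: `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names, and the code assumes the header values arrive as plain numeric strings.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    """Hypothetical container for the rate-limit headers on one response."""
    limit: int                    # X-RateLimit-Limit
    remaining: int                # X-RateLimit-Remaining
    reset_at: int                 # X-RateLimit-Reset (Unix timestamp)
    retry_after: Optional[float]  # Retry-After in seconds, if present

    @property
    def remaining_fraction(self) -> float:
        """Share of the window's budget that is still unused."""
        return self.remaining / self.limit if self.limit else 0.0


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    retry_after = h.get("retry-after")
    return RateLimitInfo(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=int(h["x-ratelimit-reset"]),
        # Assumed to be a number of seconds, as described in the table.
        retry_after=float(retry_after) if retry_after is not None else None,
    )
```

Pass in the header mapping from whichever HTTP library you use; `remaining_fraction` gives a quick view of how close the key is to its limit.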
Defaults by plan
| Plan | API key minting | Default per-key limit |
|---|---|---|
| Starter | No API keys, dashboard-only | N/A |
| Pro | Allowed | 500 requests/min (max 1,000) |
| Team | Allowed | 5,000 requests/min (max 10,000) |
| Enterprise | Allowed | Custom |
Handling 429
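One way to handle a 429 is to wait for the number of seconds in `Retry-After` and then retry. The sketch below is transport-agnostic and illustrative only: `call_with_retry` and its `send` callable are hypothetical, and the exponential fallback used when the header is missing or unparseable is an assumption, not documented RunInfra behavior.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) as returned by your HTTP client.
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    fallback_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry on 429/503, waiting for Retry-After seconds."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, headers, body = send()
        # Return on success, on non-retryable errors, or when out of attempts.
        if status not in (429, 503) or attempt >= max_attempts:
            return status, headers, body
        normalized = {k.lower(): v for k, v in headers.items()}
        try:
            delay = float(normalized.get("retry-after"))
        except (TypeError, ValueError):
            # Assumption: fall back to exponential backoff if the header
            # is absent or not a plain number of seconds.
            delay = fallback_delay * 2 ** (attempt - 1)
        sleep(max(delay, 0.0))
        attempt += 1
```

Wrap your real request in a zero-argument function and pass it as `send`. The `sleep` parameter exists so the wait can be replaced in tests.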
Raise a limit
Up to the per-plan ceiling, you can raise the default from Settings > API Keys: click the key, edit the per-minute budget, and save. Raising a limit above the ceiling requires a plan upgrade or an Enterprise agreement.
Burst behavior
The per-minute limit is enforced with a leaky-bucket model, not a flat per-minute counter. You get a small burst allowance above the steady-state rate, then drain back to the limit over the next ~60 seconds. Concretely, at a 500 req/min limit:
- You can fire ~50 requests in the first second without triggering 429.
- After that, the bucket drains at ~8.3 req/sec (500/60).
- If you sustain more than ~8.3 req/sec, the burst allowance is used up and the next request returns 429.
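
The behavior in the list above can be approximated with a small leaky-bucket meter, which is useful for pacing requests on the client or for reasoning about bursts. This is a sketch under stated assumptions, not RunInfra's server implementation: the `LeakyBucket` class is hypothetical, and the burst capacity of 50 is taken from the "~50 requests" figure rather than from a documented parameter.

```python
class LeakyBucket:
    """Illustrative leaky-bucket meter: each request adds 1 to the level,
    and the level drains continuously at limit_per_min / 60 per second."""

    def __init__(self, limit_per_min: float, burst: float) -> None:
        self.leak_rate = limit_per_min / 60.0  # requests drained per second
        self.capacity = burst                  # assumed burst allowance
        self.level = 0.0
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` (seconds) fits the bucket."""
        elapsed = max(now - self.last, 0.0)
        self.level = max(self.level - elapsed * self.leak_rate, 0.0)
        self.last = max(now, self.last)
        if self.level + 1.0 > self.capacity:
            return False  # the server would answer 429 at this point
        self.level += 1.0
        return True
```

With `LeakyBucket(500, 50)`, a burst of 50 requests at the same instant is accepted and the 51st is rejected, while a steady 5 req/sec (below the ~8.3 req/sec drain rate) never fills the bucket.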
Workspace vs per-key
Rate limits are per API key. Two keys in the same workspace each get their own budget. There is no workspace-level cap below the per-key sum.

| Scope | Limit applies | Notes |
|---|---|---|
| Per-key | Yes | Default; what `X-RateLimit-Limit` reports |
| Per workspace | No | Workspaces can fan out across many keys |
| Per pipeline | No (today) | All keys can hit any pipeline they have access to |
Best practices
- Always respect
Retry-After. Custom backoff schedules that ignore the header produce thundering-herd retries. - Spread keys across client instances. One key per pod or process is fine; one key per user request is waste.
- Monitor
X-RateLimit-Remainingproactively. If it dips below 20 percent consistently, raise the limit before you start dropping traffic. - Mint a separate key per environment. Production, staging, and CI keys with different limits stop a runaway staging job from eating your prod budget.
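
The monitoring advice above could be implemented as a rolling check on the headers. The sketch below is one possible interpretation: `RemainingMonitor` is a hypothetical helper, and treating "consistently" as "at least 80 percent of the last N responses" is an assumption chosen for illustration.

```python
from collections import deque


class RemainingMonitor:
    """Flags a key whose remaining budget is low across recent responses."""

    def __init__(self, threshold: float = 0.20, window: int = 30,
                 min_low_share: float = 0.8) -> None:
        self.threshold = threshold          # 20 percent of the limit
        self.min_low_share = min_low_share  # assumed meaning of "consistently"
        self.samples = deque(maxlen=window)

    def record(self, limit: int, remaining: int) -> None:
        """Store remaining/limit from X-RateLimit-Remaining and -Limit."""
        self.samples.append(remaining / limit if limit > 0 else 0.0)

    def should_raise_limit(self) -> bool:
        # Wait for a full window so a single low reading does not trigger.
        if len(self.samples) < self.samples.maxlen:
            return False
        low = sum(1 for s in self.samples if s < self.threshold)
        return low / len(self.samples) >= self.min_low_share
```

Call `record` after every response and alert, or raise the limit in Settings > API Keys, when `should_raise_limit` returns `True`.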
Next steps
- Errors: Full list of error codes and bodies.
- Authentication: Key scopes and rotation.
- Autoscaling: Replica budget, not request budget.
- Monitoring: Watch rate-limit utilization.