RunInfra offers four plans ranging from a free tier for building and testing to enterprise contracts with dedicated GPU infrastructure. This page covers what each plan includes, how inference is billed by token, and how optimization sessions, rollover, and overages work.
Plans
| Feature | Starter | Pro | Team | Enterprise |
|---|---|---|---|---|
| Price | Free | $49/mo | $249/seat/mo | Custom |
| Build pipelines | 3 | Unlimited | Unlimited | Unlimited |
| Optimization sessions | 3/month | 20/month + overage | 100/seat pooled | Unlimited |
| Session rollover | No | Up to 40 (2 months) | Up to 200 per seat | Custom |
| Overage price | N/A | $2.50 / session | $2.50 / session | Volume pricing |
| Playground requests | 100/day | Unlimited | Unlimited | Unlimited |
| Deploy endpoints | No | Yes | Yes | Yes |
| Always-on endpoints | No | No | Yes | Yes |
| TensorRT-LLM | No | No | Yes | Yes |
| Custom model uploads | No | No | Yes | Yes |
| Max replicas | N/A | 8 | 32 | Custom |
| Support | Community | Priority email | Shared Slack | Dedicated CSM |
Token pricing
When you deploy an endpoint, inference is billed per million tokens (MTok). The table below shows estimated starting rates by model size; your actual cost depends on your pipeline configuration.

| Model size | Input (from) | Output (from) |
|---|---|---|
| Small (1-8B) | $0.08 / MTok | $0.20 / MTok |
| Medium (8-30B) | $0.20 / MTok | $0.80 / MTok |
| Large (30-70B) | $0.45 / MTok | $1.50 / MTok |
| XL (70B+) | $0.80 / MTok | $2.50 / MTok |
Team plans receive a 10% discount at 100M+ tokens/month. Enterprise plans receive up to 40% off. Contact sales to discuss volume pricing.
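As a rough illustration of the per-token arithmetic, the sketch below computes a lower-bound cost from the starting rates in the table. The function name `estimate_token_cost` and the `discount` parameter are hypothetical and not part of any RunInfra API. It also assumes that a volume discount applies to the whole token bill, which this page does not specify, and it ignores the reserved-GPU hourly fee on Active endpoints.

```python
# Starting ("from") rates in USD per million tokens, copied from the table above.
# Each entry is (input rate, output rate).
RATES_PER_MTOK = {
    "small": (0.08, 0.20),
    "medium": (0.20, 0.80),
    "large": (0.45, 1.50),
    "xl": (0.80, 2.50),
}


def estimate_token_cost(size, input_tokens, output_tokens, discount=0.0):
    """Hypothetical helper: lower-bound token cost in USD.

    `discount` is a fraction (0.10 for 10%). Applying it to the whole
    bill is an assumption made for this sketch.
    """
    input_rate, output_rate = RATES_PER_MTOK[size]
    cost = (input_tokens / 1_000_000) * input_rate
    cost += (output_tokens / 1_000_000) * output_rate
    return round(cost * (1 - discount), 2)


# 80M input tokens and 20M output tokens on a medium model.
print(estimate_token_cost("medium", 80_000_000, 20_000_000))        # 32.0
# The same volume with a 10% Team discount applied.
print(estimate_token_cost("medium", 80_000_000, 20_000_000, 0.10))  # 28.8
```

Because the table lists starting rates, treat the result as a floor rather than a quote.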
Two credit pools
Paid plans split usage into two independent credit accounts that never roll into each other:

| Pool | What it pays for | How it’s billed |
|---|---|---|
| Optimization credits | Running an optimization session: GPU profiling, AWQ/GPTQ/FP8 search, Forge kernel application | Counted in sessions. Pro: 20/mo included, $2.50 per overage. Team: 100/seat pooled. |
| Inference credits | Production endpoint traffic, per-million-tokens billing on Flex (scale-to-zero) and Active (always-on) | Counted in tokens. Rates per the token pricing table; Active adds a reserved-GPU hourly fee. |
What counts as usage
Optimization session
One optimization run on one pipeline. A single call to start optimization counts as one session, regardless of how many model variants are benchmarked in that run. Starter gets 3/month. Pro gets 20/month plus overage. Team gets 100/seat, pooled across the workspace.
Overage session
On Pro and Team, once your included sessions are used up, each additional session costs $2.50, billed automatically from your credit balance. Top up your credit balance at Settings > Billing. Failed or cancelled sessions are refunded automatically.
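The overage arithmetic can be sketched as follows. The helper `overage_charge` is hypothetical. It assumes that `sessions_used` already excludes failed or cancelled sessions (since those are refunded) and that `included_sessions` is whatever allowance is available in the period.

```python
OVERAGE_PRICE_USD = 2.50  # per additional session on Pro and Team


def overage_charge(sessions_used, included_sessions):
    """Hypothetical helper: return (overage session count, charge in USD)."""
    extra = max(0, sessions_used - included_sessions)
    return extra, extra * OVERAGE_PRICE_USD


# A Pro workspace that runs 26 sessions against 20 included.
print(overage_charge(26, 20))  # (6, 15.0)
# Staying within the allowance incurs no charge.
print(overage_charge(12, 20))  # (0, 0.0)
```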
Session rollover
Unused sessions carry forward to the next billing period, capped at 2× your monthly allowance: 40 sessions on Pro, 200 per seat on Team. Rolled-over sessions clear on any plan change.
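One way to read the rollover cap is that the balance at the start of a period (rolled-over sessions plus the new monthly allowance) never exceeds twice the allowance. The sketch below models that reading. The function `next_period_balance` is hypothetical, and the interpretation of the cap is an assumption, so check Settings > Usage for the actual balance.

```python
def next_period_balance(unused_sessions, monthly_allowance):
    """Hypothetical helper: sessions available at the start of the next period.

    Assumes the 2x cap applies to rollover plus the new allowance combined.
    """
    cap = 2 * monthly_allowance
    return min(unused_sessions + monthly_allowance, cap)


# Pro (20/month): 12 unused sessions roll over in full.
print(next_period_balance(12, 20))  # 32
# Pro: 35 unused sessions are limited by the 40-session cap.
print(next_period_balance(35, 20))  # 40
```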
Playground request
One inference call made in the test playground. Starter is limited to 100 playground requests per day. Pro, Team, and Enterprise have unlimited playground access.
Token
Input tokens (your prompt) and output tokens (the model’s response) are counted and billed separately at the rates shown in the token pricing table above.
Manage your plan
Upgrade or downgrade at Settings > Billing. Track your current optimization session usage and token consumption at Settings > Usage.
Next steps
Quickstart
Create an account and deploy your first endpoint in 5 minutes.
GPU tiers and pricing
How GPU selection and deployment mode affect per-token cost.
Deployment modes
Flex scale-to-zero and Active always-on endpoints.
FAQ
Answers to the most common questions about RunInfra.