Plans and Pricing
Free to start. Pay for deployment and inference.
Plans
| | Starter | Pro | Team | Enterprise |
|---|---|---|---|---|
| Price | Free | $99/mo | $249/seat/mo | Custom |
| Build pipelines | 3 | Unlimited | Unlimited | Unlimited |
| Optimize | 3/month | Unlimited | Unlimited | Unlimited |
| Test in playground | 100/day | Unlimited | Unlimited | Unlimited |
| Deploy endpoints | No | Yes | Yes | Yes |
| Always-on endpoints | No | No | Yes | Yes |
| TensorRT-LLM | No | No | Yes | Yes |
| Custom model uploads | No | No | Yes | Yes |
| Max replicas | - | 8 | 32 | Custom |
| Support | Community | Priority email | Shared Slack | Dedicated CSM |
Starter is free forever. Build pipelines, optimize models, and test in the playground. No credit card required.
Pro ($99/mo or $79/mo annual) unlocks deployment: your pipelines become live API endpoints with scale-to-zero. Pro also includes the full optimized model search (AWQ, GPTQ, FP8) and Forge GPU kernel optimization.
Team ($249/seat/mo or $199/seat/mo annual, min 3 seats) adds always-on endpoints with zero cold start, TensorRT-LLM, speculative decoding, custom model uploads, and audit logs.
Enterprise includes dedicated GPU infrastructure, custom SLAs, SOC 2/HIPAA compliance, and volume pricing. Contact sales.
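The plan prices above can be turned into yearly or per-team totals with simple arithmetic. The Python sketch below assumes that the annual figures ($79/mo for Pro, $199/seat/mo for Team) are monthly-equivalent rates paid over 12 months. The function names are illustrative only. Token usage is billed separately and is not included.

```python
# Illustrative subscription arithmetic using the list prices on this page.
# Assumption: "annual" prices are monthly-equivalent rates paid for 12 months.
# Token usage is billed on top of these amounts and is not included here.

PRO_MONTHLY = 99
PRO_ANNUAL_PER_MONTH = 79
TEAM_MONTHLY_PER_SEAT = 249
TEAM_ANNUAL_PER_SEAT_PER_MONTH = 199
TEAM_MIN_SEATS = 3


def pro_yearly_cost(annual: bool) -> int:
    """Hypothetical helper: total Pro subscription cost over 12 months, in USD."""
    rate = PRO_ANNUAL_PER_MONTH if annual else PRO_MONTHLY
    return rate * 12


def team_monthly_cost(seats: int, annual: bool) -> int:
    """Hypothetical helper: Team subscription cost per month, in USD."""
    if seats < TEAM_MIN_SEATS:
        raise ValueError(f"Team requires at least {TEAM_MIN_SEATS} seats")
    rate = TEAM_ANNUAL_PER_SEAT_PER_MONTH if annual else TEAM_MONTHLY_PER_SEAT
    return rate * seats


print(pro_yearly_cost(annual=False))       # 1188
print(pro_yearly_cost(annual=True))        # 948
print(team_monthly_cost(3, annual=False))  # 747
print(team_monthly_cost(3, annual=True))   # 597
```

Under this reading, the minimum three-seat Team plan costs $747/mo on monthly billing and $597/mo on annual billing.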
Token pricing
When you deploy an endpoint, inference is billed per million tokens. Estimated starting rates by model size:
| Model size | Input (from) | Output (from) |
|---|---|---|
| Small (1-8B) | $0.08 / MTok | $0.20 / MTok |
| Medium (8-30B) | $0.20 / MTok | $0.80 / MTok |
| Large (30-70B) | $0.45 / MTok | $1.50 / MTok |
| XL (70B+) | $0.80 / MTok | $2.50 / MTok |
These are estimated starting prices. Your actual per-token cost depends on your full pipeline: model choice, quantization method, GPU tier, routing, and deployment mode. RunInfra shows the estimated cost for your specific pipeline in the deploy tab before you go live.
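As a rough illustration of per-token billing, the sketch below multiplies token counts by the starting rates in the table, with input and output priced separately. These are "from" prices, so the result is a floor rather than a quote. The size labels (`small`, `medium`, ...) and the function name are hypothetical.

```python
# Illustrative token-cost estimate from the starting rates in the table above.
# These are "from" prices; real rates depend on quantization, GPU tier,
# routing, and deployment mode, so treat the result as a lower bound.

# USD per million tokens (MTok): (input rate, output rate)
STARTING_RATES = {
    "small": (0.08, 0.20),   # 1-8B
    "medium": (0.20, 0.80),  # 8-30B
    "large": (0.45, 1.50),   # 30-70B
    "xl": (0.80, 2.50),      # 70B+
}


def estimate_token_cost(size: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated cost in USD, pricing input and output separately."""
    input_rate, output_rate = STARTING_RATES[size]
    input_cost = (input_tokens / 1_000_000) * input_rate
    output_cost = (output_tokens / 1_000_000) * output_rate
    return input_cost + output_cost


# 50M input + 10M output tokens on a small model:
# 50 MTok * $0.08 + 10 MTok * $0.20 = $4.00 + $2.00
print(f"${estimate_token_cost('small', 50_000_000, 10_000_000):.2f}")  # $6.00
```

If you replace these starting rates with the per-token figures shown in the deploy tab for your pipeline, you get a closer estimate.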
Team plans get 10% off at 100M+ tokens/month. Enterprise gets up to 40% off.
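The Team volume discount can be modeled the same way. This page does not say whether the 100M threshold counts input and output tokens together, or whether the 10% applies to the whole month's token bill. The sketch assumes both, so check your invoice for the actual rule. Enterprise discounts are negotiated and are not modeled here.

```python
# Illustrative Team volume discount. Assumptions (not stated on this page):
# the 100M threshold counts input + output tokens together for the month,
# and the 10% discount applies to that month's entire token bill.

TEAM_DISCOUNT_THRESHOLD = 100_000_000  # tokens per month ("100M+")
TEAM_DISCOUNT_RATE = 0.10


def team_token_bill(pre_discount_usd: float, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: apply the Team volume discount to a month's token bill."""
    monthly_tokens = input_tokens + output_tokens
    if monthly_tokens >= TEAM_DISCOUNT_THRESHOLD:
        return pre_discount_usd * (1 - TEAM_DISCOUNT_RATE)
    return pre_discount_usd


print(f"{team_token_bill(500.0, 80_000_000, 30_000_000):.2f}")  # 450.00 (110M tokens)
print(f"{team_token_bill(500.0, 60_000_000, 30_000_000):.2f}")  # 500.00 (90M tokens)
```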
What counts as usage
- Optimization session: One optimization run on one pipeline. Starter gets 3/month, resets monthly.
- Playground request: One inference call in the test playground. Starter gets 100/day.
- Token: Input tokens (your prompt) and output tokens (model response) are counted separately and billed at their own per-MTok rates.
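To make the two reset windows concrete, here is a hypothetical tracker for the Starter allowances. Optimization sessions are counted per month and playground requests per day. The use of UTC calendar-month and calendar-day boundaries is an assumption, because the page only says that the limits reset monthly and daily. This sketch does not describe how RunInfra enforces the limits.

```python
from collections import Counter
from datetime import datetime, timezone

OPTIMIZE_PER_MONTH = 3    # Starter: optimization sessions per month
PLAYGROUND_PER_DAY = 100  # Starter: playground requests per day


class StarterUsage:
    """Hypothetical tracker for the Starter allowances (not a RunInfra API).

    Assumption: windows reset on UTC calendar-month and calendar-day boundaries.
    """

    def __init__(self) -> None:
        # Usage counts keyed by the window each allowance resets on.
        self.optimizations: Counter = Counter()  # key: (year, month) in UTC
        self.playground: Counter = Counter()     # key: calendar date in UTC

    def try_optimize(self, now: datetime) -> bool:
        """Record one optimization session if this month's allowance remains."""
        now = now.astimezone(timezone.utc)
        key = (now.year, now.month)
        if self.optimizations[key] >= OPTIMIZE_PER_MONTH:
            return False
        self.optimizations[key] += 1
        return True

    def try_playground(self, now: datetime) -> bool:
        """Record one playground request if today's allowance remains."""
        key = now.astimezone(timezone.utc).date()
        if self.playground[key] >= PLAYGROUND_PER_DAY:
            return False
        self.playground[key] += 1
        return True


usage = StarterUsage()
jan = datetime(2025, 1, 15, tzinfo=timezone.utc)
print([usage.try_optimize(jan) for _ in range(4)])  # [True, True, True, False]
print(usage.try_optimize(datetime(2025, 2, 1, tzinfo=timezone.utc)))  # True (new month)
```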
Manage your plan
Upgrade or downgrade at Settings > Billing. Track usage at Settings > Usage.