Simple, transparent pricing.
Start free and scale as you grow. Only pay for the GPU compute you use.
Starter
Build and test pipelines, no deployment.
Pro
For solo builders shipping inference endpoints.
Team
For teams running production inference at scale.
Enterprise
Dedicated infrastructure, compliance, volume pricing.
| Compare plans | Starter ($0 / month) | Pro ($49 / month) | Team ($249 / seat / month) | Enterprise (custom) |
|---|---|---|---|---|
| Chat-driven builder | ||||
| Optimization credits | 3 trial runs / month | $50 / month included | $250 / seat / month included | Custom postpaid |
| Unused optimization hold | - | Refunded automatically | Refunded automatically | Contract terms |
| Active pipelines | 3 | Unlimited | Unlimited | Unlimited |
| Model catalog | Full HF catalog | Full HF catalog | Full HF catalog | HF + private models |
| Deployed API endpoint | ||||
| Kernel Agent (GPU kernel generation) | ||||
| Custom GPU picker | ||||
| Optimization methods | AWQ | AWQ, GPTQ, FP8, RunQuant | AWQ, GPTQ, FP8, RunQuant | All + custom |
| NVIDIA TensorRT-LLM | ||||
| Speculative decoding | ||||
| Smart routing | Basic | Full | Advanced (weighted) | Custom |
| Scaling replicas | - | Up to 8 | Up to 32 | Custom |
| Stress testing and preflight | ||||
| Pipeline versioning | ||||
| Inference pricing | Playground only | Per million tokens | 10% off at 100M+ | Up to 40% off |
| Metrics retention | 7 days | 90 days | 1 year | Unlimited |
| SLA guarantee | - | 99.9% | 99.95% | 99.99% |
| SSO (coming soon) and audit logs | ||||
| Support | Community | Priority email | Shared Slack | Dedicated CSM |
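
To make the per-seat arithmetic concrete, here is a rough sketch that estimates the monthly base subscription and included optimization credits for a plan. It uses only the figures in the table above. The function name `monthly_base` and the dictionary layout are illustrative, not part of any RunInfra API. It assumes Pro is billed as a flat single-user plan, treats Starter's trial runs as $0 in credits, and leaves out inference usage and Enterprise pricing because this page does not list rates for them.

```python
# Figures copied from the plan comparison table above.
# Enterprise is custom-priced and intentionally omitted.
PLANS = {
    # Starter includes 3 trial runs per month rather than dollar credits.
    "Starter": {"price_usd": 0, "credits_usd": 0, "per_seat": False},
    # Assumption: Pro is a flat, single-user price (the table lists no seat unit).
    "Pro": {"price_usd": 49, "credits_usd": 50, "per_seat": False},
    "Team": {"price_usd": 249, "credits_usd": 250, "per_seat": True},
}


def monthly_base(plan: str, seats: int = 1) -> dict:
    """Return the base subscription and included credits in USD per month.

    Inference usage (billed per million tokens) is not included.
    """
    if seats < 1:
        raise ValueError("seats must be at least 1")
    info = PLANS[plan]
    # Only per-seat plans scale with the number of seats.
    multiplier = seats if info["per_seat"] else 1
    return {
        "subscription_usd": info["price_usd"] * multiplier,
        "included_credits_usd": info["credits_usd"] * multiplier,
    }


if __name__ == "__main__":
    # A five-seat Team: 5 x $249 subscription, 5 x $250 included credits.
    print(monthly_base("Team", seats=5))
    # → {'subscription_usd': 1245, 'included_credits_usd': 1250}
```

On these numbers, a five-seat Team plan costs $1,245 per month before inference usage and includes $1,250 in optimization credits.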
Common questions
Can't find what you're looking for? Get in touch
What is RunInfra?
RunInfra is a chat-native AI model optimization and infrastructure platform. You describe the AI application or inference pipeline you want to build, and RunInfra selects the right open-source models, benchmarks GPU tiers, tunes runtime settings, applies optimizations, and ships production-ready infrastructure from one conversation.
Own your AI. We benchmark GPUs, optimize kernels, and deploy open-source models as production APIs.