
Plans and Pricing

Free to start. Pay for deployment and inference.

Plans

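| Feature | Starter | Pro | Team | Enterprise |
|---|---|---|---|---|
| Price | Free | $99/mo | $249/seat/mo | Custom |
| Build pipelines | 3 | Unlimited | Unlimited | Unlimited |
| Optimize | 3/month | Unlimited | Unlimited | Unlimited |
| Test in playground | 100/day | Unlimited | Unlimited | Unlimited |
| Deploy endpoints | No | Yes | Yes | Yes |
| Always-on endpoints | No | No | Yes | Yes |
| TensorRT-LLM | No | No | Yes | Yes |
| Custom model uploads | No | No | Yes | Yes |
| Max replicas | - | 8 | 32 | Custom |
| Support | Community | Priority email | Shared Slack | Dedicated CSM |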

Starter is free forever. Build pipelines, optimize models, and test in the playground. No credit card required.

Pro ($99/mo, or $79/mo billed annually) unlocks deployment: your pipelines become live API endpoints that scale to zero when idle. Pro also includes the full optimized-model search (AWQ, GPTQ, FP8) and Forge GPU kernel optimization.

Team ($249/seat/mo, or $199/seat/mo billed annually; minimum 3 seats) adds always-on endpoints with no cold starts, TensorRT-LLM, speculative decoding, custom model uploads, and audit logs.

Enterprise includes dedicated GPU infrastructure, custom SLAs, SOC 2/HIPAA compliance, and volume pricing. Contact sales.
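To compare billing options, the sketch below computes the effective monthly subscription cost for Pro and Team, including the Team 3-seat minimum. It assumes the annual prices are per-month equivalents billed for 12 months. The function and constant names are illustrative only and are not part of any RunInfra API. Token usage is billed separately and is not included.

```python
# Sketch: effective monthly subscription cost for Pro and Team (token usage excluded).
# ASSUMPTION: the annual prices ($79 and $199) are per-month equivalents,
# so an annual Pro subscription costs 12 * $79 for the year.
# Names here are hypothetical, not a RunInfra API.

PRICES = {
    # plan: (monthly price, annual price per month) in USD; Team prices are per seat
    "pro": (99, 79),
    "team": (249, 199),
}
TEAM_MIN_SEATS = 3


def subscription_cost_per_month(plan: str, seats: int = 1, annual: bool = False) -> int:
    """Return the effective monthly subscription cost in USD."""
    monthly, annual_rate = PRICES[plan]
    rate = annual_rate if annual else monthly
    if plan == "team":
        # Team is billed per seat, with a minimum of 3 seats.
        return rate * max(seats, TEAM_MIN_SEATS)
    # Pro is a flat price, so the seat count is ignored.
    return rate


print(subscription_cost_per_month("pro"))                         # 99
print(subscription_cost_per_month("pro", annual=True) * 12)       # 948 per year
print(subscription_cost_per_month("team", seats=2))               # 747 (billed as 3 seats)
print(subscription_cost_per_month("team", seats=5, annual=True))  # 995
```

Under these assumptions, paying annually saves $240 per year on Pro and $50 per seat per month on Team.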

Token pricing

When you deploy an endpoint, inference is billed per million tokens (MTok). Estimated starting rates by model size:

| Model size | Input (from) | Output (from) |
|---|---|---|
| Small (1-8B) | $0.08 / MTok | $0.20 / MTok |
| Medium (8-30B) | $0.20 / MTok | $0.80 / MTok |
| Large (30-70B) | $0.45 / MTok | $1.50 / MTok |
| XL (70B+) | $0.80 / MTok | $2.50 / MTok |

These are estimated starting prices. Your actual per-token cost depends on your full pipeline: model choice, quantization method, GPU tier, routing, and deployment mode. Before you go live, RunInfra shows the estimated cost for your specific pipeline in the deploy tab.
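As a rough lower bound, you can estimate a workload's token cost from the starting rates in the table above, pricing input and output tokens separately. This is a minimal sketch: the tier keys are hypothetical labels, and your real rates will usually be higher than these starting prices, so rely on the deploy tab estimate for actual planning.

```python
# Sketch: lower-bound token cost from the starting per-MTok rates.
# Real rates depend on model, quantization, GPU tier, routing, and deployment mode.
# Tier keys are hypothetical labels for the table rows.

# tier: (input USD per MTok, output USD per MTok)
STARTING_RATES = {
    "small": (0.08, 0.20),   # 1-8B
    "medium": (0.20, 0.80),  # 8-30B
    "large": (0.45, 1.50),   # 30-70B
    "xl": (0.80, 2.50),      # 70B+
}


def estimate_token_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost in USD; input and output tokens are priced separately."""
    input_rate, output_rate = STARTING_RATES[tier]
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


# Example: 50M input tokens and 10M output tokens in a month on a medium model.
cost = estimate_token_cost("medium", 50_000_000, 10_000_000)
print(f"${cost:.2f}")  # $18.00
```

Because output rates are several times the input rates, output-heavy workloads (long generations) dominate the bill even at equal token counts.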

Team plans get 10% off token pricing at 100M+ tokens per month. Enterprise plans get up to 40% off.
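The Team volume discount can be modeled as a simple threshold check. This page does not say whether the 100M threshold counts input and output tokens together, or whether the 10% applies to the whole bill or only to tokens above the threshold. The sketch assumes both counts are combined and that the discount covers the whole monthly token bill; the function name is hypothetical. Check Settings > Billing for how discounts actually appear on your invoice.

```python
# Sketch: apply the Team volume discount to a monthly token bill.
# ASSUMPTIONS (not confirmed by this page): the 100M threshold counts input and
# output tokens together, and once it is met the 10% discount applies to the
# whole month's token bill rather than only the tokens above 100M.

TEAM_DISCOUNT_THRESHOLD = 100_000_000  # tokens per month
TEAM_DISCOUNT = 0.10


def team_token_bill(base_cost_usd: float, monthly_tokens: int) -> float:
    """Return the token bill after the (assumed) Team volume discount."""
    if monthly_tokens >= TEAM_DISCOUNT_THRESHOLD:
        return base_cost_usd * (1 - TEAM_DISCOUNT)
    return base_cost_usd


print(team_token_bill(200.0, 80_000_000))   # 200.0 (below the threshold)
print(team_token_bill(200.0, 150_000_000))  # 180.0
```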

What counts as usage

  • Optimization session: One optimization run on one pipeline. Starter gets 3/month, resets monthly.
  • Playground request: One inference call in the test playground. Starter gets 100/day.
  • Token: Input tokens (your prompt) and output tokens (model response) are counted separately.

Manage your plan

Upgrade or downgrade at Settings > Billing. Track usage at Settings > Usage.
