RunInfra is now public.

Simple, transparent pricing.

Start free and scale as you grow. Only pay for the GPU compute you use.

Starter

Build and test pipelines; deployment not included.

$0 / month
Chat-driven builder + full Hugging Face catalog
3 trial optimization runs / month
Pipeline playground (100 req/day)
Smart automatic GPU selection + routing
3 active pipelines
7-day metrics retention
Community support

Pro

For solo builders shipping inference endpoints.

$49 / month
$50 / month in Optimization credits for optimization runs, agent chat, and runbooks (yearly plans receive the full $600 upfront)
Inference credits billed per million tokens; top up any time
OpenAI-compatible API at 500 req/min
Deploy tab + scale-to-zero endpoints (under 2s cold start)
Custom GPU picker (T4, L4, A100, H100, H200, B200)
Optimization suite (AWQ, GPTQ, FP8, RunQuant)
Unlimited pipelines, up to 8 replicas
90-day metrics retention, 99.9% SLA, priority support

Team

For teams running production inference at scale.

$249 / seat / month
$250 / seat / month in Optimization credits, pooled across the team (yearly plans receive $3,000 per seat upfront)
Always-on endpoints, zero cold start
OpenAI-compatible API at 5,000 req/min
TensorRT-LLM, speculative decoding, advanced routing
Kernel Agent for GPU kernel optimization
Custom model uploads, up to 32 replicas
1-year metrics retention
SSO, audit logs, RBAC
99.95% SLA, shared Slack support

Enterprise

Dedicated infrastructure, compliance, volume pricing.

Custom
Reserved GPU capacity with custom SLAs (up to 99.99%)
OpenAI-compatible API at 50,000+ req/min, custom ceilings
Volume token pricing (up to 40% off)
Custom model uploads at scale, secure ingest
Unlimited metrics retention
SOC 2 Type II compliance
Dedicated CSM and private Slack
Compare plans

Feature | Starter | Pro | Team | Enterprise
Price | $0 / month | $49 / month | $249 / seat / month | Custom
Chat-driven builder | | | |
Optimization credits | 3 trial runs / month | $50 / month included | $250 / seat / month included | Custom postpaid
Unused optimization hold | - | Refunded automatically | Refunded automatically | Contract terms
Active pipelines | 3 | Unlimited | Unlimited | Unlimited
Model catalog | Full HF catalog | Full HF catalog | Full HF catalog | HF + private models
Deployed API endpoint | | | |
Kernel Agent (GPU kernel generation) | | | |
Custom GPU picker | | | |
Optimization methods | AWQ | AWQ, GPTQ, FP8, RunQuant | AWQ, GPTQ, FP8, RunQuant | All + custom
NVIDIA TensorRT-LLM | | | |
Speculative decoding | | | |
Smart routing | Basic | Full | Advanced (weighted) | Custom
Scaling replicas | - | Up to 8 | Up to 32 | Custom
Stress testing and preflight | | | |
Pipeline versioning | | | |
Inference pricing | Playground only | Per million tokens | 10% off at 100M+ tokens | Up to 40% off
Metrics retention | 7 days | 90 days | 1 year | Unlimited
SLA guarantee | - | 99.9% | 99.95% | 99.99%
SSO (coming soon) and audit logs | | | |
Support | Community | Priority email | Shared Slack | Dedicated CSM

Common questions

Can't find what you're looking for? Get in touch

What is RunInfra?

RunInfra is a chat-native AI model optimization and infrastructure platform. You describe the AI application or inference pipeline you want to build, and RunInfra selects the right open-source models, benchmarks GPU tiers, tunes runtime settings, applies optimizations, and ships production-ready infrastructure from one conversation.

Deploy your first optimized model in under 5 minutes.

RunInfra

Own your AI. We benchmark GPUs, optimize kernels, and deploy open-source models as production APIs.


© 2026 RunInfra. All rights reserved.