RunInfra is now public. See what's new
Pricing

Simple, transparent pricing

Start free and scale as you grow. Only pay for the GPU compute you use.

Starter

Build and test pipelines, no deployment.

$0 / month

 

Chat-driven pipeline builder
3 optimization sessions / month
Full Hugging Face model catalog
AWQ optimized model search
Smart auto GPU selection
Custom GPU picker (T4 → B200): Pro plan only
Smart routing (complexity, cost, latency)
Pipeline playground (100 req/day)
3 active pipelines
7-day metrics retention
Community support

Pro

For solo builders and small teams shipping AI inference endpoints.

$49 / month

20 optimization sessions. Overage at $2.50 / session.

Everything in Starter, plus:
20 optimization sessions per month
Overage at $2.50 per session (from credit balance)
Unused sessions roll over up to 40 (2 months)
Custom GPU picker (T4, L4, A10G, L40S, A100-40GB, A100-80GB, H100, H200, B200)
Per-GPU cost visibility
Deploy tab unlocked in chat sessions
Pay-per-million-token inference credits (top up any time)
Deployed API endpoints (scale-to-zero)
Unlimited active pipelines
Kernel Agent (GPU kernel generation)
Full optimization suite (AWQ, GPTQ, FP8)
RunQuant: custom quantization engine
Pipeline versioning with comparison
Stress testing and preflight checks
Scaling config (up to 8 replicas)
Fast cold starts (under 2s)
90-day metrics with cost analytics
99.9% SLA, priority support
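
As a rough illustration of how the Pro session allowance, rollover, and overage might combine in a given month, here is a minimal Python sketch. It assumes that rolled-over sessions are simply added to the 20 included sessions, that the total available in any month is capped at 40, and that overage is charged per session beyond that total. The function name `pro_month` and its parameters are hypothetical and are not part of any RunInfra API; the actual billing rules may differ.

```python
PRO_INCLUDED = 20        # optimization sessions included each month on Pro
PRO_ROLLOVER_CAP = 40    # assumed cap on included + rolled-over sessions
OVERAGE_USD = 2.50       # price per session beyond the available allowance


def pro_month(sessions_used, carried_in=0):
    """Return (overage_sessions, overage_usd, carried_out) for one Pro month.

    Assumption: carried_in is the number of unused sessions from earlier
    months, and the total available never exceeds PRO_ROLLOVER_CAP.
    """
    available = min(PRO_INCLUDED + carried_in, PRO_ROLLOVER_CAP)
    overage_sessions = max(0, sessions_used - available)
    carried_out = max(0, available - sessions_used)
    overage_usd = overage_sessions * OVERAGE_USD  # drawn from the credit balance
    return overage_sessions, overage_usd, carried_out


if __name__ == "__main__":
    # 26 sessions used with nothing carried over: 6 sessions of overage.
    print(pro_month(26))
    # 12 sessions used: no overage, 8 unused sessions carry forward.
    print(pro_month(12))
```

Under these assumptions, using 26 sessions in a month with nothing carried over results in 6 overage sessions, or $15.00 drawn from the credit balance.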

Team

For teams that need advanced optimization and collaboration.

$249 / seat / month

100 optimization sessions / seat pooled. 10% token discount at 100M+/mo.

Everything in Pro, plus:
100 optimization sessions per seat (pooled workspace-wide)
Overage at $2.50 per session (from shared credit balance)
Rollover up to 200 sessions per seat
Always-on endpoints (zero cold start)
NVIDIA TensorRT-LLM integration
Speculative decoding
Advanced routing (weighted, multi-model)
Custom model uploads
Scaling config (up to 32 replicas)
1-year metrics retention
SSO (coming soon), audit logs, RBAC
99.95% SLA
Shared Slack support
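
To make the pooled allowance and the token discount concrete, here is a minimal Python sketch of a monthly Team estimate. It ignores rollover, assumes the 10% discount applies to the whole month's token usage once the workspace reaches 100M tokens, and takes the per-million-token price as an input because this page does not list one. The function name `team_month` and the parameter `usd_per_million_tokens` are hypothetical and are not part of any RunInfra API.

```python
TEAM_SEAT_USD = 249.00                   # per seat per month
TEAM_SESSIONS_PER_SEAT = 100             # pooled workspace-wide
OVERAGE_USD = 2.50                       # per session beyond the pool
TOKEN_DISCOUNT = 0.10                    # 10% token discount
DISCOUNT_THRESHOLD_TOKENS = 100_000_000  # 100M tokens per month


def team_month(seats, sessions_used, tokens_used, usd_per_million_tokens):
    """Estimate one month of Team costs (rollover is not modeled)."""
    pool = seats * TEAM_SESSIONS_PER_SEAT
    overage_sessions = max(0, sessions_used - pool)
    token_cost = tokens_used / 1_000_000 * usd_per_million_tokens
    if tokens_used >= DISCOUNT_THRESHOLD_TOKENS:
        # Assumption: the discount covers all of the month's tokens.
        token_cost *= 1 - TOKEN_DISCOUNT
    return {
        "seat_cost_usd": seats * TEAM_SEAT_USD,
        "pooled_sessions": pool,
        "overage_usd": overage_sessions * OVERAGE_USD,
        "token_cost_usd": round(token_cost, 2),
    }


if __name__ == "__main__":
    # 3 seats, 320 sessions, 150M tokens at a hypothetical $0.50 per million.
    print(team_month(3, 320, 150_000_000, 0.50))
```

With 3 seats the pool holds 300 sessions, so 320 sessions used would mean 20 overage sessions, or $50.00 from the shared credit balance under these assumptions.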

Enterprise

Dedicated infrastructure, compliance, and volume pricing.

Custom

 

Everything in Team, plus:
Dedicated GPU infrastructure with reserved capacity
Private model onboarding (fine-tuned weights)
Custom SLAs (up to 99.99% uptime)
Volume token pricing (up to 40% off)
Unlimited metrics retention
SOC 2 and HIPAA compliance
Dedicated CSM and private Slack

Compare plans

| Feature | Starter | Pro | Team | Enterprise |
| --- | --- | --- | --- | --- |
| Price | $0 / month | $49 / month | $249 / seat / month | Custom |
| Chat-driven builder | ✓ | ✓ | ✓ | ✓ |
| Optimization sessions | 3 / month | 20 / month + $2.50 overage | 100 / seat pooled | Unlimited |
| Session rollover | - | Up to 40 (2 months) | Up to 200 / seat | Custom |
| Active pipelines | 3 | Unlimited | Unlimited | Unlimited |
| Model catalog | Full HF catalog | Full HF catalog | Full HF catalog | HF + private models |
| Deployed API endpoint | - | ✓ | ✓ | ✓ |
| Kernel Agent (GPU kernel generation) | - | ✓ | ✓ | ✓ |
| Custom GPU picker | - | ✓ | ✓ | ✓ |
| Optimization methods | AWQ | AWQ, GPTQ, FP8, RunQuant | AWQ, GPTQ, FP8, RunQuant | All + custom |
| NVIDIA TensorRT-LLM | - | - | ✓ | ✓ |
| Speculative decoding | - | - | ✓ | ✓ |
| Smart routing | Basic | Full | Advanced (weighted) | Custom |
| Scaling replicas | - | Up to 8 | Up to 32 | Custom |
| Stress testing and preflight | - | ✓ | ✓ | ✓ |
| Pipeline versioning | - | ✓ | ✓ | ✓ |
| Inference pricing | Playground only | Per million tokens | 10% off at 100M+ | Up to 40% off |
| Metrics retention | 7 days | 90 days | 1 year | Unlimited |
| SLA guarantee | - | 99.9% | 99.95% | 99.99% |
| SSO (coming soon) and audit logs | - | - | ✓ | ✓ |
| Support | Community | Priority email | Shared Slack | Dedicated CSM |

FAQ

Common questions

Can't find what you're looking for? Get in touch

What is RunInfra?

RunInfra is a chat-native AI model optimization and infrastructure platform. You describe the AI application or inference pipeline you want to build, and RunInfra selects the right open-source models, benchmarks GPU tiers, tunes runtime settings, applies optimizations, and ships production-ready infrastructure from one conversation.

Deploy your first optimized model in under 5 minutes

Start Building for Free
RunInfra

Own your AI. We benchmark GPUs, optimize kernels, and deploy open-source models as production APIs.


© 2026 RunInfra. All rights reserved.