Pricing
Simple, transparent pricing
Start free and scale as you grow. Only pay for the GPU compute you use.
Starter
Build and test pipelines, no deployment.
$0 / month
Chat-driven pipeline builder
3 optimization sessions / month
Full Hugging Face model catalog
AWQ optimized model search
Smart routing (complexity, cost, latency)
Pipeline playground (100 req/day)
3 active pipelines
7-day metrics retention
Community support
Pro
Deploy optimized endpoints. Pay per token.
$99 / month
Inference billed per million tokens
Everything in Starter, plus:
Unlimited optimization sessions
Deployed API endpoints (scale-to-zero)
Unlimited active pipelines
Forge GPU kernel optimization
Full optimization suite (AWQ, GPTQ, FP8)
RunQuant: custom quantization engine
Pipeline versioning with comparison
Stress testing and preflight checks
Scaling config (up to 8 replicas)
Fast cold starts (under 2s)
90-day metrics with cost analytics
99.9% SLA, priority support
Team
For teams that need advanced optimization and collaboration.
$249 / seat / month
Min 3 seats. 10% token discount at 100M+ tokens/mo
Everything in Pro, plus:
Always-on endpoints (zero cold start)
NVIDIA TensorRT-LLM integration
Speculative decoding
Advanced routing (weighted, multi-model)
Custom model uploads
Scaling config (up to 32 replicas)
1-year metrics retention
SSO (coming soon), audit logs, RBAC
99.95% SLA
Shared Slack support
Enterprise
Dedicated infrastructure, compliance, and volume pricing.
Custom
Everything in Team, plus:
Dedicated GPU infrastructure with reserved capacity
Private model onboarding (fine-tuned weights)
Custom SLAs (up to 99.99% uptime)
Volume token pricing (up to 40% off)
Unlimited metrics retention
SOC 2 and HIPAA compliance
Dedicated CSM and private Slack
| Compare plans | Starter ($0/month) | Pro ($99/month) | Team ($249/seat/month) | Enterprise (Custom) |
|---|---|---|---|---|
| Chat-driven builder | ✓ | ✓ | ✓ | ✓ |
| Optimization sessions | 3 / month | Unlimited | Unlimited | Unlimited |
| Active pipelines | 3 | Unlimited | Unlimited | Unlimited |
| Model catalog | Full HF catalog | Full HF catalog | Full HF catalog | HF + private models |
| Deployed API endpoint | — | ✓ | ✓ | ✓ |
| Forge GPU kernel optimization | — | ✓ | ✓ | ✓ |
| Optimization methods | AWQ | AWQ, GPTQ, FP8, RunQuant | AWQ, GPTQ, FP8, RunQuant | All + custom |
| NVIDIA TensorRT-LLM | — | — | ✓ | ✓ |
| Speculative decoding | — | — | ✓ | ✓ |
| Smart routing | Basic | Full | Advanced (weighted) | Custom |
| Scaling replicas | — | Up to 8 | Up to 32 | Custom |
| Stress testing and preflight | — | ✓ | ✓ | ✓ |
| Pipeline versioning | — | ✓ | ✓ | ✓ |
| Inference pricing | Playground only | Per million tokens | 10% off at 100M+ | Up to 40% off |
| Metrics retention | 7 days | 90 days | 1 year | Unlimited |
| SLA guarantee | — | 99.9% | 99.95% | 99.99% |
| SSO (coming soon) and audit logs | — | — | ✓ | ✓ |
| Support | Community | Priority email | Shared Slack | Dedicated CSM |
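The per-token billing and volume discount above can be sketched as a quick cost estimate. Note that the per-million-token rate used here is a hypothetical placeholder (actual rates depend on the model and GPU tier, and are not listed on this page); only the 10% discount at 100M+ tokens/month comes from the Team plan terms.

```python
def monthly_inference_cost(tokens: int, rate_per_million: float) -> float:
    """Estimate a month's inference bill.

    Billing is per million tokens; the Team plan applies a 10% token
    discount once monthly volume reaches 100M tokens.
    `rate_per_million` is a hypothetical placeholder, not a published price.
    """
    cost = (tokens / 1_000_000) * rate_per_million
    if tokens >= 100_000_000:  # Team plan: 10% off at 100M+ tokens/month
        cost *= 0.90
    return cost

# Example: 150M tokens at a hypothetical $0.50 per million tokens
print(monthly_inference_cost(150_000_000, 0.50))  # 150 * 0.50 * 0.90 = 67.5
```

The discount applies to the whole month's volume once the threshold is crossed, which matches the "10% token discount at 100M+/mo" wording; a tiered (marginal) discount would need a different calculation.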
What is RunInfra?
RunInfra is a GPU optimization platform for open-source LLMs. You pick a model from Hugging Face, and RunInfra benchmarks it across GPU tiers, generates custom Triton kernels, and deploys an optimized production API. No YAML, no DevOps.
RunInfra
Own your AI. We benchmark GPUs, optimize kernels, and deploy open-source models as production APIs.
Start building