Plans and pricing

RunInfra has two plans on one unified credit balance: Core, a self-serve plan where you choose a custom monthly amount, and Enterprise for custom volume and dedicated infrastructure. Usage for the agent, optimization runs, benchmarking, deploys, and inference all draws from the same balance, where 1 credit = $1.

New accounts start with $10 in free credits. No credit card required to try the platform.

Plans

Feature	Core	Enterprise
Price	Custom $50-$1000 / month	Custom (above $1000/mo)
Credits	Your monthly amount, 1 credit = $1	Custom credit volume
Credit balance	One unified balance	One unified balance
Seats	No per-seat fees	No per-seat fees
Optimization	AWQ, GPTQ, FP8 quantization	Everything in Core
Standard GPUs (T4-H100)	Yes	Yes
B200 / H200 GPUs	No	Yes
Managed deploy + scale-to-zero	Yes	Yes
OpenAI-compatible API	Yes	Yes
Self-hosted / custom GPU	No	Yes
Audit logs + RBAC	RBAC	Audit logs + RBAC
SLA	99.9%	Custom, up to 99.99%
Compliance	-	SOC 2 Type II
Support	Priority email	Dedicated CSM + private Slack

Core is self-serve at a custom monthly amount you choose between $50 and $1000. Your payment converts to credits (1 credit = $1) in a single unified balance that funds optimization runs, the agent, benchmarking, deploys, and inference alike. There are no seats and no per-user fees. Core includes quantization (AWQ, GPTQ, FP8), all standard GPUs (T4, L4, L40S, A100, H100), managed deploy with scale-to-zero endpoints, OpenAI-compatible API endpoints, and unlimited pipelines and versioning.

Enterprise is for monthly volume above $1000 and adds self-hosted and custom-GPU deployment, B200 / H200 GPU access, audit logs, custom SLAs up to 99.99%, SOC 2 Type II compliance, a custom credit volume with contract terms, and a dedicated CSM with a private Slack channel. Contact sales for a quote.
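As a rough illustration of the plan boundaries described above, the sketch below maps a monthly amount to a plan and converts a payment to credits. The function names are hypothetical and are not part of any RunInfra API; only the $50 and $1000 limits and the 1 credit = $1 rate come from this page.

```python
from decimal import Decimal

CORE_MIN = Decimal("50")    # lowest Core monthly amount stated on this page
CORE_MAX = Decimal("1000")  # highest Core monthly amount stated on this page


def plan_for_monthly_amount(amount_usd: Decimal) -> str:
    """Return which plan a monthly amount falls under (hypothetical helper)."""
    if amount_usd < CORE_MIN:
        return "below Core minimum"
    if amount_usd <= CORE_MAX:
        return "Core"
    return "Enterprise (contact sales)"


def credits_for_payment(amount_usd: Decimal) -> Decimal:
    """1 credit = $1, so a payment converts to the same number of credits."""
    return amount_usd
```

For example, a $250 monthly amount falls under Core and converts to 250 credits, while anything above $1000 is an Enterprise conversation.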

Token pricing

When you deploy an endpoint, inference is billed per million tokens, drawn from your unified credit balance. The table below shows estimated starting rates by model size; your actual cost depends on your pipeline configuration.

Model size	Input (from)	Output (from)
Small (1-8B)	$0.08 / MTok	$0.20 / MTok
Medium (8-30B)	$0.20 / MTok	$0.80 / MTok
Large (30-70B)	$0.45 / MTok	$1.50 / MTok
XL (70B+)	$0.80 / MTok	$2.50 / MTok

Your actual per-token cost depends on your full pipeline: model choice, quantization method, GPU tier, routing, and deployment mode. Before you go live, RunInfra shows the projected cost in the Deploy tab, derived from your real benchmark measurements rather than catalog defaults.
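To get a feel for the starting rates in the table above, here is a minimal sketch that estimates a lower-bound inference cost from token counts. The dictionary keys and the function name are illustrative assumptions; the rates are the "from" prices listed above, so a real pipeline may cost more.

```python
from decimal import Decimal

# Starting rates in USD per million tokens (input, output), copied from the table above.
STARTING_RATES = {
    "small": (Decimal("0.08"), Decimal("0.20")),   # 1-8B
    "medium": (Decimal("0.20"), Decimal("0.80")),  # 8-30B
    "large": (Decimal("0.45"), Decimal("1.50")),   # 30-70B
    "xl": (Decimal("0.80"), Decimal("2.50")),      # 70B+
}
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_inference_cost(size: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate cost in USD (equal to credits) at the starting rates.

    Input and output tokens are billed separately, so each gets its own rate.
    """
    input_rate, output_rate = STARTING_RATES[size]
    input_cost = input_rate * Decimal(input_tokens) / TOKENS_PER_MTOK
    output_cost = output_rate * Decimal(output_tokens) / TOKENS_PER_MTOK
    return input_cost + output_cost
```

For instance, `estimate_inference_cost("medium", 40_000_000, 10_000_000)` works out to $8 of input plus $8 of output, or 16 credits at the starting rates.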

Volume pricing is an Enterprise conversation. Contact sales to discuss custom monthly volume above $1000/mo.

One unified credit balance

Everything draws from a single credit balance where 1 credit = $1. There is no split between “optimization” and “inference” credits: the agent, optimization runs, benchmarking, deploys, and production inference all debit the same balance.

What it funds	How it’s billed
Agent chat, plans, optimization runs, benchmarking	Metered to the measured GPU cost of each run, drawn from your balance
Production endpoint traffic	Per-million-tokens (Flex scale-to-zero and Active always-on); Active adds a reserved-GPU hourly fee

New accounts start with $10 in free credits. Add credits any time at Settings > Cost, where you can also track your balance and activity.

What counts as usage

Optimization run

An optimization run is a single call to start optimization on one pipeline, regardless of how many model variants are benchmarked in that run. When a run starts, RunInfra places a temporary hold on your credit balance. Once the run finishes, the hold is settled to the measured GPU cost and the unused amount is refunded. Failed or cancelled runs are refunded in full.
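The hold-and-settle flow described above can be sketched as simple balance bookkeeping. This is an illustration only: the `Balance` class, its method names, and the hold amount are hypothetical, and capping the charge at the hold is an assumption, since this page does not say what happens if the measured cost exceeds the hold.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Balance:
    """Hypothetical model of a unified credit balance (1 credit = $1)."""
    available: Decimal
    held: Decimal = Decimal("0")

    def place_hold(self, amount: Decimal) -> None:
        """Reserve credits when an optimization run starts."""
        if amount > self.available:
            raise ValueError("insufficient credits for hold")
        self.available -= amount
        self.held += amount

    def settle(self, hold_amount: Decimal, measured_cost: Decimal,
               succeeded: bool = True) -> Decimal:
        """Release the hold and return the amount actually charged.

        A finished run is charged its measured GPU cost; a failed or
        cancelled run is charged nothing. Capping at the hold is an
        assumption of this sketch.
        """
        charge = min(measured_cost, hold_amount) if succeeded else Decimal("0")
        self.held -= hold_amount
        self.available += hold_amount - charge  # refund the unused part
        return charge
```

With a 100-credit balance, a 20-credit hold, and a measured cost of 12.50, the run ends with 87.50 credits available. If the same run fails or is cancelled, the balance returns to 100.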

Token

Input tokens (your prompt) and output tokens (the model’s response) are counted and billed separately at the rates shown in the token pricing table above, drawn from your unified credit balance.

Credits

Credits are your single prepaid balance (1 credit = $1) used for everything: the agent, optimization, benchmarking, deploys, and inference. New accounts get $10 in free credits. Credits do not expire while your account is active.

Active-mode reservation

Active (always-on) endpoints keep a GPU warm 24/7 for zero cold start. The reserved GPU time is metered against your unified credit balance; the Deploy tab shows the projected monthly cost before you go live.
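Because an Active endpoint reserves a GPU around the clock, its reservation cost scales with hours in the month. The sketch below shows that arithmetic only; the hourly rate is a made-up placeholder (this page lists no hourly GPU prices), and the function name is hypothetical. The Deploy tab remains the source for the real projection.

```python
from decimal import Decimal

HOURS_PER_DAY = Decimal(24)  # Active endpoints stay warm 24/7


def projected_active_reservation(hourly_rate_usd: Decimal, days: int = 30) -> Decimal:
    """Projected reserved-GPU cost in USD (equal to credits) for one billing period.

    hourly_rate_usd is a placeholder input; real rates depend on the GPU tier.
    """
    return hourly_rate_usd * HOURS_PER_DAY * Decimal(days)
```

With a placeholder rate of $2.00 per hour, a 30-day month comes to 2.00 × 24 × 30 = 1440 credits, before any per-token inference charges.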

Manage your plan

Choose or adjust your Core monthly amount and view billing history at Settings > Billing; add credits and track your balance at Settings > Cost. To move above $1000/mo, contact sales about Enterprise.

Next steps

Quickstart

Create an account and deploy your first endpoint in 5 minutes.

GPU tiers and pricing

How GPU selection and deployment mode affect per-token cost.

Deployment modes

Flex scale-to-zero and Active always-on endpoints.

FAQ

Answers to the most common questions about RunInfra.
