RunInfra offers four plans ranging from a free tier for building and testing to enterprise contracts with dedicated GPU infrastructure. This page covers what each plan includes, how inference is billed by token, and how optimization sessions, rollover, and overages work.
Annual billing saves 20% on paid plans: Pro drops to $39/mo and Team drops to $199/seat/mo.

Plans

| Feature | Starter | Pro | Team | Enterprise |
|---|---|---|---|---|
| Price | Free | $49/mo | $249/seat/mo | Custom |
| Build pipelines | 3 | Unlimited | Unlimited | Unlimited |
| Optimization sessions | 3/month | 20/month + overage | 100/seat pooled | Unlimited |
| Session rollover | No | Up to 40 (2 months) | Up to 200 per seat | Custom |
| Overage price | No | $2.50 / session | $2.50 / session | Volume pricing |
| Playground requests | 100/day | Unlimited | Unlimited | Unlimited |
| Deploy endpoints | No | Yes | Yes | Yes |
| Always-on endpoints | No | No | Yes | Yes |
| TensorRT-LLM | No | No | Yes | Yes |
| Custom model uploads | No | No | Yes | Yes |
| Max replicas | N/A | 8 | 32 | Custom |
| Support | Community | Priority email | Shared Slack | Dedicated CSM |
Starter is free forever. You can build up to 3 pipelines, run 3 optimization sessions per month, and test in the playground up to 100 times per day. No credit card required.

Pro ($49/mo, or $39/mo on annual billing) unlocks deployment and includes 20 optimization sessions per month. You get scale-to-zero endpoints with fast cold starts, the full optimized model search (AWQ, GPTQ, FP8), and Forge GPU kernel optimization. Unused sessions roll over for up to two months, and extra sessions cost $2.50 each, billed from your credit balance.

Team ($249/seat/mo, or $199/seat/mo on annual billing, minimum 3 seats) adds always-on endpoints with zero cold start, TensorRT-LLM, speculative decoding, custom model uploads, and audit logs. Optimization sessions are pooled across the workspace at 100 per seat.

Enterprise includes dedicated GPU infrastructure, custom SLAs, SOC 2/HIPAA compliance, and volume token pricing. Contact sales for a quote.
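As a quick check on the 20% annual saving, the sketch below computes twelve months of each paid plan at monthly and annual rates. It assumes Pro is a single flat monthly fee, Team is charged per seat with the 3-seat minimum enforced, and the annual prices are the per-month figures quoted above. `yearly_cost` is an illustrative helper, not part of RunInfra.

```python
# List prices from the plan descriptions above (USD per month; Team is per seat).
MONTHLY_PRICE = {"pro": 49, "team": 249}
ANNUAL_PRICE = {"pro": 39, "team": 199}  # per-month price when billed annually
TEAM_MIN_SEATS = 3


def yearly_cost(plan: str, seats: int = 1, annual: bool = False) -> int:
    """Return the total USD cost of 12 months on a paid plan (hypothetical helper)."""
    if plan == "team":
        seats = max(seats, TEAM_MIN_SEATS)  # Team requires at least 3 seats
    elif plan == "pro":
        seats = 1  # assumption: Pro is a flat fee, not per seat
    else:
        raise ValueError(f"unknown paid plan: {plan}")
    per_month = (ANNUAL_PRICE if annual else MONTHLY_PRICE)[plan]
    return per_month * seats * 12


if __name__ == "__main__":
    for plan, seats in [("pro", 1), ("team", 3)]:
        monthly = yearly_cost(plan, seats)
        annual = yearly_cost(plan, seats, annual=True)
        saving = 1 - annual / monthly
        print(f"{plan}: ${monthly} monthly vs ${annual} annual ({saving:.1%} saved)")
```

Because the published annual prices are rounded to whole dollars, the saving works out to roughly 20.4% on Pro and 20.1% on Team.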

Token pricing

When you deploy an endpoint, inference is billed per million tokens (MTok). The table below lists estimated starting rates by model size.

| Model size | Input (from) | Output (from) |
|---|---|---|
| Small (1-8B) | $0.08 / MTok | $0.20 / MTok |
| Medium (8-30B) | $0.20 / MTok | $0.80 / MTok |
| Large (30-70B) | $0.45 / MTok | $1.50 / MTok |
| XL (70B+) | $0.80 / MTok | $2.50 / MTok |

Your actual per-token cost depends on your full pipeline: model choice, quantization method, GPU tier, routing, and deployment mode. RunInfra shows the projected cost in the deploy tab before you go live, derived from your real benchmark measurements rather than catalog defaults.
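Since input and output tokens are priced separately, a rough lower bound on a month's token bill is each token count divided by one million, multiplied by its rate. The sketch below uses the starting rates from the table; the size keys and `estimate_cost` are illustrative names, and the result leaves out the reserved-GPU hourly fee that Active endpoints add, so treat it as a floor rather than a quote.

```python
from typing import Dict, Tuple

# Starting rates from the token pricing table: (input, output) in USD per MTok.
# Actual rates depend on your pipeline, so these act as lower bounds.
RATES: Dict[str, Tuple[float, float]] = {
    "small": (0.08, 0.20),   # 1-8B
    "medium": (0.20, 0.80),  # 8-30B
    "large": (0.45, 1.50),   # 30-70B
    "xl": (0.80, 2.50),      # 70B+
}


def estimate_cost(size: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate token cost in USD; input and output are billed separately."""
    input_rate, output_rate = RATES[size]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # 50M input tokens and 10M output tokens on a medium model
    print(f"${estimate_cost('medium', 50_000_000, 10_000_000):.2f}")  # prints $18.00
```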
Team plans receive a 10% discount at 100M+ tokens/month. Enterprise plans receive up to 40% off. Contact sales to discuss volume pricing.
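The Team volume discount can be sketched as a threshold check on monthly token volume. This is a minimal sketch under two loudly labeled assumptions: the 100M threshold counts input and output tokens together, and the 10% applies to the whole month's token bill rather than only to tokens above the threshold. Enterprise discounts are negotiated, so they are not modeled. `team_monthly_bill` is a hypothetical helper.

```python
TEAM_VOLUME_THRESHOLD = 100_000_000  # tokens per month
TEAM_VOLUME_DISCOUNT = 0.10


def team_monthly_bill(list_cost_usd: float, monthly_tokens: int) -> float:
    """Apply the Team volume discount to a month's token bill.

    Assumptions (not confirmed by RunInfra): monthly_tokens is input + output
    combined, and the discount covers the whole bill once the threshold is met.
    """
    if monthly_tokens >= TEAM_VOLUME_THRESHOLD:
        return list_cost_usd * (1 - TEAM_VOLUME_DISCOUNT)
    return list_cost_usd


if __name__ == "__main__":
    print(f"${team_monthly_bill(250.0, 120_000_000):.2f}")  # prints $225.00
```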

Two credit pools

Paid plans split usage into two independent credit accounts that never roll into each other:

| Pool | What it pays for | How it’s billed |
|---|---|---|
| Optimization credits | Running an optimization session: GPU profiling, AWQ/GPTQ/FP8 search, Forge kernel application | Counted in sessions. Pro: 20/mo included, $2.50 per overage session. Team: 100/seat pooled. |
| Inference credits | Production endpoint traffic, per-million-tokens billing on Flex (scale-to-zero) and Active (always-on) | Counted in tokens. Rates per the token pricing table; Active adds a reserved-GPU hourly fee. |

The two pools are tracked separately at Settings > Usage. Optimization sessions never debit your inference balance, and inference traffic never spends optimization credits. Top-ups go to whichever bucket you choose at checkout. This split lets you cap experimentation cost without limiting production traffic, and vice versa.
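The separation can be pictured as two balances where a debit against one pool can never draw on the other. The sketch below is a hypothetical model of that rule, not a RunInfra API: `CreditPools` and the pool names are illustrative, and both pools hold a USD balance as a simplification, even though optimization usage is counted in sessions.

```python
from dataclasses import dataclass, field
from typing import Dict

POOLS = ("optimization", "inference")


@dataclass
class CreditPools:
    """Hypothetical model of two credit pools that never fund each other."""

    balances: Dict[str, float] = field(
        default_factory=lambda: {pool: 0.0 for pool in POOLS}
    )

    def top_up(self, pool: str, amount_usd: float) -> None:
        # A top-up funds exactly one pool, chosen by the caller.
        if pool not in self.balances:
            raise KeyError(pool)
        self.balances[pool] += amount_usd

    def charge(self, pool: str, amount_usd: float) -> None:
        # Debit only the named pool; a shortfall is never covered by the other.
        if amount_usd > self.balances[pool]:
            raise ValueError(f"insufficient {pool} credit")
        self.balances[pool] -= amount_usd


if __name__ == "__main__":
    pools = CreditPools()
    pools.top_up("inference", 100.0)
    pools.charge("inference", 18.0)  # token billing
    try:
        pools.charge("optimization", 2.50)  # an overage session
    except ValueError as err:
        print(err)  # prints: insufficient optimization credit
    print(pools.balances)
```

Raising on a shortfall, rather than silently borrowing, is what keeps an experimentation budget from eating into production traffic.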

What counts as usage

Optimization session: One optimization run on one pipeline, that is, a single call to start optimization, regardless of how many model variants are benchmarked in that run. Starter gets 3 per month, Pro gets 20 per month plus overage, and Team gets 100 per seat, pooled across the workspace.
Overage: On Pro and Team, once your included sessions are used up, each additional session costs $2.50, billed automatically from your credit balance. Top up your credit balance at Settings > Billing. Failed or cancelled sessions are refunded automatically.
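Putting the session and overage rules together: each start-optimization call is one session, failed or cancelled sessions are refunded (so they net out of the bill), and only sessions beyond the included allowance are charged. The sketch below assumes hypothetical status strings `"completed"`, `"failed"`, and `"cancelled"`; `overage_charge` is illustrative. Pass `included=20` for Pro or `included=100 * seats` for Team; Starter has no overage.

```python
from typing import List

OVERAGE_PRICE_USD = 2.50


def overage_charge(session_statuses: List[str], included: int) -> float:
    """Return the overage bill for one billing period (hypothetical helper).

    Each entry is one optimization session, however many model variants it
    benchmarked. Failed or cancelled sessions are refunded, so they are
    excluded from the billable count.
    """
    billable = sum(1 for s in session_statuses if s not in ("failed", "cancelled"))
    extra = max(0, billable - included)
    return extra * OVERAGE_PRICE_USD


if __name__ == "__main__":
    # Pro: 23 completed and 2 failed sessions against 20 included
    print(overage_charge(["completed"] * 23 + ["failed"] * 2, included=20))  # prints 7.5
```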
Rollover: Unused sessions carry forward to the next billing period, capped at 2× your monthly allowance: 40 sessions on Pro and 200 per seat on Team. Rolled-over sessions clear on any plan change.
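The rollover rule reduces to a capped carry-forward. This sketch assumes that sessions from the current allowance and from rollover are pooled before use, and that the 2× cap applies to the number of sessions carried into the next period; RunInfra may apply the cap differently. `next_rollover` is a hypothetical helper, and it does not apply to Starter, which has no rollover.

```python
def next_rollover(allowance: int, carried_in: int, used: int,
                  plan_changed: bool = False) -> int:
    """Return the sessions carried into the next billing period.

    Assumptions: this period's allowance and carried-in sessions form one
    pool, and the 2x-allowance cap applies to the carried-over count.
    """
    if plan_changed:
        return 0  # rolled-over sessions clear on any plan change
    unused = max(0, allowance + carried_in - used)
    return min(unused, 2 * allowance)


if __name__ == "__main__":
    # Pro (20 sessions/month), three months with no optimization runs
    carried = 0
    for month in range(1, 4):
        carried = next_rollover(allowance=20, carried_in=carried, used=0)
        print(month, carried)  # 20, then 40, then capped at 40
```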
Playground request: One inference call made in the test playground. Starter is limited to 100 playground requests per day; Pro, Team, and Enterprise have unlimited playground access.
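The Starter playground limit amounts to a counter that resets each day. The sketch below models that rule; it assumes the day boundary is UTC midnight, which this page does not specify, and `PlaygroundQuota` is a hypothetical class, not a RunInfra client.

```python
from datetime import date, datetime, timezone
from typing import Optional

STARTER_DAILY_LIMIT = 100  # playground requests per day on Starter


class PlaygroundQuota:
    """Hypothetical per-day counter for playground requests.

    Assumption: the daily window resets at UTC midnight.
    """

    def __init__(self, limit: Optional[int] = STARTER_DAILY_LIMIT) -> None:
        self.limit = limit  # None means unlimited (Pro, Team, Enterprise)
        self.day: Optional[date] = None
        self.count = 0

    def allow(self, now: Optional[datetime] = None) -> bool:
        """Return True and count the request if it fits in today's quota."""
        now = now or datetime.now(timezone.utc)
        today = now.astimezone(timezone.utc).date()
        if today != self.day:  # new UTC day: reset the counter
            self.day, self.count = today, 0
        if self.limit is not None and self.count >= self.limit:
            return False
        self.count += 1
        return True


if __name__ == "__main__":
    quota = PlaygroundQuota()
    noon = datetime(2025, 1, 1, 12, tzinfo=timezone.utc)
    print(sum(quota.allow(noon) for _ in range(105)))  # prints 100
```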
Tokens: Input tokens (your prompt) and output tokens (the model’s response) are counted and billed separately at the rates shown in the token pricing table above.

Manage your plan

Upgrade or downgrade at Settings > Billing. Track your current optimization session usage and token consumption at Settings > Usage.

Next steps

Quickstart

Create an account and deploy your first endpoint in 5 minutes.

GPU tiers and pricing

How GPU selection and deployment mode affect per-token cost.

Deployment modes

Flex scale-to-zero and Active always-on endpoints.

FAQ

Answers to the most common questions about RunInfra.