When you deploy a pipeline, RunInfra provisions it on one of three targets. The default is managed RunPod Serverless because it covers most workloads with the least operational burden. The other two exist for teams that need data residency, custom hardware, or their own cost contract.Documentation Index
Fetch the complete documentation index at: https://runinfra.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Managed RunPod
Scale-to-zero with FlashBoot. RunInfra runs the infra, you pay per-million-tokens (Flex) or per-second + base fee (Active).
Self-hosted Modal
Run on your Modal account using your reserved GPUs. You own the infra and the bill; RunInfra deploys + manages the pipeline software.
Custom GPU
Bring a Kubernetes cluster, a bare-metal box, or a cloud GPU instance. RunInfra ships a deployable image; you handle the runtime.
Managed RunPod (default)
The fastest path. Two modes:| Mode | Cold start | Billing | Best for |
|---|---|---|---|
| Flex (Pro+) | Under 2 s with Instant Start | Per-million-tokens, scale to zero | Prototyping, bursty traffic, anything with idle gaps |
| Active (Team+) | Zero, always warm | Per-second reserved-GPU fee + lower per-token rate | Steady traffic, latency-critical workloads |
Self-hosted Modal
Run the same RunInfra pipeline against your own Modal account. When to pick this:- You already have reserved GPU capacity on Modal
- You want the billing and audit trail to live in your account
- You need a specific region not offered on managed RunPod
- Compliance requires your data to never leave a specific cloud account
- Your Modal account, billing, and any reserved-capacity contracts
- IAM and access scoping inside Modal
- Region selection and quota
- Pipeline software (serving stack, optimizations, version pinning)
- Routing and rate limiting on the OpenAI-compatible gateway
- Optimization and re-optimization runs
- Monitoring and audit logs surfaced into the RunInfra dashboard
Custom GPU
Bring your own infrastructure. Bare metal, on-prem, a cloud GPU instance, or a Kubernetes cluster. When to pick this:- Strict data residency (air-gapped, regulated industry)
- Existing GPU contract you don’t want to leave (Crusoe, Lambda, CoreWeave, an enterprise NVIDIA contract)
- Workload size that justifies dedicated hardware
- A deployable container image of your optimized pipeline
- The
serve.shboot script, the model artifacts, the runtime config - A
Dockerfileand adocker-compose.yamlfor local validation - OpenAI-compatible HTTP endpoint exposed by the container
- Hardware, networking, storage, observability
- Container orchestration (Docker, Kubernetes, systemd, whatever you run)
- Rolling updates and rollback
How to choose
Pricing
- Managed RunPod (Flex): per-million-tokens, see GPU pricing for current rates
- Managed RunPod (Active): lower per-million-tokens plus a per-second reserved fee, see GPU pricing
- Self-hosted Modal: your Modal bill plus a RunInfra platform fee, billed monthly
- Custom GPU: your infrastructure cost plus a RunInfra platform fee, billed monthly