api.runinfra.ai gateway, and meters usage at zero cost on Pro and above.
Managed RunInfra Cloud
Full custom-image optimization, billed by RunInfra per token. The default, and the only option for image, speech, and voice models.
Your RunPod
Runs on your RunPod account with env-level optimization. You pay RunPod for the GPU; RunInfra inference is free on Pro and above.
When to pick this
- You already have RunPod credits, quota, or a negotiated rate and want the GPU bill on your own account.
- Cost control: you set the scaling, the idle behavior, and the spend ceiling in your RunPod console.
- Account ownership: the serverless endpoint lives in your RunPod console where you can see it, audit it, and remove it at any time.
What it supports, honestly
| | Managed | Your RunPod |
|---|---|---|
| Modalities | LLM, embedding, image, speech, voice | LLM and embedding only |
| Optimization | Full custom image (kernels, quantized artifacts, tuned runtime) | Lighter env-level optimization on a stock RunPod template |
| GPU billing | RunInfra per-token | Your RunPod account, at RunPod’s rates |
| RunInfra inference charge | Per-token | Free (Pro and above) |
| Tier | Flex or Active | Flex only (scaling is yours to manage on RunPod) |
| Plan requirement | Pro+ | Pro+ |
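The constraints in the table can be read as a pre-deploy check. The sketch below is illustrative only and is not RunInfra's implementation: the function name `check_your_runpod_eligibility`, its parameters, the lowercase modality strings, and the order of the checks are all assumptions. Only the `byoc_*` error codes come from the Troubleshooting table on this page.

```python
from typing import Optional

# Modalities the table lists for Your RunPod (string spellings are assumed).
YOUR_RUNPOD_MODALITIES = {"llm", "embedding"}


def check_your_runpod_eligibility(
    modality: str,
    is_pro_or_above: bool,
    runpod_key_linked: bool,
) -> Optional[str]:
    """Return a byoc_* error code, or None when the deploy can proceed.

    Hypothetical helper: the order of the checks is an assumption.
    """
    if not is_pro_or_above:
        return "byoc_plan_required"
    if modality not in YOUR_RUNPOD_MODALITIES:
        # Image, speech, and voice pipelines need the managed custom image.
        return "byoc_modality_unsupported"
    if not runpod_key_linked:
        return "byoc_runpod_not_linked"
    return None
```

For example, an image pipeline on a Pro workspace with a linked key would come back as `byoc_modality_unsupported`, which matches the table: image, speech, and voice models are managed-only.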
How it works
Connect your RunPod account
In the deploy tab, pick Your RunPod as the Destination and paste a RunPod API key (create one in the RunPod console under Settings, API Keys). The key is validated live against your account, then encrypted with AES-256-GCM. It is never shown again, never logged, and only decrypted server-side at provision time.
Deploy
Click Deploy as usual. RunInfra provisions a serverless endpoint on your RunPod account with the optimized serving configuration. The endpoint appears in your own RunPod console.
Use the same gateway
Your API keys, SDK snippets, and api.runinfra.ai URLs work unchanged. Inference is proxied to your endpoint using your stored key; usage (tokens, latency) is recorded for analytics at zero cost.
Key handling and security
- The key you paste is validated against RunPod, encrypted at rest, and never returned to the browser.
- RunInfra’s frontend cannot decrypt it. Only the deployment engine decrypts it, at provision, status, teardown, and inference time, after verifying the credential belongs to your workspace.
- The inference proxy will only sign inference requests to your endpoint with your key; it refuses every other RunPod operation.
- Re-linking a key for the same RunPod account just rotates it. Switching to a different RunPod account is blocked while deployments are live on the current one.
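The re-linking rule in the last bullet can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the deployment engine's code: `LinkedCredential`, `relink_decision`, the account identifier field, and the returned labels are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedCredential:
    runpod_account_id: str  # hypothetical identifier for the linked RunPod account
    live_deployments: int   # deployments currently live on that account


def relink_decision(current: Optional[LinkedCredential], new_account_id: str) -> str:
    """Classify a key submission as 'link', 'rotate', 'switch', or 'blocked'."""
    if current is None:
        return "link"  # nothing on file yet
    if current.runpod_account_id == new_account_id:
        return "rotate"  # same RunPod account: the stored key is replaced
    if current.live_deployments > 0:
        return "blocked"  # different account while deployments are live
    return "switch"  # different account, nothing live on the old one
```

Under these assumptions, submitting a new key for the same account always yields `rotate`, while a key for another account yields `blocked` until the live deployments on the current account are torn down.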
Billing details
- RunPod bills you for GPU time at your account’s rates. RunInfra never adds a margin on top of it.
- RunInfra bills nothing per inference on Your-RunPod deploys for Pro and above. Optimization runs are billed normally.
- If your workspace drops below Pro, Your-RunPod inference is suspended with a byoc_plan_required error until you upgrade or redeploy on managed.
Troubleshooting
| Error | Meaning | Fix |
|---|---|---|
| byoc_runpod_not_linked | No RunPod key on file | Connect your RunPod account in the deploy tab |
| byoc_modality_unsupported | Pipeline needs a custom image | Deploy on managed RunInfra Cloud |
| byoc_plan_required | Workspace is below Pro | Upgrade, or redeploy on managed |
| “Reconnect RunPod in Settings” | Stored key was unlinked or rejected | Re-link the key, then redeploy |
| “Check the endpoint in your RunPod console” | Your endpoint is unreachable | Inspect or restart it from the RunPod console |
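A client could surface the fixes from this table next to the error code it receives. The sketch below is one possible way to do that and makes assumptions: `BYOC_FIXES` and `suggest_fix` are hypothetical names, and how the error code reaches your code (response field, exception attribute, and so on) is not specified on this page, so the helper takes the code as a plain string.

```python
from typing import Optional

# Error codes and fixes copied from the troubleshooting table.
BYOC_FIXES = {
    "byoc_runpod_not_linked": "Connect your RunPod account in the deploy tab",
    "byoc_modality_unsupported": "Deploy on managed RunInfra Cloud",
    "byoc_plan_required": "Upgrade, or redeploy on managed",
}


def suggest_fix(error_code: str) -> Optional[str]:
    """Return the documented fix for a byoc_* code, or None if it is not listed."""
    return BYOC_FIXES.get(error_code)
```

Returning `None` for unlisted codes keeps the helper from guessing at remedies that the table does not document.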