Deploy an optimized pipeline to your own RunPod account instead of RunInfra’s managed cloud. You keep full control of the GPU, the spend, and the account; RunInfra provisions the endpoint, serves inference through the same api.runinfra.ai gateway, and meters usage at zero cost on Pro and above.

Managed RunInfra Cloud

Full custom-image optimization, billed by RunInfra per token. The default, and the only option for image, speech, and voice models.

Your RunPod

Runs on your RunPod account with env-level optimization. You pay RunPod for the GPU; RunInfra inference is free on Pro and above.

When to pick this

  • You already have RunPod credits, quota, or a negotiated rate and want the GPU bill on your own account.
  • Cost control: you set the scaling, the idle behavior, and the spend ceiling in your RunPod console.
  • Account ownership: the serverless endpoint lives in your RunPod console where you can see it, audit it, and remove it at any time.

What it supports, honestly

| | Managed | Your RunPod |
| --- | --- | --- |
| Modalities | LLM, embedding, image, speech, voice | LLM and embedding only |
| Optimization | Full custom image (kernels, quantized artifacts, tuned runtime) | Lighter env-level optimization on a stock RunPod template |
| GPU billing | RunInfra per-token | Your RunPod account, at RunPod’s rates |
| RunInfra inference charge | Per-token | Free (Pro and above) |
| Tier | Flex or Active | Flex only (scaling is yours to manage on RunPod) |
| Plan requirement | Pro+ | Pro+ |

The optimization difference matters: managed deploys run RunInfra’s custom-built image carrying everything the optimizer produced. Your-RunPod deploys run a stock RunPod template with the optimizer’s settings applied as environment configuration. That captures the serving-level wins (engine flags, KV cache, context sizing) but not custom kernels or RunInfra-quantized local artifacts. Pipelines that need those are refused with a clear error and stay deployable on managed.
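
The eligibility rule above can be sketched as a small pre-deploy check. This is an illustration only: the `Pipeline` fields and the `check_your_runpod` function are hypothetical names, and reusing `byoc_modality_unsupported` for both the modality case and the custom-artifact case is an assumption based on the troubleshooting table, not a confirmed detail of RunInfra's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical description of an optimized pipeline; field names are
# illustrative and not RunInfra's actual schema.
@dataclass
class Pipeline:
    modality: str                         # e.g. "llm", "embedding", "image"
    needs_custom_kernels: bool = False
    needs_quantized_artifacts: bool = False

# Modalities this page lists as supported on Your RunPod.
YOUR_RUNPOD_MODALITIES = {"llm", "embedding"}

def check_your_runpod(pipeline: Pipeline) -> Optional[str]:
    """Return None if the pipeline can run on a stock template, else an error code."""
    if pipeline.modality not in YOUR_RUNPOD_MODALITIES:
        return "byoc_modality_unsupported"
    # Custom kernels and RunInfra-quantized artifacts only ship in the managed image.
    if pipeline.needs_custom_kernels or pipeline.needs_quantized_artifacts:
        return "byoc_modality_unsupported"  # assumed: same code for both cases
    return None
```

A pipeline that fails this check is not lost: per the paragraph above, it stays deployable on managed RunInfra Cloud.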

How it works

1. Connect your RunPod account

In the deploy tab, pick Your RunPod as the Destination and paste a RunPod API key (create one in the RunPod console under Settings, API Keys). The key is validated live against your account, then encrypted with AES-256-GCM. It is never shown again, never logged, and only decrypted server-side by the deployment engine when an operation needs it: provision, status, teardown, and inference.

2. Deploy

Click Deploy as usual. RunInfra provisions a serverless endpoint on your RunPod account with the optimized serving configuration. The endpoint appears in your own RunPod console.

3. Use the same gateway

Your API keys, SDK snippets, and api.runinfra.ai URLs work unchanged. Inference is proxied to your endpoint using your stored key; usage (tokens, latency) is recorded for analytics at zero cost.

4. Tear down and disconnect

Terminate the deployment from RunInfra (it is deleted on your RunPod account), then disconnect the key from the deploy tab or Settings. Disconnecting is blocked while live deployments still need the key for teardown, so nothing gets stranded.
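
The ordering constraint in the last step (terminate first, then disconnect) can be modeled in a few lines. This is a toy in-memory sketch, not RunInfra's code: the `Workspace` class, its methods, and the `DisconnectBlocked` exception are all hypothetical names chosen for illustration.

```python
class DisconnectBlocked(Exception):
    """Raised when the stored key is still needed to tear down live deployments."""

class Workspace:
    # Hypothetical in-memory model of a workspace; not RunInfra's data model.
    def __init__(self):
        self.key_linked = False
        self.live_deployments = set()

    def connect(self):
        # Stands in for validating and encrypting the pasted RunPod key.
        self.key_linked = True

    def deploy(self, name):
        if not self.key_linked:
            raise RuntimeError("byoc_runpod_not_linked")
        self.live_deployments.add(name)

    def terminate(self, name):
        # Teardown uses the stored key to delete the endpoint on RunPod.
        self.live_deployments.discard(name)

    def disconnect(self):
        # Refuse while any deployment would be stranded without the key.
        if self.live_deployments:
            raise DisconnectBlocked(sorted(self.live_deployments))
        self.key_linked = False
```

Calling `disconnect()` while a deployment is live raises `DisconnectBlocked`; after `terminate()` has removed every deployment, the same call succeeds.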

Key handling and security

  • The key you paste is validated against RunPod, encrypted at rest, and never returned to the browser.
  • RunInfra’s frontend cannot decrypt it. Only the deployment engine decrypts it, at provision, status, teardown, and inference time, after verifying the credential belongs to your workspace.
  • The inference proxy will only sign inference requests to your endpoint with your key; it refuses every other RunPod operation.
  • Re-linking a key for the same RunPod account just rotates it. Switching to a different RunPod account is blocked while deployments are live on the current one.
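
The re-linking rule in the last bullet reduces to a small decision function. The sketch below is one reading of that rule; the function name, the account identifiers, and the returned labels are hypothetical and only meant to make the cases explicit.

```python
from typing import Optional

def relink_decision(current_account: Optional[str],
                    new_account: str,
                    live_deployments: int) -> str:
    """Classify what pasting a new key would do, per the rules listed above."""
    if current_account is None:
        return "link"        # nothing linked yet, so this is a first connection
    if new_account == current_account:
        return "rotate"      # same RunPod account: the stored key is replaced
    if live_deployments > 0:
        return "blocked"     # different account while deployments are live
    return "switch"          # different account and nothing live on the old one
```

Note that rotation is allowed even with live deployments, since the key still belongs to the account that owns the endpoints.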

Billing details

  • RunPod bills you for GPU time at your account’s rates. RunInfra never adds a margin on top of it.
  • RunInfra bills nothing per inference on Your-RunPod deploys for Pro and above. Optimization runs are billed normally.
  • If your workspace drops below Pro, Your-RunPod inference is suspended with a byoc_plan_required error until you upgrade or redeploy on managed.
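
As a rough illustration of how the two bills combine, the sketch below estimates the total for a Your-RunPod deployment. The plan names and their ordering in `PLAN_RANK`, the function name, and any rate you pass in are assumptions for the example; actual GPU rates come from your RunPod account, and optimization runs (billed normally) are not included.

```python
# Hypothetical plan ladder; only "Pro and above" is taken from this page.
PLAN_RANK = {"free": 0, "pro": 1, "enterprise": 2}

def estimate_your_runpod_bill(plan: str, gpu_seconds: float,
                              runpod_rate_per_second: float) -> dict:
    """Split an estimated bill into the RunPod part and the RunInfra part."""
    if PLAN_RANK[plan] < PLAN_RANK["pro"]:
        # Below Pro, Your-RunPod inference is suspended rather than billed.
        raise PermissionError("byoc_plan_required")
    return {
        "runpod_gpu": gpu_seconds * runpod_rate_per_second,  # paid to RunPod, no margin added
        "runinfra_inference": 0.0,                           # free on Pro and above
    }
```

For example, `estimate_your_runpod_bill("pro", 3600, 0.0005)` attributes the whole charge to RunPod GPU time and nothing to RunInfra inference.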

Troubleshooting

| Error | Meaning | Fix |
| --- | --- | --- |
| byoc_runpod_not_linked | No RunPod key on file | Connect your RunPod account in the deploy tab |
| byoc_modality_unsupported | Pipeline needs a custom image | Deploy on managed RunInfra Cloud |
| byoc_plan_required | Workspace is below Pro | Upgrade, or redeploy on managed |
| “Reconnect RunPod in Settings” | Stored key was unlinked or rejected | Re-link the key, then redeploy |
| “Check the endpoint in your RunPod console” | Your endpoint is unreachable | Inspect or restart it from the RunPod console |

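
If you handle these errors in client code, a lookup table keeps the remediation text in one place. This is a minimal sketch: the three codes and their fixes are copied from the table above, while the `suggest_fix` helper and the assumption that your client receives the code as a plain string are hypothetical.

```python
# Error codes and fixes as listed in the troubleshooting table.
BYOC_FIXES = {
    "byoc_runpod_not_linked": "Connect your RunPod account in the deploy tab",
    "byoc_modality_unsupported": "Deploy on managed RunInfra Cloud",
    "byoc_plan_required": "Upgrade, or redeploy on managed",
}

def suggest_fix(error_code: str) -> str:
    """Map a Your-RunPod error code to the documented fix, with a fallback."""
    return BYOC_FIXES.get(error_code, "Unrecognized error; see Troubleshooting")
```

The two quoted messages in the table are human-readable prompts rather than codes, so they are left out of the mapping.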
Related: Deployment targets, GPU pricing, Plans.