Deploy to your own RunPod

Deploy an optimized pipeline to your own RunPod account instead of RunInfra’s managed cloud. You keep full control of the GPU, the spend, and the account; RunInfra provisions the endpoint, serves inference through the same api.runinfra.ai gateway, and meters usage at zero cost on Core and above.

Managed RunInfra Cloud

Full custom-image optimization, billed by RunInfra per token. The default, and the only option for image, speech, and voice models.

Your RunPod

Runs on your RunPod account with env-level optimization. You pay RunPod for the GPU; RunInfra inference is free on Core and above.

When to pick this

You already have RunPod credits, quota, or a negotiated rate and want the GPU bill on your own account.
Cost control: you set the scaling, the idle behavior, and the spend ceiling in your RunPod console.
Account ownership: the serverless endpoint lives in your RunPod console where you can see it, audit it, and remove it at any time.

What it supports, honestly

	Managed	Your RunPod
Modalities	LLM, embedding, image, speech, voice	LLM and embedding only
Optimization	Full custom image (kernels, quantized artifacts, tuned runtime)	Lighter env-level optimization on a stock RunPod template
GPU billing	RunInfra per-token	Your RunPod account, at RunPod’s rates
RunInfra inference charge	Per-token	Free (Core and above)
Tier	Flex or Active	Flex only (scaling is yours to manage on RunPod)
Plan requirement	Core	Core

The optimization difference matters: managed deploys run RunInfra’s custom-built image carrying everything the optimizer produced. Your-RunPod deploys run a stock RunPod template with the optimizer’s settings applied as environment configuration. That captures the serving-level wins (engine flags, KV cache, context sizing) but not custom kernels or RunInfra-quantized local artifacts. Pipelines that need those are refused with a clear error and stay deployable on managed.
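The refusal rule above can be sketched as a small pre-deploy check. This is an illustration only, not RunInfra's implementation: the `Pipeline` fields and the `byoc_refusal` helper are hypothetical names, and returning `byoc_modality_unsupported` for both the modality case and the custom-image case is an assumption based on the troubleshooting table.

```python
from dataclasses import dataclass
from typing import Optional

# Modalities the "Your RunPod" column lists as supported.
BYOC_MODALITIES = {"llm", "embedding"}


@dataclass
class Pipeline:
    # Hypothetical fields, chosen for illustration only.
    modality: str
    needs_custom_kernels: bool = False
    needs_quantized_artifacts: bool = False


def byoc_refusal(pipeline: Pipeline) -> Optional[str]:
    """Return an error code if the pipeline must stay on managed, else None."""
    if pipeline.modality not in BYOC_MODALITIES:
        # Assumption: unsupported modalities surface the same error code.
        return "byoc_modality_unsupported"
    if pipeline.needs_custom_kernels or pipeline.needs_quantized_artifacts:
        # These only ship inside the managed custom image.
        return "byoc_modality_unsupported"
    return None
```

A pipeline that passes this check relies only on serving-level settings, which is what the stock RunPod template can carry as environment configuration.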

How it works

Connect your RunPod account

In the deploy tab, pick Your RunPod as the Destination and paste a RunPod API key (create one in the RunPod console under Settings, API Keys). The key is validated live against your account, then encrypted with AES-256-GCM. It is never shown again, never logged, and decrypted only server-side, when the deployment engine needs it to provision, check status, tear down, or serve inference.

Deploy

Click Deploy as usual. RunInfra provisions a serverless endpoint on your RunPod account with the optimized serving configuration. The endpoint appears in your own RunPod console.

RunInfra sizes the readiness window to the model and includes time for a new account or host to pull the runtime and uncached weights. Larger models receive longer startup budgets, which reduces false deployment timeouts during a healthy cold start.

Use the same gateway

Your API keys, SDK snippets, and api.runinfra.ai URLs work unchanged. Inference is proxied to your endpoint using your stored key; usage (tokens, latency) is recorded for analytics at zero cost.

Tear down and disconnect

Terminate the deployment from RunInfra (it is deleted on your RunPod account), then disconnect the key from the deploy tab or Settings. Disconnecting is blocked while live deployments still need the key for teardown, so nothing gets stranded.

Key handling and security

The key you paste is validated against RunPod, encrypted at rest, and never returned to the browser.
RunInfra’s frontend cannot decrypt it. Only the deployment engine decrypts it, at provision, status, teardown, and inference time, after verifying the credential belongs to your workspace.
The inference proxy uses your key only to sign inference requests to your endpoint; it refuses every other RunPod operation.
Re-linking a key for the same RunPod account just rotates it. Switching to a different RunPod account is blocked while deployments are live on the current one.
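The linking rules (rotate on the same account, block an account switch or a disconnect while deployments are live) can be modeled as a small state object. This is a minimal sketch under stated assumptions: `RunPodLink`, `LinkBlocked`, and the field names are hypothetical, and the real service stores an encrypted key rather than a version counter.

```python
from dataclasses import dataclass
from typing import Optional


class LinkBlocked(Exception):
    """Raised when a link change would strand live deployments."""


@dataclass
class RunPodLink:
    # Hypothetical in-memory stand-in for the stored credential record.
    account_id: Optional[str] = None
    key_version: int = 0
    live_deployments: int = 0

    def link(self, account_id: str) -> str:
        if self.account_id is None:
            self.account_id, self.key_version = account_id, 1
            return "linked"
        if account_id == self.account_id:
            # Same RunPod account: the new key simply replaces the old one.
            self.key_version += 1
            return "rotated"
        if self.live_deployments > 0:
            raise LinkBlocked("terminate live deployments before switching accounts")
        self.account_id, self.key_version = account_id, 1
        return "switched"

    def disconnect(self) -> None:
        # The key is still needed for teardown while deployments are live.
        if self.live_deployments > 0:
            raise LinkBlocked("terminate live deployments before disconnecting")
        self.account_id, self.key_version = None, 0
```

Both blocked paths exist for the same reason: the stored key is the only way to tear down endpoints on the linked account.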

Billing details

RunPod bills you for GPU time at your account’s rates. RunInfra never adds a margin on top of it.
RunInfra bills nothing per inference on Your-RunPod deploys for Core and above. Optimization runs are billed normally.
If your workspace drops below Core, Your-RunPod inference is suspended with a byoc_plan_required error until you upgrade or redeploy on managed.
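The plan gate in the last point can be sketched as a single comparison. Only the Core plan and the `byoc_plan_required` code come from this page; the other plan names, the ranking, and the `your_runpod` destination label are hypothetical placeholders.

```python
from typing import Optional

# Only "core" comes from the text; the other plan names are hypothetical.
PLAN_RANK = {"free": 0, "core": 1, "scale": 2}


def check_byoc_inference(plan: str, destination: str) -> Optional[str]:
    """Return an error code if Your-RunPod inference is suspended, else None.

    Other destinations are outside the scope of this check.
    """
    if destination == "your_runpod" and PLAN_RANK[plan] < PLAN_RANK["core"]:
        return "byoc_plan_required"
    return None
```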

Troubleshooting

Error	Meaning	Fix
`byoc_runpod_not_linked`	No RunPod key on file	Connect your RunPod account in the deploy tab
`byoc_modality_unsupported`	Pipeline needs a custom image	Deploy on managed RunInfra Cloud
`byoc_plan_required`	Workspace is below Core	Upgrade, or redeploy on managed
“Reconnect RunPod in Settings”	Stored key was unlinked or rejected	Re-link the key, then redeploy
“Check the endpoint in your RunPod console”	Your endpoint is unreachable	Inspect or restart it from the RunPod console
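If you handle these errors in client code, the three error codes in the table can be mapped to their fixes directly. This is a sketch: the `suggest_fix` helper is hypothetical, and how the error code reaches your code (response body, exception, log line) is not specified on this page.

```python
# Error codes and fixes copied from the troubleshooting table.
FIXES = {
    "byoc_runpod_not_linked": "Connect your RunPod account in the deploy tab",
    "byoc_modality_unsupported": "Deploy on managed RunInfra Cloud",
    "byoc_plan_required": "Upgrade, or redeploy on managed",
}


def suggest_fix(error_code: str) -> str:
    """Return the documented fix for a known code, or a generic fallback."""
    return FIXES.get(error_code, "Unrecognized error code; check the troubleshooting table")
```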

Related: Deployment targets, GPU pricing, Plans.
