RunInfra/Docs

Troubleshooting

Common issues and how to fix them.

Chat and pipeline building

The agent keeps asking questions instead of building

Say: "Just go with your best recommendation and we'll iterate from there."

The agent picked the wrong model

Say the exact model you want: "Switch to Llama 3.1 8B."

I want to start over

Say: "Reset the pipeline and start from scratch."

Optimization

Optimization is taking too long

Optimization typically takes 2-5 minutes. If it seems stuck, ask: "What's the status of the optimization?"

Results don't meet my constraints

Ask the agent to try a different approach:

  • "Optimize again with a smaller model"
  • "Try a faster GPU"
  • "Relax the latency constraint to 300ms"

Quality score is too low with the optimized variant

Try a higher-precision variant: "Search for an FP8 version instead of AWQ 4-bit." FP8 preserves more quality than 4-bit quantization, especially on H100/H200 GPUs.

Deployment

Deployment failed

The agent shows error diagnostics. Common fixes:

  • "Try a different GPU tier" (current GPU may be unavailable)
  • "The model might be too large for this GPU. Recommend something bigger."

First request is slow (30-60 seconds)

This is normal for the very first request after deployment, while the model loads and compiles. After that, requests are fast: even a cold start takes under 2 seconds thanks to RunInfra Cloud's weight caching.

Endpoint returns 503

The endpoint is stopped or still provisioning. Check its status on the Deployments page or ask: "What's the status of my deployment?"
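If the endpoint is still provisioning, a client can poll until the 503 clears rather than failing straight away. The following is a minimal sketch, not a RunInfra API: `wait_until_ready` and its `probe` argument are hypothetical names, and `probe` stands in for whatever function in your code sends a request to the endpoint and returns the HTTP status code.

```python
import time


def wait_until_ready(probe, timeout_s=120.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns something other than 503.

    probe is a hypothetical caller-supplied function that sends one
    request to the endpoint and returns the HTTP status code.
    """
    deadline = clock() + timeout_s
    while True:
        status = probe()
        if status != 503:
            return status
        if clock() >= deadline:
            # Polling only helps while provisioning; a stopped
            # endpoint will keep returning 503 until it is restarted.
            raise TimeoutError("endpoint still returns 503; it may be stopped")
        sleep(interval_s)
```

A timeout here suggests the endpoint is stopped rather than provisioning, in which case check the deployment status as described above.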

API integration

401 Unauthorized

Your API key is invalid, revoked, or missing. Check:

  • Key starts with ri_
  • Authorization: Bearer ri_your_key header is present
  • Key hasn't been revoked
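The first two checks can be done in client code before any request is sent. This is a minimal sketch: the helper name `build_auth_headers` is hypothetical, and it can only verify the key's shape, not whether the key has been revoked.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header for a RunInfra API key.

    Hypothetical helper: it checks only what can be checked locally
    (key present, ri_ prefix). A revoked key still passes this check.
    """
    key = (api_key or "").strip()
    if not key:
        raise ValueError("API key is missing")
    if not key.startswith("ri_"):
        raise ValueError("API key should start with 'ri_'")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dictionary as the headers of each request. If the call still returns 401 after this check passes, the key has most likely been revoked.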

403 Forbidden

The API key doesn't match the pipeline ID in the URL. Each key is scoped to one pipeline.

429 Too Many Requests

Rate limit exceeded. Wait for the period given in the Retry-After header, then retry. If you hit the limit regularly, increase your key's rate limit at Settings > API Keys.
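The wait-then-retry step can be automated in the client. The sketch below is illustrative only: `call_with_retry`, `retry_after_seconds`, and the `send` callable are hypothetical names, and `send` is assumed to perform one request and return a `(status, headers, body)` tuple. It accepts both forms of Retry-After that HTTP allows (a number of seconds or an HTTP date), since this page does not say which one RunInfra sends.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=1.0):
    """Convert a Retry-After header value into seconds to wait."""
    if value is None:
        return default
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "2"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() again after a 429, waiting as the server asks.

    send is a hypothetical caller-supplied function returning
    (status, headers, body) for a single request.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(retry_after_seconds(headers.get("Retry-After")))
```

Capping the number of attempts keeps a persistently rate-limited client from looping forever; the final 429 is returned to the caller to handle.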

Upgrade required (403 with upgrade prompt)

You hit a plan limit. Upgrade at Settings > Billing.
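For client-side logging, the status codes covered on this page can be collected into a small lookup. This is only a sketch: `HINTS` and `troubleshooting_hint` are hypothetical names, and the text simply restates the guidance above. Because this page does not describe how the two kinds of 403 differ in the response, the hint lists both causes.

```python
# Hypothetical lookup restating this page's guidance per status code.
HINTS = {
    401: "API key is invalid, revoked, or missing. Check the ri_ prefix "
         "and the Authorization: Bearer header.",
    403: "API key does not match the pipeline ID in the URL, or a plan "
         "limit was hit (upgrade at Settings > Billing).",
    429: "Rate limit exceeded. Wait for the Retry-After period, then retry.",
    503: "Endpoint is stopped or still provisioning. Check the deployment status.",
}


def troubleshooting_hint(status: int) -> str:
    """Return the troubleshooting hint for an HTTP status code."""
    return HINTS.get(status, "Not covered by this troubleshooting guide.")
```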

Still stuck?

  • Free plan: Community support
  • Pro: Priority email support
  • Team: Shared Slack channel
  • Enterprise: Dedicated customer success manager

Send feedback from within the app or email support from your billing page.
