
Debugging Prompts

When things go wrong and how to guide the agent back on track.

Sometimes the agent misunderstands a request or makes a suboptimal choice. Here's how to course-correct.

The agent picked the wrong model

User: I said Llama, not Mistral. Switch to Llama 3.1 8B.

Be direct. Name the exact model you want.

Optimization results are bad

If the results don't meet your expectations:

User: The latency is still too high. Try a faster GPU.
User: Can you try TensorRT-LLM instead of vLLM?
User: Optimize again but prioritize latency over cost.

The agent re-runs optimization with your new parameters. Each run creates a new version you can compare.

The agent is asking too many questions

If the agent keeps asking for clarification instead of building:

User: Just go with your best recommendation and we'll iterate from there.

This tells the agent to make decisions and move forward.

The pipeline is too complex

If the agent added nodes you don't need:

User: Remove the guardrail and the load balancer. 
I just need the model and a cache.

Optimization is taking too long

Optimization typically takes 2-5 minutes. If it seems stuck:

User: What's the status of the optimization?

The agent will show you current progress.

Deployment failed

If deployment fails, the agent shows error diagnostics. Common fixes:

User: Try deploying on a different GPU tier.
User: The model might be too large for this GPU. What do you recommend?

The endpoint is slow on first request

The first request after a cold start takes 1-2 seconds on RunInfra Cloud. This is normal for scale-to-zero endpoints. Subsequent requests are fast.

If cold starts are unacceptable:

User: Switch to always-on deployment so there's no cold start.

(Requires Team plan.)
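Before switching to always-on, it can help to confirm that cold starts are actually the cause of the slowdown. A minimal, framework-agnostic sketch (nothing here is a RunInfra API; `call_endpoint` is a placeholder for however you invoke your endpoint) that times the first request against subsequent warm requests:

```python
import time

def first_vs_warm(call_endpoint, warm_runs=5):
    """Time the first (possibly cold-start) request, then average warm requests.

    call_endpoint: zero-argument callable that performs one request.
    Returns (cold_ms, avg_warm_ms).
    """
    start = time.perf_counter()
    call_endpoint()                      # first request may hit a cold start
    cold_ms = (time.perf_counter() - start) * 1000

    warm = []
    for _ in range(warm_runs):           # subsequent requests should be fast
        start = time.perf_counter()
        call_endpoint()
        warm.append((time.perf_counter() - start) * 1000)

    return cold_ms, sum(warm) / len(warm)
```

If the first request is 1-2 seconds and warm requests are fast, you're seeing normal scale-to-zero behavior rather than a slow model.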

General debugging tips

  • Be specific: "It's broken" doesn't help. "Latency is 500ms but I need under 100ms" does.
  • Ask the agent to explain: "Why did you pick this GPU?" or "Why is this quantization method better?"
  • Compare versions: "Compare version 1 and version 3" to see what changed.
  • Start over if needed: "Reset the pipeline and let's start from scratch."
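Being specific about latency is easier when you've measured it. A small sketch for gathering the numbers to quote back to the agent (again framework-agnostic; `call_endpoint` stands in for your client call), reporting p50 and p95 over repeated requests:

```python
import statistics
import time

def measure_latency(call_endpoint, warmup=1, runs=20):
    """Time repeated calls and report p50/p95 latency in milliseconds.

    call_endpoint: zero-argument callable that performs one request.
    warmup requests are discarded so cold starts don't skew the numbers.
    """
    for _ in range(warmup):
        call_endpoint()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call_endpoint()
        samples.append((time.perf_counter() - start) * 1000)

    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }
```

With these numbers in hand, "p95 is 500ms but I need under 100ms" gives the agent a concrete target to optimize against.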
