Debugging Prompts
What to do when things go wrong, and how to guide the agent back on track.
Sometimes the agent misunderstands or picks suboptimal options. Here's how to course-correct.
The agent picked the wrong model
User: I said Llama, not Mistral. Switch to Llama 3.1 8B.
Be direct. Name the exact model you want.
Optimization results are bad
If the results don't meet your expectations:
User: The latency is still too high. Try a faster GPU.
User: Can you try TensorRT-LLM instead of vLLM?
User: Optimize again but prioritize latency over cost.
The agent re-runs optimization with your new parameters. Each run creates a new version you can compare.
The agent is asking too many questions
If the agent keeps asking for clarification instead of building:
User: Just go with your best recommendation and we'll iterate from there.
This tells the agent to make decisions and move forward.
The pipeline is too complex
If the agent added nodes you don't need:
User: Remove the guardrail and the load balancer. I just need the model and a cache.
Optimization is taking too long
Optimization typically takes 2-5 minutes. If it seems stuck:
User: What's the status of the optimization?
The agent will show you current progress.
Deployment failed
If deployment fails, the agent shows error diagnostics. Common fixes:
User: Try deploying on a different GPU tier.
User: The model might be too large for this GPU. What do you recommend?
The endpoint is slow on first request
The first request after a cold start takes 1-2 seconds on RunInfra Cloud. This is normal for scale-to-zero endpoints. Subsequent requests are fast.
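To confirm that a slow first request is a cold start and not a general latency problem, it can help to time several requests in a row and compare the first with the rest. The sketch below is a minimal illustration, not part of any RunInfra tooling: `time_requests` and `summarize_cold_start` are hypothetical helper names, and `send_request` stands for whatever function you already use to call your endpoint.

```python
import time
from typing import Callable, List


def time_requests(send_request: Callable[[], object], count: int = 5) -> List[float]:
    """Call send_request `count` times and return each call's duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        send_request()  # hypothetical: your own function that calls the endpoint
        durations.append(time.perf_counter() - start)
    return durations


def summarize_cold_start(durations: List[float]) -> str:
    """Compare the first (possibly cold) request with the average of the rest."""
    if len(durations) < 2:
        raise ValueError("need at least two timed requests")
    first = durations[0]
    warm = sum(durations[1:]) / len(durations[1:])
    return f"first request: {first * 1000:.0f} ms, warm average: {warm * 1000:.0f} ms"
```

If only the first request is slow and the warm average is low, the delay is most likely the cold start described above. If every request is slow, the problem is elsewhere.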
If cold starts are unacceptable:
User: Switch to always-on deployment so there's no cold start.
(Requires Team plan.)
General debugging tips
- Be specific: "It's broken" doesn't help. "Latency is 500ms but I need under 100ms" does.
- Ask the agent to explain: "Why did you pick this GPU?" or "Why is this quantization method better?"
- Compare versions: "Compare version 1 and version 3" to see what changed.
- Start over if needed: "Reset the pipeline and let's start from scratch."
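One way to follow the "be specific" tip is to turn your own latency measurements into a concrete number before messaging the agent. The sketch below assumes you have already collected latency samples in milliseconds. The helper names `percentile` and `latency_report` are illustrative, and reporting the 95th percentile is an assumption on our part, not something the agent requires.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples, with pct in (0, 100]."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def latency_report(samples_ms: Sequence[float], target_ms: float) -> str:
    """Build a specific statement of measured vs. required latency."""
    p95 = percentile(samples_ms, 95)
    return f"p95 latency is {p95:.0f}ms but I need under {target_ms:.0f}ms"
```

The resulting sentence can be pasted into the chat as is, giving the agent both the measured value and the target to optimize toward.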