RunInfra’s agent gets most things right when you give it clear instructions, but sometimes it picks the wrong model, produces optimization results that don’t meet your requirements, or builds a more complex pipeline than you need. Every problem below has a straightforward fix, usually a single follow-up message that redirects the agent without losing your progress.
The agent picked the wrong model
Be direct. Name the exact model you want and tell the agent to switch. The agent will swap the model and preserve the rest of your pipeline configuration. To be unambiguous, you can use the @ mention syntax.
Optimization results are bad
If the results don’t meet your expectations, re-run optimization with more specific instructions. Each run creates a new version you can compare against previous ones.
The agent keeps asking clarifying questions
If the agent asks for clarification instead of building, tell it to proceed with its best judgment. That prompts the agent to make decisions and move forward. You can always refine the result afterward, which is usually faster than answering several questions upfront.
The pipeline is too complex
If the agent added nodes you don’t need, tell it exactly what to remove. The agent will simplify the pipeline to match your description. Being explicit about what you want to keep (“I just need the model and a cache”) helps the agent understand the target state, not just what to remove.
Optimization is taking too long
Optimization typically completes in 2-5 minutes. If it appears stuck, ask for a status update. The agent will show you current progress. If the run has genuinely stalled, you can ask it to restart with a smaller search space, for example by limiting it to a single GPU tier or fewer quantization options.
Deployment failed
When deployment fails, the agent surfaces error diagnostics automatically. If the error message mentions an out-of-memory condition, the model likely exceeds the VRAM available on your selected GPU. Asking the agent for a recommendation gives it a chance to suggest a larger GPU tier or a quantized model variant that fits.
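To see why an out-of-memory error points toward a larger GPU tier or a quantized variant, a rough back-of-the-envelope estimate can help. The sketch below is not part of RunInfra. The function names, the 20% overhead allowance, and the example model and GPU sizes are all illustrative assumptions. Real memory use also depends on activations, KV cache, batch size, and the serving runtime.

```python
def estimate_weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits_on_gpu(
    params_billion: float,
    bits_per_param: int,
    gpu_vram_gb: float,
    overhead_fraction: float = 0.2,  # assumed allowance for runtime overhead
) -> bool:
    """Return True if weights plus the assumed overhead fit in the given VRAM."""
    needed_gb = estimate_weight_gb(params_billion, bits_per_param) * (1 + overhead_fraction)
    return needed_gb <= gpu_vram_gb


if __name__ == "__main__":
    # Hypothetical 13B-parameter model on a hypothetical 24 GB GPU.
    for bits in (16, 8, 4):
        weights = estimate_weight_gb(13, bits)
        verdict = "fits" if fits_on_gpu(13, bits, 24) else "does not fit"
        print(f"{bits}-bit weights: about {weights:.1f} GB, {verdict} in 24 GB")
```

Under these assumptions, halving the bits per parameter halves the weight footprint, which is why a quantized variant can fit on a GPU where the full-precision model runs out of memory.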
The endpoint is slow on the first request
The first request after a cold start takes 1-2 seconds on RunInfra Cloud. This is expected behavior for scale-to-zero endpoints: the GPU spins up on demand, and subsequent requests are fast. If cold starts are unacceptable for your use case, switch to always-on deployment.
Always-on deployment requires the Team plan.
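Before switching deployment modes, it can be worth confirming that the slowness really is limited to the first request. The following is a minimal client-side sketch, not a RunInfra API. `send_request` stands in for whatever call you make to your endpoint, and the 3x ratio threshold is an arbitrary assumption you should tune.

```python
import statistics
import time
from typing import Callable, List


def time_requests(send_request: Callable[[], object], count: int = 5) -> List[float]:
    """Call send_request `count` times in a row and return each duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        send_request()
        durations.append(time.perf_counter() - start)
    return durations


def looks_like_cold_start(durations: List[float], ratio: float = 3.0) -> bool:
    """True if the first request is much slower than the median of the rest."""
    if len(durations) < 2:
        return False
    warm_median = statistics.median(durations[1:])
    return durations[0] > ratio * warm_median


if __name__ == "__main__":
    # Stand-in for a real endpoint call: slow on the first call, fast afterwards.
    state = {"calls": 0}

    def fake_endpoint() -> None:
        state["calls"] += 1
        time.sleep(0.05 if state["calls"] == 1 else 0.005)

    timings = time_requests(fake_endpoint)
    print("cold start suspected:", looks_like_cold_start(timings))
```

If only the first timing stands out, the pattern matches the cold-start behavior described above. If every request is slow, the cause is more likely the model or GPU configuration, and asking the agent about latency is the better next step.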
General debugging tips
When nothing above resolves the issue, these four approaches cover most remaining cases:
- Be specific about what’s wrong. “It’s broken” doesn’t give the agent anything to act on. “Latency is 500ms but I need under 100ms” does.
- Ask the agent to explain its decisions. “Why did you pick this GPU?” or “Why is this quantization method better?” surfaces the agent’s reasoning so you can correct any wrong assumptions.
- Compare versions. “Compare version 1 and version 3” shows you exactly what changed and which configuration performs better on your target metrics.
- Start over if needed. If the pipeline has drifted too far from what you want, it’s sometimes faster to reset than to keep patching. Say “Reset the pipeline and let’s start from scratch” and give the agent a cleaner, more specific prompt the second time.
Next steps
Prompting best practices
Write prompts that avoid the problems on this page entirely.
Troubleshooting
Fix pipeline, deployment, and API integration issues by category.
Optimization
Understand how constraints and priority affect ranked results.
Deployment
Flex scale-to-zero and Active always-on endpoints.