
Prompting Best Practices

How to talk to RunInfra's agent to get the best results.

RunInfra's agent builds your pipeline from natural language. The more specific you are, the better the result. Here's how to write prompts that work.

Include these four things

The best prompts cover:

  1. What you're building: the use case (chatbot, summarizer, translator, code generation)
  2. Which model: name a model or describe what you need ("a fast 7B model")
  3. What matters most: latency, cost, throughput, or quality
  4. Scale: expected traffic ("100 RPM", "internal tool", "production API")

Great prompt

Deploy Mistral 7B as a customer support chatbot. Optimize for latency, 
keep cost under $0.001 per request, and target 500 requests per minute.

The agent knows exactly what to build, which model, what to optimize for, and how to size the infrastructure.

Weak prompt

Make me an AI

Too vague. The agent will ask clarifying questions, which slows things down.

Use @ to mention models

Reference specific models with @:

Optimize @Qwen-2.5-7B with AWQ quantization for a summarization API
Compare @Llama-3.1-8B and @Mistral-7B for code generation

The agent resolves these to Hugging Face model IDs automatically.
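To make the resolution step concrete, here is a minimal sketch of what an @-mention lookup could look like. The actual logic lives inside the agent, and the mapping table below is an assumption for illustration, not RunInfra's real data:

```python
# Illustrative only: a hypothetical table mapping @-mentions to
# Hugging Face model IDs. RunInfra's real resolver is internal.
MENTION_TO_HF_ID = {
    "@Qwen-2.5-7B": "Qwen/Qwen2.5-7B-Instruct",
    "@Llama-3.1-8B": "meta-llama/Llama-3.1-8B-Instruct",
    "@Mistral-7B": "mistralai/Mistral-7B-Instruct-v0.3",
}

def resolve_mentions(prompt: str) -> dict:
    """Return the Hugging Face ID for every known @-mention in a prompt."""
    return {m: hf_id for m, hf_id in MENTION_TO_HF_ID.items() if m in prompt}
```

So a prompt like "Optimize @Qwen-2.5-7B with AWQ quantization" would resolve to the `Qwen/Qwen2.5-7B-Instruct` repository on the Hub.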

Set constraints in natural language

You don't need special syntax. Just say what you need:

Keep latency under 100ms P99
Stay under $500/month total cost
Quality score must be above 0.9
I need at least 1000 requests per minute

The agent translates these into hard constraints that filter optimization results.
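A sketch of what "hard constraints that filter optimization results" means in practice. The field names (`p99_ms`, `cost_per_request`, `rpm`) and the candidate configurations are assumptions for illustration, not RunInfra's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    p99_ms: float            # P99 latency in milliseconds
    cost_per_request: float  # dollars per request
    rpm: int                 # sustained requests per minute

def meets_constraints(c: Candidate) -> bool:
    # "Keep latency under 100ms P99"            -> p99_ms < 100
    # "Keep cost under $0.001 per request"      -> cost_per_request < 0.001
    # "I need at least 1000 requests per minute" -> rpm >= 1000
    return c.p99_ms < 100 and c.cost_per_request < 0.001 and c.rpm >= 1000

# Hypothetical optimization candidates: only configs that satisfy
# every constraint survive the filter.
candidates = [
    Candidate("awq-a10g", p99_ms=85, cost_per_request=0.0008, rpm=1200),
    Candidate("fp16-a100", p99_ms=60, cost_per_request=0.002, rpm=3000),
]
viable = [c.name for c in candidates if meets_constraints(c)]
```

Here the fp16 config is faster but fails the cost ceiling, so only the AWQ config remains viable. This is why stating constraints explicitly matters: they prune the search space before you ever see results.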

Ask for what you want directly

The agent can do more than build pipelines. Just ask:

  • Change the model: "Switch to Qwen 2.5 14B"
  • Add caching: "Add a response cache"
  • Change optimization target: "Optimize for cost instead"
  • Run optimization: "Optimize now"
  • Compare versions: "Compare version 1 and 2"
  • Roll back: "Go back to version 1"
  • Deploy: click Deploy in the deploy tab
  • Change GPU: "Use an H100 for this"
  • Export code: "Generate deployment code"
  • Search models: "Find a good model for code generation under 10B params"
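As a rough illustration of the "Export code" action, the generated client might resemble the sketch below. The endpoint URL, header names, and payload fields are placeholders invented for this example, not RunInfra's real API; use the code the agent generates for your pipeline.

```python
import json

def build_request(prompt: str, max_tokens: int = 256):
    """Assemble a hypothetical request to a deployed pipeline endpoint."""
    url = "https://api.runinfra.example/v1/pipelines/my-chatbot/generate"  # placeholder URL
    headers = {
        "Authorization": "Bearer $RUNINFRA_API_KEY",  # placeholder auth scheme
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return url, headers, body
```

The point is the shape, not the specifics: exported code is a thin client around an HTTP endpoint, so you can drop it into any service that can make a POST request.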

Iterate, don't restart

You don't need to get everything right in one message. Build incrementally:

User: Build a chatbot with Llama 3.1 8B
Agent: [builds pipeline]

User: Add a cache with 1 hour TTL
Agent: [adds cache node]

User: Actually, switch to Qwen 2.5 7B, it's better for multilingual
Agent: [swaps model]

User: Optimize for latency
Agent: [runs optimization]

User: Deploy as scale-to-zero
Agent: [deploys]

Each message refines the pipeline. The agent remembers the full conversation context.

Let the agent decide when you're unsure

If you don't know which model, GPU, or quantization method to use, ask:

What model would you recommend for a low-cost translation API?
What GPU should I use for a 14B model?
Should I use AWQ or GPTQ for this?

The agent makes recommendations based on your use case, model size, and constraints.

Tips

  • Be specific about performance: "fast" is subjective. "Under 100ms P99" is measurable.
  • Mention the use case: "customer support chatbot" gives the agent context to make better decisions.
  • Don't worry about technical details: The agent handles GPU selection, quantization, serving backend, scaling, and configuration. You focus on what you want, not how to build it.
  • Review before deploying: The agent shows optimization results before deployment. Check the metrics.
