The best way to learn what RunInfra’s agent can do is to see real prompts in action. Every example below is something you can type directly into the agent. Each one is followed by an explanation of what the agent builds and why, so you can adapt the pattern to your own use case.
Deploy Llama 3.1 8B as a customer support chatbot. Optimize for latency, P99 under 200ms.
What the agent does: Creates a pipeline with Llama 3.1 8B, profiles on L4/L40S GPUs, searches for an optimized AWQ 4-bit variant, benchmarks it, finds the fastest configuration, and deploys as a scale-to-zero endpoint.
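To make the "P99 under 200ms" constraint concrete, here is a minimal sketch of how such a target could be checked against a set of measured request latencies. It uses the nearest-rank percentile definition; the function names and the idea of checking the samples yourself are illustrative assumptions, not a description of how the agent benchmarks internally.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it.

    pct is an integer from 1 to 100, so the rank is computed without float rounding.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Ceiling division in integer arithmetic gives the 1-based rank.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def meets_latency_target(latencies_ms, target_ms=200.0, pct=99):
    """Return True when the chosen percentile of the samples is within the target."""
    return percentile(latencies_ms, pct) <= target_ms


if __name__ == "__main__":
    # Hypothetical measurements: 99 fast requests and one slow outlier.
    samples = [120.0] * 99 + [450.0]
    print(percentile(samples, 99), meets_latency_target(samples))
```

A P99 target tolerates a slow tail of at most 1% of requests, which is why a single outlier in 100 samples still passes here while two would not.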
I need a summarization API using Qwen 2.5 14B. Optimize for cost, max $0.003 per request. Add a response cache for repeated documents.
What the agent does: Builds a pipeline with Qwen 2.5 14B, a response cache node, and cost-priority optimization. Searches for optimized model variants and picks the cheapest that meets your constraints.
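The idea behind a response cache for repeated documents can be sketched in a few lines. This is an illustrative in-memory version, assuming the cache is keyed on a hash of the model name and document text. The `ResponseCache` class and the `summarize` callable are hypothetical names, not the pipeline's actual cache node.

```python
import hashlib


class ResponseCache:
    """Tiny in-memory cache keyed by a SHA-256 hash of model name plus document text."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, document):
        # Including the model name keeps results from different models separate.
        return hashlib.sha256(f"{model}\n{document}".encode("utf-8")).hexdigest()

    def get_or_compute(self, model, document, summarize):
        key = self._key(model, document)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = summarize(document)  # Only pay for inference on a cache miss.
        self._store[key] = result
        return result


if __name__ == "__main__":
    cache = ResponseCache()
    fake_summarize = lambda doc: doc[:20]  # Stand-in for a real model call.
    for _ in range(3):
        cache.get_or_compute("qwen-2.5-14b", "A long report about quarterly sales.", fake_summarize)
    print(cache.hits, cache.misses)
```

Every cache hit avoids a model call entirely, so the more often the same documents recur, the further the average cost per request falls below the uncached cost.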
Build a pipeline with two models: Phi-3 Mini for simple questions and Llama 3.1 70B for complex reasoning. Route based on query complexity. Budget is $300/month.
What the agent does: Creates a router that analyzes query complexity, routes simple queries to the cheap small model and complex ones to the large model. Optimizes both models and estimates monthly cost to fit your budget.
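As a rough illustration of complexity-based routing, the sketch below scores a query by its length and by a few reasoning keywords, then picks a model label. The heuristic, the threshold, the keyword list, and the model labels are all assumptions made for this example. The router the agent builds may use a different method.

```python
# Hypothetical keyword hints that a query needs multi-step reasoning.
REASONING_HINTS = ("why", "explain", "compare", "prove", "step by step", "trade-off")


def complexity_score(query):
    """Return a score in [0, 1]: half from query length, half from reasoning keywords."""
    text = query.lower()
    word_count = len(text.split())
    # Simple substring matching; a real router would likely use a classifier.
    hint_count = sum(1 for hint in REASONING_HINTS if hint in text)
    length_part = min(word_count / 50, 1.0) * 0.5
    hint_part = min(hint_count / 2, 1.0) * 0.5
    return length_part + hint_part


def route(query, threshold=0.4):
    """Send complex queries to the large model and everything else to the small one."""
    return "llama-3.1-70b" if complexity_score(query) >= threshold else "phi-3-mini"


if __name__ == "__main__":
    print(route("What are your opening hours?"))
    print(route("Explain why plan A beats plan B and compare the trade-off step by step."))
```

The threshold is the main cost lever: raising it sends more traffic to the small model, which lowers spend at the risk of weaker answers on borderline queries.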
Deploy DeepSeek Coder V2 optimized for throughput. I need to handle 1000 RPM for our CI pipeline.
What the agent does: Profiles DeepSeek Coder V2, optimizes for throughput priority, configures scaling to handle 1000 RPM, and deploys with an appropriate replica count.
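The replica count for a throughput target comes down to simple arithmetic once per-replica throughput is known. The sketch below assumes you have a measured requests-per-second figure for one replica and want to run each replica below full load. Both the 5 RPS figure and the 70% headroom are made-up example values, not RunInfra defaults.

```python
import math


def replicas_needed(target_rpm, per_replica_rps, headroom=0.7):
    """Estimate replicas for a requests-per-minute target.

    per_replica_rps: measured sustained requests per second for one replica (assumed input).
    headroom: fraction of that capacity you are willing to use, leaving room for bursts.
    """
    if per_replica_rps <= 0 or not 0 < headroom <= 1:
        raise ValueError("per_replica_rps must be > 0 and headroom must be in (0, 1]")
    target_rps = target_rpm / 60
    return max(1, math.ceil(target_rps / (per_replica_rps * headroom)))


if __name__ == "__main__":
    # 1000 RPM is about 16.7 requests per second.
    print(replicas_needed(1000, per_replica_rps=5))
```

With these example numbers, 16.7 RPS divided by 3.5 usable RPS per replica rounds up to 5 replicas.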
Cheapest possible chatbot for an internal FAQ tool. Under 50 requests per day. Doesn't need to be fast.
What the agent does: Recommends a small model (Phi-3 Mini or Qwen 2.5 3B), finds an optimized AWQ 4-bit variant, deploys on the cheapest GPU tier, and configures scale-to-zero to minimize cost during idle periods.
I need a translation endpoint that handles English, Spanish, French, German, and Japanese. Use @Qwen-2.5-7B since it's good at multilingual. Optimize for quality.
What the agent does: Builds a Qwen 2.5 7B pipeline with quality-priority optimization. Searches for FP8 and high-quality optimized variants to preserve multilingual accuracy, with a quality score threshold above 0.95.
Set up Mistral Small 22B for batch document processing. I'll send 10,000 documents per day. Optimize for throughput and keep total cost under $50/day.
What the agent does: Configures Mistral Small 22B with throughput priority, calculates the GPU tier and replica count needed for 10K docs/day within your $50 budget, and finds an optimized variant to reduce per-request cost.
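The budget in this prompt translates into a per-document ceiling and a cap on GPU hours. The sketch below works through that arithmetic. The $2.50 hourly GPU price is a placeholder assumption for illustration, not a quoted RunInfra rate.

```python
SECONDS_PER_DAY = 86_400


def per_document_budget(daily_budget_usd, docs_per_day):
    """Maximum average spend per document that stays within the daily budget."""
    return daily_budget_usd / docs_per_day


def average_docs_per_second(docs_per_day):
    """Average load if documents arrive evenly over the day."""
    return docs_per_day / SECONDS_PER_DAY


def max_gpu_hours(daily_budget_usd, gpu_hourly_usd):
    """GPU hours per day the budget can cover at a given hourly price."""
    return daily_budget_usd / gpu_hourly_usd


if __name__ == "__main__":
    print(f"per document: ${per_document_budget(50, 10_000):.4f}")
    print(f"average load: {average_docs_per_second(10_000):.3f} docs/s")
    # Placeholder hourly price, assumed for illustration only.
    print(f"GPU hours/day at $2.50/h: {max_gpu_hours(50, 2.50):.1f}")
```

At $50 for 10,000 documents, the ceiling is $0.005 per document, and the average load is only about 0.12 documents per second, so batching the work into a few busy hours is one way to stay under budget.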
I want the absolute fastest inference for Llama 3.1 70B. Use an H100 with TensorRT-LLM. Cost doesn't matter.
What the agent does: Configures Llama 3.1 70B on H100 with the TensorRT-LLM backend, finds an FP8 variant (FP8 is natively supported on H100), and enables speculative decoding. Deploys as an always-on endpoint to avoid cold starts.
If you don’t know which model to use, describe what you need and let the agent decide.
I'm building a chatbot for recipe recommendations. What model would you suggest? I want it cheap and fast.
What the agent does: Recommends a small, cost-effective model (likely Phi-3 Mini or Qwen 2.5 3B based on the simple use case), explains the reasoning, and offers to build the pipeline once you confirm.
You don’t have to know anything about model sizes or GPU types to get started. Describing your use case and what matters most (cost, speed, quality) is enough for the agent to make a solid recommendation.
After the agent builds something, keep iterating. The agent remembers the full conversation and updates the pipeline with each message.
The latency is too high, can you try a different GPU?
Switch from AWQ to FP8 and re-optimize
Add a guardrail to filter harmful content
Compare this version with the previous one
Each of these messages updates the pipeline without starting over. You can compare any two versions side by side to see exactly what changed and which one performs better.