
From Idea to Pipeline

The full workflow from concept to production endpoint.

Here's the complete journey from "I have an idea" to "my API is live in production."

1. Start with the use case

Before opening RunInfra, know what you're building:

  • What task? Chat, summarization, translation, code generation, Q&A, classification
  • Who uses it? End users (latency matters), internal tools (cost matters), batch jobs (throughput matters)
  • How much traffic? 10 requests/day or 10,000 requests/minute

You don't need to know which model, GPU, or quantization method to use. That's RunInfra's job.
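A quick back-of-envelope calculation helps answer the traffic question. The figures below are hypothetical, taken from the example prompt in step 2 (200 requests/minute, $300/month):

```python
# Hypothetical figures from the example prompt in step 2:
# 200 requests/minute on a $300/month budget.
requests_per_minute = 200
monthly_budget_usd = 300

# Assume traffic is sustained around the clock; real load is burstier,
# so treat this as an upper bound on volume.
requests_per_month = requests_per_minute * 60 * 24 * 30

# Budget per 1,000 requests hints at which model tier is plausible.
cost_per_1k_usd = monthly_budget_usd / requests_per_month * 1000

print(f"{requests_per_month:,} requests/month")
print(f"${cost_per_1k_usd:.4f} per 1,000 requests")
```

Even rough numbers like these make the conversation with the agent more productive.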

2. Describe it in chat

Open Pipes and write a single prompt that covers the use case:

I'm building a customer FAQ chatbot for our e-commerce site. 
Needs to handle 200 requests per minute. Keep latency under 150ms. 
Budget is $300/month.

The agent builds the pipeline, picks a model, and asks any clarifying questions.

3. Refine through conversation

Don't try to perfect it in one prompt. Iterate:

Actually, make it multilingual - we have Spanish and French customers too.
Add a response cache for common questions.
What model do you recommend for this?

4. Optimize

When the pipeline looks good:

Optimize for latency

Watch the agent profile GPUs, search for optimized model variants, and rank results. This takes 2-5 minutes.

Review the results. If they don't meet your needs:

The cost is too high. Can you try a smaller model?
Try optimizing for cost instead of latency.

5. Test in the playground

Before deploying, send test prompts through the playground. Check:

  • Does the output quality match your expectations?
  • Is the latency acceptable?
  • Do edge cases (empty input, very long prompts) work?
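If you'd rather script these checks than click through the playground, a small harness works. This is a sketch: the prompts and the 150ms threshold are illustrative, and `send` is whatever wrapper you write around your endpoint that returns a reply and a latency measurement.

```python
# Hypothetical smoke-test harness for pre-deploy checks.
EDGE_CASES = [
    "",                        # empty input
    "hi",                      # trivial prompt
    "word " * 2000,            # very long prompt
    "¿Dónde está mi pedido?",  # non-English (if multilingual)
]

def smoke_test(send, max_latency_ms=150):
    """Run every edge case through `send` and collect failures.

    `send` takes a prompt string and returns (reply_text, latency_ms).
    """
    failures = []
    for prompt in EDGE_CASES:
        reply, latency_ms = send(prompt)
        if not reply:
            failures.append((prompt[:30], "empty reply"))
        if latency_ms > max_latency_ms:
            failures.append((prompt[:30], f"slow: {latency_ms:.0f}ms"))
    return failures
```

Wire `send` to the OpenAI-compatible client from step 7 once you have an endpoint.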

6. Deploy

Deploy this

Or click Deploy in the Deploy tab. Choose Flex (scale-to-zero) for most cases. Your endpoint URL and API key appear within 1-3 minutes.

7. Integrate

Drop the endpoint URL and API key into your app. It's OpenAI-compatible:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runinfra.ai/v1/YOUR_PIPELINE_ID",
    api_key="ri_your_api_key",
)
# Requests use the standard OpenAI chat-completions shape; the model
# identifier is shown alongside your endpoint URL after deployment.
response = client.chat.completions.create(
    model="YOUR_MODEL_ID",
    messages=[{"role": "user", "content": "Where is my order?"}],
)
print(response.choices[0].message.content)

8. Monitor and iterate

Check the Observe dashboard for:

  • Are latency numbers matching what you expected?
  • Any errors?
  • What's the actual cost?

If something needs adjustment, go back to the chat and ask the agent. You can re-optimize, switch GPUs, or change models at any time.
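To cross-check the dashboard's latency numbers from the client side, a minimal timing wrapper is enough. This is a sketch with a stand-in function where your endpoint call would go; the dashboard measures server-side, so expect the client-side numbers to include network overhead.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stand-in function; swap in your actual endpoint call.
result, ms = timed(lambda: "pong")
print(f"{result} in {ms:.2f}ms")
```

Collect these timings across a batch of real prompts and compare the distribution against the Observe dashboard before concluding anything is wrong.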
