
Quickstart

Your first AI inference pipeline in 5 minutes.

Create an account

Sign up at runinfra.ai/sign-up with GitHub or Google. Free plan, no credit card.

Describe what you need

Open Pipes and type what you want:

I need a fast chatbot using Llama 3.1 8B optimized for low latency

The agent builds your pipeline, selects the model, and configures everything automatically.

Want changes? Just say so:

Add a response cache and switch to Qwen 2.5 7B instead

Optimize

The agent benchmarks your model across GPUs, searches for optimized variants, and finds the best configuration. You see real-time progress as experiments complete.

Set specific targets if you want:

Optimize for latency, keep cost under $0.10 per request

Optimization takes 2-5 minutes.
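
A target such as "optimize for latency, keep cost under $0.10 per request" can be read as a constrained choice: among the benchmarked configurations, keep those under the cost cap and take the fastest. The sketch below illustrates that reading only. It is not RunInfra's actual optimization procedure, `BenchmarkResult` and `pick_fastest_under_budget` are hypothetical names, and the configuration labels, latencies, and costs are made-up placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    config: str              # hypothetical configuration label
    latency_ms: float        # latency per request, in milliseconds
    cost_per_request: float  # cost per request, in US dollars


def pick_fastest_under_budget(results, max_cost):
    """Return the lowest-latency result whose cost is below max_cost, or None."""
    affordable = [r for r in results if r.cost_per_request < max_cost]
    return min(affordable, key=lambda r: r.latency_ms, default=None)


# Placeholder numbers for illustration, not real benchmark data.
results = [
    BenchmarkResult("config-a", latency_ms=120.0, cost_per_request=0.15),
    BenchmarkResult("config-b", latency_ms=180.0, cost_per_request=0.08),
    BenchmarkResult("config-c", latency_ms=260.0, cost_per_request=0.03),
]

best = pick_fastest_under_budget(results, max_cost=0.10)
print(best.config if best else "no configuration fits the budget")
```

With these placeholder values, the fastest configuration is excluded by the cost cap, so the next fastest one is selected.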

Deploy

Deployment requires the Pro plan ($99/mo). The free plan lets you build, optimize, and test in the playground.

Click Deploy in the deploy tab. RunInfra provisions a GPU endpoint with scale-to-zero and fast cold starts (under 2 seconds). Your endpoint URL and API key appear when ready.
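
Because a scale-to-zero endpoint may need to start up on the first request after an idle period, client code may want to tolerate a brief delay or a failed first attempt. The sketch below is a generic retry-with-exponential-backoff helper, not a RunInfra feature. `call_with_retry` and its parameters are hypothetical names, the attempt count and delays are illustrative, and the demo uses a stand-in function rather than a real network call.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between tries
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the final error
            sleep(base_delay * (2 ** attempt))


# Demo with a stand-in function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("endpoint still starting")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_retry(flaky_request, sleep=delays.append)
print(result, delays)
```

In real use, `fn` would wrap the request to your endpoint. The OpenAI Python SDK also accepts `timeout` and `max_retries` arguments on the client, which may be sufficient on their own.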

Use your endpoint

Your endpoint is OpenAI-compatible. Use any OpenAI SDK:

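from openai import OpenAI

# Point the OpenAI client at your RunInfra endpoint.
client = OpenAI(
    base_url="https://api.runinfra.ai/v1/YOUR_PIPELINE_ID",  # replace with your endpoint URL
    api_key="ri_your_api_key",  # replace with your API key
)

# Send a single-turn chat request to the deployed pipeline.
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What is RunInfra?"}],
)

# Print the text of the first reply.
print(response.choices[0].message.content)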
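
Since the endpoint is OpenAI-compatible, the same call can in principle be made without the SDK. The sketch below builds the equivalent HTTP request with Python's standard library and does not send it. It assumes the endpoint follows the usual OpenAI convention of a `/chat/completions` path under the base URL and a bearer-token `Authorization` header; this guide does not confirm either detail, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholders from the example above; substitute your own values.
BASE_URL = "https://api.runinfra.ai/v1/YOUR_PIPELINE_ID"
API_KEY = "ri_your_api_key"


def build_chat_request(prompt, base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) an OpenAI-style chat completion request.

    Assumption: the endpoint accepts POST {base_url}/chat/completions with a
    bearer token, as OpenAI-compatible servers conventionally do.
    """
    payload = {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("What is RunInfra?")
print(request.get_method(), request.full_url)
```

To send it, pass `request` to `urllib.request.urlopen` and parse the JSON body. In the OpenAI response format, the reply text sits at `choices[0].message.content`, matching the SDK example.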

Next steps

Prompting Guide

Write better prompts, get better pipelines.

Example Prompts

See real conversations for chatbots, summarizers, and more.

Deployment

Flex vs Active, scaling, and more.

