# Quickstart

Your first AI inference pipeline in 5 minutes.
## Create an account

Sign up at runinfra.ai/sign-up with GitHub or Google. Free plan, no credit card required.
## Describe what you need

Open Pipes and type what you want:

> I need a fast chatbot using Llama 3.1 8B optimized for low latency

The agent builds your pipeline, selects the model, and configures everything automatically.

Want changes? Just say so:

> Add a response cache and switch to Qwen 2.5 7B instead

## Use your endpoint
Your endpoint is OpenAI-compatible. Use any OpenAI SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runinfra.ai/v1/YOUR_PIPELINE_ID",
    api_key="ri_your_api_key",
)

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What is RunInfra?"}],
)
print(response.choices[0].message.content)
```
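Because the endpoint speaks the standard OpenAI chat format, multi-turn conversations work the usual way: resend the full message history on each request. A minimal sketch of the bookkeeping (the `make_history` and `add_turn` helpers are illustrative, not part of any RunInfra SDK; only the `{"role", "content"}` message shape comes from the API):

```python
# Track multi-turn chat history in the OpenAI message format.
# Helper names below are illustrative, not a RunInfra API.

def make_history(system_prompt=None):
    """Start a new conversation, optionally seeded with a system message."""
    history = []
    if system_prompt:
        history.append({"role": "system", "content": system_prompt})
    return history

def add_turn(history, user_text, assistant_text):
    """Record one completed user/assistant exchange."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

history = make_history("You are a concise assistant.")
add_turn(history, "What is RunInfra?", "An AI inference platform.")

# On the next request, send the history plus the new user message:
next_messages = history + [{"role": "user", "content": "How do I deploy?"}]
```

Pass `next_messages` as the `messages` argument to `client.chat.completions.create` so the model sees the whole conversation.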
## Next steps

- **Prompting Guide**: write better prompts, get better pipelines.
- **Example Prompts**: see real conversations for chatbots, summarizers, and more.
- **Deployment**: Flex vs. Active, scaling, and more.