Build and deploy your first AI inference endpoint. Create an account, describe what you need in plain English, run optimization, deploy to a live URL, and make your first API call. About five minutes, start to finish.Documentation Index
Fetch the complete documentation index at: https://runinfra.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Free to start
Starter plan is free forever. No credit card required.
OpenAI-compatible
Works with every OpenAI SDK and LangChain/LlamaIndex.
Under 2s cold start
Cached weights keep even scale-to-zero endpoints fast.
You can build, optimize, and test pipelines in the playground on the free Starter plan. Deploying to a live endpoint requires Pro ($49/mo).
Create an account
Sign up at runinfra.ai/sign-up using GitHub or Google. No credit card is required to get started.
Describe your pipeline
Open Pipes and type what you need in plain English:The agent builds your pipeline, selects the model, and configures everything automatically. To make changes, just continue the conversation:
Optimize
The agent benchmarks your model across GPUs, searches for optimized variants (AWQ, GPTQ, FP8), and identifies the best configuration. You see real-time progress as experiments complete.You can set specific targets before optimization starts:Optimization takes 2-5 minutes. When it finishes, review the results and select the configuration you want to deploy.
Deploy your endpoint
Click Deploy in the deploy tab. RunInfra provisions a GPU endpoint with scale-to-zero and fast cold starts (under 2 seconds). Your endpoint URL and API key appear as soon as the endpoint is ready.
Use your endpoint
Your endpoint is OpenAI-compatible. Use your RunInfra API key and the RunInfra base URL with any OpenAI SDK or HTTP client.Replace
YOUR_RUNINFRA_API_KEY with your actual API key from Settings > API Keys.Next steps
Use cases
Six pre-built workflows: voice, assistants, embeddings, RAG, document AI, transcription.
Prompting best practices
Write better prompts and get better pipelines from the agent.
Deployment
Flex vs Active endpoints, scaling, and cold-start configuration.
API reference
The full OpenAI-compatible HTTP API surface.