Every use case is a starting point you can fork in Pipes. The agent loads the recommended models, the GPU profile, and the optimization recipe that ships with each blueprint, then optimizes against the priority you set (latency, throughput, cost, or quality).
Voice agents
Streaming STT, LLM, and TTS on one open stack. Sub-600ms turn-taking on a single L40S.
AI assistants
Hermes, Llama, and Qwen with tool use, policy, and streaming on a single GPU.
Embeddings + rerank
BGE encoders plus a cross-encoder reranker fused on one GPU, served in one round-trip.
RAG search
Hybrid retrieval, grounded generation, citation spans you can audit.
Document AI
Qwen2.5-VL and Llama 3.2 Vision parsing PDFs, forms, and tables to JSON.
Transcription
Open Whisper with diarization, PII redaction, and long-form export.
How to use these
- Open Pipes and start a chat.
- Reference the use case by name (“build a voice agent pipeline”) or paste one of the example prompts from the detail page.
- The agent loads the canonical model stack, profiles GPUs, runs the optimization recipe, and produces a deployable pipeline.
- Review the receipt and deploy. Every parameter (model, quantization, kernel, GPU, batch size, max tokens) is editable.
Each use case also has a detail page at runinfra.ai/use-cases/<slug>, which carries the latest benchmarks, model list, and pricing math for that workload.
Pick a starting point
I have a clear use case
Jump straight to the matching detail page above.
I want to compare models first
Browse the catalog by modality, license, and parameter count.