Every use case is a starting point you can fork in Pipes. The agent loads the recommended models, the GPU profile, and the optimization recipe that ships with each blueprint, then optimizes against the priority you set (latency, throughput, cost, or quality).

Voice agents

Streaming STT, LLM, and TTS on one open stack. Sub-600ms turn-taking on a single L40S.

AI assistants

Hermes, Llama, and Qwen with tool use, policy, and streaming on a single GPU.

Embeddings + rerank

BGE encoders plus a cross-encoder reranker fused on one GPU, one round-trip.

RAG search

Hybrid retrieval, grounded generation, citation spans you can audit.

Document AI

Qwen2.5-VL and Llama 3.2 Vision parsing PDFs, forms, and tables to JSON.

Transcription

Open Whisper with diarization, PII redaction, and long-form export.

How to use these

  1. Open Pipes and start a chat.
  2. Reference the use case by name (“build a voice agent pipeline”) or paste one of the example prompts from the detail page.
  3. The agent loads the canonical model stack, profiles GPUs, runs the optimization recipe, and produces a deployable pipeline.
  4. Review the receipt and deploy. Every parameter (model, quantization, kernel, GPU, batch size, max tokens) is editable.

Each detail page also links to its full marketing page at runinfra.ai/use-cases/<slug>, which carries the latest benchmarks, model list, and pricing math for that workload.
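
To make step 4 concrete, the sketch below pictures the editable parameters and the optimization priority as a plain Python record. It is purely illustrative: `PipelineParams`, `Priority`, and `edited` are hypothetical names, not part of any Pipes API or receipt schema, and every value other than the L40S GPU is a placeholder.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Priority(Enum):
    """The four optimization priorities named in the overview."""
    LATENCY = "latency"
    THROUGHPUT = "throughput"
    COST = "cost"
    QUALITY = "quality"


@dataclass(frozen=True)
class PipelineParams:
    """Hypothetical stand-in for the editable fields on a receipt."""
    model: str
    quantization: str
    kernel: str
    gpu: str
    batch_size: int
    max_tokens: int
    priority: Priority

    def edited(self, **changes) -> "PipelineParams":
        """Return a copy with some fields changed; the original is untouched."""
        updated = replace(self, **changes)  # raises TypeError on unknown fields
        if updated.batch_size < 1 or updated.max_tokens < 1:
            raise ValueError("batch_size and max_tokens must be positive")
        return updated


# Placeholder values for illustration only (assumptions, not recommendations).
draft = PipelineParams(
    model="example-model",
    quantization="fp8",
    kernel="default",
    gpu="L40S",
    batch_size=8,
    max_tokens=512,
    priority=Priority.LATENCY,
)

# Edit two parameters before deploying; everything else carries over.
tuned = draft.edited(batch_size=16, priority=Priority.THROUGHPUT)
```

Keeping the record immutable means each edit produces a new candidate configuration, so the original draft stays available for comparison.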

Pick a starting point

I have a clear use case

Jump straight to the matching detail page above.

I want to compare models first

Browse the catalog by modality, license, and parameter count.