Use cases - RunInfra

Every use case is a starting point you can fork in the dashboard. The agent loads the recommended models, the GPU profile, and the optimization recipe that ships with each blueprint, then optimizes against the priority you set (latency, throughput, cost, or quality).

Voice agents

Streaming STT, LLM, and TTS on one open stack. Sub-600ms turn-taking on a single L40S.

AI assistants

Hermes, Llama, and Qwen models with tool use, policy, and streaming on a single GPU.

Embeddings + rerank

BGE encoders plus a cross-encoder reranker fused on one GPU, one round-trip.

RAG search

Hybrid retrieval, grounded generation, citation spans you can audit.
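To make "hybrid retrieval" concrete, here is a minimal sketch of one common way to merge a lexical ranking and a dense (embedding) ranking: reciprocal rank fusion. This page does not say which fusion method the blueprint uses, so treat the technique, the function name, and the constant `k=60` as assumptions chosen for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    and the scores are summed. k=60 is a commonly used default, assumed here.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical document IDs from a keyword search and a vector search.
lexical_hits = ["a", "b", "c"]
dense_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([lexical_hits, dense_hits])
```

Documents that rank well in both lists rise to the top, which is the property a hybrid retriever is after before the results are passed on to grounded generation.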

Document AI

Qwen2.5-VL and Llama 3.2 Vision parsing PDFs, forms, and tables to JSON.

Transcription

Open Whisper speech-to-text on an OpenAI-compatible endpoint, with long-form SRT/VTT export.
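As a rough illustration of what SRT export involves, the sketch below turns timed transcript segments into SRT text. The segment shape (dicts with `start`, `end`, and `text`, times in seconds) is an assumption for this example, not a documented RunInfra response format.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hypothetical segments, shaped the way this sketch assumes.
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello "},
    {"start": 2.5, "end": 4.0, "text": "world"},
]
srt_text = segments_to_srt(example_segments)
```

WebVTT output is similar: the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.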

How to use these

1. Open the dashboard and start a chat.
2. Reference the use case by name (“build a voice agent pipeline”) or paste one of the example prompts from the detail page.
3. The agent loads the canonical model stack, profiles GPUs, runs the optimization recipe, and produces a deployable pipeline.
4. Review the receipt and deploy. Every parameter (model, quantization, kernel, GPU, batch size, max tokens) is editable.
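One way to picture the editable receipt is as a small immutable record that you copy with changes before deploying. The sketch below is illustrative only: the class, the `edit_receipt` helper, and the example values for quantization and kernel are hypothetical, and the dashboard's real receipt format may look different. Only the six field names come from the list above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineReceipt:
    """Hypothetical record of the six editable parameters named above."""
    model: str
    quantization: str
    kernel: str
    gpu: str
    batch_size: int
    max_tokens: int


def edit_receipt(receipt, **changes):
    """Return a copy of the receipt with the given fields changed."""
    updated = replace(receipt, **changes)
    if updated.batch_size < 1 or updated.max_tokens < 1:
        raise ValueError("batch_size and max_tokens must be positive")
    return updated


# Placeholder values; quantization and kernel names are made up for the example.
proposed = PipelineReceipt(
    model="Qwen2.5-VL",
    quantization="example-quantization",
    kernel="example-kernel",
    gpu="L40S",
    batch_size=8,
    max_tokens=1024,
)
reviewed = edit_receipt(proposed, batch_size=16, max_tokens=2048)
```

Keeping the proposed and reviewed records separate makes it easy to see exactly which parameters were changed before deployment.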

Each detail page also links to its full marketing page at runinfra.ai/use-cases/<slug>, which carries the latest benchmarks, model list, and pricing math for that workload.

Pick a starting point

I have a clear use case

Jump straight to the matching detail page above.

I want to compare models first

Browse the catalog by modality, license, and parameter count.
