A RAG (retrieval-augmented generation) pipeline takes a question, retrieves relevant chunks from your corpus, reranks them, generates a grounded answer with the LLM, and returns the answer plus the exact citation spans used to produce it. RunInfra ships the recipe end-to-end so the citation evidence is auditable, not just a vibes-check.
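To make "auditable" concrete, here is a minimal sketch of how a caller might check that a citation span resolves to real text in a stored chunk. The `CitationSpan` fields, the half-open character range, and the `resolve_span` helper are assumptions made for illustration; they are not RunInfra's documented response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationSpan:
    """Hypothetical citation record: a chunk id plus a half-open character range."""
    chunk_id: str
    start: int  # inclusive character offset into the chunk text
    end: int    # exclusive character offset into the chunk text


def resolve_span(span: CitationSpan, chunks: dict[str, str]) -> str:
    """Return the cited text, raising if the span does not fit inside the chunk."""
    text = chunks[span.chunk_id]  # raises KeyError for an unknown chunk id
    if not (0 <= span.start < span.end <= len(text)):
        raise ValueError(
            f"span {span.start}:{span.end} is outside chunk {span.chunk_id!r}"
        )
    return text[span.start:span.end]


# Toy corpus standing in for your indexed chunks (illustrative data only).
chunks = {"doc1#0": "Refunds are issued within 14 days of the return."}

# Build a span that points at a known phrase, then resolve it back to text.
quote = "within 14 days"
start = chunks["doc1#0"].index(quote)
span = CitationSpan(chunk_id="doc1#0", start=start, end=start + len(quote))
cited_text = resolve_span(span, chunks)
```

A check like this could run over every citation in a response, so that a span pointing outside its chunk is caught before the answer is shown to a user.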
Architecture
What you get out of the box
- Hybrid retrieval (dense + sparse) with configurable weights
- Cross-encoder reranking so the LLM sees genuinely relevant chunks
- Citation spans in the response: which chunk and which character range
- Eval harness hook so you can score against your own gold set
- One HTTP endpoint that does retrieve + rerank + generate end-to-end
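As an illustration of what "configurable weights" can mean for hybrid retrieval, the sketch below fuses dense and sparse scores with a weighted sum after min-max normalization. This is one common fusion technique, chosen here as an assumption; the source does not say how RunInfra combines the two score sets, and `min_max`, `hybrid_rank`, and the default weights are hypothetical names and values.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every candidate as equally strong.
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def hybrid_rank(
    dense: dict[str, float],
    sparse: dict[str, float],
    dense_weight: float = 0.7,   # assumed default, not a RunInfra setting
    sparse_weight: float = 0.3,  # assumed default, not a RunInfra setting
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Weighted-sum fusion of two score sets keyed by chunk id."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        cid: dense_weight * d.get(cid, 0.0) + sparse_weight * s.get(cid, 0.0)
        for cid in d.keys() | s.keys()  # a chunk missing from one side scores 0 there
    }
    # Sort by fused score descending, breaking ties by chunk id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Illustrative scores: cosine similarities (dense) and BM25-style scores (sparse).
dense_scores = {"a": 0.82, "b": 0.75, "c": 0.40}
sparse_scores = {"b": 12.1, "c": 9.3, "d": 3.0}
ranked = hybrid_rank(dense_scores, sparse_scores)
```

In a pipeline like the one described, the fused candidates would then go to the cross-encoder reranker, which scores each question and chunk pair directly instead of relying on the retrieval scores.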
Example prompt
In Pipes:
Cookbook
For full code that shows ingestion, embedding, retrieval, and generation against the OpenAI-compatible API, see the RAG cookbook.
Eval pattern
The RAG agent expects you to bring your own eval set. Three columns are enough:

| column | meaning |
|---|---|
| question | The user question |
| expected_answer | The reference answer for human or LLM-judge scoring |
| expected_citations | The chunk ids the model should cite |
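A minimal sketch of scoring the `expected_citations` column follows, assuming the eval set is a CSV file and that chunk ids within a cell are separated by `;`. Both are assumptions, as are the `citation_scores` helper and the `predicted` mapping, which stands in for the chunk ids your pipeline actually cited. Scoring `expected_answer` is left to a human or an LLM judge and is not covered here.

```python
import csv
import io


def citation_scores(expected: set[str], cited: set[str]) -> tuple[float, float]:
    """Precision and recall of cited chunk ids against the gold chunk ids."""
    hits = len(expected & cited)
    precision = hits / len(cited) if cited else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


# Illustrative gold set using the three columns described above.
GOLD_CSV = """question,expected_answer,expected_citations
What is the refund window?,14 days,doc1#0;doc1#1
Who approves refunds?,The billing team,doc2#3
"""

# Stand-in for pipeline output: question -> chunk ids the model cited.
predicted = {
    "What is the refund window?": {"doc1#0", "doc9#2"},
    "Who approves refunds?": {"doc2#3"},
}

results = []
for row in csv.DictReader(io.StringIO(GOLD_CSV)):
    expected = set(row["expected_citations"].split(";"))
    cited = predicted.get(row["question"], set())
    results.append(citation_scores(expected, cited))

# Average the per-question scores across the eval set.
mean_precision = sum(p for p, _ in results) / len(results)
mean_recall = sum(r for _, r in results) / len(results)
```

Reporting precision and recall separately shows whether a regression comes from citing irrelevant chunks or from missing the chunks the gold set expects.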