A document AI pipeline takes a PDF, image, or scanned form and returns structured JSON: extracted fields, table cells, line items, signatures, and layout. RunInfra ships the recipe with Qwen2.5-VL and Llama 3.2 Vision as the canonical open vision-language models, so you stop paying per-page fees and start paying per million tokens against a model you own.

Architecture

Document (PDF / image / form)
  -> Page renderer (PDF -> image tiles at the right DPI)
  -> Vision-language model (Qwen2.5-VL 7B or Llama 3.2 11B Vision, FP8)
  -> JSON schema enforcement (structured output)
  -> Validated record
The vision encoder is fused with the LLM at the engine level (vLLM with multimodal support), so a single forward pass processes the page images and the schema prompt together.
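
The page renderer step comes down to simple arithmetic: PDF page sizes are expressed in points (1/72 inch), so the pixel size of a rendered page follows from the chosen DPI. The sketch below is illustrative only. The 150 DPI and 1024-pixel tile values are assumptions for the example, not RunInfra defaults, and `render_size` and `tile_grid` are hypothetical helper names.

```python
import math

POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch


def render_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a PDF page rasterized at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def tile_grid(width_px, height_px, tile_px):
    """(columns, rows) of square tiles needed to cover the rendered page."""
    return math.ceil(width_px / tile_px), math.ceil(height_px / tile_px)


# US Letter is 612 x 792 pt; 150 DPI and 1024 px tiles are assumed values.
w, h = render_size(612, 792, dpi=150)
print(w, h)                           # 1275 1650
print(tile_grid(w, h, tile_px=1024))  # (2, 2)
```

Higher DPI keeps small print legible but produces more pixels, and therefore more image tokens, per page.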

What you get out of the box

  • Schema-driven output: pass a JSON schema, get a validated object back
  • PDF + image input via base64 or data_url references
  • Multi-page batching with one request per document, paged internally
  • OpenAI-compatible chat completions endpoint with response_format
  • Common formats supported: receipts, invoices, forms, ID cards, tables, contracts
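
As a rough illustration of the base64 input path listed above, a document can be wrapped in a data URL and placed in an OpenAI-style `image_url` content part. This is a minimal sketch using only the standard library. `to_data_url` and `document_part` are hypothetical helpers, not part of any RunInfra SDK, and the MIME type is guessed from the file extension.

```python
import base64
import mimetypes


def to_data_url(filename: str, data: bytes) -> str:
    """Encode raw file bytes as a data URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        raise ValueError(f"cannot determine MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def document_part(filename: str, data: bytes) -> dict:
    """Build one chat-completions content part that carries the document."""
    return {"type": "image_url", "image_url": {"url": to_data_url(filename, data)}}


# Placeholder bytes stand in for a real file read with open(..., "rb").
part = document_part("invoice.pdf", b"%PDF-1.7")
print(part["image_url"]["url"].startswith("data:application/pdf;base64,"))
```

The same helper covers `.png` or `.jpg` scans, since only the MIME prefix changes.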

Example prompt

In Pipes:
Build a document AI pipeline that extracts invoice data into structured JSON.
Use Qwen2.5-VL 7B. The schema needs: vendor name, invoice number, date,
line items (description, qty, unit price, total), subtotal, tax, total.
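
For reference, the fields named in that prompt could map onto a JSON schema along the following lines. This is a sketch under assumptions: the property names (`vendor`, `invoice_number`, `line_items`, `unit_price`, and so on) are chosen for illustration, and the schema Pipes actually generates may differ.

```python
import json

# One entry of the line-items array, as described in the prompt.
line_item = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "qty": {"type": "number"},
        "unit_price": {"type": "number"},
        "total": {"type": "number"},
    },
    "required": ["description", "qty", "unit_price", "total"],
}

invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},  # assumed to be a string such as "2025-01-31"
        "line_items": {"type": "array", "items": line_item},
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "invoice_number", "date", "line_items",
                 "subtotal", "tax", "total"],
}

# Shape expected by the response_format parameter of chat completions.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "schema": invoice_schema},
}
print(json.dumps(response_format, indent=2))
```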

Quick example

from openai import OpenAI
import base64

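# Point the standard OpenAI client at the RunInfra endpoint.
client = OpenAI(base_url="https://api.runinfra.ai/v1", api_key="YOUR_RUNINFRA_API_KEY")

# Read the document and base64-encode it for use in a data URL.
with open("invoice.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()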

resp = client.chat.completions.create(
    model="your-pipeline-id",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:application/pdf;base64,{b64}"}},
            {"type": "text", "text": "Extract this invoice."},
        ],
    }],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "vendor": {"type": "string"},
                    "invoice_number": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["vendor", "invoice_number", "total"],
            },
        },
    },
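)

# The extracted record comes back as a JSON string in the message content.
print(resp.choices[0].message.content)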
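
Even with schema enforcement on the server, it can be worth re-checking the record on the client before it enters downstream systems. The following is a minimal standard-library sketch for the three-field schema used in the request. `check_invoice` is a hypothetical helper and the sample payload is made up for illustration. A full JSON Schema validator would be the more thorough option.

```python
import json

# Expected Python types for the three required fields of the example schema.
EXPECTED = {"vendor": str, "invoice_number": str, "total": (int, float)}


def check_invoice(raw: str) -> dict:
    """Parse the model's JSON output and verify required fields and types."""
    record = json.loads(raw)
    for key, expected_type in EXPECTED.items():
        if key not in record:
            raise ValueError(f"missing field: {key}")
        value = record[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {key}")
    return record


# Made-up payload standing in for resp.choices[0].message.content.
sample = '{"vendor": "Acme Corp", "invoice_number": "INV-001", "total": 1234.5}'
print(check_invoice(sample)["total"])  # 1234.5
```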

Deeper details

See runinfra.ai/use-cases/document-ai for the marketing page, which covers per-document cost math and the supported model list.