Can I provide my own schema?

Yes. Bring a JSON schema and the model is prompted to extract fields that conform. Validation runs on every output.
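
As a rough illustration of what validating every output can look like, here is a minimal sketch in Python. The schema, the field names, and the `validate` helper are hypothetical and cover only required fields and basic types. This is not RunInfra's validator; a full JSON Schema implementation such as the `jsonschema` package covers far more of the specification.

```python
import json

# Hypothetical invoice schema; the field names are illustrative only.
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["invoice_number", "total", "currency"],
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
}

# Map JSON Schema type names to Python types for this small subset.
_TYPES = {"string": str, "number": (int, float), "object": dict}


def validate(output: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the output conforms."""
    errors = []
    for name in schema.get("required", []):
        if name not in output:
            errors.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name in output:
            value = output[name]
            expected = _TYPES[rule["type"]]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"{name}: expected {rule['type']}")
    return errors


# A made-up model output with a missing field and a wrongly typed field.
model_output = json.loads('{"invoice_number": "INV-001", "total": "12.50"}')
violations = validate(model_output, INVOICE_SCHEMA)
print(violations)
```

An output with violations could be rejected or sent back for another extraction attempt; which of those happens is a pipeline design choice that this sketch does not cover.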

How does it handle handwriting?

Qwen2.5-VL and Llama 3.2 Vision both handle handwritten text reasonably well. For specialized handwriting, fine-tune on your own samples.

RunInfra by RightNow

Document AI with open vision models.

Open vision-language models parse PDFs, forms, and tables to JSON. Per-page billing becomes per-million-tokens billing.

Invoices

Contracts

Forms

Tables

What you actually own

The optimization knobs, the codebase, the model choice. None of it locked away.

Per-million-tokens, not per page.

Closed document AI services charge per page. RunInfra bills for the GPU compute you use, per million tokens, so long documents become cheap.
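
To compare the two billing models on your own workload, the arithmetic can be sketched as below. Every number here is a placeholder chosen for illustration, not a RunInfra or vendor price, and the function names are hypothetical. Substitute your measured tokens per page and your actual rates.

```python
def token_billed_cost(pages: int, tokens_per_page: float,
                      price_per_million_tokens: float) -> float:
    """Cost of a document when billing is per million tokens."""
    total_tokens = pages * tokens_per_page
    return total_tokens * price_per_million_tokens / 1_000_000


def page_billed_cost(pages: int, price_per_page: float) -> float:
    """Cost of the same document when billing is per page."""
    return pages * price_per_page


# Placeholder inputs (assumptions, not quoted prices).
PAGES = 200
TOKENS_PER_PAGE = 1_500            # hypothetical average, image plus output tokens
PRICE_PER_MILLION_TOKENS = 0.50    # hypothetical rate in dollars
PRICE_PER_PAGE = 0.01              # hypothetical rate in dollars

by_tokens = token_billed_cost(PAGES, TOKENS_PER_PAGE, PRICE_PER_MILLION_TOKENS)
by_pages = page_billed_cost(PAGES, PRICE_PER_PAGE)
print(f"per-million-tokens: ${by_tokens:.2f}, per-page: ${by_pages:.2f}")
```

Which model comes out cheaper depends entirely on tokens per page and the two rates, so it is worth running this with measured values before drawing a conclusion.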

Your documents never leave.

Contracts, invoices, and forms stay on your runtime. No third-party retention windows.

Tune the schema on your real forms.

Bring your schemas and ground-truth labels. The model is fine-tuned on your specific layouts.

Three ways to ship document AI

Most teams pick between speed and control. RunInfra keeps both in one workflow.

Deployment comparison for document AI across RunInfra, closed APIs, and DIY self-hosting.
What matters	RunInfra (Recommended): Fast path with model control and export.	Closed extraction APIs: Per-page, hosted.	DIY self-hosting: Full control, heavy operations.
01 Launch	Pick model, optimize, deploy: Start quickly and keep the production path open.	Call provider endpoint: Fast first demo, but the runtime stays rented.	Build serving stack first: Infrastructure work comes before product learning.
02 Model control	Bring the model ID: Keep model choice and serving decisions visible.	Provider catalog: You use what the provider exposes.	Your model: Full control if your team maintains the runtime.
03 Tuning	Measured latency and GPU cost: Compare serving choices before deployment.	Opaque: Latency and batching stay behind the API.	Manual profiling: Your team owns tuning and regressions.
04 Export	Managed now, export when needed: Use the endpoint first and take the deploy package later.	Locked endpoint: You keep calling the provider.	Already owned: Export exists because you built everything yourself.
05 Operations	Low until you choose to own it: Operate managed, then export with the same measured plan.	Low, with lock-in: Less infra work, less production control.	High: You own infra, failures, upgrades, and serving changes.
06 Security	SOC 2 Type 2: Audited controls across access, logging, and incident response.	Varies by vendor: Compliance depends on the third party sitting in the request path.	You build it: Your team owns the audit trail, logging, and access controls.

Code you own. Deploy anywhere.

The full recipe ships with you. Codebase, kernels, engine config, weights. Run it anywhere.

Live

User: Parse invoices and contracts to JSON. Qwen2.5-VL-7B with our extraction schema on a single L40S. Validate every field against the schema.

Agent: On it. I'll profile Qwen2.5-VL on the L40S, compile the schema into a constrained-decoding grammar, then benchmark a 500-page extraction harness.

Profiled Qwen2.5-VL 7B on L40S: 0.4 s/page, FP16 vision encoder

Tiled patch embeddings fused with the LLM step

Tuned AWQ INT4 on the LLM head: FP16 stays on the vision encoder

Compiled schema as constrained decoding: xgrammar backend, zero schema violations

Ran reference 500-page harness: $0.018/page, schema validation green

runinfra-document-ai/

samples/

evals/

Managed RunInfra

Our GPUs, per-million-tokens billing from L4 to B200.

Your infrastructure

AWS, GCP, RunPod, bare metal. Same Dockerfile, your cluster.

Local workstation

docker compose up. Full pipeline on a single GPU.

Supported Hugging Face vision-language models

Supported vision-language models on Hugging Face run through a compatible recipe. Search the live catalog for the full list. The examples below are just a starting view.

Model	Developer	Parameters	Type
Qwen2.5-VL 7B	Alibaba	7B	Vision
Qwen2.5-VL 72B	Alibaba	72B	Vision
Llama 3.2 11B Vision	Meta	11B	Vision
Llama 3.2 90B Vision	Meta	90B	Vision
Pixtral 12B	Mistral AI	12B	Vision
Phi-3.5 Vision	Microsoft	4.2B	Vision
InternVL2.5 8B	Shanghai AI Lab	8B	Vision
InternVL2.5 26B	Shanghai AI Lab	26B	Vision
DeepSeek-VL2	DeepSeek	27B MoE	Vision

What RunInfra tunes

Every stage of the pipeline, retuned per model and GPU.

Vision encoder fusion

Tiled patch embeddings fused with the LLM step. Page-shape preserved.

Schema-driven extraction

Output constrained to your JSON schema. Field-level confidence scores.

VLM quantization

AWQ INT4 on the LLM head, FP16 preserved on the vision encoder.
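
The memory effect of this mixed-precision split can be estimated with simple arithmetic. The sketch below counts weight storage only and ignores the scales and zero-points that INT4 schemes store alongside the weights, activations, and the KV cache. The parameter split between language model and vision encoder is a hypothetical round-number assumption, not the actual breakdown of any listed model.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8


# Hypothetical parameter split (assumption for illustration only).
LLM_PARAMS = 7_000_000_000
VISION_PARAMS = 700_000_000

# Everything in FP16 (16 bits per weight).
fp16_total = weight_bytes(LLM_PARAMS, 16) + weight_bytes(VISION_PARAMS, 16)

# INT4 on the language model, FP16 kept on the vision encoder.
mixed_total = weight_bytes(LLM_PARAMS, 4) + weight_bytes(VISION_PARAMS, 16)

print(f"all FP16:  {fp16_total / 1e9:.1f} GB")   # prints 15.4 GB
print(f"INT4+FP16: {mixed_total / 1e9:.1f} GB")  # prints 4.9 GB
```

Under these assumptions most of the savings come from the language model, which is why keeping the much smaller vision encoder in FP16 costs little memory.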

Page-aware batching

Variable-page batches with PagedAttention. No padding waste on long contracts.
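
The scheduling side of variable-page batching can be sketched as a greedy packer that fills each batch up to a page budget. This is an illustrative sketch, not RunInfra's scheduler, and it does not implement PagedAttention, which concerns how the KV cache is laid out in memory. The function names, the document IDs, and the budget of 48 pages are all hypothetical.

```python
def batch_by_page_budget(docs: list[tuple[str, int]],
                         max_pages: int) -> list[list[str]]:
    """Greedily group (doc_id, page_count) pairs into batches of at most
    max_pages pages. A single document larger than the budget still gets
    a batch of its own."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for doc_id, pages in docs:
        # Close the current batch if this document would overflow it.
        if current and used + pages > max_pages:
            batches.append(current)
            current, used = [], 0
        current.append(doc_id)
        used += pages
    if current:
        batches.append(current)
    return batches


def padded_slots(docs: list[tuple[str, int]]) -> int:
    """Page slots used if every document were padded to the longest one."""
    return len(docs) * max(pages for _, pages in docs)


# Hypothetical workload: a mix of short forms and one long contract.
docs = [("form-a", 3), ("contract-b", 40), ("receipt-c", 1), ("invoice-d", 12)]
batches = batch_by_page_budget(docs, max_pages=48)
actual_pages = sum(pages for _, pages in docs)
print(batches, actual_pages, padded_slots(docs))
```

In this example the padded layout would reserve 160 page slots for 56 real pages, which is the kind of waste that variable-length batching avoids.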

Multi-page reasoning

KV cache shared across pages so the model sees the full document.

Pipeline scheduling

Render, extract, validate interleaved per page. Backpressure on overload.
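
One common way to interleave stages with backpressure is a chain of bounded queues, sketched below with Python's `asyncio`. The stage bodies are stand-ins (no rendering or model call happens here), and the function names and queue size are hypothetical. How RunInfra schedules its pipeline internally is not described in this document.

```python
import asyncio


async def render(pages, out: asyncio.Queue) -> None:
    for page in pages:
        # put() waits while the queue is full, so a slow extractor
        # slows rendering down instead of letting work pile up.
        await out.put(f"image:{page}")
    await out.put(None)  # sentinel: no more pages


async def extract(inp: asyncio.Queue, out: asyncio.Queue) -> None:
    while (image := await inp.get()) is not None:
        # Stand-in for the model call that returns extracted fields.
        await out.put({"page": image, "fields": {}})
    await out.put(None)


async def validate(inp: asyncio.Queue, results: list) -> None:
    while (item := await inp.get()) is not None:
        # Stand-in for schema validation; here every item passes.
        results.append(item["page"])


async def run_pipeline(pages, queue_size: int = 2) -> list:
    rendered = asyncio.Queue(maxsize=queue_size)
    extracted = asyncio.Queue(maxsize=queue_size)
    results: list = []
    # All three stages run concurrently, so page N can be validated
    # while page N+1 is extracted and page N+2 is rendered.
    await asyncio.gather(
        render(pages, rendered),
        extract(rendered, extracted),
        validate(extracted, results),
    )
    return results


results = asyncio.run(run_pipeline(range(5)))
print(results)
```

The queue size bounds how far any stage can run ahead of the next one, which keeps memory use predictable under load.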

Common questions

What document types are supported?

PDFs, scans, photos, and screenshots. Multi-page documents are processed page by page with consistent output.

Deploy your first optimized model, measured before you ship

Describe the goal. RunInfra builds and optimizes the stack.

End-to-end encryption

Isolated GPU infrastructure

Zero data retention

SOC 2 Type II