Does it support speaker diarization?

Yes. Pyannote-based diarization labels every segment with a speaker ID and runs alongside the ASR pipeline. The diarizer can be swapped without retuning the ASR step.
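
As a rough illustration of per-segment speaker labelling, the sketch below gives each ASR segment the speaker whose diarization turn overlaps it most. The `Turn` and `Segment` types and the max-overlap rule are assumptions made for this example, not RunInfra's or pyannote's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One diarization turn (hypothetical shape)."""
    start: float
    end: float
    speaker: str

@dataclass
class Segment:
    """One ASR segment (hypothetical shape)."""
    start: float
    end: float
    text: str
    speaker: str = "UNKNOWN"

def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time ranges."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(segments, turns):
    """Assign each segment the speaker of its most-overlapping turn."""
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        seg.speaker = best_speaker
    return segments
```

Because the labelling only consumes time ranges, a different diarizer that emits the same kind of turns could be dropped in without touching the ASR output.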

Can I search across stored transcripts?

Yes. Each diarized, redacted transcript is written to a hybrid full-text plus semantic index. Search by phrase, speaker, time range, or meaning. Export the index to your warehouse if you want to query it alongside other data.
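
To make "hybrid" concrete, here is a minimal sketch that blends a keyword score with a cosine similarity over embeddings and optionally filters by speaker. The document shape (`id`, `speaker`, `text`, `vec`), the `alpha` weight, and the scoring functions are illustrative assumptions; the embeddings are supplied by the caller, and the real index may rank results differently.

```python
import math

def keyword_score(query, text):
    """Fraction of query terms that appear as whole words in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    if not terms:
        return 0.0
    return sum(term in words for term in terms) / len(terms)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(query, query_vec, docs, alpha=0.5, speaker=None):
    """Rank docs by alpha * keyword score + (1 - alpha) * semantic score."""
    scored = []
    for doc in docs:
        if speaker is not None and doc["speaker"] != speaker:
            continue  # optional speaker filter
        score = (alpha * keyword_score(query, doc["text"])
                 + (1 - alpha) * cosine(query_vec, doc["vec"]))
        scored.append((score, doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

With this blend, a segment that never uses the query's words can still rank above an unrelated one if its embedding is close to the query's.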

What output formats are supported?

JSON with word-level timestamps and speaker labels, SRT, VTT, and plain text. Custom formats can be added on the postprocess step. Warehouse sinks (BigQuery, Snowflake, S3) are available for batch export.
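
For a sense of what a custom format on the postprocess step might involve, the sketch below renders speaker-labelled segments as SRT. The segment keys (`start`, `end`, `speaker`, `text`) are assumed for illustration and may not match the actual JSON schema.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments):
    """Render segments as numbered SRT cues with a speaker prefix."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text']}\n"
        )
    return "\n".join(cues)
```

VTT differs mainly in its `WEBVTT` header line and in using a period rather than a comma before the milliseconds.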

How many concurrent streams per GPU?

Whisper Large V3 Turbo on an L4 handles roughly 30 concurrent streaming calls at sub-second latency. Batch throughput is the primary axis for this recipe, with a single L4 processing roughly 120x realtime for offline jobs.
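
A quick back-of-the-envelope calculation shows how these figures translate into capacity planning. The helpers below are a sketch that takes the quoted numbers (roughly 30 streams per L4, roughly 120x realtime) at face value and assumes they scale linearly; real throughput depends on audio content and configuration.

```python
import math

def batch_wall_clock_hours(audio_hours, realtime_factor=120):
    """Wall-clock hours for one GPU to process a backlog of audio."""
    return audio_hours / realtime_factor

def gpus_for_streams(concurrent_streams, streams_per_gpu=30):
    """GPUs needed to serve a number of concurrent streaming calls."""
    return math.ceil(concurrent_streams / streams_per_gpu)

# 1,200 hours of recordings at 120x realtime: about 10 hours on one L4.
print(batch_wall_clock_hours(1200))
# 100 concurrent calls at 30 streams per GPU: 4 GPUs.
print(gpus_for_streams(100))
```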

RunInfra by RightNow

Transcribe, diarize, redact on open Whisper.

Long-form audio to searchable transcripts. Speaker labels, PII redaction, exportable on your stack.

Meetings

Calls

Redaction

What you actually own

The optimization knobs, the codebase, the model choice. None of it locked away.

Long-form, not per-minute.

Closed transcription APIs charge per audio minute and stop at the transcript. RunInfra runs diarization, redaction, and search in the same job at a single per-million-tokens unit cost.

PII never leaves your stack.

Phone recordings, meetings, and dictations stay on your runtime. Redaction runs before the transcript ever touches storage or search.

Search every call afterward.

Diarized, redacted transcripts get a hybrid full-text and semantic index. Search, audit, and export the corpus without a second vendor.

Three ways to ship long-form transcription

Most teams pick between speed and control. RunInfra keeps both in one workflow.

Deployment comparison for transcription across RunInfra, closed APIs, and DIY self-hosting.
What matters	RunInfra (Recommended): Fast path with model control and export.	Closed transcription APIs: Per-minute, hosted.	DIY self-hosting: Full control, heavy operations.
01 Launch	Pick model, optimize, deploy. Start quickly and keep the production path open.	Call provider endpoint. Fast first demo, but the runtime stays rented.	Build serving stack first. Infrastructure work comes before product learning.
02 Model control	Bring the model ID. Keep model choice and serving decisions visible.	Provider catalog. You use what the provider exposes.	Your model. Full control if your team maintains the runtime.
03 Tuning	Measured latency and GPU cost. Compare serving choices before deployment.	Opaque. Latency and batching stay behind the API.	Manual profiling. Your team owns tuning and regressions.
04 Export	Managed now, export when needed. Use the endpoint first and take the deploy package later.	Locked endpoint. You keep calling the provider.	Already owned. Export exists because you built everything yourself.
05 Operations	Low until you choose to own it. Operate managed, then export with the same measured plan.	Low, with lock-in. Less infra work, less production control.	High. You own infra, failures, upgrades, and serving changes.
06 Security	SOC 2 Type II. Audited controls across access, logging, and incident response.	Varies by vendor. Compliance depends on the third party sitting in the audio path.	You build it. Your team owns the audit trail, logging, and access controls.

Code you own. Deploy anywhere.

The full recipe ships with you. Codebase, kernels, engine config, weights. Run it anywhere.

Live

Transcribe batch and streaming audio. @Whisper-Large-V3-Turbo on a single @L4. Add pyannote diarization, PII redaction, and a searchable transcript index.

Agent

On it. I'll profile Whisper Turbo on the L4, wire pyannote diarization, run PII redaction on the transcript, then index the diarized output for search.

Profiled Whisper Large V3 Turbo on L4

184ms first chunk, 34x realtime batch

Fused mel-spectrogram with encoder

FlashAttention v2 across all heads

Wired pyannote diarization sidecar

speaker IDs per segment, swappable

Compiled PII redaction filter

names, emails, phones, account IDs

Indexed diarized transcripts for search

hybrid full-text plus semantic, on your stack

Pipeline modules: 5

runinfra-transcription/

audio/

evals/

Managed RunInfra

Our GPUs, per-million-tokens billing from L4 to B200.

Your infrastructure

AWS, GCP, RunPod, bare metal. Same Dockerfile, your cluster.

Local workstation

docker compose up. Full pipeline on a single GPU.

Supported Hugging Face ASR models

Supported Whisper-class models on Hugging Face run through the compatible recipe. Search the live catalog above. The examples below are just a starting view.

Whisper Large V3 (OpenAI): 1.5B, ASR
Whisper Large V3 Turbo (OpenAI): 809M, ASR
Whisper Medium (OpenAI): 769M, ASR
Whisper Small (OpenAI): 244M, ASR
Distil-Whisper Large V3 (Distil-Whisper): 756M, ASR
Canary 1B (NVIDIA): 1B, ASR
Canary 1B Flash (NVIDIA): 1B, ASR
Parakeet TDT 1.1B (NVIDIA): 1.1B, ASR
Moonshine Base (Useful Sensors): 61M, ASR

What RunInfra tunes

Every stage of the pipeline, retuned per model and GPU.

Batch ASR concurrency

Continuous batching across audio shards. 120x realtime on a single L4 for offline jobs.
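
One way to picture batching across audio shards is to cut a long recording into fixed-length windows and group them for the GPU. The sketch below shows only that sharding step and simple fixed-size grouping. The 30-second shard length, the 1-second overlap, and the function names are assumptions, and the scheduler that admits new shards as others finish (the "continuous" part) is not shown.

```python
def shard_audio(duration_s, shard_s=30.0, overlap_s=1.0):
    """Split a recording into (start, end) windows with a small overlap."""
    if shard_s <= overlap_s:
        raise ValueError("shard_s must be larger than overlap_s")
    shards = []
    start = 0.0
    step = shard_s - overlap_s
    while start < duration_s:
        end = min(start + shard_s, duration_s)
        shards.append((start, end))
        if end >= duration_s:
            break  # the last shard reached the end of the recording
        start += step
    return shards

def batches(items, batch_size):
    """Yield consecutive fixed-size groups of shards."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

The overlap gives each boundary some shared context, so words that straddle a cut can be reconciled when the shards are stitched back together.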

Whisper kernels

Mel-spectrogram fusion. FlashAttention v2 on the encoder. Same kernels for streaming and batch.

Speaker diarization

Pyannote-based speaker turns. Swap diarizers without retuning the ASR step.

PII redaction

Entity-scoped redaction for names, emails, phones, account IDs, and your custom fields.

Searchable transcript index

Hybrid full-text and semantic search across diarized, redacted segments.

Word and segment timestamps

Per-word time anchors plus speaker-segment ranges. SRT, VTT, JSON, and warehouse exports.

Common questions

How does PII redaction work?

Named entities (person, email, phone, account ID) get masked in the transcript before it is stored or indexed. Custom entity types can be added with a regex or a small classifier. Redaction runs as a sibling step so the original audio is never required downstream.
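
The regex route for custom entity types could look something like the sketch below, which masks emails and phone numbers and accepts extra patterns by name. The patterns, the `[LABEL]` mask format, and the `redact` function are illustrative assumptions rather than the shipped redactor. Person names are left out because they typically need a classifier, not a regex.

```python
import re

# Deliberately simple built-in patterns; real-world formats vary widely.
DEFAULT_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text, extra_patterns=None):
    """Replace each match with a [LABEL] mask; custom patterns run first."""
    patterns = {}
    if extra_patterns:
        patterns.update(
            {name: re.compile(rx) for name, rx in extra_patterns.items()}
        )
    patterns.update(DEFAULT_PATTERNS)
    for name, pattern in patterns.items():
        text = pattern.sub(f"[{name}]", text)
    return text

print(redact("Email jane@example.com or call 415-555-0134."))
print(redact("Account ACC-123456 is overdue.", {"ACCOUNT_ID": r"ACC-\d{6}"}))
```

Custom patterns run before the built-in ones so that a long numeric account ID is masked under its own label instead of being caught by the broader phone pattern.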

Deploy your first optimized model, measured before you ship

Describe the goal. RunInfra builds and optimizes the stack.

End-to-end encryption

Isolated GPU infrastructure

Zero data retention

SOC 2 Type II

Backed by Y Combinator

