A transcription pipeline takes audio files of any length and returns a transcript with timestamps, speaker labels, and optional PII redaction. RunInfra ships the recipe with Whisper (large-v3, distil-large-v3, or turbo) for ASR, pyannote-style diarization for speaker labels, and a small classifier for PII redaction.

Architecture

Audio file (mp3 / mp4 / wav / m4a / webm)
  -> Whisper ASR (large-v3 or distil-large-v3, FP8)
  -> Diarization (speaker turns, optional)
  -> PII redaction pass (names, emails, phone numbers, optional)
  -> Transcript with timestamps + speaker labels + redaction markers

Long-form audio is chunked with overlap, transcribed in parallel batches on the GPU, then stitched with timestamp alignment. The whole stack runs on one L40S for files under 90 minutes.
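
The chunk-and-stitch step can be illustrated with a minimal sketch. This is not RunInfra's implementation: the 30-second window, 5-second overlap, the `Segment` type, and the rule of dropping any segment that starts inside already-covered time are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def chunk_windows(duration, chunk_len=30.0, overlap=5.0):
    """Return (start, end) windows that cover [0, duration] with overlap."""
    if overlap >= chunk_len:
        raise ValueError("overlap must be smaller than chunk_len")
    windows = []
    start = 0.0
    step = chunk_len - overlap
    while start < duration:
        end = min(start + chunk_len, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows


def stitch(chunks):
    """Merge per-chunk segments into a single timeline.

    `chunks` is a list of (window_start, [Segment, ...]) pairs where segment
    times are relative to the chunk. Each segment is shifted to absolute time;
    a segment that starts before the end of already-accepted output is treated
    as an overlap duplicate and skipped (a deliberately simple, assumed rule).
    """
    merged = []
    covered_until = 0.0
    for window_start, segments in sorted(chunks, key=lambda c: c[0]):
        for seg in segments:
            abs_start = seg.start + window_start
            abs_end = seg.end + window_start
            if abs_start < covered_until:
                continue
            merged.append(Segment(abs_start, abs_end, seg.text))
            covered_until = abs_end
    return merged
```

A production stitcher would more likely align on word-level timestamps or text overlap rather than a hard cutoff, but the offset arithmetic is the same.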

What you get out of the box

  • OpenAI-compatible /v1/audio/transcriptions endpoint (multipart upload)
  • response_format: json, text, srt, vtt, verbose_json
  • Speaker labels via diarization (set diarize=true)
  • PII redaction with replacement tokens (set redact=true)
  • Long-form support: files up to several hours, chunked and stitched
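
When redaction is enabled, it can be useful to audit how many markers of each kind ended up in a transcript. The sketch below assumes replacement tokens follow the `[REDACTED_<TYPE>]` shape seen in the `[REDACTED_EMAIL]` marker in this page's output example; the full set of token types is not documented here, so `NAME` in the usage line is hypothetical.

```python
import re
from collections import Counter

# Assumed token shape, inferred from the "[REDACTED_EMAIL]" marker on this page.
REDACTION_TOKEN = re.compile(r"\[REDACTED_([A-Z_]+)\]")


def count_redactions(text: str) -> Counter:
    """Count redaction markers in a transcript, keyed by entity type."""
    return Counter(REDACTION_TOKEN.findall(text))


# Usage with a made-up transcript line ("NAME" is a hypothetical token type).
counts = count_redactions("Hi [REDACTED_NAME], reach me at [REDACTED_EMAIL].")
```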

Example prompt

In Pipes:
Build a transcription pipeline for our recorded support calls.
Use Whisper large-v3 with diarization and PII redaction.
Output should be SRT subtitles plus a JSON transcript with speaker labels.

Quick example

from openai import OpenAI

client = OpenAI(base_url="https://api.runinfra.ai/v1", api_key="YOUR_RUNINFRA_API_KEY")

with open("call.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="your-pipeline-id",
        file=f,
        response_format="verbose_json",
        extra_body={"diarize": True, "redact": True},
    )

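for segment in transcript.segments:
    # Recent openai SDK versions return segment objects rather than dicts.
    # `speaker` is not a standard OpenAI field, so read it defensively.
    print(f"[{getattr(segment, 'speaker', 'unknown')}] {segment.text}")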

Output shape

{
  "text": "Hello, I'm calling about [REDACTED_EMAIL] order...",
  "segments": [
    { "start": 0.0, "end": 2.1, "speaker": "Speaker 1", "text": "Hello, I'm calling about [REDACTED_EMAIL] order..." }
  ],
  "language": "en"
}
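
If you need both a JSON transcript and subtitles from a single request, one option is to ask for `verbose_json` and render SRT locally. The sketch below assumes segments are plain dicts shaped like the example above, with `start`, `end`, `text`, and an optional `speaker`. The function names are illustrative, and the endpoint's own `srt` output may format cues differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render transcript segments as numbered SRT cues, prefixing the speaker if present."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```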

Deeper details

See runinfra.ai/use-cases/transcription for the marketing page with per-minute cost math and supported audio formats.