whisperX

by Max Bain

Faster-whisper + forced alignment + speaker diarization in one pipeline.

TL;DR

Best for multi-speaker content (podcasts, interviews, meetings) where "who said what" matters. Pricing: free.

Category: Open source
License: BSD-2-Clause
Stars: ★ 21.4k
Last push: 2026-04-04
Pricing: free
Platforms: Linux, macOS, GPU

What it is

whisperX combines faster-whisper for transcription, forced alignment (wav2vec2) for word-level timestamps, and pyannote for speaker diarization. If your audio has more than one speaker and you care about proper "Speaker 1 / Speaker 2" labeling, this is the open-source default. It is released under the BSD-2-Clause license.

Best for: Multi-speaker content (podcasts, interviews, meetings) where "who said what" matters.
Watch out for: Requires a HuggingFace token to download pyannote diarization models (gated); heavier first-run setup.

Install / use

pip install whisperx
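
Once installed, the package provides a `whisperx` command-line tool. The sketch below shows one way to drive it from Python for a diarized run. The helper names, the audio path, and the `small` model choice are illustrative assumptions. The `--model`, `--diarize`, and `--hf_token` flags follow the project README; confirm them with `whisperx --help` for your installed version.

```python
import shutil
import subprocess


def build_whisperx_command(audio_path, hf_token, model="small"):
    """Return the argument list for a diarized whisperX CLI run (hypothetical helper)."""
    return [
        "whisperx", audio_path,
        "--model", model,        # Whisper model size; "small" is just an example
        "--diarize",             # turn on pyannote speaker diarization
        "--hf_token", hf_token,  # token used to download the gated pyannote models
    ]


def run_whisperx(audio_path, hf_token, model="small"):
    """Run the CLI if it is on PATH; raise a clear error otherwise."""
    if shutil.which("whisperx") is None:
        raise RuntimeError("whisperx not found on PATH; run `pip install whisperx` first")
    subprocess.run(build_whisperx_command(audio_path, hf_token, model), check=True)
```

Calling `run_whisperx("interview.wav", token)` would start the full pipeline; the first run also downloads the models, so expect it to take noticeably longer than later runs.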

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 99
HIPAA eligible: No

whisperX vs Whipscribe

Feature | whisperX | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | Yes | Yes
Word timestamps | Yes | Yes
Streaming | No | No
Languages | 99 | 99
Platforms | Linux, macOS, GPU | Web, API, MCP

Frequently asked about whisperX

Does whisperX do speaker diarization?

Yes, that's the headline feature. whisperX integrates pyannote for "Speaker 1 / Speaker 2" labeling on top of faster-whisper transcription, so each word in the output carries a speaker label.
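
To make "per-word speaker-attributed output" concrete, here is a minimal sketch of the general idea: give each timestamped word the speaker whose diarization turn overlaps it most. This is an illustration under assumed data shapes (dicts with `start`, `end`, `word`, `speaker` keys), not whisperX's actual implementation, and the function names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker turn that overlaps it the most."""
    labeled = []
    for w in words:
        best, best_ov = None, 0.0
        for t in turns:
            ov = overlap(w["start"], w["end"], t["start"], t["end"])
            if ov > best_ov:
                best, best_ov = t["speaker"], ov
        # Words that overlap no turn keep speaker=None.
        labeled.append({**w, "speaker": best})
    return labeled


# Toy inputs: aligned words and diarization turns, times in seconds.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "there", "start": 0.5, "end": 0.9},
    {"word": "Hi", "start": 1.2, "end": 1.5},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0},
]
labeled = assign_speakers(words, turns)
```

In this toy input the first two words fall inside the first turn and the last word inside the second, so they are labeled accordingly. The sketch also shows why accurate word timestamps matter for diarization: a word boundary that is off by a second can land in the wrong speaker's turn.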

Why does whisperX need a HuggingFace token?

pyannote's diarization models are gated on HuggingFace: you accept the terms of use once on the model page, then create an access token in your account settings. whisperX uses that token when it downloads the models. No cost, just an acceptance step.
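
A common way to handle the token is to keep it out of scripts and read it from the environment. The sketch below assumes the token is stored in a variable named `HF_TOKEN`; that name and the helper are assumptions for illustration, so use whatever your setup already defines.

```python
import os


def get_hf_token(env=None):
    """Return the HuggingFace token from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()  # HF_TOKEN is an assumed variable name
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; accept the pyannote model terms on HuggingFace "
            "and export your access token first"
        )
    return token
```

Failing early like this avoids starting a long transcription only to have the diarization step stop at model download.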

How accurate are whisperX timestamps?

More accurate than vanilla Whisper at the word level. whisperX runs forced alignment (wav2vec2) over the transcript so that word boundaries match the audio, which makes the output suitable for subtitles, short-form clips, and speaker-attributed transcripts.
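
As an example of the subtitle use case, the sketch below turns a list of timestamped words into SubRip (SRT) cues. The word dict shape (`word`, `start`, `end` in seconds), the function names, and the fixed words-per-cue grouping are assumptions for illustration; a real subtitle workflow would usually also split on pauses and line length.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group words into cues of at most max_words and render them as SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        # A cue spans from the first word's start to the last word's end.
        start, end = srt_time(chunk[0]["start"]), srt_time(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Because each cue starts and ends on a word boundary, the subtitle timing is only as good as the word timestamps it is built from.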

What license is whisperX?

BSD-2-Clause. Note that the pyannote models it downloads have their own terms; read those before commercial deployment.

Does whisperX support streaming?

No. whisperX is batch-only. For streaming ASR, look at Deepgram, Vosk, or whisper.cpp's stream example.

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.