whisperX
Faster-whisper + forced alignment + speaker diarization in one pipeline.
Best for multi-speaker content (podcasts, interviews, meetings) where "who said what" matters. Pricing: free.
What it is
whisperX combines faster-whisper with wav2vec2 forced alignment for word-level timestamps and pyannote for speaker diarization. If your audio has more than one speaker and you need reliable "Speaker 1 / Speaker 2" labeling, this is the open-source default. Licensed under BSD-2-Clause.
Watch out for: the pyannote diarization models are gated, so you need a HuggingFace token to download them, which makes first-run setup heavier.
Install / use
pip install whisperx
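Once installed, a typical run goes through the `whisperx` command-line entry point. The sketch below only assembles the argument list and prints it. It assumes the `--model`, `--diarize`, and `--hf_token` flags described in the project README (check `whisperx --help` for your installed version). The helper name `build_whisperx_command`, the `HF_TOKEN` environment variable, and the file name `interview.wav` are illustrative choices, not part of whisperX.

```python
import os
import shlex


def build_whisperx_command(audio_path, model="large-v2", diarize=True, hf_token=None):
    """Assemble an argv list for the whisperx CLI (hypothetical helper)."""
    cmd = ["whisperx", audio_path, "--model", model]
    if diarize:
        # Diarization needs a HuggingFace token for the gated pyannote models.
        # Falling back to an HF_TOKEN environment variable is an assumption of this sketch.
        token = hf_token or os.environ.get("HF_TOKEN")
        if not token:
            raise ValueError("diarization requested but no HuggingFace token provided")
        cmd += ["--diarize", "--hf_token", token]
    return cmd


if __name__ == "__main__":
    argv = build_whisperx_command("interview.wav", hf_token="hf_example_token")
    # Print the command instead of running it; pass argv to subprocess.run to execute.
    print(shlex.join(argv))
```

Keeping the token out of the source and passing it in at call time avoids committing it to version control.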
Features
| Feature | whisperX |
|---|---|
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | No |
| Languages supported | 99 |
| HIPAA eligible | No |
whisperX vs Whipscribe
| Feature | whisperX | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | No | No |
| Languages | 99 | 99 |
| Platforms | Linux, macOS, GPU | Web, API, MCP |
Frequently asked about whisperX
Does whisperX do speaker diarization?
Yes, that's the headline feature. whisperX integrates pyannote for "Speaker 1 / Speaker 2" labeling on top of faster-whisper transcription, producing per-word speaker-attributed output.
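To make "per-word speaker-attributed output" concrete, here is a minimal sketch of one way to combine word timestamps with diarization turns: give each word the speaker whose turn overlaps it the most. This is a simplified illustration of the idea, not whisperX's actual implementation. The dictionary layout, the function names, and the `SPEAKER_00` style labels are assumptions made for the example.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word["start"], word["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Words that overlap no turn keep speaker=None.
        labeled.append({**word, "speaker": best_speaker})
    return labeled


# Hypothetical aligned words and diarization turns (times in seconds).
words = [
    {"word": "Hello", "start": 0.10, "end": 0.45},
    {"word": "there", "start": 0.50, "end": 0.90},
    {"word": "Hi", "start": 1.20, "end": 1.40},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0},
]

for item in assign_speakers(words, turns):
    print(item["speaker"], item["word"])
```

The quality of the result depends on both inputs: word boundaries that drift, or turns that are cut too coarsely, will put words near a speaker change under the wrong label.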
Why does whisperX need a HuggingFace token?
pyannote's diarization models are gated on HuggingFace — you accept the terms of use once and get a token. whisperX uses that token at download time. No cost, just an acceptance step.
How accurate are whisperX timestamps?
More accurate than vanilla Whisper at the word level. whisperX runs forced alignment (wav2vec2) over the transcript so word boundaries match the audio — good for subtitles, short-form clips, and speaker-attributed transcripts.
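As an example of what word-level timestamps enable, the sketch below groups aligned words into short subtitle cues in SRT format. The input layout (a list of dicts with `word`, `start`, and `end` in seconds) and the function names are assumptions for illustration; adapt them to whatever structure your transcript actually uses.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=4):
    """Group consecutive words into numbered cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from the first word's start to the last word's end.
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [
        {"word": "Hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.5, "end": 1.0},
    ]
    print(words_to_srt(sample, max_words=2))
```

Splitting on a fixed word count is the simplest possible policy. Real subtitle tooling usually also breaks on pauses, punctuation, or a character limit per line.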
What license is whisperX?
BSD-2-Clause. Note that the pyannote models it downloads have their own terms; read those before commercial deployment.
Does whisperX support streaming?
No. whisperX is batch-only. For streaming ASR, look at Deepgram, Vosk, or whisper.cpp's stream example.
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.