OpenAI Whisper
The reference open-source multilingual ASR model from OpenAI.
Best for: research, accuracy baselines, and teams that want the canonical reference implementation. Pricing: free.
What it is
Whisper is OpenAI's open-source speech-to-text model, trained on 680,000 hours of multilingual audio. The original Python package is the reference implementation: easy to install, but intentionally simple. For production speed you almost always want a derivative (whisper.cpp, faster-whisper, whisperX) that wraps the same weights in a faster runtime. Free, MIT-licensed, runs on CPU or GPU.
Watch out for: Pure-Python inference is slow on CPU; no built-in speaker diarization; batch-only, no streaming.
Install / use
pip install -U openai-whisper
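A minimal sketch of the reference Python API (the file name is a placeholder; swap the model size for your speed/accuracy trade-off):

```python
import whisper

# Load a checkpoint; weights are downloaded on first use.
# "base" is a placeholder -- choose tiny/base/small/medium/large-v3.
model = whisper.load_model("base")

# Transcribe a local file; language is auto-detected unless specified.
result = model.transcribe("audio.mp3")

print(result["text"])                  # full transcript
for seg in result["segments"]:         # per-segment timestamps
    print(f'[{seg["start"]:.2f} - {seg["end"]:.2f}] {seg["text"]}')
```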
Features
| Feature | OpenAI Whisper |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | Yes |
| Streaming / real-time | No |
| Languages supported | 99 |
| HIPAA eligible | No |
OpenAI Whisper vs Whipscribe
| Feature | OpenAI Whisper | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free | free beta |
| Speaker diarization | No | Yes |
| Word timestamps | Yes | Yes |
| Streaming | No | No |
| Languages | 99 | 99 |
| Platforms | Linux, macOS, Windows, GPU | Web, API, MCP |
Frequently asked about OpenAI Whisper
Is OpenAI Whisper free?
Yes. The openai-whisper package and all released model weights are MIT-licensed, free for commercial and non-commercial use. Inference cost is whatever hardware you run it on.
Does Whisper support speaker diarization?
No. Vanilla Whisper outputs text + segment timestamps but does not label speakers. To get 'who said what,' pair it with a diarization library (e.g. pyannote) or use whisperX, which bundles both.
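A rough sketch of one way to pair the two, assuming pyannote.audio is installed and you have accepted the model's Hugging Face terms (the file path, token, and pipeline id are placeholders; whisperX packages a similar pipeline for you):

```python
import whisper
from pyannote.audio import Pipeline

AUDIO = "meeting.wav"  # placeholder path

# 1. Transcribe with Whisper (segment-level timestamps).
asr = whisper.load_model("base").transcribe(AUDIO)

# 2. Diarize with pyannote (speaker turns with start/end times).
diarization = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",  # placeholder: use the version you have access to
    use_auth_token="hf_...",             # placeholder Hugging Face token
)(AUDIO)

# 3. Naive merge: label each Whisper segment with the speaker
#    whose turn covers the segment's midpoint.
def speaker_at(t):
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        if turn.start <= t <= turn.end:
            return speaker
    return "unknown"

for seg in asr["segments"]:
    mid = (seg["start"] + seg["end"]) / 2
    print(f'{speaker_at(mid)}: {seg["text"].strip()}')
```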
What's the difference between Whisper and faster-whisper?
Same underlying model weights; different runtime. faster-whisper uses CTranslate2 and is roughly 4x faster on GPU with lower VRAM use. Accuracy is essentially identical. For production, faster-whisper is usually the better choice.
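For comparison, a minimal faster-whisper sketch (same weights, CTranslate2 runtime; the device and compute type assume a CUDA GPU and are placeholders):

```python
from faster_whisper import WhisperModel

# float16 on GPU; use device="cpu", compute_type="int8" on CPU-only machines.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus language-detection info.
segments, info = model.transcribe("audio.mp3", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:.2f} - {seg.end:.2f}] {seg.text}")
```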
Can Whisper run on CPU?
Yes, but it's slow. On a modern laptop CPU the large-v3 model typically runs slower than real time, so transcribing an hour of audio takes more than an hour. For CPU-bound workloads, whisper.cpp is dramatically faster than the reference Python implementation.
Does Whisper produce word-level timestamps?
The reference implementation has a word_timestamps flag, but timings can drift on long-form audio. For more accurate per-word timing, use whisperX (forced alignment) or stable-ts.
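In the reference package the flag looks like this (expect some drift on long files, per the caveat above; the file name is a placeholder):

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3", word_timestamps=True)

for seg in result["segments"]:
    for word in seg["words"]:          # each word carries its own start/end
        print(f'{word["start"]:.2f}-{word["end"]:.2f} {word["word"]}')
```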
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.