OpenAI Whisper

by OpenAI

The reference open-source multilingual ASR model from OpenAI.

TL;DR


Best for research, baseline accuracy, teams that want the canonical reference implementation. Pricing: free.

Category: Open source
License: MIT
Stars: ★ 98.1k
Last push: 2026-04-15
Pricing: free
Platforms: Linux, macOS, Windows, GPU

What it is

Whisper is OpenAI's flagship speech-to-text model, trained on 680,000 hours of multilingual audio. The original Python package is the reference implementation — easy to install, but intentionally simple. For production speed you almost always want a derivative (whisper.cpp, faster-whisper, whisperX) that wraps the same weights in a faster runtime. Free, MIT-licensed, runs on CPU or GPU.

Best for: Research, baseline accuracy, teams that want the canonical reference implementation.
Watch out for: Pure-Python inference is slow on CPU; no built-in speaker diarization; batch-only, no streaming.

Install / use

pip install -U openai-whisper
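
Once installed, a transcription is a few lines of Python: `whisper.load_model(...)` plus `model.transcribe(...)`, which returns a dict with the full `"text"` and a list of `"segments"` carrying `"start"`/`"end"` times in seconds. The sketch below shows that call as a comment (it needs model weights and audio to run) and then converts segments of that shape into SRT subtitles; the helper names are illustrative, not part of the whisper package.

```python
# Typical usage (downloads weights on first run; slow on CPU):
#   import whisper
#   model = whisper.load_model("turbo")
#   result = model.transcribe("audio.mp3")
# `result` is a dict with "text" and "segments"; each segment has
# "start"/"end" in seconds and a "text" string.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round((seconds - int(seconds)) * 1000)
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments into SRT subtitle blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Mock segments in the same shape whisper.transcribe returns:
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " General Kenobi."},
]
srt = segments_to_srt(sample)
```

The same result dict can be fed to any subtitle or caption pipeline; only the timestamp formatting differs between SRT and VTT.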

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 99
HIPAA eligible: No

Links

GitHub repo ↗

OpenAI Whisper vs Whipscribe

Feature             | OpenAI Whisper             | Whipscribe
Category            | Open source                | Transcription APIs
Pricing             | free                       | free beta
Speaker diarization | No                         | Yes
Word timestamps     | Yes                        | Yes
Streaming           | No                         | No
Languages           | 99                         | 99
Platforms           | Linux, macOS, Windows, GPU | Web, API, MCP


Frequently asked about OpenAI Whisper

Is OpenAI Whisper free?

Yes. The openai-whisper package and all released model weights are MIT-licensed, free for commercial and non-commercial use. Inference cost is whatever hardware you run it on.

Does Whisper support speaker diarization?

No. Vanilla Whisper outputs text + segment timestamps but does not label speakers. To get 'who said what,' pair it with a diarization library (e.g. pyannote) or use whisperX, which bundles both.
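The pairing step is simple in principle: each Whisper segment gets the speaker label of whichever diarization turn overlaps it most in time. The sketch below assumes turns arrive as `(start, end, speaker)` tuples; that format is an illustration, not pyannote's actual output type.

```python
def _overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_speakers(segments, turns):
    """Attach a 'speaker' key to each Whisper-style segment.

    segments: dicts with 'start'/'end'/'text' (Whisper's output shape)
    turns:    (start, end, speaker) tuples from a diarizer (hypothetical format)
    """
    labeled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: _overlap(seg["start"], seg["end"], t[0], t[1]),
            default=(0.0, 0.0, "UNKNOWN"),
        )
        # Fall back to UNKNOWN when no turn overlaps the segment at all.
        if _overlap(seg["start"], seg["end"], best[0], best[1]) > 0:
            speaker = best[2]
        else:
            speaker = "UNKNOWN"
        labeled.append({**seg, "speaker": speaker})
    return labeled

segments = [
    {"start": 1.0, "end": 4.0, "text": "Hi, how are you?"},
    {"start": 6.0, "end": 9.0, "text": "Fine, thanks."},
]
turns = [(0.0, 5.0, "SPEAKER_00"), (5.0, 10.0, "SPEAKER_01")]
out = label_speakers(segments, turns)
```

whisperX does essentially this alignment for you, plus forced alignment for tighter word timings.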

What's the difference between Whisper and faster-whisper?

Same underlying model weights; different runtime. faster-whisper uses CTranslate2 and is roughly 4x faster on GPU with lower VRAM use. Accuracy is essentially identical. For production, faster-whisper is usually the better choice.

Can Whisper run on CPU?

Yes, but it's slow. With the large-v3 model, transcription on a modern laptop CPU runs well below real time — an hour of audio can take several hours to process. For CPU-bound workloads, whisper.cpp is dramatically faster than the reference Python implementation.

Does Whisper produce word-level timestamps?

The reference implementation has a word_timestamps flag, but timings can drift on long-form audio. For more accurate per-word timing, use whisperX (forced alignment) or stable-ts.
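When `word_timestamps=True` is passed to `transcribe`, each segment in the result gains a `"words"` list with per-word `"start"`/`"end"` times. A small sketch of flattening that structure into one word list (the mock data below stands in for a real transcription result):

```python
def flatten_words(result):
    """Collect (word, start, end) tuples from a Whisper result produced
    with word_timestamps=True, where each segment carries a 'words' list."""
    return [
        (w["word"].strip(), w["start"], w["end"])
        for seg in result.get("segments", [])
        for w in seg.get("words", [])
    ]

# Mock result in the shape whisper.transcribe(..., word_timestamps=True) returns:
mock = {
    "segments": [
        {"start": 0.0, "end": 1.2,
         "words": [{"word": " Hello", "start": 0.0, "end": 0.5},
                   {"word": " world", "start": 0.5, "end": 1.2}]},
    ]
}
words = flatten_words(mock)
```

On long recordings, compare these timings against the audio before trusting them downstream; this is exactly where whisperX's forced alignment earns its keep.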

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.