OpenAI Whisper

by OpenAI

The reference open-source multilingual ASR model from OpenAI.

TL;DR


Best for research, baseline accuracy, teams that want the canonical reference implementation. Pricing: free.

Category: Open source
License: MIT
Stars: ★ 98.1k
Last push: 2026-04-15
Pricing: free
Platforms: Linux, macOS, Windows, GPU

What it is

Whisper is OpenAI's flagship speech-to-text model, trained on 680,000 hours of multilingual audio. The original Python package is the reference implementation — easy to install, but intentionally simple. For production speed you almost always want a derivative (whisper.cpp, faster-whisper, whisperX) that wraps the same weights in a faster runtime. Free, MIT-licensed, runs on CPU or GPU.

Best for: Research, baseline accuracy, teams that want the canonical reference implementation.
Watch out for: Pure-Python inference is slow on CPU; no built-in speaker diarization; batch-only, no streaming.

Install / use

pip install -U openai-whisper
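
Once installed, a transcription is a few lines of Python: `whisper.load_model(...)` plus `model.transcribe(...)`, which returns a dict with the full `"text"` and a list of `"segments"` carrying `"start"`/`"end"` times in seconds. The sketch below shows that call as a comment (it needs model weights and audio to run) and then converts segments of that shape into SRT subtitles; the helper names are illustrative, not part of the whisper package.

```python
# Typical usage (downloads weights on first run; slow on CPU):
#   import whisper
#   model = whisper.load_model("turbo")
#   result = model.transcribe("audio.mp3")
# `result` is a dict with "text" and "segments"; each segment has
# "start"/"end" in seconds and a "text" string.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round((seconds - int(seconds)) * 1000)
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments into SRT subtitle blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Mock segments in the same shape whisper.transcribe returns:
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " General Kenobi."},
]
srt = segments_to_srt(sample)
```

The same result dict can be fed to any subtitle or caption pipeline; only the timestamp formatting differs between SRT and VTT.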

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 99
HIPAA eligible: No

Links

GitHub repo ↗

OpenAI Whisper vs Whipscribe

Feature             | OpenAI Whisper             | Whipscribe
Category            | Open source                | Transcription APIs
Pricing             | free                       | free beta
Speaker diarization | No                         | Yes
Word timestamps     | Yes                        | Yes
Streaming           | No                         | No
Languages           | 99                         | 99
Platforms           | Linux, macOS, Windows, GPU | Web, API, MCP


Frequently asked about OpenAI Whisper

Is OpenAI Whisper free?

Yes. The openai-whisper package and all released model weights are MIT-licensed, free for commercial and non-commercial use. Inference cost is whatever hardware you run it on.

Does Whisper support speaker diarization?

No. Vanilla Whisper outputs text + segment timestamps but does not label speakers. To get 'who said what,' pair it with a diarization library (e.g. pyannote) or use whisperX, which bundles both.
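The pairing step is simple in principle: each Whisper segment gets the speaker label of whichever diarization turn overlaps it most in time. The sketch below assumes turns arrive as `(start, end, speaker)` tuples; that format is an illustration, not pyannote's actual output type.

```python
def _overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_speakers(segments, turns):
    """Attach a 'speaker' key to each Whisper-style segment.

    segments: dicts with 'start'/'end'/'text' (Whisper's output shape)
    turns:    (start, end, speaker) tuples from a diarizer (hypothetical format)
    """
    labeled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: _overlap(seg["start"], seg["end"], t[0], t[1]),
            default=(0.0, 0.0, "UNKNOWN"),
        )
        # Fall back to UNKNOWN when no turn overlaps the segment at all.
        if _overlap(seg["start"], seg["end"], best[0], best[1]) > 0:
            speaker = best[2]
        else:
            speaker = "UNKNOWN"
        labeled.append({**seg, "speaker": speaker})
    return labeled

segments = [
    {"start": 1.0, "end": 4.0, "text": "Hi, how are you?"},
    {"start": 6.0, "end": 9.0, "text": "Fine, thanks."},
]
turns = [(0.0, 5.0, "SPEAKER_00"), (5.0, 10.0, "SPEAKER_01")]
out = label_speakers(segments, turns)
```

whisperX does essentially this alignment for you, plus forced alignment for tighter word timings.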

What's the difference between Whisper and faster-whisper?

Same underlying model weights; different runtime. faster-whisper uses CTranslate2 and is roughly 4x faster on GPU with lower VRAM use. Accuracy is essentially identical. For production, faster-whisper is usually the better choice.

Can Whisper run on CPU?

Yes, but it's slow. With the large-v3 model, transcription on a modern laptop CPU runs well below real time — an hour of audio can take several hours to process. For CPU-bound workloads, whisper.cpp is dramatically faster than the reference Python implementation.

Does Whisper produce word-level timestamps?

The reference implementation has a word_timestamps flag, but timings can drift on long-form audio. For more accurate per-word timing, use whisperX (forced alignment) or stable-ts.
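When `word_timestamps=True` is passed to `transcribe`, each segment in the result gains a `"words"` list with per-word `"start"`/`"end"` times. A small sketch of flattening that structure into one word list (the mock data below stands in for a real transcription result):

```python
def flatten_words(result):
    """Collect (word, start, end) tuples from a Whisper result produced
    with word_timestamps=True, where each segment carries a 'words' list."""
    return [
        (w["word"].strip(), w["start"], w["end"])
        for seg in result.get("segments", [])
        for w in seg.get("words", [])
    ]

# Mock result in the shape whisper.transcribe(..., word_timestamps=True) returns:
mock = {
    "segments": [
        {"start": 0.0, "end": 1.2,
         "words": [{"word": " Hello", "start": 0.0, "end": 0.5},
                   {"word": " world", "start": 0.5, "end": 1.2}]},
    ]
}
words = flatten_words(mock)
```

On long recordings, compare these timings against the audio before trusting them downstream; this is exactly where whisperX's forced alignment earns its keep.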

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.