whisper.cpp

by Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

TL;DR

Best for offline / on-device transcription, Apple Silicon Metal acceleration, low-RAM targets. Pricing: free.

Category: Open source
License: MIT
Stars: ★ 48.8k
Last push: 2026-04-20
Pricing: free
Platforms: macOS, Linux, Windows, iOS, Android, Edge

What it is

whisper.cpp is a dependency-free C/C++ port of Whisper. No PyTorch, no CUDA required — it runs almost anywhere and is one of the fastest Whisper options on Apple Silicon thanks to its Metal backend. It is built on the same ggml tensor library that powers llama.cpp, by the same author. A strong fit when you need privacy-preserving, offline transcription on consumer hardware.

Best for: Offline / on-device transcription, Apple Silicon Metal acceleration, low-RAM targets.
Watch out for: No speaker diarization out of the box (pair with pyannote or whisperX for speaker labels), and model download/management is manual.

Install / use

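A minimal build-and-run sketch using CMake. Binary names have shifted across releases (older builds produce ./main instead of whisper-cli), so treat the exact paths as a guide:

    # clone and build (Metal is picked up automatically on Apple Silicon)
    git clone https://github.com/ggerganov/whisper.cpp
    cd whisper.cpp
    cmake -B build
    cmake --build build --config Release

    # fetch a model, then transcribe a 16 kHz mono WAV
    ./models/download-ggml-model.sh base.en
    ./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
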
Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: 99
HIPAA eligible: No

Links

GitHub repo: https://github.com/ggerganov/whisper.cpp

whisper.cpp vs Whipscribe

Feature | whisper.cpp | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | Yes | No
Languages | 99 | 99
Platforms | macOS, Linux, Windows, iOS, Android, Edge | Web, API, MCP

Alternatives to whisper.cpp

On the open-source side, whisperX (adds alignment and diarization) and faster-whisper are the common siblings; for a managed option, see the Whipscribe comparison above.

Frequently asked about whisper.cpp

Does whisper.cpp work on Apple Silicon?

Yes — whisper.cpp is one of the fastest Whisper options on M-series Macs thanks to its Metal backend. Build with the Metal flag enabled and the model runs on the GPU without PyTorch or CUDA.
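On recent builds Metal is enabled by default on macOS, so a plain CMake build is enough; the explicit switch, for reference (the flag was spelled WHISPER_METAL in older releases, GGML_METAL in current ones):

    cmake -B build -DGGML_METAL=ON
    cmake --build build --config Release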

Do I need a GPU to use whisper.cpp?

No. whisper.cpp is CPU-first and runs on laptops, Raspberry Pis, and phones. On Apple Silicon it also uses Metal; on Nvidia it can use cuBLAS; on x86 it uses AVX/AVX2. A GPU helps but isn't required.
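For Nvidia cards, CUDA support is a build-time option; on current CMake builds the switch is GGML_CUDA (older releases used WHISPER_CUBLAS):

    cmake -B build -DGGML_CUDA=1
    cmake --build build --config Release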

Does whisper.cpp support diarization?

Not out of the box — it outputs text plus segment timestamps only. For speaker labels, run the audio through pyannote separately, or use whisperX, which bundles pyannote-based diarization with a Whisper pipeline.

How do I download the model files?

The repo includes a models/download-ggml-model.sh script. Pick a size (tiny/base/small/medium/large-v3) based on your RAM/CPU budget; larger models are more accurate but slower and use more memory.
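For example (English-only .en variants and quantized models are also available):

    # grab the small English-only model
    ./models/download-ggml-model.sh small.en
    # files land in ./models/ as ggml-<name>.bin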

Does whisper.cpp support streaming?

Yes. The stream example in the repo shows live microphone transcription. Latency depends on model size and hardware.
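A rough invocation, assuming a recent build (the binary was plain ./stream in older releases, and microphone capture needs SDL2 enabled at build time):

    cmake -B build -DWHISPER_SDL2=ON
    cmake --build build --config Release
    # transcribe the default mic in 500 ms steps over a 5 s sliding window
    ./build/bin/whisper-stream -m models/ggml-base.en.bin -t 4 --step 500 --length 5000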

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.