whisper.cpp
C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.
Best for offline / on-device transcription, Apple Silicon Metal acceleration, low-RAM targets. Pricing: free.
What it is
whisper.cpp is a dependency-free C/C++ port of Whisper. No PyTorch, no CUDA — it runs on virtually any hardware and is the fastest Whisper option on Apple Silicon thanks to its Metal backend. The project comes from the same author as llama.cpp and is built on the same ggml tensor library. Perfect when you need privacy-preserving, offline transcription on consumer hardware.
Watch out for: no speaker diarization out of the box — speaker labels require an external tool such as pyannote — and model files must be downloaded and managed manually.
Install / use
git clone https://github.com/ggerganov/whisper.cpp && cd whisper.cpp && make
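Expanded, a typical first run looks like the sketch below. The model choice and sample file are illustrative; recent releases build with CMake and name the binary `whisper-cli`, while `make` and `./main` are the classic path shown here.

```shell
# Clone and build (CPU by default; Metal is picked up automatically on macOS)
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make

# Fetch a model with the bundled script (sizes: tiny, base, small, medium,
# large-v3; the ".en" variants are English-only and slightly faster)
MODEL=base.en
bash ./models/download-ggml-model.sh "$MODEL"

# Transcribe a 16 kHz mono WAV file
./main -m "models/ggml-${MODEL}.bin" -f samples/jfk.wav
```

Input audio must be 16 kHz WAV; for other formats, convert first with ffmpeg.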
Features
| Feature | Support |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | Yes |
| Streaming / real-time | Yes |
| Languages supported | 99 |
| HIPAA eligible | No |
whisper.cpp vs Whipscribe
| Feature | whisper.cpp | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free | free beta |
| Speaker diarization | No | Yes |
| Word timestamps | Yes | Yes |
| Streaming | Yes | No |
| Languages | 99 | 99 |
| Platforms | macOS, Linux, Windows, iOS, Android, Edge | Web, API, MCP |
Frequently asked about whisper.cpp
Does whisper.cpp work on Apple Silicon?
Yes — whisper.cpp is one of the fastest Whisper options on M-series Macs thanks to its Metal backend. Metal support is included in the standard macOS build, so the model runs on the GPU without PyTorch or CUDA.
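On top of Metal, the project README describes an optional Core ML encoder for an extra speedup on Apple Silicon; a hedged sketch of that build (flag and script names taken from the README, model choice illustrative):

```shell
# Rebuild with Core ML support enabled
make clean
WHISPER_COREML=1 make

# Generate the Core ML encoder for the model you use, then run as usual
MODEL=base.en
./models/generate-coreml-model.sh "$MODEL"
./main -m "models/ggml-${MODEL}.bin" -f samples/jfk.wav
```

The first Core ML run compiles the model and can take a while; subsequent runs are fast.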
Do I need a GPU to use whisper.cpp?
No. whisper.cpp is CPU-first and runs on laptops, Raspberry Pis, and phones. On Apple Silicon it also uses Metal; on Nvidia it can use cuBLAS; on x86 it uses AVX/AVX2. A GPU helps but isn't required.
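A minimal sketch of the two build paths; the CUDA flag spelling has changed across releases (`WHISPER_CUBLAS=1` in older ones, `GGML_CUDA=1` in newer ones), so check your version's README:

```shell
# CPU-only build works everywhere
make

# Optional NVIDIA acceleration (requires the CUDA toolkit installed)
GGML_CUDA=1 make

# On CPU, throughput is mostly governed by the thread count (-t)
THREADS=8
./main -m models/ggml-base.en.bin -f samples/jfk.wav -t "$THREADS"
```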
Does whisper.cpp support diarization?
Not out of the box. It outputs text plus segment timestamps only. For speaker labels, run the audio through pyannote separately and merge the results, or use whisperX, a Python pipeline that combines Whisper transcription with pyannote diarization.
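Merging the two outputs is straightforward once you have both: assign each transcript segment the speaker whose diarization turn overlaps it most. A self-contained sketch (function names and sample data are illustrative, not part of either tool's API):

```python
# Hypothetical post-processing: combine whisper.cpp segment timestamps with
# speaker turns from a separate diarizer such as pyannote. Times in seconds.

def overlap(a, b):
    """Length of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def label_segments(segments, turns):
    """Assign each (start, end, text) segment the speaker whose
    (start, end, speaker) turn overlaps it the most."""
    labeled = []
    for start, end, text in segments:
        best = max(turns, key=lambda t: overlap((start, end), (t[0], t[1])), default=None)
        name = best[2] if best and overlap((start, end), best[:2]) > 0 else "unknown"
        labeled.append((name, text))
    return labeled

# whisper.cpp gives (start, end, text); the diarizer gives (start, end, speaker)
segments = [(0.0, 2.0, "hello"), (2.0, 5.0, "hi there"), (5.0, 7.5, "how are you")]
turns = [(0.0, 2.1, "SPEAKER_00"), (2.1, 7.5, "SPEAKER_01")]
print(label_segments(segments, turns))
# → [('SPEAKER_00', 'hello'), ('SPEAKER_01', 'hi there'), ('SPEAKER_01', 'how are you')]
```

Maximum-overlap assignment is a simple heuristic; segments that straddle a speaker change get the dominant speaker's label.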
How do I download the model files?
The repo includes a models/download-ggml-model.sh script. Pick a size (tiny / base / small / medium / large-v3) based on your RAM and CPU budget: larger models are more accurate but need more memory and run slower.
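If a model is too large for your RAM budget, the repo also ships a `quantize` tool (usage pattern from the project README; exact quantization types vary by version):

```shell
# Download a full-precision model, then shrink it with quantization
MODEL=base.en
bash ./models/download-ggml-model.sh "$MODEL"
./quantize "models/ggml-${MODEL}.bin" "models/ggml-${MODEL}-q5_0.bin" q5_0

# Run the quantized model exactly like the original
./main -m "models/ggml-${MODEL}-q5_0.bin" -f samples/jfk.wav
```

Quantization trades a small amount of accuracy for a large reduction in memory use.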
Does whisper.cpp support streaming?
Yes. The stream example in the repo shows live microphone transcription. Latency depends on model size and hardware.
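A hedged sketch of running the stream example (it needs SDL2 for microphone capture; the flag names below come from the example's documentation and may differ between versions):

```shell
# Build the live-microphone example (requires libsdl2-dev or equivalent)
make stream

# Transcribe the microphone in ~500 ms steps over a 5 s rolling window
MODEL=base.en
./stream -m "models/ggml-${MODEL}.bin" -t 4 --step 500 --length 5000
```

Smaller models and shorter steps lower latency at the cost of accuracy.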
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.