whisper.cpp

by Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

TL;DR

Best for offline / on-device transcription, Apple Silicon Metal acceleration, low-RAM targets. Pricing: free.

Category: Open source
License: MIT
Stars: ★ 48.8k
Last push: 2026-04-20
Pricing: free
Platforms: macOS, Linux, Windows, iOS, Android, Edge

What it is

whisper.cpp is a dependency-free C/C++ port of Whisper. No PyTorch, no CUDA required — it runs almost anywhere and is one of the fastest Whisper options on Apple Silicon thanks to its Metal backend. It is built on the same ggml tensor library that powers llama.cpp, by the same author. A strong fit when you need privacy-preserving, offline transcription on consumer hardware.

Best for: Offline / on-device transcription, Apple Silicon Metal acceleration, low-RAM targets.
Watch out for: No speaker diarization out of the box (pair with pyannote or whisperX for speaker labels), and model download/management is manual.

Install / use

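A minimal build-and-run sketch using CMake. Binary names have shifted across releases (older builds produce ./main instead of whisper-cli), so treat the exact paths as a guide:

    # clone and build (Metal is picked up automatically on Apple Silicon)
    git clone https://github.com/ggerganov/whisper.cpp
    cd whisper.cpp
    cmake -B build
    cmake --build build --config Release

    # fetch a model, then transcribe a 16 kHz mono WAV
    ./models/download-ggml-model.sh base.en
    ./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
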
Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: 99
HIPAA eligible: No

Links

GitHub repo: https://github.com/ggerganov/whisper.cpp

whisper.cpp vs Whipscribe

Feature | whisper.cpp | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | Yes | No
Languages | 99 | 99
Platforms | macOS, Linux, Windows, iOS, Android, Edge | Web, API, MCP

Alternatives to whisper.cpp

On the open-source side, whisperX (adds alignment and diarization) and faster-whisper are the common siblings; for a managed option, see the Whipscribe comparison above.

Frequently asked about whisper.cpp

Does whisper.cpp work on Apple Silicon?

Yes — whisper.cpp is one of the fastest Whisper options on M-series Macs thanks to its Metal backend. Build with the Metal flag enabled and the model runs on the GPU without PyTorch or CUDA.
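On recent builds Metal is enabled by default on macOS, so a plain CMake build is enough; the explicit switch, for reference (the flag was spelled WHISPER_METAL in older releases, GGML_METAL in current ones):

    cmake -B build -DGGML_METAL=ON
    cmake --build build --config Release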

Do I need a GPU to use whisper.cpp?

No. whisper.cpp is CPU-first and runs on laptops, Raspberry Pis, and phones. On Apple Silicon it also uses Metal; on Nvidia it can use cuBLAS; on x86 it uses AVX/AVX2. A GPU helps but isn't required.
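For Nvidia cards, CUDA support is a build-time option; on current CMake builds the switch is GGML_CUDA (older releases used WHISPER_CUBLAS):

    cmake -B build -DGGML_CUDA=1
    cmake --build build --config Release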

Does whisper.cpp support diarization?

Not out of the box — it outputs text plus segment timestamps only. For speaker labels, run the audio through pyannote separately, or use whisperX, which bundles pyannote-based diarization with a Whisper pipeline.

How do I download the model files?

The repo includes a models/download-ggml-model.sh script. Pick a size (tiny/base/small/medium/large-v3) based on your RAM/CPU budget; larger models are more accurate but slower and use more memory.
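For example (English-only .en variants and quantized models are also available):

    # grab the small English-only model
    ./models/download-ggml-model.sh small.en
    # files land in ./models/ as ggml-<name>.bin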

Does whisper.cpp support streaming?

Yes. The stream example in the repo shows live microphone transcription. Latency depends on model size and hardware.
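A rough invocation, assuming a recent build (the binary was plain ./stream in older releases, and microphone capture needs SDL2 enabled at build time):

    cmake -B build -DWHISPER_SDL2=ON
    cmake --build build --config Release
    # transcribe the default mic in 500 ms steps over a 5 s sliding window
    ./build/bin/whisper-stream -m models/ggml-base.en.bin -t 4 --step 500 --length 5000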

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.