Transcription tools directory
Every transcription service we track, from open-source engines and desktop apps to APIs and products, grouped by category. Live GitHub stats, a features matrix, and honest current pricing. Curated by Whipscribe; updated 2026-05-07.
Updated 2026-05-07 · 26 tools tracked
Open source
The reference open-source multilingual ASR model from OpenAI.
C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.
Up to 4× faster than reference Whisper, built on CTranslate2. The production sweet spot.
Faster-whisper + forced alignment + speaker diarization in one pipeline.
CLI that transcribes 150 minutes of audio in ~98 seconds on an A100.
Whisper with stabilised timestamps — more accurate word-level timing.
Swift Whisper for Apple Silicon — CoreML, ANE, Metal. Now part of the Argmax Open-Source SDK (v1.0.0, May 2026) alongside SpeakerKit + TTSKit.
Distilled Whisper: 6× faster, 49% smaller, within 1% WER of the teacher.
Meta's speech-to-text + speech-to-speech + text-to-speech model, 100 languages.
Lightweight offline speech recognition for 20+ languages, runs on a Raspberry Pi.
Cross-platform desktop app for Whisper — open-source MacWhisper alternative.
Transcription APIs
Hosted Whisper large-v3 from OpenAI — $0.006 per minute.
Universal-2 model + diarization, PII redaction, topic detection, summarization.
Nova-2 model, excellent streaming, strong at conversational audio.
The API spin-off of Rev — strong English accuracy, topic detection, custom vocab.
Whisper-based API with diarization, 99-language coverage, pay-per-minute.
Enterprise ASR with strong accent handling and on-prem deployment options.
Hosted faster-whisper + whisperX with paste-a-URL, batch, and MCP access.
Desktop apps
Products
Meeting-bot transcription product for Zoom/Meet/Teams.
Human + AI transcription, highest accuracy tier on the market.
Audio/video editor that treats the transcript as the timeline — different product category.
Enterprise-focused transcription + collaborative editor for newsrooms.
Meeting-bot transcription + CRM integrations, competitor to Otter.
Frequently asked
faster-whisper vs whisperX — which should I use?
faster-whisper is the speed-optimised runtime. whisperX adds speaker diarization (pyannote) and forced-alignment word timestamps on top. Use faster-whisper if your audio is single-speaker and you only need the transcript. Use whisperX if the content has multiple speakers and you need "who said what."
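As a rough sketch of the single-speaker path, the snippet below wraps faster-whisper's `WhisperModel.transcribe` call and prints timestamped lines. It assumes the `faster-whisper` Python package is installed; the `transcribe` and `format_timestamp` helper names, the `audio.mp3` path, and the CPU/int8 settings are illustrative choices, not recommendations from this directory.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def transcribe(path: str, model_size: str = "large-v3") -> list[str]:
    """Transcribe one audio file and return timestamped transcript lines."""
    # Imported lazily so format_timestamp works without the package installed.
    from faster_whisper import WhisperModel

    # Assumption: CPU with int8 weights. On a GPU, device="cuda" with
    # compute_type="float16" is the more common choice.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus run metadata.
    segments, _info = model.transcribe(path, beam_size=5)
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]

# Usage (hypothetical file name):
#   for line in transcribe("audio.mp3"):
#       print(line)
```

This produces a plain transcript with segment timestamps and no speaker labels, which is the case where whisperX's extra alignment and diarization stages add nothing you need.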
What's the cheapest transcription API in 2026?
Per-minute pricing (as of 2026-04-20): Deepgram Nova-2 at $0.0043/min is the cheapest streaming API. The OpenAI Whisper API is $0.006/min. Self-hosting faster-whisper on a rented GPU is cheaper at scale but requires operational work. Prices shift, so check each provider's pricing page before committing.
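To make the comparison concrete, here is a small cost sketch using only the two per-minute prices quoted above. The GPU rental figure passed to `breakeven_minutes` is a hypothetical input you would supply yourself, not a price tracked here, and the calculation ignores the operational work that self-hosting adds.

```python
# Per-minute list prices quoted in this FAQ (as of 2026-04-20).
PRICE_PER_MIN = {
    "Deepgram Nova-2": 0.0043,
    "OpenAI Whisper API": 0.006,
}


def monthly_cost(minutes: float, price_per_min: float) -> float:
    """API spend in dollars for a given monthly audio volume."""
    return round(minutes * price_per_min, 2)


def breakeven_minutes(gpu_cost_per_month: float, price_per_min: float) -> float:
    """Monthly minutes above which a fixed-cost GPU undercuts a per-minute API.

    gpu_cost_per_month is a hypothetical rental price, not a quoted figure.
    """
    return gpu_cost_per_month / price_per_min


def cheapest_api() -> str:
    """Name of the lowest-priced API among those listed above."""
    return min(PRICE_PER_MIN, key=PRICE_PER_MIN.get)
```

For example, `monthly_cost(10_000, PRICE_PER_MIN["Deepgram Nova-2"])` gives the spend for 10,000 minutes a month, and `breakeven_minutes` shows roughly how much audio you need before a rented GPU starts to pay for itself on raw price alone.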
What's the best open-source Otter.ai alternative?
For file transcription, whisperX (or faster-whisper with pyannote) gives you the same transcript plus speaker-label output that Otter produces. For the meeting-bot workflow itself, there's no one-click OSS replacement: you'd need to combine Whisper with a meeting-bot framework yourself.
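The "transcript + speaker-label" output comes from merging two streams: timestamped transcript segments and diarization turns. The sketch below shows one simple way to do that merge, by giving each segment the speaker whose turn overlaps it the most. It is an illustration of the idea, not whisperX's actual implementation; the `Segment` and `Turn` types and the `UNKNOWN` label are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A transcript segment (hypothetical shape: times in seconds)."""
    start: float
    end: float
    text: str


class Turn(NamedTuple):
    """A diarization turn: one speaker talking over a time span."""
    start: float
    end: float
    speaker: str


def overlap_seconds(a, b) -> float:
    """Length of the time interval shared by two spans, or 0.0 if disjoint."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def label_segments(segments, turns):
    """Pair each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap_seconds(seg, turn)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text))
    return labeled
```

Segments that fall outside every turn keep the `UNKNOWN` label, which is usually a sign of silence, music, or a diarization miss worth reviewing by hand.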
Which is best on Apple Silicon (M-series Macs)?
whisper.cpp with the Metal backend is the fastest pure-CLI option. WhisperKit is the Swift-native choice for in-app integration. MacWhisper is the polished desktop app for non-technical users.
I need HIPAA compliance. Which options qualify?
For commercial APIs with HIPAA/BAA paths: Deepgram, AssemblyAI, Rev.ai, and Speechmatics all offer them on appropriate tiers. For self-hosted, HIPAA is your responsibility — the license doesn't grant compliance; your deployment architecture does.
Whisper says it supports 99 languages. Is that real?
The model weights cover 99 languages, but quality varies widely. English, Spanish, German, French, Japanese, and Chinese are excellent. Low-resource languages (e.g. many African and Southeast-Asian languages) are significantly weaker — often below a usable WER. SeamlessM4T is worth checking for those.
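If you want to check a language yourself, word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal version that splits on whitespace and does no normalisation of case or punctuation, which published benchmarks usually apply; the function name and that simplification are assumptions of this example.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Run it over a handful of hand-checked transcripts in your target language before committing to a model. WER can exceed 1.0 when the output contains many inserted words.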
Prefer a hosted service over running your own GPU? Whipscribe runs faster-whisper + whisperX behind a web UI, REST API, and MCP server for Claude Desktop.