Transcription tools directory

Every transcription service we track (open-source engines, desktop apps, APIs, and products), grouped by category. Live GitHub stats, a feature matrix, and honest current pricing. Curated by Whipscribe; updated 2026-05-07.

Updated 2026-05-07 · 26 tools tracked
Open source · 11 tracked
Self-hostable transcription engines and desktop apps you can run yourself, with source you can read and modify. All open source →
Transcription APIs · 7 tracked
Hosted transcription endpoints you call with an API key, with no infrastructure to manage. All transcription APIs →
Desktop apps · 3 tracked
Native desktop applications for macOS, Windows, and Linux that transcribe files locally. All desktop apps →
Products · 5 tracked
End-user transcription products — meeting bots, editors, and turnkey workflows. All products →

Frequently asked

faster-whisper vs whisperX — which should I use?

faster-whisper is the speed-optimised Whisper runtime, built on CTranslate2. whisperX adds speaker diarization (pyannote) and forced-alignment word timestamps on top. Use faster-whisper if your audio is single-speaker and you only need the transcript. Use whisperX if the content has multiple speakers and you need "who said what."
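To make "who said what" concrete, here is a minimal sketch of the idea behind speaker labelling: each transcript segment is given the speaker whose diarization turn overlaps it the most. This is a simplified illustration, not whisperX's actual implementation; the `Segment` and `Turn` types, the `label_segments` helper, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcript segment with start/end times in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking over a time range (hypothetical type)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time ranges (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Pair each segment's text with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg.text))
    return labelled


# Made-up sample data for illustration only.
segments = [
    Segment(0.0, 2.5, "Hi, thanks for joining."),
    Segment(2.6, 5.0, "Happy to be here."),
    Segment(9.0, 10.0, "(music)"),
]
turns = [
    Turn(0.0, 2.55, "SPEAKER_00"),
    Turn(2.55, 5.2, "SPEAKER_01"),
]

for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

Single-speaker audio never needs this step, which is why plain faster-whisper is enough there.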

What's the cheapest transcription API in 2026?

Per-minute pricing (as of 2026-04-20): Deepgram Nova-2 at $0.0043/min is the cheapest streaming API. OpenAI Whisper API is $0.006/min. Self-hosting faster-whisper on a rented GPU is cheaper at scale but requires operational work. Prices shift, so check each provider's current pricing page before committing.
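As a rough sketch of how the per-minute prices compare with self-hosting, the arithmetic looks like this. The two API prices are the ones quoted above; the GPU rental price and the transcription speed-up are hypothetical placeholders, so substitute your own numbers. Engineering and operations time is deliberately left out.

```python
# Per-minute API prices quoted above (as of 2026-04-20).
DEEPGRAM_NOVA2_PER_MIN = 0.0043
OPENAI_WHISPER_PER_MIN = 0.006


def api_cost(audio_hours, price_per_min):
    """Cost in dollars of sending `audio_hours` of audio to a per-minute API."""
    return audio_hours * 60 * price_per_min


def self_hosted_cost(audio_hours, gpu_price_per_hour, speedup):
    """GPU rental cost in dollars for transcribing `audio_hours` of audio.

    `speedup` is how many hours of audio one GPU-hour gets through.
    Both parameters are assumptions to fill in yourself; operational
    work (setup, monitoring, retries) is not included.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_hours / speedup * gpu_price_per_hour


if __name__ == "__main__":
    hours = 1000  # monthly audio volume, illustrative
    print(f"Deepgram Nova-2: ${api_cost(hours, DEEPGRAM_NOVA2_PER_MIN):.2f}")
    print(f"OpenAI Whisper:  ${api_cost(hours, OPENAI_WHISPER_PER_MIN):.2f}")
    # Hypothetical: $0.50 per GPU-hour, 20x faster than real time.
    print(f"Self-hosted GPU: ${self_hosted_cost(hours, 0.50, 20):.2f}")
```

At 1,000 audio hours a month the two APIs come to about $258 and $360. The self-hosted figure depends entirely on the GPU price and speed-up you plug in.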

What's the best open-source Otter.ai alternative?

For file transcription, whisperX (or faster-whisper with pyannote) gives you the same transcript and speaker-label output Otter produces. For the meeting-bot workflow itself, there's no one-click OSS replacement: you'd need to combine Whisper with a meeting-bot framework yourself.

Which is best on Apple Silicon (M-series Macs)?

whisper.cpp with the Metal backend is the fastest pure-CLI option. WhisperKit is the Swift-native choice for in-app integration. MacWhisper is the polished desktop app for non-technical users.

I need HIPAA compliance. Which options qualify?

For commercial APIs with HIPAA/BAA paths: Deepgram, AssemblyAI, Rev.ai, and Speechmatics all offer them on appropriate tiers. For self-hosted deployments, HIPAA compliance is your responsibility: an open-source license doesn't grant compliance; your deployment architecture does.

Whisper says it supports 99 languages. Is that real?

The model weights cover 99 languages, but quality varies widely. English, Spanish, German, French, Japanese, and Chinese are excellent. Low-resource languages (e.g. many African and Southeast-Asian languages) are significantly weaker, often with a word error rate (WER) too high to be usable. SeamlessM4T is worth checking for those.
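If you want to check a language yourself, WER is straightforward to compute against a transcript you trust. The sketch below uses the standard definition (word-level edit distance divided by the number of reference words). The `wer` function name and the sample sentences are illustrative, and real evaluations usually normalise casing and punctuation first, which this sketch does not do.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER: {wer(reference, hypothesis):.2%}")
```

Lower is better, and WER can exceed 100% when the hypothesis contains many inserted words.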

Prefer a hosted service over running your own GPU? Whipscribe runs faster-whisper + whisperX behind a web UI, REST API, and MCP server for Claude Desktop.

Try Whipscribe →