WhisperKit

by Argmax

Swift Whisper for Apple Silicon — CoreML, ANE, Metal. Now part of the Argmax Open-Source SDK (v1.0.0, May 2026) alongside SpeakerKit + TTSKit.

TL;DR

Best for shipping Whisper inside iOS/macOS/visionOS apps with Apple Neural Engine acceleration and no server round-trip. Pricing: free.

Category
Open source
License
MIT
Stars
★ 6.0k
Last push
2026-04-14
Pricing
free
Platforms
macOS, iOS, iPadOS, watchOS, visionOS

What it is

WhisperKit is Argmax's Swift-native Whisper runtime for Apple Silicon. CoreML-compiled encoder + decoder run on the Neural Engine, GPU, and CPU automatically — no PyTorch, no Python, no CUDA, and no manual conversion step. As of v1.0.0 (2026-05-01) the repo was renamed from argmaxinc/WhisperKit to argmaxinc/argmax-oss-swift and now ships WhisperKit + SpeakerKit (pyannote diarization) + TTSKit (Qwen3-TTS) in a single MIT-licensed Swift package. The whisperkit-cli (Homebrew + `swift run`), an OpenAI-compatible local server, and 27+ pre-converted model variants on HuggingFace make it the default choice for any developer who wants Whisper-grade transcription on Apple platforms without running a backend.

Best for: Shipping Whisper inside iOS/macOS/visionOS apps with Apple Neural Engine acceleration and no server round-trip.
Watch out for: Apple ecosystem only; diarization now sits in a sibling kit (SpeakerKit) rather than in WhisperKit itself; word-level forced alignment is not in the open-source surface.

Install / use

View on GitHub (Argmax Open-Source SDK) github.com

Add via Swift Package Manager in Xcode: File → Add Package Dependencies…

What it really is

WhisperKit is an open-source Swift package from Argmax Inc. that runs OpenAI Whisper speech-to-text models entirely on Apple Silicon devices using CoreML. It exists because the reference Whisper code from OpenAI is Python+PyTorch and whisper.cpp — the most popular C++ port — treats the Apple Neural Engine as an opt-in extra rather than the primary execution path. WhisperKit compiles each Whisper variant into .mlmodelc bundles that the OS schedules across the Apple Neural Engine (ANE), GPU (Metal), and CPU automatically, so a single import gets idiomatic Swift async/await transcription with hardware acceleration that whisper.cpp can only match with extra build flags and separately converted models.

The project was open-sourced under the MIT license in January 2024. On 2026-05-01 it graduated to v1.0.0 and was renamed the Argmax Open-Source SDK (repo argmaxinc/argmax-oss-swift), bundling three turn-key kits in one Swift package: WhisperKit (speech-to-text, Whisper), SpeakerKit (diarization, pyannote), and TTSKit (text-to-speech, Qwen3-TTS). The release adopts Swift 6 strict concurrency and vendors swift-transformers internally so consumer projects no longer pull HuggingFace's Hub library transitively.

Argmax distributes pre-converted CoreML weights for the entire Whisper family on HuggingFace at argmaxinc/whisperkit-coreml — tiny, base, small, medium, large-v2, large-v3, the September 2024 large-v3-v20240930 (better Spanish/Hindi/Korean), Distil-Whisper, plus quantized 'turbo' variants in the 547-955MB range that cut model size in half with minimal WER regression. Models download lazily on first use; whisperkit-cli ships via Homebrew (`brew install whisperkit-cli`) for command-line transcription, and a built-in OpenAI-compatible local server (POST /v1/audio/transcriptions) lets non-Swift apps call WhisperKit through the standard OpenAI SDK. Argmax also publishes a closed-source Pro SDK that adds real-time speaker-attributed transcription, custom vocabulary up to 3,000 terms, an Android Kotlin port, and a WebSocket streaming server compatible with Deepgram. The open-source package targets macOS 14+ and Xcode 16+; Apple Silicon (M1 or later, A14+ on iOS) is required for ANE acceleration. License: MIT.

Key specs

Repository
argmaxinc/argmax-oss-swift (renamed from argmaxinc/WhisperKit, 2026-05-01)
Latest release
v1.0.0 — 2026-05-01
License
MIT
Min platforms
macOS 14+, iOS 17+, watchOS 10+, visionOS 1+ · Xcode 16+
Swift toolchain
Swift 5.10 + Swift 6 strict concurrency
Whisper models
tiny / base / small / medium / large-v2 / large-v3 / large-v3-v20240930 / distil-large-v3 + quantized 'turbo' variants (547–955MB)
Sister kits
SpeakerKit (pyannote diarization) · TTSKit (Qwen3-TTS)
Inference path
CoreML — auto-selects Apple Neural Engine + GPU (Metal) + CPU per layer
HuggingFace pull rate
~10.5M downloads/month for whisperkit-coreml
CLI
whisperkit-cli — `brew install whisperkit-cli` or `swift run`
Local server
Built-in OpenAI-compatible Audio API (POST /v1/audio/transcriptions, /v1/audio/translations, SSE streaming)

Performance (cited)

Device | Model | Speed | Source
M2 Ultra · ANE only | Whisper Large v3 Turbo | ~42× realtime | source ↗
M2 Ultra · GPU + ANE | Whisper Large v3 Turbo | ~72× realtime | source ↗
M3 Max · ANE | Large v3 Turbo decoder forward pass | 4.6 ms / token (45% reduction vs non-CoreML baseline of 8.4 ms) | source ↗

Get started — code

Swift Package install · swift
// Package.swift
dependencies: [
    .package(url: "https://github.com/argmaxinc/argmax-oss-swift.git", from: "1.0.0"),
],
targets: [
    .target(
        name: "YourApp",
        dependencies: [
            .product(name: "WhisperKit", package: "argmax-oss-swift"),
            // Or .product(name: "ArgmaxOSS", ...) for WhisperKit + SpeakerKit + TTSKit
        ]
    )
]
Minimal transcription · swift
import WhisperKit

Task {
    let pipe = try await WhisperKit()
    let result = try await pipe.transcribe(audioPath: "path/to/audio.m4a")
    print(result?.text ?? "")
}

// Pin a specific model:
let pipe = try await WhisperKit(WhisperKitConfig(
    model: "large-v3-v20240930_626MB"
))
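Segment and word timings ride along on the same result object. A minimal sketch, assuming the `TranscriptionResult` shape from the WhisperKit API (`segments` carrying `start`/`end`/`text`, with optional per-word timings when `wordTimestamps` is requested in `DecodingOptions`):

Segment + word timestamps · swift
import WhisperKit

Task {
    let pipe = try await WhisperKit()
    // Ask the decoder for per-word timings in addition to segment boundaries.
    let options = DecodingOptions(wordTimestamps: true)
    let result = try await pipe.transcribe(audioPath: "path/to/audio.m4a",
                                           decodeOptions: options)
    for segment in result?.segments ?? [] {
        print(String(format: "[%.2f – %.2f] %@", segment.start, segment.end, segment.text))
        for word in segment.words ?? [] {  // nil unless wordTimestamps is on
            print(String(format: "  %.2f %@", word.start, word.word))
        }
    }
}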
CLI · whisperkit-cli · bash
# Install via Homebrew
brew install whisperkit-cli

# Or build from source
git clone https://github.com/argmaxinc/argmax-oss-swift.git
cd argmax-oss-swift
make setup
make download-model MODEL=large-v3-v20240930_626MB
swift run whisperkit-cli transcribe \
  --model-path "Models/whisperkit-coreml/openai_whisper-large-v3-v20240930_626MB" \
  --audio-path audio.m4a

# Mic streaming
swift run whisperkit-cli transcribe --model-path ... --stream
OpenAI-compatible local server (any language) · bash
# Start the WhisperKit server
swift run whisperkit-cli serve --model tiny --port 50060

# Call it with the standard OpenAI SDK:
python - <<'PY'
from openai import OpenAI
client = OpenAI(base_url="http://localhost:50060/v1", api_key="unused")
resp = client.audio.transcriptions.create(
    file=open("audio.wav", "rb"),
    model="tiny",
)
print(resp.text)
PY
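
The same endpoint is callable from Swift without any SDK. A minimal sketch using Foundation's URLSession, assuming the server implements the standard OpenAI multipart contract (form fields `file` and `model`); the port matches the `serve` command above:

Swift client for the local server · swift
import Foundation

// Build a multipart/form-data body by hand: one text part ("model") and one
// file part ("file"), matching the OpenAI /v1/audio/transcriptions contract.
func transcribeLocally(fileURL: URL) async throws -> String {
    let boundary = UUID().uuidString
    var request = URLRequest(url: URL(string: "http://localhost:50060/v1/audio/transcriptions")!)
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

    var body = Data()
    func append(_ s: String) { body.append(s.data(using: .utf8)!) }
    append("--\(boundary)\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\ntiny\r\n")
    append("--\(boundary)\r\nContent-Disposition: form-data; name=\"file\"; filename=\"\(fileURL.lastPathComponent)\"\r\n")
    append("Content-Type: application/octet-stream\r\n\r\n")
    body.append(try Data(contentsOf: fileURL))
    append("\r\n--\(boundary)--\r\n")

    let (data, _) = try await URLSession.shared.upload(for: request, from: body)
    // Response is JSON like {"text": "..."}; decode just the text field.
    struct Response: Decodable { let text: String }
    return try JSONDecoder().decode(Response.self, from: data).text
}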

How it compares

vs whisper.cpp

whisper.cpp ships as portable C/C++ with a `WHISPER_COREML=1` build flag plus a separate `generate-coreml-model.sh` step that converts the encoder only — the decoder still runs in ggml on CPU/Metal. WhisperKit ships pre-converted .mlmodelc bundles for both encoder and decoder, so the Apple Neural Engine handles the heavy attention layers without bridging headers, callback APIs, or manual memory management. On the M3 Max ANE, Argmax measured a 45% latency reduction (8.4ms → 4.6ms per decoder forward pass) versus a pre-CoreML baseline. Bottom line: whisper.cpp is the right answer for Linux servers and Intel Macs; WhisperKit is the right answer the moment you target Apple Silicon and want native Swift idioms.

vs whisperX

whisperX is a Python project that combines faster-whisper, wav2vec2 forced alignment, and pyannote diarization to produce word-timestamped, speaker-labeled transcripts on CUDA. WhisperKit's open-source surface is transcription only; diarization is now its sibling kit SpeakerKit (also pyannote, in the same Swift package as of v1.0.0); word-level forced alignment is not in the OSS package. To approximate whisperX behavior on a Mac, compose WhisperKit + SpeakerKit and use Whisper segment-level timestamps; for word-level alignment plus real-time speaker labels, Argmax Pro is the supported path.

vs faster-whisper

faster-whisper is a CTranslate2-based Whisper runtime — outstanding on NVIDIA GPUs and very strong on x86 CPU, but on Apple Silicon it cannot use the Neural Engine and lands on CPU. A Swift or Mac developer picking faster-whisper has to bundle Python (or use the C++ ctranslate2 lib through a custom binding), download non-CoreML weights, and lose ANE acceleration. WhisperKit's CoreML stack uses ANE + GPU + CPU automatically, integrates with Swift async/await, and ships through SPM. faster-whisper remains the right pick for Linux/CUDA servers; WhisperKit is the right pick on every Apple platform.

vs MacWhisper

MacWhisper is an end-user Mac transcription app built by Jordi Bruin on top of whisper.cpp; WhisperKit is the SDK other apps embed. Argmax does not publish a flagship end-user app — third-party apps like Superwhisper (App Store ID 6471464415) are the most prominent products in the WhisperKit ecosystem. If you want an app, use MacWhisper or Superwhisper; if you want to build the next one, use WhisperKit.

Who picks this

iOS app developer
Embed Whisper directly in a SwiftUI app for offline voice notes, in-game voice commands, or accessibility captions. Async/await API, single binary, no network calls.
Mac dictation tool builder
Build Superwhisper / Wispr-style global dictation by pairing WhisperKit's mic streaming with a system-wide hotkey. CoreML keeps latency low enough for real-time typing on M1+.
macOS background transcription daemon
Run whisperkit-cli serve as a launchd service exposing OpenAI-compatible HTTP on localhost:50060, then point any OpenAI SDK in any language at it for batch transcription jobs. Avoids per-minute cloud Whisper costs entirely.
Existing Python/Node app on OpenAI Audio API
Drop in WhisperKit's local server as a base_url override; existing client code (openai-python, openai-node) keeps working. Useful for HIPAA / on-device-required deployments.
Multi-kit speech app
Use the ArgmaxOSS umbrella to import WhisperKit + SpeakerKit + TTSKit together for an end-to-end pipeline (transcribe → diarize → respond with synthesized speech) with one Swift package dependency; a composition sketch follows below.
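
A composition sketch under stated assumptions: the WhisperKit calls mirror the minimal example above, but the `SpeakerKit` type and its `diarize` method are hypothetical placeholders, since this page does not document the SpeakerKit API surface.

Transcribe + diarize composition · swift
import WhisperKit
// import SpeakerKit  // module name assumed from the ArgmaxOSS umbrella

Task {
    let whisper = try await WhisperKit()
    let result = try await whisper.transcribe(audioPath: "meeting.m4a")

    // Hypothetical SpeakerKit usage: assume diarize() returns speaker turns
    // with start/end times that can be joined against Whisper's segments.
    // let turns = try await SpeakerKit().diarize(audioPath: "meeting.m4a")
    // for segment in result?.segments ?? [] {
    //     let speaker = turns.first { $0.start <= segment.start && segment.start < $0.end }
    //     print("\(speaker?.label ?? "?"): \(segment.text)")
    // }
    print(result?.text ?? "")
}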

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: 99
HIPAA eligible: No

WhisperKit vs Whipscribe

Feature | WhisperKit | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | Yes | No
Languages | 99 | 99
Platforms | macOS, iOS, iPadOS, watchOS, visionOS | Web, API, MCP

Frequently asked about WhisperKit

Is WhisperKit the same as whisper.cpp on Mac?

No. whisper.cpp is a portable C/C++ Whisper port that runs Whisper on CPU with optional Metal GPU and an opt-in CoreML encoder; WhisperKit is a Swift-native package that compiles the full encoder and decoder to CoreML and lets the OS schedule layers across the Apple Neural Engine, GPU, and CPU automatically. WhisperKit is the right pick if you are shipping a Swift/SwiftUI app on Apple Silicon; whisper.cpp is the right pick when you need to run Whisper on Linux servers, Windows, Intel Macs, or embedded targets with no Apple framework available.

Does WhisperKit use the Apple Neural Engine (ANE)?

Yes. The CoreML model bundles published at huggingface.co/argmaxinc/whisperkit-coreml are compiled to run across the ANE, GPU, and CPU, and WhisperKit picks the compute units automatically. You can also pin them — e.g. `cpuAndNeuralEngine` to force ANE, `cpuAndGPU` to force Metal — via WhisperKitConfig.
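
A minimal sketch of pinning compute units, assuming `ModelComputeOptions` exposes per-stage `MLComputeUnits` fields as in the WhisperKit sources (adjust the field names if your version differs):

Pin compute units · swift
import WhisperKit
import CoreML

Task {
    // Force the encoder and decoder onto CPU + Neural Engine instead of
    // letting CoreML choose; .cpuAndGPU would pin Metal instead.
    let compute = ModelComputeOptions(
        audioEncoderCompute: .cpuAndNeuralEngine,
        textDecoderCompute: .cpuAndNeuralEngine
    )
    let pipe = try await WhisperKit(WhisperKitConfig(computeOptions: compute))
    let result = try await pipe.transcribe(audioPath: "audio.m4a")
    print(result?.text ?? "")
}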

How does WhisperKit compare to faster-whisper on Apple Silicon?

faster-whisper is a Python wrapper around CTranslate2 that targets CUDA and CPU; on Apple Silicon it falls back to CPU, so it does not use the Neural Engine and trails WhisperKit on Mac and iPhone. If you control the box and have an NVIDIA GPU, faster-whisper is excellent; if you ship a Mac or iOS app and want hardware acceleration without bundling Python or a CUDA runtime, WhisperKit wins by construction.

What's the difference between WhisperKit (open source) and Argmax Pro?

WhisperKit and the rest of the Argmax Open-Source SDK are MIT-licensed and ship the OpenAI Whisper, pyannote, and Qwen3-TTS models. Argmax Pro SDK is a closed-source extension with: real-time streaming transcription with live speaker attribution, custom-vocabulary support up to 3,000 keywords for domain accuracy, an Android/Kotlin port, a Deepgram-compatible WebSocket Local Server, and the Pro model variants (whisperkit-pro, parakeetkit-pro, speakerkit-pro). Pricing is on Argmax's site behind a 14-day trial.

Does WhisperKit support real-time / streaming transcription?

The open-source SDK supports microphone streaming via the CLI's `--stream` flag and partial-result streaming over Server-Sent Events from the local server, so you can build dictation-style apps. True real-time streaming with diarization and word-level latency guarantees is a Pro SDK feature; the open-source path streams transcripts as they're generated but does not promise sub-200ms first-token guarantees.
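
For the server path, partial results arrive as SSE `data:` lines that any HTTP client can read incrementally. A sketch in Swift using `URLSession.bytes`, assuming the multipart request body from the client example above and that streaming is requested per the OpenAI contract:

Read SSE partials · swift
import Foundation

func streamPartials() async throws {
    var request = URLRequest(url: URL(string: "http://localhost:50060/v1/audio/transcriptions")!)
    request.httpMethod = "POST"
    // request.httpBody = ...  // multipart body as in the client sketch above,
    //                         // plus a stream field per the OpenAI contract (assumed)
    let (bytes, _) = try await URLSession.shared.bytes(for: request)
    for try await line in bytes.lines where line.hasPrefix("data: ") {
        print(line.dropFirst(6))  // each SSE data line carries a partial transcript
    }
}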

Where do I get the CoreML model files?

All variants are hosted at huggingface.co/argmaxinc/whisperkit-coreml. WhisperKit downloads the recommended model on first run; you can override with WhisperKitConfig(model:) using a glob like `large-v3-v20240930_626MB`. For air-gapped builds, run `make download-model MODEL=...` (or `make download-models` for the full set) and ship the resulting .mlmodelc bundles inside your app.
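
For the air-gapped case, a minimal sketch pointing WhisperKit at bundled weights, assuming `WhisperKitConfig` exposes `modelFolder` and `download` parameters as in the WhisperKit sources:

Load bundled models offline · swift
import Foundation
import WhisperKit

Task {
    // Load pre-downloaded .mlmodelc bundles shipped inside the app instead of
    // fetching from HuggingFace (parameter names assumed; the path is wherever
    // your build phase copies the bundles).
    let config = WhisperKitConfig(
        modelFolder: Bundle.main.resourcePath! + "/openai_whisper-large-v3-v20240930_626MB",
        download: false
    )
    let pipe = try await WhisperKit(config)
    let result = try await pipe.transcribe(audioPath: "audio.m4a")
    print(result?.text ?? "")
}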

Is WhisperKit the same as whisperX?

No. whisperX is a Python project layering forced alignment (wav2vec2) and pyannote diarization on top of faster-whisper, primarily on CUDA. WhisperKit is a Swift CoreML inference framework; in the Argmax SDK 1.0.0 release, diarization is now a sibling kit (SpeakerKit, also pyannote-based) you can compose with WhisperKit, but word-level alignment is not part of the open-source surface. Visitors looking for whisperX behavior on Mac usually combine WhisperKit + SpeakerKit, or use Argmax Pro.

Does WhisperKit work on iPhone, iPad, Apple Watch, Vision Pro?

Yes. The package targets iOS, iPadOS, watchOS, and visionOS. Practical model size is the constraint: tiny and base run on Apple Watch and older iPhones; large-v3 quantized variants (547-626MB) target iPhone 15 Pro and newer with 8GB RAM. Vision Pro and M-series iPads run the full large-v3 comfortably.

What models should I use for production?

Argmax recommends `large-v3-v20240930_626MB` for maximum multilingual accuracy and `tiny` for fast iteration. The September 2024 v3 checkpoint is OpenAI's last Whisper update and noticeably better than 2023 large-v3 on Spanish, Hindi, and Korean. The `_turbo` suffix variants drop the heavy decoder for a lighter one with negligible WER regression on English; pick `_turbo_600MB` if real-time is the priority and `_626MB` non-turbo if WER is.
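
To pick per device at runtime, a sketch assuming WhisperKit's `recommendedModels()` helper (present in the WhisperKit sources; the exact returned shape is an assumption here):

Device-aware model selection · swift
import WhisperKit

Task {
    // Ask WhisperKit which variant fits the current chip/RAM; recommendedModels()
    // and its `default` field are assumed from the WhisperKit API.
    let support = WhisperKit.recommendedModels()
    print("Recommended for this device: \(support.default)")

    let pipe = try await WhisperKit(WhisperKitConfig(model: support.default))
    let result = try await pipe.transcribe(audioPath: "audio.m4a")
    print(result?.text ?? "")
}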

I searched 'whisperx on mac' — what should I use?

On Mac, WhisperKit + SpeakerKit covers the diarization half of whisperX with hardware acceleration the Python whisperX stack can't reach. You lose word-level forced alignment in the open-source path; if you need it, either run whisperX in a Linux Docker container or move to Argmax Pro.

Does WhisperKit support faster-whisper-style Apple Silicon Metal acceleration?

WhisperKit goes further than Metal: it uses CoreML, which schedules across ANE + GPU + CPU based on layer cost. faster-whisper has no Metal backend at all on Apple Silicon — it is CPU-only there. If your search was 'faster-whisper apple silicon metal support', WhisperKit is the answer for that intent.

Is WhisperKit free to use?

Yes — WhisperKit and the rest of the Argmax Open-Source SDK are MIT-licensed and free for commercial use. Argmax also publishes a closed-source Pro SDK with custom-vocabulary, real-time speaker-attributed streaming, and an Android port; pricing is on argmaxinc.com.

Does WhisperKit run on iOS?

Yes. WhisperKit ships on macOS 14+, iOS 17+, watchOS 10+, and visionOS — all CoreML-accelerated on Apple Silicon. Inference happens fully on-device; no network round-trip is required.

Does it work on Intel Macs?

It installs (Swift package, no architecture lock) but the CoreML weights are tuned for Apple Silicon. Intel Macs have no Neural Engine, so compute falls back to CPU + GPU and performance is similar to whisper.cpp's CPU mode.
