WhisperKit

by Argmax

Swift Whisper for Apple Silicon — CoreML, ANE, Metal. Now part of the Argmax Open-Source SDK (v1.0.0, May 2026) alongside SpeakerKit + TTSKit.

TL;DR

Best for shipping Whisper inside iOS/macOS/visionOS apps with Apple Neural Engine acceleration and no server round-trip. Pricing: free.

Category
Open source
License
MIT
Stars
★ 6.0k
Last push
2026-04-14
Pricing
free
Platforms
macOS, iOS, iPadOS, watchOS, visionOS

What it is

WhisperKit is Argmax's Swift-native Whisper runtime for Apple Silicon. CoreML-compiled encoder + decoder run on the Neural Engine, GPU, and CPU automatically — no PyTorch, no Python, no CUDA, and no manual conversion step. As of v1.0.0 (2026-05-01) the repo was renamed from argmaxinc/WhisperKit to argmaxinc/argmax-oss-swift and now ships WhisperKit + SpeakerKit (pyannote diarization) + TTSKit (Qwen3-TTS) in a single MIT-licensed Swift package. The whisperkit-cli (Homebrew + `swift run`), an OpenAI-compatible local server, and 27+ pre-converted model variants on HuggingFace make it the default choice for any developer who wants Whisper-grade transcription on Apple platforms without running a backend.

Best for: Shipping Whisper inside iOS/macOS/visionOS apps with Apple Neural Engine acceleration and no server round-trip.
Watch out for: Apple ecosystem only; diarization now sits in a sibling kit (SpeakerKit) rather than in WhisperKit itself; word-level forced alignment is not in the open-source surface.

Install / use

View on GitHub (Argmax Open-Source SDK) github.com

Add via Swift Package Manager in Xcode: File → Add Package Dependencies…

What it really is

WhisperKit is an open-source Swift package from Argmax Inc. that runs OpenAI Whisper speech-to-text models entirely on Apple Silicon devices using CoreML. It exists because the reference Whisper code from OpenAI is Python+PyTorch and whisper.cpp — the most popular C++ port — treats the Apple Neural Engine as an opt-in extra rather than the primary execution path. WhisperKit compiles each Whisper variant into .mlmodelc bundles that the OS schedules across the Apple Neural Engine (ANE), GPU (Metal), and CPU automatically, so a single import gets idiomatic Swift async/await transcription with hardware acceleration that whisper.cpp can only match with extra build flags and separately converted models.

The project was open-sourced under the MIT license in January 2024. On 2026-05-01 it graduated to v1.0.0 and was renamed the Argmax Open-Source SDK (repo argmaxinc/argmax-oss-swift), bundling three turn-key kits in one Swift package: WhisperKit (speech-to-text, Whisper), SpeakerKit (diarization, pyannote), and TTSKit (text-to-speech, Qwen3-TTS). The release adopts Swift 6 strict concurrency and vendors swift-transformers internally so consumer projects no longer pull HuggingFace's Hub library transitively.

Argmax distributes pre-converted CoreML weights for the entire Whisper family on HuggingFace at argmaxinc/whisperkit-coreml — tiny, base, small, medium, large-v2, large-v3, the September 2024 large-v3-v20240930 (better Spanish/Hindi/Korean), Distil-Whisper, plus quantized 'turbo' variants in the 547-955MB range that cut model size in half with minimal WER regression. Models download lazily on first use; whisperkit-cli ships via Homebrew (`brew install whisperkit-cli`) for command-line transcription, and a built-in OpenAI-compatible local server (POST /v1/audio/transcriptions) lets non-Swift apps call WhisperKit through the standard OpenAI SDK. Argmax also publishes a closed-source Pro SDK that adds real-time speaker-attributed transcription, custom vocabulary up to 3,000 terms, an Android Kotlin port, and a WebSocket streaming server compatible with Deepgram. The open-source package targets macOS 14+ and Xcode 16+; Apple Silicon (M1 or later, A14+ on iOS) is required for ANE acceleration. License: MIT.

Key specs

Repository
argmaxinc/argmax-oss-swift (renamed from argmaxinc/WhisperKit, 2026-05-01)
Latest release
v1.0.0 — 2026-05-01
License
MIT
Min platforms
macOS 14+, iOS 17+, watchOS 10+, visionOS 1+ · Xcode 16+
Swift toolchain
Swift 5.10 + Swift 6 strict concurrency
Whisper models
tiny / base / small / medium / large-v2 / large-v3 / large-v3-v20240930 / distil-large-v3 + quantized 'turbo' variants (547–955MB)
Sister kits
SpeakerKit (pyannote diarization) · TTSKit (Qwen3-TTS)
Inference path
CoreML — auto-selects Apple Neural Engine + GPU (Metal) + CPU per layer
HuggingFace pull rate
~10.5M downloads/month for whisperkit-coreml
CLI
whisperkit-cli — `brew install whisperkit-cli` or `swift run`
Local server
Built-in OpenAI-compatible Audio API (POST /v1/audio/transcriptions, /v1/audio/translations, SSE streaming)

Performance (cited)

Device | Model | Speed | Source
M2 Ultra · ANE only | Whisper Large v3 Turbo | ~42× realtime | source ↗
M2 Ultra · GPU + ANE | Whisper Large v3 Turbo | ~72× realtime | source ↗
M3 Max · ANE | Large v3 Turbo decoder forward pass | 4.6 ms / token (45% reduction vs non-CoreML baseline of 8.4 ms) | source ↗

Get started — code

Swift Package install · swift
// Package.swift
dependencies: [
    .package(url: "https://github.com/argmaxinc/argmax-oss-swift.git", from: "1.0.0"),
],
targets: [
    .target(
        name: "YourApp",
        dependencies: [
            .product(name: "WhisperKit", package: "argmax-oss-swift"),
            // Or .product(name: "ArgmaxOSS", ...) for WhisperKit + SpeakerKit + TTSKit
        ]
    )
]
Minimal transcription · swift
import WhisperKit

Task {
    let pipe = try await WhisperKit()
    let result = try await pipe.transcribe(audioPath: "path/to/audio.m4a")
    print(result?.text ?? "")
}

// Pin a specific model:
let pipe = try await WhisperKit(WhisperKitConfig(
    model: "large-v3-v20240930_626MB"
))
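Segment and word timings ride along on the same result object. A minimal sketch, assuming the `TranscriptionResult` shape from the WhisperKit API (`segments` carrying `start`/`end`/`text`, with optional per-word timings when `wordTimestamps` is requested in `DecodingOptions`):

Segment + word timestamps · swift
import WhisperKit

Task {
    let pipe = try await WhisperKit()
    // Ask the decoder for per-word timings in addition to segment boundaries.
    let options = DecodingOptions(wordTimestamps: true)
    let result = try await pipe.transcribe(audioPath: "path/to/audio.m4a",
                                           decodeOptions: options)
    for segment in result?.segments ?? [] {
        print(String(format: "[%.2f – %.2f] %@", segment.start, segment.end, segment.text))
        for word in segment.words ?? [] {  // nil unless wordTimestamps is on
            print(String(format: "  %.2f %@", word.start, word.word))
        }
    }
}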
CLI · whisperkit-cli · bash
# Install via Homebrew
brew install whisperkit-cli

# Or build from source
git clone https://github.com/argmaxinc/argmax-oss-swift.git
cd argmax-oss-swift
make setup
make download-model MODEL=large-v3-v20240930_626MB
swift run whisperkit-cli transcribe \
  --model-path "Models/whisperkit-coreml/openai_whisper-large-v3-v20240930_626MB" \
  --audio-path audio.m4a

# Mic streaming
swift run whisperkit-cli transcribe --model-path ... --stream
OpenAI-compatible local server (any language) · bash
# Start the WhisperKit server
swift run whisperkit-cli serve --model tiny --port 50060

# Call it with the standard OpenAI SDK:
python - <<'PY'
from openai import OpenAI
client = OpenAI(base_url="http://localhost:50060/v1", api_key="unused")
resp = client.audio.transcriptions.create(
    file=open("audio.wav", "rb"),
    model="tiny",
)
print(resp.text)
PY
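
The same endpoint is callable from Swift without any SDK. A minimal sketch using Foundation's URLSession, assuming the server implements the standard OpenAI multipart contract (form fields `file` and `model`); the port matches the `serve` command above:

Swift client for the local server · swift
import Foundation

// Build a multipart/form-data body by hand: one text part ("model") and one
// file part ("file"), matching the OpenAI /v1/audio/transcriptions contract.
func transcribeLocally(fileURL: URL) async throws -> String {
    let boundary = UUID().uuidString
    var request = URLRequest(url: URL(string: "http://localhost:50060/v1/audio/transcriptions")!)
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

    var body = Data()
    func append(_ s: String) { body.append(s.data(using: .utf8)!) }
    append("--\(boundary)\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\ntiny\r\n")
    append("--\(boundary)\r\nContent-Disposition: form-data; name=\"file\"; filename=\"\(fileURL.lastPathComponent)\"\r\n")
    append("Content-Type: application/octet-stream\r\n\r\n")
    body.append(try Data(contentsOf: fileURL))
    append("\r\n--\(boundary)--\r\n")

    let (data, _) = try await URLSession.shared.upload(for: request, from: body)
    // Response is JSON like {"text": "..."}; decode just the text field.
    struct Response: Decodable { let text: String }
    return try JSONDecoder().decode(Response.self, from: data).text
}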

How it compares

vs whisper.cpp

whisper.cpp ships as portable C/C++ with a `WHISPER_COREML=1` build flag plus a separate `generate-coreml-model.sh` step that converts the encoder only — the decoder still runs in ggml on CPU/Metal. WhisperKit ships pre-converted .mlmodelc bundles for both encoder and decoder, so the Apple Neural Engine handles the heavy attention layers without bridging headers, callback APIs, or manual memory management. On the M3 Max ANE, Argmax measured a 45% latency reduction (8.4ms → 4.6ms per decoder forward pass) versus a pre-CoreML baseline. Bottom line: whisper.cpp is the right answer for Linux servers and Intel Macs; WhisperKit is the right answer the moment you target Apple Silicon and want native Swift idioms.

vs whisperX

whisperX is a Python project that combines faster-whisper, wav2vec2 forced alignment, and pyannote diarization to produce word-timestamped, speaker-labeled transcripts on CUDA. WhisperKit's open-source surface is transcription only; diarization is now its sibling kit SpeakerKit (also pyannote, in the same Swift package as of v1.0.0); word-level forced alignment is not in the OSS package. To approximate whisperX behavior on a Mac, compose WhisperKit + SpeakerKit and use Whisper segment-level timestamps; for word-level alignment plus real-time speaker labels, Argmax Pro is the supported path.

vs faster-whisper

faster-whisper is a CTranslate2-based Whisper runtime — outstanding on NVIDIA GPUs and very strong on x86 CPU, but on Apple Silicon it cannot use the Neural Engine and lands on CPU. A Swift or Mac developer picking faster-whisper has to bundle Python (or use the C++ ctranslate2 lib through a custom binding), download non-CoreML weights, and lose ANE acceleration. WhisperKit's CoreML stack uses ANE + GPU + CPU automatically, integrates with Swift async/await, and ships through SPM. faster-whisper remains the right pick for Linux/CUDA servers; WhisperKit is the right pick on every Apple platform.

vs MacWhisper

MacWhisper is an end-user Mac transcription app built by Jordi Bruin on top of whisper.cpp; WhisperKit is the SDK other apps embed. Argmax does not publish a flagship end-user app — third-party apps like Superwhisper (App Store ID 6471464415) are the most prominent products in the WhisperKit ecosystem. If you want an app, use MacWhisper or Superwhisper; if you want to build the next one, use WhisperKit.

Who picks this

iOS app developer
Embed Whisper directly in a SwiftUI app for offline voice notes, in-game voice commands, or accessibility captions. Async/await API, single binary, no network calls.
Mac dictation tool builder
Build Superwhisper / Wispr-style global dictation by pairing WhisperKit's mic streaming with a system-wide hotkey. CoreML keeps latency low enough for real-time typing on M1+.
macOS background transcription daemon
Run whisperkit-cli serve as a launchd service exposing OpenAI-compatible HTTP on localhost:50060, then point any OpenAI SDK in any language at it for batch transcription jobs. Avoids per-minute cloud Whisper costs entirely.
Existing Python/Node app on OpenAI Audio API
Drop in WhisperKit's local server as a base_url override; existing client code (openai-python, openai-node) keeps working. Useful for HIPAA / on-device-required deployments.
Multi-kit speech app
Use the ArgmaxOSS umbrella to import WhisperKit + SpeakerKit + TTSKit together for an end-to-end pipeline (transcribe → diarize → respond with synthesized speech) with one Swift package dependency; a composition sketch follows below.
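
A composition sketch under stated assumptions: the WhisperKit calls mirror the minimal example above, but the `SpeakerKit` type and its `diarize` method are hypothetical placeholders, since this page does not document the SpeakerKit API surface.

Transcribe + diarize composition · swift
import WhisperKit
// import SpeakerKit  // module name assumed from the ArgmaxOSS umbrella

Task {
    let whisper = try await WhisperKit()
    let result = try await whisper.transcribe(audioPath: "meeting.m4a")

    // Hypothetical SpeakerKit usage: assume diarize() returns speaker turns
    // with start/end times that can be joined against Whisper's segments.
    // let turns = try await SpeakerKit().diarize(audioPath: "meeting.m4a")
    // for segment in result?.segments ?? [] {
    //     let speaker = turns.first { $0.start <= segment.start && segment.start < $0.end }
    //     print("\(speaker?.label ?? "?"): \(segment.text)")
    // }
    print(result?.text ?? "")
}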

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: 99
HIPAA eligible: No

WhisperKit vs Whipscribe

Feature | WhisperKit | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | Yes | No
Languages | 99 | 99
Platforms | macOS, iOS, iPadOS, watchOS, visionOS | Web, API, MCP

Frequently asked about WhisperKit

Is WhisperKit the same as whisper.cpp on Mac?

No. whisper.cpp is a portable C/C++ Whisper port that runs Whisper on CPU with optional Metal GPU and an opt-in CoreML encoder; WhisperKit is a Swift-native package that compiles the full encoder and decoder to CoreML and lets the OS schedule layers across the Apple Neural Engine, GPU, and CPU automatically. WhisperKit is the right pick if you are shipping a Swift/SwiftUI app on Apple Silicon; whisper.cpp is the right pick when you need to run Whisper on Linux servers, Windows, Intel Macs, or embedded targets with no Apple framework available.

Does WhisperKit use the Apple Neural Engine (ANE)?

Yes. The CoreML model bundles published at huggingface.co/argmaxinc/whisperkit-coreml are compiled to run across the ANE, GPU, and CPU, and WhisperKit picks the compute units automatically. You can also pin them — e.g. `cpuAndNeuralEngine` to force ANE, `cpuAndGPU` to force Metal — via WhisperKitConfig.
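
A minimal sketch of pinning compute units, assuming `ModelComputeOptions` exposes per-stage `MLComputeUnits` fields as in the WhisperKit sources (adjust the field names if your version differs):

Pin compute units · swift
import WhisperKit
import CoreML

Task {
    // Force the encoder and decoder onto CPU + Neural Engine instead of
    // letting CoreML choose; .cpuAndGPU would pin Metal instead.
    let compute = ModelComputeOptions(
        audioEncoderCompute: .cpuAndNeuralEngine,
        textDecoderCompute: .cpuAndNeuralEngine
    )
    let pipe = try await WhisperKit(WhisperKitConfig(computeOptions: compute))
    let result = try await pipe.transcribe(audioPath: "audio.m4a")
    print(result?.text ?? "")
}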

How does WhisperKit compare to faster-whisper on Apple Silicon?

faster-whisper is a Python wrapper around CTranslate2 that targets CUDA and CPU; on Apple Silicon it falls back to CPU, so it does not use the Neural Engine and trails WhisperKit on Mac and iPhone. If you control the box and have an NVIDIA GPU, faster-whisper is excellent; if you ship a Mac or iOS app and want hardware acceleration without bundling Python or a CUDA runtime, WhisperKit wins by construction.

What's the difference between WhisperKit (open source) and Argmax Pro?

WhisperKit and the rest of the Argmax Open-Source SDK are MIT-licensed and ship the OpenAI Whisper, pyannote, and Qwen3-TTS models. Argmax Pro SDK is a closed-source extension with: real-time streaming transcription with live speaker attribution, custom-vocabulary support up to 3,000 keywords for domain accuracy, an Android/Kotlin port, a Deepgram-compatible WebSocket Local Server, and the Pro model variants (whisperkit-pro, parakeetkit-pro, speakerkit-pro). Pricing is on Argmax's site behind a 14-day trial.

Does WhisperKit support real-time / streaming transcription?

The open-source SDK supports microphone streaming via the CLI's `--stream` flag and partial-result streaming over Server-Sent Events from the local server, so you can build dictation-style apps. True real-time streaming with diarization and word-level latency guarantees is a Pro SDK feature; the open-source path streams transcripts as they're generated but does not promise sub-200ms first-token guarantees.
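
For the server path, partial results arrive as SSE `data:` lines that any HTTP client can read incrementally. A sketch in Swift using `URLSession.bytes`, assuming the multipart request body from the client example above and that streaming is requested per the OpenAI contract:

Read SSE partials · swift
import Foundation

func streamPartials() async throws {
    var request = URLRequest(url: URL(string: "http://localhost:50060/v1/audio/transcriptions")!)
    request.httpMethod = "POST"
    // request.httpBody = ...  // multipart body as in the client sketch above,
    //                         // plus a stream field per the OpenAI contract (assumed)
    let (bytes, _) = try await URLSession.shared.bytes(for: request)
    for try await line in bytes.lines where line.hasPrefix("data: ") {
        print(line.dropFirst(6))  // each SSE data line carries a partial transcript
    }
}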

Where do I get the CoreML model files?

All variants are hosted at huggingface.co/argmaxinc/whisperkit-coreml. WhisperKit downloads the recommended model on first run; you can override with WhisperKitConfig(model:) using a glob like `large-v3-v20240930_626MB`. For air-gapped builds, run `make download-model MODEL=...` (or `make download-models` for the full set) and ship the resulting .mlmodelc bundles inside your app.
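
For the air-gapped case, a minimal sketch pointing WhisperKit at bundled weights, assuming `WhisperKitConfig` exposes `modelFolder` and `download` parameters as in the WhisperKit sources:

Load bundled models offline · swift
import Foundation
import WhisperKit

Task {
    // Load pre-downloaded .mlmodelc bundles shipped inside the app instead of
    // fetching from HuggingFace (parameter names assumed; the path is wherever
    // your build phase copies the bundles).
    let config = WhisperKitConfig(
        modelFolder: Bundle.main.resourcePath! + "/openai_whisper-large-v3-v20240930_626MB",
        download: false
    )
    let pipe = try await WhisperKit(config)
    let result = try await pipe.transcribe(audioPath: "audio.m4a")
    print(result?.text ?? "")
}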

Is WhisperKit the same as whisperX?

No. whisperX is a Python project layering forced alignment (wav2vec2) and pyannote diarization on top of faster-whisper, primarily on CUDA. WhisperKit is a Swift CoreML inference framework; in the Argmax SDK 1.0.0 release, diarization is now a sibling kit (SpeakerKit, also pyannote-based) you can compose with WhisperKit, but word-level alignment is not part of the open-source surface. Visitors looking for whisperX behavior on Mac usually combine WhisperKit + SpeakerKit, or use Argmax Pro.

Does WhisperKit work on iPhone, iPad, Apple Watch, Vision Pro?

Yes. The package targets iOS, iPadOS, watchOS, and visionOS. Practical model size is the constraint: tiny and base run on Apple Watch and older iPhones; large-v3 quantized variants (547-626MB) target iPhone 15 Pro and newer with 8GB RAM. Vision Pro and M-series iPads run the full large-v3 comfortably.

What models should I use for production?

Argmax recommends `large-v3-v20240930_626MB` for maximum multilingual accuracy and `tiny` for fast iteration. The September 2024 v3 checkpoint is OpenAI's last Whisper update and noticeably better than 2023 large-v3 on Spanish, Hindi, and Korean. The `_turbo` suffix variants drop the heavy decoder for a lighter one with negligible WER regression on English; pick `_turbo_600MB` if real-time is the priority and `_626MB` non-turbo if WER is.
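
To pick per device at runtime, a sketch assuming WhisperKit's `recommendedModels()` helper (present in the WhisperKit sources; the exact returned shape is an assumption here):

Device-aware model selection · swift
import WhisperKit

Task {
    // Ask WhisperKit which variant fits the current chip/RAM; recommendedModels()
    // and its `default` field are assumed from the WhisperKit API.
    let support = WhisperKit.recommendedModels()
    print("Recommended for this device: \(support.default)")

    let pipe = try await WhisperKit(WhisperKitConfig(model: support.default))
    let result = try await pipe.transcribe(audioPath: "audio.m4a")
    print(result?.text ?? "")
}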

I searched 'whisperx on mac' — what should I use?

On Mac, WhisperKit + SpeakerKit covers the diarization half of whisperX with hardware acceleration the Python whisperX stack can't reach. You lose word-level forced alignment in the open-source path; if you need it, either run whisperX in a Linux Docker container or move to Argmax Pro.

Does WhisperKit support faster-whisper-style Apple Silicon Metal acceleration?

WhisperKit goes further than Metal: it uses CoreML, which schedules across ANE + GPU + CPU based on layer cost. faster-whisper has no Metal backend at all on Apple Silicon — it is CPU-only there. If your search was 'faster-whisper apple silicon metal support', WhisperKit is the answer for that intent.

Is WhisperKit free to use?

Yes — WhisperKit and the rest of the Argmax Open-Source SDK are MIT-licensed and free for commercial use. Argmax also publishes a closed-source Pro SDK with custom-vocabulary, real-time speaker-attributed streaming, and an Android port; pricing is on argmaxinc.com.

Does WhisperKit run on iOS?

Yes. WhisperKit ships on macOS 14+, iOS 17+, watchOS 10+, and visionOS — all CoreML-accelerated on Apple Silicon. Inference happens fully on-device; no network round-trip is required.

Does it work on Intel Macs?

It installs (Swift package, no architecture lock) but the CoreML weights are tuned for Apple Silicon. Intel Macs have no Neural Engine, so compute falls back to CPU + GPU and performance is similar to whisper.cpp's CPU mode.
