ChatGPT Realtime Audio
- Realtime-2 — voice agent · GPT-5-class reasoning · $32 / $64 per 1M audio tokens
- Realtime-Translate — live translation · 70 → 13 languages · $0.034 / min ≈ $2.04 / hour
- Realtime-Whisper — streaming speech-to-text · $0.017 / min ≈ $1.02 / hour
This page is an honest comparison of the three models with Whipscribe. Prices were last checked on 2026-05-08.
If you're a developer building a voice agent and you want GPT-5 reasoning on the audio path with a single endpoint — Realtime-2 is the right tool, and Whipscribe doesn't compete on that.
If you want a transcript (paste a URL, drop a file, or hit an MCP/REST endpoint), Whipscribe is $2/hr PAYG, roughly double Realtime-Whisper's ~$1.02/hr. Pro at $12/month covers up to 100 hours, an effective $0.12/hr at full use, which is about 8.5× cheaper than Realtime-Whisper at that volume. 30 minutes/day are free without signup. 99 languages, diarization included, no audio sent to OpenAI servers.
If you care about privacy — Whipscribe runs self-hosted faster-whisper / whisperX with no third-party AI calls on the audio path; ChatGPT Realtime sends every audio stream to OpenAI's US servers, retained per OpenAI's data-usage policy.
The three new models
GPT-Realtime-2
Voice agent with GPT-5-class reasoning. Carries multi-turn dialogue, calls tools, executes actions while the user is still talking. The flagship — and the most expensive of the three.
GPT-Realtime-Translate
Live speech-to-speech translation. 70 input languages → 13 output languages, low-latency streaming. Aimed at meeting bots, dubbing, and customer-support overlay use cases.
GPT-Realtime-Whisper
Streaming speech-to-text. Same Whisper family OpenAI has shipped before, now exposed as a real-time stream so partial transcripts appear as the speaker talks. The closest direct competitor to Whipscribe's transcription API.
Pricing pulled from OpenAI's launch announcement on 2026-05-08. Audio billing is rounded to the nearest second; Realtime-2 token math depends on input length.
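Because Realtime-2 is billed per token rather than per minute, a session's cost depends on how many audio tokens it consumes. The sketch below shows the arithmetic only. It assumes the $32 / $64 pair means input / output rates, which the list above does not state, and the token counts in the example are hypothetical.

```python
# Rates quoted above for Realtime-2, in USD per 1M audio tokens.
# Assumption: $32 applies to input tokens and $64 to output tokens.
INPUT_USD_PER_1M = 32.0
OUTPUT_USD_PER_1M = 64.0


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session from its audio token counts."""
    total = input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M
    return total / 1_000_000


# Hypothetical session: 200k audio tokens in, 100k audio tokens out.
print(f"${realtime2_cost(200_000, 100_000):.2f}")
```

Swap in token counts measured from your own sessions; the page gives no tokens-per-minute figure, so none is assumed here.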
At a glance
ChatGPT Realtime Audio vs Whipscribe
| Feature | ChatGPT Realtime Audio (OpenAI · 2026-05-08) | Whipscribe (Neugence · privacy-first) |
|---|---|---|
| Product category | Voice AI · streaming STT API · developer-only | Transcription utility: web app · API · MCP · ChatGPT GPT · Mac desktop |
| Cheapest transcription rate | Realtime-Whisper · $0.017/min (~$1.02/hr); Realtime-Translate · $0.034/min (~$2.04/hr) | $2 / hour PAYG or $12 / month Pro (up to 100 hrs · effective $0.12/hr); credits never expire |
| Voice-agent / GPT-5 reasoning | Yes: Realtime-2, $32 / $64 per 1M tokens | Not offered; we ship transcripts, bring your own model (Claude, GPT, local) |
| Try without signing up | No; OpenAI account + API key required | Yes; 30 min/day free, no account, no card |
| Free tier | None; billed from minute zero | Anonymous · 30 min/day + 2 hours free on signup |
| Privacy / data residency | Audio sent to OpenAI servers (US); retention per OpenAI data policy; not HIPAA-eligible by default | Self-hosted Whisper (our own GPU cluster); audio never sent to OpenAI; no training on uploads; see /security + /privacy |
| Speaker diarization | Not built-in | Included (whisperX + pyannote), no extra fee |
| Word-level timestamps | Streaming token deltas; no aligned word timings | SRT · VTT · JSON · DOCX · TXT; word-level alignment |
| Languages | Realtime-Whisper · 99; Realtime-Translate · 70 → 13 | 99 (full Whisper coverage, all tiers) |
| URL input (YouTube, Vimeo, podcast feeds) | No; raw audio stream only | Yes; paste a link, we pull the audio |
| Bulk upload / batch | No; one stream per request | Yes; drag many files, parallel jobs |
| Editing / library / sharing | None; developer API only | Web library · folders · share links · trash · search |
| Live transcription | Yes; real-time streaming model | Live Meeting Notes (beta); streaming Whisper on web |
| Native integrations | OpenAI SDK; Realtime API endpoints | REST API · MCP server (Claude/Cursor) · ChatGPT Custom GPT · Obsidian · Mac desktop · Chrome extension |
| Subscription option | Pure usage metering; no monthly cap | $2/hr PAYG; Pro · $12/mo (up to 100 hrs); Team · $29/mo (up to 500 hrs); predictable monthly spend |
Pricing — head to head
| Workload | ChatGPT Realtime Audio | Whipscribe |
|---|---|---|
| 1 hour of transcription / month | ~$1.02 (60 min × $0.017, Realtime-Whisper) | $2.00 PAYG or $0 if under the daily 30-min free tier |
| 10 hours / month | ~$10.20 | $20.00 PAYG or $12 Pro flat (up to 100 hrs included) |
| 40 hours / month (active podcaster / journalist) | ~$40.80 (40 × $1.02/hr) | $12 Pro flat (effective $0.30/hr · ~3.4× cheaper) |
| 100 hours / month (podcast network · research lab) | ~$102.00 (100 × $1.02/hr) | $12 Pro flat (effective $0.12/hr · ~8.5× cheaper) |
| 1 hour live translation | ~$2.04 (60 min × $0.034, Realtime-Translate) | $2.00 for the transcript, then paste into Claude / DeepL to translate |
| Voice-agent app (50K interactions / month) | Token-metered Realtime-2 · $32 / $64 per 1M tokens | Out of scope; we're a transcription utility, not a voice agent |
| Try without paying | No free tier | 30 min/day anonymous + 2 hours free on signup |
All numbers from public price pages on 2026-05-08. Realtime-2 voice-agent math depends on conversation length; the line above is illustrative only.
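The transcription rows above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the prices quoted on this page. The function names are illustrative, the free tier is ignored, and usage beyond Pro's 100-hour allowance is treated as out of scope because no overage price is listed.

```python
REALTIME_WHISPER_USD_PER_MIN = 0.017
WHIPSCRIBE_PAYG_USD_PER_HOUR = 2.00
WHIPSCRIBE_PRO_USD_PER_MONTH = 12.00
WHIPSCRIBE_PRO_HOURS = 100


def realtime_whisper_monthly(hours: float) -> float:
    """Metered cost in USD: hours x 60 minutes x per-minute rate."""
    return round(hours * 60 * REALTIME_WHISPER_USD_PER_MIN, 2)


def whipscribe_monthly(hours: float) -> float:
    """Cheaper of PAYG and the flat Pro plan, within Pro's allowance."""
    if hours > WHIPSCRIBE_PRO_HOURS:
        raise ValueError("beyond the Pro allowance; not modelled in this sketch")
    payg = hours * WHIPSCRIBE_PAYG_USD_PER_HOUR
    return round(min(payg, WHIPSCRIBE_PRO_USD_PER_MONTH), 2)


for hours in (1, 10, 40, 100):
    print(hours, realtime_whisper_monthly(hours), whipscribe_monthly(hours))
```

At 1 hour the sketch picks PAYG ($2.00); from 10 hours up it picks the $12 Pro plan, matching the table.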
When ChatGPT Realtime Audio is the right call
- You’re building a voice agent that needs GPT-5 reasoning, tool calls, and barge-in. Realtime-2 is genuinely state-of-the-art here.
- You need live speech-to-speech translation in 13 target languages and want a single OpenAI endpoint instead of stitching Whisper + GPT-4o + TTS.
- Your audio is already in OpenAI’s ecosystem (Realtime API, Responses API, Assistants) and a separate vendor adds friction.
When Whipscribe is the better fit
- You want a transcript — TXT, SRT, VTT, DOCX, JSON — not a voice agent.
- You care about privacy: legal, medical, journalism, EU-residency, or anything subject to data-export rules.
- You want to try without signing up — paste a URL or drop a file and read the transcript in 30 seconds.
- You need diarization out of the box, URL input, batch uploads, an MCP server, a ChatGPT Custom GPT, an Obsidian plugin, or a Mac desktop app.
- You want predictable monthly spend: $12 Pro (up to 100 hrs) or $29 Team (up to 500 hrs) — at full use that’s $0.12/hr and $0.058/hr respectively, well under the $1.02/hr Realtime-Whisper rate. PAYG is $2/hr with credits that never expire.
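The effective hourly rates in the last bullet only hold at full use, so it helps to know where a flat plan starts to pay off. The sketch below computes the effective rate at a given usage level and the break-even hours against a metered rate. Both helper names are illustrative, and the prices are the ones quoted on this page.

```python
def effective_rate(plan_usd: float, hours_used: float) -> float:
    """USD per hour actually paid on a flat plan at a given usage level."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    return plan_usd / hours_used


def break_even_hours(plan_usd: float, metered_usd_per_hour: float) -> float:
    """Monthly hours above which the flat plan is cheaper than metering."""
    return plan_usd / metered_usd_per_hour


print(f"Pro at 100 hrs:  ${effective_rate(12, 100):.3f}/hr")
print(f"Team at 500 hrs: ${effective_rate(29, 500):.3f}/hr")
print(f"Pro vs Realtime-Whisper: {break_even_hours(12, 1.02):.1f} hrs")
print(f"Pro vs Whipscribe PAYG:  {break_even_hours(12, 2.00):.1f} hrs")
```

Under these figures Pro overtakes Realtime-Whisper at roughly 11.8 hours a month and overtakes Whipscribe's own PAYG rate at 6 hours.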
FAQ
Which one should I pick for my use case?
If you need a transcript file (TXT, SRT, DOCX) from a recording, an interview, a meeting, a podcast, or a YouTube link — Whipscribe is the right tool. Drop the file or paste the URL and read the transcript.
If you’re a developer building a live voice agent that needs GPT-5 reasoning, tool calls, and barge-in — ChatGPT Realtime-2 is the right tool. It’s an API, not a transcription product.
If you need low-latency live speech-to-speech translation from 70 input languages into 13 output languages, ChatGPT Realtime-Translate is built for that. Whipscribe transcribes; translation is a separate step.
Do I need to write code to use ChatGPT Realtime Audio?
Yes. All three models are developer APIs — you’ll need an OpenAI account, an API key, and code that opens an audio stream to api.openai.com. There is no web app and no upload form.
Whipscribe has a web app at whipscribe.com — paste a URL or drop a file and you get a transcript in seconds, no code, no account required for the first 30 minutes a day.
Can I transcribe a YouTube video or podcast URL with ChatGPT Realtime?
Not directly. ChatGPT Realtime accepts a raw audio stream — you’d need to download or capture the audio yourself and pipe it in. Whipscribe accepts a URL: paste a YouTube, Vimeo, podcast, or direct media link and the audio is fetched for you.
How much will I actually pay for typical workloads?
Realtime-Whisper is billed at $0.017 per minute, rounded to the nearest second.
- 1 hour / month — ChatGPT Realtime ≈ $1.02 · Whipscribe $2.00 PAYG (or $0 if under the daily 30-min free tier)
- 10 hours / month — ChatGPT Realtime ≈ $10.20 · Whipscribe $20 PAYG or $12 Pro flat
- 40 hours / month — ChatGPT Realtime ≈ $40.80 · Whipscribe $12 Pro flat (≈ 3.4× cheaper)
- 100 hours / month — ChatGPT Realtime ≈ $102 · Whipscribe $12 Pro flat (≈ 8.5× cheaper)
Realtime-Translate ($0.034/min ≈ $2.04/hr) and the Realtime-2 voice agent ($32 / $64 per 1M tokens) are likewise usage-metered, with no monthly cap and no free tier.
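For a single clip, per-minute pricing with rounding to the nearest second works out as in the sketch below. It assumes the per-minute rate is applied pro rata per billed second and that half seconds round up; neither detail is stated on this page, and the helper names are illustrative.

```python
import math


def billed_seconds(duration_seconds: float) -> int:
    """Round a clip length to the nearest whole second (assumed: .5 rounds up)."""
    return math.floor(duration_seconds + 0.5)


def metered_cost(duration_seconds: float, usd_per_minute: float) -> float:
    """USD cost of one clip, assuming pro-rata per-second billing."""
    return billed_seconds(duration_seconds) * usd_per_minute / 60


# One hour on Realtime-Whisper and on Realtime-Translate.
print(f"${metered_cost(3600, 0.017):.2f}")
print(f"${metered_cost(3600, 0.034):.2f}")
# A 90.4-second clip is billed as 90 seconds.
print(f"${metered_cost(90.4, 0.017):.4f}")
```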
Is my audio private with each tool?
ChatGPT Realtime Audio: every audio stream is sent to OpenAI’s servers in the United States and retained per OpenAI’s data-usage policy. Default API access is not HIPAA-eligible.
Whipscribe: audio is processed on Whipscribe’s own GPU cluster using self-hosted Whisper / whisperX. Audio is never sent to OpenAI or any third-party AI provider. Recordings are not used for training. The full posture is on /security and /privacy.
Does ChatGPT Realtime work for Zoom / Google Meet / Teams transcripts?
There’s no built-in meeting bot — you’d capture the meeting audio yourself, then stream it in. Whipscribe accepts uploaded recordings (mp4, m4a, mp3, wav and many more) and offers Live Meeting Notes in beta for browser-tab capture.
Which is more accurate?
Realtime-Whisper is the same Whisper family Whipscribe runs (whisper / whisperX). On clean speech the two are very close. Differences in the final transcript come from features layered on top: speaker diarization, word-level alignment, punctuation restoration, and language detection — all included on Whipscribe, not on Realtime-Whisper.
HIPAA / SOC 2 / EU data residency — what are my options?
OpenAI offers HIPAA eligibility through its Enterprise tier; default API access is not HIPAA-eligible. EU residency requires an OpenAI Enterprise contract.
Whipscribe runs on Neugence-owned infrastructure with self-hosted models and no third-party AI on the audio path. See /security for the current posture, certifications status, and how to request a DPA.
Can I use both together?
Yes — they solve different problems. A common pattern: use Whipscribe for the transcript (with diarization, timestamps, exports), then feed the text to GPT-5 / Realtime-2 to build a voice agent on top. The Whipscribe MCP server and the ChatGPT Custom GPT make that handoff one click.
Is there a free way to try either one?
ChatGPT Realtime Audio has no free tier — you pay from the first second.
Whipscribe gives every visitor 30 minutes of transcription per day with no signup, plus 2 hours free on signup. Credits don’t expire.
Whipscribe is a managed faster-whisper + whisperX service — privacy-first, $2/hr PAYG or $12/month Pro (up to 100 hrs), no API key to try, 99 languages, diarization included.
Cross-references: OpenAI Whisper API (older, $0.006/min) · Deepgram · AssemblyAI · all 27 tools · our security posture · our pricing.