Voice models

Pick the engine
per meeting.

Mac Note Taker ships with Parakeet TDT v3 by default. Need 99-language support? Switch to Whisper Large v3. Want Whisper accuracy at 2-3× the speed? Distil-Large. On a 8 GB Mac? Small.en. One click in Settings → Models. Each model downloads once, caches locally, and runs entirely on the Apple Neural Engine.

All five models

Speed vs accuracy, your call.

  • Parakeet TDT v3

    ParakeetDefault

    FluidAudio's CoreML port of NVIDIA Parakeet TDT 0.6B. Streaming-friendly. Lowest end-to-end latency on M1+. Tight on battery.

    Use when The default. Best balance for daily standups and 1:1s.

    Disk
    250 MB
    Peak RAM
    ~500 MB
    Throughput
    ~17× real-time on M2 Max
    Languages
    25 EU languages
  • Whisper Large v3

    WhisperBest accuracy

    OpenAI's flagship multilingual model. Highest accuracy on accented speech and noisy rooms. Heavier RAM and slower on Intel.

    Use when High-stakes meetings: legal calls, customer interviews, multilingual standups.

    Disk
    1550 MB
    Peak RAM
    ~2200 MB
    Throughput
    ~1× real-time on M2 Pro
    Languages
    99 languages
  • Whisper Distil-Large v3

    WhisperSweet spot

    Knowledge-distilled from Large v3. Same English accuracy at 2-3× the speed. Recommended Whisper variant for most users.

    Use when When you want Whisper accuracy without the Large RAM tax.

    Disk
    760 MB
    Peak RAM
    ~1100 MB
    Throughput
    ~3× real-time on M1
    Languages
    English (distilled)
  • Whisper Small.en

    WhisperLightweight

    OpenAI's compact English model. Fits comfortably alongside other apps. Best fit for older Macs and battery-sensitive workflows.

    Use when 8 GB Macs, long meetings on battery, older M1 hardware.

    Disk
    480 MB
    Peak RAM
    ~700 MB
    Throughput
    ~5× real-time on M1
    Languages
    English
  • Parakeet TDT v2

    ParakeetLegacy

    Older Parakeet build kept around for hardware regressions. Slightly slower than v3. Use only if v3 misbehaves on your machine.

    Use when Fallback only.

    Disk
    240 MB
    Peak RAM
    ~480 MB
    Throughput
    ~10× real-time on M1
    Languages
    English

How switching works.

Open Settings → Models. Pick a row. The first time you select a Whisper variant the app pulls the model from HuggingFace (Parakeet pulls from NVIDIA's CoreML cache). Every subsequent launch loads from disk - no network round-trip, no API key, no SaaS account.

Diarization (pyannote-segmentation-3.0 + CAM++) runs alongside whatever transcription model you pick. Speaker turns, voice fingerprints, and cross-meeting recognition stay identical regardless of which engine produced the words.

Live in-app stats panel shows current memory and CPU per model so you can see exactly what each choice costs.

01

Open Settings → Models

Voice model picker lists Parakeet v3 (active by default), Parakeet v2, Whisper Large v3, Distil-Large v3, Small.en.

02

Click 'Use'

First click downloads the model with a real progress bar. Subsequent clicks switch instantly.

03

Records as usual

Auto-detect, mic + system audio, diarization, AI summary - all flow through the model you picked.

04

Re-transcribe any meeting

Right-click any past meeting → Re-transcribe. Same audio, new model output. Compare and keep the better one.

FAQ

Common questions.

+ Do all models run on-device?

Yes. Parakeet runs via FluidAudio's CoreML port. Whisper variants run via WhisperKit's CoreML port. No audio leaves the Mac. The only network call is the daily license-validate ping.

+ Which model is most accurate?

Whisper Large v3 has the highest raw accuracy across 99 languages and is the recommendation for legal/customer interviews. Distil-Large v3 matches it on English at 2-3× the speed. Parakeet TDT v3 is competitive on English and far faster end-to-end - a better default for daily meetings.

+ Can I run multiple models in parallel?

Not yet. The picker activates one model at a time. You can re-transcribe a past meeting with a different model to compare outputs side-by-side.

+ How much disk space?

Parakeet v3: 250 MB. Whisper Large v3: 1.55 GB. Whisper Distil-Large v3: 760 MB. Whisper Small.en: 480 MB. Parakeet v2: 240 MB. Models are cached under ~/Documents/huggingface/ (Whisper) and ~/Library/Application Support/FluidAudio/ (Parakeet). Delete a folder to free space.

+ Will Whisper run on Intel Macs?

Mac Note Taker requires Apple Silicon (M1+). Whisper Large is borderline real-time on M2 Pro and slower on M1; pick Distil-Large or Small.en on older hardware.

+ What about diarization quality across models?

Diarization is the same regardless of which transcription model you pick - it runs separately via pyannote-segmentation-3.0 + CAM++ embeddings. Speaker labels and voice fingerprints stay consistent.

Five models. Zero subscription.

$149 lifetime · 3 Macs · code FOUDNER for $79.