Reference

Glossary,
in plain English.

The vocabulary of local-first Mac meeting transcription in 2026. ASR, diarization, voice embeddings, ScreenCaptureKit, Neural Engine, Ollama. Two-paragraph definitions, cross-linked, no marketing.

A3 terms

ANE (Apple Neural Engine)
#
Shorter name for the Apple Neural Engine - the per-chip ML accelerator that runs Mac Note Taker's transcription and diarization models.
ANE is the developer-side abbreviation for the Apple Neural Engine. It is the part of the chip that runs Core ML workloads with very low energy cost. The same hardware accelerates ASR (Parakeet, Whisper), diarization (pyannote-segmentation-3.0), and voice embeddings (CAM++) inside Mac Note Taker.
Unlike GPU compute, ANE workloads do not contend with the rendering pipeline of the meeting client itself, which is why ScreenCaptureKit + on-device ASR adds almost no perceptible latency to Zoom, Meet, or Teams.
Related: Apple Neural Engine (ANE), M-series (Apple Silicon), Edge AI
Apple Neural Engine (ANE)
#
The dedicated ML accelerator inside every Apple Silicon chip, used to run ASR and diarization without burning the CPU or GPU.
The Neural Engine is a fixed-function block on every M-series and A-series Apple chip that executes machine-learning workloads at a fraction of the energy cost of running the same model on the CPU or GPU. It is exposed to developers through Core ML and runs models that have been converted into Apple's mlpackage / mlmodelc format.
The ANE is the reason on-device meeting transcription stopped being a research demo and became a shippable feature in 2026. Parakeet TDT, Whisper variants, pyannote-segmentation, and CAM++ all run on the ANE on Mac Note Taker, which is why a 60-minute meeting on an M3 Pro consumes ~6% of battery rather than spiking the fans. See also: ANE.
Related: ANE (Apple Neural Engine), M-series (Apple Silicon), On-device, Edge AI
ASR (Automatic Speech Recognition)
#
The model layer that converts spoken audio into written text. The first stage of any meeting transcription pipeline.
ASR is the umbrella term for the model class that turns an audio waveform into a sequence of words. In a meeting transcription stack, ASR is the first stage after audio capture and voice activity detection. The most common open-source ASR families in 2026 are Whisper (OpenAI), Parakeet (NVIDIA), and Distil-Whisper.
On an Apple Silicon Mac, modern ASR models are converted to CoreML and executed on the Apple Neural Engine, which makes real-time transcription cheap enough to leave running for the duration of a meeting. Mac Note Taker uses Parakeet TDT v3 by default and can swap to Whisper Large v3 when higher multilingual accuracy is needed.
Related: Parakeet TDT, Whisper, VAD (Voice Activity Detection), ANE (Apple Neural Engine)

B1 term

BlackHole
#
An open-source virtual audio driver that routes Mac audio between apps. The old answer to capturing system audio, largely replaced by ScreenCaptureKit.
BlackHole is a free virtual audio driver (formerly known as SoundFlower's spiritual successor) that exposes itself to macOS as both an input and an output device. You point an app's audio output at BlackHole, point your recorder's input at the same BlackHole device, and you can capture system audio. It works, but it requires installing a kernel-adjacent driver, swapping the output device every meeting, and dealing with the fact that you cannot hear the meeting while recording it without a second multi-output device.
In 2026 BlackHole is not the recommended path on macOS 13+. ScreenCaptureKit captures system audio at the OS layer without a driver install or output-routing dance. Mac Note Taker does not require or use BlackHole.
Related: ScreenCaptureKit, Loopback, Core Audio process tap

C1 term

Core Audio process tap
#
The macOS 14.2+ API that lets an app subscribe to another running app's audio output stream, with user permission.
Process tap is a Core Audio API introduced in macOS 14.2 that allows an entitled application to subscribe to the audio output of another process. It is the cleanest path on a modern Mac to record what your Zoom or Meet client is playing without routing through a virtual audio device.
Mac Note Taker layers ScreenCaptureKit (for broad system-audio capture) with process tap (for per-process capture and clean per-source channels) so that the resulting recording arrives with mic and remote audio on separate tracks. This separation makes diarization more accurate because the model is not trying to untangle two channels that have already been summed.
Related: ScreenCaptureKit, BlackHole

E2 terms

Edge AI
#
The pattern of running ML inference on the end-user's device rather than in a centralized data center.
Edge AI describes the architectural choice to run model inference on the device that captured the data, rather than uploading the data to a centralized GPU cluster. The motivations are latency (no round-trip), cost (no per-call inference bill), privacy (data does not leave the device), and offline behavior (it works without connectivity).
Mac Note Taker is an edge-AI app by construction. ASR, diarization, speaker embeddings, and (with Ollama) LLM summaries all run on the user's Mac, on the Neural Engine where possible. The Mac is the edge node; there is no centralized inference component to scale, secure, or audit.
Related: On-device, Apple Neural Engine (ANE), ANE (Apple Neural Engine), M-series (Apple Silicon)
End-to-end encryption (E2EE)
#
A property where only the communicating endpoints can read the content - the service operator cannot. Common in messaging, rarely true for meeting notetakers.
End-to-end encryption is a guarantee that data is encrypted on the sender's device, transits through any intermediary servers in ciphertext, and is decrypted only on the recipient's device. Signal, iMessage, and FaceTime are the canonical examples; the operator cannot read the content even under legal compulsion.
Most cloud meeting notetakers (Otter, Fireflies, Fathom) are not end-to-end encrypted by this definition: they decrypt the audio in their own infrastructure to run ASR and AI, which means the operator has access to the plaintext. Mac Note Taker sidesteps the question entirely by never sending the audio off the Mac in the first place - this is sometimes called 'no-cloud' or 'local-first', a stronger guarantee than E2EE because there is no third party in the path at all.
Related: On-device, HIPAA, GDPR

G1 term

GDPR
#
EU regulation governing personal-data processing. Recording a meeting requires a lawful basis and disclosure; local-only storage simplifies residency.
The General Data Protection Regulation (Regulation EU 2016/679) governs the processing of personal data of people in the EU. Recording a conversation that contains identifiable voices is personal-data processing under GDPR and requires a lawful basis (commonly legitimate interest or explicit consent), a documented purpose, and a record of where the data is stored.
A cloud notetaker handling EU data subjects with US-based processing typically forces a Transfer Impact Assessment and a careful look at subprocessor chains. A local-only flow collapses the data-residency question: the data is on the data subject's own device, which the controller (the user) physically controls. Disclosure to the other meeting participants is still required.
Related: HIPAA, End-to-end encryption (E2EE), On-device

H1 term

HIPAA
#
US law governing protected health information. Cloud notetakers handling clinical audio require a Business Associate Agreement; local-only flows do not.
HIPAA (the Health Insurance Portability and Accountability Act) governs the handling of Protected Health Information by Covered Entities (providers, payers, clearinghouses) and their Business Associates. Audio of a clinical encounter is PHI. Any third party that processes PHI on behalf of a Covered Entity must sign a Business Associate Agreement.
A cloud notetaker that ingests audio is a Business Associate by definition and needs a BAA. Many consumer-tier notetakers do not offer one. A local-only flow on the clinician's encrypted Mac is not a Business Associate transaction at all because there is no third party in the data path. Mac Note Taker plus Ollama keeps the workflow entirely inside the clinician's device; if cloud AI is required, routing through Azure OpenAI under an existing Microsoft BAA is the standard pattern.
Related: GDPR, End-to-end encryption (E2EE), On-device

L2 terms

LLM (Large Language Model)
#
A transformer-based text-generation model. Used in Mac Note Taker for summaries, action-item extraction, and speaker rename suggestions.
LLM is the umbrella term for transformer language models large enough to handle generic instruction-following workloads. In a meeting tool, the LLM consumes a diarized transcript and emits a structured summary, an action-item list, and topical chapters. It does not handle the audio itself; ASR has already converted the audio to text by the time the LLM sees the input.
Mac Note Taker treats the LLM as a swappable component. Any OpenAI-compatible endpoint works: local Ollama (default), OpenAI, Azure OpenAI, Groq, Together, Fireworks, or a self-hosted gateway. The same prompt template and JSON output schema run against all of them without modification.
Related: Ollama, OpenAI-compatible endpoint, On-device
Loopback
#
Rogue Amoeba's paid virtual audio routing app. Powerful, but for meeting transcription it is overkill compared to ScreenCaptureKit.
Loopback is a commercial Mac app from Rogue Amoeba (the makers of Audio Hijack) that lets you compose virtual audio devices out of arbitrary input and output sources. It is the gold standard if you need to mix podcast guests, system audio, and a hardware mic into one stream that OBS can record.
For the narrower job of 'record both sides of a Zoom call', Loopback is more than the workflow needs. ScreenCaptureKit + Core Audio process tap covers the same use case with one Screen Recording permission and no additional purchase. Mac Note Taker uses the native path.
Related: BlackHole, ScreenCaptureKit

M1 term

M-series (Apple Silicon)
#
Apple's family of ARM-based system-on-chip designs (M1 through M4) that power every modern Mac and ship with a Neural Engine.
M-series is the marketing name for Apple's in-house ARM SoCs for the Mac, starting with M1 in November 2020 and continuing through M4 in 2024-2025. Every M-series chip contains a CPU, a GPU, a Neural Engine, and a unified memory architecture that lets all three access the same RAM without copying.
Mac Note Taker requires an M-series Mac (M1 or newer) running macOS 14.2 or later. The on-device ASR, diarization, and embedding models all rely on the Neural Engine; running them on the CPU alone, as on an Intel Mac, would be far too slow to keep up with a live meeting. Unified memory is also why a 7B-class Ollama LLM fits comfortably alongside the recording pipeline on an M3 Pro with 18GB.
Related: Apple Neural Engine (ANE), ANE (Apple Neural Engine), Edge AI

O3 terms

Ollama
#
A local LLM runtime for macOS, Linux, and Windows. The default backend for Mac Note Taker's on-device AI summaries.
Ollama is an open-source tool that packages popular open-weight LLMs (Llama 3.2, Qwen 2.5, Phi-3.5, Mistral, Gemma) into a single command-line install that exposes an OpenAI-compatible REST API on localhost:11434. It handles model download, quantization, GPU offload, and prompt-template management transparently.
Mac Note Taker's AI Assistant ships Ollama as the default provider because it preserves the local-first guarantee end-to-end: audio stays on the Mac, transcription stays on the Mac, and the LLM step that produces summaries and action items also stays on the Mac. On an M3 Pro, qwen2.5:7b-instruct returns a 5-bullet summary of a 30-minute meeting in 8-15 seconds.
Related: LLM (Large Language Model), OpenAI-compatible endpoint, On-device
On-device
#
Processing that happens entirely on the user's hardware - no cloud, no network round-trip, no third-party data processor.
On-device is the opposite of cloud-based. The user's Mac performs the audio capture, ASR, diarization, embedding, and (with Ollama) LLM inference without sending data to any third party. The only network traffic from a strictly on-device flow is the operating system's own background noise and, in Mac Note Taker's case, an optional once-a-day license check.
On-device is the structural answer to NDA, HIPAA, GDPR, and attorney-client-privilege concerns that cloud notetakers struggle to fit. It also has a side benefit unrelated to compliance: the workflow works offline, on a plane, in a SCIF, or behind a corporate firewall that blocks third-party APIs.
Related: End-to-end encryption (E2EE), Edge AI, HIPAA, GDPR
OpenAI-compatible endpoint
#
Any HTTP API that mimics OpenAI's /v1/chat/completions shape, allowing one client to talk to many backends with no code change.
An OpenAI-compatible endpoint is any HTTP server that implements the request and response shape OpenAI defined for /v1/chat/completions. The de-facto status of that API has made it a portable interface for LLM tooling: Ollama, vLLM, llama.cpp, Together, Groq, Fireworks, Azure OpenAI, and many private inference gateways all expose the same shape, so a client built against the OpenAI SDK can switch providers with only a base URL and key change.
Mac Note Taker uses this contract as its only LLM interface. The Settings -> AI Assistant pane asks for a base URL, an API key, and a model name. That is enough to point the app at local Ollama, OpenAI proper, an Azure OpenAI deployment under a HIPAA BAA, or an EU-region inference gateway for GDPR-strict workloads.
Related: LLM (Large Language Model), Ollama

P1 term

Parakeet TDT
#
NVIDIA's open-source ASR model family. Parakeet TDT v3 is Mac Note Taker's default English transcription model on the Neural Engine.
Parakeet TDT (Token-and-Duration Transducer) is a streaming ASR architecture released by NVIDIA in 2024 and updated through 2026. It is trained on a large multilingual corpus and produces word-level timestamps natively, which makes it a clean fit for live meeting transcription where the UI needs to highlight the current word as it is spoken.
Parakeet TDT v3 is Mac Note Taker's default ASR model. The CoreML port runs on the Apple Neural Engine at roughly 8x real-time on an M2 and 12-15x real-time on an M3 Pro. Word error rate on clean Zoom audio sits between 4% and 7% on conversational English in 2026 benchmarks. For 99-language coverage at higher latency, the app can swap to Whisper Large v3.
Related: ASR (Automatic Speech Recognition), Whisper, ANE (Apple Neural Engine)

S3 terms

ScreenCaptureKit
#
Apple's modern macOS API for capturing screen video and system audio without a kernel extension or a virtual audio cable.
ScreenCaptureKit is the framework Apple shipped in macOS 13 to replace older screen-capture pathways (CGDisplayStream and various third-party kexts). For audio specifically, ScreenCaptureKit can tap the system audio output of any application that plays sound through Core Audio - Zoom, Meet, Teams, FaceTime, browser tabs - without that application needing to cooperate.
Mac Note Taker uses ScreenCaptureKit (alongside Core Audio's process tap API on macOS 14.2+) to capture the other side of a call. This replaces the BlackHole / Loopback / SoundFlower virtual-cable workflow that was standard before 2023. No kernel extension is installed and the user only grants the standard Screen Recording permission.
Related: Core Audio process tap, BlackHole, Loopback
Sparkle (auto-update)
#
The open-source framework that handles in-app software updates for unsigned-by-the-App-Store macOS apps like Mac Note Taker.
Sparkle is the de-facto standard auto-update framework for macOS apps that ship outside the Mac App Store. It checks a vendor-controlled appcast XML feed, downloads the signed update package, verifies an Ed25519 signature, and replaces the installed app on the next launch. It has been the standard since the mid-2000s and is currently on the 2.x line.
Mac Note Taker ships Sparkle 2.9.1, bound to a public Ed25519 key whose private counterpart is held by the publisher. Updates are delivered as signed and notarized DMGs from a Hetzner-hosted appcast. The framework is non-sandboxed, which simplifies the install step but requires careful re-signing of Sparkle's helper binaries during the build to satisfy hardened-runtime requirements.
Related: On-device
Speaker diarization
#
The process of segmenting an audio stream into per-speaker turns - the 'who spoke when' problem.
Diarization answers the question 'who spoke when' inside a single recording. The output is a sequence of (speaker_id, start_time, end_time) tuples where speaker_id is a temporary label such as Speaker A or Speaker B that is only meaningful inside that recording. Diarization is not the same as speaker identification; it does not know that Speaker A in today's meeting is the same person as Marko in yesterday's.
The 2026 standard model for production diarization is pyannote-segmentation-3.0, a small (~6M parameter) sliding-window classifier that, for each frame, outputs the probability of each of up to three concurrent speakers. On an Apple Silicon Mac running the CoreML port, it processes audio at roughly 8-15x real-time on the Neural Engine. To turn temporary labels into stable names across meetings, diarization is paired with an embedding model such as CAM++.
Related: Voice embeddings, ANE (Apple Neural Engine), ASR (Automatic Speech Recognition)

V2 terms

VAD (Voice Activity Detection)
#
A lightweight model that flags which slices of audio contain human speech, used to gate the heavier ASR and diarization stages.
VAD is the first signal-processing stage in a transcription pipeline. It scans the incoming audio and emits a binary flag per short window: speech or no speech. Without VAD, ASR and diarization waste compute on silence, music, room tone, and background hum, and they hallucinate text in those regions.
VAD models are tiny by design - a few hundred kilobytes - because they run continuously over the full audio stream. Mac Note Taker uses FluidAudio's VAD, which sits on the CPU and consumes roughly 0.5% of one core during a meeting. The boolean output gates whether the downstream Parakeet ASR and pyannote diarization models are invoked at all.
Related: ASR (Automatic Speech Recognition), Speaker diarization
Voice embeddings
#
A fixed-length numeric fingerprint of a voice, used to match the same speaker across multiple recordings.
A voice embedding is a vector - usually between 128 and 256 floating-point numbers - that represents how a voice sounds. Two clips of the same person produce vectors that are close together in cosine distance; two clips of different people produce vectors that are far apart. The embedding is content-invariant: it does not encode what was said, only how it sounded.
Mac Note Taker uses CAM++, a 192-dimensional embedding model, to fingerprint every speaker turn after diarization. At ~770 bytes per turn, even a heavy meeting habit produces under 100MB of embedding data across a full year. These fingerprints are matched against a local profile library using cosine similarity (default threshold 0.65) to reproduce names across meetings without any cloud lookup.
Related: Speaker diarization, On-device

W1 term

Whisper
#
OpenAI's open-source ASR model. Whisper Large v3 is Mac Note Taker's high-accuracy multilingual option.
Whisper is a family of encoder-decoder Transformer ASR models that OpenAI released in 2022 and continued updating through Large v3. It covers 99 languages and is widely considered the accuracy baseline for open-weight transcription. It is slower than streaming architectures like Parakeet but more robust to noisy or accented speech and to languages outside the European cluster.
Mac Note Taker bundles Whisper Large v3 and Distil-Large v3 (a smaller, faster distillation) as alternatives to Parakeet. The choice is per meeting in Settings -> Transcription. Whisper runs on the Apple Neural Engine via CoreML; on an M3 Pro, Large v3 transcribes a 60-minute meeting in roughly 4-6 minutes after the meeting ends.
Related: ASR (Automatic Speech Recognition), Parakeet TDT, ANE (Apple Neural Engine)

Read the longer pieces

The glossary is the reference. The field notes go deeper on diarization, on-device LLMs, and the 2026 privacy framework.

Field notes See pricing

Glossary,in plain English.

ANE (Apple Neural Engine)

Apple Neural Engine (ANE)

ASR (Automatic Speech Recognition)

BlackHole

Core Audio process tap

Edge AI

End-to-end encryption (E2EE)

GDPR

HIPAA

LLM (Large Language Model)

Loopback

M-series (Apple Silicon)

Ollama

On-device

OpenAI-compatible endpoint

Parakeet TDT

ScreenCaptureKit

Sparkle (auto-update)

Speaker diarization

VAD (Voice Activity Detection)

Voice embeddings

Whisper

Read the longer pieces

Glossary,
in plain English.