ScreenCaptureKit · macOS 14.2+

Both sides of the call.
Both clean.

Mic plus system audio captured side-by-side, mixed at the source, diarized into one transcript. No bot in the meeting. No virtual audio cable. No QuickTime gymnastics.

MIC · YOU · live

AVAudioEngine · 16 kHz mono · zero added latency

SYSTEM · CALL · live

ScreenCaptureKit · audio-only filter · OS-level tap


The old way

A bot in your meeting.

Cloud notetakers join the call as a participant, upload your audio to a third-party server, and bill you monthly. Your customers see the bot. Your compliance team sees a vendor list. Your finance team sees a recurring charge.

The Mac Note Taker way

A tap on macOS.

ScreenCaptureKit (the modern capture framework whose audio path Apple shipped with macOS 13) lets a signed Mac app subscribe to the system audio stream. We mix that with your mic locally, run on-device ASR + diarization, and you get a finished, named-speaker transcript when the call ends. Nothing leaves the Mac.

Tested with

Zoom, Google Meet, Microsoft Teams, Slack huddles, Webex, Discord, FaceTime, Around (Around.co), Whereby, Pop, Tuple

Anything that plays audio through your Mac is in scope. ScreenCaptureKit doesn't care about the meeting client - it taps the OS mixer.

Pipeline

Two streams in. One transcript out.

01
Permissions, once
On first launch, Mac Note Taker asks for Microphone and Screen Recording permission. macOS handles the prompts. Granting them doesn't cause us to store any extra data; the permissions just unlock the capture APIs.
02
Two captures, parallel
AVAudioEngine taps the input device for your mic. ScreenCaptureKit subscribes to a system-audio-only stream (no display recording, no video frames). Both stream into the same on-device buffer.
03
VAD splits speech
FluidAudio's voice-activity detector cuts each stream into speech segments. Silence and music get dropped before they hit the heavier models.
04
ASR + diarization, on the Neural Engine
Parakeet TDT v3 transcribes; pyannote-segmentation-3.0 + CAM++ split the segments into speaker turns. All on Apple Neural Engine. Real-time on M1 and newer.
05
Merge by timestamp
The two streams' diarized turns land in one timeline. Your mic stream is force-labeled You; remote speakers get matched against the cross-meeting fingerprint database.
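
The merge in step 05 can be sketched in a few lines. This is a minimal illustration in Python, not the app's actual code: the `Turn` record, the `merge_timelines` function, and the sample speaker names are all hypothetical, and it assumes remote speakers have already been named by the fingerprint match, which is not shown here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Turn:
    """One diarized speaker turn (hypothetical record, for illustration)."""
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: str   # diarizer label or matched name
    text: str      # transcribed speech for this turn


def merge_timelines(mic_turns, system_turns, local_label="You"):
    """Combine mic and system-audio turns into one time-ordered transcript.

    Every mic turn is relabeled with `local_label`, whatever the diarizer
    called it. System turns keep the speaker names they arrive with.
    """
    relabeled = [replace(turn, speaker=local_label) for turn in mic_turns]
    # Order by start time; ties are broken by end time.
    return sorted(relabeled + list(system_turns),
                  key=lambda turn: (turn.start, turn.end))


if __name__ == "__main__":
    mic = [
        Turn(0.0, 2.1, "spk0", "Hi, can you hear me?"),
        Turn(5.0, 7.5, "spk1", "Great, let's start."),
    ]
    system = [Turn(2.3, 4.8, "Dana", "Yes, loud and clear.")]
    for turn in merge_timelines(mic, system):
        print(f"[{turn.start:6.1f}s] {turn.speaker}: {turn.text}")
```

Sorting the combined list keeps the sketch short. If each stream is already in time order, a streaming merge such as `heapq.merge` would do the same job without re-sorting.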

No bot. No upload. No subscription.

$149 lifetime, three Macs. First 100 buyers pay $79 with code FOUNDER.

Common questions

Do I need BlackHole or a virtual audio device?
No. ScreenCaptureKit is the supported way to capture system audio on macOS 13+. Virtual cables were a workaround from before that API existed, and they are no longer the recommended path.
Will the meeting client see anything weird?
No. ScreenCaptureKit reads at the OS mixer level. The meeting client doesn't see a tap or a virtual device added.
Does it work for FaceTime?
Yes. Same path as Zoom - system audio is system audio.
What about meetings on iPhone, with the Mac taking notes?
Route the iPhone's audio to the Mac via Continuity / AirPlay or take the call on the Mac. ScreenCaptureKit can't see another device.
Battery hit?
About 6-8% of charge for a one-hour meeting on M3 Pro. Less than the Zoom client itself.
