ScreenCaptureKit · macOS 14.2+

Both sides of the call.
Both clean.

Mic plus system audio captured side-by-side, mixed at the source, diarized into one transcript. No bot in the meeting. No virtual audio cable. No QuickTime gymnastics.

MIC · YOU · live

AVAudioEngine · 16 kHz mono · zero added latency

SYSTEM · CALL · live

ScreenCaptureKit · audio-only filter · OS-level tap

The old way

A bot in your meeting.

Cloud notetakers join the call as a participant, upload your audio to a third-party server, and bill you monthly. Your customers see the bot. Your compliance team sees a vendor list. Your finance team sees a recurring charge.

The Mac Note Taker way

A tap on macOS.

ScreenCaptureKit (Apple added audio capture to it in macOS 13) lets a signed Mac app subscribe to the system audio stream. We combine that with your mic locally, run on-device ASR + diarization, and you get a finished, named-speaker transcript when the call ends. Nothing leaves the Mac.

Tested with

Zoom · Google Meet · Microsoft Teams · Slack huddles · Webex · Discord · FaceTime · Around · Around.co · Whereby · Pop · Tuple

Anything that plays audio through your Mac is in scope. ScreenCaptureKit doesn't care about the meeting client - it taps the OS mixer.

Pipeline

Two streams in. One transcript out.

  1. Permissions, once

    On first launch, Mac Note Taker asks for Microphone and Screen Recording permission. macOS handles the prompts. Granting them doesn't make us store any extra data: the permissions only unlock the capture APIs.

  2. Two captures, parallel

    AVAudioEngine taps the input device for your mic. ScreenCaptureKit subscribes to a system-audio-only stream (no display recording, no video frames). Both stream into the same on-device buffer.

  3. VAD splits speech

    FluidAudio's voice-activity detector cuts each stream into speech segments. Silence and music get dropped before they hit the heavier models.

  4. ASR + diarization, on the Neural Engine

    Parakeet TDT v3 transcribes; pyannote-segmentation-3.0 + CAM++ split the segments into speaker turns. Everything runs on the Apple Neural Engine, in real time on M1 and newer.

  5. Merge by timestamp

    The two streams' diarized turns land in one timeline. Your mic stream is force-labeled You; remote speakers get matched against the cross-meeting fingerprint database.
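For the curious, here is a minimal sketch of what the merge step could look like. It is an illustration, not the app's actual code: the `Turn` type, the `merge_turns` function, and the sample speaker name are all hypothetical, and it assumes the remote turns have already been given names by the fingerprint matching.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Turn:
    """One diarized speaker turn (hypothetical type for this sketch)."""
    start: float   # seconds since capture started
    end: float
    speaker: str
    text: str


def merge_turns(mic_turns, system_turns):
    """Interleave mic and system-audio turns into one timeline."""
    # Every turn from the mic stream is force-labeled "You",
    # whatever label the diarizer assigned to it.
    mic = [replace(t, speaker="You") for t in mic_turns]
    # sorted() is stable, so when two turns start at the same
    # instant the mic turn is listed first.
    return sorted(mic + list(system_turns), key=lambda t: t.start)


if __name__ == "__main__":
    mic = [
        Turn(0.0, 2.1, "spk0", "Hi, can you hear me?"),
        Turn(5.0, 6.5, "spk0", "Great, let's start."),
    ]
    system = [Turn(2.3, 4.8, "Dana", "Yes, loud and clear.")]
    for turn in merge_turns(mic, system):
        print(f"[{turn.start:6.1f}s] {turn.speaker}: {turn.text}")
```

Sorting on start time keeps the sketch short. Overlapping speech (both sides talking at once) simply shows up as adjacent turns whose time ranges overlap.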

No bot. No upload. No subscription.

$149 lifetime, three Macs. First 100 buyers pay $79 with code FOUDNER.

Common questions

  • Do I need BlackHole or a virtual audio device?

    No. ScreenCaptureKit is the supported way to capture system audio on macOS 13+. Virtual audio devices were a workaround from before that API existed, and they are no longer the recommended path.

  • Will the meeting client see anything weird?

    No. ScreenCaptureKit reads at the OS mixer level. The meeting client doesn't see a tap or a virtual device added.

  • Does it work for FaceTime?

    Yes. Same path as Zoom - system audio is system audio.

  • What about meetings on iPhone, with the Mac taking notes?

    Either route the iPhone's audio to the Mac via Continuity / AirPlay, or take the call on the Mac. ScreenCaptureKit only captures audio played by the Mac itself, so it can't see another device.

  • Battery hit?

    About 6-8% of battery charge for a one-hour meeting on an M3 Pro. Less than the Zoom client itself.