How we built a personal meeting assistant using Deepgram

· 5 min read
Lewis Dwyer
Software engineer and technical writer

We built a meeting assistant that sits in on any call, hears everyone, and answers out loud when someone says its name. Most meeting assistants are silent note-takers that transcribe in the background and email you a summary afterward. Ours talks back: ask it something mid-call and the whole room hears the answer.

The part that makes that work is Deepgram. It transcribes the call as people speak and turns the assistant's replies into natural-sounding speech, fast enough to feel like a real conversation. The audio capture and control loop run on your own machine and work with any call platform (Zoom, Meet, Teams), with no bot joining the meeting. The assistant is called Otto, and it's a complete, runnable open-source project. This post is a short tour of how we used Deepgram to build it, and you can clone it and try it on your own calls.

Figure: Otto transcribing a live call and answering a question out loud the moment it's addressed by name.

Deepgram features we used

We use Deepgram for two jobs: turning the call into text as people speak, and turning Otto's answers back into speech.

Transcribing the call

Deepgram's Listen API handles the live transcription over a streaming WebSocket, on the Nova-3 model. We run two streams at once: one for the call's incoming audio (a mix of everyone else on the call) and one for your own microphone. Keeping them separate means we always know whether you or a remote participant said something. A few of its features do the heavy lifting:

  • Diarization labels who said what, so the transcript reads like a conversation instead of a wall of text.
  • Keyterm prompting boosts the wake word "Otto", so the assistant reliably hears its name even over a noisy call.
  • Endpointing detects when someone has finished a thought, so Otto answers promptly without talking over a half-finished sentence.
  • Interim results and smart formatting keep the transcript readable as it fills in, with punctuation and capitalization already in place.
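
As a rough illustration of how the features above map onto a streaming request, here is a minimal sketch that builds the WebSocket URL for each of the two streams. The query parameters (model, diarize, keyterm, endpointing, interim_results, smart_format) are Deepgram's documented options, but the specific values, the audio format, the choice to diarize only the remote stream, and the helper name build_listen_url are assumptions for illustration, not taken from Otto's code.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(
    diarize: bool,
    wake_word: str = "Otto",
    endpointing_ms: int = 300,   # assumed value: silence (ms) that ends a turn
    sample_rate: int = 16000,    # assumed capture format: 16 kHz mono PCM
) -> str:
    """Build a streaming transcription URL (hypothetical helper)."""
    params = [
        ("model", "nova-3"),
        ("encoding", "linear16"),
        ("sample_rate", str(sample_rate)),
        ("channels", "1"),
        ("diarize", "true" if diarize else "false"),
        ("keyterm", wake_word),              # boost recognition of the wake word
        ("endpointing", str(endpointing_ms)),
        ("interim_results", "true"),
        ("smart_format", "true"),
    ]
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


# One URL per stream. The remote mix has several speakers, so it is diarized;
# the microphone stream has a single speaker, so diarization is left off here.
remote_url = build_listen_url(diarize=True)
mic_url = build_listen_url(diarize=False)
```

Each URL would be opened with a WebSocket client that sends an "Authorization: Token <your key>" header, then fed raw audio frames from the matching capture source.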

Speaking the reply

When Otto has an answer, Deepgram's Speak API turns the text into audio, streamed back as it's generated. We use one of the natural-sounding Aura voices so Otto doesn't sound robotic in the middle of a human conversation. Because the audio streams in rather than arriving all at once, the reply starts playing almost immediately instead of after an awkward pause.
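
To make the streaming idea concrete, here is a minimal sketch of a text-to-speech request against Deepgram's REST Speak endpoint, reading the response in chunks so playback can begin before the full clip arrives. The voice name aura-asteria-en, the chunk size, and the helper names are assumptions, and Otto itself may use a different transport (for example a WebSocket) than the plain HTTP request shown here.

```python
import json
import urllib.request
from typing import Iterator

DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(
    text: str,
    api_key: str,
    voice: str = "aura-asteria-en",  # assumed voice; any Aura model name works here
) -> urllib.request.Request:
    """Build the POST request that asks Deepgram to synthesize `text`."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{DEEPGRAM_SPEAK_URL}?model={voice}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_speech(request: urllib.request.Request, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as they arrive instead of waiting for the whole reply."""
    with urllib.request.urlopen(request) as response:
        while chunk := response.read(chunk_size):
            yield chunk
```

A caller would loop over stream_speech(build_speak_request(reply, key)) and write each chunk to the audio output as soon as it is yielded.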

How it fits together

Deepgram does the listening and the speaking, but a few pieces around it make the live loop work. Here's the whole path, start to finish:

  1. Capture the audio. We grab the call's sound straight from what your computer is already playing, and your microphone separately, using a macOS Core Audio system tap. Nothing in your audio setup changes and no bot joins the meeting. Both feeds go to Deepgram as the two transcription streams above.
  2. Wait for the name. Every turn is transcribed, but Otto stays quiet until it hears "Otto". A quick check decides whether someone actually addressed it, so a passing mention or a "thanks, Otto" doesn't set it off.
  3. Find an answer. Once Otto is addressed, we hand the recent transcript and the question to an LLM. It can pull in notes from past meetings or search the web, then write a short reply.
  4. Say it out loud. Aura turns that reply into audio, which we feed into a virtual microphone that the call app treats as its mic input, so everyone on the call hears it (and it plays to your headphones too).
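
The post doesn't spell out how the "quick check" in step 2 works, so the following is only one possible heuristic, not Otto's actual logic. It treats the assistant as addressed when the wake word opens or closes a sentence (optionally after a greeting) and something other than a thank-you accompanies it. The word lists and the function name is_addressed are hypothetical.

```python
import re

# Hypothetical word lists; a real check would likely be more thorough.
GREETINGS = {"hey", "hi", "ok", "okay"}
POLITE_ONLY = {"thanks", "thank you", "cheers"}


def is_addressed(turn: str, wake_word: str = "otto") -> bool:
    """Return True if a transcript turn looks like a request aimed at the assistant.

    Assumes a single-word wake word.
    """
    wake = wake_word.lower()
    for sentence in re.split(r"[.!?]+", turn.lower()):
        words = re.findall(r"[a-z0-9']+", sentence)
        if wake not in words:
            continue
        # Drop a leading greeting such as "hey" so "Hey Otto, ..." still counts.
        while words and words[0] in GREETINGS and words[0] != wake:
            words.pop(0)
        if words[0] == wake:
            rest = words[1:]      # "Otto, what did we decide?"
        elif words[-1] == wake:
            rest = words[:-1]     # "What did we decide, Otto?"
        else:
            continue              # name appears mid-sentence: a passing mention
        # Require an actual request, not just the name or a thank-you.
        if rest and " ".join(rest) not in POLITE_ONLY:
            return True
    return False
```

One limitation of this sketch: a bare "Otto?" followed by the question in a separate sentence is not detected, because each sentence is judged on its own.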

The whole loop is driven from your own machine, so the live transcript and your meeting notes are kept locally. Only the audio sent to Deepgram and the transcript excerpt sent to the LLM leave it.

Try it yourself

Otto runs on macOS (14.4 or later) and takes a few minutes to set up. You'll need:

  • Homebrew, to install the audio tools
  • A Deepgram API key (free trial available), for transcription and the voice
  • An OpenAI API key, which Otto uses to write its answers (you can swap in another LLM provider)

The README has the full walkthrough. In short, you:

  • Clone the repo and install its dependencies.
  • Install BlackHole, the free virtual audio device Otto speaks through.
  • Add your API keys to a .env file.
  • Run the setup script, which builds the audio helper and flags anything still missing.
  • Grant the recording permission so Otto can hear the call.
  • Start Otto and join your meeting.
  • Set your call app's microphone to BlackHole 2ch, then ask Otto a question.
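
Before starting, it can help to confirm that both keys are actually set. The snippet below is a small sketch of such a check. The variable names DEEPGRAM_API_KEY and OPENAI_API_KEY are assumptions, so use whatever names the README specifies, and note that this is separate from the project's own setup script.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; check the README for the ones Otto expects.
REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the required keys that are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All required keys are set.")
```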

That's the whole thing: a meeting assistant that listens, thinks, and talks back in real time, with Deepgram handling the hearing and the speaking. The complete project (capture pipeline, wake-word logic, Deepgram streams, and all) is on GitHub. Clone it, change the wake word or the voice, and make it your own.


About the author

Lewis Dwyer is a software engineer and technical writer at Ritza. He contributes hands-on testing and writing on developer tools and AI products to TechStackups.