Lee Boonstra

Raise your voice!

The hardest part of building a voice AI isn't the AI. It's making all the tools work together in real-time. Here's the complete blueprint.

#1 about 1 minute

Building a custom voice AI with WebRTC and Google APIs

An overview of the architecture for streaming voice from a browser to a backend for processing with conversational AI.

#2 about 4 minutes

Comparing custom voice AI to public assistants

A custom voice AI provides more control over technical requirements and terms of service compared to public platforms like Google Assistant or Alexa.

#3 about 1 minute

Handling short versus long user utterances

Public assistants are optimized for short commands, whereas custom AI for use cases like contact centers must be designed to handle long, complex user stories.

#4 about 3 minutes

Demo of a voice-enabled self-service kiosk

A demonstration of a web-based airport kiosk that answers user questions spoken in different languages using a custom voice AI.

#5 about 1 minute

The core challenge of integrating voice technologies

The main difficulty in building a voice AI is not using individual APIs, but integrating the entire pipeline from frontend audio stream to backend processing.

#6 about 3 minutes

Capturing cross-browser microphone audio with RecordRTC

The RecordRTC library is used to abstract away browser inconsistencies and reliably capture microphone audio streams for processing.
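
As a rough illustration of the capture step, the sketch below builds an options object of the kind RecordRTC accepts. The option names (`type`, `mimeType`, `desiredSampRate`, `numberOfAudioChannels`, `timeSlice`, `ondataavailable`) follow RecordRTC's documented configuration, but the helper name `buildRecorderOptions` and the chosen values (16 kHz mono, one-second slices) are assumptions for this example, not settings taken from the talk.

```typescript
// Shape of the RecordRTC options used in this sketch. The chunk type is
// generic so the example does not depend on browser-only types like Blob.
interface RecorderOptions<Chunk> {
  type: "audio";
  mimeType: string;
  desiredSampRate: number; // RecordRTC's own spelling of this option
  numberOfAudioChannels: 1 | 2;
  timeSlice: number; // milliseconds between ondataavailable calls
  ondataavailable: (chunk: Chunk) => void;
}

// Hypothetical helper: returns options for mono 16 kHz audio, delivered
// to `onChunk` in slices so each slice can be forwarded to the backend.
function buildRecorderOptions<Chunk>(
  onChunk: (chunk: Chunk) => void,
  sliceMs: number = 1000
): RecorderOptions<Chunk> {
  return {
    type: "audio",
    mimeType: "audio/wav",
    desiredSampRate: 16000,
    numberOfAudioChannels: 1,
    timeSlice: sliceMs,
    ondataavailable: onChunk,
  };
}

// In the browser (not executed here), the options would be used roughly so:
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = new RecordRTC(stream, {
//     ...buildRecorderOptions((blob) => sendToBackend(blob)), // sendToBackend is hypothetical
//     recorderType: RecordRTC.StereoAudioRecorder,
//   });
//   recorder.startRecording();
```

Slicing the recording, rather than waiting for the user to stop, is what lets the backend start working while the user is still speaking.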

#7 about 2 minutes

Streaming audio to the backend with Socket.IO

Socket.IO and the socket.io-stream module enable real-time, bidirectional streaming of binary audio data from the browser to a Node.js backend.
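
The backend side of this exchange can be sketched as a handler that receives binary chunks and emits a result back on the same socket. Note the substitution: the talk uses socket.io-stream, while this sketch uses plain Socket.IO-style events with binary payloads to stay self-contained. The `SocketLike` interface, the event names, and `handleAudioSocket` are hypothetical; a real Socket.IO socket offers `on()` and `emit()` methods of this general shape.

```typescript
// Minimal socket shape assumed for this sketch.
interface SocketLike {
  on(event: string, handler: (payload: Uint8Array) => void): void;
  emit(event: string, payload: unknown): void;
}

// Hypothetical event names; any agreed-upon strings would do.
const AUDIO_EVENT = "audio-chunk";
const RESULT_EVENT = "result";

// Registers a listener for incoming audio chunks and sends whatever
// `processChunk` returns back to the same client.
function handleAudioSocket(
  socket: SocketLike,
  processChunk: (chunk: Uint8Array) => string
): void {
  socket.on(AUDIO_EVENT, (chunk) => {
    socket.emit(RESULT_EVENT, processChunk(chunk));
  });
}

// With Socket.IO on Node.js (not executed here) this would be wired up as:
//   io.on("connection", (socket) => handleAudioSocket(socket, myProcessor));
```

Because the same connection carries audio up and results down, the browser does not need a second request to fetch the answer.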

#8 about 3 minutes

Transcribing audio with the Speech-to-Text API

Google's Speech-to-Text API converts the incoming audio stream into text using a streaming recognition call that handles data as it arrives.
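
A streaming recognition call needs a configuration up front and then yields responses as audio arrives. The sketch below shows one plausible request and a helper that picks out final transcripts. The field names mirror the Speech-to-Text streaming request and response, but the local types, the helper names, and the 16 kHz LINEAR16 setting are assumptions chosen for illustration.

```typescript
// Local types that mirror only the fields this sketch uses.
interface StreamingRequest {
  config: { encoding: string; sampleRateHertz: number; languageCode: string };
  interimResults: boolean;
}

interface StreamingResponse {
  results: Array<{
    isFinal: boolean;
    alternatives: Array<{ transcript: string }>;
  }>;
}

// Hypothetical helper: request for 16 kHz, 16-bit PCM audio.
function buildStreamingRequest(languageCode: string): StreamingRequest {
  return {
    config: { encoding: "LINEAR16", sampleRateHertz: 16000, languageCode },
    interimResults: true, // also receive partial hypotheses while the user speaks
  };
}

// Returns the top transcript once the API marks a result as final,
// and null for interim or empty responses.
function finalTranscript(response: StreamingResponse): string | null {
  const result = response.results[0];
  if (!result || !result.isFinal) return null;
  const best = result.alternatives[0];
  return best ? best.transcript : null;
}

// With the @google-cloud/speech client (not executed here):
//   const recognizeStream = client
//     .streamingRecognize(buildStreamingRequest("en-US"))
//     .on("data", (res) => { const text = finalTranscript(res); /* ... */ });
//   audioStream.pipe(recognizeStream);
```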

#9 about 4 minutes

Understanding user intent with Dialogflow

Dialogflow uses natural language understanding to match transcribed user text to predefined intents, entities, and knowledge bases to determine the user's goal.
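
To make the intent-matching step concrete, here is a sketch of a text query as it might be sent to Dialogflow's detect-intent call. The request layout and the session path format follow the Dialogflow ES API; the local types, the helper names, and the example project and session IDs are hypothetical.

```typescript
// Local types that mirror only the fields this sketch uses.
interface DetectIntentRequest {
  session: string;
  queryInput: { text: { text: string; languageCode: string } };
}

interface QueryResult {
  intent?: { displayName: string };
  fulfillmentText: string;
}

// Hypothetical helper: wraps the transcribed text in a detect-intent request.
// The session ID groups the turns of one conversation together.
function buildDetectIntentRequest(
  projectId: string,
  sessionId: string,
  text: string,
  languageCode: string = "en-US"
): DetectIntentRequest {
  return {
    session: `projects/${projectId}/agent/sessions/${sessionId}`,
    queryInput: { text: { text, languageCode } },
  };
}

// Reduces a query result to the matched intent name and the reply text.
function summarize(result: QueryResult): { intent: string; reply: string } {
  return {
    intent: result.intent ? result.intent.displayName : "(no intent matched)",
    reply: result.fulfillmentText,
  };
}

// With the @google-cloud/dialogflow SessionsClient (not executed here):
//   const [response] = await sessionClient.detectIntent(
//     buildDetectIntentRequest("my-project", "kiosk-1", transcript));
//   const { intent, reply } = summarize(response.queryResult);
```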

#10 about 4 minutes

Adding multi-language support with the Translate API

The Translate API enables multi-language support by translating non-English input to English for Dialogflow processing and then translating the response back into the user's language.
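
The translate-in, translate-out flow can be sketched as a small function with the two services passed in as parameters. Everything here is an assumption made for illustration: the `Translate` and `DetectIntent` function types stand in for real API clients, and `answerInUserLanguage` is a hypothetical name.

```typescript
// Stand-ins for the real services, so the flow can be shown on its own.
type Translate = (text: string, targetLanguage: string) => Promise<string>;
type DetectIntent = (agentLanguageText: string) => Promise<string>;

// Translates the user's text into the agent's language, asks the agent,
// then translates the reply back. Skips both translations when the user
// already speaks the agent's language.
async function answerInUserLanguage(
  userText: string,
  userLanguage: string,
  translate: Translate,
  detectIntent: DetectIntent,
  agentLanguage: string = "en"
): Promise<string> {
  const sameLanguage = userLanguage === agentLanguage;
  const query = sameLanguage ? userText : await translate(userText, agentLanguage);
  const reply = await detectIntent(query);
  return sameLanguage ? reply : translate(reply, userLanguage);
}
```

Keeping the agent in a single language means intents only need to be authored once, at the cost of two extra API calls per turn for other languages.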

#11 about 3 minutes

Generating audio responses with Text-to-Speech

The Text-to-Speech API synthesizes a natural-sounding voice from the text response, which is then sent back to the browser as an audio buffer to be played.
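
A synthesis request pairs the reply text with a voice and an output encoding. The sketch below follows the Text-to-Speech request layout (`input`, `voice`, `audioConfig`); the local type, the helper name, and the choice of a neutral voice with LINEAR16 output are assumptions for this example.

```typescript
// Local type that mirrors only the fields this sketch uses.
interface SynthesizeRequest {
  input: { text: string };
  voice: { languageCode: string; ssmlGender: "NEUTRAL" | "FEMALE" | "MALE" };
  audioConfig: { audioEncoding: "LINEAR16" | "MP3" | "OGG_OPUS" };
}

// Hypothetical helper: request for uncompressed audio in the given language.
function buildSynthesizeRequest(
  text: string,
  languageCode: string = "en-US"
): SynthesizeRequest {
  return {
    input: { text },
    voice: { languageCode, ssmlGender: "NEUTRAL" },
    audioConfig: { audioEncoding: "LINEAR16" },
  };
}

// With the @google-cloud/text-to-speech client (not executed here):
//   const [response] = await ttsClient.synthesizeSpeech(
//     buildSynthesizeRequest(reply, "nl-NL"));
//   socket.emit("audio", response.audioContent); // "audio" is a hypothetical event name
```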

#12 about 1 minute

Deployment considerations and open source code

Browsers only grant microphone access to pages served over HTTPS, so a deployed voice application needs it; services like App Engine Flex make this easy to configure. The full project code is available on GitHub.
