Nathaniel Okenwa

Hello JARVIS - Building Voice Interfaces for Your LLMS

Can we escape the uncanny valley of conversation? Learn to build voice AIs that handle interruptions and latency like a true conversational partner.

Hello JARVIS - Building Voice Interfaces for Your LLMS
#1about 2 minutes

Introduction to building JARVIS-like voice interfaces

The goal of building a sophisticated voice AI assistant like Iron Man's JARVIS is now more achievable thanks to modern technologies.

#2about 5 minutes

Why natural voice AI has been so difficult

Fictional AI assistants set a high bar for natural voice interaction that early real-world technologies like Siri failed to meet until the arrival of LLMs.

#3about 3 minutes

Navigating the uncanny valley of AI conversations

To avoid the unsettling 'uncanny valley' in voice AI, systems must handle non-linear conversations, interruptions, and the subtle timing of human speech.

#4about 3 minutes

Architecting a composable text-based voice AI stack

A modern voice AI stack combines speech-to-text, an LLM, and text-to-speech, offering more control and better performance than current speech-to-speech models.

#5about 8 minutes

Live demo of handling user interruptions

The demo shows how to implement interruption handling by stopping the AI's audio output and feeding the context of the interruption back into the LLM prompt.

#6about 3 minutes

Using voice interstitials to manage processing delays

Voice interstitials are pre-emptive audio messages that inform the user an action is in progress, preventing the perception of a system failure during long tasks.

#7about 1 minute

Designing AI agents as a constellation of models

An effective voice AI system is not a single monolithic agent but a constellation of smaller, faster models for specific tasks like checking for wake words or playing interstitials.

#8about 2 minutes

Abstracting voice infrastructure with Twilio Conversation Relay

Twilio's Conversation Relay simplifies development by managing the complex audio pipeline, including speech-to-text, text-to-speech, and interruption handling, via a WebSocket API.

Related jobs
Jobs that call for the skills explored in this talk.

test

Milly
Vienna, Austria

Intermediate

test

Milly
Vienna, Austria

Intermediate

job ad

Saby Company
Delebio, Italy

Intermediate

Featured Partners

Related Articles

View all articles
CH
Chris Heilmann
WWC24 Talk - Scott Hanselman - AI: Superhero or Supervillain?
Join Scott Hanselman at WWC24 to explore AI's role as a superhero or supervillain. Scott shares his 32 years of experience in software engineering, discusses AI myths, ethical dilemmas, and tech advancements. Engage with his live demos and insights o...
WWC24 Talk - Scott Hanselman - AI: Superhero or Supervillain?
DC
Daniel Cranney
Stephan Gillich - Bringing AI Everywhere
In the ever-evolving world of technology, AI continues to be the frontier for innovation and transformation. Stephan Gillich, from the AI Center of Excellence at Intel, dove into the subject in a recent session titled "Bringing AI Everywhere," sheddi...
Stephan Gillich - Bringing AI Everywhere
CH
Chris Heilmann
Exploring AI: Opportunities and Risks for Developers
In today's rapidly evolving tech landscape, the integration of Artificial Intelligence (AI) in development presents both exciting opportunities and notable risks. This dynamic was the focus of a recent panel discussion featuring industry experts Kent...
Exploring AI: Opportunities and Risks for Developers

From learning to earning

Jobs that call for the skills explored in this talk.

AI Engineer, London

AI Engineer, London

Eloquent AI

52K
Intermediate
Node.js
GraphQL
TypeScript
Microservices