Nathaniel Okenwa
Performant Architecture for a Fast Gen AI User Experience
#1 about 2 minutes
Building a real-time translator inspired by sci-fi
The Babel fish from "The Hitchhiker's Guide to the Galaxy" serves as the inspiration for a real-time audio translation project.
#2 about 4 minutes
Analyzing the latency of a basic AI architecture
A demonstration of the initial 2019 architecture, built on Google Cloud, reveals a latency of over ten seconds for a simple translation.
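One way to see where those seconds go is to time each stage of the pipeline separately. The sketch below is illustrative only: it assumes a three-stage pipeline (speech-to-text, translation, text-to-speech), and the stage functions are hypothetical stand-ins for the real network calls, not the services used in the talk.

```python
import time
from typing import Callable

Stage = tuple[str, Callable[[str], str]]

def timed_pipeline(stages: list[Stage], payload: str) -> tuple[str, dict[str, float]]:
    """Run the stages one after another and record how long each one takes."""
    timings: dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings

# Hypothetical stand-ins for the three remote calls (assumed, not from the talk).
def speech_to_text(audio: str) -> str:
    return audio.removeprefix("audio:")

def translate(text: str) -> str:
    return f"[translated] {text}"

def text_to_speech(text: str) -> str:
    return f"audio:{text}"

stages: list[Stage] = [
    ("speech_to_text", speech_to_text),
    ("translate", translate),
    ("text_to_speech", text_to_speech),
]

result, timings = timed_pipeline(stages, "audio:hello")
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
print(f"total: {sum(timings.values()):.6f}s")
```

Because the stages run sequentially, the total is the sum of the per-stage times, so the slowest stage is the first candidate for optimization.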
#3 about 2 minutes
Reducing latency by upgrading the AI service stack
Switching to modern, specialized APIs like Deepgram and ElevenLabs cuts the total processing time from twelve seconds to five.
#4 about 2 minutes
Implementing streaming to reduce response wait times
Adopting a streaming approach provides a major performance boost, but a naive implementation results in chaotic and low-quality audio output.
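The gain from streaming can be illustrated with a simulated clock. This is a minimal sketch, assuming that each token costs one time unit to generate; `generate_tokens` is a hypothetical stand-in for a streaming model response.

```python
from typing import Iterator

TOKEN_COST = 1  # simulated time units per generated token (assumed)

def generate_tokens(tokens: list[str], clock: list[int]) -> Iterator[str]:
    """Yield tokens one at a time, advancing a simulated clock for each."""
    for token in tokens:
        clock[0] += TOKEN_COST
        yield token

def first_output_batch(tokens: list[str]) -> tuple[str, int]:
    """Wait for the complete response before doing anything with it."""
    clock = [0]
    full = "".join(generate_tokens(tokens, clock))
    return full, clock[0]

def first_output_streaming(tokens: list[str]) -> tuple[str, int]:
    """Act as soon as the first token arrives."""
    clock = [0]
    stream = generate_tokens(tokens, clock)
    first = next(stream)
    return first, clock[0]

tokens = ["Hello", " there", ",", " how", " are", " you", "?"]
batch_text, batch_wait = first_output_batch(tokens)
stream_text, stream_wait = first_output_streaming(tokens)
print(f"batch: {batch_text!r} after {batch_wait} units")
print(f"streaming: {stream_text!r} after {stream_wait} units")
```

The streaming path has something to work with after one unit instead of seven, but what it has is the fragment "Hello". Passing fragments like that straight to the next stage is a plausible source of the poor audio quality described above.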
#5 about 2 minutes
Using chunking to balance streaming speed and quality
Chunking data based on sentence punctuation controls the streaming waterfall, improving the quality of generated audio without sacrificing speed.
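Sentence-based chunking could be sketched as a generator that buffers incoming tokens and releases them only when a sentence ends. The function name and the set of sentence-ending characters are assumptions for illustration, not the speaker's implementation.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")  # assumed set of sentence-ending punctuation

def chunk_by_sentence(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield one complete sentence at a time."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the stream ends without punctuation.
    if buffer.strip():
        yield buffer.strip()

stream = ["Hello", " world", ".", " How", " are", " you", "?", " Fine"]
for sentence in chunk_by_sentence(stream):
    print(sentence)
```

Each yielded sentence can be sent to the speech stage while the next one is still being generated. A check this simple also splits on abbreviations such as "Dr." and on decimal numbers, so a production version would need a more careful boundary rule.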
#6 about 6 minutes
Eliminating network latency with local and edge models
Running a smaller, local AI model like Whisper on the edge eliminates cross-continental network latency and provides near-instantaneous results.
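A rough calculation shows why distance matters. Light travels through optical fiber at roughly 200,000 km/s, which puts a floor under any round trip. The distance and the number of sequential remote calls below are assumed values for illustration, not figures from the talk.

```python
FIBER_SPEED_KM_PER_S = 200_000  # light in optical fiber, roughly two thirds of c

def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S

distance_km = 9_000    # assumed one-way distance to a data center on another continent
sequential_calls = 3   # assumed: speech-to-text, translation, text-to-speech

per_call_ms = min_round_trip_ms(distance_km)
total_ms = per_call_ms * sequential_calls
print(f"per call: {per_call_ms:.0f} ms, total: {total_ms:.0f} ms")
```

Under these assumptions, propagation alone costs 90 ms per call and 270 ms across the three calls, before any routing, TLS handshakes, or model inference. A model running on the device or at a nearby edge location avoids most of that.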
#7 about 3 minutes
Using caching to serve pre-generated AI responses
Implementing caching, from simple request matching to semantic search with vector databases, avoids redundant generation and speeds up common queries.
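The two caching levels can be sketched together: an exact-match lookup first, then a similarity search over stored requests. This is a toy illustration. The bag-of-words `embed` function stands in for a real embedding model, the linear scan stands in for a vector database, and the 0.8 threshold is an arbitrary assumption.

```python
import math
from collections import Counter
from typing import Optional

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ResponseCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.exact: dict[str, str] = {}
        self.entries: list[tuple[Counter, str]] = []

    def put(self, request: str, response: str) -> None:
        self.exact[request.strip().lower()] = response
        self.entries.append((embed(request), response))

    def get(self, request: str) -> Optional[str]:
        # Level 1: exact match on the normalized request.
        key = request.strip().lower()
        if key in self.exact:
            return self.exact[key]
        # Level 2: closest stored request, if it is similar enough.
        query = embed(request)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

cache = ResponseCache()
cache.put("where is the train station", "Wo ist der Bahnhof?")
print(cache.get("Where is the train station"))         # exact hit after normalization
print(cache.get("where is the train station please"))  # semantic hit
print(cache.get("how much is a ticket"))               # miss, falls through to generation
```

The threshold is the main design choice: set it too low and users receive answers to questions they did not ask, set it too high and the semantic level rarely hits.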
#8 about 2 minutes
Optimizing prompts and user experience for speed
Fine-tuning performance involves optimizing prompts to generate fewer tokens and improving perceived speed with clear loading states for the user.
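Output length matters because language models produce tokens one after another, so response time grows roughly linearly with the number of tokens generated. A back-of-the-envelope estimate, with the time to first token and the generation rate as assumed values rather than measurements:

```python
def estimated_latency_s(output_tokens: int,
                        time_to_first_token_s: float = 0.5,
                        tokens_per_second: float = 50.0) -> float:
    """Rough response time: fixed startup cost plus sequential token generation."""
    return time_to_first_token_s + output_tokens / tokens_per_second

verbose = estimated_latency_s(200)  # e.g. an answer with explanations and pleasantries
concise = estimated_latency_s(50)   # e.g. the prompt asks for the translation only
print(f"verbose: {verbose:.1f}s, concise: {concise:.1f}s")
```

Under these assumptions, a prompt that asks for only the translation brings the estimate from 4.5 seconds down to 1.5 seconds without changing the model or the infrastructure.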
#9 about 2 minutes
Summary of key performance optimization techniques
A final recap covers the essential strategies for building fast Gen AI experiences, including streaming, edge computing, caching, and prompt optimization.