Aug 27, 2025

Reducing LLM Calls with Vector Search Patterns - Raphael De Lio (Redis)

Large context windows aren't the answer. Learn three vector search patterns to slash your LLM costs and latency.

#1 about 3 minutes

The hidden costs of large LLM context windows

Large context windows in models like GPT-5 seem to eliminate the need for RAG, but paying for that many tokens on every request makes the approach expensive and hard to scale.

#2 about 3 minutes

A brief introduction to vectors and vector search

Text is converted into numerical vector embeddings that capture its semantic meaning, allowing computers to efficiently calculate the similarity between different phrases or documents.
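As a minimal sketch of the similarity calculation, the snippet below computes cosine similarity, one common metric for comparing embeddings. The three vectors are hand-written, hypothetical stand-ins; a real embedding model would produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", invented for illustration only.
refund_request = [0.9, 0.8, 0.1]
money_back = [0.85, 0.82, 0.15]
weather_question = [0.1, 0.05, 0.95]

print(cosine_similarity(refund_request, money_back))
print(cosine_similarity(refund_request, weather_question))
```

The first pair points in nearly the same direction and scores close to 1, while the second pair scores much lower.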

#3 about 9 minutes

How to classify text using a vector database

Instead of using a costly LLM for every classification task, you can use a vector database to match new text against pre-embedded reference examples for a specific label.
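The classification pattern could be sketched roughly as follows. This is not the speaker's implementation: `embed` is a toy word-count stand-in for a real embedding model, an in-memory list stands in for the vector database, and the labels, reference phrases, and threshold are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical reference examples per label, embedded once up front.
REFERENCES = {
    "complaint": ["my order arrived broken", "this product is terrible"],
    "praise": ["i love this product", "great service thank you"],
}
INDEX = [(label, embed(example)) for label, examples in REFERENCES.items() for example in examples]


def classify(text, threshold=0.3):
    """Return (label, score) of the nearest reference, or (None, score) if too dissimilar."""
    query = embed(text)
    label, score = max(((lbl, cosine(query, vec)) for lbl, vec in INDEX), key=lambda pair: pair[1])
    return (label, score) if score >= threshold else (None, score)


print(classify("the product arrived broken"))
print(classify("what time is it"))
```

The threshold is the main tuning knob: set it too low and unrelated text gets a label, too high and valid matches are rejected.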

#4 about 5 minutes

Using semantic routing for efficient tool calling

By matching user prompts against pre-defined reference phrases for each tool, you can directly trigger the correct function without an initial, expensive LLM call.
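A rough sketch of semantic routing under the same assumptions: a toy word-count `embed` in place of a real embedding model, an in-memory index in place of a vector database, and two hypothetical stub tools. When no reference phrase is close enough, the router returns nothing and the caller would fall back to a regular LLM call.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical tools; real ones would call an API.
def get_weather():
    return "weather tool called"


def get_time():
    return "time tool called"


# Hypothetical reference phrases for each tool, embedded once up front.
ROUTES = {
    get_weather: ["what is the weather today", "will it rain tomorrow"],
    get_time: ["what time is it", "tell me the current time"],
}
INDEX = [(tool, embed(phrase)) for tool, phrases in ROUTES.items() for phrase in phrases]


def route(prompt, threshold=0.5):
    """Return the tool whose reference phrase is closest to the prompt, or None."""
    query = embed(prompt)
    tool, score = max(((t, cosine(query, vec)) for t, vec in INDEX), key=lambda pair: pair[1])
    return tool if score >= threshold else None


def handle(prompt):
    """Call the matched tool directly; None signals that an LLM fallback is needed."""
    tool = route(prompt)
    return tool() if tool is not None else None


print(handle("what is the weather in berlin today"))
print(handle("write me a poem about autumn"))
```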

#5 about 5 minutes

Reducing latency and cost with semantic caching

Semantic caching stores LLM responses and serves them for new, semantically similar prompts, which avoids re-computation and significantly reduces both cost and latency.
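One way to picture a semantic cache is the sketch below. It assumes a toy word-count `embed` rather than a real embedding model, a plain list rather than a vector database, and a hypothetical `llm` callable; the `SemanticCache` class and its method names are invented for illustration and are not a specific library's API.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by prompt similarity instead of exact text."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (prompt embedding, stored response)

    def get(self, prompt):
        query = embed(prompt)
        best = max(self.entries, key=lambda entry: cosine(query, entry[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))


def answer(prompt, cache, llm):
    """Serve a cached response for similar prompts; call the LLM only on a miss."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    response = llm(prompt)
    cache.put(prompt, response)
    return response
```

In use, a second prompt that differs only slightly from a stored one is served from the cache, so the `llm` callable runs once instead of twice.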

#6 about 7 minutes

Strategies for optimizing vector search accuracy

Improve the accuracy of vector search patterns through techniques like self-improvement, a hybrid approach that falls back to an LLM, and chunking complex prompts into smaller clauses.
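The sketch below shows one possible reading of these three techniques, not necessarily how the talk implements them. The hybrid classifier falls back to a hypothetical `llm_classify` callable when no reference is close enough, then stores the LLM's label as a new reference so the same text is handled by vector search next time (an assumed interpretation of "self-improvement"). `split_clauses` is a naive, assumed way to chunk a prompt. `embed` is again a toy word-count stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class HybridClassifier:
    """Vector search first, LLM fallback second, learning from each fallback."""

    def __init__(self, references, llm_classify, threshold=0.5):
        self.index = [(label, embed(t)) for label, texts in references.items() for t in texts]
        self.llm_classify = llm_classify  # hypothetical callable: text -> label
        self.threshold = threshold

    def classify(self, text):
        query = embed(text)
        label, score = max(
            ((lbl, cosine(query, vec)) for lbl, vec in self.index),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        if label is not None and score >= self.threshold:
            return label, "vector"
        label = self.llm_classify(text)   # expensive path, taken only on a miss
        self.index.append((label, query))  # remember the result as a new reference
        return label, "llm"


def split_clauses(prompt):
    """Naively split a prompt on punctuation and the conjunctions 'and' / 'but'."""
    parts = re.split(r"[.;,!?]|\band\b|\bbut\b", prompt.lower())
    return [part.strip() for part in parts if part.strip()]
```

Each clause returned by `split_clauses` can then be classified or routed on its own, which tends to help when a single prompt mixes several intents.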

#7 about 3 minutes

Addressing advanced challenges in semantic caching

Mitigate common caching pitfalls, like misinterpreting negative prompts, by using specialized embedding models and combining semantic routing with caching to avoid caching certain types of queries.
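Combining routing with caching might look like the following sketch: before touching the cache, the prompt is matched against reference phrases for query types that should never be cached. The choice of time-sensitive questions as the "do not cache" category, the reference phrases, and both thresholds are assumptions, and `embed` is a toy word-count stand-in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical reference phrases for queries whose answers go stale quickly.
NO_CACHE_REFERENCES = [
    "what time is it now",
    "what is the current price",
    "what is the weather right now",
]


def is_cacheable(prompt, threshold=0.5):
    """A prompt is cacheable only if it is not close to any 'do not cache' phrase."""
    query = embed(prompt)
    return all(cosine(query, embed(ref)) < threshold for ref in NO_CACHE_REFERENCES)


def answer(prompt, cache, llm, hit_threshold=0.8):
    """`cache` is a plain list of (embedding, response) pairs; `llm` is a callable."""
    if not is_cacheable(prompt):
        return llm(prompt)  # bypass the cache entirely: no read, no write
    query = embed(prompt)
    for vec, response in cache:
        if cosine(query, vec) >= hit_threshold:
            return response
    response = llm(prompt)
    cache.append((query, response))
    return response
```

Note that the toy `embed` would score "I like this" and "I don't like this" as highly similar, which is the negative-prompt pitfall mentioned above; the routing guard does not fix that on its own.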




