Reducing LLM Calls with Vector Search Patterns - Raphael De Lio (Redis)

Large context windows aren't the answer. Learn three vector search patterns to slash your LLM costs and latency.

#1 about 3 minutes

The hidden costs of large LLM context windows

Large context windows in models like GPT-5 seem to eliminate the need for RAG, but every token in the context is billed, so sending a large context with every request is expensive and does not scale.

#2 about 3 minutes

A brief introduction to vectors and vector search

Text is converted into numerical vector embeddings that capture its semantic meaning, allowing computers to efficiently calculate the similarity between different phrases or documents.
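
As a rough illustration of how similarity between embeddings can be computed, here is a minimal sketch of cosine similarity, one common similarity measure. The three-dimensional vectors are made-up values for illustration only; a real embedding model produces vectors with hundreds or thousands of dimensions, and the talk summary does not say which metric or model is used.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical embeddings (assumed values, not output of a real model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
print(cosine_similarity(cat, invoice))  # close to 0: unrelated meaning
```

A vector database performs essentially this comparison at scale, using an index so that it does not have to scan every stored vector.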

#3 about 9 minutes

How to classify text using a vector database

Instead of using a costly LLM for every classification task, you can use a vector database to match new text against pre-embedded reference examples for a specific label.
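
The classification pattern can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the speaker's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `REFERENCES` list stands in for a vector database, and the labels, reference phrases, and the 0.5 threshold are all invented for the example.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical reference examples, embedded once up front.
EXAMPLES = {
    "complaint": ["my order arrived broken", "the product stopped working"],
    "question": ["how do i reset my password", "what are your opening hours"],
}
REFERENCES = [(embed(text), label)
              for label, texts in EXAMPLES.items() for text in texts]

def classify(text, threshold=0.5):
    """Return the label of the nearest reference, or None if nothing is close."""
    query = embed(text)
    score, label = max(((cosine(query, vec), lbl) for vec, lbl in REFERENCES),
                       key=lambda pair: pair[0])
    return label if score >= threshold else None

print(classify("my package arrived broken"))
print(classify("tell me a joke"))
```

Returning `None` below the threshold matters: it marks the cases where the reference set has no good match, which is where a fallback to an LLM would be considered.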

#4 about 5 minutes

Using semantic routing for efficient tool calling

By matching user prompts against pre-defined reference phrases for each tool, you can directly trigger the correct function without an initial, expensive LLM call.
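
A minimal sketch of semantic routing follows, assuming a toy bag-of-words `embed` function in place of a real embedding model and an in-memory table in place of a vector database. The tools `get_weather` and `get_time`, their reference phrases, and the 0.6 threshold are hypothetical names and values chosen for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical tools; real ones would call an API.
def get_weather(prompt):
    return "weather tool called"

def get_time(prompt):
    return "time tool called"

# Each tool is described by a few pre-embedded reference phrases.
ROUTES = {
    get_weather: ["what is the weather today", "will it rain tomorrow"],
    get_time: ["what time is it now", "tell me the current time"],
}
ROUTE_VECTORS = [(embed(phrase), tool)
                 for tool, phrases in ROUTES.items() for phrase in phrases]

def route(prompt, threshold=0.6):
    """Return the tool whose reference phrase is closest, or None."""
    query = embed(prompt)
    score, tool = max(((cosine(query, vec), t) for vec, t in ROUTE_VECTORS),
                      key=lambda pair: pair[0])
    return tool if score >= threshold else None

tool = route("what is the weather in vienna today")
if tool is not None:
    print(tool("what is the weather in vienna today"))
```

When `route` returns `None`, no tool matched confidently, and the prompt would go to the LLM as usual.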

#5 about 5 minutes

Reducing latency and cost with semantic caching

Semantic caching stores LLM responses and serves them for new, semantically similar prompts, which avoids re-computation and significantly reduces both cost and latency.
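
The idea can be sketched as a small in-memory cache. This is an assumption-laden illustration rather than a production design: `embed` is a toy bag-of-words stand-in for an embedding model, `fake_llm` is a hypothetical placeholder for a real model call, and the 0.8 threshold is arbitrary.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (prompt embedding, response)

    def get(self, prompt):
        """Return the response of the most similar cached prompt, or None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

stats = {"llm_calls": 0}

def fake_llm(prompt):
    # Hypothetical placeholder for an expensive model call.
    stats["llm_calls"] += 1
    return f"answer to: {prompt}"

cache = SemanticCache()

def answer(prompt):
    cached = cache.get(prompt)
    if cached is not None:
        return cached          # cache hit: no LLM call
    response = fake_llm(prompt)
    cache.put(prompt, response)
    return response

answer("what is the capital of france")
answer("what is the capital of france please")  # similar enough to reuse
print(stats["llm_calls"])
```

The threshold is the main tuning knob: set too low, the cache serves answers to questions that only look similar; set too high, it rarely hits.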

#6 about 7 minutes

Strategies for optimizing vector search accuracy

Improve the accuracy of vector search patterns through techniques like self-improvement, a hybrid approach that falls back to an LLM, and chunking complex prompts into smaller clauses.
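
One way to read the hybrid and self-improvement ideas together is sketched below: try the vector match first, fall back to an LLM only when no reference is close enough, and store the LLM's verdict as a new reference so the next similar prompt is handled without a call. This is an interpretation, not necessarily the mechanism shown in the talk. `embed` is a toy bag-of-words stand-in, `fake_llm_classify` is a hypothetical placeholder for a real model call, and the labels and 0.7 threshold are invented.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Starting reference set (hypothetical); grows as the LLM labels new text.
references = [(embed("my order arrived broken"), "complaint")]
stats = {"llm_calls": 0}

def fake_llm_classify(text):
    # Hypothetical placeholder for an LLM classification call.
    stats["llm_calls"] += 1
    return "complaint" if "refund" in text else "other"

def classify(text, threshold=0.7):
    query = embed(text)
    score, label = max(((cosine(query, vec), lbl) for vec, lbl in references),
                       key=lambda pair: pair[0])
    if score >= threshold:
        return label                      # cheap path: vector match
    label = fake_llm_classify(text)       # expensive path: LLM fallback
    references.append((query, label))     # remember the verdict for next time
    return label
```

With this sketch, `classify("i want a refund now")` falls through to the LLM stub once, and a later `classify("i want a refund today")` is answered from the newly stored reference. In practice LLM-labelled references may need review before being trusted, since a wrong label would be reused.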

#7 about 3 minutes

Addressing advanced challenges in semantic caching

Mitigate common caching pitfalls, like misinterpreting negative prompts, by using specialized embedding models and combining semantic routing with caching to avoid caching certain types of queries.
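
Combining routing with caching could look roughly like the sketch below: a small set of reference phrases marks query types that should never be cached (here, time-sensitive ones), and only prompts that do not match them go through the semantic cache. Everything specific is an assumption for illustration: the toy bag-of-words `embed`, the `NO_CACHE_REFERENCES` phrases, the `fake_llm` placeholder, and both thresholds.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical route for prompts whose answers go stale quickly.
NO_CACHE_REFERENCES = [embed("what time is it now"),
                       embed("what is the weather right now")]

cache = []  # list of (prompt embedding, response)
stats = {"llm_calls": 0}

def fake_llm(prompt):
    # Hypothetical placeholder for an expensive model call.
    stats["llm_calls"] += 1
    return f"answer to: {prompt}"

def is_cacheable(prompt, threshold=0.6):
    query = embed(prompt)
    return all(cosine(query, ref) < threshold for ref in NO_CACHE_REFERENCES)

def answer(prompt, cache_threshold=0.8):
    if not is_cacheable(prompt):
        return fake_llm(prompt)           # bypass the cache entirely
    query = embed(prompt)
    for vec, response in cache:
        if cosine(query, vec) >= cache_threshold:
            return response               # cache hit
    response = fake_llm(prompt)
    cache.append((query, response))
    return response
```

In this sketch, asking "what time is it now" twice triggers two model calls, while asking "explain what a vector is" twice triggers only one.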
