Philipp Krenn
Make Your Data FABulous
#1about 7 minutes
Understanding the CAP theorem for distributed systems
The CAP theorem states that a distributed data store can only provide two of three guarantees: consistency, availability, and partition tolerance.
#2about 3 minutes
Introducing the FAB theory for datastore tradeoffs
The FAB theory proposes another set of tradeoffs for data stores, where you can only pick two of three attributes: fast, accurate, or big.
#3about 7 minutes
How terms aggregation trades accuracy for speed
Elasticsearch's terms aggregation may return inaccurate counts by default because each shard only considers its top local results to improve performance.
#4about 8 minutes
Inconsistent relevance scores in distributed full-text search
Full-text search relevance scores using TF-IDF can be inconsistent because inverse document frequency is calculated per-shard, not globally.
#5about 2 minutes
Using a single shard to ensure data accuracy
Forcing an index to use a single shard guarantees accurate aggregations and relevance scores by eliminating distributed calculations, but sacrifices horizontal scaling.
#6about 1 minute
Why you must consciously choose your data tradeoffs
It is crucial to understand and explicitly choose the tradeoffs in your data systems, like those in the CAP and FAB theorems, to avoid unexpected behavior.
Related jobs
Jobs that call for the skills explored in this talk.
Matching moments
30:35 MIN
Navigating the challenges of distributed aggregations
Distributed search under the hood
25:24 MIN
Q&A on indexing, aggregations, and OpenSearch vs Elasticsearch
Search and aggregations made easy with OpenSearch and NodeJS
16:52 MIN
Optimizing compute, storage, and data transmission
A Hitchhiker's Guide to Resource Efficient Software
21:56 MIN
Optimizing performance with advanced data distribution methods
Fault Tolerance and Consistency at Scale: Harnessing the Power of Distributed SQL Databases
12:14 MIN
Introducing the core principles of Elasticsearch
Distributed search under the hood
14:52 MIN
Recapping Kafka's capabilities for real-time data feeds
Let's Get Started With Apache Kafka® for Python Developers
25:26 MIN
Modern data architectures and the reality of team size
Modern Data Architectures need Software Engineering
21:37 MIN
Distributing data using shards and replicas
Distributed search under the hood
Featured Partners
Related Videos
Distributed search under the hood
Alexander Reelsen
Database Magic behind 40 Million operations/s
Jürgen Pilz
Things I learned while writing high-performance JavaScript applications
Michele Riva
Modern Data Architectures need Software Engineering
Matthias Niehoff
Leveraging Real time data in FSIs
Tim Faulkes
Maximising Cassandra's Potential: Tips on Schema, Queries, Parallel Access, and Reactive Programming
Hartmut Armbruster
Writing a full-text search engine in TypeScript
Michele Riva
Empowering Retail Through Applied Machine Learning
Christoph Fassbach & Daniel Rohr
Related Articles
View all articles
.gif?w=240&auto=compress,format)


From learning to earning
Jobs that call for the skills explored in this talk.

Senior DevOps Engineer - Search & Services - (f/m/x)
AUTO1 Group SE
Berlin, Germany
Intermediate
Senior
ELK
Terraform
Elasticsearch



Elasticsearch - Principal Engineer - Core Infrastructure, & JVM Internals
Elastic
Kubernetes
Elasticsearch
Microsoft Access




