Philipp Krenn

Jun 15, 2022 • World Congress 2022

Make Your Data FABulous

Your Elasticsearch query returns the top 10 results. But what if the real top result is missing entirely? Here's why.

#1 about 7 minutes

Understanding the CAP theorem for distributed systems

The CAP theorem states that a distributed data store can only provide two of three guarantees: consistency, availability, and partition tolerance.

#2 about 3 minutes

Introducing the FAB theory for datastore tradeoffs

The FAB theory proposes another set of tradeoffs for data stores, where you can only pick two of three attributes: fast, accurate, or big.

#3 about 7 minutes

How terms aggregation trades accuracy for speed

Elasticsearch's terms aggregation may return inaccurate counts by default because, to improve performance, each shard returns only its top local terms and the coordinating node merges these partial lists.
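
The effect can be reproduced without a cluster. The following is a minimal pure-Python simulation, not Elasticsearch code: the function names and the sample data are hypothetical, and the `shard_size` argument only mimics the idea behind the aggregation parameter of the same name (how many top terms each shard reports).

```python
from collections import Counter


def shard_top_terms(docs, shard_size):
    """Return the shard_size most frequent terms of ONE shard (local view only)."""
    return Counter(docs).most_common(shard_size)


def distributed_terms(shards, size, shard_size):
    """Merge the per-shard top lists, as a coordinating node would."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, shard_size):
            merged[term] += count
    return merged.most_common(size)


def exact_terms(shards, size):
    """Count over all documents, as a single shard would."""
    total = Counter()
    for docs in shards:
        total.update(docs)
    return total.most_common(size)


# Hypothetical data: "c" is second place on every shard but first overall.
shards = [
    ["a"] * 10 + ["c"] * 9,
    ["b"] * 10 + ["c"] * 9,
    ["d"] * 10 + ["c"] * 9,
]

# With shard_size=1 every shard reports only its local winner,
# so "c" never reaches the merge step and is missing from the result.
print("distributed:", distributed_terms(shards, size=3, shard_size=1))
print("exact:      ", exact_terms(shards, size=3))
```

Raising `shard_size` in the simulation to 2 lets every shard report "c", and the merged count becomes exact again, at the cost of more data sent per shard.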

#4 about 8 minutes

Inconsistent relevance scores in distributed full-text search

Full-text search relevance scores using TF-IDF can be inconsistent because inverse document frequency is calculated per-shard, not globally.
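
A small sketch of why this happens, assuming the textbook formula idf = log(N / df) rather than the exact formula Lucene uses. The documents, the shard split, and the `idf` helper are hypothetical and only illustrate that the same term gets a different weight depending on which shard computes it.

```python
import math


def idf(term, docs):
    """Textbook inverse document frequency: log(N / df).

    N is the number of documents visible to the caller and df the number
    of those documents that contain the term. This is an illustrative
    formula, not the one Elasticsearch uses internally.
    """
    df = sum(1 for doc in docs if term in doc.split())
    return math.log(len(docs) / df) if df else 0.0


# Hypothetical index split across two shards with uneven term distribution.
shard_a = ["elastic search", "fast data", "big data", "accurate counts"]
shard_b = ["elastic cluster", "elastic shard", "elastic node", "big data"]

idf_a = idf("elastic", shard_a)                 # term is rare on shard A
idf_b = idf("elastic", shard_b)                 # term is common on shard B
idf_global = idf("elastic", shard_a + shard_b)  # what one shard would compute

print(f"shard A: {idf_a:.3f}")
print(f"shard B: {idf_b:.3f}")
print(f"global:  {idf_global:.3f}")
```

A document matching "elastic" on shard A is weighted higher than an otherwise identical document on shard B. Elasticsearch offers the `dfs_query_then_fetch` search type to gather term statistics across shards first, which trades an extra round trip for consistent scores.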

#5 about 2 minutes

Using a single shard to ensure data accuracy

Forcing an index to use a single shard guarantees accurate aggregations and relevance scores by eliminating distributed calculations, but sacrifices horizontal scaling.
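
As a rough sketch, a single-shard index can be requested at creation time through the `index.number_of_shards` setting. The snippet below only builds and prints the request body; the index name is a hypothetical placeholder, and sending the request to a cluster is left out.

```python
import json

# Hypothetical index name, used for illustration only.
INDEX_NAME = "fab-demo"

# Request body for the create index API: one primary shard means
# aggregations and scoring see all documents in one place.
create_index_body = {
    "settings": {
        "index": {
            "number_of_shards": 1,
        }
    }
}

print(f"PUT /{INDEX_NAME}")
print(json.dumps(create_index_body, indent=2))
```

The number of primary shards is fixed when the index is created, so this choice has to be made up front.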

#6 about 1 minute

Why you must consciously choose your data tradeoffs

It is crucial to understand and explicitly choose the tradeoffs in your data systems, such as those described by the CAP theorem and the FAB theory, to avoid unexpected behavior.
