Dainius Jocas
Don't Change the Partition Count for Kafka Topics!
#1 about 5 minutes
An overview of the data indexing pipeline architecture
The system moves data from a MySQL primary data store to an Elasticsearch search server using a Kafka and Kafka Connect pipeline.
#2 about 1 minute
Using Kafka partition offset for optimistic concurrency control
The system uses each message's Kafka partition offset as the document version number in Elasticsearch, so indexing can run in parallel without a stale write overwriting newer data.
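The versioning idea can be sketched with a toy in-memory model. This is not the Elasticsearch client and not the speaker's code: `ToyIndex` and its methods are hypothetical names, and the only rule modeled is that a write is accepted when its externally supplied version is strictly greater than the stored one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """In-memory stand-in for an index that takes externally supplied versions."""
    docs: dict = field(default_factory=dict)      # key -> document body
    versions: dict = field(default_factory=dict)  # key -> last accepted version

    def _accept(self, key: str, version: int) -> bool:
        # Accept only if the version is strictly greater than the stored one.
        # -1 means "nothing stored yet", so offset 0 is accepted.
        if version <= self.versions.get(key, -1):
            return False
        self.versions[key] = version
        return True

    def index(self, key: str, body: dict, version: int) -> bool:
        if not self._accept(key, version):
            return False
        self.docs[key] = body
        return True

    def delete(self, key: str, version: int) -> bool:
        if not self._accept(key, version):
            return False
        self.docs.pop(key, None)
        return True


idx = ToyIndex()
# Two parallel workers deliver messages for the same key out of order.
newer = idx.index("product-1", {"price": 12}, version=11)  # offset 11 arrives first
older = idx.index("product-1", {"price": 10}, version=10)  # offset 10 arrives late
```

In this sketch the late write with offset 10 is rejected, so the document keeps the state from offset 11 regardless of which worker finishes first. The scheme relies on offsets for a given key only ever increasing.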
#3 about 2 minutes
Investigating a mysterious data deletion failure in production
A bug report described Elasticsearch failing to delete documents and therefore serving stale data, but the problem could not be reproduced in local or testing environments.
#4 about 5 minutes
Discovering the offset and version number mismatch
Manual inspection reveals that the document version in Elasticsearch is significantly higher than the new message offset in the Kafka topic for the same key.
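The manual comparison described here could be automated along these lines. This is a minimal sketch under stated assumptions: `find_blocked_keys` is a hypothetical helper, the two dictionaries stand in for values read from Elasticsearch and from the Kafka topic, and the numbers are invented for illustration.

```python
def find_blocked_keys(stored_versions: dict, latest_offsets: dict) -> list:
    """Return keys whose newest Kafka offset cannot exceed the stored version."""
    blocked = []
    for key, offset in latest_offsets.items():
        stored = stored_versions.get(key)
        # A versioned write is accepted only when its version exceeds the
        # stored one, so an offset at or below it will be rejected.
        if stored is not None and offset <= stored:
            blocked.append(key)
    return sorted(blocked)


# Hypothetical values for illustration only.
stored_versions = {"doc-a": 1_250_000, "doc-b": 40}  # versions held by the index
latest_offsets = {"doc-a": 317, "doc-b": 41}         # newest offset seen per key
blocked = find_blocked_keys(stored_versions, latest_offsets)
```

Here `doc-a` is flagged because its newest offset is far below the stored version, which is the symptom described above: any update or delete for that key would be rejected as stale.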
#5 about 4 minutes
How changing partition count breaks message ordering guarantees
Increasing the Kafka topic's partition count changes the key-to-partition mapping, because the partition is derived from the key's hash modulo the number of partitions. New messages for the same key can therefore land in a different partition, where offsets are lower.
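The effect of the partition count on key placement can be illustrated with a short sketch. Kafka's default partitioner hashes the key bytes with murmur2. The code below substitutes `zlib.crc32` as a deterministic stand-in, so the partition numbers it produces will not match a real cluster. `partition_for` and the counts 4 and 6 are assumptions made for the example.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition as hash(key) mod num_partitions.

    zlib.crc32 is a stand-in chosen for determinism; it is NOT the
    hash function used by Kafka's default partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"product-{i}" for i in range(1000)]
before = {k: partition_for(k, 4) for k in keys}  # hypothetical original count
after = {k: partition_for(k, 6) for k in keys}   # hypothetical increased count

# Keys whose partition changes lose their ordering guarantee: later messages
# are appended to a different log with an unrelated offset sequence.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

The hash of each key stays the same; only the modulus changes. A key that moves starts receiving offsets from its new partition's sequence, which can be far below the version already stored for that document.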
#6 about 4 minutes
The solution and key lessons for managing Kafka topics
The fix required a full re-ingestion of the data into a new Kafka topic. The key lesson: never increase a topic's partition count when per-key message ordering is critical.