Noaa Barki

What we Learned from Reading 100+ Kubernetes Post-Mortems

What's the #1 cause of Kubernetes outages? After analyzing over 100 post-mortems, the answer is surprisingly simple and completely preventable.

#1 (about 6 minutes)

Understanding the developer versus DevOps cultural divide

A story from a DevOps meetup illustrates the different goals and perspectives that create friction between developers and operations teams.

#2 (about 2 minutes)

Bridge the gap with champions and failure stories

Delegate knowledge to developer champions and learn best practices by studying the post-mortem stories of other companies.

#3 (about 5 minutes)

Common Kubernetes misconfigurations from real outages

Examples from Target and Zalando show how simple errors like incorrect CronJob concurrency policies or missing memory limits can cause major production failures.
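Both kinds of error can be caught by inspecting a manifest before it reaches the cluster. The following is a minimal sketch, not the checks used by any company named above: it assumes the manifest has already been parsed into a Python dict, and the function name `find_misconfigurations` is hypothetical. Treating `concurrencyPolicy: Allow` (the Kubernetes default when the field is unset) as a finding is a policy choice assumed here for illustration.

```python
from typing import Any, Dict, List


def find_misconfigurations(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable findings for one parsed Kubernetes manifest."""
    findings: List[str] = []
    kind = manifest.get("kind")
    spec = manifest.get("spec", {})

    if kind == "CronJob":
        # Kubernetes falls back to "Allow" when concurrencyPolicy is unset,
        # which lets a new run start while earlier runs are still active.
        if spec.get("concurrencyPolicy", "Allow") == "Allow":
            findings.append("CronJob permits concurrent runs (concurrencyPolicy is 'Allow' or unset)")
        pod_spec = (
            spec.get("jobTemplate", {})
            .get("spec", {})
            .get("template", {})
            .get("spec", {})
        )
    elif kind in ("Deployment", "StatefulSet", "DaemonSet", "Job"):
        pod_spec = spec.get("template", {}).get("spec", {})
    elif kind == "Pod":
        pod_spec = spec
    else:
        pod_spec = {}

    # Flag every container that does not declare a memory limit.
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if "memory" not in limits:
            name = container.get("name", "<unnamed>")
            findings.append(f"container '{name}' has no memory limit")

    return findings
```

An empty list means the manifest passed both checks; anything else can be printed locally or in CI.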

#4 (about 3 minutes)

How to introduce policy enforcement gradually

Avoid organizational friction by implementing new policies slowly, starting with a single pilot team to gain agreement and understanding before a wider rollout.
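One way to picture a gradual rollout is a policy check that only blocks for the pilot team and merely warns everyone else. The sketch below illustrates that idea under stated assumptions: the function `decide`, the team names, and the notion of an "enforcing teams" set are all hypothetical and do not come from the talk or from any specific tool.

```python
from typing import List, Set, Tuple


def decide(findings: List[str], team: str, enforcing_teams: Set[str]) -> Tuple[bool, List[str]]:
    """Return (should_block, messages) for one team's policy findings.

    Teams listed in enforcing_teams get hard failures; every other team
    sees the same findings as warnings and is never blocked.
    """
    if not findings:
        return False, []
    if team in enforcing_teams:
        return True, [f"ERROR: {finding}" for finding in findings]
    return False, [f"WARNING: {finding}" for finding in findings]


# Hypothetical usage: only the pilot team is blocked at first.
pilot_only = {"payments"}
blocked, messages = decide(["container 'web' has no memory limit"], "search", pilot_only)
```

Widening the rollout then means adding team names to the set, rather than changing the policy itself.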

#5 (about 3 minutes)

Categorizing the three types of Kubernetes failures

Kubernetes failures typically fall into three categories: simple syntax errors, gaps in knowledge of best practices, and misalignment with internal company policies.

#6 (about 2 minutes)

Validating Kubernetes YAML for syntax and schema errors

Use tools like yq for YAML format validation and kubeconform for schema validation without requiring direct cluster access for developers.
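The two validation steps can be chained in a small wrapper script. This is a sketch, assuming the Go-based yq (v4) and kubeconform are both installed and on the PATH; the function names and the file name `deployment.yaml` are illustrative. Parsing the file with `yq eval .` serves as the syntax check, and `kubeconform -strict -summary` validates against Kubernetes schemas without contacting a cluster.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def validation_commands(path: str) -> List[List[str]]:
    """Build the syntax check and the schema check for one manifest file."""
    return [
        # Step 1: yq fails with a non-zero exit code if the file is not valid YAML.
        ["yq", "eval", ".", path],
        # Step 2: kubeconform validates the manifest against Kubernetes schemas;
        # -strict also rejects properties the schema does not know about.
        ["kubeconform", "-strict", "-summary", path],
    ]


def validate(path: str, run: Optional[Callable[[Sequence[str]], int]] = None) -> bool:
    """Run each check in order and stop at the first failure.

    `run` takes a command and returns its exit code; it defaults to
    executing the command with subprocess, and can be swapped out in tests.
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd, capture_output=True).returncode
    for cmd in validation_commands(path):
        if run(cmd) != 0:
            return False
    return True
```

Calling `validate("deployment.yaml")` in a pre-commit hook or CI job gives developers feedback without needing cluster credentials.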

#7 (about 4 minutes)

The challenges of managing policies as code in Git

Managing policies in Git creates versioning nightmares and lacks features for permissions, dynamic adjustments, and providing clear remediation guidelines.

#8 (about 4 minutes)

Using Datree for centralized policy management

Datree is an open-source tool that provides a centralized location for managing policies, which are then enforced locally and in CI for developers.

#9 (about 1 minute)

The real meaning of shifting responsibility left

True shift-left culture is not just about tools but about delegating responsibility and empowering developers to own their configurations.
