Nele Uhlemann
Handling incidents collaboratively is like solving a Rubik's Cube
#1 about 4 minutes
The Rubik's Cube metaphor for engineering teams
Engineering teams such as backend developers and SREs each work on a different side of the system, so resolving incidents requires them to collaborate.
#2 about 3 minutes
The first phase of resolving incidents collaboratively
The initial step in incident response is to establish a common understanding and transparency across teams before applying quick fixes.
#3 about 2 minutes
Preventing future incidents with best practices
After resolving an incident, teams must collaborate on prevention by documenting best practices for patterns like service retries.
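
As an illustration of the kind of pattern such a best-practice document might cover, here is a minimal sketch of a service retry with capped exponential backoff and jitter. The helper name `retry` and its parameters are hypothetical and do not come from the talk; real guidelines would also decide which errors are safe to retry.

```python
import random
import time


def retry(call, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call `call()` until it succeeds or `attempts` tries are used up.

    Hypothetical helper for illustration; `sleep` is injectable so the
    waiting behaviour can be replaced in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff (0.1s, 0.2s, 0.4s, ...) capped at max_delay,
            # with random jitter so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Agreeing on one such helper across teams, rather than each service retrying in its own way, is one concrete form the documented best practice could take.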
#4 about 2 minutes
Discovering incidents through system observability
The discovery phase relies on making systems observable by collecting telemetry data like logs, metrics, and traces.
#5 about 2 minutes
Standardizing telemetry collection with OpenTelemetry
OpenTelemetry provides a vendor-neutral standard for instrumenting applications, preventing vendor lock-in for observability backends.
#6 about 2 minutes
Simplifying metrics with the Autometrics library
The open-source Autometrics library uses decorators to automatically generate key metrics like latency, errors, and request rate from functions.
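
To make the decorator idea concrete, here is a minimal standard-library sketch of how a decorator can derive call counts, error counts, and latency from a function. It is not the Autometrics API: the names `instrumented` and `METRICS` are hypothetical, and the in-memory dictionary stands in for a real metrics backend.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store; a real setup would export to a metrics backend.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(func):
    """Record call count, error count, and cumulative latency for `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise  # re-raise so callers still see the failure
        finally:
            # Runs on success and on failure, so every call is counted.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start

    return wrapper


@instrumented
def divide(a, b):
    return a / b
```

Request rate and error ratio can then be computed from the counters over a time window, which is the sort of data an SLO is defined on.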
#7 about 5 minutes
Demo of generating metrics and SLOs from code
A live demo shows how Autometrics provides live metrics in the IDE and helps define SLOs that can be visualized in Grafana.
#8 about 1 minute
Summary of collaborative incident management phases
A recap of the three key phases for collaborative incident handling: resolving, preventing, and discovering issues together.
#9 about 2 minutes
Q&A on tooling and open source contribution
The speaker answers audience questions about managing tool complexity and the role of community contributions in open-source projects.
Matching moments
06:30 MIN
Applying agile and SRE principles to incident response
Applying Agile Principles to Incident Management
27:09 MIN
Actionable takeaways for SREs on incident management
Serverless Observability: where SLOs meet transforms
20:29 MIN
Using an incident console to manage response and resolvers
Applying Agile Principles to Incident Management
02:16 MIN
Understanding observability and the need for a process
Mastering AI-Driven Problem Solving in Engineering with Observability
24:30 MIN
Fostering cross-team collaboration with SLOs
Serverless Observability: where SLOs meet transforms
29:58 MIN
How engineers handle production errors and monitoring
DevOps at Netflix
18:09 MIN
Overcoming observability challenges with a unified platform
All your telemetry data from any source in one place
22:38 MIN
Handling operational challenges and infrastructure failures at scale
How building an industry DBMS differs from building a research one
Related Videos
Applying Agile Principles to Incident Management
Tobias Dunn-Krahn
Mastering AI-Driven Problem Solving in Engineering with Observability
Jemiah Sius
Empathy: The secret sauce of Resilience
Malin Litwinski
SRE Methods In an Agency Environment
Martin Beránek
The AI-Ready Stack: Rethinking the Engineering Org of the Future
Jan Oberhauser, Mirko Novakovic, Alex Laubscher & Keno Dreßel
Unveiling the Dark Side: Navigating the Pitfalls of Digital Ambitions
Johannes Hansen
Metrics Handle with Care: The Paradox of Measuring Team Performance
Stefan Stelzer & Volker Zöpfel
Building resilient .NET applications for the modern age
Sander ten Brinke

From learning to earning
Jobs that call for the skills explored in this talk.

Lead Backend Engineer (m/f/d)
Peter Park System GmbH
München, Germany
Senior
Python
Docker
Node.js
JavaScript

[CH] Site Reliability Engineer (Monitoring & Incident Response Focus)
Welld Sagl
Lugano, Switzerland
Remote
€187-208K
Linux
Splunk
Docker

Site Reliability Engineer
Speechmatics
London, United Kingdom
Remote
Linux
Gitlab
Docker
Terraform




