Robert Lehmann

Planet-Scale Dashboards

How do you provide powerful monitoring for thousands of services without the toil? Learn how Google built a zero-configuration dashboard system that just works.

#1 (about 3 minutes)

The challenge of creating monitoring dashboards from scratch

Monitoring is often an afterthought, so incident responders end up troubleshooting without the dashboards they need.

#2 (about 3 minutes)

Understanding Google's unique observability scaling challenges

Google's massive scale, global distribution, and monorepo architecture created a unique need for a scalable, reusable monitoring solution.

#3 (about 5 minutes)

Building reusable dashboards with templated dimensions

Replace hardcoded values in queries with template variables, called dimensions, to create a single dashboard that can be reused for any service.
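
As a rough illustration of the idea, the sketch below fills dimension values into a query template using Python's standard `string.Template`. The query syntax, the metric name, and the `render_query` helper are all hypothetical and chosen for readability; they are not the query language or API described in the talk.

```python
from string import Template

# Hypothetical query with two dimensions, $service and $region.
# The query syntax is illustrative only.
QUERY_TEMPLATE = Template(
    'rate(http_requests_total{service="$service", region="$region"}[5m])'
)


def render_query(template: Template, dimensions: dict[str, str]) -> str:
    """Fill dimension values into a query template.

    Raises KeyError if a dimension used by the template is missing,
    so an incomplete selection fails loudly instead of producing a bad query.
    """
    return template.substitute(dimensions)


# The same dashboard definition serves any service and region.
print(render_query(QUERY_TEMPLATE, {"service": "checkout", "region": "eu-west"}))
print(render_query(QUERY_TEMPLATE, {"service": "search", "region": "us-east"}))
```

Keeping the dimension values outside the query means one dashboard definition can serve every service, which is the reuse the chapter describes.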

#4 (about 6 minutes)

Solving dashboard discovery with scopes and traits

Address the problem of too many dashboards by having users select a "scope" (e.g., a service); the system then uses the "traits" discovered for that scope to show only the relevant dashboards.
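
One way to picture trait-based discovery is as a subset check: a dashboard is shown when the selected scope has every trait the dashboard requires. The sketch below assumes this model; the `Dashboard` class, the trait names, and the `relevant_dashboards` helper are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    name: str
    required_traits: frozenset[str]  # traits a scope must have for this dashboard to apply


# Hypothetical dashboard catalogue.
DASHBOARDS = [
    Dashboard("gRPC server", frozenset({"grpc_server"})),
    Dashboard("JVM", frozenset({"jvm"})),
    Dashboard("SQL database client", frozenset({"sql_client"})),
]


def relevant_dashboards(scope_traits: set[str], dashboards: list[Dashboard]) -> list[str]:
    """Return names of dashboards whose required traits are all present on the scope."""
    return [d.name for d in dashboards if d.required_traits <= scope_traits]


# Traits would be discovered at runtime for the selected scope; here they are hardcoded.
checkout_traits = {"grpc_server", "jvm"}
print(relevant_dashboards(checkout_traits, DASHBOARDS))
```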

#5 (about 2 minutes)

Modeling different entities with scope types

Introduce "scope types" to create namespaces for different kinds of monitorable entities, such as servers, databases, or machine learning models.
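
A minimal sketch of scope types as namespaces, assuming a scope is identified by a (type, name) pair and dashboards are registered per type. The `Scope` class, the type names, and the registry contents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    scope_type: str  # namespace, e.g. "server", "database", "ml_model"
    name: str        # identifier that is unique only within its scope type


# Hypothetical registry: each scope type has its own set of dashboards.
DASHBOARDS_BY_SCOPE_TYPE: dict[str, list[str]] = {
    "server": ["Request latency", "Error rate"],
    "database": ["Query throughput", "Replication lag"],
    "ml_model": ["Prediction latency", "Feature drift"],
}


def dashboards_for(scope: Scope) -> list[str]:
    """Look up the dashboards registered for the scope's type."""
    return DASHBOARDS_BY_SCOPE_TYPE.get(scope.scope_type, [])


# The same name under two scope types refers to two different entities.
server = Scope("server", "payments")
database = Scope("database", "payments")
print(server == database)
print(dashboards_for(server))
print(dashboards_for(database))
```

Because the type is part of the identity, a server and a database can share a name without their dashboards colliding.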

#6 (about 4 minutes)

Why infrastructure as code is not the right solution

Static provisioning with infrastructure-as-code or dashboards-as-code is insufficient because it lacks dynamic runtime information and creates a stale second source of truth.

#7 (about 3 minutes)

Improving performance at scale with query variants

Use pre-aggregated metrics and define multiple query "variants" within a graph, allowing the system to automatically select the most performant query based on the user's drill-down level.
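
The selection step can be sketched as follows, assuming each variant declares which dimensions its (pre-aggregated) metric still retains, plus a relative cost. The system then picks the cheapest variant that retains every dimension the current drill-down needs. The `Variant` class, metric names, and cost numbers are hypothetical; the talk does not specify how selection is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    query: str
    dimensions: frozenset[str]  # dimensions still available in this metric
    cost: int                   # hypothetical relative query cost


# Hypothetical variants of one graph, from most aggregated to raw.
VARIANTS = [
    Variant("requests_global", frozenset(), cost=1),
    Variant("requests_by_region", frozenset({"region"}), cost=10),
    Variant("requests_raw", frozenset({"region", "instance"}), cost=1000),
]


def select_variant(variants: list[Variant], needed: set[str]) -> Variant:
    """Pick the cheapest variant that retains every dimension the user drilled into."""
    candidates = [v for v in variants if needed <= v.dimensions]
    if not candidates:
        raise ValueError(f"no variant covers dimensions {sorted(needed)}")
    return min(candidates, key=lambda v: v.cost)


print(select_variant(VARIANTS, set()).query)                   # overview
print(select_variant(VARIANTS, {"region"}).query)              # per-region drill-down
print(select_variant(VARIANTS, {"region", "instance"}).query)  # per-instance drill-down
```

The overview is served from the most aggregated metric, and the expensive raw query is used only once the user drills down to a level that needs it.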

#8 (about 1 minute)

Visualizing dependencies with a service graph

Leverage the scope and dependency information to build a service graph that helps engineers quickly navigate between related systems during an incident.
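
As a simplified picture of such a graph, the sketch below stores dependencies as an adjacency mapping between scope names and walks it breadth-first to list everything a service depends on, directly or transitively. The service names and the `downstream` helper are hypothetical; how the real system derives its dependency data is not covered here.

```python
from collections import deque

# Hypothetical dependency data: each scope maps to the scopes it calls.
SERVICE_GRAPH: dict[str, list[str]] = {
    "frontend": ["checkout", "search"],
    "checkout": ["payments-db"],
    "search": [],
    "payments-db": [],
}


def downstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """List all direct and transitive dependencies of `start` in breadth-first order."""
    visited = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dependency in graph.get(current, []):
            if dependency not in visited:  # guards against cycles
                visited.add(dependency)
                order.append(dependency)
                queue.append(dependency)
    return order


print(downstream(SERVICE_GRAPH, "frontend"))
```

Breadth-first order lists the nearest dependencies first, which matches how an engineer would typically move outward from a failing service during an incident.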

#9 (about 1 minute)

Key takeaways for building planet-scale dashboards

A summary of the core principles: use dimensions for reusability, traits for discovery, scope types for genericity, and variants for performance.
