Markus Dreseler

How building an industry DBMS differs from building a research one

At Snowflake, engineers analyze metadata from billions of queries to drive development. This data-centric approach reveals bottlenecks that academic benchmarks completely miss.

#1 about 3 minutes

Building a research database prototype versus an industry system

A research database like Hyrise prioritizes open-source experimentation, while industry systems like SAP HANA require navigating large, constantly changing codebases.

#2 about 3 minutes

Understanding Snowflake's decoupled compute and storage architecture

Snowflake's architecture separates centralized storage from a scalable compute layer, allowing independent provisioning of resources based on customer demand.

#3 about 2 minutes

Core similarities in database processes and documentation culture

Both research and industry databases follow the same fundamental query processing pipeline. In industry, collaborative design documents take the place of the slow, formal feedback loop of academic papers.

#4 about 3 minutes

The complexity of supporting nuanced real-world SQL features

Industry databases must support complex SQL features that research prototypes typically ignore, such as collations, versioned time zones, and advanced constructs like MATCH_RECOGNIZE.

#5 about 5 minutes

Using production metadata for data-driven performance optimization

Access to petabytes of query metadata makes it possible to analyze real customer workloads, run tools like perf at scale, and A/B test optimizations, a significant advantage over relying on academic benchmarks.

#6 about 4 minutes

Implementing extensive testing strategies for production reliability

Production systems require a multi-layered testing approach, including sanitizers, query permutation testing, and re-executing historical customer queries to ensure correctness without accessing data.
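
As a rough illustration of query permutation testing, the sketch below runs one query under every combination of two optimizer switches and checks that the results agree. This is an assumption about what such a test might look like, not a description of Snowflake's tooling: the toy engine `run_query`, the flag names, and `check_all_permutations` are all hypothetical.

```python
from itertools import groupby, product

# Hypothetical optimizer switches; the names are illustrative only.
FLAGS = ("use_hash_aggregation", "filter_eagerly")


def run_query(rows, flags):
    """Toy engine for: SELECT key, SUM(value) WHERE value > 0 GROUP BY key."""
    if flags["filter_eagerly"]:
        filtered = [(k, v) for k, v in rows if v > 0]      # materialize first
    else:
        filtered = ((k, v) for k, v in rows if v > 0)      # lazy generator

    if flags["use_hash_aggregation"]:
        sums = {}
        for key, value in filtered:
            sums[key] = sums.get(key, 0) + value
        return sums

    # Sort-based aggregation: group consecutive equal keys after sorting.
    ordered = sorted(filtered)
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=lambda row: row[0])
    }


def check_all_permutations(rows):
    """Run the query under every flag combination and compare the results."""
    baseline = None
    for values in product([False, True], repeat=len(FLAGS)):
        flags = dict(zip(FLAGS, values))
        result = sorted(run_query(rows, flags).items())
        if baseline is None:
            baseline = result
        elif result != baseline:
            raise AssertionError(f"result mismatch with flags {flags}")
    return baseline
```

Any flag combination that changes the result raises an error naming the offending settings, which is the property a correctness test of this kind is after.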

#7 about 2 minutes

Using feature flags for safe and gradual code rollouts

New code is protected by parameters or feature flags, enabling instant rollbacks and allowing for a gradual, controlled release from test environments to full production.
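
A minimal sketch of a percentage-based feature flag may make the rollout idea concrete. It assumes accounts are assigned to stable buckets by hashing; the `FeatureFlag` class, the `plan_query` function, and the flag name are hypothetical and do not reflect how Snowflake implements its parameters.

```python
import hashlib


class FeatureFlag:
    """Hypothetical flag that enables new code for a percentage of accounts."""

    def __init__(self, name, rollout_percent=0):
        self.name = name
        self.rollout_percent = rollout_percent  # 0 = off, 100 = everyone

    def is_enabled(self, account_id):
        # Hash flag name and account together so each account lands in a
        # stable bucket in [0, 100) that is independent across flags.
        digest = hashlib.sha256(f"{self.name}:{account_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < self.rollout_percent


def plan_query(account_id, flag):
    """Choose the code path; the old path stays available for rollback."""
    return "new_optimizer" if flag.is_enabled(account_id) else "old_optimizer"
```

Raising `rollout_percent` step by step widens the release without moving accounts that are already enabled, and setting it back to 0 returns every account to the old path without a redeploy.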

#8 about 5 minutes

Handling operational challenges and infrastructure failures at scale

An on-call rotation of engineers addresses customer issues, while health checks, retries, and a resilient metadata store handle rare but inevitable problems such as faulty cloud hardware.
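
The combination of health checks and retries can be sketched as follows. This is only an illustration under stated assumptions: `run_with_retries`, `WorkerFailure`, and the worker names are hypothetical, and a real system would also need backoff, timeouts, and idempotent tasks.

```python
class WorkerFailure(Exception):
    """Hypothetical error raised when a task fails on a faulty node."""


def run_with_retries(task, workers, is_healthy, max_attempts=3):
    """Try the task on healthy workers, moving to the next one on failure."""
    attempts = 0
    last_error = None
    for worker in workers:
        if attempts >= max_attempts:
            break
        if not is_healthy(worker):
            continue  # skip nodes that fail the health check
        attempts += 1
        try:
            return task(worker)
        except WorkerFailure as err:
            last_error = err  # remember the failure and retry elsewhere
    raise RuntimeError(f"task failed after {attempts} attempts") from last_error
```

For example, if the first worker fails its health check and the second raises `WorkerFailure`, the task is retried on the third worker and its result is returned to the caller.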

#9 about 3 minutes

Reflecting on the trade-offs between research and industry

While industry work loses the ability to make rapid, sweeping changes, it offers the significant benefit of working on real workloads and seeing a measurable, large-scale impact.
