Matthias Niehoff

Modern Data Architectures need Software Engineering

Your data pipelines are critical production systems. It's time to apply software engineering rigor, from automated testing and CI/CD to data contracts.

#1 about 2 minutes

The evolution from data warehouses to data lakes

Data architectures evolved from centralized data warehouses for BI reporting to data lakes that accommodate unstructured data for data science and machine learning.

#2 about 2 minutes

Understanding the modern cloud data platform

Cloud data platforms like Snowflake and Databricks enabled the shift from ETL (transform before loading) to ELT (load first, then transform inside the platform) and introduced the data lakehouse concept using open table formats like Apache Iceberg.

#3 about 3 minutes

Solving centralization bottlenecks with Data Mesh

Data Mesh applies domain-driven design principles to data, promoting decentralized ownership, data as a product, a self-serve platform, and federated governance to avoid central team bottlenecks.

#4 about 1 minute

Why data engineering needs software engineering discipline

As data systems become production-critical, the Python-heavy data ecosystem requires rigorous software engineering practices beyond simple scripting to build reliable, maintainable software.

#5 about 1 minute

Implementing unit, integration, and data quality tests

Effective data pipelines require a multi-layered testing strategy, including unit tests for logic, integration tests for system connections, and runtime tests to validate data content and quality.
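As a rough illustration of two of these layers, the sketch below separates a unit test of transformation logic from a runtime check on the data itself. The `Order` type, `to_euros`, and `quality_issues` are hypothetical names invented for this example and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type used only for this example.
    order_id: int
    amount_cents: int
    currency: str


def to_euros(order: Order) -> float:
    """Transformation logic: the kind of function a unit test targets."""
    if order.currency != "EUR":
        raise ValueError(f"unsupported currency: {order.currency}")
    return order.amount_cents / 100


def quality_issues(orders: list[Order]) -> list[str]:
    """Runtime data quality check: inspects the data, not the code."""
    issues = []
    seen = set()
    for order in orders:
        if order.order_id in seen:
            issues.append(f"duplicate order_id {order.order_id}")
        seen.add(order.order_id)
        if order.amount_cents < 0:
            issues.append(f"negative amount for order_id {order.order_id}")
    return issues


def test_to_euros():
    # Unit test: fixed input, known output, no external systems involved.
    assert to_euros(Order(1, 1250, "EUR")) == 12.5
```

The unit test runs in CI against fixed inputs, while `quality_issues` would run inside the pipeline on each batch, since correct code can still receive bad data. An integration test would additionally exercise the real source and target systems, which is omitted here.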

#6 about 3 minutes

Managing complex data environments for development and testing

Creating separate dev, test, and prod environments for data is challenging because development often requires access to production-like data, raising issues of data replication, cost, and anonymization.

#7 about 5 minutes

Using the Modern Data Stack and dbt for transformations

The Modern Data Stack applies DevOps principles to data, with tools like dbt (data build tool) enabling engineers to manage data transformations with version-controlled SQL, automated testing, and CI/CD.
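The idea of version-controlled SQL with automated tests can be sketched without the tool itself. The example below is not dbt and does not use its API. It keeps a transformation as SQL text, builds it in an in-memory SQLite database, and runs two checks loosely modeled on dbt's `not_null` and `unique` generic tests. The table names, columns, and helper functions are assumptions made for illustration.

```python
import sqlite3

# Hypothetical transformation, kept as SQL text under version control.
DAILY_REVENUE_SQL = """
CREATE TABLE daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM raw_orders
GROUP BY order_date
"""


def count_nulls(conn, table, column):
    """Number of rows where the column is NULL (0 means the check passes)."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def count_duplicates(conn, table, column):
    """Number of column values that occur more than once (0 means unique)."""
    query = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {column} FROM {table} GROUP BY {column} HAVING COUNT(*) > 1"
        f") AS d"
    )
    return conn.execute(query).fetchone()[0]


def run_pipeline():
    """Load sample raw data, then transform it inside the database (ELT style)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
    )
    conn.execute(DAILY_REVENUE_SQL)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    # A CI job would fail the build if either check reports a non-zero count.
    assert count_nulls(conn, "daily_revenue", "revenue") == 0
    assert count_duplicates(conn, "daily_revenue", "order_date") == 0
```

Because the SQL lives in the repository, a change to the transformation goes through review and the same checks run on every commit.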

#8 about 4 minutes

Using data contracts to stabilize data integration

Data contracts act as a formal API-like agreement between data producers and consumers, ensuring schema stability and data quality by making breaking changes explicit and enforceable in CI/CD pipelines.
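One way to make breaking changes explicit is to compare the proposed contract against the published one in CI and fail the build on incompatible differences. The sketch below assumes a deliberately simple contract shape (a version plus a mapping of field names to type names). `Contract` and `breaking_changes` are hypothetical and do not follow any specific data contract standard.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    # Hypothetical, minimal contract: field name -> type name.
    version: str
    fields: dict


def breaking_changes(old: Contract, new: Contract) -> list:
    """List changes that would break consumers relying on the old contract."""
    problems = []
    for name, type_name in old.fields.items():
        if name not in new.fields:
            problems.append(f"removed field: {name}")
        elif new.fields[name] != type_name:
            problems.append(
                f"type change on {name}: {type_name} -> {new.fields[name]}"
            )
    # Fields that exist only in `new` are treated as non-breaking additions.
    return problems


if __name__ == "__main__":
    published = Contract("1.0", {"order_id": "int", "amount": "float"})
    proposed = Contract("2.0", {"order_id": "str", "total": "float"})
    for problem in breaking_changes(published, proposed):
        print(problem)
```

In a CI pipeline, a non-empty result would block the merge until the producer either reverts the change or agrees a new contract version with its consumers. Real contracts usually also cover quality rules, semantics, and service levels, which this sketch leaves out.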

#9 about 2 minutes

Building a company-wide data culture and literacy

Fostering a strong data culture through initiatives like data bootcamps helps all employees, including non-technical ones, understand the value of data and the importance of data quality.

#10 about 4 minutes

Modern data architectures and the reality of team size

Modern data architectures can range from simple setups using DuckDB to complex cloud platforms like Databricks, but it's crucial to remember that data teams are typically much smaller than software teams.
