Josip Stuhli

Scaling: from 0 to 20 million users

Our first server crashed so often, we survived by manually uploading an HTML file via FTP. Now, we serve 20 million users on a global scale.

#1 (about 2 minutes)

An overview of scaling a sports app to millions of users

The initial single-server architecture for a sports results app struggled with exponential user growth, leading to frequent server crashes under load.

#2 (about 6 minutes)

Using proactive and manual caching to survive traffic spikes

Early scaling involved using Memcached with proactive caching to pre-load live data, culminating in a manual static HTML file hack to handle a massive event.
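As a rough illustration of proactive caching, the sketch below has a background job write live data into the cache ahead of demand, so the request path only ever reads from the cache. It is a minimal sketch, not the speaker's implementation: `ProactiveCache` is an in-memory stand-in for Memcached, and `load_live_scores`, `warm_cache`, `handle_request`, the key format, and the TTL are all hypothetical names and values chosen for the example.

```python
import time


class ProactiveCache:
    """In-memory stand-in for Memcached (assumption: real code would use a Memcached client)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds):
        # Store the value together with its absolute expiry time.
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value


def load_live_scores(match_id):
    """Hypothetical slow backend query."""
    return {"match_id": match_id, "home": 1, "away": 0}


def warm_cache(cache, live_match_ids, ttl_seconds=30):
    """Meant to run on a schedule from a background worker, not from requests."""
    for match_id in live_match_ids:
        cache.set(f"scores:{match_id}", load_live_scores(match_id), ttl_seconds)


def handle_request(cache, match_id):
    """The request path only reads the cache and never calls the backend."""
    return cache.get(f"scores:{match_id}")


cache = ProactiveCache()
warm_cache(cache, [101, 102])
result = handle_request(cache, 101)
missing = handle_request(cache, 999)  # not pre-loaded, so nothing is returned
```

The trade-off in this style is that backend load depends on the number of live events and the refresh interval rather than on the number of users.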

#3 (about 3 minutes)

Moving to the cloud and implementing Varnish cache

The first cloud migration to AWS introduced Varnish for superior HTTP caching and request coalescing, alongside stateless AMIs for effective auto-scaling.
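Request coalescing means that concurrent requests for the same uncached object share a single backend fetch. Varnish does this internally; the sketch below only illustrates the idea in application code and is not how Varnish is implemented. The `Coalescer` class, its `coalesced` counter, and the `slow_fetch` function are hypothetical names for the example.

```python
import threading
import time
from concurrent.futures import Future


class Coalescer:
    """Lets concurrent callers asking for the same key share one fetch."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future for the fetch in progress
        self.coalesced = 0    # callers that joined an existing fetch

    def get(self, key):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
            else:
                self.coalesced += 1
        if leader:
            # Only the first caller performs the backend fetch.
            try:
                future.set_result(self._fetch(key))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Followers block here until the leader publishes the result.
        return future.result()


calls = []
release = threading.Event()


def slow_fetch(key):
    """Hypothetical slow backend; blocks until the demo releases it."""
    calls.append(key)
    release.wait(timeout=5)
    return f"page for {key}"


coalescer = Coalescer(slow_fetch)
results = []
results_lock = threading.Lock()


def worker():
    value = coalescer.get("/live/match/101")
    with results_lock:
        results.append(value)


threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()

# Wait until the four followers have joined the leader's fetch, then release it.
deadline = time.monotonic() + 5
while coalescer.coalesced < 4 and time.monotonic() < deadline:
    time.sleep(0.001)
release.set()
for t in threads:
    t.join()
```

In this demo five simultaneous requests result in one call to `slow_fetch`, which is the property that protects a backend when a popular object expires under load.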

#4 (about 2 minutes)

Migrating from MongoDB to Postgres for data reliability

After encountering data type errors and a lack of locking in MongoDB, a live migration to Postgres was performed to gain stability and analytical power.

#5 (about 2 minutes)

Optimizing cache efficiency with a dedicated sharded layer

To solve cache inefficiency from auto-scaling, the architecture was changed to a dedicated, sharded Varnish layer in front of application servers.
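One way to picture a sharded cache layer is a router that always sends a given URL to the same cache node, so each object is stored once instead of once per auto-scaled instance. The summary does not say which sharding scheme was used; the sketch below assumes a simple stable hash of the URL modulo the number of nodes, and the node names and `shard_for` function are hypothetical.

```python
import hashlib


def shard_for(url, shards):
    """Pick a cache node for a URL using a hash that is stable across processes."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Hypothetical cache node names.
SHARDS = ["varnish-0", "varnish-1", "varnish-2"]

first = shard_for("/api/match/101", SHARDS)
second = shard_for("/api/match/101", SHARDS)  # same URL, same node
```

With plain modulo hashing, changing the number of nodes remaps most URLs; consistent hashing is the usual alternative when the set of cache nodes changes often.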

#6 (about 2 minutes)

Migrating from cloud to on-premise to reduce costs

High AWS traffic costs prompted a move back to an over-provisioned on-premises data center, drastically reducing infrastructure expenses relative to user growth.

#7 (about 4 minutes)

Solving global latency with a distributed cache network

To improve performance for international users, a globally distributed cache was implemented with geo-routing, reducing average latency from 500ms to 80ms.

#8 (about 2 minutes)

Adopting Kubernetes for multi-datacenter redundancy

After a provider's data center fire, a second data center was added and managed with Kubernetes to ensure high availability and simplify deployments.

#9 (about 1 minute)

Implementing real-time updates with NATS messaging

To eliminate polling delays and deliver instant updates, a pub/sub architecture using NATS messaging was implemented for millions of concurrent client connections.
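The difference between polling and pub/sub can be shown with a tiny in-process broker: clients register interest in a subject once, and every published message is pushed to them immediately. This is only a sketch of the pattern and does not use or imitate the NATS client API; the `Broker` class and the subject name `match.101` are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process publish/subscribe broker, for illustration only."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self._subscribers[subject].append(callback)

    def publish(self, subject, message):
        # Push the message to every subscriber of this subject.
        for callback in list(self._subscribers.get(subject, ())):
            callback(message)


broker = Broker()
inbox_a = []
inbox_b = []

# Two clients follow the same match; neither has to poll for changes.
broker.subscribe("match.101", inbox_a.append)
broker.subscribe("match.101", inbox_b.append)

broker.publish("match.101", {"event": "goal", "minute": 57})
broker.publish("match.202", {"event": "goal", "minute": 12})  # no subscribers
```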

#10 (about 2 minutes)

Managing petabyte-scale analytics data with ClickHouse

To power AI/ML models and analyze nearly a petabyte of data on-premise, ClickHouse was chosen for its high-performance analytical capabilities.

#11 (about 2 minutes)

Key principles for building scalable and efficient infrastructure

The core lessons learned include prioritizing statelessness, aggressive caching, using queues for slow tasks, and choosing the right tool for each specific job.
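The "queues for slow tasks" principle can be sketched with the standard library: the request handler only enqueues work and returns, while a background worker does the slow part. The task shape, the `handle_goal` and `worker` functions, and the notification example are assumptions for illustration; a production system would typically use an external queue rather than an in-process one.

```python
import queue
import threading

tasks = queue.Queue()
sent = []


def worker():
    """Background worker that drains the queue; None is the shutdown signal."""
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            break
        # Stand-in for slow work such as sending a push notification.
        sent.append(f"notified {job['user']} about {job['event']}")
        tasks.task_done()


def handle_goal(event, followers):
    """Fast request path: enqueue one job per follower and return immediately."""
    for user in followers:
        tasks.put({"user": user, "event": event})
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = handle_goal("goal in match 101", ["ana", "ivan"])

tasks.join()     # wait until the queued jobs are processed (demo only)
tasks.put(None)  # ask the worker to stop
thread.join()
```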
