Understanding Snowflake’s Architecture: The Mailroom Analogy

Adam Morton
5 min readSep 18, 2024

--

Thank you for reading my latest article Understanding Snowflake’s Architecture: The Mailroom Analogy.

Here at Medium I regularly write about modern data platforms and technology trends. To read my future articles simply join my network here or click ‘Follow’. Also feel free to connect with me via YouTube.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Introduction

This excerpt is taken from my top-rated SnowPro Core Certification Study Guide’, where I use the analogy of a mailroom to help simplify and clarify the roles of the different tiers in Snowflake’s architecture. Each tier plays a vital role in processing and delivering data efficiently, much like a well-organized mailroom.

The ‘Mailroom’

To help you remember the architecture, the roles which each of the tiers play, and how they interact together, I like to use this analogy based upon a mailroom as shown in figure 2–6 below:

The Database Storage Layer: The Mailroom

The database storage layer acts like a physical mailroom, designed to efficiently organize and store all the different letters and parcels arriving from many different places. Some parcels are very large, some small, while others are just regular letters, but they must all find their place within the mailroom.

The Query Processing Layer: The Courier

The query processing layer acts as the courier. It handles the logistics of taking the mail from the sender and working out the quickest and most efficient route to deliver it. Once delivered, it can then obtain a response and return it to the sender.

The Virtual Warehouse: The Delivery Vehicle

The virtual warehouse provides the resources to allow these deliveries to take place. Sometimes all that might be required is a bicycle, other times a van or truck. In specific cases, a plane might be required for international items.

The Cloud Services Layer: The HQ

The cloud services layer is the HQ, the big boss. It’s the end-to-end tracking system providing oversight across all services. It ensures the post is secured for access when in transit and keeps the lights on to ensure the post reaches its destination within the guaranteed time scale.

Database Storage

Behind the scenes, Snowflake stores data in a compressed, native format known as micro-partitions. These are small blocks of storage which are in 50–500 MB uncompressed — note that once in Snowflake they’re even smaller as all data in Snowflake is compressed. If you have a very large table, then you’ll have millions of these micro-partitions scattered across the storage layer.

How is data organized across these micro-partitions?

Well, a clustering key is selected automatically by Snowflake based on the order the data is loaded. Metadata is collected and stored, which allows the query optimizer to understand what data is contained within each micro-partition. We’ll get into micro-partitions in more detail in the chapter on performance concepts later in this book.

Query Processing Layer

The query processing layer provides the resources to execute the query by using a virtual warehouse. Virtual Warehouses are essentially a basket of compute (CPU) and memory (RAM), which are required to run pretty much any SQL data manipulation language (DML) operations against the data in Snowflake. This also includes loading data into Snowflake tables.

Virtual warehouses come in a range of t-shirt sizes from extra small, small, medium to large with variations in-between. Virtual warehouses can access any of the underlying data in Snowflake, and you can also start, stop, drop and create virtual warehouses as you need them with no impact on any other data or operations within Snowflake.

Cloud Services Layer

The cloud services layer is the head office of our postal service, the brains of the operation. This collection of services — which we break down below — ties together everything we have discussed so far.

Authentication — This manages the service which allows users and applications to logon to the Snowflake platform.

Infrastructure Management — This aspect looks after the management of the underlying cloud infrastructure.

Metadata Management — This service collects metadata when various operations are carried out on the Snowflake platform.

Query Parsing and Execution — This service takes care of the query planning to work out the most efficient way to process queries. It compiles the query to ensure there are no syntactical errors and manages the query execution. Snowflake uses a cost-based optimizer, which determines the fastest path to access the data by generating a range of query plans — based on the metadata available — before assessing which one is optimal to use. This is referred to as query optimization.

When you run a query on Snowflake the query optimizer takes this plan, which contains the information enabling it to pinpoint only those micro-partitions which store the data required to satisfy the query. This process is known as query pruning and aims to improve query performance by skipping over unnecessary micro-partitions.

Access Control — This service ensures that users can only access or carry out operations on the objects and data they are permitted to, based on their privileges.

Summary

In this excerpt from my SnowPro Core Certification Study Guide’, I use the analogy of a mailroom to break down the different layers of Snowflake’s architecture. From the database storage layer acting as the mailroom to the query processing layer functioning as the courier, each tier plays a crucial role in processing and delivering data efficiently. The virtual warehouse acts as the delivery vehicle, while the cloud services layer oversees everything like a headquarters. This analogy simplifies understanding how Snowflake’s architecture works together to create an efficient data platform.

To stay up to date with the latest business and tech trends in data and analytics, make sure to subscribe to my newsletter, follow me on LinkedIn, and YouTube, and, if you’re interested in taking a deeper dive into Snowflake check out my books ‘Mastering Snowflake Solutions and SnowPro Core Certification Study Guide’.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

About Adam Morton

Adam Morton is an experienced data leader and author in the field of data and analytics with a passion for delivering tangible business value. Over the past two decades Adam has accumulated a wealth of valuable, real-world experiences designing and implementing enterprise-wide data strategies, advanced data and analytics solutions as well as building high-performing data teams across the UK, Europe, and Australia.

Adam’s continued commitment to the data and analytics community has seen him formally recognised as an international leader in his field when he was awarded a Global Talent Visa by the Australian Government in 2019.

Today, Adam is dedicated to helping his clients to overcome challenges with data while extracting the most value from their data and analytics implementations. You can find out more information by visiting his website here.

He has also developed a signature training program that includes an intensive online curriculum, weekly live consulting Q&A calls with Adam, and an exclusive mastermind of supportive data and analytics professionals helping you to become an expert in Snowflake. If you’re interested in finding out more, check out the latest Mastering Snowflake details.

--

--

Adam Morton

Our mission is to help people trapped in a career dead end, working with on-premise, legacy technology break into cloud computing by using Snowflake.