Data Loading in Snowflake: Stages

Adam Morton
5 min read · Oct 8, 2024


Thank you for reading my latest article Data Loading in Snowflake: Stages.

Here at Medium I regularly write about modern data platforms and technology trends. To read my future articles simply join my network here or click ‘Follow’. Also feel free to connect with me via YouTube.

Introduction

This excerpt is taken from my top-rated SnowPro Core Study Guide, where I describe the purpose and usage of stages in relation to data loading with Snowflake.

Stages

There’s a lot to think about before you can even start writing your data pipelines. First, you need to consider where best to land your data on its way into a Snowflake table. In Snowflake, this area is known as a stage, and there are two primary types: external stages and internal stages.

In this section, we’ll explore the key differences between these two types and when you should consider each.

External stages

When you create an external stage in Snowflake, it acts as a pointer to a third-party cloud storage location. This storage location can be Amazon S3, Google Cloud Storage, or Microsoft Azure, regardless of what platform you run your Snowflake account on. So, for example, you can run your Snowflake account on Amazon and create an external stage on Google Cloud Storage or Microsoft Azure.

Creating an external stage in Snowflake creates a database object within the selected schema. This object holds the URL (to the cloud storage location) along with any security credentials to access the required cloud storage location. When you drop an external stage, all you are doing is removing the pointer. Therefore, the files in the external stage remain unaffected.
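As an illustration, creating an external stage over an S3 bucket might look like the following sketch. The stage name, bucket URL, and credentials are all placeholders; in practice, a storage integration is the recommended way to supply credentials rather than embedding keys:

```sql
-- Hypothetical example: an external stage pointing to an S3 bucket.
-- Dropping this stage removes only the pointer; the files in S3 remain.
create stage my_ext_stage
  url = 's3://my-bucket/landing/'
  credentials = (aws_key_id = '<aws_key_id>' aws_secret_key = '<aws_secret_key>')
  file_format = (type = csv);
```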

External Tables

Contained within your external stage, you can have external tables. These objects hold metadata, which tells Snowflake where to locate those data files that relate to the table.

This approach allows data to sit outside of Snowflake but appear to users as if it resides within Snowflake, which can be advantageous if you need read-only access to files on external cloud storage. SQL queries can be directed to external tables to analyze archived or historical cloud data, or for ad-hoc queries over files, allowing you to query and access all data from within Snowflake.
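As a sketch, an external table over Parquet files in a hypothetical external stage (the stage and table names are illustrative) might be defined and queried like this:

```sql
-- Hypothetical example: an external table over files held in cloud storage.
-- The data stays outside Snowflake; Snowflake stores only metadata
-- describing where the files live.
create external table my_ext_table
  location = @my_ext_stage/archive/
  file_format = (type = parquet)
  auto_refresh = false;

-- Query it like any other table (read-only)
select * from my_ext_table limit 10;
```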

The big tradeoff here is performance. Because the data is not in Snowflake’s native, compressed, micro-partitioned format, queries against external tables cannot benefit from the efficient, metadata-driven pruning of micro-partitions that native tables enjoy.

There is also a small maintenance overhead: the external table metadata must be refreshed manually using the ALTER EXTERNAL TABLE … REFRESH command, which incurs a charge.

It is possible to refresh this table metadata automatically using the corresponding cloud notification service, such as Amazon’s Simple Queue Service (SQS) or Azure Event Grid. Again, there will be a cost associated with the notification event management and refreshing the metadata, although this approach does ensure that the data is up to date and available to users within Snowflake.
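A manual metadata refresh is a single statement; the table name here is illustrative:

```sql
-- Re-scan the stage location so the external table's metadata
-- reflects files added or removed since the last refresh
alter external table my_ext_table refresh;
```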

Internal stages

Internal stages reside within the Snowflake platform. There are three types: user, table, and named. Let’s break these types down in this section to understand the purpose of each and how you can refer to them.

USER

A user stage is allocated to each Snowflake user. It is useful when only one user needs to access the staged data before loading it into one or more target tables. You refer to a user stage with the prefix @~.

copy into mytable from @~/staged file_format = (format_name = 'my_csv_format');
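Files usually arrive in a user stage via the PUT command, run from a client such as SnowSQL. A minimal sketch, with an illustrative local file path:

```sql
-- Upload a local file into the current user's stage, under the 'staged' path
put file:///tmp/my_data.csv @~/staged;
```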

TABLE

Each table has a table stage associated with it. The name of the table stage is always the same as the table it relates to. Multiple users can access the data files, but those files can only be loaded into that one table. You refer to a table stage with the prefix @%.

You can list the files in a table stage as follows:

LIST @%my_table;
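Loading from a table stage follows the same COPY INTO pattern as the user stage example above; for instance (the table and format names are illustrative):

```sql
-- Load the files sitting in my_table's table stage into my_table itself
copy into my_table from @%my_table file_format = (format_name = 'my_csv_format');
```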

NAMED

A named internal stage is a database object that holds data files within Snowflake. Multiple users can access data in this stage, and the files can be loaded into multiple target tables. You refer to a named internal stage with the prefix @.

create or replace stage my_stage file_format = my_csv_format;

copy into mytable from @my_stage;

Note: For the exam, make sure you understand how you reference each different type of stage, both internal and external.

When you drop an internal named stage, the stage will be purged along with any files within the stage. These files will not be recoverable. This behavior is different from that of an external stage — where the files remain in place, but the stage is dropped.


To stay up to date with the latest business and tech trends in data and analytics, make sure to subscribe to my newsletter, follow me on LinkedIn and YouTube, and, if you’re interested in taking a deeper dive into Snowflake, check out my books ‘Mastering Snowflake Solutions’ and ‘SnowPro Core Certification Study Guide’.

About Adam Morton

Adam Morton is an experienced data leader and author in the field of data and analytics with a passion for delivering tangible business value. Over the past two decades Adam has accumulated a wealth of valuable, real-world experiences designing and implementing enterprise-wide data strategies, advanced data and analytics solutions as well as building high-performing data teams across the UK, Europe, and Australia.

Adam’s continued commitment to the data and analytics community has seen him formally recognised as an international leader in his field when he was awarded a Global Talent Visa by the Australian Government in 2019.

Today, Adam is dedicated to helping his clients to overcome challenges with data while extracting the most value from their data and analytics implementations. You can find out more information by visiting his website here.

He has also developed a signature training program that includes an intensive online curriculum, weekly live consulting Q&A calls with Adam, and an exclusive mastermind of supportive data and analytics professionals helping you to become an expert in Snowflake. If you’re interested in finding out more, check out the latest Mastering Snowflake details.
