Import Snowflake Data
With Amplitude's Snowflake integration, you can ingest Snowflake data directly into your Amplitude project. This article walks you through the steps needed to make that happen.
Feature availability
This feature is available in all accounts, including accounts with free plans.
Other Amplitude + Snowflake integrations
This integration imports Snowflake data into Amplitude. Amplitude also offers other integrations with Snowflake.
Considerations
Depending on your company's network policy, you may need to add these IP addresses to your allowlist so that Amplitude's servers can access your Snowflake instance:
- 52.33.3.219
- 35.162.216.242
- 52.27.10.221
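If you restrict access with a Snowflake network policy, one way to allow these addresses is to include them in the policy's ALLOWED_IP_LIST. The following is a minimal sketch only; the policy name is hypothetical, and in practice you would merge these IPs into the allowlist your account already uses rather than replace it.

```sql
-- Hypothetical sketch: allow Amplitude's IPs in a Snowflake network policy.
-- The policy name is a placeholder; merge these IPs into your existing
-- ALLOWED_IP_LIST instead of creating a policy that contains only them.
CREATE NETWORK POLICY amplitude_access
  ALLOWED_IP_LIST = ('52.33.3.219', '35.162.216.242', '52.27.10.221');
```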
Limits
- Maximum running time for a single Snowflake SQL query is 12 hours.
User and Group Properties Sync
Amplitude's Data Warehouse Import sometimes processes events in parallel, so it can't guarantee the time-ordered syncing of user and group properties that you get when you submit events directly to the Identify and Group Identify APIs.
Add Snowflake as a source
Time-based Import
When using the Time-based Import option, the dataset must include a separate column that indicates when the data was loaded into the table Amplitude imports from. This is often a "server upload time", which is distinct from the "event time" (when the actual event occurred).
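For example, a source table can carry a dedicated load-time column that defaults to the current timestamp when each row is inserted. This is a hedged sketch; the table and column names are hypothetical.

```sql
-- Hypothetical sketch: a source table with a dedicated "server upload time" column.
CREATE TABLE events_for_amplitude (
    user_id            VARCHAR,
    event_type         VARCHAR,
    event_time_ms      NUMBER,                                    -- when the event occurred (ms since epoch)
    event_properties   VARIANT,
    server_upload_time TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()  -- when the row was loaded
);
```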
To add Snowflake as a data source in your Amplitude project, follow these steps:
- In Amplitude Data, click Catalog and select the Sources tab.
- In the Warehouse Sources section, click Snowflake.
- Enter the required credentials for the Snowflake instance you want to connect:
  - Account: Snowflake account name. Case sensitive. This is the first part of your Snowflake URL, before "snowflakecomputing.com". Don't include ".snowflakecomputing.com" in your account name.
  - Database: Name of the database where Amplitude can find the data.
  - Warehouse: Used by Amplitude to execute SQL.
  - Username: Used by Amplitude for authentication.
  - Password: Used by Amplitude for authentication.
- Copy the autogenerated SQL query and run it in Snowflake to give Amplitude the proper permissions. (A representative sketch of the kind of permissions this query grants appears after these steps.)
- After running the query, click Next to test the connection.
- After the test is successful, click Next again to move on to the data selection stage.
- Choose your configuration options:
- Type of data: This tells Amplitude whether you're ingesting event data, user property data, or group property data.
- Type of import:
- Full Sync: Amplitude periodically ingests the entire dataset, regardless of whether that data has already been imported. This is good for data sets where the row data changes over time, but there is no easy way to tell which rows have changed. Otherwise, the more efficient option would be a time-based import. This option isn't supported for ingesting event data.
- Time-based: Amplitude periodically ingests the most recent rows in the data, as determined by the provided Timestamp column. The first import brings in all available data, and later imports ingest any data with timestamps after the time of the most recent import. To use this, you must include the timestamp of when the data was loaded into Snowflake. For more information on how this works, see the time-based import section.
- Frequency: Choose from several scheduling options ranging from five minutes to one month (when this is selected, ingestion happens on the first of the month).
- SQL query: This is the code for the query Amplitude uses to determine which data is ingested.
- After you've set your configuration options, click Test SQL to see how the data is coming through from your Snowflake instance. Errors appear on this screen.
- If there are no errors, click Finish.
You'll see a notification indicating you've successfully enabled the new Snowflake source. You'll also be redirected to the Sources listing page, where you'll see the newly created Snowflake source.
If you have any issues or questions while following this flow, contact the Amplitude team.
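For reference, the autogenerated query mentioned in the steps above typically creates a role and user for Amplitude and grants read access to the objects you specified. The exact statements come from the Amplitude UI, so treat the following only as a hedged sketch of the kind of permissions involved; every object name below is a placeholder.

```sql
-- Hypothetical sketch only; run the query generated in the Amplitude UI instead.
CREATE ROLE IF NOT EXISTS AMPLITUDE_ROLE;
GRANT USAGE ON WAREHOUSE MY_WAREHOUSE TO ROLE AMPLITUDE_ROLE;
GRANT USAGE ON DATABASE MY_DATABASE TO ROLE AMPLITUDE_ROLE;
GRANT USAGE ON SCHEMA MY_DATABASE.MY_SCHEMA TO ROLE AMPLITUDE_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA MY_DATABASE.MY_SCHEMA TO ROLE AMPLITUDE_ROLE;
CREATE USER IF NOT EXISTS AMPLITUDE_USER PASSWORD = '<password>' DEFAULT_ROLE = AMPLITUDE_ROLE;
GRANT ROLE AMPLITUDE_ROLE TO USER AMPLITUDE_USER;
```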
Time-based import
For Amplitude's time-based import option, it's best practice to use a monotonically increasing timestamp value. This value should indicate when the record was loaded into the source table the SQL configuration queries from (often referred to as a "server upload time"). The warehouse import tool brings data into Amplitude by continually updating the maximum value of the column referenced in the Timestamp Column Name input within the Import Config UI with each subsequent import.
Example
Upon first import, Amplitude imports all the data returned from the query configured in the Import Config and saves a reference to the maximum timestamp found in the Timestamp Column Name column (`timestamp_1`). Upon the next import, Amplitude imports all data from the previously saved timestamp (`timestamp_1`) up to what's now the new maximum timestamp (`timestamp_2`). After that import, Amplitude saves `timestamp_2` as the new maximum timestamp.
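Expressed as SQL for illustration only (Amplitude applies this windowing internally; you don't add it to your configured query), the second import effectively covers this range:

```sql
-- Illustration of the bookkeeping above, using hypothetical literal timestamps.
SELECT *
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME
WHERE UPDATE_TIME_COLUMN >  '2013-04-05 01:02:03.000'   -- timestamp_1, saved by the previous import
  AND UPDATE_TIME_COLUMN <= '2013-04-06 01:02:03.000';  -- timestamp_2, the new maximum
```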
Data fields
You must include the mandatory fields for the data type when creating the SQL query. These tables outline the mandatory and optional fields for each data type. You can include other columns beyond those listed here.
Events

| Column name (must be lowercase) | Mandatory | Column data type | Example |
| --- | --- | --- | --- |
| user_id | Yes, unless device_id is used | VARCHAR | datamonster@gmail.com |
| device_id | Yes, unless user_id is used | VARCHAR | C8F9E604-F01A-4BD9 |
| event_type | Yes | VARCHAR | watch_tutorial |
| time | Yes | Milliseconds since epoch (Timestamp) | 1396381378123 |
| event_properties | Yes | VARIANT (JSON Object) | {"source":"notification", "server":"host-us"} |
| user_properties | No | VARIANT (JSON Object) | {"city":"chicago", "gender":"female"} |
| update_time_column | No (Yes if using time-based import) | TIMESTAMP_NTZ | 2013-04-05 01:02:03.000 |
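As a reference point, a source table shaped to match these fields might look like the sketch below. The table and database names are hypothetical; your query can alias whatever columns you already have to the lowercase names above.

```sql
-- Hypothetical source table whose columns line up with the event fields above.
CREATE TABLE MY_DATABASE.MY_SCHEMA.AMPLITUDE_EVENTS (
    user_id            VARCHAR,         -- at least one of user_id / device_id is required
    device_id          VARCHAR,
    event_type         VARCHAR,
    time               NUMBER,          -- milliseconds since epoch
    event_properties   VARIANT,         -- JSON object
    user_properties    VARIANT,         -- JSON object
    update_time_column TIMESTAMP_NTZ    -- required for time-based import
);
```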
User properties

| Column name (must be lowercase) | Mandatory | Column data type | Example |
| --- | --- | --- | --- |
| user_id | Yes | VARCHAR | datamonster@gmail.com |
| user_properties | Yes | VARIANT (JSON Object) | {"city":"chicago", "gender":"female"} |
| update_time_column | No (Yes if using time-based import) | TIMESTAMP_NTZ | 2013-04-05 01:02:03.000 |
Group properties

| Column name (must be lowercase) | Mandatory | Column data type | Example |
| --- | --- | --- | --- |
| groups | Yes | VARIANT (JSON Object) | {"company":"amplitude", "team":["marketing", "sales"]} |
| group_properties | Yes | VARIANT (JSON Object) | {"location":"seattle", "active":"true"} |
| update_time_column | No (Yes if using time-based import) | TIMESTAMP_NTZ | 2013-04-05 01:02:03.000 |
Each group property in `group_properties` is applied to every group in `groups`.
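For example, given the row shown above, the `location` and `active` properties apply to the `company` group "amplitude" and to both `team` groups, "marketing" and "sales".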
SQL query examples
To make the data selection step a bit easier, here are a few example SQL snippets to get you started.
Event data example

```sql
SELECT
    EVENT_TYPE_COLUMN AS "event_type",
    EVENT_PROPERTIES_VARIANT_COLUMN AS "event_properties",
    TIME_EPOCH_MS_COLUMN AS "time",
    USER_ID_COLUMN AS "user_id",
    USER_PROPERTIES_VARIANT_COLUMN AS "user_properties"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME
```
User property example

```sql
SELECT
    USER_ID_COLUMN AS "user_id",
    USER_PROPERTIES_VARIANT_COLUMN AS "user_properties"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME
```
Group property example

```sql
SELECT
    GROUPS_OBJ AS "groups",
    GROUP_PROPS_OBJ AS "group_properties"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME
```
Common snippets

Creating a JSON object:

```sql
OBJECT_CONSTRUCT('city', CITY, 'state', STATE) as "user_properties"
```

Converting a timestamp column to milliseconds:

```sql
DATE_PART('EPOCH_MILLISECOND', TIMESTAMP_COLUMN) as "time"
```

Converting milliseconds to the TIMESTAMP_NTZ format needed for time-based import. This example sets the scale argument to 3 so that the numeric value is interpreted as milliseconds. See the Snowflake documentation for more details.

```sql
TO_TIMESTAMP_NTZ(TIME_COLUMN_IN_MILLIS, 3) as "update_time_column"
```

Converting a timestamp column with a timezone to the TIMESTAMP_NTZ format needed for time-based import:

```sql
TO_TIMESTAMP_NTZ(CONVERT_TIMEZONE('UTC', TIMESTAMP_TZ_COLUMN)) as "update_time_column"
```
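Putting these snippets together, a complete event query for a time-based import might look like the following sketch. The column names on the left side of each alias are hypothetical; substitute the columns from your own table or view.

```sql
-- Hypothetical end-to-end example combining the snippets above for a time-based event import.
SELECT
    EVENT_NAME_COLUMN                                                   AS "event_type",
    OBJECT_CONSTRUCT('source', SOURCE_COLUMN, 'server', SERVER_COLUMN)  AS "event_properties",
    DATE_PART('EPOCH_MILLISECOND', EVENT_TIMESTAMP_COLUMN)              AS "time",
    USER_ID_COLUMN                                                      AS "user_id",
    OBJECT_CONSTRUCT('city', CITY, 'state', STATE)                      AS "user_properties",
    TO_TIMESTAMP_NTZ(CONVERT_TIMEZONE('UTC', LOADED_AT_TZ_COLUMN))      AS "update_time_column"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME
```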