Skip to content

Import Snowflake Data

With Amplitude's Snowflake integration, you can ingest Snowflake data directly into your Amplitude project. This article walks you through the steps needed to make that happen.

Feature availability

This feature is available in all accounts, including accounts with free plans.

Other Amplitude + Snowflake integrations

This integration imports Snowflake data into Amplitude. Amplitude offers other integrations with Snowflake:

Considerations

  • Depending on your company's network policy, you may need add these IP addresses to your allowlist in order for Amplitude's servers to access your Snowflake instance:

    • Amplitude US IP addresses:
      • 52.33.3.219
      • 35.162.216.242
      • 52.27.10.221
    • Amplitude EU IP addresses:
      • 3.124.22.25
      • 18.157.59.125
      • 18.192.47.195

Limits

  • Maximum running time for a single Snowflake SQL query is 12 hours.

User and Group Properties Sync

Amplitude's Data Warehouse Import sometimes processes events in parallel, so time-ordered syncing of user and group properties on events is not guaranteed in the same way as submitting events directly to the Identify and Group Identify APIs.

Add Snowflake as a source

Time-based Import

When using the Time Based Import option, it's important that the dataset includes a separate column that indicates when the data was loaded into the table Amplitude points to when importing. This is often "server upload time", which would be separate from the "event time" (when the actual event occurred).

To add Snowflake as a data source in your Amplitude project, follow these steps:

  1. In Amplitude Data, click Catalog and select the Sources tab.
  2. In the Warehouse Sources section, click Snowflake.
  3. Enter the required credentials for the Snowflake instance you want to connect:
    • Account: Snowflake account name. Case sensitive. This is the first part of your Snowflake URL, before snowflakecomputing.com. Don't include ".snowflakecomputing.com" in your account name.
    • Database: Name of the database where Amplitude can find the data.
    • Warehouse: Used by Amplitude to execute SQL.
    • Username: Used by Amplitude for authentication.
    • Password: Used by Amplitude for authentication.
  4. Copy the autogenerated SQL query and run it in Snowflake to give Amplitude the proper permissions.
  5. After running the query, click Next to test the connection.
  6. After the test is successful, click Next again to move on to the data selection stage.
  7. Choose your configuration options:

    • Type of data: This tells Amplitude whether you're ingesting event data, user property data, or group property data.
    • Type of import:
      • Full Sync: Amplitude periodically ingests the entire dataset, regardless of whether that data has already been imported. This is good for data sets where the row data changes over time, but there is no easy way to tell which rows have changed. Otherwise, the more efficient option would be a time-based import. This option isn't supported for ingesting event data.
      • Time-based: Amplitude periodically ingests the most recent rows in the data, as determined by the provided Timestamp column. The first import brings in all available data, and later imports ingest any data with timestamps after the time of the most recent import. To use this, include the timestamp of the data load into Snowflake. For more information on how this works, see the time-based import section.
    • Frequency: Choose from several scheduling options ranging from five minutes to one month. With the one month option, Amplitude ingests data on the first of the month.
    • SQL query: This is the code for the query Amplitude uses to decide which data is ingested.
  8. After you've set your configuration options, click Test SQL to see how the data is coming through from your Snowflake instance. Errors appear on this screen.

  9. If there are no errors, click Finish.

Amplitude displays a notification indicating you enable the new Snowflake source and redirects you to the Sources listing page.

If you have any issues or questions while following this flow, contact the Amplitude team.

Time-based import

For Amplitude's time-based import option, it's best practice to use a monotonically increasing timestamp value. This value should show when the record was loaded into the source table the SQL configuration is querying from (often referred to as a "server upload time"). The warehouse import tool brings data into Amplitude by continually updating the maximum value of the column referenced in the Timestamp Column Name input within the Import Config UI with each subsequent import.

Example

Upon first import, Amplitude imports all the data returned from the query configured in the Import Config. Amplitude saves a reference of the maximum timestamp referenced in the Timestamp Column Name: timestamp_1. Upon subsequent import, Amplitude imports all data from the previously saved timestamp (timestamp_1), to what's now the new maximum timestamp (timestamp_2). Then after that import, Amplitude saves timestamp_2 as the new maximum timestamp.

Data fields

Include the mandatory fields for the data type when you create the SQL query. These tables outline the mandatory and optional fields for each data type. Find a list of other supported fields for events in the HTTP V2 API documentation and for user properties in the Identify API documentation. Add any columns not in those lists to either event_properties or user_properties, otherwise it's ignored.

Events

Column name (must be lowercase) Mandatory Column data type Example
user_id Yes, unless device_id is used VARCHAR datamonster@gmail.com
device_id Yes, unless user_id is used VARCHAR C8F9E604-F01A-4BD9
event_type Yes VARCHAR watch_tutorial
time Yes Milliseconds since epoch (Timestamp) 1396381378123
event_properties Yes VARIANT (JSON Object) {"source":"notification", "server":"host-us"}
user_properties No VARIANT (JSON Object) {"city":"chicago", "gender":"female"}
update_time_column No (Yes if using time based import) TIMESTAMP_NTZ 2013-04-05 01:02:03.000

Find other supported fields can in the HTTP V2 API documentation.

User properties

Column name (must be lowercase) Mandatory Column data type Example
user_id Yes VARCHAR datamonster@gmail.com
user_properties Yes VARIANT (JSON Object) {"city":"chicago", "gender":"female"}
update_time_column No (Yes if using time based import) TIMESTAMP_NTZ 2013-04-05 01:02:03.000

Find other supported fields in the Identify API documentation.

Group properties

Column name (must be lowercase) Mandatory Column data type Example
groups Yes VARIANT (JSON Object) {"company":"amplitude", "team":["marketing", "sales"]}
group_properties Yes VARIANT (JSON Object) {"location":"seattle", "active":"true"}
update_time_column No (Yes if using time based import) TIMESTAMP_NTZ 2013-04-05 01:02:03.000

Each group property in group_properties would apply to every group in groups.

Profile properties

Column name (must be lowercase) Mandatory Column data type Example
user_id Yes VARCHAR "user123"
property_name_1 Yes (key value) VARCHAR: VARCHAR "Title": "Data Engineer"
property_name_1 Yes (key value) VARCHAR: VARCHAR "City": "San Francisco"

Amplitude supports profile properties for known users. A user_id value must join each profile property.

SQL query examples

To make the data selection step a bit easier, here are few example SQL snippets to get you started.

Event data example

SELECT
    EVENT_TYPE_COLUMN AS "event_type",
    EVENT_PROPERTIES_VARIANT_COLUMN AS "event_properties",
    TIME_EPOCH_MS_COLUMN AS "time",
    USER_ID_COLUMN AS "user_id",
    USER_PROPERTIES_VARIANT_COLUMN AS "user_properties"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME

User property example

SELECT
    USER_ID_COLUMN AS "user_id",
    USER_PROPERTIES_VARIANT_COLUMN AS "user_properties"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME

Group Property example

SELECT
    GROUPS_OBJ AS "groups",
    GROUP_PROPS_OBJ AS "group_properties"
FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME

Common snippets

Creating a JSON Object:

OBJECT_CONSTRUCT('city', CITY, 'state', STATE) as "user_properties"

Converting timestamp column to milliseconds:

DATE_PART('EPOCH_MILLISECOND', TIMESTAMP_COLUMN) as "time"

Converting milliseconds to TIMESTAMP_NTZ format needed for time-based import. This example uses the scale argument set to 3 to convert to milliseconds. See the Snowflake documentation for more details.

TO_TIMESTAMP_NTZ(TIME_COLUMN_IN_MILLIS, 3) as "update_time_column"

Converting a timestamp column with a timezone to TIMESTAMP_NTZ format needed for time-based import.

TO_TIMESTAMP_NTZ(CONVERT_TIMEZONE('UTC', TIMESTAMP_TZ_COLUMN)) as "update_time_column"


Was this page helpful?