Databricks Reverse ETL Setup

Set up Databricks as your Reverse ETL source.

At a high level, when you set up Databricks for Reverse ETL, the configured user needs read permissions for any resources (databases, schemas, tables) the query needs to access. Segment keeps track of changes to your query results with a managed schema (__SEGMENT_REVERSE_ETL), which requires the configured user to allow write permissions for that schema.

Required permissions

  • Make sure the user or the service principal you use to connect to Segment has permissions to use that warehouse. In the Databricks console go to SQL warehouses and select the warehouse you’re using. Navigate to Overview > Permissions and make sure the user or the service principal you use to connect to Segment has can use permissions.

  • To grant access to read data from the tables used in the model query, run:

      GRANT USAGE ON SCHEMA <schema_name> TO `<user or service principal you are using to connect to Segment>`; 
      GRANT SELECT, READ_METADATA ON SCHEMA <schema_name> TO `<user or service principal you are using to connect to Segment>`; 
    
  • To grant Segment access to create a schema to keep track of the running syncs, run:

      GRANT CREATE on catalog <name of the catalog, usually hive_metastore or main if using unity-catalog> TO `<user or service principal you are using to connect to Segment>`;
    
  • If you want to create the schema yourself instead and then give Segment access to it, run:

      CREATE SCHEMA IF NOT EXISTS __segment_reverse_etl; 
      GRANT ALL PRIVILEGES ON SCHEMA __segment_reverse_etl TO `<user or service principal you are using to connect to Segment>`;
    

Set up guide

To set up Databricks as your Reverse ETL source:

  1. Log in to your Databricks account.
  2. Navigate to Workspaces and select the workspace you want to use.
  3. Select SQL in the main navigation.
  4. Select SQL Warehouses and select the warehouse you want to use. Note that Segment doesn’t support the Compute connection parameters.
  5. Go to the Connection details tab and keep this page open.
  6. Open your Segment workspace.
  7. Navigate to Connections > Sources > Reverse ETL.
  8. Click + Add Reverse ETL source.
  9. Select Databricks and click Add Source.
  10. Enter the configuration setting for your Databricks source based on information from step 5
    • Hostname: adb-xxxxxxx.azuredatabricks.net
    • Http Path: /sql/1.0/warehouses/xxxxxxxxx
    • Port: 443 (default)
    • Token: <your-token>
    • Catalog [optional]: hive_metastore (default)
  11. Click Test Connection to see if the connection works. If the connection fails, make sure you have the right permissions and credentials, then try again.
  12. Click Create Source if the test connection is successful.

To generate a token, follow the steps listed in the Databricks docs. Segment recommends you create a token with no expiration date by leaving the lifetime field empty when creating it. If you already have a token with an expiration date, be sure to keep track of the date and renew it on time.

Once you’ve succesfully added your Databricks source, add a model and follow the rest of the steps in the Reverse ETL setup guide.

This page was last modified: 13 Nov 2023



Get started with Segment

Segment is the easiest way to integrate your websites & mobile apps data to over 300 analytics and growth tools.
or
Create free account