Databricks
The Databricks data stream enables DXA to deliver session and pageview data to an Azure storage container for Databricks every night.
To enable the Databricks data stream:
On the navigation bar, go to Connect > Data Streams > Configure Data Streams > Cloud Storage > Databricks.
Under Databricks, select Databricks (Azure) Integration Enabled.
The following settings appear:
| Setting | Description |
|---|---|
| Data type | Select the type of experience data (session or pageview data) to deliver to Databricks every night. |
| Schema version | Select a schema that defines the data fields available in the session and pageview data sets delivered to Databricks. See Web Schema v7 for details. |
| Custom pageview fields | If necessary, select additional pageview data fields to use alongside the data fields defined in the selected Schema version. |
| Custom session fields | If necessary, select additional session data fields to use alongside the data fields defined in the selected Schema version. |
| Pageview level custom dimensions | Select pageview level custom dimensions. |
| Session level custom dimensions | Select session level custom dimensions. |
| Session Track Event | Select a session track event for delivery to Databricks. |
| Pageview Track Event | Select a pageview track event for delivery to Databricks. |
| Segment | Select a segment for delivery to Databricks. |
| Output type | Select the file type to export session or pageview data for delivery to Databricks. |
| Frequency | Select how frequently DXA exports data to Databricks. |
| Export hour | Select the hour to export session or pageview data every day. |
| Export period | Select the period from which to export session or pageview data. |
| Timezone identifier | Select your timezone. |
| Output filename | Enter a filename for the exported session or pageview data. The filename can include uppercase date placeholders to produce a variable date convention. |
| Use temporary filename | If necessary, select this option to send session or pageview data to a temporary (.tmp) file until the export process completes. This setting is recommended for high-traffic properties. Note: You must create the databricks:DeleteObject IAM user policy in Databricks to use this setting. |
| Deliver as single file instead of multiple hourly files | If selected, data is aggregated into a single file rather than one file per hour. |
| ZIP files to Archive | If necessary, select this option to deliver session or pageview data to Databricks in one ZIP file instead of 24 individual files for each hour of the day. |
| ZIP Archive filename | If you selected ZIP files to Archive, enter a filename for the ZIP file. The filename can include uppercase date placeholders to produce a variable date convention. |
| Date for the archive filename | If you selected ZIP files to Archive, select the date convention to use in the ZIP filename. |
| Archive encoding | If you selected ZIP files to Archive, select the encoding type to use for the ZIP file. |
| Local Storage | If selected, prepared ZIP/CSV files are kept in local storage and then transferred to the customer server. |
| Session Summary | If selected, a session summary is added to the file as a new column. |
| Type | Select the authentication type: Service Principal or SAS Token. |
| Token | If you selected SAS Token from Type, enter the token. |
| Storage Account Name | If you selected Service Principal from Type, enter the storage account name. |
| Container Name | If you selected Service Principal from Type, enter the container name. |
| Tenant ID | If you selected Service Principal from Type, enter the tenant ID. |
| Client ID | If you selected Service Principal from Type, enter the client ID. |
| Client Secret | If you selected Service Principal from Type, enter the client secret. |
| Directory Path | The server path where the export file is placed. |
| Encryption Key | The encryption key for the export file. If this field is set, the file(s) are encrypted before being sent. |
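The Service Principal fields above (Storage Account Name, Container Name, Tenant ID, Client ID, Client Secret) correspond to the standard OAuth settings Azure Data Lake Storage expects. As a minimal sketch of how those values fit together, assuming hypothetical account and secret names, the following helper builds the Spark configuration keys you would set in a Databricks notebook to read the delivered files:

```python
def abfs_oauth_conf(storage_account, tenant_id, client_id, client_secret):
    """Build the ABFS OAuth (service principal) Spark config for one storage account.

    All argument values are placeholders; substitute the values entered in the
    Service Principal settings above.
    """
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# In a Databricks notebook you would apply the keys with spark.conf.set, e.g.:
#   for key, value in abfs_oauth_conf(...).items():
#       spark.conf.set(key, value)
# and then read from abfss://<container>@<account>.dfs.core.windows.net/<directory path>.
```

Store the client secret in a secret scope rather than in notebook source.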
To save changes and integrate with DXA, click Save.
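If you selected ZIP files to Archive, each nightly delivery arrives as one ZIP archive containing a CSV file per hour of the day. The following sketch, assuming CSV output and hypothetical member filenames, shows one way to read every hourly file in a delivered archive into a single list of rows:

```python
import csv
import io
import zipfile

def read_hourly_export(zip_path):
    """Read every hourly CSV inside a delivered ZIP archive into one list of dict rows.

    Assumes the "ZIP files to Archive" option, so the archive holds one CSV
    per hour; the member filenames inside the archive are hypothetical.
    """
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in sorted(archive.namelist()):
            if not member.endswith(".csv"):
                continue  # skip any non-CSV members
            with archive.open(member) as binary_file:
                reader = csv.DictReader(io.TextIOWrapper(binary_file, encoding="utf-8"))
                rows.extend(reader)
    return rows
```

Because every hourly file shares the fields defined by the selected Schema version, concatenating the rows this way is safe.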
