Databricks

The Databricks data stream enables DXA to deliver session and pageview data to a Databricks bucket every night.

To enable the Databricks data stream:

  1. On the navigation bar, go to Connect > Data Streams > Configure Data Streams > Cloud Storage > Databricks.


  2. Under Databricks, select Databricks (Azure) Integration Enabled.

The following settings appear:

Setting: Description
Data type

Select the type of experience data you want to deliver to Databricks every night. The following options are available:

  • Session — Delivers all session data from the past 24 hours
  • Page — Delivers all pageview data from the past 24 hours
  • All — Delivers all session and pageview data from the past 24 hours
Schema version: Select a schema to define the data fields available in the session and pageview data sets to be delivered to Databricks.
Note: Learn more: Web Schema v7
Custom pageview fields: If necessary, select additional pageview data fields to use alongside the data fields defined in the selected Schema version.
Custom session fields: If necessary, select additional session data fields to use alongside the data fields defined in the selected Schema version.
Pageview level custom dimensions: Select pageview level custom dimensions.
Session level custom dimensions: Select session level custom dimensions.
Session Track Event: Select a session track event for delivery to Databricks.
Pageview Track Event: Select a pageview track event for delivery to Databricks.
Segment: Select a segment for delivery to Databricks.
Output type: Select the file type in which to export session or pageview data for delivery to Databricks.
Frequency: Select how frequently DXA exports data to the Databricks bucket.
Export hour: Select the hour of the day at which to export session or pageview data.
Export period

Select the period to export session or pageview data from. The following periods are available:

  • Previous 24 Hours — Exports session or pageview data from the 24 hours up to the hour selected in Export hour
  • Previous Day — Exports session or pageview data from the hours of 00:00 to 23:59 the previous day
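The difference between the two Export period options can be sketched in a few lines of Python. This is only an illustration of the descriptions above, not DXA's implementation: the function name `export_window` and the option keys `previous_24_hours` and `previous_day` are hypothetical, and all times are naive wall-clock values in the zone chosen under Timezone identifier.

```python
from datetime import date, datetime, time, timedelta


def export_window(period: str, run_date: date, export_hour: int = 0):
    """Return (start, end) for the exported data, with end exclusive.

    Hypothetical helper: it mirrors the documented option descriptions only.
    """
    if period == "previous_24_hours":
        # The 24 hours leading up to the hour selected in Export hour.
        end = datetime.combine(run_date, time(hour=export_hour))
        return end - timedelta(hours=24), end
    if period == "previous_day":
        # The whole previous calendar day, 00:00 up to (not including) 00:00 today.
        end = datetime.combine(run_date, time.min)
        return end - timedelta(days=1), end
    raise ValueError(f"unknown export period: {period!r}")


if __name__ == "__main__":
    for option in ("previous_24_hours", "previous_day"):
        start, end = export_window(option, date(2024, 2, 12), export_hour=6)
        print(option, start.isoformat(), end.isoformat())
```

With an Export hour of 06:00, Previous 24 Hours spans two calendar days, whereas Previous Day always covers one complete calendar day regardless of the export hour.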
Timezone identifier: Select your timezone.
Output filename

Enter a filename to give exported session or pageview data.

Use one of the following uppercase character combinations to provide a variable date convention:

  • YYYY — Year, four digits
    • Example: 2021
  • YY — Year, two digits
    • Example: 21
  • MMM — Month, three letters
    • Example: Jan
  • MM — Month, two digits
    • Example: 01
  • FFF — Month, in full
    • Example: January
  • DDD — Day of week, 3 letters
    • Example: Mon
  • DD — Day of month, two digits
    • Example: 31
  • LLL — Day of week, in full
    • Example: Monday
  • HH — Hour, two digits (00-23)
    • Example: 21
  • II — Minute, two digits (00-59)
    • Example: 57
  • SS — Second, two digits (00-59)
    • Example: 37
  • EEE — Time zone, three letters
    • Example: GMT
  • CCC — Date and time, ISO 8601
    • Example: 2024-02-12T15:19:21+00:00
  • RRR — Date and time, RFC 2822
    • Example: Thu,_21_Dec_2021_16:01:07_+0200
  • AAA — Date and time, Atom
    • Example: 2015-08-22T15:52:01+00:00
  • OOO — Date and time, cookie
    • Example: Monday,_15-Aug-2019_15:52:01_UTC
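To preview what a filename template might produce, the placeholder list above can be approximated in Python. This is a rough sketch, not DXA's actual substitution logic: the function name `expand_filename`, the mapping to `strftime` codes, and the longest-match-first rule are all assumptions. It also assumes that any uppercase run matching a placeholder is replaced, so literal text in the template is kept lowercase here.

```python
import re
from datetime import datetime, timezone

# Assumed mapping from the documented placeholders to strftime codes.
STRFTIME_TOKENS = {
    "YYYY": "%Y", "YY": "%y",
    "MMM": "%b", "MM": "%m", "FFF": "%B",
    "DDD": "%a", "DD": "%d", "LLL": "%A",
    "HH": "%H", "II": "%M", "SS": "%S", "EEE": "%Z",
    "RRR": "%a,_%d_%b_%Y_%H:%M:%S_%z",   # RFC 2822 style, spaces as underscores
    "OOO": "%A,_%d-%b-%Y_%H:%M:%S_%Z",   # cookie style, spaces as underscores
}
ISO_TOKENS = {"CCC", "AAA"}  # ISO 8601 and Atom share the same layout

# Longest placeholders first, so YYYY is matched before YY and MMM before MM.
_TOKENS = sorted([*STRFTIME_TOKENS, *ISO_TOKENS], key=len, reverse=True)
TOKEN_RE = re.compile("|".join(_TOKENS))


def expand_filename(template: str, when: datetime) -> str:
    """Replace each placeholder in template with the matching part of when."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        if token in ISO_TOKENS:
            return when.isoformat(timespec="seconds")
        return when.strftime(STRFTIME_TOKENS[token])
    return TOKEN_RE.sub(replace, template)


if __name__ == "__main__":
    moment = datetime(2024, 2, 12, 15, 19, 21, tzinfo=timezone.utc)
    print(expand_filename("pageviews_YYYYMMDD_HHIISS.csv", moment))
```

Because short placeholders such as SS or DD can also occur inside ordinary uppercase words, it may be worth checking a test export before relying on a template that mixes uppercase literal text with placeholders.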
Use temporary filename

If necessary, select this option to write session or pageview data to a temporary (TMP) file until the export process completes.

This setting is recommended for high-traffic properties.
Note: You must create the databricks:DeleteObject IAM user policy in Databricks to use this setting.
Deliver as single file instead of multiple hourly files: If selected, data is aggregated into a single file instead of one file per hour.
ZIP files to Archive: If necessary, select this option to deliver session or pageview data to Databricks in one ZIP file instead of 24 individual files, one for each hour of the day.
ZIP Archive filename

If you selected ZIP files to Archive, enter the filename to give the ZIP file.

Use one of the following uppercase character combinations to provide a variable date convention:

  • YYYY — Year, four digits
    • Example: 2021
  • YY — Year, two digits
    • Example: 21
  • MMM — Month, three letters
    • Example: Jan
  • MM — Month, two digits
    • Example: 01
  • FFF — Month, in full
    • Example: January
  • DDD — Day of week, 3 letters
    • Example: Mon
  • DD — Day of month, two digits
    • Example: 31
  • LLL — Day of week, in full
    • Example: Monday
  • HH — Hour, two digits (00-23)
    • Example: 21
  • II — Minute, two digits (00-59)
    • Example: 57
  • SS — Second, two digits (00-59)
    • Example: 37
  • EEE — Time zone, three letters
    • Example: GMT
  • CCC — Date and time, ISO 8601
    • Example: 2024-02-12T15:19:21+00:00
  • RRR — Date and time, RFC 2822
    • Example: Thu,_21_Dec_2021_16:01:07_+0200
  • AAA — Date and time, Atom
    • Example: 2015-08-22T15:52:01+00:00
  • OOO — Date and time, cookie
    • Example: Monday,_15-Aug-2019_15:52:01_UTC
Date for the archive filename

If you selected ZIP files to Archive, select a date convention to use in the ZIP file name. The following options are available:

  • Date zip file is created — Date used in the filename reflects when the ZIP file was created
  • Date of earliest included data — Date used in the filename reflects the date of the earliest item of data in the ZIP file
Archive encoding: If you selected ZIP files to Archive, select the encoding type to use for the ZIP file.
Local Storage: If selected, prepared ZIP or CSV files are kept in local storage and then transferred to the customer server.
Session Summary: If selected, a session summary is added to the file as a new column.
Type: Select Service Principal or SAS Token.
Token: If you selected SAS Token from Type, enter the token.
Storage Account Name: If you selected Service Principal from Type, enter the storage account name.
Container Name: If you selected Service Principal from Type, enter the container name.
Tenant ID: If you selected Service Principal from Type, enter the tenant ID.
Client ID: If you selected Service Principal from Type, enter the client ID.
Client Secret: If you selected Service Principal from Type, enter the client secret.
Directory Path: Enter the server path in which to place the export file.
Encryption Key: Enter an encryption key for the export file. If this field is set, the files are encrypted before being sent.
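On the Databricks side, the same Service Principal values are typically what a notebook needs to read the delivered files. The sketch below shows how they might map to an `abfss://` location and to the Hadoop ABFS OAuth settings. It assumes the container lives in an Azure Data Lake Storage Gen2 account reachable through the ABFS driver, which this page does not confirm, and the helper names `abfss_uri` and `service_principal_spark_conf` are hypothetical.

```python
def abfss_uri(storage_account: str, container: str, directory_path: str = "") -> str:
    """Build the ABFS location for a container and an optional directory path."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    path = directory_path.strip("/")
    return f"{base}/{path}" if path else f"{base}/"


def service_principal_spark_conf(
    storage_account: str, tenant_id: str, client_id: str, client_secret: str
) -> dict:
    """Return the Spark settings for OAuth access with a service principal.

    Assumption: ADLS Gen2 accessed through the ABFS driver.
    """
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


if __name__ == "__main__":
    # Placeholder values only; substitute the ones entered in the settings above.
    conf = service_principal_spark_conf("myaccount", "my-tenant-id", "my-client-id", "***")
    for key in conf:
        print(key)
    print(abfss_uri("myaccount", "dxa-exports", "/daily/"))
```

In a notebook, each pair could then be applied with `spark.conf.set(key, value)` before reading from the returned location. The client secret is better loaded from a secret scope than pasted into the notebook.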

To save changes and integrate with DXA, click Save.