Path Analysis

A path is a chain of consecutive events performed by a given user or cohort. For example, a user logs on to your e-commerce platform, browses through a few product pages, creates an account, and then logs off. This path is portrayals the events performed during the user's session.

Path analysis is used in behavioral analytics to investigate and understand user behavior and gain actionable insights.

SELECT clause

The PATH query SELECT clause can contain any User/Session Properties and/or path functions.

CLUSTER PATH BY

The following is the syntax diagram for the CLUSTER PATH BY clause:

CLUSTER PATH BY session 
[REMOVE DUPLICATES]
{ CONTAINS <event condition> [AND|OR] ... } | { MATCH  
  [ STARTS WITH] {any | [NOT]  event_name=<event_name> [OR ...] } ]
  [ FOLLOWED [DIRECTLY] BY { any | [NOT] event_name=<event_name> [OR ...] } ]
  [ ENDING [DIRECTLY] WITH { any | [NOT] event_name=<event_name> [OR ...] } ]
[ HAVING PATH  ]

CLUSTER PATH BY is composed of the following elements:

  • CLUSTER PATH BY session: lists the events performed during a session. If you need to drill down to data at the events scope use other CQL functions.
  • REMOVE DUPLICATES: (optional) when this element is used, repeated events are treated as one event. Example: the following path:[login, viewItem, viewItem, viewItem, saveItem, purchaseItem]will remove all consecutive viewItem events and be translated to:[login, viewItem, saveItem, purchaseItem].
  • CONTAINS: Path contains is a non-sequential expression resulting in all the paths that contain any the events, regardless of their order. "Contains" should be used instead of the match expression, and can include multiple conditions using AND or OR operators .
  • MATCH : Describes a sequential path, comprised of steps, each defined by a condition. All steps are optional – any combination of them can be applied.
    • Path steps:
      • STARTS WITH: The first event.
      • FOLLOWED BY: Any following event. Can be used multiple times, consecutively.
      • ENDING WITH: The last event.
    • Supported conditions:
      • any: matching any event. Would replace the step's entire condition.
      • [NOT] event_name = <event_name>: matching a specific events. A preceding NOT will negate the condition.
      • OR: matching a set of event names linked by OR, to indicate that any of these events can be applied to this step.
      • DIRECTLY: Forces the events to be consecutive, i.e. no events occurred between the preceding step and the current one. Applicable to any step except the first.
  • HAVING PATH: supports aggregate path functions (such as path_to_string, path_duration, path_count) with standard mathematical operators: >, <, =, =<, =>, !=, <>.

Path analysis can be used in two ways – sequential path, and non-sequential path:

Examples

Non-Sequential Path (Contains)The sequence of events of users who made a purchase, and then performed a social action: Like or Share.

SELECT path_to_string(path) AS path
FROM cooladata 
WHERE date_range(last 30 days) 
CLUSTER PATH BY session 
CONTAINS("Purchase") AND CONTAINS("Share") OR CONTAINS("Like")

Sequential Path

The top 5 paths by number of session of users who started by logging in, did a search, and made a purchase.

SELECT path_to_string(path) AS path, count(sid) AS visit_count
FROM cooladata 
WHERE date_range(last 30 days) 
CLUSTER PATH BY session 
  REMOVE DUPLICATES
  MATCH starts with event_name="Login" 
    followed by event_name="Search"
    ending with event_name="Purchase"
GROUP BY path 
ORDER BY visit_count DESC 
LIMIT 5

Path Functions

  • sub_path(path, event1, event2, …. eventn)This function will show only the paths that contain all the events in the list and in this order, and remove preceding and succeeding events. All events must be included at least once within the path. Events preceding the first event in the list or succeeding the last event in the list will not be shown. The function should include the property "path" and then a list of events.Example:
    sub_path(path, view_item, add_to_cart, checkout)

    Original Path: login, search, view_item, add_to_cart, share, checkout, searchResult: view_item, add_to_cart, share, checkout

  • part_path (path, event1, event2, …eventn)This function will show only the events included in this list of events, from paths which include all the events in the list, regardless of their order in the path. The function should include the property "path" and then a list of events.Example:
    part_path (path, checkout, add_to_cart)

    Original Path: login, search, view_item, add_to_cart, share, checkout, searchResult: add_to_cart, checkout

  • path_to_string(path | category_path)This function converts the path or category path (as indicated by the reserved parameters) into a comma delimited string.Example:
    path_to_string(path)
  • path_duration()This function returns an integer representing the duration (in milliseconds) of a session path. Example:
    path_duration()
  • path_count()This function returns an integer representing the number of events in a path.Example:
    path_count()
  • session_start_time()This function returns a timestamp of the session's first event.
    session_start_time()
  • session_end_time()This function returns a timestamp of the session's last event.
    session_end_time()
  • session_event_counter()This function returns the number of times the desired event appears in the session path, for every session. This function does not need to run on a path query.
    session_events_counter ("Login", "Like", "Create playlist"...)
    

    This query will return the number of times the events "login", "like", and "create playlist" appear in the session path for every session (running on the full path column).

  • first(path_to_string(path),[number_of_events])This presents the first x events in a path. It enables the analyst to learn what the popular first steps are that users take in each session. The function is intended for use in SELECT clause of the query.Example:
    first(path_to_string(path),2)

    In the example above we are interested in the learning the first events users take in a session.

  • last(path_to_string(path),[number_of_events])This presents the last x events in a path. It enables the analyst to learn what the popular last steps are that users take in each session. The function is intended for use in the SELECT clause of the query.Example:
    last(path_to_string(path),2)

    In the example above we are interested in the learning the last events users take in a session.

  • time_between(first_event, second_event[, first_event_condition, second_event_condition])This function calculates the time elapsed between two events with the option to apply conditions on each event. The function is intended for use in SELECT clause of the query. Event names must be quoted. Conditions may be any valid SQL style condition.Example:
    time_between("Register", "Purchase", membership_type = "Gold", order_total > 5)

    In the example above we are interested in the time elapsed between Gold users registration and a purchase they make over 5 dollars.As any field in the SELECT clause it is allowed to apply additional functions on top of the time_between(…) function result. For example:

    SELECT user_country,
        COUNT(user_id) AS user_count, 
        AVG(time_between("Register","Purchase",membership_type = "Gold",order_total > 5)) AS avg_time
    FROM cooladata
    WHERE date_range(BETWEEN 2013-01-01 AND 2013-04-30)
    GROUP BY user_country
    ORDER BY user_count