Common reasons for data discrepancies

Companies often use several tools to understand their users' behavior. Alongside Journey Analytics, most companies also use Google Analytics or their own DB for comparison.

We suggest investigating discrepancies if there is more than a 5% difference between Journey Analytics and other tools. A smaller difference is unlikely to be material enough to warrant a full tracking audit, since analytics are often used to identify trends (e.g. how fast are we growing?) rather than exact numbers. If the difference is greater than 5% across all events, or specific events don't match between systems, then further investigation is called for.
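As an illustration, the 5% rule of thumb can be expressed as a small check. This is only a sketch: the function names are hypothetical, and treating the Journey Analytics count as the baseline for the percentage is an assumption, not something the guideline prescribes.

```python
def percent_difference(journey_count: float, other_count: float) -> float:
    """Difference between two counts, as a percentage of the Journey Analytics count.

    Using the Journey Analytics count as the baseline is an assumption.
    """
    if journey_count == 0:
        raise ValueError("baseline count must be non-zero")
    return abs(journey_count - other_count) * 100 / journey_count


def needs_investigation(journey_count: float, other_count: float,
                        threshold_pct: float = 5.0) -> bool:
    """True when the difference exceeds the threshold (5% by default)."""
    return percent_difference(journey_count, other_count) > threshold_pct


# 1,000 events in Journey Analytics vs. 960 and 940 in another tool
small_gap = needs_investigation(1000, 960)  # 4% difference
large_gap = needs_investigation(1000, 940)  # 6% difference
```

Running the same check per event name, and not only on the total, helps to spot the case where specific events don't match even though the overall numbers do.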

Check this guide to identify common reasons for data discrepancies between various systems:

Timezone

Journey Analytics's default timezone is UTC. When comparing data between Journey Analytics and another system, such as Google Analytics, consider the time zone differences.
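The sketch below shows how the same events land on different days depending on the reporting timezone. The sample timestamps and the UTC-5 offset used for the "other tool" are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical: the other tool reports in a fixed UTC-5 timezone.
OTHER_TOOL_TZ = timezone(timedelta(hours=-5))

# Made-up event timestamps, stored in UTC.
events_utc = [
    datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc),
    datetime(2024, 3, 1, 14, 0, tzinfo=timezone.utc),
    datetime(2024, 3, 2, 1, 15, tzinfo=timezone.utc),
]


def daily_counts(timestamps, tz):
    """Count events per calendar day as seen in the given timezone."""
    return Counter(ts.astimezone(tz).date().isoformat() for ts in timestamps)


utc_counts = daily_counts(events_utc, timezone.utc)
other_counts = daily_counts(events_utc, OTHER_TOOL_TZ)
```

Here the UTC view has two events on 2024-03-01 and one on 2024-03-02, while the UTC-5 view has one event on 2024-02-29 and two on 2024-03-01. The totals match, but the daily numbers do not.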

Double-check your query

When weird numbers appear in a report, it's not always a data issue. Sometimes the report is simply not querying what you intended. Make sure you are querying the correct date range and that no applied filter is excluding relevant data.

Invalids

Journey Analytics automatically validates the data it receives in order to prevent Garbage In – Garbage Out situations. Events marked as invalid are not stored with the valid events, but in a separate, designated table for invalid events, which you can query to check what went wrong. See Handling Invalid Events to learn more.

Data Sampling

Google Analytics sometimes uses sampled data in reports, causing discrepancies with the numbers from Journey Analytics. Google Analytics sampling occurs automatically when more than 500K sessions are collected for a report. Google Analytics states in text above the report that the report is based on sampling. When comparing Journey Analytics to a sampled Google Analytics report, discrepancies are to be expected.

Session Definition

A session in Journey Analytics starts when someone visits your site or app and sends an event, and ends after thirty minutes of inactivity. The session duration is calculated as the difference between the first and last event in that session. The thirty-minute timeout is a configurable parameter. If you are the project's admin, you can see it under "Session timeout" in your project settings page.

Most analytics tools, such as Google Analytics, use the same thirty-minute definition, so setting a different timeout in Journey Analytics might cause discrepancies when comparing session duration or number of sessions. Also, consider that Google Analytics will count additional sessions for clicks on AdWords campaigns, and will hard stop all sessions at midnight, whereas in Journey Analytics a session that crosses midnight (starting before midnight and ending afterwards) is stored as one session. Other mobile analytics platforms also end a session if the user moves the app to the background for more than a minute.
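To see how the session definition changes the numbers, here is a minimal sessionization sketch for a single user. It is not Journey Analytics's actual implementation: the function names are hypothetical, the sample timestamps are made up, and treating a gap of exactly thirty minutes as the same session is an assumption.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout=timedelta(minutes=30), split_at_midnight=False):
    """Group one user's event timestamps into sessions.

    A new session starts when the gap since the previous event exceeds
    `timeout`, or (optionally) when the calendar day changes.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions:
            last = sessions[-1][-1]
            crosses_midnight = ts.date() != last.date()
            if ts - last <= timeout and not (split_at_midnight and crosses_midnight):
                sessions[-1].append(ts)
                continue
        sessions.append([ts])
    return sessions


def session_duration(session):
    """Duration is the difference between the first and last event."""
    return session[-1] - session[0]


# Made-up events for one user, spanning midnight.
events = [
    datetime(2024, 3, 1, 23, 50),
    datetime(2024, 3, 1, 23, 58),
    datetime(2024, 3, 2, 0, 5),
    datetime(2024, 3, 2, 0, 50),
]

one_session_across_midnight = sessionize(events)
hard_stop_at_midnight = sessionize(events, split_at_midnight=True)
```

With these events, the first call yields two sessions (the first lasting fifteen minutes), while the midnight hard stop yields three. The same raw events therefore produce different session counts and durations in tools with different definitions.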

Events are sent differently

A common cause of discrepancy is the way events are sent to Journey Analytics and to other tools. For instance, if one tool receives events from the server side and the other from the client side, differences in numbers will most likely occur.

Even if both tools receive events from the client side, the code needs to be checked. Sometimes the logical condition for sending an event to one tool is not the same as the condition in the code sending the event to the other tool. When using the JS SDK, the location of the trackEvent code is important. If the call sending the event to one tool is at the top of the code and the call sending the event to the other tool is at the bottom, discrepancies can occur when an error in the code interrupts execution or when the user closes the window before the trackEvent function is called.

Bots and Test Users

Some tools automatically filter out events created by bots; Journey Analytics does not. Journey Analytics does have several solutions available if you wish to slice out bad IPs or bots. To find the best solution for you, contact your Customer Support Manager.

If your operational DB automatically cleans test users' activity, make sure you filter out test users in Journey Analytics as well.

Funnel and Conversion Definitions

Funnels in Journey Analytics count distinct (unique) users who completed the funnel in the date range in question, within the time window set in the report. Conversions are sometimes defined differently in other tools. For instance, Google Analytics counts the number of sessions in which the funnel's steps were completed.

Note that if you set the funnel in Journey Analytics to show users who completed the funnel within X days, then when looking at the last week or month the funnel only includes users who did the first event at least X days before the end of the report date range. This gives users who came in at the end of the date range the same "chance" to complete the funnel as users who came in at the beginning.
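The cutoff described above can be sketched as follows. The function name is hypothetical, and treating both ends of the range as inclusive is an assumption about the boundary handling.

```python
from datetime import date, timedelta


def eligible_for_funnel(first_event_day: date, range_start: date,
                        range_end: date, window_days: int) -> bool:
    """True if the user entered early enough to have the full window.

    Inclusive boundaries are an assumption made for this sketch.
    """
    cutoff = range_end - timedelta(days=window_days)
    return range_start <= first_event_day <= cutoff


# Report over March 2024 with a 7-day completion window: cutoff is March 24.
march_start, march_end = date(2024, 3, 1), date(2024, 3, 31)
early_user = eligible_for_funnel(date(2024, 3, 20), march_start, march_end, 7)
late_user = eligible_for_funnel(date(2024, 3, 28), march_start, march_end, 7)
```

In this example the user who entered on March 20 is included, while the user who entered on March 28 is left out. A tool that does not apply such a cutoff would count both in the funnel's first step.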

Redirects and Self Referrals

When looking at Journey Analytics's "referring_url" and "referring_domain" you should see the URL and domain the user was referred from. Sometimes you see URLs and domains that you do not expect, such as your own URL. This happens when your site uses redirect rules, usually set up by your site admin.
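One way to spot self referrals in exported data is to compare the host of each "referring_url" value with your own domains. This is a sketch only: the function name and the example.com domain are placeholders.

```python
from urllib.parse import urlparse


def is_self_referral(referring_url: str, own_domains: set[str]) -> bool:
    """True if the referring URL's host is one of your domains or a subdomain."""
    host = (urlparse(referring_url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in own_domains)


OWN_DOMAINS = {"example.com"}  # placeholder for your site's domains

from_own_site = is_self_referral("https://www.example.com/checkout", OWN_DOMAINS)
from_search = is_self_referral("https://www.google.com/search?q=shoes", OWN_DOMAINS)
```

Sessions flagged this way are good candidates to review against your site's redirect rules.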

Journey Analytics Sessions Table

Journey Analytics stores your data in two separate tables: one is your events table, in which every row is an event, and the other is your sessions table, in which every row is a session. The events table holds all the event-level data and event-scope properties, as well as user and session scope properties. The sessions table holds session-specific properties (such as session duration and the session path) as well as session and user scope properties.

In order to optimize performance, Journey Analytics automatically shifts your queries to run on the sessions table if all the data you are searching for is there. For instance, Daily Active Users (DAU), which counts the number of unique users per day, will run over the sessions table, but if you want to count Daily Active Payers (pDAU), the query has to run over the events table in order to add a filter for the payment event.

Shifting between the sessions and events tables might cause slight differences, because when running over the sessions table the date range is filtered according to the session start time (session_start_time_ts), whereas when running over the events table the date range is filtered according to the event timestamp (event_time_ts).
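The effect of the two date filters can be reproduced with a few in-memory rows. Only the session_start_time_ts and event_time_ts field names come from the description above; the rows, the user_id field, and the helper function are made up for illustration.

```python
from datetime import date, datetime

# Made-up rows: user u1 has one session that starts on March 1 and
# continues past midnight; user u2 is active only on March 2.
sessions_table = [
    {"user_id": "u1", "session_start_time_ts": datetime(2024, 3, 1, 23, 50)},
    {"user_id": "u2", "session_start_time_ts": datetime(2024, 3, 2, 9, 0)},
]
events_table = [
    {"user_id": "u1", "event_time_ts": datetime(2024, 3, 1, 23, 50)},
    {"user_id": "u1", "event_time_ts": datetime(2024, 3, 2, 0, 10)},
    {"user_id": "u2", "event_time_ts": datetime(2024, 3, 2, 9, 0)},
]


def active_users(rows, ts_field, day):
    """Distinct users with a row whose timestamp falls on the given day."""
    return {row["user_id"] for row in rows if row[ts_field].date() == day}


day = date(2024, 3, 2)
dau_from_sessions = active_users(sessions_table, "session_start_time_ts", day)
dau_from_events = active_users(events_table, "event_time_ts", day)
```

For March 2, the sessions table yields one active user (u2), while the events table yields two (u1 and u2), because u1's session started the day before.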