GA4 and BigQuery: how to match the Console traffic grouping metrics
Google Analytics 4 offers excellent functionality with its GA4 BigQuery dataset. However, you might find it challenging to align your GA4 UI figures with the unnested sources, mediums, and campaigns in BigQuery.
So what happens in BigQuery compared to GA4?
When data is generated, it’s sent directly to BigQuery as raw data without any pre-processing. However, once it’s in GA4, Google applies various models and transformations. This includes joining data, modeling session traffic, adding cost data, and replacing ISO2 codes with language names.
Matching up your numbers
If you unnest the data in GA4 to extract the session source, you’ll notice significant discrepancies between your GA4 UI and GA4 BigQuery data, particularly if you have extensive paid and social activity.
The sample code below displays the raw traffic source, resulting in more users appearing as ‘direct’ or ‘organic’ than in the UI. This discrepancy arises because the data is unprocessed.
(select value.string_value from unnest(event_params) where key = 'source') AS source,
(select value.string_value from unnest(event_params) where key = 'medium') AS medium,
(select cast(value.int_value as string) from unnest(event_params) where key = 'ga_session_id')))) AS session_count
WHERE event_name = 'page_view'
AND (select value.int_value from unnest(event_params) where key = 'entrances') =1
GROUP BY 1,2
ORDER BY 3 DESC
To mirror the GA4 User Interface, you’ll need to further customize this code. This is due to Google’s handling of direct and organic traffic for sessions in the UI. Specifically, if a user comes from a direct source, the system will use the user’s traffic source.
This observation is based on my extensive testing and data analysis. By making this adjustment, the figures align more closely with the GA4 UI, clarifying why CPC mediums might appear much lower in GA4 BigQuery compared to the GA4 UI.
( case when (select value.string_value from unnest(event_params)…