BigQuery: The art of unnesting google analytics 4 event parameters
When you first catch a glimpse of the BigQuery Google Analytics or Firebase BigQuery tables, you may be instantly taken aback by the structure of these.
Fear not, the data is nested (arrayed) and is incredibly powerful in storing huge volumes of data in a highly accessible format. Once you master unnesting arrays and create them, you will see how much these will optimise your tables and reduce the number of columns required to store elements of data.
Google Analytics 4 and Firebase Tables
The standard tables for the Firebase and GA4 streams are the same, which means your data models can be reused across projects on both platforms if you are using the older Firebase SDKs.
The way to recognise the nesting is by looking for a column with the type of record. Underneath this record, you will see the text on the nested columns grey, with all nested values having the same root as the record type — in this case event_params. Each drop-down helps you identify a level of nesting and in the schema preview, they are always shown underneath each other.
You can also spot a nested column in the preview mode, where you will see the nesting results in greyed-out space.
So how do you get started?
Simple steps:
When your developer teams implement events, they will also implement parameters, likewise, Google provides standard and recommended events and parameters. The way to envisage this is similar to a pivot table.
The parameter taxonomy is the key that contains the name added within the code. One of the four columns of values will be populated depending on what data structure this parameter is.
To extract the values and essentially flatten the tables we must use unnesting. Luckily due to the simplified structure compared to GA360 BigQuery datasets, we can unnest within the select statement.
This is done by identifying the parameter data type, unnesting the event_params and selecting the parameter you wish to unnest. If you select the wrong datatype the code will still process, however you will have null values.
SELECT
DISTINCT (
SELECT
value.string_value #param data type
FROM
UNNEST(event_params) #unnest the events
WHERE
key = 'page_title' ) AS page_title #param name
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
ORDER BY
1
If you have access to a project, try it out yourself with the above code which uses the Google Analytics 4 public datasets. I have added some more complex processes to unnesting in my previous blog around UDFs.
Unnesting at the from statement
Like GA360 you are also able to unnest at the from level of the data, this can be useful if you use effective filtering. It does mean effectively for every parameter available within an event, that there will be a row of data. The code below unnests and generates a expanded view of the parameters. However use this with caution since the table generated will have a lot of rows compared to unnesting in the select statement.
SELECT
DISTINCT
TIMESTAMP_MICROS(event_timestamp) AS time_micros,
event_name,
ep.key as key ,#param name
ep.value.string_value as string_value , #param data type
ep.value.int_value as int_value ,
ep.value.float_value as float_value ,
ep.value.double_value as doub_value ,
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`,
unnest(event_params) as ep
WHERE event_name = 'view_search_results'
#add key filter
ORDER BY
1,2,3
Common issues
When you first use BigQuery’s streamed data you may see the errors in the following images. These are super easy to resolve and are due to the way BigQuery reads in the nested columns.
When you use distinct * on nested columns
When you try to call the event_param.key without unnesting
One final note, is whilst nested variables look scary at first once you get used to them you’ll see a clear benefit in the cost of storage and ease of access. I’ll be writing further posts on how to design events, generate arrays and much much more. Be sure to message me if you have a suggestion of a blog post.