BigQuery and GA4: how to redact parameters in code

Lace Chantelle Rogers
Oct 3, 2022

As more and more people migrate from Universal Analytics to Google Analytics 4 (GA4), there are more opportunities to use the BigQuery export to build truly enhanced reporting and optimised analytics.

However, it is now also more challenging to redact PII (personally identifiable information) via Google Tag Manager. This places more reliance on robust, methodical code in your websites and apps to encrypt PII and prevent it from being sent to Google Analytics. In fact, this is a really positive move: if the correct methodology is employed, you can be confident that PII is not processed by Google.

So what happens when PII is sent to GA4? First, you can submit data deletion requests through the GA4 interface, as described here: https://support.google.com/analytics/answer/9940393?hl=en.

But what happens to your GA4 data in BigQuery when the worst happens and you detect PII? At present, nothing: the exported tables are left exactly as they are.

You do, however, have choices. You could simply delete the tables in which PII was detected, or you could use a little BigQuery mastery to clean and redact the affected data, which lets you keep a clean and complete dataset.

The Code

This code lets you run each table update manually, with whatever conditions you need to remove the PII. In this case, I have added two conditions. The first targets string values that contain an @, excluding places where an email address is legitimately present, and applies only to the specific parameters in which I detected PII. The second cleans a page string that, under conditions I have defined, is erroneously reporting PII. Instead of redacting the entire string, which would lose valuable UTM data, I have used a safe split on the string to return only the part after the UTM campaign.
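The two conditions can be illustrated outside BigQuery with a minimal Python sketch. The parameter names in `PARAMS_WITH_PII`, the allow-listed address, the assumption that the page string is the `page_location` parameter, and the use of "contains an @" as the trigger for cleaning it are all hypothetical placeholders rather than the conditions used in the original SQL. The page-string function is only a loose analogue of a BigQuery `SPLIT(...)[SAFE_OFFSET(1)]`: it returns `None`, much as the SQL would return NULL, when the marker is missing, and it does not raise an error.

```python
from typing import Optional

# Hypothetical: parameters in which PII was detected (substitute your own findings).
PARAMS_WITH_PII = {"search_term", "form_comment"}
# Hypothetical: addresses that are legitimately present and must be kept.
ALLOWED_EMAILS = {"support@example.com"}


def redact_string_value(key: str, value: Optional[str]) -> Optional[str]:
    """Condition 1: null out values containing '@' in flagged parameters,
    unless they contain an allow-listed email address."""
    if value is None or key not in PARAMS_WITH_PII:
        return value
    if "@" in value and not any(email in value for email in ALLOWED_EMAILS):
        return None  # mirrors setting string_value to NULL in the table
    return value


def clean_page_string(value: Optional[str], marker: str = "utm_campaign") -> Optional[str]:
    """Condition 2: keep only the marker and everything after its first occurrence.

    Loosely like SPLIT(...)[SAFE_OFFSET(1)] in BigQuery: returns None instead
    of failing when the marker is absent.
    """
    if value is None or "@" not in value:
        return value  # hypothetical trigger: only touch strings that look like PII
    _, found, after = value.partition(marker)
    return found + after if found else None
```

For example, `clean_page_string("https://example.com/?email=jane@doe.com&utm_campaign=spring&utm_source=news")` keeps `utm_campaign=spring&utm_source=news` and drops the leading part of the URL that held the address.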

If PII is present, it will normally be in the string-based parameters. However, because we are updating a nested value, we still need to select the other value types (int_value, float_value and double_value) to preserve the integrity of the nested array. You'll also notice that two arrays are mentioned. This is because the parameter values are nested within each parameter key's record, and those records are in turn nested within the events.
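The idea of carrying the non-string value types through the rebuild can be sketched in Python on an in-memory row shaped like the GA4 export schema. In that schema, `event_params` is a repeated record of `key` plus a `value` record with `string_value`, `int_value`, `float_value` and `double_value`. The dict representation and the `redact` callback are assumptions made for illustration. This does not show how the original SQL performs the update.

```python
import copy
from typing import Callable, Optional

# A redactor receives the parameter key and its string_value and returns the new string_value.
StringRedactor = Callable[[str, Optional[str]], Optional[str]]

VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def redact_event_params(event: dict, redact: StringRedactor) -> dict:
    """Return a copy of a GA4-style event row with only string_value rewritten.

    Every other value type is copied across explicitly, mirroring how the SQL
    must re-select all value fields when it rebuilds the nested array.
    """
    cleaned = copy.deepcopy(event)  # leave the input row untouched
    rebuilt = []
    for param in event.get("event_params", []):
        old = param.get("value", {})
        new_value = {field: old.get(field) for field in VALUE_FIELDS}
        new_value["string_value"] = redact(param["key"], old.get("string_value"))
        rebuilt.append({"key": param["key"], "value": new_value})
    cleaned["event_params"] = rebuilt
    return cleaned
```

Dropping the non-string fields from the rebuilt records would silently lose values such as `ga_session_id`, which are stored in `int_value`. Copying every field is therefore the safe default even when only strings are being redacted.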

To adapt the code, first edit the project name. I also highly recommend making a copy of your affected datasets and testing against that copy first.
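GA4 exports one table per day, named `events_YYYYMMDD`, so running the update table by table means working through a list of daily table IDs. The sketch below builds that list and pairs each source table with the ID of a test copy. The project and dataset names are hypothetical placeholders. The copies themselves would still have to be made in BigQuery, for example with `bq cp`, before any update is run against them.

```python
from datetime import date, timedelta
from typing import List, Tuple


def plan_table_updates(
    project: str, dataset: str, test_dataset: str, start: date, end: date
) -> List[Tuple[str, str]]:
    """Pair each GA4 daily export table (events_YYYYMMDD) from start to end,
    inclusive, with the fully qualified ID of its test copy."""
    pairs = []
    day = start
    while day <= end:
        table = f"events_{day:%Y%m%d}"
        pairs.append((f"{project}.{dataset}.{table}", f"{project}.{test_dataset}.{table}"))
        day += timedelta(days=1)
    return pairs


# Hypothetical names: replace with your own project and GA4 dataset.
for source, test_copy in plan_table_updates(
    "my-project", "analytics_123456789", "analytics_123456789_test",
    date(2022, 9, 30), date(2022, 10, 2),
):
    print(source, "->", test_copy)
```

Once the update behaves as expected on the test copies, the same list can be used to run it against the source tables.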

Lace Chantelle Rogers

Head of data and analytics. GCP, BigQuery and GA4 expert specialising in data engineering and data science with Python and SQL. https://www.bigquery.co.uk/