BigQuery: Asserting your data is viable for your code
How often have you taken a peek at a report only to be thrown by really odd figures? This could be down to many reasons, but often if the code has been running well for some time, it’s likely to be the underlying data or dependent tables.
Many programmers would have heard of unit tests, which I will be talking about in a future article. However, Google Cloud Platform has provided some super handy functionality in the form of Assert.
What is an Assertion?
Put it simply it is a rule which checks if data will process, Assert can be used in a number of ways, in this case, I will be using it to conditionally run a code dependent on the Google Analytics 4 tables.
In this case, the first part of the code will check if the daily event table exists and count the number of tables that match the where condition. If the number of tables is greater than 1, the code below will run. Note it shouldn't be possible to have more than one table for a date but I have generated this issue to demo Assert functionality.
table_name LIKE 'events_2%'
AND (PARSE_DATE("%Y%m%d",REGEXP_REPLACE(table_name,r'[^0-9.]', '')) ) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
) >1 AS 'Table must EXIST.';
CREATE OR REPLACE TABLE analysis.asserts_example
COUNT(DISTINCT( user_pseudo_id) ) AS users
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d' , DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
The result is the error below, which means the rest of the code will not process.
So what happens if you adjust the Assert in line 10 to run if there is one table which meets the condition? Well voila - your table updates.
You can take this even further by adding postcondition checks, which means you can effectively be aware of issues in the data even before they sneak into reporting. Neat stuff.
If you have audit logs streaming to BigQuery you can take this even further, however, I will be adding a further article about this powerful feature at a later date including some very handy code.
In the meantime, if you select Logs Explorer, you can then filter down to your results to see the errors occurring. These can be exported into tools such as BigQuery, cloud storage and much more to help you automate error reports.
These tools are fantastic for automation, but also ensure you are the first person aware of issues taking place, especially as you scale up and add complexity to your BigQuery reporting.