BigQuery: User-Defined Functions to supercharge your SQL

As we all know Google is continuously improving and expanding the Google Cloud Platform and BigQuery has truly benefited from this. Taking a simple scalable data warehouse into what is now a comprehensive and dynamic system. As you may guess, I’m a massive fan of BigQuery.

BigQuery offers a huge amount of functionality including DDL and DML, however, sometimes, there simply isn’t the function to help you complete what you are doing. Often in the past, I would end up accessing BigQuery via Python or SAS and applying the missing function.

However User Defined Functions cover some of the current gaps, and since they are now persistent (I’ll explain this further later) you can share and re-use these with your colleagues.

So what is a UDF?

An Example

Below shows the SQL logic being generated into a temporary function which is then applied to your string. Whilst this function is now defunct since recently BigQuery has introduced INITCAP, I wanted to show you the effect of the UDF compared to the INITCAP function.

So how do you generate your own SQL UDFs?

Steps to create your first UDF

CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]

The name of your function, if you are creating a temp function you only need the function name. I generally recommend using a literal value that defined what you are planning to achieve where possible and ensure you maintain your database guidelines like using snake case and core naming conventions.

[[`project_name`.]dataset_name.]function_name

You will then need to add what parameter you expect to return plus what data type it returns. One note on this, you can have multiple params returned as long as they are comma-delimited. I’ve added examples here since I personally took a few attempts to work out what to return.

([named_parameter[, ...]])
[RETURNS data_type]
#Example
(userAge INT64)
#Multi value example
(userAge INT64, gender STRING)

Here’s where the fun starts — adding in your SQL expression. This can be a Scalar query or a simple script. Just think how much time you could save on joining back onto your data to apply logic. For any SAS users, this is very similar to a macro.

AS (sql_expression/logic/js)#Google's example 
CREATE TEMP FUNCTION
addFourAndDivideAny(x ANY TYPE,
y ANY TYPE) AS ( (x + 4) / y );
SELECT
addFourAndDivideAny(3 /*x*/ ,3 /*y*/ ) AS integer_output

So let's see this in action:

(SELECT VALUE.INT_VALUE FROM UNNEST(EVENT_PARAMS) WHERE KEY = 'CCC') 

However, that is super painful to repeat multiple times and likely to produce errors, so using a UDF, scripting and the fab SQL execute statement, I have generated the code below, which will automatically unnest your Firebase or GA4 data.

Be careful to select the specific date frame your reviewing and select specific events you wish to review. You can even remove the SQL execute and simply copy the query into another window or a tool such as Python.

So how do you save a UDF to your project?

CREATE FUNCTION  UDFS.ga4_firebase

In fact, a great UDF community has formed an open-sourced UDF project which you can actually apply to your own work, including many statistical techniques such as P Values and much more simply by using the following syntax. Be sure to check them out:

SELECT bqutil.fn.udf_name(variable)

I hope you have enjoyed this brief intro to UDFs and have plenty of fun trying them out in your code. My next blog post will be focused on creating JS UDFs as well as accessing these when stored on the cloud (which open up a world of statistical opportunities).

Lead data specialist, and programming obsessive. Specialising in data team development. Love statistical languages and focusing on the GCP platform on apps.