It’s always a challenge making BigQuery DataWarehouses more efficient. So today I want to share with you the joys of DDL and how you can reduce the time it takes to produce tables from your queries. In the past 6 months BigQuery’s DDL and DML functionality has exploded with plenty of new features and massive time savers, which mean less time spent trying to keep your BigQuery warehouse tidy.
So firstly- what even is DDL and why would you even use it?
DDL stands for Data Definition Language and is primarily used to create and modify tables. It is a universal term for many types of SQL, but in BigQuery it is used for:
- Creating tables and views
- Modifying tables and views
- Deleting tables and views
- Creating User-Defined Functions
- A little more about UDFs in a later article (as they’re great)
It’s used in the console and within scripting to help frame and design your table structure. Within the BigQuery console, it saves lots of time and means you can automate code with things like clustering when it’s not currently available on the Scheduling interface.
So you have written a code that you wish to run daily. You could use the BigQuery scheduling tools, however, you also want to add clustering and descriptions to the variable names.
The above is a simple query from the public datasets and available here
So how do we adjust this query to create the table?
Firstly you should ask yourself: do you want to insert, create, or replace your table?
Insert into … if you want to append data into the table, just ensure you are not accidentally adding duplicates into the table
Create table … If you want to create a new table, however, this will only work as a one-off
Create or replace table … You’ll overwrite the existing table or create if it doesn’t exist, so remember to retain the data you wish to carry over
Create table if not exists… You’ll create a table if it doesn’t already exist, but again be sure to retain data you wish to carry over
All of these are have great uses but also should be used with caution (see our last article Link if things do go wrong)
Here is a sample for you to test out using DDL table creation queries in your own project
For more examples take a look at our repository for this post, github
So you’ve decided to create or replace your table, so how do you enhance your queries further?
There are so many options when it comes to DDL these include
- Adding descriptions to columns and tables
- Data retention policy
In this example, we create a table that has a lot of features, we will discuss features such as partitioning and how they can improve your table structure and query in future articles.
This will use “create table”, this table will generate clusters and partitions which will help you and your colleagues subsequently use this table efficiently. Make sure you use the create and replace should your table already exist, however if you have not previously added partitioning or clustering you should append historical data and generate a new table.
What this looks like on the resulting table. Please note that if you overwrite a table you will lose the descriptions.
You will also see more information on the Details tab- this includes the options to add descriptions, labels and data retention. You can manually edit these values, but including them in the code means you and automate your process if you wish this process to run regularly.
I hope you find this article useful, and DDL helps you automate some of your daily tasks to support your day to day work.
If you want to discuss more about DDL or the other great features and resources available via GCP and BigQuery, drop me a message.