How to time-travel with BigQuery and undelete tables
One of my favourite things is working on data warehouses using BigQuery; in fact, BigQuery provides a wealth of tools to automate and process data to build a robust ETL.
As you may know, BigQuery has some super powerful features, which allow us to build, manipulate and even run machine learning algorithms within the interface. However, we’re all human, and things can go wrong- so what do you do when that happens?
Imagine one morning you’re on the BigQuery interface and want to simply do an aggregation of your table to check the volumes, but you forget to clear down the options before running.
Just as the code finishes running, you realise months of daily updates have been lost as the table was accidentally overwritten and, even worse, any reporting in dashboard tools like Data Studio or Tableau using this table will now fail.
Before you look to completely re-run all your historical data which could both end up costing a lot of your time and increase your BigQuery spend, Google Cloud Platform has some great tools which allow you to recover and revert your datasets.
What can you do?
- If you have deleted a table, or have an error on your table, you have seven days to recover. Effectively you can time travel.
And what isn’t possible?
- Remember at present you cannot restore a deleted dataset
The syntax
The syntax itself for reverting the tables is relatively simple and is not expensive to run.
- The interval can be seconds/hours/days; however, it cannot be over seven days due to the reversion time limit.
- You must remember when you are reverting if you have added correct data in the period you are reverting to you will lose this, so it’s crucial to check when the error occurred if possible.
- You can either use Data Manipulation Language (DML)* or the output functionality to overwrite the table you are reverting to, without a destination, you will simply see the data in a temp table.
- Here are step by step instructions from Google Link
Here is a link to the script on Github
Then voilà, check your table has reverted and if all looks well, have a tea and relax in the knowledge your reports feeding on this table will be back up and running when they next process (phew).
Whilst we should always be super careful when amending or overwriting tables if something does go wrong, remember to look at this syntax before you re-run the full history.
* And what is DML I hear you say?
DML stands for Data Manipulation Language. DML statement is used to insert, update or delete the records on tables and is super powerful when creating tables. With DML and DDL you can both effectively script and manage your tables and code effectively and I will be writing more extensively on these tools over the next few weeks.