BigQuery: Easily manage dataset access permissions with DCL
As we all know one of the biggest challenges of managing a data warehouse is permission to access data. Often this can be challenging to keep on top off, and even more so when you have specific dataset level permissions.
Luckily BigQuery has recently released Data Control Language functionality, which means you can quickly grant BigQuery dataset level access without manually using the UI or using tools such as Python.
This means you can have a BigQuery data warehouse which services multiple departments or different levels of permissions depending on the team. This is key when your warehouse may have valid PII data which you only want specific team members to have access to.
Before you start ensure you have the permissions as mentioned in this link. I also recommend ensuring you double-check you know what region your dataset is stored in since you will need to ensure you run the DCL in this region.
The syntax below allows you to either grant access to or revoke access from either at a dataset or table level. In testing, I have not seen an error occur if the user is granted or revoked access multiple times, however, an error will occur if the user does not exist on GCP.
ON SCHEMA/ON TABLE `projectname`.dataset
TO/FROM "user:email@example.com", "user:firstname.lastname@example.org", group:email@example.com ,
If you are part of a larger business where department heads or IT security managers onboard new teams, I highly recommend creating google groups per department, with the onus on these being managed by department heads. This way you will not have to manage the individual level access and the responsibility to remove and add new users. Likewise, you can design groups and control them as the warehouse administrator.
You’ll then be able to add these groups with the required permissions, which limits maintenance. This does come at risk if department managers do not effectively control groups so I suggest inforcing quarterly access reviews and data access agreements with the managers of these lists.
Before getting started with DCL, consider whether the user will be accessing your warehouse via the project/billing account where the data is stored, or if they have departmental projects which have unique billing accounts or project names. The benefit of the second approach is managing billing to a department level but will come with some risks such as not having oversight of how the data is used and where.
Firstly add all users BigQuery Job User(
roles/bigquery.jobUser) if your groups or users are going to be working on your project and billing account. This means they have access to BigQuery but not to specific datasets or tables.
More advanced processes
If you have a warehouse in which you have a standard naming schema for datasets, you can combine DCL with tools such as Infomation_schema, to write your DCL code for you.
If you have a structured schema for your datasets you can then automate adding or revoking users to multiple datasets. For example:
Department_type_detail for example would be analytics_reporting_snapshots
If you then run the following code, editing the string to reflect the accesses you’re granting or revoking, a string of code will be generated. Copy and paste this output into a new BigQuery tab and voila — you have written your DCL.
BASE AS (
CONCAT("GRANT `roles/bigquery.admin` ON SCHEMA ", schema_name, " TO 'user:firstname.lastname@example.org' ;'") AS string_value,
schema_name LIKE 'analytics_reporting_%' )
STRING_AGG(string_value, "\n") AS OUT_VAL
- hint: make sure you are processing the code in the same location as you are granting permissions.
As always scripting and procedural language helps you work smarter for less time. And be sure to send me suggestions for other GCP tools you would like me to write about.