By Rudderstack Team

How to Access and Query Your Google BigQuery Data Using Python and R

Overview

In this post, we see how to load and query Google BigQuery data using Python and R to extract useful insights. We use the Google Cloud BigQuery client library to connect to BigQuery from Python, and the bigrquery library to do the same from R.

We also look into the two steps of manipulating the BigQuery data using Python/R:

  • Connecting to Google BigQuery and accessing the data
  • Querying the data using Python/R

In this post, we assume that you have all your customer data stored in Google BigQuery.

If you are interested in learning more about how to get your data from your data sources into tools like Google BigQuery and other data warehouse solutions in real time, you should explore Customer Data Infrastructure tools like RudderStack.

Python

Python is one of the most widely used general-purpose programming languages out there. It has gained a lot of attention and popularity because of its ease of use and flexibility.

Many engineers and data science teams also prefer Python because of the extensive libraries and tools at their disposal for connecting to other third-party systems and manipulating the data.

Connecting Google BigQuery with Python

To query your Google BigQuery data using Python, we need to connect a Python client to our BigQuery instance. We do so using the cloud client library for the Google BigQuery API. You can also choose any other third-party option to connect BigQuery with Python; for example, the BigQuery-Python library by tylertreat is also a great option.

We use the Google Cloud BigQuery library because it is stable and officially supported by Google.

For this post, we assume that you already have a Python development environment set up. If not, we highly recommend you refer to the Python Development Environment Setup Guide.

To install the library, run the following command from your terminal:

pip install --upgrade google-cloud-bigquery

Next, we connect the client to the database. To do this, you will need to download a JSON file that contains the BigQuery service account credentials. If you don’t have a service account, follow this guide to create one, and then proceed to download the JSON file to your local machine.

Now that we have everything set up, we proceed to initialize the connection. The following Python code is used to do so:

from google.cloud import bigquery
from google.oauth2 import service_account

# Load the service account credentials from the downloaded JSON key file
credentials = service_account.Credentials.from_service_account_file(
    'path/to/file.json')

# The BigQuery project to connect to
project_id = 'my-bq'
client = bigquery.Client(credentials=credentials, project=project_id)

In the snippet above, you will need to specify the project_id and the location of your JSON key file by replacing 'path/to/file.json' with the actual path to the locally stored JSON file.

In Google BigQuery, the project is a top-level container and provides default access control across all the datasets.
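Before moving on to queries, it can be useful to confirm that the client is actually talking to your project. As a quick sketch (the dataset_ids helper below is our own convenience function, not part of the BigQuery API), you can list the dataset IDs visible in the project:

```python
def dataset_ids(client):
    # client.list_datasets() returns an iterator of DatasetListItem objects;
    # collect their IDs into a plain list.
    return [ds.dataset_id for ds in client.list_datasets()]

if __name__ == "__main__":
    from google.cloud import bigquery
    from google.oauth2 import service_account

    # Same credentials and project as in the connection snippet above
    credentials = service_account.Credentials.from_service_account_file(
        'path/to/file.json')
    client = bigquery.Client(credentials=credentials, project='my-bq')
    print(dataset_ids(client))
```

An empty list here simply means the project has no datasets yet; an authentication error means the key file or project ID needs another look.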

Executing Queries on BigQuery Data with Python

Now that we have the BigQuery client set up and ready to use, we can execute queries on the BigQuery data.
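As a minimal sketch of what a query looks like with this client (the query-building helper and the choice of table are our own illustrations, not from the original walkthrough), the example below runs against the public dataset `bigquery-public-data.usa_names.usa_1910_current`:

```python
def build_top_names_query(limit=5):
    # Build the SQL for the most common names by total count.
    # The table referenced is a BigQuery public dataset, used purely
    # for illustration.
    return f"""
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_current`
        GROUP BY name
        ORDER BY total DESC
        LIMIT {limit}
    """

if __name__ == "__main__":
    from google.cloud import bigquery
    from google.oauth2 import service_account

    # Same credentials and project as in the connection snippet above
    credentials = service_account.Credentials.from_service_account_file(
        'path/to/file.json')
    client = bigquery.Client(credentials=credentials, project='my-bq')

    # client.query() starts the job; result() blocks until it finishes
    # and returns an iterable of rows.
    for row in client.query(build_top_names_query()).result():
        print(row.name, row.total)
```

Note that client.query() submits the job asynchronously; calling result() on the returned job waits for completion and yields the rows.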