By Rudderstack Team

How to load data from Google Search Console to Google BigQuery

This post will help you with syncing your Google Search Console data to Google BigQuery. By doing this you will be able to perform advanced analytics on a system that is designed for this kind of data payloads, like Google BigQuery. Alternatively, you can simplify the process of syncing data from Google Search Console to Google BigQuery by using RudderStack, where the whole process will be handled by Rudderstack and you can focus on what matters, the analysis of your Google Search related data.

Access Your Data On Google Search Console

The first step in loading your Google search Console data to any kind of data warehouse solution, is to access your data and start extracting it.

You access your data for the Google Search Console through the Search Console APIs. There are two APIs available there,

  1. Search Console API
  2. URL Testing Tools API

From the two, we are interested in the first API, which allows us to access the data we are interested for.

As every other Google product, you need to authorize yourself to get access to the API through an implementation of the OAuth 2.0 protocol. The API is web-based following a REST-like architecture, but Google also offers some SDKs that you can use for some popular languages like Java and Python.

The things that you have to keep in mind when dealing with any API like the one the Google Search Console has, are:

  1. Rate limits. Every API has some rate limits that you have to respect.
  2. Authentication. You authenticate on Google using an OAuth.
  3. Paging and dealing with big amount of data. Platforms like Google tend to generate a lot of data. Pulling big volumes of data out of an API might be difficult, especially when considering and respecting any rate limits that the API has.

About Google Search Console

Google Search Console is a product offered by Google to web administrators. It allows you to submit sitemaps to Google, trigger the indexing of your website and see statistics about what’s going on, like possible errors and speed-related problems.

Most importantly, Google Search Console offers a wealth of statistics about the queries that users are performing to click on a link and get on one of your landing pages. This information can help tremendously in search engine optimization and when you are serious about content marketing.

You need to have in mind the following about Google Search Console.

  1. You see only sample data, and
  2. You can get up to 90 days of data

So, it’s important to start collecting and storing your Google Search Console data as soon as possible and make sure that you sync all the available data.

Transform And Prepare Your Google Search Console Data

After you have accessed your data on Google Search Console, you will have to transform it based on two main factors,

  1. The limitations of the database that the data will be loaded onto
  2. The type of analysis that you plan to perform

Each system has specific limitations on the data types and data structures that it supports. If you want to push data into Google BigQuery, you can send nested data like JSON directly. But when you are dealing with tabular data stores, like PostgreSQL, this is not an option. Instead, you will have to flatten out your data before loading into the database.

Also, you have to choose the right data types. Again, depending on the system you will send the data and the data types that the API exposes to you, you will have to make the right choices. These choices are important because they can limit the expressivity of your queries and limit your analysts on what they can do directly out of the database.

Google Search Console data is modeled around the concept of a report, just like Google Analytics but with a much more limited number of dimensions and metrics.

At the end, you will need to map one report to a table on your database and make sure that all the data is stored into it. Dimensions and metrics will become columns of the tables.

You need to take special care of the fact that the reports you will be getting from Google Search Console, do not have primary keys given by Google, in order to avoid duplicates.

For more information on how you can query your Search Analytics data, please see here.

Load Data From Google Search Console To Google Bigquery

If you want to load Google Search Console data to Google BigQuery, you have to use one of the following supported data sources.

  1. Google Cloud Storage
  2. Sent data directly to BigQuery with a POST request
  3. Google Cloud Datastore Backup
  4. Streaming insert
  5. App Engine log files
  6. Cloud Storage logs

From the above list of sources, 5 and 6 are not applicable in our case.

For Google Cloud Storage, you first have to load your data into it. There are a few options on how to do this. For example, you can use the console directly as described here and do not forget to follow the best practices.

Another option is to post your data through the JSON API, as we see again, APIs play an important role in both the extraction and the loading of data into our data warehouse. It’s just a matter of one HTTP POST request using a tool like CURL or Postman in its simplest case.

After you have loaded your data into Google Cloud Storage, you have to create a Load Job for BigQuery to actually load the data into it. This Job should point to the source data in Cloud Storage that have to be imported. This happens by providing source URIs that point to the appropriate objects.

The Best Way To Load Data From Google Search Console To Google Bigquery And Possible Alternatives

So far, we just scraped the surface of what can be done with Google BigQuery and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from, and the requirements of your use case.

Things can get even more complicated if you want to integrate data coming from different sources. Instead of writing, hosting, and maintaining a flexible data infrastructure, a possible alternative is to use a product like Rudderstack that can automatically handle this kind of problem for you.

Rudderstack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more. Quickly and safely move all your data from Google Search Console into Google BigQuery and start generating insights from your data.