By Rudderstack Team

How to load data from Google Search Console to PostgreSQL

This post walks you through loading your Google Search Console data into PostgreSQL. If you are looking to get analytics-ready data without the manual hassle, use RudderStack to integrate Search Console with PostgreSQL in a few clicks. Focus on what matters: getting value out of your business data.

Access your data on Google Search Console

The first step in loading your Search Console data to any kind of data warehouse solution is to access your data and start extracting it.

There are two APIs available to access your data from Search Console:

  1. Search Console
  2. URL Testing Tools

Of the two, we are interested in the first, the Search Console API, because it exposes the search performance data we want to load.

As with every other Google product, you need to authorize your requests using the OAuth 2.0 protocol before you can access the API. The API is web-based and follows a REST-like architecture, but Google also offers SDKs for popular languages such as Java and Python.
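To make the authorization step concrete, here is a minimal sketch of how an authorized Search Analytics query request could be assembled with the Python standard library. The helper name `build_query_request`, the example site, and the placeholder token are assumptions for illustration; the endpoint path reflects the Search Console API v3 and should be checked against Google's current documentation. Obtaining the access token through an OAuth 2.0 flow is out of scope here.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the Search Console (webmasters) API v3; verify against current docs.
API_ROOT = "https://www.googleapis.com/webmasters/v3"


def build_query_request(site_url, access_token, body):
    """Build (but do not send) a Search Analytics query request."""
    # The site URL is part of the path, so it must be fully percent-encoded.
    encoded_site = urllib.parse.quote(site_url, safe="")
    url = f"{API_ROOT}/sites/{encoded_site}/searchAnalytics/query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # OAuth 2.0 access tokens are sent as a bearer token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_request(
    "https://www.example.com/",   # hypothetical verified property
    "placeholder-access-token",   # hypothetical token from an OAuth 2.0 flow
    {"startDate": "2024-01-01", "endDate": "2024-01-07", "dimensions": ["query"]},
)
# urllib.request.urlopen(request) would actually send it; omitted in this sketch.
```

Keeping request construction separate from sending makes it easy to inspect or log exactly what will go over the wire.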

When working with an API like the one Google Search Console offers, keep the following in mind:

  1. Rate limits. Every API has some rate limits that you have to respect.
  2. Authentication. You authenticate with Google using OAuth 2.0.
  3. Paging and dealing with large volumes of data. Platforms like Google tend to generate a lot of data. Pulling big volumes of data out of an API can be difficult, especially when you have to respect the rate limits that the API imposes.
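The paging and rate-limit concerns above can be sketched together. The snippet below assumes a caller-supplied `fetch_page(start_row, row_limit)` function (hypothetical) that returns one page of rows and raises the hypothetical `RateLimitError` when the API reports a quota problem. The default page size of 25,000 matches the documented maximum `rowLimit` of the Search Analytics query; the retry count and delays are arbitrary choices.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised by fetch_page when the API reports a quota error."""


def fetch_all_rows(fetch_page, page_size=25000, max_retries=5,
                   base_delay=1.0, sleep=time.sleep):
    """Collect every row by advancing the start row until a short page comes back."""
    rows = []
    start_row = 0
    while True:
        for attempt in range(max_retries):
            try:
                page = fetch_page(start_row, page_size)
                break
            except RateLimitError:
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
        else:
            # The for loop finished without a successful fetch.
            raise RuntimeError("rate limit retries exhausted")
        rows.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return rows
        start_row += page_size
```

Passing `sleep` as a parameter keeps the function testable without real waiting.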

About Google Search Console

Search Console is a product offered by Google to web administrators. It allows you to submit sitemaps to Google, trigger the indexing of your website and see statistics about what’s going on, like possible errors and speed-related problems.

Most importantly, Google Search Console offers a wealth of statistics about the queries that users are performing in order to click on a link and get on one of your landing pages. This information can help tremendously in search engine optimization and when you are serious about content marketing.

Keep the following limitations of Google Search Console in mind:

  1. You see only a sample of the data, and
  2. You can retrieve only up to 90 days of data

So, it’s important to start collecting and storing your Google Search Console data as soon as possible and make sure that you sync all the available data.
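One way to make sure the whole available window is synced is to request it day by day. The sketch below only generates the dates to request; the function name `daily_windows` and the 90-day default are assumptions taken from the limit mentioned above.

```python
from datetime import date, timedelta


def daily_windows(today, days_available=90):
    """Yield ISO date strings, oldest first, for each day in the available window."""
    for offset in range(days_available, 0, -1):
        yield (today - timedelta(days=offset)).isoformat()


# Example: the three days preceding 1 April 2024.
recent_days = list(daily_windows(date(2024, 4, 1), days_available=3))
```

Each date can then be used as both the start and end date of one query, which keeps individual responses small.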

Transform and prepare your Google Search Console data

After you have accessed your data on Google Search Console, you will have to transform it based on two main factors:

  1. The limitations of the database that the data will be loaded onto
  2. The type of analysis that you plan to perform

Each system has specific limitations on the data types and data structures that it supports. If, for example, you want to push data into Google BigQuery, you can send nested data like JSON directly. With a tabular data store like PostgreSQL, it is usually more practical to flatten your data into rows and columns before loading it into the database.

Also, you have to choose the right data types. Again, depending on the system that you will send the data to and the data types that the API exposes to you, you will have to make the right choices. These choices are important because they can limit the expressivity of your queries and limit your analysts on what they can do directly out of the database.

Google Search Console data is modeled around the concept of a report, just like Google Analytics but with a much more limited number of dimensions and metrics.

In the end, you will need to map one report to a table on your database and make sure that all the data is stored in it. Dimensions and metrics will become columns of the tables.
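As an illustration of mapping a report to table rows, the sketch below flattens a Search Analytics response. It assumes the response shape of the query endpoint, where each row carries a `keys` list (one value per requested dimension, in order) plus the `clicks`, `impressions`, `ctr`, and `position` metrics. The function name and sample values are made up.

```python
METRICS = ("clicks", "impressions", "ctr", "position")


def flatten_rows(response, dimensions):
    """Turn API rows into flat dicts with one entry per dimension and metric."""
    flat = []
    for row in response.get("rows", []):
        # keys[i] holds the value of the i-th requested dimension.
        record = dict(zip(dimensions, row["keys"]))
        for metric in METRICS:
            record[metric] = row[metric]
        flat.append(record)
    return flat


sample_response = {
    "rows": [
        {"keys": ["2024-01-01", "postgres copy"],
         "clicks": 3, "impressions": 40, "ctr": 0.075, "position": 4.2},
    ]
}
records = flatten_rows(sample_response, ["date", "query"])
```

Each resulting dict corresponds to one table row, with dimensions and metrics as columns.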

Take special care with the fact that the reports you get from Google Search Console do not come with primary keys assigned by Google. To avoid duplicates, you will have to define a key yourself, for example from the combination of dimension values that identifies a row.
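Since no key is supplied, one possible approach is to derive a deterministic key from the site and the dimension values of each row. This is a sketch of that idea, not a prescribed method; the function name `row_key` and the choice of SHA-256 are assumptions.

```python
import hashlib


def row_key(site_url, record, dimensions):
    """Derive a stable surrogate key from the site and the row's dimension values."""
    parts = [site_url] + [str(record[d]) for d in dimensions]
    # Join with a separator unlikely to appear in the values, so that
    # ("ab", "c") and ("a", "bc") do not produce the same key.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


key_a = row_key("https://www.example.com/",
                {"date": "2024-01-01", "query": "postgres copy", "clicks": 3},
                ["date", "query"])
key_b = row_key("https://www.example.com/",
                {"date": "2024-01-01", "query": "postgres copy", "clicks": 5},
                ["date", "query"])
```

Because metrics are left out of the hash, `key_a` and `key_b` are equal: a re-fetched row with updated metrics maps to the same key and can replace the older version. A composite primary key over the dimension columns achieves the same effect without hashing.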

For more information on how you can query your Search Analytics data, please refer to the Search Analytics query documentation in the Search Console API reference.

Each table is a collection of columns, each with a predefined data type such as INTEGER or VARCHAR. PostgreSQL, like any other SQL database, supports a wide range of data types.

A typical strategy for loading data from Google Search Console to a Postgres database is to create a schema where you will map each API endpoint to a table. Each key inside the API endpoint response should be mapped to a column of that table and you should ensure the right conversion to a Postgres compatible data type.

Load data from Google Search Console to PostgreSQL

For example, if an endpoint from Google Search Console returns a value as a string, you should convert it into a VARCHAR with a predefined maximum size or a TEXT data type. Tables can then be created on your database using the CREATE TABLE SQL statement.
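The type mapping described above could be captured in a small helper that emits a CREATE TABLE statement. Everything here is an assumption for illustration: the table name, the chosen column types, and the decision to use the dimensions as a composite primary key. Adjust the types to your own data.

```python
# Assumed mapping from Search Console fields to PostgreSQL types.
COLUMN_TYPES = {
    "date": "DATE",
    "query": "TEXT",
    "page": "TEXT",
    "country": "TEXT",
    "device": "VARCHAR(16)",
    "clicks": "INTEGER",
    "impressions": "INTEGER",
    "ctr": "DOUBLE PRECISION",
    "position": "DOUBLE PRECISION",
}


def create_table_sql(table, dimensions):
    """Build a CREATE TABLE statement with the dimensions as primary key."""
    columns = list(dimensions) + ["clicks", "impressions", "ctr", "position"]
    # Column names are double-quoted so keyword-like names are safe identifiers.
    lines = [f'    "{name}" {COLUMN_TYPES[name]} NOT NULL' for name in columns]
    key_columns = ", ".join(f'"{d}"' for d in dimensions)
    lines.append(f"    PRIMARY KEY ({key_columns})")
    # The table name is interpolated as-is; only pass trusted values.
    return f"CREATE TABLE IF NOT EXISTS {table} (\n" + ",\n".join(lines) + "\n);"


ddl = create_table_sql("gsc_search_analytics", ["date", "query"])
```

The resulting string can be executed once through any PostgreSQL client before the first load.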

Once you have defined your schema and you have created your tables with the proper data types, you can start loading data into your database.

The preferred way of adding larger datasets into a PostgreSQL database is the COPY command. COPY reads data from a file on a file system that is accessible to the Postgres instance, so much larger datasets can be inserted into the database in less time. Loading from such a file requires access to the server's file system.
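PostgreSQL's COPY can also read from the client connection through its FROM STDIN form, which is what client libraries typically use. The sketch below serializes rows to CSV in memory and hands them to a cursor through `copy_expert`, the method psycopg2 cursors provide for this; the table name, column list, and helper names are assumptions, and no database connection is opened here.

```python
import csv
import io

COLUMNS = ["date", "query", "clicks", "impressions", "ctr", "position"]


def rows_to_csv(rows, columns=COLUMNS):
    """Serialize dict rows into an in-memory CSV buffer, rewound for reading."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in columns])
    buffer.seek(0)
    return buffer


def copy_rows(cursor, table, rows, columns=COLUMNS):
    """Stream rows into `table` using COPY ... FROM STDIN.

    `cursor` is assumed to be a psycopg2-style cursor exposing copy_expert().
    """
    column_list = ", ".join(f'"{c}"' for c in columns)
    sql = f"COPY {table} ({column_list}) FROM STDIN WITH (FORMAT csv)"
    cursor.copy_expert(sql, rows_to_csv(rows, columns))
```

Using the csv module rather than manual string joins ensures that values containing commas or quotes are escaped correctly.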

Nowadays, with cloud-based, fully managed databases, getting direct access to a file system is not always possible. If this is the case and you cannot use a COPY statement, then another option is to use PREPARE together with INSERT, to end up with optimized and more performant INSERT queries.
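A rough sketch of the PREPARE plus INSERT approach follows. It assumes a DB-API cursor such as psycopg2's, which substitutes the `%s` placeholders on the client side; the statement name `gsc_insert`, the table, and its columns are hypothetical.

```python
# Parse and plan the INSERT once on the server.
PREPARE_SQL = (
    "PREPARE gsc_insert "
    "(date, text, integer, integer, double precision, double precision) AS "
    'INSERT INTO gsc_search_analytics '
    '("date", "query", "clicks", "impressions", "ctr", "position") '
    "VALUES ($1, $2, $3, $4, $5, $6)"
)
# Run the prepared statement; %s placeholders are filled in by the driver.
EXECUTE_SQL = "EXECUTE gsc_insert (%s, %s, %s, %s, %s, %s)"


def insert_prepared(cursor, rows):
    """Insert rows by executing one server-side prepared statement repeatedly."""
    cursor.execute(PREPARE_SQL)
    for row in rows:
        cursor.execute(EXECUTE_SQL, row)
    # Prepared statements live for the session; release this one when done.
    cursor.execute("DEALLOCATE gsc_insert")
```

The benefit is that the INSERT is parsed once rather than once per row; whether this is noticeably faster than batched multi-row inserts depends on the workload.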

Updating your Google Search Console data on PostgreSQL

As you will be generating more data on Google Search Console, you will need to update your older data on PostgreSQL. This includes new records together with updates to older records that for any reason have been updated on Google Search Console.

You will need to periodically check Search Console for new data and repeat the process that has been described previously while updating your currently available data if needed. Updating an already existing row on a PostgreSQL table is achieved by creating UPDATE statements.

Another issue that you need to take care of is the identification and removal of any duplicate records on your database. Either because Google Search Console does not have a mechanism to identify new and updated records or because of errors on your data pipelines, duplicate records might be introduced to your database.
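One common way to handle both updates and duplicates in a single step is an upsert: INSERT ... ON CONFLICT ... DO UPDATE, which PostgreSQL supports. The sketch below demonstrates the behavior with Python's built-in sqlite3 module purely as a stand-in so that it runs without a server (it assumes a SQLite build recent enough to support this clause). With a PostgreSQL driver such as psycopg2 the placeholders would be `%s` instead of `?`. The table and values are hypothetical.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO gsc_search_analytics (date, query, clicks, impressions)
VALUES (?, ?, ?, ?)
ON CONFLICT (date, query) DO UPDATE SET
    clicks = excluded.clicks,
    impressions = excluded.impressions
"""

conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
conn.execute("""
    CREATE TABLE gsc_search_analytics (
        date TEXT NOT NULL,
        query TEXT NOT NULL,
        clicks INTEGER NOT NULL,
        impressions INTEGER NOT NULL,
        PRIMARY KEY (date, query)
    )
""")

# First sync.
conn.executemany(UPSERT_SQL, [("2024-01-01", "postgres copy", 3, 40)])
# Second sync: one row re-fetched with revised metrics, one new row.
conn.executemany(UPSERT_SQL, [
    ("2024-01-01", "postgres copy", 5, 60),
    ("2024-01-02", "postgres copy", 1, 9),
])
conn.commit()

stored = conn.execute(
    "SELECT date, query, clicks, impressions "
    "FROM gsc_search_analytics ORDER BY date"
).fetchall()
```

After both syncs the table holds two rows, and the row for 2024-01-01 carries the revised metrics instead of appearing twice. The approach relies on the table having a primary key or unique constraint over the dimension columns.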

In general, ensuring the quality of the data that is inserted into your database is a big and difficult issue. PostgreSQL features like transactions can help tremendously, although they do not solve the problem in the general case.
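To illustrate how transactions help, the sketch below loads a batch so that either every row is stored or none is. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; PostgreSQL drivers such as psycopg2 expose the same `commit()` and `rollback()` methods on the connection, although statements there are executed through a cursor. The table and function names are hypothetical.

```python
import sqlite3


def load_batch(conn, rows):
    """Insert a batch atomically: commit on success, roll back on any error."""
    try:
        conn.executemany(
            "INSERT INTO gsc_daily (date, clicks) VALUES (?, ?)", rows
        )
        conn.commit()
    except Exception:
        # Undo every row of this batch, including those inserted before the error.
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
conn.execute(
    "CREATE TABLE gsc_daily (date TEXT PRIMARY KEY, clicks INTEGER NOT NULL)"
)
conn.commit()

load_batch(conn, [("2024-01-01", 3)])
try:
    # The second row violates the primary key, so the whole batch is discarded.
    load_batch(conn, [("2024-01-02", 4), ("2024-01-01", 9)])
except sqlite3.IntegrityError:
    pass

remaining = conn.execute("SELECT date, clicks FROM gsc_daily").fetchall()
```

Only the row from the first, successful batch remains, so a failed sync never leaves a partially loaded day behind. As noted above, this protects against partial writes but not against wrong data that loads cleanly.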

The best way to load data from Google Search Console to PostgreSQL