How to load data from Google Search Console to MS SQL Server
Access Your Data on Google Search Console
The first step in loading data from Search Console to any data warehouse solution is accessing it and starting to extract it.
You access Google Search Console data through the Search Console APIs. There are two APIs available:
1. Search Console API
2. URL Testing Tools API
We are interested in the first one, the Search Console API, which exposes the search data we want to extract.
As with every other Google product, you need to authorize yourself to access the API by implementing the OAuth 2.0 protocol. The API is web-based and follows a REST-like architecture, but Google also offers client libraries for popular languages such as Java and Python.
The things you have to keep in mind when dealing with any API, including the Google Search Console one, are:
Rate limits - Every API has rate limits that you have to respect.
Authentication - You authenticate with Google using OAuth 2.0.
Paging and dealing with big data - Platforms like Google tend to generate a lot of data. Pulling big volumes out of an API might be difficult, especially when considering and respecting any rate limits that the API has.
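To make the paging and rate-limit points concrete, here is a minimal sketch in Python. It assumes you already have a function that performs one authorized Search Analytics query and returns the rows for a given startRow and rowLimit (both are request parameters of that query). The names fetch_page, RateLimitError, and iter_all_rows are hypothetical and introduced only for illustration; how your client signals a rate limit will depend on the library you use.

```python
import time
from typing import Callable, Dict, Iterator, List


class RateLimitError(Exception):
    """Hypothetical error raised by fetch_page when the API reports a rate limit."""


def iter_all_rows(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 25000,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield every row by requesting consecutive pages until a short page arrives."""
    start_row = 0
    while True:
        for attempt in range(max_retries + 1):
            try:
                rows = fetch_page(start_row, page_size)
                break
            except RateLimitError:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
        yield from rows
        if len(rows) < page_size:
            return  # a short or empty page means there is nothing left
        start_row += page_size
```

Passing the sleep function as a parameter keeps the backoff easy to test, and yielding rows one page at a time means you never have to hold the full result set in memory.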
About Google Search Console
Google Search Console is a product offered by Google to web administrators. It allows you to submit sitemaps to Google, trigger your website’s indexing, and see statistics about what’s going on, like possible errors and speed-related problems.
Most importantly, Google Search Console offers a wealth of statistics about the search queries that lead users to click on a link and land on one of your pages. This information can help tremendously with search engine optimization and when you are serious about content marketing.
You need to keep the following in mind about Google Search Console:
1. You see only sampled data, and
2. You can get data for up to the last 90 days
So, it’s important to start collecting and storing your Google Search Console data as soon as possible and make sure that you sync all the available data.
Transform and Prepare Your Google Search Console Data
After you have accessed data on Search Console, you will have to transform it based on two main factors:
1. The limitations of the database that is going to be used
2. The type of analysis that you plan to perform
Each system has specific limitations on the data types and data structures that it supports. If you want to push data to Google BigQuery, you can send nested data like JSON directly. But when you are dealing with tabular data stores, like PostgreSQL, this is not an option. Instead, you will have to flatten the data before loading it into the database.
Also, you have to choose the right data types. Again, depending on the system you will send the data to and the data types that the API exposes to you, you will have to make the right choices. These choices are important because they can limit your queries’ expressivity and restrict what your analysts can do directly in the database.
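As an illustration of flattening, the sketch below assumes each row returned by a Search Analytics query carries its dimension values in a "keys" list, in the same order as the dimensions you requested, next to the clicks, impressions, ctr, and position metrics. The function name flatten_row is hypothetical.

```python
from typing import Dict, List


def flatten_row(row: Dict, dimensions: List[str]) -> Dict:
    """Turn {"keys": [...], "clicks": ...} into one flat dict, one entry per column."""
    # Pair each requested dimension name with the value at the same position.
    flat = dict(zip(dimensions, row["keys"]))
    # Copy the metrics next to the dimensions; a missing metric becomes None.
    for metric in ("clicks", "impressions", "ctr", "position"):
        flat[metric] = row.get(metric)
    return flat
```

Each flat dict then corresponds to one table row, with no nested structure left for the database to handle.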
Google Search Console data is modeled around the concept of a report, just like Google Analytics but with a much more limited number of dimensions and metrics.
In the end, you will need to map one report to a table on your database and make sure that all data is stored in it. Dimensions and metrics will become columns of the tables.
Take special care to avoid duplicates: the reports you get from Google Search Console do not come with primary keys assigned by Google, so you will have to define your own key for each row.
For more information on how you can query your Search Analytics data, see the Search Analytics query reference in the Google Search Console API documentation.
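One way to sketch the report-to-table mapping is to generate the CREATE TABLE statement from the dimensions you request, and to use those same dimensions as a composite primary key so that the same combination cannot be stored twice. The column types and lengths below are assumptions chosen for illustration, not values prescribed by Google or Microsoft, and build_create_table is a hypothetical helper.

```python
from typing import Dict, List

# Assumed column types; adjust lengths and precision to your own data.
# Lengths are kept short here because SQL Server limits the size of index keys.
DIMENSION_TYPES: Dict[str, str] = {
    "date": "DATE",
    "query": "NVARCHAR(200)",
    "page": "NVARCHAR(200)",
    "country": "CHAR(3)",
    "device": "VARCHAR(16)",
}
METRIC_TYPES: Dict[str, str] = {
    "clicks": "INT",
    "impressions": "INT",
    "ctr": "FLOAT",
    "position": "FLOAT",
}


def build_create_table(table: str, dimensions: List[str]) -> str:
    """Build a T-SQL CREATE TABLE statement with one column per dimension and metric."""
    cols = [f"[{d}] {DIMENSION_TYPES[d]} NOT NULL" for d in dimensions]
    cols += [f"[{m}] {t} NOT NULL" for m, t in METRIC_TYPES.items()]
    # The requested dimensions together identify a row, so they form the key.
    key = ", ".join(f"[{d}]" for d in dimensions)
    cols.append(f"CONSTRAINT [PK_{table}] PRIMARY KEY ({key})")
    return f"CREATE TABLE [{table}] (\n    " + ",\n    ".join(cols) + "\n);"
```

For example, build_create_table("gsc_report", ["date", "query"]) produces a table keyed on the date and query columns.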
Load Your Google Search Console Data Into Microsoft SQL Server
So, after you have managed to access data on Google Search Console and have figured out the structure that data will have in your database, you need to load the data into the database, in our case, into Microsoft SQL Server.
As a feature-rich and mature product, MS SQL Server offers a large and diverse set of methods for loading data into a database. One way of importing data into your database is by using the SQL Server Import and Export Wizard. Through its visual interface, you will be able to bulk load data from a number of supported data sources.
Another way of importing bulk data into SQL Server, both on Azure and on-premises, is by using the BCP utility. This is a command-line tool built specifically for bulk loading and unloading data with an MS SQL database.
Finally, you can use BULK INSERT SQL statements for compatibility reasons, especially if you are managing databases from different vendors.
Similarly, as with any other database, you can also use standard INSERT statements, where you add data row-by-row directly to a table. It is the most basic and straightforward way of adding data to a table, but it doesn’t scale very well with larger datasets.
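A minimal sketch of the INSERT approach in Python follows. It assumes a DB-API connection whose driver uses "?" placeholders, which is the style pyodbc uses for SQL Server. The function name insert_rows is hypothetical, and the table and column names are interpolated into the SQL text, so they must come from your own code and never from untrusted input.

```python
from typing import Dict, List, Sequence


def insert_rows(conn, table: str, columns: List[str], rows: Sequence[Dict]) -> int:
    """Insert rows with a parameterized INSERT; returns the number of rows sent."""
    params = [tuple(row[c] for c in columns) for row in rows]
    if not params:
        return 0  # nothing to send; some drivers reject an empty batch
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    cur = conn.cursor()
    cur.executemany(sql, params)  # values are bound as parameters, not concatenated
    conn.commit()
    return len(params)
```

Sending the rows in one executemany call keeps the code simple, but it is still row-oriented, so for large volumes the bulk methods described above are usually the better fit.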
Updating Your Google Search Console Data on MS SQL Server
As you generate more data on Google Search Console, you will need to update your older data in the MS SQL Server database. This includes new records, together with updates to older records that for any reason have been updated on Google Search Console.
You will need to periodically check Google Search Console for new data and repeat the process described previously while updating your currently available data if required. Updating an already existing row on a SQL Server table is achieved by creating UPDATE statements.
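The update step can be sketched as "try an UPDATE first, and INSERT only when no row matched". The example assumes a DB-API connection with "?" placeholders (as pyodbc provides for SQL Server) and a cursor that reports the number of affected rows in rowcount. The name upsert_row is hypothetical, and table and column names must come from trusted code.

```python
from typing import Dict, List


def upsert_row(conn, table: str, key_cols: List[str], metric_cols: List[str], row: Dict) -> str:
    """Update the row matching the key columns; insert it when nothing matched."""
    set_clause = ", ".join(f"{c} = ?" for c in metric_cols)
    where_clause = " AND ".join(f"{c} = ?" for c in key_cols)
    cur = conn.cursor()
    cur.execute(
        f"UPDATE {table} SET {set_clause} WHERE {where_clause}",
        [row[c] for c in metric_cols] + [row[c] for c in key_cols],
    )
    if cur.rowcount > 0:
        return "updated"
    # No existing row for this key, so add a new one.
    all_cols = key_cols + metric_cols
    col_list = ", ".join(all_cols)
    placeholders = ", ".join("?" for _ in all_cols)
    cur.execute(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        [row[c] for c in all_cols],
    )
    return "inserted"
```

The function leaves the commit to the caller, so several rows can be applied as one unit. SQL Server also provides a MERGE statement that can express the same logic in a single statement.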
Another issue that you need to take care of is identifying and removing any duplicate records on your database. Either because Google Search Console does not have a mechanism to identify new and updated records or because of errors on any data pipelines, duplicate records might be introduced to your database.
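One simple way to keep duplicates out is to deduplicate each batch in memory before it reaches the database. The sketch below assumes rows are flat dicts and that a set of columns (for example the report dimensions) identifies a record. The name dedupe_rows is hypothetical, and letting the last occurrence win is a design choice, not something Google Search Console dictates.

```python
from typing import Dict, Iterable, List


def dedupe_rows(rows: Iterable[Dict], key_cols: List[str]) -> List[Dict]:
    """Keep one row per key; when a key repeats, the last row seen wins."""
    latest: Dict[tuple, Dict] = {}
    for row in rows:
        # Rows sharing the same key values overwrite each other here.
        latest[tuple(row[c] for c in key_cols)] = row
    return list(latest.values())
```

This only covers duplicates inside one batch; duplicates across batches still need a key constraint or an update-or-insert step in the database.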
In general, ensuring the quality of data inserted in your database is a big and challenging issue, and MS SQL Server features like TRANSACTIONS can help tremendously. However, they do not solve the problem in the general case.
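To show how transactions help, here is a small sketch of a context manager that commits a batch only if every statement in it succeeds and rolls everything back otherwise. It assumes a DB-API connection that is not in autocommit mode (the default for pyodbc connections); the name transaction is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit when the block finishes cleanly; roll back if it raises."""
    try:
        yield conn.cursor()
    except Exception:
        conn.rollback()  # undo every statement issued inside the block
        raise
    else:
        conn.commit()
```

You would wrap a whole load in "with transaction(conn) as cur:" and run your INSERT and UPDATE statements on cur. A failed batch then leaves the table exactly as it was, although, as noted above, this does not by itself guarantee that the data you loaded is correct.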
The best way to load data from Google Search Console to MS SQL Server
So far, we have just scratched the surface of what you can do with MS SQL Server and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.
Are you striving to achieve results right now?
Instead of writing, hosting, and maintaining a flexible data infrastructure, RudderStack can handle everything automatically for you.
RudderStack, with one click, integrates with sources or services, creates analytics-ready data, and syncs your Search Console to MS SQL Server right away.