🗓️ Live Webinar August 17: How Allbirds solves identity resolution in the warehouse with dbt Labs, Snowflake, and RudderStack

Register Now
By Rudderstack Team

How to load data from Shopify to Google BigQuery

This post helps you with loading your data from Shopify to BigQuery. Suppose you are looking to get analytics-ready data without the manual hassle. In that case, you can integrate Shopify to BigQuery with RudderStack, so you can focus on what matters, getting value out of your shop data.

What we will see:

  • Extract your preferred data using Shopify and Shopify API (the hard way)
  • Prepare your Shopify Data for Google BigQuery
  • Data load from Shopify to BigQuery
  • The best way to load any data from Shopify to BigQuery (the easy way)

How to Extract my data from Shopify?

Shopify exposes its complete platform to developers through its API. It is used by thousands of developers to create applications that are then sold through the Shopify marketplace.

As a Web API following the RESTful architecture principles, it can be accessed through HTTP. As a RESTful API, interacting with it can be achieved by using tools like CURL or Postmanor by using HTTP clients for your favorite language or framework.

A few suggestions:

Shopify also offers a number of SDKs that are officially supported and maintained by them and that can be used to access their platform for different uses. For example, by using the iOS and Android Buy SDK, it is possible to add Shopify checkout capabilities to your mobile application. Other SDKs that are offered are:

Shopify API Authentication

There are two different types of applications in Shopify when it comes to authentication, private and public apps.

Private apps will only function on individual stores, so you don’t need to authenticate them through Oauth. You can get Private app credentials through your Partner dashboard or by logging into any Shopify admin. You can also use this API key to manipulate your store using the API console without the need for building a fully functional app.

The apps you create in your Partners dashboard function as public apps that can be made available for download in Shopify’s App Store if you meet the criteria.

Shopify rate limiting

The API call limit operates using a “leaky bucket” algorithm as a controller. This allows for infrequent bursts of calls and allows your app to continue to make an unlimited amount of calls over time.

The bucket size is 40 calls (which cannot be exceeded at any given time), with a “leak rate” of 2 calls per second that continually empties the bucket. If your app averages 2 calls per second, it will never trip a 429 error (“bucket overflow”).

Endpoints and available resources

Shopify exposes 35+ endpoints covering all the possible touchpoints of e-commerce. Some of the most important resources that can be accessed through these endpoints are the following:

  • Abandoned checkouts: used to return abandoned checkouts. A checkout is considered abandoned when a customer has entered their billing & shipping info but has yet to complete the purchase.
  • ApplicationCharge: Request to charge a shop a one-time fee by issuing this call.
  • Article: Operations concerning articles on your Blog.
  • Asset: files that make up the theme of a shop.
  • Blog: Shopify, in addition to the e-commerce features also offers an environment where the merchant can create a Blog for her shop.
  • CarrierService: A Carrier Service (also known as a Carrier Calculated Service or Shipping Service) provides real-time shipping rates to Shopify.
  • Collect: An object that connects a product to a custom collection.
  • Customer: A customer resource instance represents a customer account with the shop.
  • Event: Events are generated by specific Shopify resources when specific things happen, such as the creation of an article.
  • Order: An order is a customer’s completed request to purchase one or more products from a shop.
  • Product: A product is an individual item for sale in a Shopify shop.
  • Transaction: Transactions are created for every order that results in an exchange of money.

For a complete list of endpoints, you can see here.

It is clear that with such a rich platform and API, the data that can be pulled out of Shopify are both valuable and come in large quantities. So, let’s assume that we want to pull all events out of Shopify to use them for further analysis.

To do so, we need to make a request to the Event endpoint like this.

GET /admin/events.json?filter=Product,Order

This request will get us back to all the events that are related to Products and Orders for our shop. The response will be in JSON and will look like this:

HTTP/1.1 200 OK
{
"events": [
{
"id": 677313116,
"subject_id": 921728736,
"created_at": "2008-01-10T08:00:00-05:00",
"subject_type": "Product",
"verb": "create",
"arguments": [
"IPod Touch 8GB"
],
"body": null,
"message": "created a new product: <a href=\"\/admin\/products\/921728736\">IPod Touch 8GB<\/a>.",
"author": "Shopify",
"description": "created a new product: IPod Touch 8GB.",
"path": "\/admin\/products\/921728736"
},
{
"id": 365755215,
"subject_id": 632910392,
"created_at": "2008-01-10T07:00:00-05:00",
"subject_type": "Product",
"verb": "create",
"arguments": [
"IPod Nano - 8GB"
],
"body": null,
"message": "created a new product: <a href=\"\/admin\/products\/632910392\">IPod Nano - 8GB<\/a>.",
"author": "Shopify",
"description": "created a new product: IPod Nano - 8GB.",
"path": "\/admin\/products\/632910392"
}
]
}

Inside the response, there will be an array of objects with each one representing one Order or Product.

Events are generated for the following resources:

  • Articles
  • Blogs
  • Custom Collections
  • Comments
  • Orders
  • Pages
  • Products
  • Smart Collections

By using the “limit” and “page” parameters it is possible to page your results in the case that you need to pull out a large number of events. Additionally, there are a number of ways that you can filter the results for example based on a date, so in a continuous data extraction process, it is possible to pull only new data and avoid duplicates.

This can be achieved by using the “created_at_min” and “created_at_max” parameters in your GET request.

After you have successfully pulled data out of the Shopify API you are ready to extract and prepare them for BigQuery. Of course, the above process is only for one of the available resources. If you would like to have a complete view of all the available data, then you will have to create a much complex ETL process, including the majority of the 35+ resources that Shopify has.

What is Shopify?

Shopify is one of the easiest and simple to use managed e-commerce platforms. It offers an easy way for anyone to create an online store and start selling goods. It natively integrates with services like Stripe and Paypal to make things even easier for potential users of the platform. Shopify also offers a very rich app marketplace, where its users can find powerful extensions and plugins that can enhance their online shops.

They have a quite flexible pricing model that can cover the needs of very small to large online shops, you can start with the lite version, and as your business grows quite easily you can jump to a larger plan. Some of the benefits of Shopify are:

  • Easy to start. Both the experience the platform offers and the pricing model makes it extremely easy for anyone to set up and run an online shop.
  • No technical skills required. As a self-managed SaaS product, Shopify requires minimum technical skills from its users.
  • It is hosted. This means that you don’t have to care about where to host your online shop, they offer all the infrastructure and services needed without requiring from you to delve into technical details.
  • It can be personalized. Although a hosted platform, it can be easily customized to meet your brand.

Additionally, Shopify has succeeded in creating a unique ecosystem of developers who build added value services and extensions on their platform. A large number of plug-ins are available that help Shopify users to personalize even more their e-shops while developers can generate a good income by selling their applications and extensions. Of course, such a success was built on top a well-designed API that exposes the whole platform to the developers. This API will be also used in this article to show how we can pull out valuable data using Shopify.

How can I prepare my data to be sent from Shopify to Google BigQuery?

Before you load data into BigQuery, you should make sure that it is presented in a format supported by it, so for example, if the API you pull out data returns XML, you have first to transform it into a serialization that BigQuery understands. Currently, two data formats are supported:

  • CSV, and
  • JSON

You also need to make sure that the data types you are using are the ones supported by BigQuery, which are the following:

  • STRING
  • INTEGER
  • FLOAT
  • BOOLEAN
  • RECORD
  • TIMESTAMP

For more information, please check the Preparing Data for BigQuery page on the documentation.

What is Google BigQuery?

BigQueryis the data warehousing solution of Google. It’s part of the Google Cloud Platform and it also speaks SQL like Redshift does. Queries are executed against append-only tables using the processing power of Google’s infrastructure.

It is also fully managed and is offered as a service over the cloud. You can interact with it through its web UI, using a command-line tool while a variety of client libraries exist so you can interact with it through your application.

Data load from Shopify to Google BigQuery

If you want to load any data from Shopify to BigQuery, you have to use one of the following supported data sources.

  • Google Cloud Storage
  • Sent data directly to BigQuery with a POST request
  • Google Cloud Datastore Backup
  • Streaming insert
  • App Engine log files
  • Cloud Storage logs

From the above list of sources, 5 and 6 are not applicable in our case.

For Google Cloud Storage, you first must load the preferred data into it. There are a few options on how to do this; for example, you can use the console directly as it is described here and does not forget to follow the best practices.

Another option is to post them through the JSON API, as we see again APIs play an important role in both the extraction and the loading of data into our data warehouse. In its simplest case, it’s just a matter of one HTTP POST request using a tool like CURL or Postman. It should look like the following example.

POST /upload/storage/v1/b/myBucket/o?uploadType=media&name=myObject
HTTP/1.1 Host: www.googleapis.com
Content-Type: application/text
Content-Length: number_of_bytes_in_file
Authorization: Bearer your_auth_token yourShopify data

and if everything went ok, you should get something like the following as a response from the server:

HTTP/1.1 200 Content-Type: application/json { "name": "myObject" }

Working with Curl or Postman is good only for testing, if you would like to automate the process of loading your own data into Bigquery, you should write some code to send them to Cloud in Google Storage.

In case you are developing on the Google App Engine you can use the library that is available for the languages that are supported by it:

If you are using one of the above languages and you are not coding for the Google App Engine, you can use it to access the Storage in the Cloud from your environment. Interacting such a feature-rich product like Google Cloud Storage can become quite complicated depending on your use case, for more details on the different options that exist you can check Google Storage documentation.

If you are looking for a less engaged and more neutral way of using Cloud Storage, you can consider a solution like RudderStack.

After you have loaded any data into Google Cloud Storage, you have to create a Load Job for BigQuery to actually load the data into it, this Job should point to the source data in Cloud Storage that have to be imported. This happens by providing source URIs that point to the appropriate objects.

The previous method described used a POST request to the Google Cloud Storage API for storing the data there and then load it into BigQuery. Another way is to do a direct HTTP POST request to BigQuery with the data you would like to query. This approach is similar to how we loaded the data to Storage in the Cloud through the JSON API, but it uses the appropriate end-points of BigQuery for loading the data there directly.

The way to interact with it is quite similar, for more information can be found on the BigQuery API Reference and on the page that describes how can you load data you own into BigQuery using POST. You can interact with it using the HTTP client library of the language or framework of your choice, a few options are:

The best way to load data from Shopify to Google BigQuery

So far, we just scraped the surface of what you can do with BigQuery and how can you load every data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting, and maintaining a flexible data infrastructure use RudderStack that can handle everything automatically for you.

RudderStack, with one click, integrates with sources or services, creates analytics-ready data, and syncs your Shopify to BigQuery right away.

Get Started Image

Get started today

Start building smarter customer data pipelines today with RudderStack. Our solutions engineering team is here to help.

Customer Data Platform for Developers | RudderStack
HIPPA Compliant
SOC 2 TYPE 2