How to load data from Shopify to SQL Data Warehouse

This post will help you define a process or pipeline for getting your e-commerce related data from Shopify and load it into SQL Data Warehouse for further analysis. We will see how to access and extract your data from Shopify through its API and how to load it into SQL Data Warehouse. This process requires you to write the code to get the data and ensure that this process will run every time new data is generated. Alternatively, you can use products like RudderStack that can handle this kind of problem automatically for you.

What is Shopify?

Shopify is one of the easiest and simplest to use managed e-commerce platforms. It offers an easy way for anyone to create an online store and start selling goods. It natively integrates with services like Stripe and Paypal to make things even easier for potential users of the platform. Shopify also offers a very rich app marketplace, where its users can find powerful extensions and plugins that can enhance their online shops. They have a quite flexible pricing model that can cover the needs of very small to large online shops, you can start with the lite version and as your business grows quite easily, you can jump to a larger plan. Some of the benefits of Shopify are:

  • Easy to start. Both the experience the platform offers and the pricing model makes it extremely easy for anyone to set up and run an online shop.
  • No technical skills required. As a self-managed SaaS product, Shopify requires minimum technical skills from its users.
  • It is hosted. This means that you don’t have to care about where to host your online shop. They offer all the infrastructure and services needed without requiring you to delve into technical details.
  • It can be personalized. Although a hosted platform, it can be easily customized to meet your brand.

Additionally, Shopify has succeeded in creating a unique ecosystem of developers who build added value services and extensions on their platform. Of course, such a success was built on top of a well-designed API that exposes the whole platform to the developers. A large number of plug-ins are available that help Shopify users personalize even more their e-shops. At the same time, developers can generate a good income by selling their applications and extensions. This API will also be used in this article to show how we can pull out valuable data from Shopify.

What is Microsoft Azure SQL Data Warehouse?

SQL Data Warehouse is the data warehousing solution that you can use if you are a user of Microsoft Azure. It’s an elastic data warehousing as a service solution, emphasizing its enterprise focus. It also speaks SQL like the previous two solutions, and it supports querying both relational and non-relational data.

It offers a number of enterprise-class features like support for hybrid cloud installations and strong security. It’s probably the less mature solution compared to the two others, though, and it’s still in “Preview” mode although accessible to existing Azure subscribers.

How to Extract my data from Shopify?

Shopify exposes its complete platform to developers through its API. Thousands of developers use it to create applications that are then sold through the Shopify marketplace.

As a Web API following the RESTful architecture principles, it can be accessed through HTTP. As a RESTful API, interacting with it can be achieved by using tools like CURL or Postman or by using http clients for your favorite language or framework. A few suggestions:

Shopify also offers a number of SDKs that are officially supported and maintained by them and that can be used to access their platform for different uses. For example, by using the iOS and Android Buy SDK it is possible to add Shopify checkout capabilities to your mobile application. Other SDKs that are offered are:

Shopify API Authentication

There are two different types of applications in Shopify regarding authentication, private and public apps.

Private apps will only function on individual stores, so you don’t need to authenticate them through Oauth. You can get Private app credentials through your Partner dashboard or by logging into any Shopify admin. You can also use this API key to manipulate your store using the API console without building a fully functional app.

The apps you create in your Partners dashboard function as public apps, which can be made available for download in Shopify’s App Store if you meet the criteria.

Shopify rate limiting

The API call limit operates using a “leaky bucket” algorithm as a controller. This allows for infrequent bursts of calls and allows your app to continue to make an unlimited amount of calls over time. The bucket size is 40 calls (which cannot be exceeded at any given time), with a “leak rate” of 2 calls per second that continually empties the bucket. If your app averages 2 calls per second, it will never trip a 429 error (“bucket overflow”).

Endpoints and available resources

Shopify exposes 35+ endpoints covering all the possible touchpoints of e-commerce. Some of the most important resources that can be accessed through these endpoints are the following:

  • Abandoned checkouts: used to return abandoned checkouts. A checkout is considered abandoned when a customer has entered their billing & shipping info but has yet to complete the purchase.
  • ApplicationCharge: Request to charge a shop a one-time fee by issuing this call.
  • Article: Operations concerning articles in your Blog.
  • Asset: files that make up the theme of a shop.
  • Blog: Shopify and the e-commerce features also offer an environment where the merchant can create a Blog for her shop.
  • CarrierService: A Carrier Service (also known as a Carrier Calculated Service or Shipping Service) provides real-time shipping rates to Shopify.
  • Collect: An object that connects a product to a custom collection.
  • Customer: A customer resource instance represents a customer account with the shop.
  • Event: Events are generated by specific Shopify resources when specific things happen, such as the creation of an article.
  • Order: An order is a customer’s completed request to purchase one or more products from a shop.
  • Product: A product is an individual item for sale in a Shopify shop.
  • Transaction: Transactions are created for every order that results in an exchange of money.

For a complete list of endpoints, you can see them here.

It is clear that with such a rich platform and API, the data that can be pulled out of Shopify are both valuable and come in large quantities. So, let’s assume that we want to pull all events out of Shopify to use them for further analysis. To do so, we need to request the Event endpoint like this.

JAVASCRIPT
GET /admin/events.json?filter=Product,Order

This request will get us back to

JAVASCRIPT
HTTP/1.1 200 OK
{
"events": [
{
"id": 677313116,
"subject_id": 921728736,
"created_at": "2008-01-10T08:00:00-05:00",
"subject_type": "Product",
"verb": "create",
"arguments": [
"IPod Touch 8GB"
],
"body": null,
"message": "created a new product: <a href=\"\/admin\/products\/921728736\">IPod Touch 8GB<\/a>.",
"author": "Shopify",
"description": "created a new product: IPod Touch 8GB.",
"path": "\/admin\/products\/921728736"
},
{
"id": 365755215,
"subject_id": 632910392,
"created_at": "2008-01-10T07:00:00-05:00",
"subject_type": "Product",
"verb": "create",
"arguments": [
"IPod Nano - 8GB"
],
"body": null,
"message": "created a new product: <a href=\"\/admin\/products\/632910392\">IPod Nano - 8GB<\/a>.",
"author": "Shopify",
"description": "created a new product: IPod Nano - 8GB.",
"path": "\/admin\/products\/632910392"
}
]
}

all the events that are related to Products and Orders for our shop. The response will be in JSON and will look like this:

Inside the response, there will be an array of objects with each one representing one Order or Product.

Events are generated for the following resources:

  • Articles
  • Blogs
  • Custom Collections
  • Comments
  • Orders
  • Pages
  • Products
  • Smart Collections

Using the “limit” and “page” parameters, it is possible to page your results if you need to pull out a large number of events. Additionally, there are a number of ways that you can filter the results, for example, based on a date, so in a continuous data extraction process, it is possible to pull only new data and avoid duplicates. This can be achieved by using the “created_at_min” and “created_at_max” parameters in your GET request.

After successfully pulling your data from the Shopify API, you are ready to extract and prepare them for SQL Data Warehouse. Of course, the above process is only for one of the available resources, and if you would like to have a complete view of all the available data, then you will have to create a much more complex ETL process, including the majority of the 35+ resources that Shopify has.

How can I Load my data from Shopify to SQL Data Warehouse?

SQL Data Warehouse support numerous options for loading data, such as:

  • PolyBase
  • Azure Data Factory
  • BCP command-line utility
  • SQL Server integration services

As we are interested in loading data from online services by using their exposed HTTP APIs, we will not consider the usage of BCP command-line utility or SQL server integration in this guide. We’ll consider the case of loading our data as Azure storage Blobs and then use PolyBase to load the data into SQL Data Warehouse.

Accessing these services happens through HTTP APIs, as we see again APIs play an important role in both the extraction but also the loading of data into our data warehouse. You can access these APIs by using a tool like CURL or Postman. Or use the libraries provided by Microsoft for your favorite language. Before you actually upload any data you have to create a container which is something similar to a concept to the Amazon AWS Bucket, creating a container is a straightforward operation and you can do it by following the instructions found on the Blog storage documentation from Microsoft. As an example, the following code can create a container in Node.js.

JAVASCRIPT
blobSvc.createContainerIfNotExists('mycontainer', function(error, result, response){
if(!error){
// Container exists and allows
// anonymous read access to blob
// content and metadata within this container
}
});

After the creation of the container you can start uploading data to it by using again the given SDK of your choice in a similar fashion:

JAVASCRIPT
blobSvc.createBlockBlobFromLocalFile('mycontainer', 'myblob', 'test.txt', function(error, result, response){
if(!error){
// file uploaded
}
});

When you are done putting your data into Azure Blobs you are ready to load it into SQL Data Warehouse using PolyBase. To do that you should follow the directions in the Load with PolyBase documentation. In summary, the required steps to do it, are the following:

  • create a database master key
  • create a database scoped credentials
  • create an external file format
  • create an external data source

PolyBase’s ability to transparently parallelize loads from Azure Blob Storage will make it the fastest tool for loading data. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping that data to a new table within SQL Data Warehouse.

Of course, you will need to establish a recurrent process that will extract any newly created data from your service, load them in the form of Azure Blobs and initiate the PolyBase process for importing the data again into SQL Data Warehouse. One way of doing this is by using the Azure Data Factory service. In case you would like to follow this path you can read some good documentation on how to move data to and from Azure SQL Warehouse using Azure Data Factory.



What is the best way to load data from Shopify to SQL Data Warehouse? Which are the possible alternatives?

So far, we just scraped the surface of what can be done with Microsoft Azure SQL Data Warehouse and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from, and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. Instead of writing, hosting, and maintaining a flexible data infrastructure, a possible alternative is to use a product like RudderStack that can automatically handle this kind of problem for you.

RudderStack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more.

Sign Up For Free And Start Sending Data

Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.

Get Started Image

Get started today

Start building smarter customer data pipelines today with RudderStack. Our solutions engineering team is here to help.