🗓️ Live Webinar August 17: How Allbirds solves identity resolution in the warehouse with dbt Labs, Snowflake, and RudderStack

Register Now
By Rudderstack Team

How to load data from Magento to SQL Data Warehouse

How may I load data from Magento to SQL Data Warehouse? The purpose of this guide is to help you define a process or pipeline, for getting your data from Magento and loading it into SQL Data Warehouse for further analysis. Information will be given on how to access and extract your data from Magento through its API and how to load it into SQL Data Warehouse. This process requires you to write the code to get the data and make sure that this process will run every time new data is generated. Alternatively, you can use products like RudderStack that can handle this kind of problem automatically for you.

About Magento

Magento is an e-commerce platform built on open source technology which provides online merchants with a flexible shopping cart system, as well as control over the look, content and functionality of their online store. Magento offers powerful marketing, search engine optimization, and catalog-management tools. Some of its main characteristics are the following:

  • Feature rich. Magento is very rich in functionality and offers an in-depth and powerful platform.
  • Powerful SEO. Magento is well known for its SEO capabilities, it offers one of the, out-of-the-box, SEO optimisation for the stores hosted on it.
  • Magento, ensures that your store can seamlessly grow along with your business.
  • Flexibility. Its template based architecture allows you to pretty customise everything.
  • Magento community edition is open source and free to use.
  • Security. Magento is built with security in its core.
  • User friendly. The administration area exposes a simple back end with intuitive navigation and well organised store management features.
  • Community. Being open source, guarantees a healthy ecosystem to support and further develop the platform.

Magento offers three different versions of its platform:

  • Community edition. It’s the open source and free version of Magento.
  • Enterprise edition. Magento Enterprise Edition is designed to empower merchants to rapidly innovate and deliver engaging experiences to customers across all channels and devices.
  • Enterprise cloud edition. The Enterprise Edition of the platform as a service delivered over the cloud.

About Microsoft Azure SQL Data Warehouse

SQL Data Warehouse is the data warehousing solution that you can use if you are a user of Microsoft Azure. It’s an elastic data warehousing as a service solution, emphasizing its enterprise focus. It also speaks SQL like the previous two solutions and it supports querying both relational and non-relational data. It offers a number of enterprise-class features like support for hybrid cloud installations and strong security. It’s probably the less mature solution compared to the two others though, it’s still in “Preview” mode although accessible to existing Azure subscribers.

Extract your data from Magento

Magento exposes its platform through both a REST and a SOAP interface. Both can be used to pull data from it, which is also the scope of this article, but also to interact with the platform. By using these interfaces, developers create rich applications and plugins for Magento. In this post we will use the REST version of the Magento platform.

As a Web API following the RESTful architecture principles, it can be accessed through HTTP. As a RESTful API, interacting with it can be achieved by using tools like CURL or Postman or by using http clients for your favourite language or framework. A few suggestions:

Magento does not publish official SDKs but by using the SOAP interface it is possible to automatically generate clients that can act as SDKs for your favourite language or platform. For example in Java, you can create a client in Eclipse by providing the WSDL file that Magento exposes after you set up the platform.

Magento API Authentication

Magento is a self hosted platform, unless you are using the Enterprise cloud edition, so you have much more control over its access than other solutions, but if you want to access its data through the REST API that it has, then you will have to use oAuth for authentication which is supported by Magento.

Magento rate limiting

As a platform hosted on your own premises it doesn’t really imposes any rate limiting. In any case as you would like to avoid stressing your e-commerce platform that is facing your customers you should make sure that your pipeline process does not over stress your Magento installation. But this is completely at your discretion.

Endpoints and available resources

Magento exposes the following resources:

  • Products. Retrieve the list of products, create, update, delete a product.
  • Product categories. Retrieve the list of categories assigned to a product, assign and unassign the category from a product.
  • Product images. Retrieve the list of websites assigned to a product, assign, unassign a website to/from a product.
  • Customers. Retrieve the list of customers, create, delete a customer, and update the customer information.
  • Customer Addresses. Retrieve the list of customer addresses, create, update, and delete the customer address.
  • Inventory. Retrieve the list of stock items, update required stock items.
  • Sales Orders. Retrieve the list of sales orders with detailed information on order addresses, items, and comments.
  • Order Addresses. Retrieve information on the specified order comments.
  • Order Items. Retrieve information on specified order items.

The API is possible to return either JSON or XML responses, this is something that you can control by providing the appropriate Accept header content type.

For all the above resources we can request from the Magento platform to pull out a list of results with all the associated data, so ideally we would like to pull all the data and make sure that we keep them up to date on our analytics platform of choice for further analysis. For this post we will just see how we can pull data for one resource, the Sales Orders, the process is the same for all other resources.

To pull data for the Sales Orders, we need to execute a get request to the following endpoint:

http://magentohost/api/rest/orders

As a platform hosted by you, you need to replace the “magentohost” part of the URL with the actual URL of the host that has Magento running. The rest of the URL is the same as the above. The default response is in XML and looks like the following:

<?xml version="1.0"?>
<magento_api>
<data_item_1>
<customer_id>3</customer_id>
<base_discount_amount>0.0000</base_discount_amount>
<base_shipping_amount>455.0000</base_shipping_amount>
<base_shipping_tax_amount>0.0000</base_shipping_tax_amount>
<base_subtotal>13650.0000</base_subtotal>
<base_tax_amount>0.0000</base_tax_amount>
<base_total_paid></base_total_paid>
<base_total_refunded></base_total_refunded>
<tax_amount>0.0000</tax_amount>
<total_paid></total_paid>
<total_refunded></total_refunded>
<base_shipping_discount_amount>0.0000</base_shipping_discount_amount>
<base_subtotal_incl_tax>13650.0000</base_subtotal_incl_tax>
<base_total_due>14105.0000</base_total_due>
<total_due>14105.0000</total_due>
<base_currency_code>USD</base_currency_code>
<tax_name></tax_name>
<tax_rate></tax_rate>
<addresses>
<data_item>
<region>Palau</region>
<postcode>19103</postcode>
<lastname>Doe</lastname>
<street>2356 Jody Road Philadelphia, PA 19103</street>
<city>PA</city>
<telephone>610-634-1181</telephone>
<country_id>US</country_id>
<firstname>John</firstname>
<address_type>billing</address_type>
<prefix></prefix>
<middlename></middlename>
<suffix></suffix>
<company></company>
</data_item>
<data_item>
<region>Palau</region>
<postcode>19103</postcode>
<lastname>Doe</lastname>
<street>2356 Jody Road Philadelphia, PA 19103</street>
<city>PA</city>
<telephone>610-634-1181</telephone>
<country_id>US</country_id>
<firstname>John</firstname>
<address_type>shipping</address_type>
<prefix></prefix>
<middlename></middlename>
<suffix></suffix>
<company></company>
</data_item>
</addresses>
<order_items>
<data_item>
<sku>Sunglasses_1</sku>
<price>150.0000</price>
<base_price>150.0000</base_price>
<base_original_price>150.0000</base_original_price>
<tax_percent>0.0000</tax_percent>
<tax_amount>0.0000</tax_amount>
<base_tax_amount>0.0000</base_tax_amount>
<base_discount_amount>0.0000</base_discount_amount>
<base_row_total>13650.0000</base_row_total>
<base_price_incl_tax>150.0000</base_price_incl_tax>
<base_row_total_incl_tax>13650.0000</base_row_total_incl_tax>
</data_item>
</order_items>
</data_item_1>
<data_item_2>
<customer_id>3</customer_id>
<base_discount_amount>0.0000</base_discount_amount>
<base_shipping_amount>95.0000</base_shipping_amount>
<base_shipping_tax_amount>0.0000</base_shipping_tax_amount>
<base_subtotal>3350.0000</base_subtotal>
<base_tax_amount>0.0000</base_tax_amount>
<base_total_paid>2445.0000</base_total_paid>
<base_total_refunded>1845.0000</base_total_refunded>
<tax_amount>0.0000</tax_amount>
<total_paid>2445.0000</total_paid>
<total_refunded>1845.0000</total_refunded>
<base_shipping_discount_amount>0.0000</base_shipping_discount_amount>
<base_subtotal_incl_tax>3350.0000</base_subtotal_incl_tax>
<base_total_due>1000.0000</base_total_due>
<total_due>1000.0000</total_due>
<base_currency_code>USD</base_currency_code>
<tax_name></tax_name>
<tax_rate></tax_rate>
<addresses>
<data_item>
<region>Palau</region>
<postcode>19103</postcode>
<lastname>Doe</lastname>
<street>2356 Jody Road Philadelphia, PA 19103</street>
<city>PA</city>
<telephone>610-634-1181</telephone>
<country_id>US</country_id>
<firstname>John</firstname>
<address_type>billing</address_type>
<prefix></prefix>
<middlename></middlename>
<suffix></suffix>
<company></company>
</data_item>
<data_item>
<region>Palau</region>
<postcode>19103</postcode>
<lastname>Doe</lastname>
<street>2356 Jody Road Philadelphia, PA 19103</street>
<city>PA</city>
<telephone>610-634-1181</telephone>
<country_id>US</country_id>
<firstname>John</firstname>
<address_type>shipping</address_type>
<prefix></prefix>
<middlename></middlename>
<suffix></suffix>
<company></company>
</data_item>
</addresses>
<order_items>
<data_item>
<sku>Sunglasses_1</sku>
<price>150.0000</price>
<base_price>150.0000</base_price>
<base_original_price>150.0000</base_original_price>
<tax_percent>0.0000</tax_percent>
<tax_amount>0.0000</tax_amount>
<base_tax_amount>0.0000</base_tax_amount>
<base_discount_amount>0.0000</base_discount_amount>
<base_row_total>1350.0000</base_row_total>
<base_price_incl_tax>150.0000</base_price_incl_tax>
<base_row_total_incl_tax>1350.0000</base_row_total_incl_tax>
</data_item>
<data_item>
<sku>Sun_glasses</sku>
<price>200.0000</price>
<base_price>200.0000</base_price>
<base_original_price>200.0000</base_original_price>
<tax_percent>0.0000</tax_percent>
<tax_amount>0.0000</tax_amount>
<base_tax_amount>0.0000</base_tax_amount>
<base_discount_amount>0.0000</base_discount_amount>
<base_row_total>2000.0000</base_row_total>
<base_price_incl_tax>200.0000</base_price_incl_tax>
<base_row_total_incl_tax>2000.0000</base_row_total_incl_tax>
</data_item>
</order_items>
</data_item_2>
</magento_api>

As we can see, we get back a list of items with each one representing an order, that contains all the information that we would like to use for further analysis. information like the discount that we might have applied, the taxes paid, the base price of the order, etc. As we might have many order objects to retrieve, we should paginate through the results. To do that, we need to provide the “page” and “limit” parameters to our GET request.
Now that we have the results from our Magento shop, we can further process them before we are able to load them into the BI platform of our choice.

Load Data from Magento to SQL Data Warehouse

SQL Data Warehouse support numerous options for loading data, such as:

  1. PolyBase
  2. Azure Data Factory
  3. BCP command-line utility
  4. SQL Server integration services

As we are interested in loading data from online services by using their exposed HTTP APIs, we are not going to consider the usage of BCP command-line utility or SQL server integration in this guide. We’ll consider the case of loading our data as Azure storage Blobs and then use PolyBase to load the data into SQL Data Warehouse.

Accessing these services happens through HTTP APIs, as we see again APIs play an important role in both the extraction but also the loading of data into our data warehouse. You can access these APIs by using a tool like CURL or Postman. Or use the libraries provided by Microsoft for your favorite language. Before you actually upload any data you have to create a container which is something similar as a concept to the Amazon AWS Bucket, creating a container is a straightforward operation and you can do it by following the instructions found on the Blog storage documentation from Microsoft. As an example, the following code can create a container in Node.js.

blobSvc.createContainerIfNotExists('mycontainer', function(error, result, response){
if(!error){
// Container exists and allows
// anonymous read access to blob
// content and metadata within this container
}
});

After the creation of the container you can start uploading data to it by using again the given SDK of your choice in a similar fashion:

blobSvc.createBlockBlobFromLocalFile('mycontainer', 'myblob', 'test.txt', function(error, result, response){
if(!error){
// file uploaded
}
});

When you are done putting your data into Azure Blobs you are ready to load it into SQL Data Warehouse using PolyBase. To do that you should follow the directions in the Load with PolyBase documentation. To summarize, the required steps to do it, are the following:

  • create a database master key
  • create a database scoped credentials
  • create an external file format
  • create an external data source

PolyBase’s ability to transparently parallelize loads from Azure Blob Storage will make it the fastest tool for loading data. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping that data to a new table within SQL Data Warehouse.

Of course, you will need to establish a recurrent process that will extract any newly created data from your service, load them in the form of Azure Blobs and initiate the PolyBase process for importing the data again into SQL Data Warehouse. One way of doing this is by using the Azure Data Factory service. In case you would like to follow this path you can read some good documentation on how to move data to and from Azure SQL Warehouse using Azure Data Factory.

The best way to load data from Magento to SQL Data Warehouse and possible alternatives

So far we just scraped the surface of what can be done with Microsoft Azure SQL Data Warehouse and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from, and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. A possible alternative, instead of writing, hosting and maintaining a flexible data infrastructure, is to use a product like RudderStack that can handle this kind of problems automatically for you.

RudderStack integrates with multiple sources or services like databases, CRM, email campaigns, analytics and more.

Get Started Image

Get started today

Start building smarter customer data pipelines today with RudderStack. Our solutions engineering team is here to help.

Customer Data Platform for Developers | RudderStack
HIPPA Compliant
SOC 2 TYPE 2