How to load data from Magento to MS SQL Server

Magento is one of the most popular e-commerce platforms. How can you combine your Magento data with other sources to gain new insights? Let’s see how to get the data we have on Magento to an MS SQL Server Analytics data warehouse.

This post will help you define a process or pipeline, for getting your e-commerce related data from Magento and load it into SQL Server for further analysis. We will see how to access and extract your data from Magento through its API and how to load it into a SQL Server database.

This process requires you to write the code to get the data and make sure that this process will run every time new data are generated. Alternatively, you may check out RudderStack to load your data from Magento to MS SQL Server in minutes for you.

Extract your data from Magento

Magento exposes its platform through both a REST and a SOAP interface. Both can be used to pull data from it, which is also the scope of this article but also to interact with the platform. By using these interfaces, developers create rich applications and plugins for Magento. In this post, we will use the REST version of the Magento platform.

As a Web API following the RESTful architecture principles, it can be accessed through HTTP. As a RESTful API, interacting with it can be achieved by using tools like CURL or Postman or by using HTTP clients for your favorite language or framework. A few suggestions:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http-client

Magento does not publish official SDKs but by using the SOAP interface it is possible to automatically generate clients that can act as SDKs for your favorite language or platform. For example in Java, you can create a client in Eclipse by providing the WSDL file that Magento exposes after you set up the platform.

Magento API Authentication

Magento is a self-hosted platform unless you are using the Enterprise cloud edition, so you have much more control over its access than other solutions, but if you want to access its data through the REST API that it has, then you will have to use oAuth for authentication which is supported by Magento.

Magento Rate Limiting

As a platform hosted on your own premises it doesn’t really impose any rate-limiting. In any case, as you would like to avoid stressing your e-commerce platform that is facing your customers you should make sure that your pipeline process does not overstress your Magento installation. But this is completely at your discretion.

Endpoints and Available Resources

Magento exposes the following resources:

  1. Products. Retrieve the list of products, create, update, delete a product.
  2. Product categories. Retrieve the list of categories assigned to a product, assign and unassign the category from a product.
  3. Product images. Retrieve the list of websites assigned to a product, assign, unassign a website to/from a product.
  4. Customers. Retrieve the list of customers, create, delete a customer, and update the customer information.
  5. Customer Addresses. Retrieve the list of customer addresses, create, update, and delete the customer address.
  6. Inventory. Retrieve the list of stock items, update required stock items.
  7. Sales Orders. Retrieve the list of sales orders with detailed information on order addresses, items, and comments.
  8. Order Addresses. Retrieve information on the specified order comments.
  9. Order Items. Retrieve information on specified order items.

The API is possible to return either JSON or XML responses, this is something that you can control by providing the appropriate Accept header content type.

For all the above resources we can request from the Magento platform to pull out a list of results with all the associated data, so ideally we would like to pull all the data and make sure that we keep them up to date on our analytics platform of choice for further analysis. For this post we will just see how we can pull data for one resource, the Sales Orders, the process is the same for all other resources.

To pull data for the Sales Orders, we need to execute a get request to the following endpoint:

TEXT
http://magentohost/api/rest/orders

As a platform hosted by you, you need to replace the “magentohost” part of the URL with the actual URL of the host that has Magento running. The rest of the URL is the same as the above. The default response is in XML and looks like the following:

MARKDOWN
<?xml version="1.0"?>
<magento_api>
<data_item_1>
<customer_id>3</customer_id>
<base_discount_amount>0.0000</base_discount_amount>
<base_shipping_amount>455.0000</base_shipping_amount>
<base_shipping_tax_amount>0.0000</base_shipping_tax_amount>
<base_subtotal>13650.0000</base_subtotal>
<base_tax_amount>0.0000</base_tax_amount>
<base_total_paid></base_total_paid>
<base_total_refunded></base_total_refunded>
<tax_amount>0.0000</tax_amount>
<total_paid></total_paid>
<total_refunded></total_refunded>
<base_shipping_discount_amount>0.0000</base_shipping_discount_amount>
<base_subtotal_incl_tax>13650.0000</base_subtotal_incl_tax>
<base_total_due>14105.0000</base_total_due>
<total_due>14105.0000</total_due>
<base_currency_code>USD</base_currency_code>
<tax_name></tax_name>
<tax_rate></tax_rate>
<addresses>
<data_item>
<region>Palau</region>
<postcode>19103</postcode>
<lastname>Doe</lastname>
<street>2356 Jody Road Philadelphia, PA 19103</street>
<city>PA</city>
<telephone>610-634-1181</telephone>
<country_id>US</country_id>
<firstname>John</firstname>
<address_type>billing</address_type>
<prefix></prefix>
<middlename></middlename>
<suffix></suffix>
<company></company>
</data_item>
<data_item>
<region>Palau</region>
<postcode>19103</postcode>
<lastname>Doe</lastname>
<street>2356 Jody Road Philadelphia, PA 19103</street>
<city>PA</city>
<telephone>610-634-1181</telephone>
<country_id>US</country_id>
<firstname>John</firstname>
<address_type>shipping</address_type>
<prefix></prefix>
<middlename></middlename>
<suffix></suffix>
<company></company>
</data_item>
</addresses>
<order_items>
<data_item>
<sku>Sunglasses_1</sku>
<price>150.0000</price>
<base_price>150.0000</base_price>
<base_original_price>150.0000</base_original_price>
<tax_percent>0.0000</tax_percent>
<tax_amount>0.0000</tax_amount>
<base_tax_amount>0.0000</base_tax_amount>
<base_discount_amount>0.0000</base_discount_amount>
<base_row_total>13650.0000</base_row_total>
<base_price_incl_tax>150.0000</base_price_incl_tax>
<base_row_total_incl_tax>13650.0000</base_row_total_incl_tax>
</data_item>
</order_items>
</data_item_1>
<data_item_2>
<customer_id>3</customer_id>
<base_discount_amount>0.0000</base_discount_amount>
<base_shipping_amount>95.0000</base_shipping_amount>
<base_shipping_tax_amount>0.0000</base_shipping_tax_amount>
<base_subtotal>3350.0000</base_subtotal>
<base_tax_amount>0.0000</base_tax_amount>
<base_total_paid>2445.0000</base_total_paid>
<base_total_refunded>1845.0000</base_total_refunded>
<tax_amount>0.0000</tax_amount>
<total_paid>2445.0000</total_paid>
<total_refunded>1845.0000</total_refunded>
<base_shipping_discount_amount>0.0000</base_shipping_discount_amount>
<base_subtotal_incl_tax>3350.0000</base_subtotal_incl_tax>
<base_total_due>1000.0000</base_total_due>
<total_due>1000.0000</total_due>
<base_currency_code>USD</base_currency_code>
<tax_name></tax_name>
<tax_rate></tax_rate>
<addresses>
<data_item>
<region>Palau</region>
<postcode>19103</postcode>
<lastname>Doe</lastname>
<street>2356 Jody Road Philadelphia, PA 19103</street>
<city>PA</city>
<telephone>610-634-1181</telephone>
<country_id>US</country_id>
<firstname>John</firstname>
<address_type>billing</address_type>
<prefix></prefix>
<middlename></middlename>
<suffix></suffix>
<company></company>
</data_item>
<data_item>
<region>Palau</region>
<postcode>19103</postcode>
<lastname>Doe</lastname>
<street>2356 Jody Road Philadelphia, PA 19103</street>
<city>PA</city>
<telephone>610-634-1181</telephone>
<country_id>US</country_id>
<firstname>John</firstname>
<address_type>shipping</address_type>
<prefix></prefix>
<middlename></middlename>
<suffix></suffix>
<company></company>
</data_item>
</addresses>
<order_items>
<data_item>
<sku>Sunglasses_1</sku>
<price>150.0000</price>
<base_price>150.0000</base_price>
<base_original_price>150.0000</base_original_price>
<tax_percent>0.0000</tax_percent>
<tax_amount>0.0000</tax_amount>
<base_tax_amount>0.0000</base_tax_amount>
<base_discount_amount>0.0000</base_discount_amount>
<base_row_total>1350.0000</base_row_total>
<base_price_incl_tax>150.0000</base_price_incl_tax>
<base_row_total_incl_tax>1350.0000</base_row_total_incl_tax>
</data_item>
<data_item>
<sku>Sun_glasses</sku>
<price>200.0000</price>
<base_price>200.0000</base_price>
<base_original_price>200.0000</base_original_price>
<tax_percent>0.0000</tax_percent>
<tax_amount>0.0000</tax_amount>
<base_tax_amount>0.0000</base_tax_amount>
<base_discount_amount>0.0000</base_discount_amount>
<base_row_total>2000.0000</base_row_total>
<base_price_incl_tax>200.0000</base_price_incl_tax>
<base_row_total_incl_tax>2000.0000</base_row_total_incl_tax>
</data_item>
</order_items>
</data_item_2>
</magento_api>

As we can see, we get back a list of items with each one representing an order, that contains all the information that we would like to use for further analysis. information like the discount that we might have applied, the taxes paid the base price of the order, etc. As we might have many order objects to retrieve, we should paginate through the results. To do that, we need to provide the “page” and “limit” parameters to our GET request.

Now that we have the results from our Magento shop, we can further process them before we are able to load them into the BI platform of our choice.

Magento Data Preparation for Microsoft SQL Server

As in every relational database, SQL Server requires a well-defined database schema before we start populating with data. Data is organized in schemas, which are distinct namespaces where database objects belong to.

The most common database objects are of course tables which have a number of columns with each one having a declared data type. MS SQL Server supports a large number of different data types. This gives us great flexibility in expressing the data that we have and at the same time optimizing our data warehouse.

When working with data coming from web services, where data is usually serialized in JSON, it is important to correctly map the data to the right data types. As changing the data types in the future is a process that might cost in downtime of your database, it is important to spend enough time thinking about the proper data type assignments.

For example, dates in JSON are just strings, but when storing date objects in a database, we can enhance analytics with great capabilities by transforming the raw string data into an appropriate date type. A typical strategy for loading data from Magento to an SQL Server database is to create a schema where you will map each API endpoint to a table. Each key inside the Magento API endpoint response should be mapped to a column of that table and you should ensure the right conversion to an SQL Server compatible data type.

Of course, you will need to ensure that as the data types from the Magento API might change, you will adapt your database tables accordingly, there’s no such thing as automatic data typecasting. After you have a complete and well-defined data model or schema for Microsoft SQL Server, you can move forward and start loading your data into the database.

Load data from Magento to MS SQL Server

As a feature-rich and mature product, MS SQL Server offers a large and diverse set of methods for loading data into a database. One way of importing data into your database is by using the SQL Server Import and export Wizard. With it and through a visual interface you will be able to bulk load data from a number of data sources that are supported.

You can import data from another SQL Server, from an Oracle database, from Flat Files, from an Access Data Source, PostgreSQL, MySQL, and finally Azure Blob Storage. Especially if you are using a managed version of MS SQL Server on Azure, you should definitely consider utilizing the Azure Blob Storage connection.

In this way, you will be loading data as Blobs on Azure and your MS SQL Server database will sync with it through the Import and Export Wizard.

Another way for importing bulk data into an SQL Server, both on Azure and on-premises, is by using the bcp utility. This is a command-line tool that is built specifically for bulk loading and unloading of data from an MS SQL database.

Finally and for compatibility reasons, especially if you are managing databases from different vendors, you can BULK INSERT SQL statements.

In a similar way and as it happens with the rest of the databases, you can also use the standard INSERT statements, where you will be adding data row-by-row directly to a table. It is the most basic and straightforward way of adding data into a table but it doesn’t scale very well with larger data sets.

So for bulk datasets, you better consider one of the previous methods.

Updating your Magento data on MS SQL Server

As you will be generating more data on Magento, you will need to update your older data on an MS SQL Server database. This includes new records together with updates to older records that for any reason have been updated on Magento.

You will need to periodically check Magento for new data and repeat the process that has been described previously while updating your currently available data if needed. Updating an already existing row on a SQL Server table is achieved by creating UPDATE statements.

Another issue that you need to take care of is the identification and removal of any duplicate records on your database. Either because Magento does not have a mechanism to identify new and updated records or because of errors on your data pipelines, duplicate records might be introduced to your database.

In general, ensuring the quality of the data that is inserted in your database is a big and difficult issue and MS SQL Server features like TRANSACTIONS can help tremendously, although they do not solve the problem in the general case.

About Magento

Magento is an e-commerce platform built on open source technology that provides online merchants with a flexible shopping cart system, as well as control over the look, content, and functionality of their online store. Magento offers powerful marketing, search engine optimization, and catalog-management tools. Some of its main characteristics are the following:

  • Feature-rich. Magento is very rich in functionality and offers an in-depth and powerful platform.
  • Powerful SEO. Magento is well known for its SEO capabilities, it offers one of the out-of-the-box, SEO optimizations for the stores hosted on it.
  • Magento ensures that your store can seamlessly grow along with your business.
  • Flexibility. Its template-based architecture allows you to pretty customize everything.
  • Magento community edition is open source and free to use.
  • Security. Magento is built with security in its core.
  • User friendly. The administration area exposes a simple back end with intuitive navigation and well-organized store management features.
  • Community. Being open-source guarantees a healthy ecosystem to support and further develop the platform.

Magento offers three different versions of its platform:

  1. Community edition. It’s the open-source and free version of Magento.
  2. Enterprise edition. Magento Enterprise Edition is designed to empower merchants to rapidly innovate and deliver engaging experiences to customers across all channels and devices.
  3. Enterprise cloud edition. The Enterprise Edition of the platform as a service delivered over the cloud.

About Microsoft SQL Server

Microsoft SQL Server is one of the oldest and most mature database systems. Its first version was introduced about 28 years ago, in 1989, and Microsoft has been consistently supporting and extending the product until today.

So, it’s no surprise that Microsoft SQL Server has one of the richest feature sets among the currently available database systems.

SQL Server is delivered in different editions or flavors. The most notable being the Enterprise edition which can manage databases as large as 524 petabytes utilizing up to 12 terabytes of memory and 640 CPU processors. A free and scaled-down version is called Express.

A Business Intelligence version focusing on use cases where BI is performed on-premises. This version is actually a bundle of different products, including the core database system, together with other Microsoft-related products that can be used for BI purposes like visualization and data management.

In addition, there are also plenty of specialized versions of the database like the Compact edition that can be used on small devices and of course the Azure version, which is the cloud-based edition of SQL Server. Microsoft SQL Server incorporates a modular architecture that can extend the database with additional services. Replication services can extend the database to a cluster version and thus help with scaling and fault tolerance.

The SQL Server Analysis Services, augment the database with OLAP and data mining capabilities, making the database ideal for the workloads that we care about in this guide.

The best way to load data from Magento to MS SQL Server and possible alternatives

So far we just scraped the surface of what can be done with MS SQL Server and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from, and the requirements of your use case.

Things can get even more complicated if you want to integrate data coming from different sources. A possible alternative, instead of writing, hosting, and maintaining a flexible data infrastructure, is to use a product like RudderStack that can handle this kind of problem automatically for you.

RudderStack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more.

Sign Up For Free And Start Sending Data

Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.

Get Started Image

Get started today

Start building smarter customer data pipelines today with RudderStack. Our solutions engineering team is here to help.