SQL Data Warehouse support numerous options for loading data, such as:
- PolyBase
- Azure Data Factory
- BCP command-line utility
- SQL Server integration services
As we are interested in loading data from online services by using their exposed HTTP APIs, we will not consider the usage of BCP command-line utility or SQL server integration in this guide. We’ll consider the case of loading our data as Azure storage Blobs and then use PolyBase to load the data into SQL Data Warehouse.
Accessing these services happens through HTTP APIs. APIs play an important role in both the extraction and the loading of data into our data warehouse. You can access these APIs by using a tool like CURL or Postman. Or use the libraries provided by Microsoft for your favorite language. Before you actually upload any data you have to create a container which is something similar to a concept to the Amazon AWS Bucket, creating a container is a straightforward operation and you can do it by following the instructions found on the Blog storage documentation from Microsoft. As an example, the following code can create a container in Node.js:
Waiting for gist...
After the creation of the container you can start uploading data to it by using again the given SDK of your choice in a similar fashion:
Waiting for gist...
When you are done putting your data into Azure Blobs you are ready to load it into SQL Data Warehouse using PolyBase. To do that you should follow the directions in the Load with PolyBase documentation. In a summary the required steps to do it, are the following:
- create a database master key
- create a database scoped credentials
- create an external file format
- create an external data source
PolyBase’s ability to transparently parallelize loads from Azure Blob Storage will make it the fastest loading data tool. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping it to a new table within SQL Data Warehouse.
You will need to establish a recurrent process that will extract any newly created data from your service, load them in the form of Azure Blobs and initiate the PolyBase process for importing the data again into SQL Data Warehouse. One way of doing this is by using the Azure Data Factory service. In case you would like to follow this path, you can read some good documentation on how to move data to and from Azure SQL Warehouse using Azure Data Factory.