By Rudderstack Team

How to load data from Salesforce to PostgreSQL

The purpose of this post is to help you define a pipeline and load data from Salesforce to a PostgreSQL database for further analysis. Information will be given on how you access and extract your Salesforce data through its API and how to load it to Postgres. Alternatively, you may check out RudderStack to load your data from Salesforce to PostgreSQL in minutes for you, so you can focus on what matters, which is how to extract more value from your data.

Extract your data from Salesforce

You can’t use a Data Warehouse without data, so the first and most important step is to extract the data you want from Salesforce. Salesforce has many products, and it’s also a pioneer in cloud computing and the API economy. This means that it offers a plethora of APIs to access the services and the underlying data. In this post, we’ll focus only on Salesforce CRM, which again exposes many APIs. More specifically, and as it can be found in this excellent post from their Helpdesk about which API to use, we have the following options.

  • Chatter REST API
  • Bulk API
  • Metadata API
  • Streaming API
  • Apex REST API
  • Apex SOAP API
  • Tooling API

Pull data from the Salesforce REST API

From the above list, the complexity and feature richness of the Salesforce API is more than evident. The REST API and the SOAP API are exposing the same functionalities but using different protocols. Interacting with the REST API can be done using tools like CURL or Postman or by using HTTP clients for your favorite language or framework. A few suggestions:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http-client

The Salesforce REST API supports OAuth 2.0 authentication. More information can be found in the Understanding Authentication article. After you successfully authenticate with the REST API, you must start interacting with its resources and start fetching data from it to load them on a data warehouse. It’s easy to get a list of all the resources we have access to. For example, using curl we can execute the following:

curl -H "Authorization: Bearer token"

A typical response from the server will be a list of available resources in JSON or XML, depending on what you have asked as part of your request.

"sobjects" : "/services/data/v26.0/sobjects",
"licensing" : "/services/data/v26.0/licensing",
"connect" : "/services/data/v26.0/connect"