How to load data from MailChimp to PostgreSQL

Extract your data from MailChimp

First of all, Mailchimp updated its API to v3 recently, so ensure that anything you do will be with this version as the previous are all deprecated although still supported. For more information, visit MailChimp API v3.0 documentation. MailChimp was always a promoter of APIs and encouraged integration with other systems. It has a rich API that exposes a large number of endpoints for interacting with the resources of the applications. More specifically, there are endpoints for the following resources:

  • Automations – offers functionality related to automated tasks that we define on MailChimp
  • Batch operations – for managing batch processes on our MailChimp account
  • Campaign folders – helps you organize your campaigns into groups
  • Campaigns – for managing your campaigns
  • Conversations – helps you track threads of emails with specific users
  • File manager files – for managing your static assets like images
  • File manager folders – for creating folders to organize your assets
  • Lists – CRUD operations on lists of users
  • Reports – for accessing reports with statistics on your campaigns
  • Template folders – operations on creating folders for organizing your templates for your emails
  • Templates – operations on templates for your emails

Something interesting to note about the MailChimp API is that we don’t see a root-level resource for users anywhere. This makes sense of course, if we take into consideration that MailChimp is all about mail campaigns so users or to put in the correct context, subscribers, are not a stand-alone resource but instead they exist only inside the lists we manage.

It is important to understand that every service perceives the world from a different perspective which of course it’s relevant to the value it offers, so even if what we care about is information about our users, it makes complete sense for someone like MailChimp to organize everything around lists and campaigns.

If we check the model of another service, like Intercom for example, we’ll notice that the user is at the top resources, which again makes sense because Intercom is all about one-to-one communication with users.

MailChimp and any other service that you might be using have figured out (hopefully) the optimal model for its operations. Still, when we fetch their data, we usually want to answer questions or do things that are not part of the context that these services operate, something that makes these models sub-optimal for your analytics needs.

For this reason, we should always keep in mind that when we work with data coming from external services we need to remodel it and bring it to the right form for our needs.

Interacting with the MailChimp REST API can be done by using tools like CURL or Postman or by using http clients for your favorite language or framework. A few suggestions:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http-client

The MailChimp REST API supports OAuth 2.0 authentication. More information can be found in the Authorised apps section of the API documentation. After you successfully authenticate with the REST API, you have to start interacting with its resources and start fetching data from it in order to load them on your data warehouse.

Extract your User Data from the MailChimp API

So let’s assume that we want to get all the information we got on MailChimp for our users in order to enrich our user records inside our data warehouse. To do that, we need to do the following.

First, we need to fetch all the lists that we have created on MailChimp. We can do this by performing a GET request to the appropriate endpoint.

SH
curl --request GET
--url 'https://usX.api.mailchimp.com/3.0/lists'
--user 'anystring:apikey'
—include

We should get back a response like the following:

JSON
{
"lists": [
{
"id": "57afe96172",
"name": "Freddie's Jokes",
"contact": {
"company": "MailChimp",
"address1": "675 Ponce De Leon Ave NE",
"address2": "Suite 5000",
"city": "Atlanta",
"state": "GA",
"zip": "30308",
"country": "US",
"phone": ""
},……

Using the ID we get from the response for each list, we iterate through all the lists and make requests to the appropriate end-points to get the members for the list.

SH
curl --request GET
--url 'https://usX.api.mailchimp.com/3.0/lists/57afe96172/members'
--user 'anystring:apikey'
—include

and we should get a response back like the following:

JSON
{
"members": [
{
"id": "f777bbffab8d1ceca8b757df63c47cb8",
"email_address": "urist.mcvankab+1@freddiesjokes.co",
"unique_email_id": "882e9bec19",
"email_type": "html",
"status": "subscribed",
"status_if_new": "",
"merge_fields": {
"FNAME": "",
"LNAME": ""
},
"interests": {
"9143cf3bd1": true,
"3a2a927344": false,
"f9c8f5f0ff": false,
"f231b09abc": true,
"bd6e66465f": false
},…………

Keep in mind that a user might appear in more than one list, which means that you also need to reduplicate your results based on this fact. After we get the results we need and collect all the user-related information from various other resources, we need to map it to our Data Warehouse repository model before we do the actual loading.

MailChimp Data Preparation for PostgreSQL

To populate a PostgreSQL database instance with data, first, you must have a well-defined data model or schema that describes the data. As a relational database, PostgreSQL organizes data around tables.

Each table is a collection of columns with a predefined data type as an integer or VARCHAR. PostgreSQL, like any other SQL database, supports a wide range of different data types.

A typical strategy for loading from MailChimp to PostgreSQL is to create a schema where you will map each API endpoint to a table. Each key inside the MailChimp API endpoint response should be mapped to a column of that table, and you should ensure the right conversion to a PostgreSQL compatible data type.

For example, if an endpoint from MailChimp returns a value as String, you should convert it into a VARCHAR with a predefined max size or TEXT data type. Tables can then be created on your database using the CREATE SQL statement.

Of course, you will ensure that as the data types from the MailChimp API might change, you will adapt your database tables accordingly. There’s no such thing as automatic data typecasting.

After you have a complete and well-defined data model or schema for PostgreSQL, you can move forward and start loading your data into the database.

Load data from MailChimp to PostgreSQL

Once you have defined your schema and you have created your tables with the proper data types, you can start loading data into your database.

The most straightforward way to insert data into a PostgreSQL database is by creating and executing INSERT statements. With INSERT statements, you will be adding data row-by-row directly to a table. It is the most basic and straightforward way of adding data into a table, but it doesn’t scale very well with larger data sets.

The preferred way for adding larger datasets into a PostgreSQL database is by using the COPY command. COPY is copying data from a file on a file system that is accessible by the PostgreSQL instance. In this way, much larger datasets can be inserted into the database in less time.

You should also consult the documentation of PostgreSQL on how to populate a database with data. It includes a number of very useful best practices on how to optimize the process of loading data into your PostgreSQL database.

COPY requires physical access to a file system in order to load data. Nowadays, with cloud-based, fully managed databases, getting direct access to a file system is not always possible. If this is the case and you cannot use a COPY statement, then another option is to use PREPARE together with INSERT to end up with optimized and more performant INSERT queries.

Updating your MailChimp data on PostgreSQL

PostgreSQL. This includes new records together with updates to older records that for any reason have been updated on MailChimp.

You will periodically check MailChimp for new data and repeat the process that has been described previously while updating your currently available data if needed. Updating an already existing row on a PostgreSQL table is achieved by creating UPDATE statements.

Another issue that you need to take care of is the identification and removal of any duplicate records on your database. Either because MailChimp does not have a mechanism to identify new and updated records or because of errors on your data pipelines, duplicate records might be introduced to your database.

In general, ensuring the quality of the data that is inserted in your database is a big and difficult issue, and PostgreSQL features like TRANSACTIONS can help tremendously. However, they do not solve the problem in the general case.

The best way to load data from MailChimp to PostgreSQL

So far, we just scraped the surface of what you can do with PostgreSQL and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting, and maintaining a flexible data infrastructure, use RudderStack, which can handle everything automatically for you.

RudderStack, with one click, integrates with sources or services, creates analytics-ready data, and syncs your MailChimp to PostgreSQL right away.

Sign Up For Free And Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.
Don't want to go through the pain of direct integration? RudderStack's Mailchimp integration makes it easy to send data from Mailchimp to PostgreSQL.