How to load data from Stripe to PostgreSQL

Extract your data from Stripe

Stripe is an API-first product: a unified set of APIs and tools that enables businesses to accept and manage online payments. Its web API follows RESTful principles and uses built-in HTTP features wherever possible, which makes it accessible to off-the-shelf HTTP clients. Responses are serialized as JSON.

Stripe provides two types of keys for authentication, one for test mode and one for live mode. With the test mode key you can exercise every aspect of the API without touching your real data. Keep in mind that calls to the Stripe API must be made over HTTPS; plain HTTP calls will fail, as will unauthenticated calls. If you want to experiment with the API, remember to use your test mode key.

Currently, the Stripe API is built around the following ten core resources:

  • Balance – An object that represents your Stripe balance.
  • Charges – To charge a credit or debit card, you create a charge.
  • Customers – Customer objects allow you to perform recurring charges and track multiple charges that are associated with the same customer.
  • Disputes – A dispute occurs when a customer questions your charge with their bank or credit card company.
  • Events – Events are Stripe's way of letting you know when something interesting happens in your account.
  • File uploads – There are various times when you'll want to upload files to Stripe (for example, when uploading dispute evidence).
  • Refunds – Refund objects allow you to refund a charge that has previously been created but not yet refunded.
  • Tokens – Tokens can be created with your publishable API key.
  • Transfers – A transfer is created when Stripe sends you money or you initiate a transfer to a bank account.
  • Transfer reversals – A previously created transfer can be reversed if it has not yet been paid out.

All of the above resources support CRUD operations through HTTP verbs on their associated endpoints. Because it is a web API, you can access it with tools like curl or Postman, or with your favorite HTTP client for the language or framework of your choice. Some options are the following:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • http.client for Python

There is also a large number of libraries that wrap the Stripe API and offer an easier way to interact with it, both community-developed ones and official ones from Stripe. For more information, you can check the libraries section in the API documentation.
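
As a rough illustration of the raw-HTTP route, the sketch below builds an authenticated request using only Python's standard library. The helper name build_stripe_request is hypothetical, and the key is the sample test key that appears in this post. It assumes the secret key is sent as the HTTP Basic auth username with an empty password. The request is only constructed here, not sent.

```python
import base64
import urllib.parse
import urllib.request

STRIPE_API_BASE = "https://api.stripe.com/v1"  # HTTPS only; plain HTTP calls fail


def build_stripe_request(resource, api_key, params=None):
    """Build (but do not send) an authenticated GET request for a Stripe endpoint.

    `build_stripe_request` is a hypothetical helper, not part of any Stripe library.
    """
    query = urllib.parse.urlencode(params or {})
    url = f"{STRIPE_API_BASE}/{resource}" + (f"?{query}" if query else "")
    # HTTP Basic auth: the secret key is the username and the password is empty.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


request = build_stripe_request(
    "charges", "sk_test_BQokikJOvBiI2HlWgH4olfQ2", {"limit": 3}
)
print(request.full_url)
# To actually send it, pass `request` to urllib.request.urlopen and decode
# the JSON body of the response.
```

In practice one of the wrapper libraries is usually more convenient; the point of the sketch is only that nothing beyond HTTPS and a key is required.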

Stripe, like any other service you might be using, has (hopefully) figured out the optimal data model for its own operations. When we fetch its data, however, we usually want to answer questions or do things that fall outside the context in which the service operates, which makes that model suboptimal for analytics.

For this reason, we should always keep in mind that when we work with data coming from external services we need to remodel it and bring it to the right form for our needs.

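So let's assume that we want to perform some churn analysis for our company, and for that we need customer data that indicates when customers have canceled their subscriptions. To get it, we'll have to request the objects that Stripe holds for our company. List requests all follow the same pattern; the following command, for example, asks for the three most recent charges, and customer objects can be requested in the same way from the customers endpoint: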

SHELL
curl "https://api.stripe.com/v1/charges?limit=3" \
  -u sk_test_BQokikJOvBiI2HlWgH4olfQ2:

and a typical response will look like the following:

JSON
{
  "object": "list",
  "url": "/v1/charges",
  "has_more": false,
  "data": [
    {
      "id": "ch_17SY5f2eZvKYlo2CiPfbfz4a",
      "object": "charge",
      "amount": 500,
      "amount_refunded": 0,
      "application_fee": null,
      "balance_transaction": "txn_17KGyT2eZvKYlo2CoIQ1KPB1",
      "captured": true,
      "created": 1452627963,
      "currency": "usd",
      "customer": null,
      "description": "thedude@grepinnovation.com Account Credit",
      "destination": null,
      "dispute": null,
      "failure_code": null,
      "failure_message": null,
      "fraud_details": {},
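
The has_more field in a list response signals whether further pages exist. A minimal sketch of walking through all pages follows, assuming Stripe's cursor-style pagination in which the id of the last object received is passed back as the starting_after parameter. Both iter_stripe_list and fetch_page are hypothetical names; fetch_page stands for whatever function performs the HTTP call and returns the decoded JSON, and a fake one is used here so the example runs offline.

```python
def iter_stripe_list(fetch_page, limit=100):
    """Yield every object from a Stripe list endpoint, page by page.

    `fetch_page` is a hypothetical callable: it takes a dict of query
    parameters and returns the decoded JSON of one list response.
    """
    params = {"limit": limit}
    while True:
        page = fetch_page(params)
        items = page["data"]
        yield from items
        if not page.get("has_more") or not items:
            break
        # Ask for the page that starts after the last object we received.
        params = {"limit": limit, "starting_after": items[-1]["id"]}


def fake_fetch(params):
    """Stand-in for a real HTTP call, serving three made-up charge ids."""
    ids = ["ch_1", "ch_2", "ch_3"]
    start = ids.index(params["starting_after"]) + 1 if "starting_after" in params else 0
    chunk = ids[start:start + params["limit"]]
    return {
        "object": "list",
        "has_more": start + len(chunk) < len(ids),
        "data": [{"id": charge_id} for charge_id in chunk],
    }


all_ids = [obj["id"] for obj in iter_stripe_list(fake_fetch, limit=2)]
print(all_ids)
```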

Inside the customer object there’s a list of subscription objects that look like the following JSON document:

JSON
{
  "id": "sub_7hy2fgATDfYnJS",
  "object": "subscription",
  "application_fee_percent": null,
  "cancel_at_period_end": false,
  "canceled_at": null,
  "current_period_end": 1455306419,
  "current_period_start": 1452628019,
  "customer": "cus_7hy0yQ55razJrh",
  "discount": null,
  "ended_at": null,
  "metadata": {},
  "plan": {
    "id": "gold2132",
    "object": "plan",
    "amount": 2000,
    "created": 1386249594,
    "currency": "usd",
    "interval": "month",
    "interval_count": 1,
    "livemode": false,
    "metadata": {},
    "name": "Gold ",
    "statement_descriptor": null,
    "trial_period_days": null
  },
  "quantity": 1,
  "start": 1452628019,
  "status": "active",
  "tax_percent": null,
  "trial_end": null,
  "trial_start": null
}

These objects, together with part of the customer object, contain the information we need to perform churn analysis. Of course, we'll have to extract all the information we need, map it to the schema of our data warehouse, and then load the data into it following the instructions in this post.
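
To make the extraction step concrete, here is a minimal sketch that flattens a subscription object like the one shown above into a single row suitable for churn analysis. The function names and the choice of output columns are assumptions made for illustration; which fields you keep depends on your own warehouse schema. Stripe's timestamps are Unix epoch seconds, so they are converted to timezone-aware datetimes.

```python
from datetime import datetime, timezone


def to_datetime(epoch_seconds):
    """Convert Unix epoch seconds to a UTC datetime, passing None through."""
    if epoch_seconds is None:
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def subscription_to_row(sub):
    """Flatten a subscription object into a flat dict (hypothetical column names)."""
    plan = sub.get("plan") or {}
    return {
        "subscription_id": sub["id"],
        "customer_id": sub["customer"],
        "status": sub["status"],
        "plan_id": plan.get("id"),
        "plan_amount": plan.get("amount"),
        "plan_interval": plan.get("interval"),
        "started_at": to_datetime(sub.get("start")),
        "canceled_at": to_datetime(sub.get("canceled_at")),
        "ended_at": to_datetime(sub.get("ended_at")),
        "cancel_at_period_end": sub.get("cancel_at_period_end"),
    }


# A trimmed copy of the subscription document shown in this post.
subscription = {
    "id": "sub_7hy2fgATDfYnJS",
    "cancel_at_period_end": False,
    "canceled_at": None,
    "customer": "cus_7hy0yQ55razJrh",
    "ended_at": None,
    "plan": {"id": "gold2132", "amount": 2000, "interval": "month"},
    "start": 1452628019,
    "status": "active",
}
row = subscription_to_row(subscription)
print(row["subscription_id"], row["status"], row["started_at"].isoformat())
```

A row with a non-null canceled_at or ended_at is then a churn signal for that customer.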

Stream data from the Stripe API to PostgreSQL

It is also possible to set up a streaming data infrastructure that collects Stripe's data and pushes it into your data warehouse as events occur. This can be achieved with the webhooks functionality that Stripe supports: you register a webhook endpoint for the events you care about, and every time one of them happens, Stripe pushes a message to that endpoint.

For more information about that, check the API documentation on webhooks.
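
As a sketch of what the receiving side might do with such a message, the function below decodes a webhook payload and decides which table the enclosed object belongs to. It assumes the event JSON carries id, type, and data.object fields; the EVENT_TABLES mapping and the route_event name are hypothetical. The HTTP server itself, and the verification that a request really came from Stripe, are left out.

```python
import json

# Hypothetical mapping from Stripe event types to destination tables.
EVENT_TABLES = {
    "customer.subscription.created": "subscriptions",
    "customer.subscription.updated": "subscriptions",
    "customer.subscription.deleted": "subscriptions",
    "charge.succeeded": "charges",
}


def route_event(raw_body):
    """Decode a webhook payload and decide where its object should be stored.

    Returns (table, event_id, object) or None when the event type is not tracked.
    """
    event = json.loads(raw_body)
    table = EVENT_TABLES.get(event.get("type"))
    if table is None:
        return None
    return table, event["id"], event["data"]["object"]


# A made-up payload in the shape described above.
payload = json.dumps({
    "id": "evt_example",
    "object": "event",
    "type": "customer.subscription.deleted",
    "data": {"object": {"id": "sub_7hy2fgATDfYnJS", "status": "canceled"}},
}).encode("utf-8")

print(route_event(payload))
```

Returning the event id alongside the object makes it possible to ignore a message that is delivered more than once.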

Stripe Data Preparation for PostgreSQL

To populate a Postgres database instance with data, you must first have a well-defined data model or schema that describes the data. As a relational database, Postgres organizes data around tables.

Each table is a collection of columns, each with a predefined data type such as INTEGER or VARCHAR. PostgreSQL, like any other SQL database, supports a wide range of data types.

A typical strategy for loading data from Stripe to a Postgres database is to create a schema in which each API endpoint maps to a table. Each key in the endpoint's response should map to a column of that table, and you should ensure the right conversion to a Postgres-compatible data type. For example, if an endpoint from Stripe returns a value as a string, you should convert it into a VARCHAR with a predefined maximum size or into the TEXT data type. Tables can then be created in your database using the CREATE TABLE statement.

Of course, you will need to ensure that, as the data types returned by the Stripe API change, you adapt your database tables accordingly; there is no such thing as automatic data type casting.
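
The mapping from response keys to columns can be sketched as follows. This is only a starting point under stated assumptions: the type choices (BIGINT for integers, JSONB for nested objects, TEXT for strings and nulls) and the helper names are illustrative, and a type inferred from a single sample object should be reviewed by hand before the table is created.

```python
def postgres_type(value):
    """Pick a Postgres column type for a decoded JSON value (illustrative choices)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE PRECISION"
    if isinstance(value, (dict, list)):
        return "JSONB"
    return "TEXT"  # strings, and nulls whose type cannot be inferred


def create_table_sql(table, sample, primary_key="id"):
    """Build a CREATE TABLE statement from one sample object.

    Identifiers are interpolated directly, so use this only with trusted names.
    """
    columns = []
    for key, value in sample.items():
        column = f'"{key}" {postgres_type(value)}'
        if key == primary_key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f'CREATE TABLE IF NOT EXISTS "{table}" (\n  ' + ",\n  ".join(columns) + "\n);"


# A few fields taken from the charge object shown in this post.
sample_charge = {
    "id": "ch_17SY5f2eZvKYlo2CiPfbfz4a",
    "amount": 500,
    "captured": True,
    "created": 1452627963,
    "currency": "usd",
    "fraud_details": {},
}
ddl = create_table_sql("charges", sample_charge)
print(ddl)
```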

After you have a complete and well-defined data model or schema for Postgres, you can move forward and start loading your data into the database.

Load data from Stripe to PostgreSQL

Once you have defined your schema and you have created your tables with the proper data types, you can start loading data into your database.

The most straightforward way to insert data into a Postgres database is by creating and executing INSERT statements. With INSERT statements, you add data row by row directly to a table. It is the most basic way of adding data to a table, but it doesn't scale well with larger datasets.

The preferred way of adding larger datasets to a PostgreSQL database is the COPY command. COPY loads data from a file on a file system that is accessible to the Postgres instance; in this way, much larger datasets can be inserted into the database in less time.
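
As a sketch of preparing data for COPY, the snippet below serializes rows into CSV and builds a matching COPY statement. It assumes the CSV format, in which an unquoted empty field is read as NULL by default, and it uses the FROM STDIN form so that a database driver can stream the buffer instead of reading a server-side file. The helper names are hypothetical, and no database connection is made here.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize dict rows into an in-memory CSV buffer, one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # None is written as an empty field, which COPY's csv format reads as NULL.
        writer.writerow([row.get(column) for column in columns])
    buffer.seek(0)
    return buffer


def copy_statement(table, columns):
    """Build the COPY command; identifiers are assumed to be trusted names."""
    return f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"


columns = ["id", "amount", "currency", "customer"]
rows = [
    {"id": "ch_1", "amount": 500, "currency": "usd", "customer": None},
    {"id": "ch_2", "amount": 1200, "currency": "usd", "customer": "cus_7hy0yQ55razJrh"},
]
csv_buffer = rows_to_csv(rows, columns)
sql = copy_statement("charges", columns)
print(sql)
print(csv_buffer.getvalue(), end="")
# With a driver such as psycopg2 (an assumption, not used here), the buffer
# could then be streamed with cursor.copy_expert(sql, csv_buffer).
```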

You should also consult the documentation of PostgreSQL on how to populate a database with data. It includes a number of very useful best practices on how to optimize the process of loading data into your PostgreSQL database.

Loading from a file with COPY requires that the file be readable by the Postgres server itself. Nowadays, with cloud-based, fully managed databases, getting direct access to the server's file system is not always possible. In that case, COPY ... FROM STDIN (or psql's \copy) can stream the data over the client connection if your tooling supports it. Otherwise, another option is to use PREPARE together with INSERT, to end up with optimized and more performant INSERT queries.

Updating your Stripe data on PostgreSQL

As you generate more data on Stripe, you will need to update your older data on Postgres. This includes new records as well as updates to older records that, for any reason, have changed on Stripe.

You will need to periodically check Stripe for new data and repeat the process described previously, updating your existing data where needed. An existing row in a Postgres table is updated with an UPDATE statement.
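
One way to handle new and changed records in a single statement, offered here as an alternative to issuing separate UPDATE statements, is PostgreSQL's INSERT ... ON CONFLICT DO UPDATE (available since version 9.5). The sketch below only builds the SQL text. The upsert_sql helper is hypothetical, the %s placeholders assume a psycopg-style driver, and the key column is assumed to have a primary key or unique constraint.

```python
def upsert_sql(table, columns, key="id"):
    """Build an INSERT ... ON CONFLICT DO UPDATE statement.

    Inserts a new row, or overwrites the non-key columns when a row with the
    same key already exists. Identifiers are assumed to be trusted names.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(
        f"{column} = EXCLUDED.{column}" for column in columns if column != key
    )
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders}) "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )


statement = upsert_sql("subscriptions", ["id", "status", "canceled_at"])
print(statement)
# The values are passed separately to the driver, for example:
# ("sub_7hy2fgATDfYnJS", "canceled", some_datetime)
```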

Another issue that you need to take care of is the identification and removal of any duplicate records in your database. Duplicates might be introduced either because new and updated records cannot always be told apart when they are fetched from Stripe or because of errors in your data pipelines.

In general, ensuring the quality of the data that is inserted into your database is a big and difficult issue. PostgreSQL features like transactions can help tremendously, although they do not solve the problem in the general case.
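
A minimal sketch of removing duplicates before loading is shown below. It keeps one record per Stripe id, preferring the copy that was fetched last. The fetched_at field is a hypothetical marker that your own pipeline would attach when it pulls a record; it is not part of Stripe's objects.

```python
def deduplicate(records, key="id", version="fetched_at"):
    """Keep one record per key, preferring the highest `version` value.

    `fetched_at` is a hypothetical field set by the pipeline, not by Stripe.
    """
    latest = {}
    for record in records:
        current = latest.get(record[key])
        if current is None or record[version] >= current[version]:
            latest[record[key]] = record
    return list(latest.values())


fetched = [
    {"id": "cus_1", "fetched_at": 1, "email": "old@example.com"},
    {"id": "cus_2", "fetched_at": 1, "email": "other@example.com"},
    {"id": "cus_1", "fetched_at": 2, "email": "new@example.com"},
]
unique = deduplicate(fetched)
print([(record["id"], record["email"]) for record in unique])
```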

The best way to load data from Stripe to PostgreSQL

So far we have just scratched the surface of what you can do with PostgreSQL and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting, and maintaining a flexible data infrastructure, use RudderStack, which can handle everything automatically for you.

RudderStack, with one click, integrates with sources or services, creates analytics-ready data, and syncs your Stripe data to PostgreSQL right away.
