Warehouse-First Healthcare Data Integration

In my previous role as a Senior Product Manager at a major public pharmacy in the Health Tech space, I worked with engineers and architects on multiple issues that are endemic in today’s data-driven business environment. The most glaring issue that comes to mind is the immense lift of implementing integrations from multiple data sources to multiple destinations. There are a few reasons this was so challenging:

  • Managing live event integrations created messy architecture due to the different needs of each involved system. This often equated to a mish-mash of Kafka topics, SNS/SQS pipelines, point-to-point integrations, and multiple abstraction layers
  • Before integrations could become useful, we had to build out multiple transformation microservices just to ensure our downstream systems would receive usable data. This was particularly painful when thinking about EMR/EHR interoperability, conforming to HL7 FHIR specifications, and HITECH compliance
  • Managing roles and permissions was nontrivial with a large and diverse team
  • We had to ensure we had total control of PHI and PII data storage because we operated in a HIPAA-compliant environment

Overcoming these data integration challenges with health data would take months of planning, building architecture runway, design, and implementation. Any one of these issues alone would require substantial developer time and budget. While there are a number of tools out there that could help relieve much of this pain, our requirements and healthcare industry standards took the vast majority of them off the table. This is because most of the tools:

  • Failed to meet all of our technical requirements
  • Quickly ballooned in price and were not cost-effective
  • Could not get past legal because of where data was persisted or stored

So, the inevitable would happen. We would build the data pipelines, architecture runway, and microservices ourselves. We could not afford to wait for a 3rd party to become HIPAA compliant or for legal and procurement to go back and forth for months drafting contracts that both parties found adequate to start implementation. Couple this with the constant pressure of time-to-market, and we felt like we had no choice but to build everything ourselves. We did not take these decisions lightly, considering the immense investment of developer capacity and financial resources. That’s why I’m excited about RudderStack’s HIPAA compliance announcement. Keep reading, and I’ll detail how RudderStack solves the challenges I outlined above.

Managing live events, integrations, sources, and destinations

As I mentioned, building out the integration processes and data pipelines required to spin up a net new product in the digital health space was no small feat. It included building new workflows, configuring different sources and destinations, and managing live events. More importantly, it meant figuring out how we were actually going to connect all of these things to drive meaningful outcomes for various stakeholders.

Considering these projects involved disparate sources of data encompassing electronic health records, electronic medical records, 3rd party devices, big name 3rd party software suites, SaaS tools, PaaS tools, S3 buckets, CMS systems, warehouses, on-prem servers, and legacy systems, well… they could get messy.

Healthcare organizations simply have a lot to navigate when it comes to leveraging health information to drive innovation and improve patient care. Thankfully, I was blessed with an amazing architecture and dev team that tore through this, but their efforts were nothing short of miraculous.

RudderStack greatly reduces or completely eliminates these connectivity challenges. Let’s take live events as an example. RudderStack SDKs simplify event data collection. You can track multiple user event types on both web and mobile devices. These events can then stream in real time directly to your warehouse, where they can be accessed by other tools and teams. For example, for clickstream events and user journeys, you can send the data to tools like Tableau or Google Analytics, where marketing or product teams can put it to use. Likewise, events in Salesforce can be used to drive notifications in Braze. All of this can be connected using RudderStack.
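To make the hub-and-spoke idea concrete, here is a minimal sketch of one event being collected once and fanned out to several destinations. It is not RudderStack's API: `build_track_event`, `Router`, and the destination names are hypothetical, and the event shape is only loosely modeled on the track-style payloads common to event-streaming SDKs.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

Event = dict
Destination = Callable[[Event], None]


def build_track_event(user_id: str, name: str, properties: dict) -> Event:
    """Build a track-style event (hypothetical shape, for illustration only)."""
    return {
        "type": "track",
        "userId": user_id,
        "event": name,
        "properties": dict(properties),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


class Router:
    """Single place where destinations are registered and events fan out."""

    def __init__(self) -> None:
        self._destinations: Dict[str, Destination] = {}

    def connect(self, name: str, destination: Destination) -> None:
        self._destinations[name] = destination

    def send(self, event: Event) -> List[str]:
        """Deliver a copy of the event to every destination, in registration order."""
        delivered = []
        for name, destination in self._destinations.items():
            destination(dict(event))  # shallow copy so destinations don't share one dict
            delivered.append(name)
        return delivered


# Usage: in-memory lists stand in for a warehouse and an analytics tool.
warehouse: List[Event] = []
analytics: List[Event] = []

router = Router()
router.connect("warehouse", warehouse.append)
router.connect("analytics", analytics.append)
router.send(build_track_event("user-1", "Prescription Refilled", {"channel": "web"}))
```

The point of the sketch is the shape of the architecture: adding a destination is one `connect` call in one place, rather than a new point-to-point integration per source.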

As you can imagine, this is far cleaner than the build-it-yourself approach because all setup and configuration happens in a single location. This would've made life much easier for my developers, my architects, and me. What took 2-3 quarters of work could’ve been accomplished in a matter of weeks or months if we had access to RudderStack.

In-flight data transformation

After everything was connected, we still had three major issues:

  • Data formatting (i.e., the data format from the source did not meet the needs of the destination)
  • Clinical data or PHI needed to conform to HL7 FHIR
  • Hashing and encrypting PII/PHI

This meant building out multiple microservices to perform transformations prior to a data package reaching its destination. This too was time-consuming and expensive.

With RudderStack Transformations, users can perform in-flight data transformations out of the box. This means that any of the data that we collected, regardless of source, could've been transformed into HL7 FHIR-compliant data prior to reaching its destination. Similarly, with access to common libraries in RudderStack (SHA, MD5, etc.), all relevant PII and PHI can be hashed or encrypted in flight. Lastly, any other filtering or data formatting can be applied using the Transformation tool.
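As a rough illustration of what such an in-flight step does, here is a standalone Python sketch. It is not RudderStack's Transformations API: the function names, trait names, and event shape are all assumptions made for the example. It also swaps in a keyed SHA-256 digest (HMAC) rather than plain SHA or MD5, since unkeyed hashes of low-entropy identifiers are easy to brute-force, and the FHIR Patient it builds fills in only a few elements and is not validated against the specification.

```python
import copy
import hashlib
import hmac

# Hypothetical trait names; real sources will use their own field names.
PII_TRAITS = ("email", "phone")


def pseudonymize(value: str, secret: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of the normalized value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret, normalized, hashlib.sha256).hexdigest()


def transform_event(event: dict, secret: bytes) -> dict:
    """Return a copy of the event with PII traits replaced by digests."""
    out = copy.deepcopy(event)  # leave the caller's event untouched
    traits = out.get("traits", {})
    for key in PII_TRAITS:
        if traits.get(key):
            traits[key] = pseudonymize(str(traits[key]), secret)
    return out


def to_fhir_patient(traits: dict) -> dict:
    """Shape flat traits into a minimal FHIR-style Patient resource."""
    return {
        "resourceType": "Patient",
        "name": [{"family": traits["last_name"], "given": [traits["first_name"]]}],
        "gender": traits.get("gender", "unknown"),
        "birthDate": traits["birth_date"],  # FHIR dates use YYYY-MM-DD
    }


# Usage with made-up data and a demo-only secret.
event = {
    "type": "identify",
    "userId": "user-1",
    "traits": {
        "email": "Pat@Example.com ",
        "first_name": "Pat",
        "last_name": "Doe",
        "gender": "female",
        "birth_date": "1980-02-29",
    },
}
secret = b"demo-only-secret"
analytics_event = transform_event(event, secret)  # PII replaced by digests
clinical_record = to_fhir_patient(event["traits"])  # reshaped for a FHIR consumer
```

The same source event yields two differently shaped payloads, which is the work we previously spread across separate transformation microservices. In practice the secret would come from a managed key store, and the name traits would also need handling before reaching a non-clinical destination.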

Role and permission management

With our homegrown systems, managing permissions for our team was a nightmare. It was difficult to make sure everyone who needed a particular data set to do their job could reach it easily, while also ensuring each individual had access to only the data their job required. RudderStack’s permissions management features drastically reduce the overhead here.

The power of warehouse-first architecture

The cornerstone of RudderStack’s healthcare-friendly position is its warehouse-first approach. Warehouse-first is one of the most appealing aspects of RudderStack, and it's what initially attracted me to the company. Consider two things here:

  • There is a lot of power in centralizing disparate sources of data and eradicating organizational silos
  • With the warehouse-first approach, you get the convenience of plug-and-play data integrations, but you don’t give up control. You store your data in your own warehouse

This means all of your security controls and data permissions are still 100% valid. Also, because RudderStack does not permanently store or persist any data, conversations with compliance officials and legal departments are infinitely easier.

RudderStack does temporarily cache your data for up to 3 hours during a transfer to make sure your data gets where it needs to go, but after the 3-hour period the data is deleted permanently. You can also opt in if you would like us to persist your data for up to 30 days as a redundancy. With its warehouse-first foundation, RudderStack is both HIPAA and SOC 2 compliant, so you can rest assured we always handle your data with the utmost care.

Start managing your data integration pain with RudderStack today

With its warehouse-first approach and HIPAA compliance, RudderStack puts an option on the table for healthcare data teams that can meet technical requirements, is not cost-prohibitive, and isn’t a nonstarter with legal. It’s the tool I wish I had. If any of these pain points resonated with you, schedule a demo with our team today. We’re ready to sign BAAs to make your data projects less painful.

September 7, 2022
Logan Keith

Product Manager