How to Manage Data Retention

As organizations accumulate data, the associated costs for storing, maintaining, and protecting it increase. Data retention describes the policies and processes you implement to keep data that is still useful or must be kept for regulatory or compliance reasons, while discarding data that is no longer required.

In this article, you’ll learn more about what data retention is, why it is valuable to your organization, and how to shape your own data retention policies while following best practices.

What Is Data Retention?

Data retention is the act of storing data for future use. A data retention policy is an organization’s system of rules for managing the information it generates and collects. This includes identifying the information and deciding how it is stored, the period for which it is stored, and how it is deleted afterwards.

The primary factors that should be considered when defining your data retention policy are your business requirements and the use case for the data, the costs of storing it, and any regulatory or compliance concerns that may surround it.

What are the risks of retaining too much data?

Storing data that is no longer useful can harm your organization. As your data inventory grows, the following problems compound:

  • Increased clutter: Large amounts of data are more likely to become disorganized, requiring additional infrastructure and tools to store and manage it.
  • Regulatory burdens and security risks: This disorganization can lead to sensitive data being accidentally disclosed. Additionally, while historical personal data about your customers is rarely useful for ongoing business operations, it is a popular target for theft. This leaves you open to regulatory and civil legal repercussions.
  • Costs and labor: Additional infrastructure and tools increase the costs of retaining data, as do the labor and resources required to maintain its availability while keeping it protected.

In addition to these business factors, there are legal requirements that specify how long some categories of data can be kept. Some require that data be deleted as soon as it has been used, while others demand that data be kept indefinitely.

A comprehensive retention policy lets you recognize and address all of these concerns in a single document. You can then share this document within your organization and with your users, so that stakeholders know how all of your data is handled. The document can also act as the basis for automated data retention actions.

How to build a good data retention policy

Regardless of which industry your business operates in, you must implement a data retention policy to protect your valuable data assets and remain compliant. The processes that your data retention policy establishes should accomplish the following objectives:

  • Improve the speed and efficiency of managing and accessing data
  • Reduce costs by cutting down on storage infrastructure
  • Eliminate potential failure points and vulnerabilities inherent in huge data systems
  • Limit liability and ensure compliance with industry guidelines and government regulations
  • Explain how the above is achieved in clear language so that it can be precisely implemented and shared with your users

The following practices should be considered necessary when deciding on your data retention policy and its implementation.

Categorize and periodically audit data

Before you decide on how you will handle your data retention, you must know what data you have.

The first step to building your retention policy should be identifying what data your organization collects and categorizing it based on how critical it is, whether it is proprietary, and whether it currently serves or will serve any future business needs.

Carrying out periodic audits to determine the usefulness of stored or backed up data is important. Many organizations hold on to data longer than required because they are worried that they may need it later. This leads to unwieldy data stores that are expensive to maintain, and contain so much data that even sorting it to reduce the costs of storing it becomes a prohibitively time-consuming or expensive act in itself.

Data identification and auditing is not just the first step in creating your data retention policy. It should also be repeated regularly, for example annually or monthly, depending on how often the kinds of data you handle change. The auditing process also provides the opportunity to check and update your organization’s compliance with your existing data retention policy, as well as with your legal obligations regarding how the data is handled.
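
As a rough illustration of a periodic audit, the sketch below scans a small data inventory and flags datasets that have not been accessed for a long time. The `DatasetRecord` type, the category names, and the 365-day idle threshold are all assumptions made for this example. A real audit would read from your own catalog and apply your own criteria for usefulness.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetRecord:
    """One entry in a hypothetical data inventory."""
    name: str
    category: str        # illustrative labels, e.g. "financial", "pii", "raw_media"
    last_accessed: date  # when the data was last read or used


def find_stale(inventory, today, max_idle_days=365):
    """Return the names of datasets not accessed within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [d.name for d in inventory if d.last_accessed < cutoff]


# Made-up inventory entries, for illustration only.
inventory = [
    DatasetRecord("invoices_2019", "financial", date(2023, 11, 2)),
    DatasetRecord("campaign_raw_video", "raw_media", date(2021, 3, 15)),
    DatasetRecord("newsletter_signups", "pii", date(2024, 1, 20)),
]

stale = find_stale(inventory, today=date(2024, 6, 1))
print(stale)
```

A stale dataset is only a candidate for review. Whether it can be purged still depends on its category and any retention obligations attached to it.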

Identify legal requirements and schedule retention periods

Once your data has been categorized, you must decide on timelines for keeping it. Your different categories of data will most likely fall into one of the following groups:

  • Data to be retained indefinitely for business purposes: This will consist of proprietary business data, ranging from technical to financial information, and operating data that must be retained and protected for business continuity.
  • Data that must be retained for a set period for regulatory purposes: This may include tax records and records proving compliance with legal requirements. Some of this data may also need to be retained indefinitely.
  • Data that can be periodically purged as it is no longer required: Some data, such as third-party audience data, goes stale quickly, and can be discarded when it is no longer of any value. Other kinds of data also lose their usefulness over time — for example, the raw media assets used in publishing take up a lot of storage space, but are often no longer needed after the final product has been produced.
  • Data that must be periodically purged for regulatory purposes: Personally identifiable information (PII) is heavily regulated, and can often only be retained for the duration of the purpose it was collected for.
  • Data that must be identified for removal at a user’s request: Users may have the right to request that their data be destroyed, and it must usually be done within a certain time span after the request has been made.

The schedule and duration of each data retention period should answer two questions: how long will you retain the data, and how frequently is the data updated?

When establishing your timelines, you should account for scheduled backups and data purging activities. Backups should be taken frequently, and regular data purging may be required, including from historical backups. Automating these processes will reduce the chance of human error leading to non-compliance.
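
One way to make such a schedule concrete is to express it as data. The sketch below maps each data category to a retention period and computes when a record becomes due for purging. The category names and durations in `RETENTION_DAYS` are placeholder assumptions for illustration, not legal guidance. Substitute the periods that your own business and regulatory requirements dictate.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical schedule: category -> retention in days (None = keep indefinitely).
# These values are placeholders, not recommendations.
RETENTION_DAYS = {
    "proprietary": None,
    "tax_records": 7 * 365,
    "third_party_audience": 90,
    "pii": 365,
}


def purge_date(category: str, collected_on: date) -> Optional[date]:
    """Return the date on which a record may be purged, or None if kept indefinitely."""
    days = RETENTION_DAYS[category]
    if days is None:
        return None
    return collected_on + timedelta(days=days)


def is_due_for_purge(category: str, collected_on: date, today: date) -> bool:
    """True once the retention period for this record has elapsed."""
    due = purge_date(category, collected_on)
    return due is not None and today >= due


print(purge_date("third_party_audience", date(2024, 1, 1)))
print(is_due_for_purge("proprietary", date(2000, 1, 1), date(2024, 1, 1)))
```

Keeping the schedule in one place means that a change in policy only requires updating the table, rather than every job that reads it.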

Enact backups and backup testing

The purpose of regularly backing up is to ensure that business data can be retrieved quickly and completely, securing business continuity. If access to your data is lost, you must be confident that you can resume operations from a restored backup within a reasonable amount of time.

Regular backups must be taken — the frequency will depend on how often your data changes, which determines how fresh a backup needs to be for it to be useful. Backup policies should be enacted and checked as part of your larger data retention scheme.

When building your backup policy, consider where your backups are stored and identify what parts of the process need to be tested. Set a schedule for testing so that each backup medium is regularly assessed for fitness, and periodically run through your full recovery process to ensure that you can recover your data, and that the recovered data is fully intact.

Multiple copies of your backups should be stored in different physical locations to protect against catastrophic failure. If you are storing data in the cloud, a backup should be stored locally or with another cloud provider to protect against loss of access.
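
Checking that recovered data is fully intact can be partly automated by comparing checksums. The following is a minimal sketch that assumes file-based backups and uses SHA-256 from the Python standard library. The file names are invented for the example. In practice you would usually record the checksum when the backup is taken and compare against that stored value after a test restore.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Demonstration with temporary files standing in for a real backup and restore.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "orders.csv"
    restored = Path(tmp) / "orders_restored.csv"
    source.write_bytes(b"id,total\n1,9.99\n")

    restored.write_bytes(source.read_bytes())   # simulate a clean restore
    intact = restore_matches(source, restored)

    restored.write_bytes(b"id,total\n1,0.00\n")  # simulate a corrupted restore
    corruption_detected = not restore_matches(source, restored)

print(intact, corruption_detected)
```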

Ensure data purging and disposal measures are defined

When disposing of data, whether because it is no longer needed or for compliance purposes, it must be destroyed in such a way that it cannot be recovered.

Throwing a hard disk in the trash without properly wiping it could leave the data on it — including potentially sensitive business or customer information — readable for anyone who finds it.

Similarly, data that is deleted from your live systems but left in backups has not truly been deleted, as it could still be exposed in a data breach if that backup is later accessed. It’s important to label and categorize data at the time of collection or creation so that you can easily find it within historical backups if it needs to be redacted later.
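
The sketch below illustrates why labeling matters. Because every record carries a `user_id` label, a single routine can remove that user's data from the live store and from each backup. The stores here are plain in-memory lists and the `erase_user` helper is hypothetical. Real backups are often immutable archives, so the equivalent operation may mean rewriting or expiring a backup rather than editing it in place.

```python
def erase_user(user_id, live_store, backups):
    """Remove every record labeled with user_id from the live store and each backup.

    Returns the total number of records removed.
    """
    removed = 0
    for store in [live_store, *backups]:
        keep = [record for record in store if record["user_id"] != user_id]
        removed += len(store) - len(keep)
        store[:] = keep  # modify the list in place so callers see the change
    return removed


# Made-up records, for illustration only.
live = [
    {"user_id": "u1", "email": "a@example.com"},
    {"user_id": "u2", "email": "b@example.com"},
]
backups = [
    [{"user_id": "u1", "email": "a@example.com"},
     {"user_id": "u2", "email": "b@example.com"}],
    [{"user_id": "u1", "email": "a@example.com"}],
]

removed_count = erase_user("u1", live, backups)
print(removed_count, live, backups)
```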

Automate the processes detailed in your policy

Once you’ve decided on a data retention policy — including any backup and data purging measures — you should automate the implementation as much as possible. Manual implementation is generally unworkable for large volumes of data and increases the potential for mistakes.

Minimal administrative oversight is required for automated archiving, backup, and purging operations based on established policies. You must, however, ensure that your automations are kept up to date with your retention policies as they evolve with your data and legal requirements.
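
As a small example of what an automated purge might look like, the sketch below deletes rows whose retention period has passed from a SQLite table. The `events` table, its `purge_after` column, and the sample rows are assumptions made for this illustration. In a real deployment a job like this would run on a schedule against your actual data store, with logging so that each purge can be audited.

```python
import sqlite3


def purge_expired(conn: sqlite3.Connection, today_iso: str) -> int:
    """Delete rows whose purge_after date has passed and return how many were removed.

    Dates are stored as ISO 8601 strings (YYYY-MM-DD), which sort chronologically.
    Rows with a NULL purge_after are treated as "keep indefinitely".
    """
    cur = conn.execute(
        "DELETE FROM events WHERE purge_after IS NOT NULL AND purge_after <= ?",
        (today_iso,),
    )
    conn.commit()
    return cur.rowcount


# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, purge_after TEXT)"
)
conn.executemany(
    "INSERT INTO events (payload, purge_after) VALUES (?, ?)",
    [
        ("old click", "2024-01-31"),
        ("recent click", "2024-12-31"),
        ("contract", None),
    ],
)

deleted = purge_expired(conn, "2024-06-01")
remaining = [row[0] for row in conn.execute("SELECT payload FROM events ORDER BY id")]
print(deleted, remaining)
```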

Data retention and legal compliance

Failure to comply with regulations and laws governing data records management in your industry, state, or country could leave your organization open to criminal and/or financial sanctions.

Regulators can levy harsh fines against businesses that violate privacy regulations. Under the GDPR, for example, the most serious infringements can be fined up to €20 million or 4% of worldwide annual turnover, whichever is higher. Additionally, parties that suffer damage as a result of data breaches may have avenues to seek their own compensation if it is found that the business responsible was not compliant with privacy law, or had not met required security standards.

Below are some example laws and industry guidelines that you may need to consider when designing your data retention policy.

Regional privacy regulations (GDPR, CCPA, LGPD)

The General Data Protection Regulation (GDPR) is widely regarded as the world’s strictest privacy and security law, applying to any organization in any location that targets or collects data relating to people in the European Union.

Article 5(1)(e) of the GDPR stipulates that personal data be kept in identifiable form for no longer than is necessary for the purposes for which it is processed. Users also have the right to request that their data be corrected or, in many circumstances, deleted.

In addition to the GDPR, other jurisdictions are implementing their own legal privacy frameworks: California has implemented the California Consumer Privacy Act (CCPA) and Brazil the General Data Protection Law (LGPD).

HIPAA

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a US law whose standards are designed to protect sensitive patient information from being disclosed without the patient’s consent. Its rules require covered entities such as healthcare providers to retain HIPAA-related documentation, such as policies, procedures, and records of required actions, for a minimum of six years from the date it was created or the date it was last in effect, whichever is later. Retention periods for the medical records themselves are generally set by state law.

PCI DSS

The Payment Card Industry Data Security Standard (PCI DSS) is a set of operational and technical requirements administered globally for organizations that handle credit card information. The goal is to protect cardholder data in storage and across public networks against attacks.

Organizations are required to restrict access to cardholder data (CHD) and sensitive authentication data (SAD). SAD must not be stored by merchants or payment processors after a transaction has been authorized, and CHD should be stored only when there is a business need. Where storing CHD is unavoidable, PCI DSS requires that it be protected, for example by encrypting, truncating, or hashing the card number in storage and masking it when displayed.
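
Masking is the simplest of these techniques to illustrate. The sketch below hides all but the last four digits of a card number before it is displayed. The `mask_pan` helper is hypothetical and is not a compliance implementation. Consult the current PCI DSS text for the exact rules on which digits may be shown and to whom.

```python
def mask_pan(pan: str, visible: int = 4) -> str:
    """Return the card number with all but the last `visible` digits replaced by '*'.

    Spaces and dashes in the input are ignored.
    """
    digits = [ch for ch in pan if ch.isdigit()]
    if len(digits) <= visible:
        raise ValueError("card number is too short to mask")
    return "*" * (len(digits) - visible) + "".join(digits[-visible:])


# 4111 1111 1111 1111 is a well-known test card number, not a real account.
masked = mask_pan("4111 1111 1111 1111")
print(masked)
```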

SOX

The Sarbanes-Oxley Act (SOX) applies to financial data reliability and retention for public corporations. SOX requires publicly listed enterprises to carry out an annual audit that provides proof of accurate and secured data. Furthermore, it sets different retention periods depending on the document type, as defined by the Securities and Exchange Commission (SEC).

FERPA

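The Family Educational Rights and Privacy Act (FERPA) in the US protects the privacy of student educational records within schools and associated institutions. FERPA governs who may access and disclose these records rather than setting a single retention period, so how long an institution must hold a student’s data after they leave is generally determined by state law and institutional policy. An institution may not, however, destroy records while a request to inspect them is outstanding.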

BSA

The Bank Secrecy Act (BSA) is an anti-money laundering (AML) act in the US that requires financial institutions to keep records of cash purchases of negotiable instruments, file reports on large cash transactions, and report suspicious activities that may signify money laundering, tax evasion, and other criminal activities. The data retention period is generally five years for businesses operating under this act.

Other retention policy requirements

Other examples of record retention law include the Gramm-Leach-Bliley Act (GLBA) for financial institutions, which requires retention of privacy notices indefinitely; Equal Employment Opportunity (EEO) laws, which require employers to keep employee records for one year after termination of employment; and the Fair Labor Standards Act (FLSA), which requires employers to retain payroll records for a minimum of three years.

This is by no means an exhaustive list of the regulatory and legal requirements, and new regulations are constantly being enacted for different regions and industries. As a data professional or executive in an organization, it is your responsibility to investigate the specific legal requirements that may affect your business and integrate them into your data retention policies.

Example data retention policy

Your data retention policy will be unique to your business and the data you collect. By identifying the data you want to retain for your own ongoing use, and the data that must be retained to comply with regulation, you can set appropriate lifetimes. You should also identify data that must be destroyed, and put mechanisms in place so that your users can request their data be amended or deleted if the law requires it.

Data retention policies should be as simple as possible and written in plain language so that they can be easily followed and communicated. Policies should both dictate how you handle data, and be shared with your customers to keep them informed. Google’s data retention policy is a good example of this — it clearly communicates how the data is to be treated, and informs the user of their rights.

Data retention and customer data platforms

Customer data platforms (CDPs) collect data from a variety of sources, process it, and then store it — usually in a data warehouse or data lake. You can then use this data to gain insights into your audience and users so that you can deliver the best product possible for them.

The CDP you choose for your data pipelines should allow for the identification of potentially sensitive information and the categorization of data being stored. This lets you create and automate appropriate retention policies, ensuring that your valuable data is safe and compliant.

Further reading

This article explained the best practices for implementing data retention, and the steps you need to take to create a dynamic and robust data retention policy. Find out more about how data should be properly handled and maintained in the related RudderStack Learning Center articles.
