Data Safety Levels Framework: The foundation of how we look at data at Block

One of our foundational principles at Block is incorporating privacy and the protection of customer data into every layer of our software systems. This commitment goes beyond meeting the numerous regulatory requirements we face as a financial technology company for how we process and manage customer data: we believe protecting this data is essential to building and maintaining our customers’ trust in us.

One of the biggest challenges in protecting customer data turns out to be devising a way of thinking about data sensitivity that lends itself to scalable engineering solutions: ones that can be automated and built transparently into our systems so they simply work. Data itself is complex, and its sensitivity can vary based on context. Solutions often either ignore this variance or oversimplify it, resulting in data that is under-protected or in overly rigid systems that hinder innovation and limit our ability to serve customers effectively.

In this post, we introduce the Data Safety Levels (DSL) Framework that we initially built for Cash App and have since extended across the rest of our diverse product ecosystem, including Square and TIDAL. The DSL framework forms the foundation of the way we understand data. It acknowledges the complexities of data by recognizing that data:

This framework has given us a strong foundation on which to build guidelines and policies that let us better demonstrate not just compliance with our regulatory requirements but also our commitment to customer trust.

Our Origin Story

We had long had an internal policy for classifying and handling sensitive data, especially PCI-relevant data and Personally Identifiable Information (PII). This policy, for the most part, classified each semantic type of data as Public, Confidential, Basic PII, or Secret PII. Over time, it grew increasingly complicated, with specific requirements depending on whether particular data fell under a PCI standard, SOX, PII, or MNPI. This made engineering harder, as it required both service and platform engineers to be aware of the nuances of various standards and regulations when their underlying question was really, “Can Security sign off on my design doc yet?” It also resulted in many questions to security teams like, “Is this particular data type PII?” for which the answer was always (frustratingly), “Well, it depends.”

Coincidentally or not, with a lot of extra time to read things on the Internet during a global pandemic, we learned about the US Centers for Disease Control and Prevention (CDC) Biosafety Level system for rating the risk levels of biological agents and approving facilities for storing and handling them. The World Health Organization also publishes laboratory biosafety manuals with more elements of this framework including the risk assessment methodology that assigns one of four levels to particular biological agents as well as laboratory safety requirements for handling biological agents at each level. The framework of assigned risk levels and increasing control requirements made sense to us as inspiration for another type of thing that we did not want to accidentally expose to people: regulated and sensitive data.

Why a Dataset-Oriented Approach?

In practice, data usually exists as part of a larger set, where the relationships between elements can impact their overall sensitivity. A phone number on its own may not be as sensitive as a phone number combined with a precise home address and transaction history. The DSL framework allows us to reason about such combinations and ensure that data is classified appropriately based on its aggregate sensitivity, not just on the sensitivity of individual elements.

For example, in our Cash App Investing operations, the DSL classification for customer data doesn’t just consider individual components like an account number or government-issued ID. It also considers how these pieces combine to potentially elevate the risk of exposure. Each dataset’s DSL is therefore determined by considering the highest level of sensitivity found among its components, ensuring that we adopt the strictest safeguards when necessary.
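As a minimal sketch of the "highest level wins" rule described above, the snippet below computes a dataset's level as the maximum over its elements. The four-level scale, the element names, and the level assigned to each element are assumptions made purely for illustration; they are not Block's actual classifications, and the sketch does not model cases where a combination of elements raises the level beyond that of any single element.

```python
from enum import IntEnum


class DSL(IntEnum):
    """Data Safety Levels; a higher value means more sensitive data.

    The four-level scale is an assumption for this sketch.
    """
    DSL_1 = 1
    DSL_2 = 2
    DSL_3 = 3
    DSL_4 = 4


# Hypothetical per-element levels, chosen only for illustration.
ELEMENT_LEVELS = {
    "display_name": DSL.DSL_1,
    "phone_number": DSL.DSL_2,
    "account_number": DSL.DSL_3,
    "government_id": DSL.DSL_4,
}


def dataset_level(elements):
    """Return the highest level found among the dataset's elements."""
    if not elements:
        raise ValueError("a dataset needs at least one element")
    # An unknown element name raises KeyError rather than being silently
    # treated as low sensitivity.
    return max(ELEMENT_LEVELS[name] for name in elements)
```

With these assumed levels, a dataset holding a phone number and a government-issued ID would be handled at the level of the ID, because the most sensitive element sets the level for the whole dataset.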

Problems DSL Framework Addresses

The DSL Framework was developed to address these needs. It provides:

Key Components of the DSL Framework

The DSL framework at Block is actionable for both automated and manual processes, providing a clear roadmap for platform and product development teams to understand what protections they must implement based on the data they are handling. Here are some of its critical components:

Real-World Examples of the DSL Framework at Work

Tokenized Payment Data

Payment card data, such as Primary Account Numbers (PANs) and Card Verification Codes, is highly sensitive and classified as DSL-4. Under the DSL Framework, this data must be encrypted at the application layer before it is stored or transmitted. Fidelius, our tokenization service, manages this data so that it remains secure during payment processing and at rest. As long as strict encryption standards are upheld, downstream systems with lower safety level capabilities can process this data without compromising security.
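One way to read the rule above is as a simple handling check: a system may carry a payload either because the system is approved for the payload's level or because the payload is encrypted at the application layer. The sketch below encodes that reading. The `Payload` type, the `may_handle` function, and the integer levels are hypothetical names introduced for illustration; this is not the Fidelius API or Block's actual policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payload:
    """Hypothetical description of a piece of data in transit or at rest."""
    level: int       # DSL of the underlying plaintext, e.g. 4 for a PAN
    encrypted: bool  # True if encrypted at the application layer


def may_handle(system_max_level: int, payload: Payload) -> bool:
    """Decide whether a system may store or forward this payload.

    Assumed rule: application-layer ciphertext may pass through any
    system; plaintext needs a system approved for at least its level.
    """
    if payload.encrypted:
        return True
    return payload.level <= system_max_level
```

Under these assumptions, a system approved only up to level 2 could forward an encrypted DSL-4 payload but would be refused the same payload in plaintext.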

Cash App Investing Data

Cash App Investing (CAI) data, such as trading patterns or Social Security Numbers (SSNs), also falls into the higher DSLs: typically DSL-3 or DSL-4, depending on the specifics. The DSL classification ensures that appropriate access controls and encryption are in place, including requiring employee fingerprinting for access to the most sensitive records. This not only satisfies regulatory requirements, such as FINRA rules, but also demonstrates our commitment to proactively protecting customer data.

Tax Return Information

Tax Return Information (TRI) collected through Cash App Taxes is classified as DSL-4, given its highly sensitive nature. Complying with IRS requirements and ensuring the privacy of TRI are non-negotiable parts of our operations. The DSL Framework supports this by enforcing strict encryption, auditability, and access controls, all designed to minimize the likelihood of unauthorized disclosure or misuse.
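To illustrate how level-based requirements like these could be checked automatically, the sketch below reports which required controls a datastore is still missing. Only the DSL-4 entry reflects the controls named above (encryption, auditability, access controls); the identifiers, the table layout, and the `missing_controls` function are assumptions for illustration, and requirements for other levels are deliberately left out because they are not described here.

```python
# Controls named in the text for DSL-4 data. The string identifiers are
# hypothetical; other levels are omitted because they are not specified.
REQUIRED_CONTROLS = {
    4: frozenset({"encryption", "audit_logging", "access_control"}),
}


def missing_controls(level, implemented):
    """Return the required controls that are not yet implemented.

    Raises KeyError for a level with no entry in the table, so that an
    undefined level is never mistaken for "nothing required".
    """
    required = REQUIRED_CONTROLS[level]
    return required - set(implemented)
```

A check like this could run when a service declares the data it stores, flagging a DSL-4 datastore that has encryption but lacks audit logging or access control.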

DSL Is Just the Start

The DSL framework is live and has expanded steadily over the years since its adoption. New products, as well as feedback loops from internal audits, security incidents, and regulatory changes, have led to the identification of new semantic types and classification rubrics, as well as new mappings of data to safety levels.

Developing our perspective on data has been a collaborative effort between Security, Governance and Compliance, and, most importantly, Product. Starting from our inspiration in the CDC and WHO biosafety levels, we have intentionally challenged ourselves to understand data, its lifecycle, and its requirements in a holistic and systematic manner, knowing that automation is a must given the scale of the data we deal with.

This framework is also just the beginning of the story. Now that we have a systematic way of conceptualizing our data, we need to complement it with our Data Safety Guidelines and with an implementation of these guidelines that is scalable, automated, and transparent, and that integrates seamlessly into our systems.

This post is the first in a series in which we describe some of the challenges and solutions we have encountered in this space.

Data Safety Is for Everyone

Block is committed to improving Data Safety in our community. In the coming months, we hope to open source the DSL framework and allow others to not just use and adopt this foundation but also build upon it and enhance the protection of customer data across the industry. We look forward to hearing from you.