Data Safety Levels Framework: The foundation of how we look at data at Block
One of our foundational principles at Block is incorporating privacy and the protection of customer data into every layer of our software systems. This commitment goes beyond meeting the numerous regulatory requirements for how we process and manage customer data that we face as a financial technology company: we believe protecting this customer data is essential to building and maintaining our customers’ trust in us.
One of the biggest challenges in protecting customer data is devising a way of thinking about data sensitivity that lends itself to scalable engineering solutions: ones that can be automated and built transparently into our systems so they simply work. Data is complex, and its sensitivity varies with context. Solutions often either ignore this variance or oversimplify it, which leads to under-protected customer data or to overly rigid systems that hinder innovation and limit our ability to serve customers effectively.
In this post, we introduce the Data Safety Levels (DSL) Framework that we initially built for Cash App and have since extended across the rest of our diverse product ecosystem, including Square and TIDAL. The DSL framework forms the foundation of the way we understand data. It acknowledges the complexities of data by recognizing that data:
- Exists as part of a larger set, with sensitivity being an emergent property of the set rather than individual elements.
- Is contextual, requiring us to consider context when determining its management and usage policies.
This framework gives us a strong foundation on which to build guidelines and policies that better demonstrate not just our compliance with regulatory requirements but also our commitment to customer trust.
Our Origin Story
We had long had an internal policy for classifying and handling sensitive data, especially PCI-relevant data and Personally Identifiable Information (PII). This policy, for the most part, classified each semantic type of data as Public, Confidential, Basic PII, or Secret PII. Over time, it grew increasingly complicated, with specific requirements depending on whether particular data was covered by a PCI standard, SOX, PII, or MNPI rules. This made engineering harder, because both service and platform engineers had to be aware of the nuances of various standards and regulations when their underlying question was really, “can Security sign off on my design doc yet?” It also led to many questions to security teams like, “is this particular data type PII?” for which the answer was always (frustratingly), “well, it depends.”
Coincidentally or not, with a lot of extra time to read things on the Internet during a global pandemic, we learned about the US Centers for Disease Control and Prevention (CDC) Biosafety Level system for rating the risk levels of biological agents and approving facilities for storing and handling them. The World Health Organization also publishes laboratory biosafety manuals with more elements of this framework including the risk assessment methodology that assigns one of four levels to particular biological agents as well as laboratory safety requirements for handling biological agents at each level. The framework of assigned risk levels and increasing control requirements made sense to us as inspiration for another type of thing that we did not want to accidentally expose to people: regulated and sensitive data.
Why a Dataset-Oriented Approach?
In practice, data usually exists as part of a larger set, where the relationships between elements can affect their overall sensitivity. A phone number on its own may not be as sensitive as a phone number combined with a precise home address and transaction history. The DSL framework allows us to reason about such combinations and ensure that data is classified appropriately based on its aggregate sensitivity, not just on the sensitivity of individual elements.
For example, in our Cash App Investing operations, the DSL classification for customer data doesn’t just consider individual components like an account number or government-issued ID—it considers how these pieces combine to potentially elevate the risk of exposure. Thus, each dataset’s DSL is determined by considering the highest level of sensitivity found within its components, ensuring that we adopt the strictest safeguards when necessary.
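As a rough illustration of this dataset-oriented rule, the sketch below rates a dataset at the highest level found among its elements, and also lets a combination of elements raise the level. It is a minimal sketch, not Block's implementation: the element names, their per-element ratings, and the combination rule are all hypothetical and chosen only to mirror the phone number example above.

```python
from enum import IntEnum


class DSL(IntEnum):
    """Data Safety Levels, ordered from lowest to highest sensitivity."""
    DSL_1 = 1
    DSL_2 = 2
    DSL_3 = 3
    DSL_4 = 4


# Hypothetical per-element ratings, for illustration only.
ELEMENT_LEVELS = {
    "display_name": DSL.DSL_1,
    "phone_number": DSL.DSL_2,
    "home_address": DSL.DSL_2,
    "transaction_history": DSL.DSL_2,
    "government_id": DSL.DSL_4,
}

# Hypothetical rule: these elements together are more sensitive than any one alone.
COMBINATION_RULES = {
    frozenset({"phone_number", "home_address", "transaction_history"}): DSL.DSL_3,
}


def dataset_level(elements):
    """Return the highest DSL implied by the elements or their combinations."""
    present = set(elements)
    if not present:
        raise ValueError("a dataset needs at least one element")
    levels = [ELEMENT_LEVELS[name] for name in present]
    # A combination rule applies when all of its elements are in the dataset.
    levels += [lvl for combo, lvl in COMBINATION_RULES.items() if combo <= present]
    return max(levels)
```

With these assumed ratings, a dataset holding only a phone number rates DSL-2, the phone number together with a home address and transaction history rates DSL-3, and adding a government-issued ID raises the whole dataset to DSL-4.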
Problems the DSL Framework Addresses
The DSL Framework was developed to address the challenges described above. It provides:
- Actionable Guidance for Teams: The framework translates data sensitivity into clear data safety levels that dictate the required policies for handling specific data sets, from consumer information to merchant financials.
- Consistency Across the Organization: By using a unified framework, all teams across the various products at Block Inc. (Cash App, Square, TIDAL) have a common language and understanding of how to secure data. This consistency is crucial for ensuring we meet the highest security standards globally.
- Compliance Across Jurisdictions: Block operates in multiple jurisdictions, each with its own set of regulatory requirements for data security and privacy. The DSL Framework is an effective tool for mapping these regulatory requirements to our internal security practices. By using the DSLs as the benchmark, we can ensure that we meet or exceed the data security obligations in regions like the U.S., Europe, and Asia. Our DSL rubrics and guidelines help streamline product development by providing a uniform and self-service framework for engineering teams to ensure that all data meets the necessary standards.
- Incremental Security Controls: With four Data Safety Levels ranging on a numerical scale from DSL-1 to DSL-4 (lowest to highest sensitivity), each DSL builds on the preceding level with additional controls to ensure the security measures we apply are calibrated to the risks involved. For example, highly sensitive data, such as tax return information, is classified at DSL-4, meaning it requires stringent protections like application-layer encryption and multi-party authorization.
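The incremental structure can be sketched as a cumulative lookup, where the controls required at a level are the union of everything introduced at that level and below. This is only an illustrative sketch: application-layer encryption and multi-party authorization at DSL-4 come from the example above, while the control names assigned to DSL-1 through DSL-3 are hypothetical placeholders.

```python
# Controls introduced at each level. Only the DSL-4 entries are named in this
# post; the DSL-1 to DSL-3 entries are hypothetical placeholders.
CONTROLS_ADDED_AT = {
    1: {"access_logging"},
    2: {"encryption_at_rest"},
    3: {"role_based_access"},
    4: {"application_layer_encryption", "multi_party_authorization"},
}


def required_controls(level):
    """Return every control required at `level`, including all lower levels."""
    if level not in CONTROLS_ADDED_AT:
        raise ValueError(f"unknown DSL: {level}")
    required = set()
    for lower in range(1, level + 1):
        required |= CONTROLS_ADDED_AT[lower]
    return required
```

Because each level is a superset of the one before it, a team that has met the requirements for DSL-3 only needs to add the DSL-4 controls to move up, rather than starting from a separate checklist.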
Key Components of the DSL Framework
The DSL framework at Block is actionable for both automated and manual processes, providing a clear roadmap for platform and product development teams to understand what protections they must implement based on the data they are handling. Here are some of its critical components:
- Data Classification Rubrics: To determine a dataset’s DSL requirement, we apply rubrics designed for specific data domains to assess the dataset’s risk. We have rubrics for Consumer Personal Data, Payment Card Data, Merchant Data, and more. These rubrics standardize how we assess sensitivity, ensuring that each dataset receives a consistent and accurate classification.
- Data Safety Guidelines: The DSL framework is complemented by our Data Safety Guidelines, which define the minimum protections that systems must implement based on their DSL rating. These guidelines include measures like access controls, encryption standards, auditability, and more. Systems approved at a particular DSL must meet all the prescribed security controls for that level and any lower levels, ensuring a robust baseline of security.
- Automation and Manual Processes: The DSL framework is designed to integrate seamlessly into our workflows, leveraging automation to classify data and verify that proper protections are in place. At the same time, manual reviews ensure that our systems comply with specific regulatory requirements and address any nuanced security needs that automation alone cannot handle.
- Access and Usage Controls: The framework’s effectiveness also relies on enforcing appropriate access controls. For example, datasets classified at DSL-4, the highest level, are generally protected with multi-party authorization (MPA), ensuring that no single individual can access sensitive data without additional oversight. This prevents unilateral actions that could jeopardize data security and demonstrates our commitment to upholding customer privacy.
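Two of the checks described in this list can be sketched in a few lines: whether a system approved at one level may hold a dataset rated at another, and whether a request for DSL-4 data has approval from someone other than the requester. This is a simplified sketch under stated assumptions, not a description of Block's tooling: the `AccessRequest` type, the function names, and the choice of a single independent approver are all hypothetical.

```python
from dataclasses import dataclass

HIGHEST_LEVEL = 4  # DSL-4 is the top of the four-level scale


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical record of who wants access and who has approved it."""
    requester: str
    approvers: frozenset
    dataset_level: int


def system_can_host(system_approved_level, dataset_level):
    """A system may hold data rated at or below the level it is approved for."""
    return system_approved_level >= dataset_level


def mpa_satisfied(request):
    """Check the multi-party authorization requirement for a request.

    Levels below DSL-4 pass trivially here; their other controls are not
    modeled. At DSL-4, at least one approver other than the requester is
    assumed to be required.
    """
    if request.dataset_level < HIGHEST_LEVEL:
        return True
    independent = request.approvers - {request.requester}
    return len(independent) >= 1
```

Removing the requester from the approver set before counting is what prevents self-approval, which is the unilateral action MPA is meant to rule out.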
Real-World Examples of the DSL Framework at Work
Tokenized Payment Data
Payment card data, such as Primary Account Numbers (PANs) and Card Verification Codes, are highly sensitive and classified as DSL-4. By applying our DSL Framework, we require this data to be encrypted at the application layer before it is stored or transmitted. Fidelius, our tokenization service, manages such data to ensure it remains secure during payment processing and at rest. The DSL Framework allows downstream systems, with lower safety level capabilities, to process this data without compromising on security, as long as strict encryption standards are upheld.
Cash App Investing Data
Cash App Investing (CAI) data, such as trading patterns or Social Security Numbers (SSNs), also falls into higher DSLs—typically DSL-3 or DSL-4, depending on the specifics. The DSL classification ensures that appropriate access controls and encryption are in place, including requiring employee fingerprinting for access to the most sensitive records. This not only adheres to regulatory requirements, such as FINRA rules, but also demonstrates our commitment to proactively protecting customer data.
Tax Return Information
Tax Return Information (TRI) collected through Cash App Taxes is classified as DSL-4, given its highly sensitive nature. Complying with IRS requirements and ensuring the privacy of TRI are non-negotiable parts of our operations. The DSL Framework supports this by enforcing strict encryption, auditability, and access controls, all designed to minimize the likelihood of unauthorized disclosure or misuse.
DSL Is Just the Start
The DSL framework is live, and its scope has expanded steadily in the years since its adoption. New products, along with feedback from internal audits, security incidents, and regulatory changes, have led us to identify new semantic types and classification rubrics and to map additional data to safety levels.
Developing our perspective on data has been a collaborative effort between Security, Governance and Compliance, and, most importantly, Product. Starting from our inspiration in the biosafety levels used by the CDC and the WHO, we have intentionally challenged ourselves to understand data, its lifecycle, and its requirements in a holistic and systematic manner, knowing that automation is a must given the scale of the data we deal with.
This framework is also just the beginning of the story. Now that we have a systematic way of conceptualizing our data, we need to complement it with our Data Safety Guidelines and with an implementation of those guidelines that is scalable, automated, transparent, and seamlessly integrated into our systems.
This post is the first in a series describing some of the challenges and solutions we have encountered in this space.
Data Safety Is for Everyone
Block is committed to improving Data Safety in our community. In the coming months, we hope to open source the DSL framework and allow others to not just use and adopt this foundation but also build upon it and enhance the protection of customer data across the industry. We look forward to hearing from you.