
The Data Trust Matrix

Wednesday, Jun 14, 2023 · 6 min read
Patrik Liu Tran

Are data teams delivering business value?

For over a decade, data has been called “the new gold”. Yet many data teams still struggle to deliver the business value that was supposed to justify huge investments in people and infrastructure. Data quality in particular is a major blocker for both business value and data trust. This raises the question: is it time for data teams to reevaluate how they approach data quality and the value they deliver?

In this article, I’ll summarize and discuss:

  • Learnings from discussions with over 400 data teams.
  • How data teams can deliver business value without bad data getting in their way—using the Data Trust Matrix as a guide.

Are data teams going through an existential crisis?

    With the economic conditions of 2023, every company dollar is scrutinized. As a result, every function, tool, and team member must prove their value to their organization if they want to stay—data teams and their tooling are no exception. 

I know from first-hand experience that plenty of companies do an excellent job at this. They keep the business case spreadsheet close at hand and have no problem defending their cloud bill, which is a negligible cost compared to the value their data products deliver. Babyshop Group, with Marcus Svensson at the helm, is an excellent example.

    On the other end of the spectrum, some data teams struggle to argue for their own return on investment. Benn Stancil wrote an article titled “Do data teams have product-market fit?” which I think is a brilliant read—excellently titled. The article highlights the existential question of why data teams are needed.

    The two choices data teams have to make

    Modern teams solve data quality through data observability, and for that, they must make two choices: 

    1. How deep to go on data validation
    2. How focused to be on critical assets
[Figure: The Data Trust Matrix with its four quadrants: Alert Fatigue, False Security Corner, Trust Erosion Lot, and Data Trust Zone]

    When getting started with data observability, data teams must make two choices: How deep to go on data validation, and how focused to be on business-critical assets.

    1. Degree of depth

Data teams can go either deep or shallow on data observability. Shallow data observability covers the “easy”, basic validations of data quality: metadata monitoring of volume, freshness, schema, and null value distributions. Deep data observability, on the other hand, offers automated and in-depth validation of the actual data in each field, on top of the basic metadata checks, and includes:

  • Machine Learning-based anomaly detection, advanced data distributions, and row-level granularity
  • Dynamic thresholds that adapt to data over time and detect outliers from trends and seasonality patterns
  • Dynamic segmentation that slices a dataset into relevant subsegments and validates each subsegment independently using dynamic thresholds. This lets you find anomalies in detailed segments of your data you didn’t know to look for
  • Doing all of the above at scale, in real time, and preferably at a low cost (a simplified sketch of dynamic thresholds and segmentation follows this list)
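
To make dynamic thresholds and dynamic segmentation a bit more concrete, here is a minimal sketch in Python using pandas. This is not Validio’s implementation: the column names (timestamp, market, order_value) are made up, and a rolling mean ± 3 standard deviations stands in for the ML-based models (with trend and seasonality handling) that a real deep-observability tool would use. The point is the shape of the technique: slice the data into segments and give each segment its own adaptive threshold.

```python
import pandas as pd

def dynamic_threshold_anomalies(df: pd.DataFrame, segment_col: str,
                                value_col: str, window: int = 30,
                                n_sigmas: float = 3.0) -> pd.DataFrame:
    """Flag values outside a rolling mean +/- n_sigmas band, computed
    independently per segment. A simplified stand-in for ML-based
    dynamic thresholds that also model trend and seasonality."""
    results = []
    for _, group in df.groupby(segment_col):
        g = group.sort_values("timestamp").copy()  # assumes a "timestamp" column
        rolling = g[value_col].rolling(window, min_periods=5)
        mean, std = rolling.mean(), rolling.std()
        g["lower"] = mean - n_sigmas * std
        g["upper"] = mean + n_sigmas * std
        g["is_anomaly"] = (g[value_col] < g["lower"]) | (g[value_col] > g["upper"])
        results.append(g)
    return pd.concat(results)

# Toy example: validate order values per market independently.
orders = pd.DataFrame({
    "timestamp": list(pd.date_range("2023-01-01", periods=90, freq="D")) * 2,
    "market": ["SE"] * 90 + ["DE"] * 90,
    "order_value": [100.0] * 179 + [10_000.0],  # one obvious outlier in DE
})
print(dynamic_threshold_anomalies(orders, "market", "order_value")
      .query("is_anomaly"))
```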

2. Degree of focus

The second choice data teams have to make regarding data observability is how focused to be. Should they apply the same set of checks to every dataset, or should they take a more focused approach that differentiates between high-value and low-value data assets?

It is important to acknowledge that data is never perfect; you always need to be prepared for potential data quality issues. And since the amount of data companies collect keeps growing rapidly, it is important to prioritize the data assets that really matter. It is common to see 5-15% of a company’s data provide 85%+ of the business value–not all data is equally valuable. To achieve an outsized impact as a data team, while saving costs on infrastructure and tooling, you should therefore focus on the 5-15% of your data assets that matter most for the business.

Focus also helps organizations design reasonable processes around data quality issues. Identifying an issue is one thing; promptly responding to and mitigating it, with the right stakeholders involved, is another. A focused approach is crucial for enabling those stakeholders to address identified issues in a timely manner.

    Let’s dive into the implications of each choice the data teams have to make.

    The implications of getting it wrong

The two choices of depth and focus combine into four possible outcomes. Let’s look at what happens when data teams get them wrong.

    False Security Corner

[Figure: Data Trust Matrix with the False Security Corner quadrant highlighted]

Data teams enter the False Security Corner when they apply shallow validation with low focus, that is, basic checks on all data assets. This level of data observability mainly answers the question “Did all of my data pipelines run?”, but it says very little about the actual quality of the data from a business perspective, which is essential for delivering real value. Companies in this corner initially get a false sense of security about data quality, until business-critical issues slip through.
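
For contrast, shallow observability boils down to something like the sketch below, written against a generic SQL warehouse through a hypothetical run_query helper that returns a single scalar (the table and column names are illustrative). It confirms that data arrived recently, in roughly the expected volume, and with few nulls, but says nothing about whether the values themselves make business sense.

```python
from datetime import datetime, timedelta

def shallow_checks(run_query, table: str, ts_col: str,
                   max_staleness: timedelta = timedelta(hours=24),
                   min_rows: int = 1_000,
                   max_null_rate: float = 0.01) -> dict:
    """Basic metadata checks only: freshness, volume, and null rate.
    Answers "did my pipelines run?", not "is the data correct?".
    Assumes run_query returns plain Python scalars (a naive datetime
    for MAX(ts_col), ints for the counts)."""
    latest_ts = run_query(f"SELECT MAX({ts_col}) FROM {table}")
    row_count = run_query(f"SELECT COUNT(*) FROM {table}")
    null_count = run_query(f"SELECT COUNT(*) FROM {table} WHERE {ts_col} IS NULL")
    return {
        "fresh": datetime.utcnow() - latest_ts <= max_staleness,
        "volume_ok": row_count >= min_rows,
        "null_rate_ok": null_count / max(row_count, 1) <= max_null_rate,
    }

# e.g. shallow_checks(run_query, "analytics.orders_daily", "created_at")
```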

    Trust Erosion Lot

[Figure: Data Trust Matrix with the Trust Erosion Lot quadrant highlighted]

    Next up is the Trust Erosion Lot; this is where data teams go shallow (basic checks) on a few key datasets. The good thing is that the data assets being focused on are selected based on the value they deliver to the organization. For example, they might be the key tables required for marketing optimization or a pricing algorithm. However, without going deep on validation, bad data can slip through the cracks. When (not if) this happens, data trust is significantly eroded in the organization.

Alert Fatigue

[Figure: Data Trust Matrix with the Alert Fatigue quadrant highlighted]

If data teams instead decide to go deep on all data assets, business-critical and not, they will likely experience massive alert fatigue. Applying deep and sophisticated validators to all data in your warehouse, lake, or stream causes a massive overload–you will find a lot of data quality issues. People stop responding to alerts because most of them concern data assets that don’t matter to the business. Prioritizing the most critical data assets is crucial; if everything is important, nothing is.

    Applying deep and sophisticated validators on all data in your warehouse, lake, or stream causes a massive overload; if everything is important, nothing is.

    Getting it right: The Trust Zone

[Figure: Data Trust Matrix with the Data Trust Zone quadrant highlighted]

    What data teams should be doing instead is going deep on the business’s most critical data assets. This takes them to the Data Trust Zone. How this usually works in practice is: 

    1. The data team starts by connecting and building relationships with key business stakeholders and decision-makers in their organization.
    2. They ask these stakeholders how data can drive business value for the organization. Specifically, they want to identify what data use cases are most important to the business.
    3. For the most high-value data assets, the teams define what data quality means together with business stakeholders (remember that data quality always depends on context and use case).
4. Now they have identified the business-critical data assets and how their data quality is defined. The remaining step is to make sure they have a data observability solution that supports the focus and depth the team needs, and to design a process for handling data quality issues with the right stakeholders (a hypothetical configuration sketch follows this list). This makes them fully confident in the data that matters most to the business.
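
In practice, step 4 often boils down to configuration. The sketch below is hypothetical (the asset names, validator names, and alert channels are illustrative, not Validio’s API), but it shows how both choices can be encoded in one place: which assets are business-critical enough to get deep validation and a clear owner, and what the long tail gets by default.

```python
# Hypothetical observability configuration: focus (which assets are
# business-critical) and depth (which validators run on them) in one place.
OBSERVABILITY_CONFIG = {
    # Business-critical assets: deep validation, routed to the owning team.
    "analytics.orders_daily": {
        "owner": "pricing-team",
        "validators": ["freshness", "volume", "schema",
                       "numeric_distribution", "dynamic_segmentation"],
        "alert_channel": "#pricing-oncall",
    },
    "analytics.marketing_spend": {
        "owner": "growth-team",
        "validators": ["freshness", "volume", "schema", "relative_volume"],
        "alert_channel": "#growth-oncall",
    },
    # Everything else: shallow metadata checks, batched into a low-noise digest.
    "*": {
        "validators": ["freshness", "volume", "schema"],
        "alert_channel": "#data-quality-digest",
    },
}
```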

As mentioned, one of my favorite examples of a data team that has successfully established data trust in its organization is our customer Babyshop Group. Read more here about how Babyshop applies Deep Data Observability to one of their most important data use cases.

    To sum up, bad data undermines the efforts of data teams to deliver business value. To avoid this, they need to establish data trust by making sure to:

    1. Focus on the data assets that are critical for the business—this is how they can prove return on investment.
    2. Go deep on validating these assets with intelligent and granular data validation rules. 

    The Data Trust Matrix is a simple framework that helps data teams establish data trust by assessing and improving their focus and depth of data observability.

I hope this article has given you some insights and inspiration on improving data quality for your business’s most valuable assets. As data teams, we are responsible for delivering business value and ensuring data trust. If you want to learn more about how to apply the Data Trust Matrix in your organization, feel free to reach out to me on LinkedIn or contact my team.
