Level up your dbt tests to catch loud and silent data issues at scale

February 15, 2024
Emil Bring

Many data and analytics engineers use dbt tests to check the health of their data before running data pipelines. But as data use cases grow in complexity, relying on dbt tests alone comes with inherent challenges: the tests are hard to scale, and they lack more advanced testing capabilities.

That’s why we’re happy to announce Validio's new dbt integration. With it, you can strengthen your dbt tests with Validio’s ML-powered anomaly detection, which reveals unknown issues that would otherwise remain hidden.


Silent data issues can slip through your dbt tests

Silent data issues are the ones that happen under your radar: subtle issues that don’t appear in your data quality tests. This is also what makes them so damaging. Because they are hard to detect, silent data issues can linger in your data for days or even months and cause a large negative impact. When setting up dbt tests, you only provide information about what you’re specifically looking to test. But what about potential failures you don’t know about? Or areas you don’t have the time or resources to test?

This is when you want to layer in data observability, to catch not only the loud data issues but also the silent ones, which can occur at any stage of your data pipeline.

It’s hard to scale dbt tests

With dbt, each test is implemented manually as part of a model. This can quickly become challenging because dbt projects tend to grow fast, really fast. As data sources grow in number and change over time, manually maintaining sufficient test coverage becomes too cumbersome: every test has to be updated by hand across a massive number of tables and columns.

Let’s illustrate the magnitude of this challenge:
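
As a rough, back-of-the-envelope sketch, the snippet below counts how many test declarations a team would have to write and maintain by hand. All numbers (500 models, 20 columns per model, 3 tests per column) are hypothetical and chosen only for illustration; they are not figures from Validio or dbt.

```python
# Hypothetical project size; every number here is an illustrative assumption.
models = 500             # dbt models (tables/views) in the project
columns_per_model = 20   # average number of columns per model
tests_per_column = 3     # e.g. not_null, unique, accepted_values


def manual_test_count(models: int, columns_per_model: int, tests_per_column: int) -> int:
    """Number of test declarations that must be written and maintained by hand."""
    return models * columns_per_model * tests_per_column


total = manual_test_count(models, columns_per_model, tests_per_column)
print(f"{total:,} test declarations to maintain")  # 30,000 test declarations to maintain

# Adding 50 models grows the maintenance burden proportionally.
extra = manual_test_count(50, columns_per_model, tests_per_column)
print(f"+{extra:,} declarations for 50 new models")  # +3,000 declarations for 50 new models
```

Even under these modest assumptions, any change to a testing convention has to be repeated across tens of thousands of declarations.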

That’s why our Product & Engineering Team has been hard at work to enable this dbt integration. It will help data teams deliver higher quality data at a greater scale, by combining the power of dbt tests with Validio’s Data Trust Platform.

How to level up your dbt tests with Validio

1. Scale testing with automated data observability

The ultimate goal of both testing and data observability is to identify and resolve issues. However, they take different approaches. Testing is biased: it comes from a human stating a condition they expect to be true. Observability is impartial: it asks whether we know the state of the data at all times, and how it has changed.

As organizations scale, the number of things you could test becomes practically endless. You can only write tests for a limited number of them, but you can add data observability on top of your tests to help detect the issues you can’t write manual tests for.
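
To make the distinction concrete, here is a minimal sketch contrasting a hand-written test (a fixed condition) with a simple history-based check. The row counts, the `min_rows` limit, and the three-standard-deviation rule are assumptions for illustration only; this is not Validio's anomaly detection algorithm.

```python
from statistics import mean, stdev

# Hypothetical daily row counts for one table (illustrative data only).
history = [10_120, 9_980, 10_050, 10_210, 9_940, 10_080, 10_150]
today = 7_300  # today's load


def fixed_test(row_count: int, min_rows: int = 5_000) -> bool:
    """A manual test: passes if a condition someone thought of holds."""
    return row_count >= min_rows


def deviates_from_history(value: float, past: list[float], k: float = 3.0) -> bool:
    """A simple observability-style check: flag values more than
    k standard deviations away from the historical mean."""
    mu, sigma = mean(past), stdev(past)
    return abs(value - mu) > k * sigma


print(fixed_test(today))                      # True: the fixed rule still passes
print(deviates_from_history(today, history))  # True: the drop is flagged as unusual
```

The fixed rule only catches what its author anticipated, while the history-based check reacts to any large departure from past behavior. Raising `k` makes the check less sensitive, which trades fewer false alarms for a higher chance of missing small shifts.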

In Validio, dbt users can:

  • Choose between integrating with dbt Core or dbt Cloud
  • Use Validio’s UI to quickly scale tests to cover all the data assets needed
  • Monitor for dbt jobs that tend to take longer than expected
  • See detailed information about dbt tests and model runs
  • Use Validio’s Data catalog to notify data owners and collaborate on resolving issues
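
As an illustration of monitoring for slow dbt jobs, the sketch below scans data shaped like dbt's `run_results.json` artifact, assuming each entry in `results` carries a `unique_id`, a `status`, and an `execution_time` in seconds. The sample values, the `expected_seconds` baselines, and the `slow_models` helper are hypothetical; this is not how Validio's integration is implemented.

```python
# Trimmed, hypothetical sample shaped like dbt's run_results.json artifact.
run_results = {
    "results": [
        {"unique_id": "model.shop.stg_orders", "status": "success", "execution_time": 12.4},
        {"unique_id": "model.shop.fct_revenue", "status": "success", "execution_time": 310.9},
        {"unique_id": "test.shop.not_null_orders_id", "status": "pass", "execution_time": 1.1},
    ]
}

# Hypothetical expected durations in seconds, e.g. averaged from earlier runs.
expected_seconds = {
    "model.shop.stg_orders": 10.0,
    "model.shop.fct_revenue": 120.0,
}


def slow_models(results: dict, expected: dict[str, float], tolerance: float = 1.5) -> list[str]:
    """Return nodes whose run time exceeded `tolerance` times their expected duration."""
    slow = []
    for result in results["results"]:
        node_id = result["unique_id"]
        baseline = expected.get(node_id)
        if baseline is None:
            continue  # no baseline for this node (here: the test node)
        if result["execution_time"] > tolerance * baseline:
            slow.append(node_id)
    return slow


print(slow_models(run_results, expected_seconds))  # ['model.shop.fct_revenue']
```

In practice the dictionary would be loaded from the artifact file that dbt writes after a run, and the baselines would come from historical runs rather than a hard-coded mapping.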

2. Catch loud and silent issues

dbt shines at adding constraints to data transformations early in the pipeline, but that work tends to stay in the realm of engineers. Data scientists and analysts expect their data to be just as accurate and reliable, so they need to know if and how data changes unexpectedly anywhere in the pipeline.

Validio’s dbt integration allows data consumers to:

  • Combine Validio’s data validation setup with information from dbt tests and model runs
  • Leverage Validio’s ML-based thresholds to reveal anomalies otherwise hidden in segments of your data
  • Adjust sensitivity to reduce false positives and alert fatigue
  • Troubleshoot data incidents by checking the dbt models and tests associated with the affected tables in Validio’s lineage graph (e.g. a table in Snowflake that is managed by dbt)

Figure: Validio uses machine learning to reveal anomalies in segments of your data.
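
The idea of anomalies hidden in segments can be sketched as follows: a metric looks normal in aggregate while one segment has shifted sharply. The data, the segment names, and the simple deviation rule are all hypothetical. Validio's ML-based thresholds are not described in this post and are not reproduced here.

```python
from statistics import mean, stdev

# Hypothetical daily order counts per country; the last value is today.
orders = {
    "SE": [500, 520, 480, 510, 490, 505, 515],
    "DE": [800, 830, 770, 815, 785, 805, 820],
    "NO": [60, 62, 58, 61, 59, 60, 20],  # small segment that collapses today
}


def is_anomalous(series: list[float], k: float = 3.0) -> bool:
    """Flag the latest value if it is more than k standard deviations
    from the mean of the earlier values."""
    *past, latest = series
    return abs(latest - mean(past)) > k * stdev(past)


# Aggregate view: sum across all segments for each day.
total = [sum(day) for day in zip(*orders.values())]

print("total:", is_anomalous(total))      # total: False
for segment, series in orders.items():
    print(segment, is_anomalous(series))  # only NO prints True
```

The drop in the small segment disappears in the noise of the larger ones, so a check on the aggregate alone would miss it. The parameter `k` plays the role of a sensitivity setting: a higher value produces fewer alerts.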

3. Put data quality into the hands of your data consumers

As mentioned, dbt is a great tool for covering known issues before data is sent to end users. But by combining dbt tests with Validio, data quality can also be put into the hands of data consumers like data scientists, analysts, and business users. These roles can now ensure data is reliable at each stage of their data pipelines: from transformation with dbt tests to production with Validio’s Data Trust Platform.

dbt + data observability: break free from data silos

dbt serves as a great starting point for engineers who want to ensure their data models behave as expected during transformation. That might be enough for small companies with low data volumes. But for any growing, data-led organization, it’s important to also layer in data observability to detect complex data changes and provide visibility into the health and performance of your data throughout the data lifecycle. Otherwise, data quality quickly becomes siloed with your engineers, who then become bottlenecks when business users need help with data issues.

Elevate your dbt tests with Validio today
