
Level up your dbt tests to catch loud and silent data issues at scale

Thursday, Feb 15, 2024 · 4 min read
Emil Bring

Many data and analytics engineers use dbt tests to check the health of their data before running data pipelines. But as data use cases grow in complexity, relying on dbt tests alone comes with inherent challenges: they are hard to scale and offer limited support for more advanced testing.

That’s why we’re happy to announce Validio's new dbt integration. With it, you can improve your dbt tests further by leveraging Validio’s ML-powered anomaly detection to reveal unknown issues that would otherwise remain hidden.

TABLE OF CONTENTS

  • 1. Silent data issues can slip through your dbt tests
  • 2. It's hard to scale dbt tests
  • 3. How to level up your dbt tests with Validio
  • 4. dbt + data observability

    Silent data issues can slip through your dbt tests

    Silent data issues are the ones that fly under your radar: subtle issues that don’t show up in your data quality tests. This is also what makes them so harmful. Because they are hard to detect, silent data issues can linger in your data for days or even months, causing significant damage. When you set up dbt tests, you only specify what you are explicitly looking to test. But what about failures you don’t know to look for? Or areas you don’t have the time or resources to test?

    This is where you want to layer in data observability, to catch not only the loud data issues but also the silent ones that can occur at any stage of your data pipeline.

    It’s hard to scale dbt tests

    In dbt, each test is defined manually for each model. This quickly becomes challenging because dbt projects tend to grow fast. Really fast. As data sources grow in number and change over time, maintaining sufficient test coverage by hand becomes cumbersome: tests must be updated manually across massive numbers of tables and columns.

    To illustrate the magnitude of this challenge: manually creating tests for many tables can easily turn into an infeasible workload.
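    A back-of-the-envelope calculation makes the scale concrete. The sketch below assumes, purely for illustration, a project with 500 tables, an average of 20 columns per table, and an average of 2 generic tests per column (for example a `not_null` plus one other check). These numbers are hypothetical, not measured from any real project.

```python
# Hypothetical project size, chosen only to illustrate the arithmetic.
NUM_TABLES = 500
AVG_COLUMNS_PER_TABLE = 20
AVG_TESTS_PER_COLUMN = 2  # e.g. a not_null test plus one other generic test


def manual_test_count(tables: int, columns_per_table: int, tests_per_column: int) -> int:
    """Number of column-level test definitions someone has to write and maintain."""
    return tables * columns_per_table * tests_per_column


if __name__ == "__main__":
    total = manual_test_count(NUM_TABLES, AVG_COLUMNS_PER_TABLE, AVG_TESTS_PER_COLUMN)
    print(f"{total:,} column-level tests")  # 20,000 column-level tests
    # Onboarding 50 more tables of the same shape adds another batch to maintain.
    added = manual_test_count(50, AVG_COLUMNS_PER_TABLE, AVG_TESTS_PER_COLUMN)
    print(f"{added:,} more tests for 50 new tables")  # 2,000 more tests for 50 new tables
```

    The count grows linearly with every table and column added, and each schema change can touch many of those definitions.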

    That’s why our Product & Engineering Team has been hard at work on this dbt integration. It will help data teams deliver higher-quality data at greater scale by combining the power of dbt tests with Validio’s Data Trust Platform.

    How to level up your dbt tests with Validio

    1. Scale testing with automated data observability

    The ultimate goal of both testing and data observability is to identify and resolve issues, but they take different approaches. Testing is biased: it starts with a human stating a condition they expect to be true. Observability is impartial: it asks whether we know the state of the data at all times, and how it has changed.

    As organizations scale, the number of things you could test becomes practically endless. You can only write tests for a limited number of them, but you can add data observability on top of your tests to detect the issues you can’t write manual tests for.

    In Validio, dbt users can:

  • Choose between integrating with dbt Core or dbt Cloud
  • Use Validio’s UI to quickly scale tests to cover all the data assets needed
  • Monitor dbt jobs that take longer than expected
  • See detailed information about dbt tests and model runs
  • Use Validio’s Data catalog to notify data owners and collaborate on resolving issues
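    Information about dbt test results and model runs comes from the artifacts dbt writes after each invocation, such as `target/run_results.json`. As a rough illustration of the kind of data involved (not of how Validio's integration is implemented), the sketch below reads a trimmed, made-up `run_results.json` payload, lists failed tests, and flags nodes whose `execution_time` exceeds a hypothetical fixed limit. Only the `results`, `unique_id`, `status`, and `execution_time` fields are used; real files contain many more.

```python
import json

# A trimmed, made-up excerpt of a dbt run_results.json file (real files have more fields).
SAMPLE_RUN_RESULTS = """
{
  "results": [
    {"unique_id": "model.shop.orders", "status": "success", "execution_time": 412.7},
    {"unique_id": "model.shop.customers", "status": "success", "execution_time": 38.2},
    {"unique_id": "test.shop.not_null_orders_order_id.a1b2c3d4e5", "status": "pass", "execution_time": 2.1},
    {"unique_id": "test.shop.unique_customers_customer_id.f6e5d4c3b2", "status": "fail", "execution_time": 3.4}
  ]
}
"""

# Hypothetical expectation: flag any node that runs longer than this many seconds.
MAX_SECONDS = 300.0


def summarize(run_results: dict, max_seconds: float = MAX_SECONDS) -> tuple[list[str], list[str]]:
    """Return (failed or errored nodes, slow nodes) from a parsed run_results payload."""
    results = run_results["results"]
    failed = [r["unique_id"] for r in results if r["status"] in ("fail", "error")]
    slow = [r["unique_id"] for r in results if r["execution_time"] > max_seconds]
    return failed, slow


if __name__ == "__main__":
    failed, slow = summarize(json.loads(SAMPLE_RUN_RESULTS))
    print("Failed:", failed)
    print("Slow:", slow)
```

    A fixed limit is the simplest possible choice; comparing each node against its own past durations, as in the threshold examples elsewhere in this post, adapts better as models grow.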

    2. Catch loud and silent issues

    dbt shines at adding constraints to data transformations early in the pipeline, but that work tends to stay in the realm of engineers. Data scientists and analysts expect their data to be just as accurate and reliable, so they need to know if and how data changes unexpectedly anywhere in the pipeline.

    Validio’s dbt integration allows data consumers to:

  • Combine Validio’s data validation setup with information from dbt tests and model runs.
  • Leverage Validio’s ML-based thresholds to reveal anomalies otherwise hidden in segments of your data
  • Adjust sensitivity to reduce false positives and alert fatigue
  • Troubleshoot data incidents by checking the dbt models and tests associated with the affected tables in Validio’s lineage graph (e.g. a table in Snowflake that is managed by dbt)

    Validio uses machine learning to reveal anomalies in segments of your data.
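    Why segments matter: a problem in one segment can disappear inside a table-wide metric. The sketch below illustrates this with a simple per-segment band of mean plus or minus `sensitivity` standard deviations, where a larger `sensitivity` widens the band and produces fewer alerts. This is a deliberately simple statistical stand-in, not Validio's ML-based thresholds; the segment names and numbers are made up.

```python
from statistics import mean, stdev


def segment_alerts(
    history: dict[str, list[float]],
    today: dict[str, float],
    sensitivity: float = 3.0,
) -> list[str]:
    """Return segments whose latest value falls outside mean +/- sensitivity * stdev.

    A higher sensitivity widens the band: fewer false positives, but more missed anomalies.
    """
    alerts = []
    for segment, values in history.items():
        mu, sigma = mean(values), stdev(values)
        if abs(today[segment] - mu) > sensitivity * sigma:
            alerts.append(segment)
    return alerts


# Hypothetical daily order counts per country segment (made-up numbers).
history = {
    "SE": [1000, 1020, 990, 1010, 1005],
    "DE": [2000, 2300, 1750, 2250, 1700],
    "US": [300, 310, 295, 305, 290],
}
today = {"SE": 1008, "DE": 2100, "US": 40}

# The same check applied to the table-wide total, treated as a single segment.
total = {"ALL": [sum(day) for day in zip(*history.values())]}

print(segment_alerts(total, {"ALL": sum(today.values())}))  # []
print(segment_alerts(history, today))                       # ['US']
```

    The collapse in the US segment is absorbed by normal day-to-day swings in the larger DE segment, so only the segmented check raises an alert.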

    3. Put data quality into the hands of your data consumers

    As mentioned, dbt is a great tool for covering known issues before data is sent to end users. But by combining dbt tests with Validio, data quality can also be put into the hands of data consumers such as data scientists, analysts, and business users. These roles can now ensure data is reliable at every stage of their data pipelines: from transformation with dbt tests to production with Validio’s Data Trust Platform.

    In Validio, users can see detailed information about their dbt models and test results.

    dbt + data observability: break free from data silos

    dbt serves as a great starting point for engineers who want to ensure their data models behave as expected during transformation. That might be enough for small companies with low data volumes. But for any growing, data-led organization, it’s important to also layer in data observability to detect complex data changes and provide visibility into the health and performance of your data throughout its lifecycle. Otherwise, data quality quickly becomes siloed with your engineers, who turn into bottlenecks whenever business users need help with data issues.

    Elevate your dbt tests with Validio today