Product Updates

Validio introduces Data Catalog

Tuesday, Jan 30, 2024
4 min read
Emil Bring

Last year we introduced data lineage for improved data observability. Now, we’re taking the next big step in providing users with an all-in-one Data Trust Platform: data cataloging.

The backbone of getting returns on data investments is knowing which data assets to prioritize. But making sense of what data your organization has, where it came from, and who owns it can be challenging for anyone. That is what data catalog tools generally aim to solve. However, while cataloging makes data easier to discover and manage, any issues in the actual data still need to be caught and fixed.

That’s why we’re excited to announce Data Catalog in Validio’s platform, where deep data quality & observability and end-to-end lineage are already integral parts of our core offering. 

By unifying Data Observability & Quality, Lineage, and Catalog features, Validio becomes the first all-in-one platform for data trust. 

Why are data catalogs important for your data quality? 

At its core, a data catalog is an inventory of all the data you own or use in an organization. It collects and organizes metadata (data about your data) for all your data assets (such as tables or streams). This brings multiple benefits:

  • Get insights into your data assets, such as ownership, popularity, and utilization
  • Easily discover data assets with global search and filtering, and see who to contact when issues occur
  • Understand downstream impact of issues using detailed lineage and traceability
  • Collaborate and share data with other users and stakeholders, breaking free from data silos
  • Facilitate compliance with regulations, such as the EU AI Act or financial regulations, by improving data quality and data governance

Many companies treat all data assets as if they were equally important, but as few as 10% generally drive business value. By using data catalogs to understand which data assets are important for the business, companies can validate and improve the data that matters, while managing and optimizing data usage and storage to save costs. It doesn't make sense to have extensive data validation in place for unused data.

    Prioritizing your data is critical to delivering business value

    A well-operated data catalog with metrics for utilization, validation coverage, and compliance adherence makes it much easier to identify which data assets are valuable for your business. Then you will know where to set up deep data validation to catch both loud and silent data issues.
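As a rough illustration of this kind of prioritization, the sketch below ranks assets that are heavily used but poorly covered by validation. It does not use Validio's API; the `Asset` fields, the thresholds, and the function name are all hypothetical and only stand in for the utilization and coverage metrics a catalog might expose.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    reads_last_30d: int     # hypothetical utilization metric
    schema_coverage: float  # hypothetical: fraction of fields validated, 0.0 to 1.0


def validation_gaps(assets, min_reads=100, min_coverage=0.8):
    """Return heavily used assets with low validation coverage, most-read first.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    gaps = [
        a for a in assets
        if a.reads_last_30d >= min_reads and a.schema_coverage < min_coverage
    ]
    return sorted(gaps, key=lambda a: a.reads_last_30d, reverse=True)


assets = [
    Asset("orders", reads_last_30d=5000, schema_coverage=0.4),
    Asset("tmp_export", reads_last_30d=3, schema_coverage=0.0),
    Asset("customers", reads_last_30d=900, schema_coverage=0.9),
    Asset("payments", reads_last_30d=1200, schema_coverage=0.5),
]

for asset in validation_gaps(assets):
    print(asset.name, asset.reads_last_30d, asset.schema_coverage)
```

In this toy data, the rarely read `tmp_export` table is skipped even though it has no validation at all, which mirrors the point that extensive validation for unused data is not worth the effort.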

    Setting up sufficient validation for your most valuable data assets enables disparate users and stakeholders to collaborate more closely when resolving issues and improving data quality. You can then maintain an efficient Data Trust Workflow across your organization, as illustrated below.

    The data prioritization flywheel

    1. Prioritize your data assets (that is, identify which data matters most to the business)
    2. Validate your most critical data with deep and automated tests
    3. Improve data quality through extensive resolution and close collaboration

    The Data Trust Workflow is made up of three steps: Prioritize, Validate, Improve.

    Prioritizing data assets and improving data quality are both heavily enabled by the data catalog feature. Combining that with Validio’s capabilities to go deep on data validation completes the prioritization flywheel, making Validio’s Data Trust Platform one single place to power your data quality and improve return on data investments.

    Key features of Validio’s Data Catalog 

    Let’s dive into how the Data Catalog works.

    Validio’s Data Catalog was designed for technical and non-technical users alike. The goal was to make it straightforward for users to manage and search for data assets, and to understand how those assets should be prioritized to drive business value and save costs. Here is a list of the current key features:


  • Utilization statistics: See how frequently your tables are being accessed (read) vs modified (written) and view the most recent queries that have been run. Tables with low utilization should be considered for moving to cold storage to save costs.
  • Popularity: See which tables are heavily or rarely used, as a proxy for deciding where to optimize costs and where to focus monitoring. Highlight which tables are most commonly joined with a given table. See which users query a table most frequently.
  • Schema coverage: See how many of a data asset’s fields are covered by data validation.
  • Quality: See what share of all validated datapoints are detected as anomalies.
  • Manage ownership and access control: Set ownership of data assets to drive governance and accountability. Manage access control lists for increased security.
  • Import/attach metadata tags & descriptions: See table and column-level metadata (i.e. tags and documentation), either user-defined or imported from external sources, and improve data classification (such as PII).
  • Global search function and filter: Find and filter the data you need based on its metadata tags and descriptions.
  • Integrated data lineage: Trace how your data assets move across your data warehouse and understand their relationship and ownership. Facilitate collaboration between different users, teams, and data asset owners.
  • Field-level lineage shows how data travels in your organization, which facilitates collaboration across teams and simplifies root-cause analysis when data issues occur.
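To make a few of the metrics above concrete, here is a minimal sketch of how schema coverage, the anomaly share, and a low-utilization check could be computed. This is not Validio's implementation: the function names, the inputs, and the `min_reads` threshold are assumptions chosen for illustration.

```python
def schema_coverage(fields, validated_fields):
    """Fraction of an asset's fields covered by at least one validation."""
    all_fields = set(fields)
    if not all_fields:
        return 0.0
    # Ignore validated names that are not part of the asset's schema.
    return len(all_fields & set(validated_fields)) / len(all_fields)


def anomaly_rate(validated_datapoints, anomalies):
    """Share of validated datapoints that were detected as anomalies."""
    if validated_datapoints == 0:
        return 0.0
    return anomalies / validated_datapoints


def is_cold_storage_candidate(reads, min_reads=10):
    """Flag a table as low-utilization when it is read fewer than min_reads times.

    The threshold is a hypothetical default, not a recommended value.
    """
    return reads < min_reads


print(schema_coverage(["id", "email", "amount", "created_at"], ["id", "amount"]))
print(anomaly_rate(validated_datapoints=2000, anomalies=30))
print(is_cold_storage_candidate(reads=2))
```

Here two of the four fields are validated, so coverage is one half, and a table read only twice in the chosen window is flagged as a candidate for cold storage.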

    Easily manage and understand your data in Validio’s Data Trust Platform

    Managing data can be challenging, especially if you work in a large organization with lots of data to unpack. Our Data Catalog is an efficient way to discover, manage, and understand your data. By combining this with deep data observability and lineage, Validio gives you complete trust in the data that matters most for your use cases.

    See how it works for yourself