Stream-lake-warehouse lineage is game-changing for collaboration
Validio’s next-generation data lineage works across data sources, so teams can see how data flows between them. This in turn fosters collaboration among the data teams responsible for different sources.
In other words, Validio’s lineage is not limited to data warehouses: data teams can visualize and understand lineage across all their sources, including data warehouses, object storage and streams, in a single view. For example, users can assess incidents based on the lineage from a dataset in an Amazon S3 bucket to a BigQuery table, provided that they have defined custom relations between these two sources.
Why does this matter? Companies’ data ecosystems often contain a mix of source types. With Validio, users have an overview of a larger part of those ecosystems. This feature enables:
- Better root cause analysis: users can explore issues that occur closer to the source, even as early as in the data streams
- Better impact analysis: data team members can see the impact beyond a specific source type when they plan changes or when data incidents occur
These types of analyses are not possible when data teams rely on tools that only cover lineage for a specific source system. In contrast, the Validio platform helps users carry out root cause and impact analysis when incidents happen anywhere in the data ecosystem, whether in a Kafka stream, an Amazon S3 dataset, or a BigQuery table.
If there are strange data points in the BigQuery table, users can trace them all the way up to the Kafka stream to investigate the root cause. Similarly, if a new field is to be added to a Kafka stream, the owners of the downstream Amazon S3 datasets and BigQuery tables are informed and can act accordingly. This level of collaboration is made possible by Validio’s stream-lake-warehouse lineage.
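To make the two kinds of analysis concrete, here is a minimal Python sketch that treats lineage as a directed graph and walks it in both directions. It is an illustration only and does not show Validio's API or internals: the node names (`kafka:orders_stream`, `s3:orders_dataset`, `bigquery:orders_table`) and the helper functions are hypothetical, chosen to mirror the Kafka-to-S3-to-BigQuery example above.

```python
from collections import deque

# Hypothetical lineage edges (upstream -> downstream). These names are
# assumptions for illustration, not identifiers from any real system.
EDGES = [
    ("kafka:orders_stream", "s3:orders_dataset"),
    ("s3:orders_dataset", "bigquery:orders_table"),
]


def build_adjacency(edges, reverse=False):
    """Map each node to its direct neighbours, optionally reversing edges."""
    graph = {}
    for up, down in edges:
        src, dst = (down, up) if reverse else (up, down)
        graph.setdefault(src, []).append(dst)
    return graph


def reachable(graph, start):
    """Breadth-first list of nodes reachable from start, excluding start."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream(edges, node):
    """Root cause analysis: every source that feeds into node."""
    return reachable(build_adjacency(edges, reverse=True), node)


def downstream(edges, node):
    """Impact analysis: every source that depends on node."""
    return reachable(build_adjacency(edges), node)


# Strange data points in the BigQuery table: where could they come from?
print(upstream(EDGES, "bigquery:orders_table"))
# ['s3:orders_dataset', 'kafka:orders_stream']

# A new field in the Kafka stream: which sources (and owners) are affected?
print(downstream(EDGES, "kafka:orders_stream"))
# ['s3:orders_dataset', 'bigquery:orders_table']
```

The point of the sketch is that both questions are the same traversal run in opposite directions, which only works when the graph contains edges that cross source types.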