Validate left of the warehouse—for real
The data warehouse is often the heart of the modern data stack, and Validio can validate tables in all common warehouses. In addition, Validio helps pinpoint the exact location of bad data, down to the record level. What's more, you can even choose to egress bad data points to a data location of your choice. Read more about Validio's data warehouse integrations here.
Data often lands in a data lake before it's loaded into the data warehouse, which is why Validio supports validation for data in all major data lakes. As with databases and data warehouses, Validio can write out individual bad data points to a data lake folder or bucket of your choice for automated resolution or manual inspection. Read more about data lake integrations here.
Validio supports validation for data in all major streaming services. As with the other storage types, Validio can write out individual bad data points to a data stream topic of your choice for automated resolution or manual inspection. Read more about Validio's data stream integrations here.
Configure for Deep Data Observability
Validio validators use state-of-the-art methods from multivariate statistics and information theory to derive accurate quality metrics from the actual data, not just from superficial metadata. In other words, our validators operate on data at all levels and combinations, not only at the surface. The platform hosts the market's most exhaustive list of aggregate validators for any data observability need, including data freshness, data volume (including relative volume), all measures of data distribution, categorical statistics, referential data, and much more.
Validio's Deep Data Observability goes deeper than the aggregates and provides powerful validators for individual datapoints, which are necessary to fully operationalize data. These validators cover anomaly detection, formatting errors, unreasonable dates, values belonging to a predefined set, booleans, and more.
Custom SQL validators
In addition to Validio's aggregate and datapoint validators, the platform supports custom SQL statements to check business-specific logic and semantics.
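Validio's own configuration syntax isn't shown here, but the idea can be sketched in plain SQL (run below via Python's sqlite3 so the example is self-contained): a business rule such as "order totals must be non-negative" becomes a query that returns the violating rows. The orders table and the rule itself are hypothetical.

```python
import sqlite3

# In-memory sample table standing in for a warehouse table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 99.5), (2, -10.0), (3, 0.0)])

# A business-specific rule expressed as SQL: order totals must be non-negative.
# Any rows returned are violations of the rule.
violations = conn.execute(
    "SELECT id, total FROM orders WHERE total < 0"
).fetchall()

print(violations)  # -> [(2, -10.0)]
```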
To fully operationalize data quality at scale, Validio is built to validate data in real time, without delays for your business. This is a must-have for validating data in streams. Read more about Validio's data stream integrations here.
Don't let your data validation cadence be limited by tooling. Instead, let Validio adapt to your data and business workflows and run our validators in real time, every minute, hour, day, week, or any other cadence you choose.
Make it yours
Validio features dynamic segmentation across any number of data features, so you can quickly understand if something went wrong even within the most granular segment. The platform elegantly handles thousands of segments and is intelligent enough to recognize when new ones show up in the data. Segmentation with Validio works out of the box, without crude manual GROUP BY statements or separate static tables per segment.
Dynamic segmentation acts as root-cause analysis on steroids: it immediately identifies where the issue occurred and lets you know.
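For contrast, here is a sketch of the manual approach that dynamic segmentation replaces: one hand-written GROUP BY per metric, which must be rewritten whenever a new segment dimension is added. The table and columns are made up for illustration.

```python
import sqlite3

# In-memory sample table standing in for a warehouse table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, device TEXT, revenue REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("SE", "ios", 10.0), ("SE", "android", 0.0),
    ("US", "ios", 25.0), ("US", "android", 5.0),
])

# Manual segmentation: one query per metric, edited by hand every time a
# new segment dimension (country, device, ...) needs to be covered.
rows = conn.execute("""
    SELECT country, device, AVG(revenue)
    FROM events
    GROUP BY country, device
""").fetchall()
print(rows)
```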
Filters let you specify which raw data should be validated. For example, if you only want to look at the subset of data where the feature country is not NULL, filters let you do exactly that. Think of a filter as a WHERE clause in SQL, increasing the specificity and customization of your validators.
Aggregations allow you to specify how data should be transformed before being validated, e.g. count, average, or relative entropy.
Windows allow you to specify a start and an end point for any validation, so you can be in full control of how often data is validated or how many records are collected before analysis is performed.
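As a rough sketch of how these three controls compose (the record layout and field names are hypothetical, and this is plain Python rather than Validio's actual configuration): filter rows like a WHERE clause, collect them into fixed-size windows, then aggregate each window into a metric.

```python
from statistics import mean

# Hypothetical incoming records.
records = [
    {"country": "SE", "latency_ms": 120},
    {"country": None, "latency_ms": 999},
    {"country": "US", "latency_ms": 80},
    {"country": "SE", "latency_ms": 140},
    {"country": "US", "latency_ms": 100},
]

# Filter: like a SQL WHERE clause, keep only rows where country is not NULL.
filtered = [r for r in records if r["country"] is not None]

# Window: collect a fixed number of records before any validation runs.
window_size = 2
windows = [filtered[i:i + window_size]
           for i in range(0, len(filtered), window_size)]

# Aggregation: reduce each window to a single metric, here average latency.
averages = [mean(r["latency_ms"] for r in w) for w in windows]
print(averages)
```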
Validio features smart rules that change over time as more data is ingested into the system—even accounting for seasonal trends and patterns. Validio scans historical data to recommend appropriate thresholds for what values are considered normal versus anomalous, meaning Validio can provide insights in less than five minutes (the time it takes to install the platform—no need to wait multiple days for algorithm training).
Smart rules are especially useful for large data infrastructures with many tables or sources where manual rule-setting simply wouldn’t scale. With Validio, it’s not necessary to maintain and update rule thresholds—saving data teams tons of time.
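Validio's threshold-recommendation algorithm isn't public; a minimal sketch of the underlying idea, deriving a "normal" band from historical values with a simple mean-plus-k-standard-deviations rule, could look like this (the data and the choice of k are illustrative):

```python
from statistics import mean, stdev

# Historical daily row counts for a table (illustrative data).
history = [1000, 1020, 980, 1010, 990, 1005, 995]

# A common baseline: flag values more than k standard deviations from the mean.
k = 3
mu, sigma = mean(history), stdev(history)
lower, upper = mu - k * sigma, mu + k * sigma

def is_anomalous(value: float) -> bool:
    """Return True if the value falls outside the learned band."""
    return not (lower <= value <= upper)

print(is_anomalous(1002))  # False: within the learned band
print(is_anomalous(400))   # True: far below normal volume
```

A production system would also account for trend and seasonality, as the text above notes; this sketch shows only the static case.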
Validio allows you to add manual thresholds to any validator. This helps you to capture business logic and domain knowledge that stays relatively constant over time.
Avoid alert fatigue
Validio features an intuitive user interface that leverages the strengths unique to a graphical UI, such as batch operations and smart recommendations, for comprehensive data quality. Alerting a co-worker of an anomaly is as easy as copy-pasting a URL. Unlike many data observability platforms, Validio's user interface provides a fully flexible time scale, so data can be inspected at multiple granularities, whether second-by-second as data enters the system in real time or across weekly batch jobs.
Validio integrates seamlessly with the collaboration tools your team uses, such as Slack, Teams, PagerDuty, and email. Read more about Validio's tool integrations here.
Automate resolution of bad data
Validio doesn't just alert you about bad data; the platform also puts the power to decide what to do with that data in your hands.
With Validio's egress feature, you can write out bad data to a separate data destination of choice. Based on this, it's possible to automate data fixes so bad data doesn’t break downstream pipelines—especially when some ratio of data is expected to be bad. Read our data quality whitepaper here to learn more.
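Conceptually, egress is a partition step: records that pass validation continue downstream, while failing records are routed to a separate destination. A toy sketch, with a hypothetical validator and record shape:

```python
# Hypothetical validator: flag records with a negative amount as bad.
records = [{"id": 1, "amount": 50},
           {"id": 2, "amount": -7},
           {"id": 3, "amount": 12}]

good, bad = [], []
for record in records:
    # Route each record to the matching destination based on the check.
    (good if record["amount"] >= 0 else bad).append(record)

# "Egress": bad records go to a separate destination (here, just a list)
# so downstream pipelines only consume validated data.
print(len(good), len(bad))  # -> 2 1
```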
Validio's dynamic segmentation feature is the most powerful way to perform root-cause analysis when something goes wrong. Read more here.
Workflow integrations (in progress)
Naturally, you'll want to halt workflows if data doesn't behave as expected. Validio allows you to set up circuit breakers to do just that, preventing bad data from propagating through downstream data pipelines.
Speed things up
Validio can handle more than 1B data records per day, in real time, for operational decision-making at scale. This performance under the hood is unmatched by other Deep Data Observability platforms.
The platform is built for cutting-edge scalability, enabling you to smoothly apply data quality checks to all your data without upgrading the platform, growing the data team, or cutting the scope of other work.
Cost-control (in progress)
When processing massive amounts of data, warehouse costs become important to consider. Validio's intelligent compiler selects the most efficient way to execute a validator, choosing between a streaming or pushdown approach.
Make it a team effort
Validio allows for multiple users to interact with the platform with varying rights and privileges. In this way, you and your team can collaborate on settings and thresholds.
Infra-as-code & CLI
Validio combines a code-first (YAML) interface with a no-code interface, so multiple users can interact with the platform in whichever way suits their needs. In addition, the platform can be managed through a CLI.
Host where you want
Validio's deployment in your virtual private cloud means data never leaves your environment, and you are in full control of the Kubernetes resources deployed.
Validio is also offered as a fully managed solution where our team manages deployment of the platform in our cloud environment.
We'd love to have a chat! Request a demo