
Semantic layer 101: Why your data team should focus on metrics over data

March 11, 2024

Patrik Liu Tran

Most companies today want to become data-driven, but only 31% of them are.

To become data-driven, it is not enough to collect and store data. You need to use it to make smart decisions. That’s where metrics come in: business stakeholders don’t consume data tables, they consume metrics, and metrics help them answer questions like:

What is the revenue per week, per market, per product category?
What is the customer lifetime value?
What is the customer acquisition cost per market?
What is the current expected credit loss?

Many data teams don’t pay enough attention to metrics. They often get distracted serving data tables and dashboards without understanding what the underlying numbers really mean. This can lead to confusion and frustration among data consumers, who may not trust or understand the data they receive.

In this article, we will show you why metrics should be your top priority as a data team, and what role the semantic layer plays in achieving data-driven success through metrics.

TABLE OF CONTENTS

What is a semantic layer?
The semantic layer makes data accessible to everyone
Powering LLMs with the semantic layer
The Data Trust Platform that enables your semantic layer

What is a semantic layer?

The proposed architecture of a unified semantic layer. Credit goes to Benn Stancil for creating the original graphic.

In today’s architecture, raw data gets processed and transformed before it is delivered for different data use cases such as BI, analytics & data science, and operational applications. For each data use case, metrics are then defined and calculated based on the transformed data.

Let’s illustrate with an example:

Suppose that you are working in a fashion e-commerce business and want to set up metrics on cost per unit for each product category across different markets. To set up the metric, you need to make several decisions, including:

Definition of cost: Do you include only direct cost (e.g. the cost of goods sold) or also indirect cost (e.g. overhead cost from administrative functions) associated with the products? If you decide to include indirect cost, how do you allocate the cost across the different products?
Definition of product categories: How granular do you want to be in classifying product categories? For example, do you split the category of pants into swim pants, sports pants, jeans, etc. or do you just keep it as a single big category?
Market definitions: How do you define the markets? Do you split them up by city, country, region, or continent? If you decide to split them up by region, how do you then define the regions?

Traditionally, without a semantic layer in place, the metric definition, including the decisions around cost, product category, and market, is made separately for each data use case, for example in SQL queries. As you can imagine, there is a big risk that inconsistent decisions are made across different data use cases. A metric with the same name might then have different underlying definitions and show totally different numbers depending on where it is consumed.
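To make the risk concrete, here is a minimal sketch in Python. The order lines, the numbers, and the two function names are made up for illustration; they only assume that one team counts direct cost while another also includes allocated overhead.

```python
# Hypothetical order lines: (category, units, direct_cost, allocated_overhead)
ORDER_LINES = [
    ("jeans", 10, 200.0, 50.0),
    ("jeans", 5, 100.0, 25.0),
]


def cost_per_unit_bi(lines):
    """Assumed BI dashboard definition: direct cost only."""
    units = sum(line[1] for line in lines)
    return sum(line[2] for line in lines) / units


def cost_per_unit_finance(lines):
    """Assumed finance report definition: direct cost plus allocated overhead."""
    units = sum(line[1] for line in lines)
    return sum(line[2] + line[3] for line in lines) / units


# Same metric name, same data, two different answers.
print(cost_per_unit_bi(ORDER_LINES))
print(cost_per_unit_finance(ORDER_LINES))
```

Both functions are reasonable on their own. The problem is that both results end up labeled "cost per unit" in front of business stakeholders.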

The semantic layer, also called a metrics layer, solves this problem. It sits right after the transformed data tables and right before the data use cases, and acts as the single place where all metrics are centrally defined, often through SQL statements. Data consumers can then use the defined metrics without having to write their own metric definitions in SQL. Instead, they can access the already-defined metrics through natural language.
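As a rough illustration of "defined once, consumed everywhere", the sketch below keeps a single SQL definition per metric in a small registry and lets every consumer request the metric by name. It is not how any particular semantic layer product works; the `METRICS` registry, the `order_lines` table, and the `query_metric` helper are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical central registry: each metric is defined exactly once.
METRICS = {
    "cost_per_unit": """
        SELECT market, category,
               SUM(direct_cost) * 1.0 / SUM(units) AS cost_per_unit
        FROM order_lines
        GROUP BY market, category
        ORDER BY market, category
    """,
}


def build_demo_warehouse():
    """Create an in-memory stand-in for the transformed data tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_lines "
        "(market TEXT, category TEXT, units INTEGER, direct_cost REAL)"
    )
    conn.executemany(
        "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
        [
            ("EU", "jeans", 10, 200.0),
            ("EU", "jeans", 5, 100.0),
            ("US", "jeans", 4, 100.0),
        ],
    )
    return conn


def query_metric(conn, name):
    """Consumers ask for a metric by name instead of writing their own SQL."""
    if name not in METRICS:
        raise KeyError(f"Unknown metric: {name}")
    return conn.execute(METRICS[name]).fetchall()


conn = build_demo_warehouse()
for row in query_metric(conn, "cost_per_unit"):
    print(row)
```

Because the BI tool, the notebook, and the operational application would all go through `query_metric`, a change to the definition of cost happens in one place and reaches every use case at once.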

In short, a semantic layer helps everyone get the same answers from the data across all use cases in the organization—in terms they can understand.

The semantic layer makes data accessible to everyone

One of the main benefits of the semantic layer is that it empowers downstream stakeholders, such as analysts, data scientists, and business users, to get their own answers from data, without relying on data engineers and data producers.

In doing so, the semantic layer promotes consistency and builds trust among stakeholders, ensuring that everyone uses the same definitions and calculations for their key metrics.

With the semantic layer, downstream stakeholders get:

More trust in data: The semantic layer ensures that everyone gets the same answers everywhere and every time by providing predefined and validated metrics and calculations in a single source of truth.
Freedom to choose tooling: The semantic layer allows downstream stakeholders to use any analytics tool they prefer, or even multiple analytics tools, without compromising data quality or reliability. It future-proofs data consumption outlets, as it allows data teams to swap out tools or platforms without affecting the integrity of metric definitions.
Data democratization: The semantic layer enables downstream stakeholders to access and analyze data using standard search terms or natural language, without requiring technical skills such as SQL.

In essence, the semantic layer centralizes all metric definitions into one unified layer of code—providing a scalable and streamlined way to deliver consistent metrics to the whole organization. In addition, it effectively improves governance, lineage, and efficiency of key metrics, by providing a single source of truth and a common language for data.

Powering LLMs with the semantic layer

Because of these benefits, the semantic layer has become the center of attention for businesses looking to create new data experiences with AI and large language models (LLMs). One of the holy-grail use cases for LLMs today is asking any data question in natural language: the LLM translates the question into SQL, queries the data warehouse, and returns the answer to the user. This has the potential to free up a lot of time for analysts, who otherwise spend a big portion of their days serving the rest of the organization with simple data and dashboard requests.

The accuracy of the LLM in answering data questions has been shown to go up by as much as 300% if it integrates with a semantic layer instead of querying the transformed tables directly. Why is that? Let’s revisit our example above on the metric definition of cost per unit across different markets. Without the semantic layer in place, the LLM has to make assumptions about how to define cost, product category, and market. The risk of making inaccurate assumptions is high. With a semantic layer in place, the assumptions are already made and agreed upon by the business, and the LLM only needs to follow the already-defined metrics and surface the answers back to the users.
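One way to picture the difference: instead of generating SQL from scratch, the model only has to pick one of the metrics the business has already agreed on. The sketch below is an assumption-heavy illustration, not a real integration. `METRIC_CATALOG`, `keyword_chooser`, and `resolve_metric` are hypothetical names, and the simple keyword matcher is a stand-in for an actual LLM call.

```python
from typing import Callable, Dict, Optional

# Hypothetical catalog of agreed metric definitions, keyed by metric name.
METRIC_CATALOG: Dict[str, str] = {
    "cost_per_unit": "Direct cost divided by units sold, per market and category.",
    "revenue_per_week": "Sum of order revenue grouped by week.",
}


def keyword_chooser(question: str, catalog: Dict[str, str]) -> Optional[str]:
    """Stand-in for an LLM call: pick the first metric whose name words
    all appear in the question, or None if nothing matches."""
    q = question.lower()
    for name in catalog:
        if all(word in q for word in name.split("_")):
            return name
    return None


def resolve_metric(
    question: str,
    chooser: Callable[[str, Dict[str, str]], Optional[str]] = keyword_chooser,
) -> str:
    """Map a question to an existing metric; never invent a new definition."""
    name = chooser(question, METRIC_CATALOG)
    if name not in METRIC_CATALOG:
        raise ValueError("No agreed metric matches this question")
    return name


print(resolve_metric("What is the cost per unit for jeans in the EU?"))
```

The check inside `resolve_metric` is the important part of the design: whatever the chooser returns, only names from the agreed catalog are accepted, so the definitions of cost, product category, and market stay with the business rather than with the model.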

As the adoption of LLMs increases, the benefit of semantic layers will become even more clear for data-driven organizations.

The Data Trust Platform that enables your semantic layer

The metrics in the semantic layer will be the essential outputs that your business users rely on, making it critical to ensure the quality and reliability of the data that make up the metrics. Validio’s Data Trust Platform covers the entire data journey, preventing any issues from impacting your semantic layer, so you can trust your single source of truth.

In summary, metrics are the key to unlocking the full potential of data-driven success. By shifting the focus from data and dashboards to metrics, organizations can establish a consistent, unified, and easily accessible view of their data with a semantic layer—as long as they have the tools necessary to ensure full trust in the metrics.
