In a typical modern data architecture, raw data is processed and transformed before it is delivered to data use cases such as BI, analytics and data science, and operational applications. For each use case, metrics are then defined and calculated on top of the transformed data.
Let’s illustrate with an example:
Suppose you work at a fashion e-commerce business and want to set up a cost-per-unit metric for each product category across different markets. To set up the metric, you need to make several decisions, including:
- Definition of cost: Do you include only direct cost (e.g. the cost of goods sold) or also indirect cost (e.g. overhead cost from administrative functions) associated with the products? If you decide to include indirect cost, how do you allocate the cost across the different products?
- Definition of product categories: How granular do you want to be in classifying product categories? For example, do you split the pants category into swim pants, sports pants, jeans, etc., or do you keep it as a single broad category?
- Definition of markets: How do you define the markets? Do you split them by city, country, region or continent? If you decide to split them by region, how do you define each region, for example, which countries belong to it?
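To make these decisions concrete, here is a minimal Python sketch in which each choice becomes an explicit parameter of the metric definition. The sales rows, the category and region mappings, the `CostPerUnitDefinition` class, and the rule of allocating overhead in proportion to units sold are all hypothetical assumptions for illustration, not a recommended costing method.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sales rows: (product_id, sub_category, country, units, direct_cost)
SALES = [
    ("p1", "jeans", "DE", 100, 2000.0),
    ("p2", "swim pants", "FR", 50, 500.0),
    ("p3", "sports pants", "SE", 80, 960.0),
    ("p4", "jeans", "FR", 50, 1500.0),
]

# Hypothetical mappings that encode the category and market decisions
CATEGORY_MAP = {"jeans": "pants", "swim pants": "pants", "sports pants": "pants"}
REGION_MAP = {"DE": "Central Europe", "FR": "Western Europe", "SE": "Nordics"}


@dataclass(frozen=True)
class CostPerUnitDefinition:
    include_indirect: bool          # direct cost only, or direct + allocated overhead
    overhead_total: float = 0.0     # indirect cost to allocate (hypothetical figure)
    coarse_categories: bool = True  # roll sub-categories up into one category


def cost_per_unit(rows, d):
    """Return {(category, region): cost per unit} under definition d."""
    total_units = sum(row[3] for row in rows)
    cost = defaultdict(float)
    units = defaultdict(int)
    for _pid, sub_category, country, n, direct in rows:
        category = CATEGORY_MAP[sub_category] if d.coarse_categories else sub_category
        key = (category, REGION_MAP[country])
        row_cost = direct
        if d.include_indirect:
            # One possible allocation rule: share overhead by units sold
            row_cost += d.overhead_total * n / total_units
        cost[key] += row_cost
        units[key] += n
    return {key: cost[key] / units[key] for key in cost}


direct_only = cost_per_unit(SALES, CostPerUnitDefinition(include_indirect=False))
with_overhead = cost_per_unit(
    SALES, CostPerUnitDefinition(include_indirect=True, overhead_total=2800.0)
)
print(direct_only[("pants", "Nordics")])    # 12.0
print(with_overhead[("pants", "Nordics")])  # 22.0
```

The same category and region yields a different number as soon as one parameter changes, which is why each decision has to be made deliberately and recorded somewhere.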
Traditionally, without a semantic layer in place, the metric definition, including the decisions about cost, product categories and markets, is made separately for each data use case, for example in ad hoc SQL queries. This creates a significant risk that different use cases make inconsistent decisions: a metric with the same name can end up with different underlying definitions and therefore show very different numbers from one use case to the next.
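The following Python sketch, using the standard-library `sqlite3` module and a hypothetical `product_costs` table, shows how this happens in practice: two teams each write their own SQL for a metric they both call `cost_per_unit`, but one counts only direct cost while the other adds indirect cost. The table, figures, and team labels are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_costs (
    category TEXT, region TEXT, units INTEGER,
    direct_cost REAL, indirect_cost REAL
);
INSERT INTO product_costs VALUES
    ('pants', 'Nordics', 100, 2000.0, 500.0),
    ('pants', 'Nordics', 100, 1000.0, 300.0);
""")

# BI dashboard team: "cost_per_unit" means direct cost only
bi_sql = """
SELECT category, region, SUM(direct_cost) / SUM(units) AS cost_per_unit
FROM product_costs GROUP BY category, region
"""

# Data science team: "cost_per_unit" means direct + indirect cost
ds_sql = """
SELECT category, region,
       SUM(direct_cost + indirect_cost) / SUM(units) AS cost_per_unit
FROM product_costs GROUP BY category, region
"""

bi_result = conn.execute(bi_sql).fetchall()
ds_result = conn.execute(ds_sql).fetchall()
print(bi_result)  # [('pants', 'Nordics', 15.0)]
print(ds_result)  # [('pants', 'Nordics', 19.0)]
```

Both queries are valid SQL over the same data; the problem is that nothing ties them to a single agreed definition, so the same metric name reports 15.0 in one place and 19.0 in another.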
The semantic layer, also called the metrics layer, solves this problem. It sits between the transformed data tables and the data use cases and acts as the single place where all metrics are defined centrally, often through SQL statements. Data consumers can then use the predefined metrics without writing their own metric definitions in SQL. Instead, they access those metrics using natural, business-friendly terms.
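In a deliberately simplified form, a semantic layer can be pictured as a registry in which each metric is defined once and compiled to SQL on request. In the Python sketch below, the `Metric` class, the `METRICS` registry, and the `compile_metric` and `query_metric` helpers are hypothetical names; real semantic-layer tools use their own definition formats and APIs, which this does not reproduce. Consumers ask for `cost_per_unit` by name and only choose the dimensions to group by, so every use case receives the same definition.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    expression: str               # SQL aggregate, written once
    table: str
    dimensions: tuple[str, ...]   # columns consumers are allowed to group by


# Hypothetical central registry: the single place where metrics are defined
METRICS = {
    "cost_per_unit": Metric(
        name="cost_per_unit",
        description="Direct plus indirect cost divided by units sold",
        expression="SUM(direct_cost + indirect_cost) * 1.0 / SUM(units)",
        table="product_costs",
        dimensions=("category", "region"),
    ),
}


def compile_metric(name, group_by=()):
    """Turn a metric name plus requested dimensions into a SQL query."""
    metric = METRICS[name]
    for dim in group_by:
        if dim not in metric.dimensions:
            raise ValueError(f"{dim!r} is not a valid dimension of {name!r}")
    cols = ", ".join(group_by)
    select = f"SELECT {cols}, " if cols else "SELECT "
    sql = f"{select}{metric.expression} AS {metric.name} FROM {metric.table}"
    if cols:
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql


def query_metric(conn, name, group_by=()):
    """Run the centrally defined metric; consumers never write the formula."""
    return conn.execute(compile_metric(name, group_by)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_costs (
    category TEXT, region TEXT, units INTEGER,
    direct_cost REAL, indirect_cost REAL
);
INSERT INTO product_costs VALUES
    ('pants', 'Nordics', 100, 2000.0, 500.0),
    ('pants', 'Nordics', 100, 1000.0, 300.0),
    ('pants', 'Western Europe', 50, 500.0, 100.0);
""")

# A dashboard and a notebook both request the metric by name
dashboard = query_metric(conn, "cost_per_unit", ["region"])
notebook = query_metric(conn, "cost_per_unit", ["region"])
print(dashboard)  # [('Nordics', 19.0), ('Western Europe', 12.0)]
```

Restricting grouping to a declared list of dimensions keeps consumers from altering the definition itself, and it also avoids interpolating arbitrary strings into the generated SQL.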
In short, a semantic layer helps everyone in the organization get the same answers from the data across all use cases, in terms they can understand.