Semantic Layer strategy in the era of Self-Service BI

The increasing use of self-service BI capabilities, such as reporting, data exploration and analytics, by corporate end-users has opened a debate about who should develop Semantic Layers (SL), and how, in order to support corporate needs.

One advantage of the new BI tools is that they embed metadata management capabilities. These capabilities allow end-users to quickly and easily create their own semantic layers. However, when diverse self-service technologies are in use, maintaining and integrating these semantic layers can be challenging.

On the other hand, extending existing DWH semantic layers prevents the proliferation of semantic layers and inconsistencies between reports, but centralized semantic layers are often rigid and take a long time to extend.

A more radical position would be not to use semantic layers at all. Data exploration and analytics generally work on raw data. They need data catalogs only!

So, which one is the right approach?

To answer this question, we should analyze two points:

  1. what is a semantic layer and how is it used?
  2. what is the difference between a traditional corporate BI environment and the new ones with self-service capabilities?

Semantic layer

“A semantic layer is a business representation of corporate data that helps end users access data autonomously using common business terms. A semantic layer maps complex data into familiar business terms such as product, customer, or revenue to offer a unified, consolidated view of data across the organization” (From Wikipedia)

A semantic layer aligns higher-level business terms (e.g. customers or products) with the source data. In this way, end-users can have a complete view of all data relevant to that term, with no need to know the underlying relational structure or to write SQL queries to access the data. In other words, a semantic layer decouples the data sources from the business representation of the data.

With a semantic layer in place, end-users don’t see data as a collection of tables and relationships; they see it as lists of “business fields” organized into one or more hierarchical structures. If a data source table changes, only the mapping needs to be updated; the associated “business field” in the semantic layer stays the same for end-users.
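
The decoupling described above can be sketched in a few lines of Python. This is only an illustration, not how any particular BI tool implements it: the `BusinessField` class, the `build_query` helper and the table and column names (`sales_orders`, `cust_nm`, `net_amt`) are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessField:
    """A business term exposed to end-users, backed by a SQL expression."""
    name: str        # what the end-user sees, e.g. "Revenue"
    expression: str  # how the term is computed from the source table
    source: str      # table the expression reads from


# Hypothetical mapping: table and column names are invented for illustration.
SEMANTIC_LAYER = {
    "Customer": BusinessField("Customer", "s.cust_nm", "sales_orders s"),
    "Revenue": BusinessField("Revenue", "SUM(s.net_amt)", "sales_orders s"),
}


def build_query(layer, dimension, measure):
    """Translate a request phrased in business terms into a SQL string."""
    dim, mea = layer[dimension], layer[measure]
    if dim.source != mea.source:
        raise ValueError("this sketch only handles fields from one source")
    return (
        f'SELECT {dim.expression} AS "{dim.name}", '
        f'{mea.expression} AS "{mea.name}" '
        f"FROM {dim.source} GROUP BY {dim.expression}"
    )


# The end-user asks for "Revenue by Customer" and never sees the table names.
print(build_query(SEMANTIC_LAYER, "Customer", "Revenue"))
```

If the hypothetical `sales_orders` table were renamed or restructured, only the entries in `SEMANTIC_LAYER` would change; a request for “Revenue” by “Customer” would be phrased exactly as before.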

The picture below shows an example of mapping between data source tables (right) and business terms (left). The arrows represent the SQL queries that create the business terms.

New corporate BI environments

The pressure of digital transformation is dramatically changing corporate BI environments: end-users are continuously asking for larger data volumes, more sources (even external ones) and more insight capabilities. The result is that new corporate BI environments are more complex than traditional ones.

The new BI environments:

  1. must be able to manage both external and internal data
  2. should have a visualization layer able to produce reporting (both self-service and ad-hoc) as well as analytics and data exploration
  3. should have a visualization layer that embeds metadata management and semantic layer capabilities
  4. must allow direct access to the data

Semantic layer strategy for new corporate BI environments

In this new scenario, enterprises should carefully evaluate their existing enterprise DWH and semantic layers in order to define the right implementation strategy, such as:

  1. expand current enterprise semantic layers to cover the new self-service scenarios and use cases (IT-driven model). The centralized semantic layer is the best solution to guarantee data coherency at corporate level, but it is complex to build and could be too rigid to fully support all self-service needs and use cases. This model requires strong data governance and a high level of partnership between IT and end-users.
  2. allow end-users to develop their own semantic layers (End-User-driven model). This model provides high flexibility and speed, but it is hard to maintain and it doesn’t guarantee data coherency.
  3. develop a centralized and common semantic layer into which end-users can integrate their specific semantic layers (Hybrid model). Only part of the semantic layer is centralized and built by IT; the rest is built and managed by end-users. This model provides a good compromise between implementation speed, maintenance effort and data coherency.

Each of the above approaches has its own pros and cons, as shown in the table below:

The hybrid model is the most viable approach as it offers the best compromise among flexibility, speed, data coherence and maintainability. The centralized (IT-created) Semantic layer contains all the common terms, while the user-created semantic layers contain the subject-area or department-specific terms. Users can easily and independently expand their own SLs, while IT is responsible only for the common part, which is more stable and needs fewer enhancements.
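
As a rough sketch of how the two kinds of layer could fit together, the Python snippet below merges a common layer with a department layer. The layer contents, the `effective_layer` helper and the rule that the common definition wins on a name clash are assumptions chosen for illustration; the right precedence rule would be decided by data governance.

```python
# Common layer, maintained by IT (term -> definition). All names are illustrative.
COMMON_LAYER = {
    "Customer": "dwh.dim_customer.customer_name",
    "Revenue": "SUM(dwh.fact_sales.net_amount)",
}

# Department-specific layers, maintained by end-users.
USER_LAYERS = {
    "marketing": {
        "Campaign": "mkt.campaigns.campaign_name",
        "Revenue": "SUM(mkt.orders.gross_amount)",  # clashes with the common term
    },
    "finance": {"Cost Center": "fin.cost_centers.code"},
}


def effective_layer(department):
    """Return the terms a department sees and the user terms that were overridden."""
    user_terms = USER_LAYERS.get(department, {})
    conflicts = sorted(set(user_terms) & set(COMMON_LAYER))
    # Assumption: common definitions win, so shared terms mean the same everywhere.
    merged = {**user_terms, **COMMON_LAYER}
    return merged, conflicts


layer, conflicts = effective_layer("marketing")
print(sorted(layer))  # terms visible to marketing
print(conflicts)      # user terms shadowed by the common layer
```

Reporting the conflicts rather than silently dropping them gives IT and end-users a concrete list of terms to reconcile.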

It is important to observe that:

  • IT’s role can vary by subject area or department. Depending on end-users’ knowledge and capabilities, IT can either develop the “common” Semantic layer only, or also be involved in the creation of subject-area SLs. Consequently, the “thickness” of the IT-created SL can depend on subject or macro-subject areas, as shown in the picture below
  • The model is dynamic: user-created SLs (or parts of them) can be “promoted” to the common layer when they are stable and/or when they are of interest to more than one team. This “promotion” mechanism should be part of the Data governance process.
  • As metadata can grow quickly, it is important to create a comprehensive, consistent and automated Data catalog. A good practice is to integrate the data catalog with data preparation (i.e., design a fully automated process that scans every raw data file ingested and writes an entry to the Data catalog). The Data catalog is important not only to prevent (or reduce) inconsistency in the semantic layer but also to allow direct access to the data for exploration and analytics activities.
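
The automated cataloging step mentioned in the last point could look roughly like the sketch below. It assumes, purely for illustration, that raw files arrive as CSV in a landing directory and that the catalog is a plain dictionary; `scan_file` and `update_catalog` are hypothetical helpers, and a real data catalog product would expose its own API.

```python
import csv
import tempfile
from pathlib import Path


def scan_file(path):
    """Read the header of one raw CSV file and describe it as a catalog entry."""
    with path.open(newline="") as handle:
        columns = next(csv.reader(handle), [])
    return {"file": path.name, "columns": columns, "size_bytes": path.stat().st_size}


def update_catalog(catalog, landing_dir):
    """Add an entry for every CSV file in landing_dir that is not yet cataloged."""
    for path in sorted(Path(landing_dir).glob("*.csv")):
        if path.name not in catalog:
            catalog[path.name] = scan_file(path)
    return catalog


# Small demonstration with a temporary landing directory and an invented file.
with tempfile.TemporaryDirectory() as landing:
    Path(landing, "orders.csv").write_text("order_id,customer,net_amount\n1,ACME,10.5\n")
    catalog = update_catalog({}, landing)
    print(catalog["orders.csv"]["columns"])
```

Running `update_catalog` as part of every ingestion job would keep the catalog in step with the raw data, with no manual registration by end-users.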

In conclusion, even if the right approach to the Semantic layer depends on a combination of technical, organizational and process aspects, the hybrid model seems the best way to expand corporate Semantic layers to support new self-service BI capabilities.