What are they, how do they work, and when should your enterprise deploy an OLAP cube to speed up analytic processes?
Companies move as fast as their slowest insight.
In the era of big data, information is often produced faster than it can be consolidated and analyzed throughout the Enterprise.
Analysts battle for the limited engineering resources, so they can construct data sets for their analytics endeavors to ultimately provide insights to fast-growing enterprises.
But business intelligence (BI) is often stalled by the limited engineers’ hours or analysts’ lack of engineering skills needed to construct such datasets from raw data.
This is where OLAP cubes come into play.
They are pre-aggregated data structures that allow analysts a multitude of analytic queries without having to constantly ask engineers for their time.
In this guide, we will look at OLAP cubes: what are they, how do they work, and when should your enterprise deploy an OLAP cube to speed up analytic processes.
OLAP or OnLine Analytical Processing refers to the architecture and features of databases that specialize in analytical queries.
Unlike OLTP (OnLine Transaction Processing) relational databases, that tune query performance for transactional data (small but fast data writes), OLAP systems specialize in performing complex read queries over large amounts of data.
The OLAP cube is a specialized data structure that results from aggregating data in a useful manner.
OLAP cubes are a multidimensional representation of data.
That’s a mouthful.
To understand the multidimensional array of OLAP cubes, let us start simple - with a 2-dimensional representation we all know: Microsoft Excel.
Excel spreadsheets organize data into two dimensions: rows for each entry, and columns for attributes of that entry:
OLAP cubes go one step further and aggregate data across dimensions. For example, an OLAP cube could aggregate total sales amount across three dimensions: city, product, and time.
The data cube allows analysts to quickly get aggregated sales data. For example, it takes only one lookup query to get total sales in London in 2021 ($ 30,000).
Unlike the alternative, writing SQL queries over Excel-like data that multiply the necessary GROUP BY, WHERE, and UNION operations to make the same results happen.
OLAP cubes aggregate a metric (often called a fact, saved in a fact table) such as “total sales” or “number of purchases” over dimensions.
Dimensions are anything you can group data by. Common dimensions are time (group by year, quarter, month, …), geolocation (continent, region, country, …), product categories, etc.
The OLAP cube usually has three dimensions, but it can be constructed of an infinite number of dimensions (called MOLAP - multidimensional OLAP cube). Each dimension is saved into its dimension tables.
To organize metrics and dimensions in a format an OLAP cube can use to construct aggregate data, we model the original data into fact tables and dimension tables using data modeling paradigms called star schemas or snowflake schemas (because their shape resembles a star or a snowflake).
Dimensions are organized in what we call a concept hierarchy. That is, dimension values can have hierarchical parents and children. For example, the location value for city “London” has a parent dimension value country equal to “UK”.
This allows us to perform aggregations and multiple operations on different granulations of the data cube.
OLAP cube supports multiple operations that come in handy in the analyst’s workday.
Slicing allows us to look at data by fixing (or picking) just one value of a dimension. In the example below, we have sliced the data cube to look at data from only 2004.
The dice operation is very similar to the slice, but instead of taking a single value for a dimension, we take more than one value in the dimensions we are choosing.
In the example below, we have diced the data cube along the product dimension (picked just “Accessories”, “Outdoor protective equipment”, and “Golf equipment”) while we kept the date dimension and the region dimension intact.
In essence, dicing operations create a subcube of the original OLAP cube.
Drill-down and roll-up allow us to go deeper into the hierarchy of one dimension (drill-down) or aggregate an entire dimension up (roll-up).
In the example below, if we move from left to right, we have drilled down into the “Outdoor protective equipment” to show only its children (“Insect repellant”, “Sunblock”, and “First aid”).
While if we moved in the opposite direction, and aggregated total_sales across “Insect repellant”, “Sunblock”, and “First aid”, we would have rolled the values up into the “Outdoor protective equipment”.
The pivot operation is going to ring some Excel bells. Its spreadsheet equivalent is the (in)famous pivot table operation.
The OLAP pivot operation just rotates the cube around its axes, giving us a view that is better equipped for the various reporting and visualizations we are creating.
OLAP cubes are extremely useful for regular reporting and ad hoc analyses.
By structuring the data into dimensions and facts, OLAP cubes also support a variety of analytic endeavors, such as:
However, OLAP cubes have also shortcomings. Depending on the practical implementation, OLAP cubes might either need a specialized server (increases overall operational costs) or need specialized knowledge for implementation and maintenance.
Let’s take a look at the practical implications of OLAP cubes.
Historically, OLAP cubes were built on specialized OLAP (or MOLAP or ROLAP) servers. An example of that is Microsoft’s SQL Server Analysis Services (SSAS).
But the industry shift has been to move away from dedicated OLAP servers and into modern data warehouses.
Modern data warehouses utilize three features for OLAP analytics:
With all three characteristics in place, it has become progressively easier to perform OLAP cube queries directly on the databases within data warehouses. Without needing to construct a specialized data structure on its devoted server.
But are OLAP cubes the right solution for your business? Let us take a look at the advantages and disadvantages to better understand the tradeoffs?
OLAP cubes are extremely useful for regular reporting and ad hoc analyses.
By structuring the data into dimensions and facts, OLAP cubes also support a variety of analytic endeavors, such as:
However, OLAP cubes have also shortcomings.
There are multiple challenges to working with OLAP cubes:
So what can be used to compensate for the OLAP cubes’ shortcomings?
There are two ways you can compensate for the OLAP cubes' shortcomings.
First, there is software that automates data mining and exploration and makes it accessible to lay users. An example of that is ThoughtSpot. ThoughtSpot allows any users within your company to explore data themselves and discover interesting trends and insights. Without your engineers needing to devote time to build and maintain the underlying data.
Check how you can use ThoughtSpot and Keboola together to get insights faster.
Second, the Modern Data Stack offers a different approach to data engineering and data architecture.
OLAP cubes come from a time when data engineers and developers were the only people who could build metrics and KPIs to track progress across the company. Everyone else depended on them.
With the rise of platforms like Keboola, that empower non-technical people to build data pipelines themselves (check how easy it is) and the move from centralized architectures to data mesh architectures, OLAP cubes are replaced by more flexible data pipelines. Data pipelines that are constructed (and deconstructed) by domain experts to track metrics and KPIs for domain experts.
The Modern Data Stack, with its tools and architectural designs, offers an alternative to OLAP. OLAP is the reliable, old-school BI tool. While the Modern Data Stack is the flexible, new approach to constructing metrics and KPIs.
Whether you are a die-hard OLAP fan, or want to build a Modern Data Stack, Keboola is here to assist you.
With over 250 integrations between sources and destinations, Keboola can help you automate the engineering you need to create OLAP cubes, data mesh architectures, or even integrate with advanced tooling like ThoughtSpot. Extract, transform, and load data from your sources to the (OLAP) destination of your choice with a couple of clicks.
Try it out.
Keboola offers a no-questions-asked, always-free tier, so you can play around and build your pipelines with a couple of clicks.