Data Cubes is a data format especially useful for heavy reads. It has been popularized in business environments where the main use for data was to make reports (many reads). This also links with the OLAP (Online Analytical Processing) vs OLTP (Online Transaction Processing) concepts, where one is optimized for reads and the other for writes.
The main driver behind data cubes was business intelligence. While traditional relational database systems are focused on the day-to-day business of a company and record keeping (with customers placing or- ders, inventories kept up to date, etc), business intelligence is focused on the production of high-level reports for supporting C-level executives in making informed decisions.
The Data Model
We basically have some dimensions that index the data (which can be more than three in the image), and we have some information for each. The important part is that they are **dense** meaning we usually have information for every entry.Clearly this model can be decomposed into a relational table, which is called fact table in this context. For example, the following is a $2\times 2\times 2$ cube:
Operations
Slicing
We just take a subset of the cube, for example, we can take all the data for a specific value of one of the dimensions. Equivalent of a selection on the fact table.
Dicers
If we slice in a way such that we only have (1,2,3)-dimensions left, then we have a nice way to represent that information in 2D table. This is what dicing means.
Aggregation
We collapse along some dimension (e.g. having averages or something of that kind).
Hierarchies
We can have hierarchies in the dimensions, for example, we can have a hierarchy of time (year, month, day). TODO:
Roll-Up
TODO.
Implementation
In this section we will present how data cubes are usually implemented.
ROLAP
ROLAP is the storage of the data cube in a relational database. This is done by having a fact table and multiple dimension tables.
Pivoting
We can pivot the data cube to have a 2D table, which is useful for visualization. The easiest example is when we have a column in a table to be an enum, then we can explode this column into multiple columns, one for each value of the enum. This is called Pivoting. The reverse process is called unpivoting.
Satellite Tables
These are tables that contain information about the dimensions. The main table just contains the keys to these tables, the full information is in the satellite tables and can be recovered with joins. If the satellited tables are also normalized (see Structured Query Language) we derive the snowflake schema (if not, is called star schema).
Querying Cube Data
TODO: (the main problem is doing with multiple aggregates).
MOLAP
In this case the support for cubes is implemented from scratch. We will not cover this method. MOLAP stands for Multidimensional OLAP. They also have a special query language called MDX (Multi Dimensional Expressions).