understanding bitmap indexing

bitmap indexing in OLAP

Motivation

When you design a database, whether OLTP or OLAP, you usually choose indexes based on the expected query pattern. This post covers bitmap indexes and when to choose one over another index type such as a B+ tree.

database indexing (OLAP and OLTP)

OLAP (online analytical processing) databases serve analytics teams, typically in large organizations (for example Snowflake and Redshift), while OLTP (online transaction processing) databases serve customer-facing applications and specialize in transactions.

| | OLTP | OLAP |
|---|---|---|
| Workload | Transactional processing; customer facing | Analytical processing; internal BI team facing |
| Design goal | Optimized for speed and efficiency of individual transactions | Optimized for complex queries on large amounts of data and easier for data analysts to query |
| Data model | Uses a normalized data model to reduce redundancy | Usually more denormalized, with multiple copies of the same column to reduce the number of joins in the RDBMS |
| Access (query) pattern | High-volume, high-frequency, small transactions that require fast response times | Less frequent but larger transactions that tolerate longer response times |
| Performance | Fast response times | Can usually tolerate a couple of minutes |

On an OLAP platform, the fact table can hold billions of rows, so a SQL query that has to scan the whole table without a suitable index can be very slow.

In simple words, a normalized database means less duplication of the same data across different tables within the database.

Indexing is a strategy that trades some space for time. It is the same idea as the index pages of a book, which let you quickly locate the information you are looking for.

A database offers many kinds of indexes, such as

  • B+ tree

  • Hash index

  • Bitmap index

In a book, you sacrifice a few pages for the index and table of contents, and in return the reader saves a lot of time locating content.

The question comes down to how to select the right index for your task. Let's dive into some concepts.

Cardinality

In simple words, cardinality in the context of a database refers to the number of distinct entries in a column. Let's take a look at the table with billions of rows shown below

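| Employee_ID | Gender | Province |
|---|---|---|
| 1 | M | ON |
| 2 | F | AB |
| 3 | F | PE |
| 4 | M | SK |
| 5 | F | AB |
| 6 | F | ON |
| ... | ... | ... |
| 1,000,000,000 | F | QC |
| 1,000,000,001 | M | NS |
| 1,000,000,002 | F | NL |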

Cardinality refers to the number of distinct elements in each column:

  • Employee ID: cardinality is 1,000,000,002

  • Gender: cardinality is 2

  • Province: cardinality is at most 13, the number of provinces and territories in Canada

    • AB, BC, MB, NB, NL, NS, NT, NU, ON, PE, QC, SK, YT
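To make the definition concrete, here is a minimal Python sketch that counts distinct values per column for the first six rows of the table above. The `rows` list and the `cardinality` helper are hypothetical names used only for illustration; a real database tracks this in its own column statistics.

```python
# First six sample rows from the table: (employee_id, gender, province).
rows = [
    (1, "M", "ON"),
    (2, "F", "AB"),
    (3, "F", "PE"),
    (4, "M", "SK"),
    (5, "F", "AB"),
    (6, "F", "ON"),
]


def cardinality(rows, column):
    """Return the number of distinct values at the given column position."""
    return len({row[column] for row in rows})


print(cardinality(rows, 0))  # employee_id: every row is distinct -> 6
print(cardinality(rows, 1))  # gender: M and F -> 2
print(cardinality(rows, 2))  # province: ON, AB, PE, SK -> 4
```

Even in this tiny sample, the ID column's cardinality grows with the row count while gender stays fixed at 2.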

The general rule of thumb is that the higher the cardinality, the less suitable a bitmap index becomes. Use a bitmap index only when the column has few distinct values.
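A rough size estimate shows why the rule of thumb holds. The sketch below assumes an uncompressed bitmap index that stores one bit per row for every distinct value; `bitmap_index_bytes` is a hypothetical helper, and real engines usually compress their bitmaps, so treat the numbers as an upper-bound illustration only.

```python
def bitmap_index_bytes(num_rows, cardinality):
    """Assumed model: one bit per row for each distinct value, no compression."""
    return num_rows * cardinality // 8


num_rows = 1_000_000_002  # row count of the example table

gender_bytes = bitmap_index_bytes(num_rows, 2)
province_bytes = bitmap_index_bytes(num_rows, 13)
employee_id_bytes = bitmap_index_bytes(num_rows, num_rows)

print(f"gender:      {gender_bytes / 1e9:.2f} GB")
print(f"province:    {province_bytes / 1e9:.2f} GB")
print(f"employee_id: {employee_id_bytes / 1e15:.0f} PB")
```

Under this model the gender and province indexes stay in the range of a few hundred megabytes to a couple of gigabytes, while a bitmap index on the unique ID column would be far larger than the table itself.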

Bitmap

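Let's take the gender column as an example. A bitmap index stores one bitmap per distinct value, with one bit per row: the bit is 1 if the row holds that value and 0 otherwise. For the first six rows of the table above, the bitmap for M is 100100 and the bitmap for F is 011011.

We only need to store two bitmaps, one per distinct value.

As for the province column, we create one bitmap per province in the same way. For the first six rows, the ON bitmap is 100001 and the AB bitmap is 010010.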
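As a rough sketch of how such bitmaps could be built, the Python below creates one list of bits per distinct value for the first six rows of the table. The helper name `build_bitmap_index` is hypothetical, and a real database would pack the bits and usually compress them rather than keep Python lists.

```python
def build_bitmap_index(values):
    """Hypothetical helper: map each distinct value to a list with one bit per row."""
    index = {}
    for row_number, value in enumerate(values):
        if value not in index:
            index[value] = [0] * len(values)
        index[value][row_number] = 1
    return index


# Gender and province columns for the first six rows of the table.
genders = ["M", "F", "F", "M", "F", "F"]
provinces = ["ON", "AB", "PE", "SK", "AB", "ON"]

gender_index = build_bitmap_index(genders)
province_index = build_bitmap_index(provinces)

for value, bits in gender_index.items():
    print(value, bits)
# M [1, 0, 0, 1, 0, 0]
# F [0, 1, 1, 0, 1, 1]
```

Each row sets exactly one bit across the bitmaps of a column, which is why the number of bitmaps equals the column's cardinality.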

You need at most 13 bitmaps for this analysis. Say you wish to count the employees in Ontario and Quebec; you would run

SELECT
    COUNT(e.employee_id)
FROM
    employee AS e
WHERE e.province IN ('ON', 'QC')

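Behind the scenes, the database takes the ON and QC bitmaps and performs a bitwise OR, because a row matches if its bit is set in either bitmap. The count is then the number of 1 bits in the result. A bitwise AND would be wrong here: no employee is in both provinces, so it would always yield zero rows.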

Depending on your query pattern, the database can combine bitmaps with bitwise AND, bitwise OR, or bitwise NOT operations.
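The sketch below illustrates these operations in Python, using plain integers as bitmaps (bit i stands for row i). It uses the nine rows visible in the table, and the helpers `to_bitmap` and `count_rows` are hypothetical names for illustration, not any database's API. For an IN list over one column, the per-value bitmaps are combined with OR; conditions on different columns are combined with AND.

```python
def to_bitmap(values, target):
    """Hypothetical helper: set bit i when row i equals target."""
    bitmap = 0
    for row_number, value in enumerate(values):
        if value == target:
            bitmap |= 1 << row_number
    return bitmap


def count_rows(bitmap):
    """Count matching rows, i.e. the number of 1 bits."""
    return bin(bitmap).count("1")


# The nine rows visible in the table (first six plus the last three).
genders = ["M", "F", "F", "M", "F", "F", "F", "M", "F"]
provinces = ["ON", "AB", "PE", "SK", "AB", "ON", "QC", "NS", "NL"]
all_rows = (1 << len(provinces)) - 1  # mask with one bit per row

on = to_bitmap(provinces, "ON")
qc = to_bitmap(provinces, "QC")
ab = to_bitmap(provinces, "AB")
female = to_bitmap(genders, "F")

print(count_rows(on | qc))          # province IN ('ON', 'QC') -> 3
print(count_rows(on & qc))          # ON AND QC on the same column -> 0
print(count_rows(female & ab))      # gender = 'F' AND province = 'AB' -> 2
print(count_rows(~on & all_rows))   # province <> 'ON' -> 7
```

The NOT result is masked with `all_rows` because Python integers are unbounded, so a bare `~on` would not be limited to the table's rows.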

These access patterns are quite common in business analytics, so a bitmap index is a good choice when a column's cardinality is small relative to its total number of rows.

Summary

In this post, we discussed bitmap indexes on OLAP platforms and how to decide whether to use one based on cardinality.

Extra reading

If you are interested