What is a Column Store Database?

A column store database is a type of database that stores data using a column oriented model.

A column store database can also be referred to as a:

  • Column database
  • Column family database
  • Column oriented database
  • Wide column store database
  • Wide column store
  • Columnar database
  • Columnar store

The Structure of a Column Store Database

Columns store databases use a concept called a keyspace. A keyspace is kind of like a schema in the relational model. The keyspace contains all the column families (kind of like tables in the relational model), which contain rows, which contain columns.

Like this:

Diagram of a keyspace containing column families.
A keyspace containing column families.

Here’s a closer look at a column family:

Diagram of a column family in a wide column store database.
A column family containing 3 rows. Each row contains its own set of columns.

As the above diagram shows:

  • A column family consists of multiple rows.
  • Each row can contain a different number of columns to the other rows. And the columns don’t have to match the columns in the other rows (i.e. they can have different column names, data types, etc).
  • Each column is contained to its row. It doesn’t span all rows like in a relational database. Each column contains a name/value pair, along with a timestamp. Note that this example uses Unix/Epoch time for the timestamp.

Here’s how each row is constructed:

Diagram of rows and columns in a wide column store database.

Here’s a breakdown of each element in the row:

  • Row Key. Each row has a unique key, which is a unique identifier for that row.
  • Column. Each column contains a name, a value, and timestamp.
  • Name. This is the name of the name/value pair.
  • Value. This is the value of the name/value pair.
  • Timestamp. This provides the date and time that the data was inserted. This can be used to determine the most recent version of data.

Some DBMSs expand on the column family concept to provide extra functionality/storage ability. For example, Cassandra has the concept of composite columns, which allow you to nest objects inside a column.

Benefits of Column Store Databases

Some key benefits of columnar databases include:

  • Compression. Column stores are very efficient at data compression and/or partitioning.
  • Aggregation queries. Due to their structure, columnar databases perform particularly well with aggregation queries (such as SUM, COUNT, AVG, etc).
  • Scalability. Columnar databases are very scalable. They are well suited to massively parallel processing (MPP), which involves having data spread across a large cluster of machines – often thousands of machines.
  • Fast to load and query. Columnar stores can be loaded extremely fast. A billion row table could be loaded within a few seconds. You can start querying and analysing almost immediately.

These are just some of the benefits that make columnar databases a popular choice for organisations dealing with big data.

NoSQL vs Relational

Some vendors and service providers (such as AWS) refer to columnar databases as NoSQL, citing their differences to the relational database model.

Others however, disagree. Steve Sarsfield (Vice President Product, AnzoGraph) previously worked for Vertica and explains:

From a user perspective, the metadata of a columnar database looks exactly the same as a RDBMS. You perform schema management in much the same way as Oracle. In most cases, it’s 100% SQL compliant and 100 ACID compliant (unlike many NoSQL). NoSQL databases tend to be either Key Value stores or Document Stores. Columnar is neither.

Examples of Column Store DBMSs