What is an MPP Database?

An MPP database is a database built on massively parallel processing (MPP).

Massively parallel processing refers to the use of a large number of processors (or separate computers) to perform a set of coordinated computations in parallel (simultaneously).

In MPP databases, data is partitioned across many database servers or nodes, each of which has its own memory and processors. This way, each server can process data locally, using its own resources. None of the resources are shared. This is where the term shared-nothing comes from.
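As a rough illustration of the idea above, the following Python sketch simulates nodes with worker threads. It is a toy model, not how any particular MPP product is implemented: the names partition_rows, local_sum, and parallel_total are hypothetical, the round-robin distribution is just one possible scheme, and the sample rows are made up for the example. Each simulated node sums only its own slice of the data, and a coordinator step combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, node_count):
    """Distribute rows round-robin so each simulated node holds its own slice."""
    partitions = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        partitions[index % node_count].append(row)
    return partitions


def local_sum(partition):
    """Work done by one node, using only the rows stored on that node."""
    return sum(row["amount"] for row in partition)


def parallel_total(rows, node_count):
    """Run the local step on every node, then combine the partial results."""
    partitions = partition_rows(rows, node_count)
    # Threads stand in for separate servers here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_sums = list(pool.map(local_sum, partitions))
    return partial_sums, sum(partial_sums)


# Made-up sample data: eight orders with amounts 10, 20, ..., 80.
rows = [{"order_id": i, "amount": i * 10} for i in range(1, 9)]
partial_sums, total = parallel_total(rows, node_count=4)
print(partial_sums, total)
```

In this sketch no node reads another node's partition; the only shared step is the final combination of partial sums.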

Processing is often much faster in a shared-nothing configuration because nodes do not contend for the same resources. In a shared configuration, by contrast, a resource that every node must access can become a bottleneck. In theory, a pure shared-nothing database system could scale almost indefinitely by adding nodes, although in practice the coordination required between nodes limits the gain.

MPP databases are well suited to environments that hold very large amounts of data, such as data warehouses and other big data environments.

Partitioning data horizontally (by rows) across nodes, as described above, is also referred to as sharding. Each horizontal partition of data is called a database shard, or simply a shard, and each shard is stored on its own database server.
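One common way to decide which shard a row belongs to is to hash a key column. The sketch below assumes that approach; the functions shard_for_key and build_shards, the customer_id key, and the use of CRC32 as the hash are all illustrative choices, not something prescribed by any specific database.

```python
import zlib


def shard_for_key(key, shard_count):
    """Map a key to a shard index using a stable hash (CRC32 in this sketch)."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def build_shards(rows, key_field, shard_count):
    """Place every row in exactly one shard, chosen by its key."""
    shards = {index: [] for index in range(shard_count)}
    for row in rows:
        shards[shard_for_key(row[key_field], shard_count)].append(row)
    return shards


# Made-up sample data: ten customers spread over three shards.
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(1, 11)]
shards = build_shards(customers, "customer_id", shard_count=3)

for index, shard_rows in shards.items():
    print(index, [row["customer_id"] for row in shard_rows])
```

Because the hash is stable, the same key always maps to the same shard, so a lookup by customer_id only needs to contact one server. A known drawback of simple modulo hashing is that changing the number of shards remaps most keys.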