What is a Document Store Database?

A document store database (also known as a document-oriented database, aggregate database, or simply document store or document database) is a database that uses a document-oriented model to store data.

Document store databases store each record and its associated data within a single document. Each document contains semi-structured data that can be queried using the DBMS’s query and analytics tools.

Document Examples

Here are two examples of documents that could be stored in a document database. Both examples contain the same data; they’re just written in different formats.

Here’s the first example, written in XML.

 <artist>
   <artistname>Iron Maiden</artistname>
   <albums>
     <album>
       <albumname>The Book of Souls</albumname>
       <datereleased>2015</datereleased>
       <genre>Hard Rock</genre>
     </album>
     <album>
       <albumname>Killers</albumname>
       <datereleased>1981</datereleased>
       <genre>Hard Rock</genre>
     </album>
     <album>
       <albumname>Powerslave</albumname>
       <datereleased>1984</datereleased>
       <genre>Hard Rock</genre>
     </album>
     <album>
       <albumname>Somewhere in Time</albumname>
       <datereleased>1986</datereleased>
       <genre>Hard Rock</genre>
     </album>
   </albums>
 </artist>

And here’s the same example, but this time written in JSON.

{
    "_id" : 1,
    "artistName" : "Iron Maiden",
    "albums" : [
        {
            "albumname" : "The Book of Souls",
            "datereleased" : 2015,
            "genre" : "Hard Rock"
        }, {
            "albumname" : "Killers",
            "datereleased" : 1981,
            "genre" : "Hard Rock"
        }, {
            "albumname" : "Powerslave",
            "datereleased" : 1984,
            "genre" : "Hard Rock"
        }, {
            "albumname" : "Somewhere in Time",
            "datereleased" : 1986,
            "genre" : "Hard Rock"
        }
    ]
}

Notice that I added an _id field to the second example. Whether this is required depends on the DBMS; some DBMSs will automatically insert a unique ID field if one isn’t supplied.
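To illustrate the "insert an ID if one isn’t supplied" behaviour, here’s a toy in-memory store written in Python. It’s a sketch under simplifying assumptions: the TinyDocumentStore class and its insert()/get() methods are hypothetical names for this illustration, and the sequential integer IDs are just the simplest scheme that works. Real systems use their own ID schemes (MongoDB, for example, generates an ObjectId for a missing _id).

```python
import itertools


class TinyDocumentStore:
    """A toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}                        # maps _id -> document
        self._id_counter = itertools.count(1)  # source of auto-generated IDs

    def insert(self, doc):
        doc = dict(doc)  # shallow copy so the caller's dict isn't modified
        if "_id" not in doc:
            # Mimic DBMSs that add a unique ID when none is supplied:
            # take the next counter value that isn't already in use.
            new_id = next(self._id_counter)
            while new_id in self._docs:
                new_id = next(self._id_counter)
            doc["_id"] = new_id
        elif doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def get(self, doc_id):
        return self._docs[doc_id]


store = TinyDocumentStore()
store.insert({"_id": 1, "artistName": "Iron Maiden"})     # _id supplied explicitly
auto_id = store.insert({"artistName": "Devin Townsend"})  # _id generated by the store
print(auto_id)  # 2, because 1 is already taken
```

The duplicate check stands in for the uniqueness guarantee a real DBMS places on its ID field.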

Document Store vs Relational Databases

If we were to enter the above data into a relational database, it would typically be stored across three different tables, linked together through their primary key and foreign key fields.

Here’s how a relational database might store the above data.

Artists

ArtistId  ArtistName
1         Iron Maiden
2         Devin Townsend
3         The Wiggles

Albums

AlbumId  AlbumName               DateReleased  ArtistId  GenreId
1        The Book of Souls       2015          1         3
2        Killers                 1981          1         3
3        Powerslave              1984          1         3
4        Somewhere in Time       1986          1         3
5        Ziltoid the Omniscient  2007          2         3

Genre

GenreId  Genre
1        Country
2        Blues
3        Hard Rock

And here’s the relationship between those tables (as shown in MySQL Workbench):

[Figure: Diagram of the relationship between the three tables in MySQL Workbench. The primary key and foreign key fields have been highlighted.]

This illustrates some significant differences between document store databases and relational databases.

Here are some of the main ones.

Tables

Relational databases store data in multiple tables. Each table contains columns, and each row represents a record. Information about any given entity could be spread across many tables, and data from different tables is associated through relationships between those tables (typically via primary key and foreign key fields).

Document databases, on the other hand, don’t use tables as such. They store all data about a given entity within a single document, and any associated data is stored inside that same document.
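To make the table-versus-document contrast concrete, here’s a minimal sketch using Python’s built-in sqlite3 module, with SQLite standing in for MySQL. It creates the Artists, Albums and Genre tables shown earlier, declares the foreign keys, and then uses a join to reassemble the nested, document-shaped record that a document store would have kept as a single unit. The artist_document() helper is a hypothetical name chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Artists (ArtistId INTEGER PRIMARY KEY, ArtistName TEXT NOT NULL);
CREATE TABLE Genre   (GenreId  INTEGER PRIMARY KEY, Genre      TEXT NOT NULL);
CREATE TABLE Albums (
    AlbumId      INTEGER PRIMARY KEY,
    AlbumName    TEXT NOT NULL,
    DateReleased INTEGER,
    ArtistId     INTEGER NOT NULL REFERENCES Artists(ArtistId),
    GenreId      INTEGER NOT NULL REFERENCES Genre(GenreId)
);
""")
conn.executemany("INSERT INTO Artists VALUES (?, ?)",
                 [(1, "Iron Maiden"), (2, "Devin Townsend"), (3, "The Wiggles")])
conn.executemany("INSERT INTO Genre VALUES (?, ?)",
                 [(1, "Country"), (2, "Blues"), (3, "Hard Rock")])
conn.executemany("INSERT INTO Albums VALUES (?, ?, ?, ?, ?)", [
    (1, "The Book of Souls", 2015, 1, 3),
    (2, "Killers", 1981, 1, 3),
    (3, "Powerslave", 1984, 1, 3),
    (4, "Somewhere in Time", 1986, 1, 3),
    (5, "Ziltoid the Omniscient", 2007, 2, 3),
])


def artist_document(artist_id):
    """Join the three tables and nest the rows into one document-style dict."""
    rows = conn.execute("""
        SELECT ar.ArtistId, ar.ArtistName, al.AlbumName, al.DateReleased, g.Genre
        FROM Artists ar
        JOIN Albums al ON al.ArtistId = ar.ArtistId
        JOIN Genre g   ON g.GenreId   = al.GenreId
        WHERE ar.ArtistId = ?
        ORDER BY al.AlbumId
    """, (artist_id,)).fetchall()
    if not rows:
        return None  # inner joins: an artist with no albums yields no rows
    return {
        "_id": rows[0][0],
        "artistName": rows[0][1],
        "albums": [
            {"albumname": name, "datereleased": year, "genre": genre}
            for _, _, name, year, genre in rows
        ],
    }


doc = artist_document(1)
print(doc["artistName"], [a["albumname"] for a in doc["albums"]])
```

The join plus the nesting loop is exactly the reassembly work that the document model avoids for this access pattern, since the data is already stored together. Note that The Wiggles have no albums in this data, so the inner joins return no rows and the helper returns None for them.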

Schemas

With relational databases, you must create a schema before you load any data. With document store databases (and most other NoSQL databases), you have no such requirement. You can just go ahead and load the data without any predefined schema.

So with a document store, any two documents can have a different structure and different data types. For example, if one user chooses not to supply his date of birth, there wouldn’t even be a date of birth field in his document. If another user does supply her date of birth, that field would exist in her document. In a relational database, date of birth would still be a column for both users; it just wouldn’t contain a value for the first user (typically it would be NULL).
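Here’s a small Python sketch of that date of birth example. The user names, the dateOfBirth field name and the Users table are hypothetical, made up for this illustration. Plain dicts stand in for documents, and an in-memory SQLite table stands in for the relational side.

```python
import sqlite3

# Document style: each document only carries the fields it actually has.
users = [
    {"_id": 1, "name": "Alex"},                              # no date of birth supplied
    {"_id": 2, "name": "Sam", "dateOfBirth": "1990-04-12"},  # date of birth supplied
]
print([sorted(u) for u in users])
# [['_id', 'name'], ['_id', 'dateOfBirth', 'name']]

# Relational style: the column exists for every row; a missing value is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Name TEXT, DateOfBirth TEXT)")
conn.execute("INSERT INTO Users (UserId, Name) VALUES (1, 'Alex')")
conn.execute("INSERT INTO Users VALUES (2, 'Sam', '1990-04-12')")
print(conn.execute("SELECT * FROM Users ORDER BY UserId").fetchall())
# [(1, 'Alex', None), (2, 'Sam', '1990-04-12')]
```

Python’s sqlite3 module returns SQL NULL as None, which is why the first row shows None where the document simply has no key.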

Scalability

Document databases generally scale horizontally very well. Data can be partitioned across many servers (even thousands), with each server holding a subset of the documents, and the system can continue to perform well. This partitioning is often referred to as sharding.

Relational databases are generally less suited to scaling this way. They are more commonly scaled vertically (i.e. by adding more memory, storage, CPU, etc. to a single server). Because there’s a limit to how many resources you can fit inside one machine, there may come a point where horizontal scaling becomes the only option.
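As a rough illustration of the sharding idea described above, here’s a Python sketch of hash-based sharding, one common way to decide which server holds which document (range-based sharding is another). The shard_for() and distribute() functions are hypothetical, and lists stand in for servers; real systems also handle routing, replication and rebalancing, none of which is modelled here.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document's _id to a shard number by hashing it.

    hashlib is used instead of the built-in hash(), because hash() on
    strings is randomised per process and so isn't stable across machines.
    """
    digest = hashlib.sha256(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(docs, num_shards):
    """Place each document on the shard its _id hashes to."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc["_id"], num_shards)].append(doc)
    return shards


docs = [{"_id": i, "value": i * i} for i in range(1000)]
shards = distribute(docs, 4)
print([len(s) for s in shards])  # four counts that sum to 1000
```

One design caveat: with plain modulo, changing the number of shards moves most documents to a different shard, which is why production systems often use consistent hashing or pre-split ranges instead.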

Relationships

Document stores don’t have foreign keys like relational databases do. Relational databases use foreign keys to enforce relationships between tables. If a relationship needs to be established with a document database, it has to be done at the application level.

However, the whole idea behind the document model is that any data associated with a record is stored within the same document. So the need to establish relationships should arise less often with the document model than with a relational database.
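When a relationship is needed, handling it "at the application level" means the application stores a reference (such as another document’s _id) and does the checking and joining itself. The following Python sketch assumes two hypothetical collections held as dicts; insert_album() and albums_with_artist_names() are illustrative names, not any particular DBMS’s API.

```python
# Two hypothetical "collections", each a dict of documents keyed by _id.
artists = {
    1: {"_id": 1, "artistName": "Iron Maiden"},
}
albums = {
    10: {"_id": 10, "albumname": "Killers", "artistId": 1},
    11: {"_id": 11, "albumname": "Powerslave", "artistId": 1},
}


def insert_album(album):
    """Application-level stand-in for a foreign key check: refuse albums
    that reference an artist that doesn't exist."""
    if album["artistId"] not in artists:
        raise ValueError(f"unknown artistId: {album['artistId']!r}")
    albums[album["_id"]] = album


def albums_with_artist_names():
    """Resolve each album's artistId reference in application code
    (a manual 'join', since the store won't do it for us)."""
    return [
        {**album, "artistName": artists[album["artistId"]]["artistName"]}
        for album in albums.values()
    ]


insert_album({"_id": 12, "albumname": "Somewhere in Time", "artistId": 1})
try:
    insert_album({"_id": 13, "albumname": "Ziltoid the Omniscient", "artistId": 2})
except ValueError as err:
    print(err)  # unknown artistId: 2
```

Unlike a foreign key enforced by the database, this check only protects writes that go through insert_album(); any code that writes to the collection directly can still create dangling references.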

NoSQL

Most relational databases use SQL as the standard query language. Document store databases tend to use other query languages (although some are built to support SQL). Many document databases can be queried using languages such as XQuery, XSLT, SPARQL, Java, JavaScript, Python, etc.

Document Store vs Key-Value Databases

Document databases are similar to key-value databases in that there’s a key and a value. The data is stored as the value, and its associated key is the unique identifier for that value.

The difference is that, in a document database, the value contains structured or semi-structured data. This structured/semi-structured value is referred to as a document.

The structured/semi-structured data that makes up the document can be encoded in any of a number of formats, including XML, JSON, YAML, BSON, etc. It could also be a binary document, such as PDFs, MS Office documents, etc.
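As a rough illustration of one document with several possible encodings, this Python sketch encodes a trimmed-down version of the earlier artist document as both JSON and XML, using only the standard json and xml.etree.ElementTree modules (YAML and BSON need third-party libraries, so they’re left out). The element names mirror the XML example near the top of this article.

```python
import json
import xml.etree.ElementTree as ET

doc = {"artistName": "Iron Maiden",
       "albums": [{"albumname": "Killers", "datereleased": 1981}]}

# JSON encoding: nesting and the numeric year map directly onto JSON types.
as_json = json.dumps(doc)

# XML encoding: build the same tree element by element.
root = ET.Element("artist")
ET.SubElement(root, "artistname").text = doc["artistName"]
albums_el = ET.SubElement(root, "albums")
for album in doc["albums"]:
    album_el = ET.SubElement(albums_el, "album")
    ET.SubElement(album_el, "albumname").text = album["albumname"]
    # XML element content is text, so the year has to be converted to a string.
    ET.SubElement(album_el, "datereleased").text = str(album["datereleased"])
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

One practical difference shows up on the way back: json.loads() returns the year as the number 1981, whereas reading the XML returns the string "1981", so the application (or a schema) has to restore the type.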

A Benefit of the Document Model over Key-Value Stores

One benefit that document store databases have over key-value databases is that you can query the data itself. You can query against the structure of the document, as well as the elements within that structure, so you can return only those parts of the document that you require.

With a key-value database, you get the whole value – no matter how big (and seemingly structured) it might be. You can’t query within the value.
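The difference can be sketched with two toy stores in Python. Both classes are hypothetical: KeyValueStore treats its values as opaque bytes and can only hand back the whole value, while DocumentStore understands the document’s structure and can filter inside it and return just one field. The single hard-coded query shape in find_albums() is a simplification; real document databases offer general query languages and projections.

```python
import json

ARTIST = {
    "_id": 1,
    "artistName": "Iron Maiden",
    "albums": [
        {"albumname": "The Book of Souls", "datereleased": 2015},
        {"albumname": "Killers", "datereleased": 1981},
        {"albumname": "Powerslave", "datereleased": 1984},
        {"albumname": "Somewhere in Time", "datereleased": 1986},
    ],
}


class KeyValueStore:
    """Toy key-value store: values are opaque bytes it cannot look inside."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key) -> bytes:
        return self._data[key]  # always the whole value


class DocumentStore:
    """Toy document store: it understands the structure of what it stores."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        self._docs[doc["_id"]] = doc

    def find_albums(self, doc_id, predicate, field):
        """Filter inside one document and return only the requested field."""
        return [album[field] for album in self._docs[doc_id]["albums"]
                if predicate(album)]


kv = KeyValueStore()
kv.put(1, json.dumps(ARTIST).encode("utf-8"))
ds = DocumentStore()
ds.put(ARTIST)

whole_value = kv.get(1)  # bytes for the entire document; the client must parse it
early_albums = ds.find_albums(1, lambda a: a["datereleased"] < 1985, "albumname")
print(early_albums)  # ['Killers', 'Powerslave']
```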

What can a Document Database be used for?

Document-oriented databases are well suited for a wide variety of use cases. Here are some examples of where a document database could be useful.

Web Applications

  • Content management systems
  • Blogging platforms
  • eCommerce applications
  • Web analytics
  • User preferences data

User Generated Content

  • Chat sessions
  • Tweets
  • Blog posts
  • Ratings
  • Comments

Catalog Data

  • User accounts
  • Product catalogs
  • Device registries for Internet of Things
  • Bill of materials systems

Gaming

  • In-game stats
  • Social media integration
  • High-score leaderboards
  • In-game chat messages
  • Player guild memberships
  • Challenges completed

Networking/computing

  • Sensor data from mobile devices
  • Log files
  • Real-time analytics
  • Various other data from Internet of Things

Examples of Document Store DBMSs

There are many document-oriented database management systems available. Some are open source; others are proprietary.

Here are examples of some of the leading document store DBMSs.