A document store database (also known as a document-oriented database, aggregate database, or simply document store or document database) is a database that uses a document-oriented model to store data.
Document store databases store each record and its associated data within a single document. Each document contains semi-structured data that can be queried against using various query and analytics tools of the DBMS.
Document Examples
Here are two examples of documents that could be stored in a document database. Both examples contain the same data; they are just written in different formats.
Here’s the first example, written in XML.
<artist>
  <artistname>Iron Maiden</artistname>
  <albums>
    <album>
      <albumname>The Book of Souls</albumname>
      <datereleased>2015</datereleased>
      <genre>Hard Rock</genre>
    </album>
    <album>
      <albumname>Killers</albumname>
      <datereleased>1981</datereleased>
      <genre>Hard Rock</genre>
    </album>
    <album>
      <albumname>Powerslave</albumname>
      <datereleased>1984</datereleased>
      <genre>Hard Rock</genre>
    </album>
    <album>
      <albumname>Somewhere in Time</albumname>
      <datereleased>1986</datereleased>
      <genre>Hard Rock</genre>
    </album>
  </albums>
</artist>
And here’s the same example, but this time written in JSON.
{
  "_id": 1,
  "artistName": "Iron Maiden",
  "albums": [
    { "albumname": "The Book of Souls", "datereleased": 2015, "genre": "Hard Rock" },
    { "albumname": "Killers", "datereleased": 1981, "genre": "Hard Rock" },
    { "albumname": "Powerslave", "datereleased": 1984, "genre": "Hard Rock" },
    { "albumname": "Somewhere in Time", "datereleased": 1986, "genre": "Hard Rock" }
  ]
}
Notice that I decided to add an _id field in the second example. The DBMS may or may not require this field; some DBMSs will automatically insert a unique ID field if one isn’t supplied.
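To make the automatic ID behaviour concrete, here is a minimal Python sketch of an in-memory store that generates an _id only when the caller doesn’t supply one. The class name TinyDocumentStore and its insert and get methods are hypothetical names made up for this illustration; they are not the API of any real DBMS, and real systems use their own ID formats.

```python
import uuid


class TinyDocumentStore:
    """Hypothetical in-memory document store (illustration only)."""

    def __init__(self):
        self._docs = {}  # maps _id -> document (a plain dict)

    def insert(self, document):
        # Copy the document so the caller's dict is not modified.
        doc = dict(document)
        # Generate a unique ID only when none was supplied.
        if "_id" not in doc:
            doc["_id"] = uuid.uuid4().hex
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def get(self, doc_id):
        return self._docs[doc_id]


store = TinyDocumentStore()
explicit_id = store.insert({"_id": 1, "artistName": "Iron Maiden"})
generated_id = store.insert({"artistName": "Devin Townsend"})
```

The first insert keeps the ID it was given, while the second gets a randomly generated one.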
Document Store vs Relational Databases
If we were to enter the above data into a relational database, the info would typically be stored across three different tables – with a relationship linking them together via their primary key and foreign key fields.
Here’s how a relational database might store the above data.
Artists
ArtistId | ArtistName |
---|---|
1 | Iron Maiden |
2 | Devin Townsend |
3 | The Wiggles |
4 | … |
Albums
AlbumId | AlbumName | DateReleased | ArtistId | GenreId |
---|---|---|---|---|
1 | The Book of Souls | 2015 | 1 | 3 |
2 | Killers | 1981 | 1 | 3 |
3 | Powerslave | 1984 | 1 | 3 |
4 | Somewhere in Time | 1986 | 1 | 3 |
5 | Ziltoid the Omniscient | 2007 | 2 | 3 |
6 | … | … | … | … |
Genre
GenreId | Genre |
---|---|
1 | Country |
2 | Blues |
3 | Hard Rock |
4 | … |
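And here’s the relationship between those tables (done in MySQL): the Albums table links to the other two through foreign keys, with Albums.ArtistId referencing Artists.ArtistId and Albums.GenreId referencing Genre.GenreId.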
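As a rough sketch of the three-table layout, the following Python snippet builds the same tables and reassembles one artist’s albums with a join. It uses SQLite through Python’s built-in sqlite3 module rather than MySQL (an assumption made so the example is self-contained), and the exact column types are my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE Artists (ArtistId INTEGER PRIMARY KEY, ArtistName TEXT NOT NULL);
    CREATE TABLE Genre   (GenreId  INTEGER PRIMARY KEY, Genre      TEXT NOT NULL);
    CREATE TABLE Albums (
        AlbumId      INTEGER PRIMARY KEY,
        AlbumName    TEXT NOT NULL,
        DateReleased INTEGER,
        ArtistId     INTEGER NOT NULL REFERENCES Artists(ArtistId),
        GenreId      INTEGER NOT NULL REFERENCES Genre(GenreId)
    );
""")

conn.executemany("INSERT INTO Artists VALUES (?, ?)",
                 [(1, "Iron Maiden"), (2, "Devin Townsend"), (3, "The Wiggles")])
conn.executemany("INSERT INTO Genre VALUES (?, ?)",
                 [(1, "Country"), (2, "Blues"), (3, "Hard Rock")])
conn.executemany("INSERT INTO Albums VALUES (?, ?, ?, ?, ?)", [
    (1, "The Book of Souls", 2015, 1, 3),
    (2, "Killers", 1981, 1, 3),
    (3, "Powerslave", 1984, 1, 3),
    (4, "Somewhere in Time", 1986, 1, 3),
    (5, "Ziltoid the Omniscient", 2007, 2, 3),
])

# Reassembling one artist's albums needs a join across all three tables.
rows = conn.execute("""
    SELECT al.AlbumName, al.DateReleased, g.Genre
    FROM Albums AS al
    JOIN Artists AS ar ON ar.ArtistId = al.ArtistId
    JOIN Genre   AS g  ON g.GenreId   = al.GenreId
    WHERE ar.ArtistName = ?
    ORDER BY al.DateReleased
""", ("Iron Maiden",)).fetchall()
```

In the document examples shown earlier, the same information lives in a single artist document, so no join is needed to read it back.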
So this indicates that there are some significant differences between document store databases and relational databases.
Here are some of the main ones.
Tables
Relational databases store data in multiple tables. Each table has columns, and each row represents one record. Information about any given entity could be spread out among many tables, and data from different tables can only be associated by establishing a relationship between those tables.
Document databases, on the other hand, don’t use tables as such. They store all data on a given entity, including any associated data, within a single document.
Schemas
With relational databases, you must create a schema before you load any data. With document store databases (and most other NoSQL databases), you have no such requirement. You can just go ahead and load the data without any predefined schema.
So with a document store, any two documents can have different structures and data types. For example, if one user chooses not to supply his date of birth, that wouldn’t even be a field within his document. If another user does supply her date of birth, that would be a field in her document. If this were a relational database, date of birth would still be a column for both users; it just wouldn’t contain a value for the first one.
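The date of birth example might look like this in Python, with plain dicts standing in for documents and tuples standing in for relational rows. The user names, field names, and dates are hypothetical values chosen for the illustration.

```python
# Two documents in the same collection with different structures.
users = [
    {"_id": 1, "name": "Alice", "date_of_birth": "1990-04-12"},
    {"_id": 2, "name": "Bob"},  # no date_of_birth field at all
]

# Relational-style rows: every row has every column,
# and a missing value is stored as None (NULL in SQL).
columns = ("id", "name", "date_of_birth")
relational_rows = [
    (1, "Alice", "1990-04-12"),
    (2, "Bob", None),
]

# With documents, you check whether the field exists at all.
with_dob = [u["name"] for u in users if "date_of_birth" in u]

# With rows, the column always exists; you check whether it holds a value.
rows_with_dob = [row[1] for row in relational_rows if row[2] is not None]
```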
Scalability
Document databases can scale horizontally very well. Data can be partitioned across many machines (potentially thousands), with each machine holding a subset of the documents, and the system will still perform well. This partitioning is often referred to as sharding.
Relational databases are traditionally not as well suited to scaling in this fashion. They are more suited to scaling vertically (i.e., adding more memory, storage, etc. to a single machine). Seeing as there’s a limit to how many resources you can fit inside one machine, there could come a point where horizontal scaling becomes the only option.
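Here is a minimal sketch of the idea behind sharding, assuming a simple hash-of-the-key scheme. The shard_for function and ShardedStore class are hypothetical names for this illustration, and each "shard" is just a dict standing in for a separate machine. Real systems typically use more elaborate schemes (range-based partitioning or consistent hashing, for example), because plain modulo hashing moves most keys around whenever the number of shards changes.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a document key to a shard index using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Hypothetical store that spreads documents over several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for the storage on one machine.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, document):
        self.shards[shard_for(key, len(self.shards))][key] = document

    def get(self, key):
        # Only the shard that owns the key has to be consulted.
        return self.shards[shard_for(key, len(self.shards))][key]


cluster = ShardedStore(shard_count=4)
for artist_id in range(1, 101):
    cluster.put(artist_id, {"_id": artist_id})

shard_sizes = [len(shard) for shard in cluster.shards]
```

Because each document is self-contained, a lookup by key touches a single shard, which is part of why this model spreads across machines comfortably.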
Relationships
Document stores don’t have foreign keys the way relational databases do. Foreign keys are used by relational databases to enforce relationships between tables. If a relationship needs to be established with a document database, it would need to be done at the application level.
However, the whole idea behind the document model is that any data associated with a record is stored within the same document. So the need to establish a relationship when using the document model should not be as prevalent as in a relational database.
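As a sketch of what "at the application level" can mean, the snippet below stores a genre_id inside each album document and resolves it by hand. The collection layout, the genre_id field, and the resolve_genre helper are all assumptions for illustration, and the second album is a made-up placeholder that shows what happens when nothing enforces the reference.

```python
# Two hypothetical collections, keyed by _id.
genres = {
    3: {"_id": 3, "genre": "Hard Rock"},
}
albums = {
    1: {"_id": 1, "albumname": "The Book of Souls", "genre_id": 3},
    # Placeholder document: its genre_id points at a genre that does not exist.
    2: {"_id": 2, "albumname": "Placeholder album", "genre_id": 99},
}


def resolve_genre(album, genre_collection):
    """Follow the genre_id reference manually (an application-level 'join')."""
    genre = genre_collection.get(album["genre_id"])
    if genre is None:
        # No foreign key constraint exists, so the application
        # has to decide how to handle a dangling reference.
        return None
    return genre["genre"]


first_genre = resolve_genre(albums[1], genres)
dangling_genre = resolve_genre(albums[2], genres)
```

A relational database would reject the second album at insert time; here the application code only discovers the problem when it follows the reference.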
NoSQL
Most relational databases use SQL as the standard query language. Document store databases tend to use other query languages (although some are built to support SQL). Many document databases can be queried using languages such as XQuery, XSLT, SPARQL, Java, JavaScript, Python, etc.
Document Store vs Key-Value Databases
Document databases are similar to key-value databases in that there’s a key and a value. Data is stored as a value, and its associated key is the unique identifier for that value.
The difference is that, in a document database, the value contains structured or semi-structured data. This structured/semi-structured value is referred to as a document.
The structured/semi-structured data that makes up the document can be encoded using any number of formats, including XML, JSON, YAML, BSON, etc. A document could also be stored in a binary format, such as a PDF or an MS Office document.
A Benefit of the Document Model over Key-Value Stores
One benefit that document store databases have over key-value databases is that you can query the data itself. You can query against the structure of the document, as well as the elements within that structure. Therefore, you can return only those parts of the document that you require.
With a key-value database, you get the whole value – no matter how big (and seemingly structured) it might be. You can’t query within the value.
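The difference can be sketched in Python as follows. The kv_store dict and the find_albums function are hypothetical stand-ins (not a real query API): the key-value side only hands back an opaque blob, while the document side can filter on fields inside the document and return just the matching parts.

```python
import json

artist = {
    "_id": 1,
    "artistName": "Iron Maiden",
    "albums": [
        {"albumname": "The Book of Souls", "datereleased": 2015, "genre": "Hard Rock"},
        {"albumname": "Killers", "datereleased": 1981, "genre": "Hard Rock"},
        {"albumname": "Powerslave", "datereleased": 1984, "genre": "Hard Rock"},
        {"albumname": "Somewhere in Time", "datereleased": 1986, "genre": "Hard Rock"},
    ],
}

# Key-value style: the store sees the value as opaque bytes.
# You can only fetch the whole thing by its key.
kv_store = {"artist:1": json.dumps(artist).encode("utf-8")}
blob = kv_store["artist:1"]


# Document style: the store understands the structure, so a query can
# match on inner fields and return only the parts that are needed.
def find_albums(documents, artist_name, released_before):
    results = []
    for doc in documents:
        if doc.get("artistName") != artist_name:
            continue
        for album in doc.get("albums", []):
            if album["datereleased"] < released_before:
                results.append(album["albumname"])
    return results


early_albums = find_albums([artist], "Iron Maiden", released_before=1985)
```

With the key-value approach, the client would have to download and decode the entire blob before it could do the same filtering itself.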
What can a Document Database be used for?
Document-oriented databases are well suited for a wide variety of use cases. Here are some examples of where a document database could be useful.
Web Applications
- Content management systems
- Blogging platforms
- eCommerce applications
- Web analytics
- User preferences data
User Generated Content
- Chat sessions
- Tweets
- Blog posts
- Ratings
- Comments
Catalog Data
- User accounts
- Product catalogs
- Device registries for Internet of Things
- Bill of materials systems
Gaming
- In-game stats
- Social media integration
- High-score leaderboards
- In-game chat messages
- Player guild memberships
- Challenges completed
Networking/Computing
- Sensor data from mobile devices
- Log files
- Real-time analytics
- Various other data from Internet of Things
Examples of Document Store DBMSs
There are many document-oriented database management systems available. Some are open source, others are proprietary.
Here are examples of some of the leading document store DBMSs.