NoSQL is a loose term for database management systems (DBMSs) that are not based on the relational model.
It is a very broad term that doesn’t refer to one particular database model. Rather, it covers a whole variety of different models that don’t fit into the relational model.
Although NoSQL databases have been around since the 1960s, it wasn’t until the early 2000s that the NoSQL approach started to pick up steam, and a whole new generation of NoSQL systems began to hit the market.
The Big Data Problem
One of the main reasons the NoSQL approach was pursued was the big data problem. Companies like Google and Amazon were starting to deal with massive amounts of data due to their immense popularity.
This led to the following:
- Google developed Bigtable, a distributed storage system for managing structured data, designed to reliably scale to petabytes of data and thousands of machines. Bigtable is used by over sixty products and projects, including Search, Analytics, Maps, and Gmail. The aim of Bigtable is to provide wide applicability, scalability, high performance, and high availability. Read more about Bigtable (PDF – 14 pages).
- Amazon developed and implemented Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. Read more about Dynamo (PDF – 16 pages).
But Google and Amazon weren’t the only companies dealing with the big data problem. Many other companies were developing similar innovative solutions for similar reasons. And the innovation continues today.
This has led to an explosion of new technologies developed outside of the relational DBMS framework, thus attracting the label “NoSQL”.
The precise definition of NoSQL is often debated.
- Some take it to mean “No SQL” (meaning that the system doesn’t use SQL and uses an alternative query language instead).
- The definition is sometimes expanded to mean “Not only SQL” (meaning that the system uses SQL along with other technologies/query languages).
- Many argue that the one thing all NoSQL databases have in common is that they’re non-relational, and that “NoREL” would be a more suitable name.
The term NoSQL, as it is used today, comes from a Twitter hashtag suggested by Eric Evans for a 2009 meetup to discuss big data and linearly scalable distributed systems. Evans has since said:
Johan Oskarsson was organizing the first meetup and asked the question “What’s a good name?” on IRC; it was one of 3 or 4 suggestions that I spouted off in the span of like 45 seconds, without thinking.
Since then the term has almost taken on a life of its own.
Characteristics of a NoSQL Database
The term NoSQL now generally refers to a particular group of DBMSs that share certain characteristics, such as the following:
- Non-Relational
- Open Source
- Schema-Less
- Horizontally Scalable
- Lack of Adherence to ACID Principles
- No Standard Query Language
Not all NoSQL databases possess all of these characteristics. Many NoSQL systems possess only some of them. However, most of these characteristics are inherently lacking in relational databases.
Non-Relational
By non-relational, I mean not based on the relational model as proposed by E. F. Codd in 1970.
Non-relational DBMSs are built non-relational for a reason. In many cases this is because the relational model isn’t a good fit for the requirements.
It could be that the data is mainly unstructured or semi-structured. It could be that the sheer amount of data requires a new way of approaching the problem of data storage and retrieval. It could be both, and it could also be that the system needs to scale to hundreds or thousands of computers. Or it could be some other reason entirely.
In any case, a new solution was built to satisfy the requirements.
Open Source
Most NoSQL DBMSs are open source. While there are also many relational DBMSs that are open source, the NoSQL movement tends to lean towards open source projects, with many organisations contributing to development efforts for a single solution.
This is not necessarily a “NoSQL requirement” but it is a “NoSQL observation”.
Schema-Less
Most NoSQL databases have no fixed schema.
Whereas a relational DBMS requires a schema to be modelled and created before any data can be entered, a NoSQL database doesn’t have this requirement. This tends to give NoSQL databases a reputation for being more flexible with the data they can accept, as well as for supporting an agile development approach.
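To make the contrast concrete, here is a minimal sketch in Python of what "no fixed schema" can look like in practice. The DocumentCollection class and its insert and find methods are hypothetical names invented for this illustration. They are not the API of any real NoSQL product, and real systems add persistence, indexing, and much more.

```python
from typing import Any


class DocumentCollection:
    """Hypothetical toy collection: stores documents without a fixed schema."""

    def __init__(self) -> None:
        self._docs: dict[int, dict[str, Any]] = {}
        self._next_id = 1

    def insert(self, doc: dict[str, Any]) -> int:
        """Store any dictionary as-is and return its generated id."""
        doc_id = self._next_id
        self._docs[doc_id] = dict(doc)  # copy so later caller edits do not leak in
        self._next_id += 1
        return doc_id

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        """Return documents matching every criterion; a missing field never matches."""
        return [
            doc
            for doc in self._docs.values()
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]


# No schema was declared up front, and the two documents have different fields.
people = DocumentCollection()
people.insert({"name": "Ada", "email": "ada@example.com"})
people.insert({"name": "Grace", "languages": ["COBOL"], "rank": "Rear Admiral"})

matches = people.find(name="Grace")
```

The second document carries fields the first one lacks, and nothing had to be altered beforehand to accept it. In a relational DBMS, the equivalent change would normally require modifying the table definition first.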
Horizontally Scalable
Most NoSQL databases excel in clustered environments. This is where data is partitioned across multiple computers so that each computer can perform a specific task independently of the others. Each node can perform its task without having to share memory or disk space with the others. This is known as a shared nothing architecture (SN).
Although relational databases can also be set up within a cluster, doing so tends to be more difficult with an RDBMS than with a NoSQL database. Performance can also suffer when scaling a relational database in this way. Relational databases are better at “scaling up”, meaning adding more resources to a single machine, or getting a more powerful machine.
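The idea of partitioning data across machines that share nothing can be sketched in a few lines of Python. The Node and Cluster classes below are hypothetical, invented for this illustration, and the sketch uses plain hash-modulo placement as an assumption. That scheme remaps most keys whenever a node is added or removed, which is why production systems generally use more elaborate placement schemes (Dynamo, for example, uses consistent hashing).

```python
import hashlib
from typing import Optional


class Node:
    """Hypothetical stand-in for one machine, with its own private storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, str] = {}  # nothing here is shared with other nodes


class Cluster:
    """Hypothetical router that assigns each key to exactly one node."""

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = [Node(name) for name in node_names]

    def _node_for(self, key: str) -> Node:
        # A stable hash guarantees the same key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value: str) -> None:
        self._node_for(key).store[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._node_for(key).store.get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(100):
    cluster.put(f"user:{i}", f"profile {i}")

value = cluster.get("user:42")
```

Each read or write touches only the one node that owns the key, so adding machines adds capacity without any node needing access to another node's memory or disk.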
Lack of Adherence to ACID Principles
Most NoSQL databases relax ACID (Atomicity, Consistency, Isolation, Durability) constraints to some degree.
This is because most NoSQL solutions were developed to provide high availability and scalability across a clustered environment. To stay highly available across a cluster, one normally has to give up some consistency or some durability (or find a balance between availability, consistency, and durability). By relaxing consistency, one can achieve higher availability.
The NoSQL approach recognises that there are some cases where inconsistent data is not the end of the world. As long as it is dealt with in some way, it is OK to present data that is not 100% consistent 100% of the time.
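The trade-off can be illustrated with a small Python sketch of two replicas that accept writes immediately and copy them to each other later. Everything here (the Replica and EventuallyConsistentStore classes, the sync method, the replica names) is hypothetical and invented for this illustration. In particular, the sketch assumes no two replicas receive conflicting writes for the same key at the same time; real systems need an explicit conflict-resolution strategy for that case.

```python
from collections import deque
from typing import Optional


class Replica:
    """Hypothetical copy of the data held by one machine."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}


class EventuallyConsistentStore:
    """Hypothetical store: writes succeed at once and replicate later."""

    def __init__(self, replica_names: list[str]) -> None:
        self.replicas = {name: Replica(name) for name in replica_names}
        self._pending: deque[tuple[str, str, str]] = deque()  # (target, key, value)

    def write(self, replica_name: str, key: str, value: str) -> None:
        # Accept the write on one replica without waiting for the others.
        self.replicas[replica_name].data[key] = value
        for name in self.replicas:
            if name != replica_name:
                self._pending.append((name, key, value))

    def read(self, replica_name: str, key: str) -> Optional[str]:
        # Reads are served locally, so they may return stale data.
        return self.replicas[replica_name].data.get(key)

    def sync(self) -> None:
        # Deliver queued updates in order; afterwards all replicas agree.
        while self._pending:
            name, key, value = self._pending.popleft()
            self.replicas[name].data[key] = value


store = EventuallyConsistentStore(["east", "west"])
store.write("east", "cart", "1 book")

stale = store.read("west", "cart")  # None: the update has not arrived yet
store.sync()
fresh = store.read("west", "cart")  # "1 book": the replicas have converged
```

The write never had to wait for the second replica, which is what keeps the system available. The cost is the window between the write and the sync during which a reader on the other replica sees old data.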
However, some NoSQL systems, such as MarkLogic, Neo4j, and OrientDB, do support ACID transactions.
No Standard Query Language
There is currently no standard query language that is supported by all NoSQL databases.
While there have been attempts to introduce a standard query language for all NoSQL databases, none has gained universal support.
Some NoSQL DBMSs have their own query language, while others support existing languages and formats such as JSON-based queries, XQuery, and SPARQL.
The First NoSQL Databases
There have been many “non SQL” databases throughout the years. Even before SQL was first proposed in 1974, and the first RDBMS was commercially released in 1979, there were already databases.
Here are three of the first (NoSQL) database management systems ever built:
- MultiValue is a NoSQL database that was first developed in 1965 as part of the Pick operating system.
- MUMPS (Massachusetts General Hospital Utility Multi-Programming System) has been around since 1966. MUMPS, which is schema-less and uses a key-value database engine, is a classic example of a NoSQL database.
- IBM IMS is a joint hierarchical database and information management system that was developed in 1966.
Of course, no one called these “NoSQL” databases – SQL hadn’t even been invented yet. It wasn’t until after 2009 that these older DBMSs were associated with NoSQL.
NoSQL – The Relational Database
There is also a relational database that is actually called NoSQL, first released by Carlo Strozzi in 1998. NoSQL (the RDBMS) is named as such because it doesn’t use SQL as its query language. It has no connection to the NoSQL movement that began in 2009.