NoSQL Databases
This time I’m going to write about what a NoSQL database is, which types of NoSQL databases exist right now, and why there are so many of them. Before we begin, though, we need to understand what SQL databases are.
SQL databases are more commonly known as relational databases. Chances are that if you are reading this you have already used this kind of database; most of us have been working with them for quite a long time. In a relational database, data needs to be saved in a tabular format. This constraint gives the data structure, and that structure is what makes an RDBMS very reliable for querying and manipulating related data in a consistent manner.
The most common databases in this realm are Oracle, SQL Server, MySQL, and PostgreSQL. All of them use SQL as a standard language to query the database, but of course each vendor introduces its own concepts in its engine. For example, SQL Server has XML support, which means you can query XML documents as if they were regular data, while PostgreSQL has built-in support for storing JSON objects.
Now, NoSQL databases are data storage engines that, following the trade-offs described by the CAP theorem, typically relax strong consistency in order to achieve availability and partition tolerance. They are by no means a drop-in replacement for relational databases, but an alternative for very specific needs in certain business applications.
NoSQL Database Families
There are four main types of NoSQL databases; let’s review each one of them.
Key-Value Store
Key-value stores are the simplest of the NoSQL databases and work in a relatively straightforward way: think of them as simple Python dicts or hash tables, since they behave much the same. You set a key linked to a value, and later you use that same key to retrieve the stored value. Common uses are caching and task queueing.
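To make the dict analogy concrete, here is a minimal sketch of a key-value store used as a cache. The `KeyValueStore` class, its method names, and the example keys are all hypothetical and written only for illustration; real engines add persistence, expiration, and network access on top of this idea.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a dict (hypothetical, illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Store (or overwrite) the value under the given key.
        self._data[key] = value

    def get(self, key, default=None):
        # Retrieve the value with the same key; return a default on a miss.
        return self._data.get(key, default)

    def delete(self, key):
        # Remove the key if present and report whether anything was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


cache = KeyValueStore()
cache.set("page:/home", "<html>...</html>")
print(cache.get("page:/home"))     # the cached value
print(cache.get("page:/missing"))  # None, i.e. a cache miss
```

Note that the only way to reach a value is through its key; there is no query language, which is exactly what keeps this family so simple.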
Some solutions in this category are:
- Memcached
- Redis
- Riak
Column Oriented
A column oriented database is somewhat similar to a key-value store, except that each key is associated with a sequence of values instead of just one. I like to think of them as similar (though not precisely) to columns in a Pandas DataFrame. Perhaps the best analogy is a CSV file in which, instead of every row being a single record, each row holds the values of all the records under the same label:
name: John, Peter, Eddard, Jorah, Sansa, Arya
age: 23, 39, 51, 49, 16, 12
house: Snow, NaN, Stark, Mormont, Stark, Stark
Keeping this structure makes aggregations over a column powerful, since there is no need to read all the other data just to work on a single column. This comes at a price, though: inserting or updating a record now has to touch every column, which is a heavy process. Although it isn’t represented in my example, each record consists of another key-value pair, and in some column oriented DBMSs records can be nested several levels deep.
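The trade-off can be sketched in plain Python with the same sample data. This is only an in-memory illustration of the layout, not how any particular engine stores data; the `insert_record` helper and the extra "Brienne" record are made up for the example, and `None` stands in for the missing value shown as NaN above.

```python
# Column oriented layout: one list per label, aligned by position.
columns = {
    "name": ["John", "Peter", "Eddard", "Jorah", "Sansa", "Arya"],
    "age": [23, 39, 51, 49, 16, 12],
    "house": ["Snow", None, "Stark", "Mormont", "Stark", "Stark"],
}


def average(columns, label):
    # An aggregation only reads the one column it needs.
    values = columns[label]
    return sum(values) / len(values)


def insert_record(columns, record):
    # A new record must append to every column so positions stay aligned.
    for label, values in columns.items():
        values.append(record.get(label))


avg_age = average(columns, "age")
print(round(avg_age, 2))

# Hypothetical extra record, added only to show the cost of an insert.
insert_record(columns, {"name": "Brienne", "age": 32, "house": "Tarth"})
print({label: len(values) for label, values in columns.items()})
```

Computing the average never looks at the name or house data, while the insert has to visit all three lists.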
Some solutions in this category are:
- Cassandra
- HBase
Document Oriented
I think of document oriented databases as the best of both worlds between relational databases and the other NoSQL databases. These data stores allow much deeper and more complex nesting in the values associated with their keys; basically, they let us save whole objects as a single document (commonly serialized as JSON) without caring whether other documents in the collection have the same schema. The disadvantage is that querying a single attribute may require reading the whole document, which can be costly.
To better describe how data is arranged in a document oriented database, let me illustrate with a simple example. Using a relational database in an e-commerce system, we might have an orders table and an order_items table with a one-to-many relationship between them. In a document oriented database we can keep the orders and their order items in the same document, since an items attribute can hold an array of item objects. I really recommend watching Modeling Data for NoSQL Document Databases; it gives insight into how to think about modeling our application’s data for storage in document databases.
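As a rough sketch of the idea, an order and its items could live in one document like the one below. Every field name and value here (`_id`, `customer`, `sku`, `unit_price`, and so on) is an assumption made up for illustration, and the `order_total` helper is hypothetical; only the items array comes from the description above.

```python
import json

# One document holds the order and its items; no separate order_items table.
order = {
    "_id": "order-1001",
    "customer": "Sansa Stark",
    "items": [
        {"sku": "SWORD-01", "quantity": 1, "unit_price": 250.0},
        {"sku": "CLOAK-07", "quantity": 2, "unit_price": 40.0},
    ],
}

# Another document in the same collection may carry a different schema.
gift_order = {"_id": "order-1002", "customer": "Arya Stark", "items": [], "gift_note": "For Jon"}


def order_total(doc):
    # The items are already inside the document, so no join is needed.
    return sum(item["quantity"] * item["unit_price"] for item in doc["items"])


# Documents are commonly serialized as JSON when stored or sent over the wire.
restored = json.loads(json.dumps(order))
print(order_total(restored))    # 330.0
print(order_total(gift_order))  # 0
```

Reading the order brings its items along in a single lookup, which is the main appeal over joining two tables.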
Some solutions in this category are:
- MongoDB
- CouchDB
Graph Based
Graph based databases are a whole new beast, at least compared to the other types of NoSQL databases already covered. The way data is stored derives from graph theory, in which vertices are connected to each other through edges. This family of databases is strongest when dealing with connected data and understanding its relationships; the most common use is in social network applications.
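A tiny in-memory sketch shows the kind of question this model answers naturally. The `SocialGraph` class and its methods are hypothetical and only imitate the idea with an adjacency structure; actual graph databases provide their own storage and query languages.

```python
from collections import defaultdict


class SocialGraph:
    """Toy undirected graph: people are vertices, friendships are edges (illustration only)."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_friendship(self, a, b):
        # An undirected edge is recorded on both vertices.
        self._edges[a].add(b)
        self._edges[b].add(a)

    def friends(self, person):
        # Return a copy so callers cannot mutate the graph by accident.
        return set(self._edges.get(person, set()))

    def friends_of_friends(self, person):
        # Walk two edges away, then drop direct friends and the person itself.
        direct = self.friends(person)
        reachable = set()
        for friend in direct:
            reachable |= self.friends(friend)
        return reachable - direct - {person}


graph = SocialGraph()
graph.add_friendship("Jon", "Arya")
graph.add_friendship("Arya", "Sansa")
graph.add_friendship("Sansa", "Jorah")
print(sorted(graph.friends_of_friends("Jon")))  # ['Sansa']
```

In a relational schema the same "friends of friends" question would need a self-join on a friendships table for every extra hop.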
Some solutions in this category are:
- OrientDB
- Neo4j
Note that OrientDB is a multi-model database: we can use it as a document or key-value store, but it is more popular as a graph database.
Conclusion
To conclude, we should note a few points about NoSQL databases:
- They are more flexible as they don’t enforce a schema.
- They typically sacrifice strong consistency for availability and partition tolerance.
- They are a strong choice for distributed storage systems; their power is unleashed when they run in clusters.
- They are not a replacement for relational databases.
- Most applications will not need this kind of database; they are meant for very specific use cases. If you are unsure whether you need a NoSQL database, chances are you’re fine with a relational database.