September 2, 2017

NoSQL Databases

In this post I’m going to write about what a NoSQL database is, the different types of NoSQL databases that exist right now, and why there are so many of them. Before we begin, though, we need to understand what SQL databases are.

SQL databases are more commonly known as relational databases. Chances are that if you are reading this you have already used this kind of database; most of us have been working with them for quite a long time. In a relational database, data is stored in a tabular format. This constraint gives the data structure, and that structure is what makes an RDBMS very reliable for querying and manipulating related data in a consistent manner.

The most common databases in this realm are Oracle, SQL Server, MySQL, and PostgreSQL. All of them use SQL as a standard language to query the database, but each vendor adds its own features to its engine. For example, SQL Server has XML support, which means you can query XML documents as if they were regular data, while PostgreSQL has built-in support for storing JSON objects.

NoSQL databases, on the other hand, are data storage engines that, in terms of the CAP theorem, typically relax strict consistency in order to gain availability and partition tolerance. They are by no means a drop-in replacement for relational databases, but an alternative for very specific needs in certain business applications.

NoSQL Database Families

There are four main types of NoSQL databases. Let’s review each one of them.

Key-Value Store

Key-value stores are the simplest of the NoSQL databases and work in a relatively straightforward way: think of them as simple Python dicts or hash tables. You set a key linked to a value, and later you use that same key to retrieve the stored value. Common uses are caching and task queueing.
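To make the dict analogy concrete, here is a minimal sketch of the set/get idea in plain Python. The class name `KeyValueStore`, its methods and the sample keys are my own inventions for illustration; this is not the API of Memcached, Redis or Riak.

```python
class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        # A plain dict does all the work: keys map directly to values.
        self._data = {}

    def set(self, key, value):
        """Store `value` under `key`, overwriting any previous value."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value stored under `key`, or `default` if it is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove `key` if present; return True when something was removed."""
        return self._data.pop(key, None) is not None


cache = KeyValueStore()
cache.set("user:42:name", "Arya")           # sample key, made up for the example
cache.set("page:/home", "<html>...</html>")  # caching a rendered page

name = cache.get("user:42:name")
missing = cache.get("user:99:name")
```

The store knows nothing about what is inside a value; all it can do is hand back whatever was saved under a key. That simplicity is a big part of why this family suits caching so well.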

Some solutions in this category are:

  • Memcached
  • Redis
  • Riak

Column Oriented

A column oriented database is somewhat similar to a key-value store, except that each key is associated with a sequence of values instead of just one. I like to think of them as similar (though not identical) to columns in a pandas DataFrame. Perhaps the best analogy is a CSV file in which, instead of every row being a single record, each row holds the values of all the records under the same label:

name: John, Peter, Eddard, Jorah, Sansa, Arya
age: 23, 39, 51, 49, 16, 12
house: Snow, NaN, Stark, Mormont, Stark, Stark

This structure shows its power on column aggregations, since we no longer need to read all the other data just to work on a single column. That comes at a price: updating or inserting a record has to touch every column, which is a heavy process. Although it isn’t represented in my example, each record consists of another key-value pair, and in some column oriented DBMSs records can be nested at multiple levels.
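The trade-off can be sketched with plain Python lists, reusing the sample data from the example. This only mimics the layout in memory; it says nothing about how Cassandra or HBase actually store data on disk. The helper names and the extra record are hypothetical.

```python
# Column oriented layout: one list per label, aligned by position.
# None stands in for the missing value shown as NaN in the example.
columns = {
    "name": ["John", "Peter", "Eddard", "Jorah", "Sansa", "Arya"],
    "age": [23, 39, 51, 49, 16, 12],
    "house": ["Snow", None, "Stark", "Mormont", "Stark", "Stark"],
}


def average(columns, label):
    """Aggregate one column without touching any other column."""
    values = [v for v in columns[label] if v is not None]
    return sum(values) / len(values)


def insert_record(columns, record):
    """Append one record; every column must be modified to stay aligned."""
    for label, values in columns.items():
        values.append(record.get(label))


def get_record(columns, index):
    """Rebuild a single record by reading position `index` of every column."""
    return {label: values[index] for label, values in columns.items()}


mean_age = average(columns, "age")                   # reads only the "age" list
insert_record(columns, {"name": "Bran", "age": 10})  # hypothetical new record
last = get_record(columns, 6)
```

Computing the average reads a single list, while inserting one record appends to all three, which is the asymmetry described above.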

Some solutions in this category are:

  • Cassandra
  • HBase

Document Oriented

I think of document oriented databases as the best of both worlds between relational and NoSQL databases. These data stores allow much deeper and more complex nesting in the values associated with their keys; basically, they let us save whole objects as a single document (commonly serialized as JSON) without caring whether other documents in the collection share the same schema. The disadvantage is that querying a single attribute can mean retrieving the whole document, which is costly.

To better describe how data is arranged in a document oriented database, let me illustrate with a simple example. Using a relational database in an e-commerce system, we might have an orders table and an order_items table with a one-to-many relationship between them. In a document oriented database we can keep each order and its order items in the same document, since an items attribute can hold an array of item objects. I really recommend watching Modeling Data for NoSQL Document Databases; it gives insight into how to think about modeling our application’s data for storage in document databases.
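As a rough sketch of that order example, here is what such a document could look like as a Python dict serialized to JSON. Apart from `items`, every field name and value is made up for illustration, and no particular database's API is used.

```python
import json

# One order document with its order items embedded as an array.
# In the relational version these would be rows in two separate tables.
order = {
    "_id": "order-1001",           # hypothetical identifier
    "customer": "Sansa",
    "items": [
        {"sku": "LEMON-CAKE", "quantity": 2, "unit_price_cents": 450},
        {"sku": "NEEDLE-SET", "quantity": 1, "unit_price_cents": 1200},
    ],
}


def order_total_cents(order):
    """Sum the line totals; no join is needed because items live in the order."""
    return sum(item["quantity"] * item["unit_price_cents"] for item in order["items"])


# Documents are commonly stored serialized; JSON round-trips this structure.
serialized = json.dumps(order)
restored = json.loads(serialized)
total = order_total_cents(restored)
```

Another document in the same collection could carry extra fields, or drop some, without any schema change. The flip side is that reading only `customer` still means loading the full document in this sketch.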

Some solutions in this category are:

  • MongoDB
  • CouchDB

Graph Based

Graph based databases are a whole different beast, at least compared to the other types of NoSQL databases covered so far. The way data is stored derives from graph theory, in which vertices are connected to each other through edges. This family of databases is at its best when dealing with connected data and understanding its relationships; the most common use is in social network applications.
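To give a feel for the vertices-and-edges idea in a social network setting, here is a small adjacency-set sketch in Python. The `SocialGraph` class and the friendships in it are hypothetical; real graph databases such as Neo4j expose their own query languages, which this does not try to imitate.

```python
from collections import defaultdict


class SocialGraph:
    """Toy undirected graph: people are vertices, friendships are edges."""

    def __init__(self):
        # Maps each person to the set of people they are connected to.
        self._edges = defaultdict(set)

    def add_friendship(self, a, b):
        """Add an undirected edge between `a` and `b`."""
        self._edges[a].add(b)
        self._edges[b].add(a)

    def friends(self, person):
        """Return a copy of the direct neighbours of `person`."""
        return set(self._edges.get(person, set()))

    def friends_of_friends(self, person):
        """People exactly two edges away: candidates for a suggestion list."""
        direct = self.friends(person)
        two_hops = set()
        for friend in direct:
            two_hops |= self._edges[friend]
        return two_hops - direct - {person}


graph = SocialGraph()
graph.add_friendship("John", "Arya")   # made-up friendships for the example
graph.add_friendship("Arya", "Sansa")
graph.add_friendship("Sansa", "Peter")

suggestions = graph.friends_of_friends("John")
```

Answering "who are the friends of my friends?" is a short walk along edges here, whereas in a relational model the same question usually becomes a self-join on a friendships table.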

Some solutions in this category are:

  • OrientDB
  • Neo4j

Note that OrientDB is a multi-model database: we can use it as a document or key-value store, but it is more popular as a graph database.

Conclusion

As a conclusion we should note a few points about NoSQL databases:

  • They are more flexible as they don’t enforce a schema.
  • They sacrifice consistency for availability and partition tolerance.
  • They are a strong fit for distributed storage systems; their power is unleashed when they run in clusters.
  • They are not a replacement for relational databases.
  • Most applications will not need this kind of database; they are meant for very specific use cases. If you are unsure whether you need a NoSQL database or not, chances are you’re fine with a relational database.