Your Guide to NoSQL Databases | Data Engineering
One of the major reasons the era of big data began was the growth in the number of data sources and in the variety of data types that organizations handle. Nowadays almost every organization has not only structured data but also semi-structured and unstructured data, and each type has its own use cases and its own approach to handling and processing. As a data engineer, you will come across many different data sources and data formats, each with its own way of storing data and its own way of extracting information from it.
In this article, we will set out guidelines for choosing the best NoSQL database type for your use case and the requirements you need to implement. We will describe each NoSQL type with its use cases, vendors, data format, and what it is the best fit for.
We call these databases NoSQL because their data is not stored in the familiar tabular structure of relational databases; each NoSQL database has its own data format and its own way of handling and processing data.
There are four main categories of NoSQL databases:
- Columnar Database (Column-Oriented)
- Key-Value Database
- Document Database
- Graph Database
Columnar Database (Column-Oriented)
A columnar database differs from a relational database in the way data is stored on disk. In a relational database, data is stored record by record; in a columnar database, data is stored column by column, with each column's data stored separately. This storage method improves the performance of queries that target a specific set of columns (not all columns) of a table. When we execute a query against a relational database table, all the columns of each record are read, because the data is stored in a record-oriented format, and only then does the query keep the columns it needs. Imagine a query that selects data from a data warehouse table with millions of records: reading all the data from the table and then filtering the results is far from optimal for performance, as you can see from the following example.
In a column-oriented database, each column is stored separately, so we don't need to read all the columns; we read only the data we need. This is the main advantage of column-oriented databases: they reduce the amount of data read from disk (disk I/O).
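To make the I/O difference concrete, here is a toy Python model of the same table laid out row-wise and column-wise. It assumes a hypothetical `orders` table and counts "values read" as a stand-in for disk I/O; real columnar engines read pages or blocks and usually compress them, so this only illustrates the access pattern, not an actual storage engine.

```python
# Toy model (not a real storage engine): one hypothetical "orders" table
# stored two ways, counting values read as a proxy for disk I/O.
rows = [
    {"order_id": 1, "customer": "alice", "region": "EU", "amount": 120.0},
    {"order_id": 2, "customer": "bob", "region": "US", "amount": 75.5},
    {"order_id": 3, "customer": "carol", "region": "EU", "amount": 42.0},
]
column_names = list(rows[0].keys())

# Row-oriented layout: each record's values are stored together.
row_store = [[r[name] for name in column_names] for r in rows]

# Column-oriented layout: each column's values are stored together.
column_store = {name: [r[name] for r in rows] for name in column_names}


def select_row_store(column):
    """SELECT <column> on the row layout: every record is read in full."""
    idx = column_names.index(column)
    values_read = 0
    result = []
    for record in row_store:
        values_read += len(record)  # the whole record is read...
        result.append(record[idx])  # ...then only one value is kept
    return result, values_read


def select_column_store(column):
    """SELECT <column> on the column layout: only that column is read."""
    values = column_store[column]
    return list(values), len(values)


print(select_row_store("amount"))     # ([120.0, 75.5, 42.0], 12)
print(select_column_store("amount"))  # ([120.0, 75.5, 42.0], 3)
```

Both layouts return the same answer, but the row layout touches every value of every record, while the column layout touches only the requested column; the gap grows with the number of columns and records.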
Because each column is stored separately, columnar databases are the best fit for OLAP use cases, where query performance and fast data retrieval are key requirements, such as:
- Data Warehouse
- Business Intelligence
Where Columnar Databases Are Not a Fit
Despite their big performance advantages, columnar databases are not the best solution for every use case. For example, they are not a good fit when many insert or update operations are required: because each column is stored separately, inserting or updating a single record is carried out as multiple operations (one per column) rather than as a single operation as in an RDBMS. Use cases to avoid include:
- Many Insert/Update Operations
- Incremental Loading
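The write-side cost can be sketched with the same kind of toy model, assuming a hypothetical four-column table and counting each append to a storage structure as one write operation. Real engines have their own write paths, so the counts only illustrate why per-record writes fan out across columns.

```python
# Toy model (hypothetical layouts, not a real engine): count the separate
# write operations needed to insert a single record.
column_names = ["order_id", "customer", "region", "amount"]

row_store = []                                      # one list of records
column_store = {name: [] for name in column_names}  # one list per column


def insert_row_store(record):
    """Row layout: the whole record is appended in one write."""
    row_store.append([record[name] for name in column_names])
    return 1


def insert_column_store(record):
    """Column layout: each column's storage is written separately."""
    writes = 0
    for name in column_names:
        column_store[name].append(record[name])
        writes += 1
    return writes


new_order = {"order_id": 4, "customer": "dave", "region": "US", "amount": 19.99}
print(insert_row_store(new_order))     # 1
print(insert_column_store(new_order))  # 4
```

With many small inserts or updates, or frequent incremental loads, this per-column overhead is paid again for every record.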
Key-Value Database
A key-value database is a type of NoSQL database that uses a simple key-value method to store data: it stores data as a collection of key-value pairs in which the key serves as a unique identifier. Both keys and values can be anything, from simple objects to complex compound objects. Key-value databases are highly partitionable and can scale horizontally to a degree that is hard to achieve with other types of databases.
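One reason key-value data partitions so easily is that every operation goes through a single key, so a hash of the key can decide which node owns it. The sketch below is a minimal in-memory illustration under that assumption: the class name is hypothetical, each `dict` stands in for a separate node, and simple modulo partitioning is used for brevity.

```python
import hashlib


class ShardedKeyValueStore:
    """Hypothetical in-memory key-value store that spreads keys over shards.

    Each shard stands in for a separate node; a stable hash of the key
    decides which shard owns it (simple modulo partitioning).
    """

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # sha256 gives the same result in every process, unlike the
        # built-in hash(), which is randomized per process for strings.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def delete(self, key):
        self._shard_for(key).pop(key, None)


store = ShardedKeyValueStore(num_shards=3)
store.put("user:42", {"name": "Alice", "cart": ["sku-1", "sku-7"]})
print(store.get("user:42"))  # {'name': 'Alice', 'cart': ['sku-1', 'sku-7']}
print(store.get("user:99"))  # None
```

Modulo partitioning remaps most keys whenever the number of shards changes; distributed key-value systems commonly use consistent hashing instead, so that adding or removing a node moves only a fraction of the keys.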
The main use case for a key-value database is fast access to and retrieval of data for real-time actions, or when an application needs temporary access to data objects. Examples include web applications that store and access user session information, and real-time recommendation and advertising in e-commerce applications, where we store and access each user's behavior and viewed products in order to recommend other relevant products based on the user's browsing behavior.
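Session data is a good example of "temporary access": each session is looked up by its ID and should disappear after a period of inactivity. Many key-value stores support key expiration natively (Redis's `EXPIRE` command, for instance); the sketch below only mimics that idea in plain Python, with a hypothetical `SessionStore` class, a 30-minute TTL chosen for illustration, and an injectable clock so the example runs instantly.

```python
import time


class SessionStore:
    """Hypothetical in-memory session cache whose entries expire after a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock makes expiry easy to demonstrate
        self._data = {}     # session_id -> (expires_at, value)

    def set(self, session_id, value):
        self._data[session_id] = (self.clock() + self.ttl, value)

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # lazily evict the expired session
            return None
        return value


# Simulated clock (seconds) so we can "wait" without sleeping.
now = [0.0]
sessions = SessionStore(ttl_seconds=1800, clock=lambda: now[0])
sessions.set("sess-abc", {"user_id": 42, "viewed": ["sku-1", "sku-7"]})
print(sessions.get("sess-abc"))  # {'user_id': 42, 'viewed': ['sku-1', 'sku-7']}
now[0] += 1801                   # move past the 30-minute TTL
print(sessions.get("sess-abc"))  # None
```

The "viewed" list stored in the session is the kind of per-user browsing data a real-time recommender could read back by key.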
Document Database
In a document database, data is stored as semi-structured documents, most commonly in JSON or BSON format. A single document holds many attributes as key-value pairs. The following example is a product-specifications document that serves as the backend for an e-commerce website; as we can see, it has many key-value pairs, and it can contain nested objects, each with its own key-value pairs.
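A product document of this kind can be sketched as a Python dictionary serialized to JSON. Every field name and value below is hypothetical, and the `_id` key follows MongoDB's convention only as an assumption. The small `get_path` helper (also hypothetical) reads nested fields with a dotted path, similar in spirit to the dot notation document databases such as MongoDB use for nested fields.

```python
import json

# Hypothetical product document; field names and values are illustrative only.
product = {
    "_id": "sku-1001",
    "name": "Trail Running Shoe",
    "brand": "ExampleBrand",
    "price": {"amount": 89.99, "currency": "USD"},  # nested object
    "specs": {                                       # nested object with its own pairs
        "weight_g": 280,
        "drop_mm": 6,
        "sizes": [40, 41, 42, 43],
    },
    "tags": ["running", "trail", "outdoor"],
}

# Serialize to JSON text, the form in which the document is stored or exchanged.
doc_text = json.dumps(product, indent=2)


def get_path(doc, dotted_path):
    """Follow a dotted key path such as 'specs.weight_g' into nested objects."""
    value = doc
    for key in dotted_path.split("."):
        value = value[key]
    return value


print(get_path(json.loads(doc_text), "specs.weight_g"))  # 280
```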
If you compare RDBMS and document databases, you will find largely similar concepts under different terminology, as the following example shows.
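The correspondence can also be shown in code: each row of a table becomes a document in a collection, and each column becomes a field. The sketch below uses MongoDB-style terms (other document databases name things differently), and the `customers` table and `table_to_collection` helper are hypothetical.

```python
# Terminology mapping (MongoDB-style names; other document databases vary):
#   table  -> collection
#   row    -> document
#   column -> field
TERMS = {"table": "collection", "row": "document", "column": "field"}

# A hypothetical relational table: column names plus rows of values.
columns = ("id", "name", "email")
customers_table = [
    (1, "Alice", "alice@example.com"),
    (2, "Bob", "bob@example.com"),
]


def table_to_collection(columns, rows):
    """Turn each row into a document whose fields are the table's columns."""
    return [dict(zip(columns, row)) for row in rows]


customers_collection = table_to_collection(columns, customers_table)
print(customers_collection[0])  # {'id': 1, 'name': 'Alice', 'email': 'alice@example.com'}
```

Unlike table rows, the resulting documents are not bound to a fixed set of columns, which is where the flexibility discussed next comes from.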
The flexibility of document databases makes them a strong fit for applications whose content changes dynamically and that don't involve complex processing, such as serverless applications, e-commerce, blogs, and content management systems.
To demonstrate the flexibility of document databases and how they support dynamic changes in your data, the following example from an e-commerce website's database shows two documents in the same collection, each with different attributes (key-value pairs).
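The same idea can be sketched with a plain Python list standing in for a collection. The two hypothetical documents share only a few fields, a new attribute is added to one of them without any schema change, and a simple `find` helper (hypothetical, not a real database API) matches documents by field values, treating a missing field as a non-match.

```python
# Hypothetical "products" collection: two documents with different attributes.
products = [
    {"_id": 1, "type": "book", "title": "Data Pipelines", "author": "J. Doe", "pages": 320},
    {"_id": 2, "type": "laptop", "title": "UltraBook 14", "ram_gb": 16, "cpu": "8-core"},
]


def find(collection, **criteria):
    """Return documents whose fields equal all criteria; missing fields never match."""
    missing = object()  # sentinel that equals nothing a caller could pass in
    return [
        doc for doc in collection
        if all(doc.get(field, missing) == value for field, value in criteria.items())
    ]


# Adding a new attribute to one document requires no schema change.
products[1]["ports"] = ["usb-c", "hdmi"]

print([d["title"] for d in find(products, type="laptop")])  # ['UltraBook 14']
print([d["title"] for d in find(products, ram_gb=16)])      # ['UltraBook 14']
print(find(products, pages=500))                            # []
```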
Graph Database
Graph databases focus on a different view of data: the relationships between data entities. In a graph database, data is represented as nodes and relationships; each node has a set of key-value properties, and relationships can also have key-value properties.
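The node-and-relationship model can be sketched with two small data classes. This is an in-memory illustration only, not a graph database; the class and method names are hypothetical, and the use of a node label and a relationship type loosely follows the labeled property graph model used by databases such as Neo4j.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)  # key-value properties


@dataclass
class Relationship:
    start: str     # node_id of the start node
    rel_type: str
    end: str       # node_id of the end node
    properties: dict = field(default_factory=dict)  # relationships carry properties too


class PropertyGraph:
    """Hypothetical minimal in-memory property graph."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def relate(self, start, rel_type, end, **props):
        self.relationships.append(Relationship(start, rel_type, end, props))

    def outgoing(self, node_id, rel_type=None):
        """Relationships starting at node_id, optionally filtered by type."""
        return [r for r in self.relationships
                if r.start == node_id and (rel_type is None or r.rel_type == rel_type)]


g = PropertyGraph()
g.add_node("alice", "Person", name="Alice")
g.add_node("bob", "Person", name="Bob")
g.relate("alice", "FRIENDS_WITH", "bob", since=2019)

rel = g.outgoing("alice", "FRIENDS_WITH")[0]
print(g.nodes[rel.end].properties["name"], rel.properties["since"])  # Bob 2019
```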
This way of modeling data makes graph databases the best fit for cases where the relationships between data entities are important, such as social networking, fraud detection, real-time recommendations, and graph-based search.
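A typical relationship-centric question in social networking is "who might this person know?". The sketch below answers it by walking two hops from a person and ranking friends-of-friends by their number of mutual friends. It uses plain Python adjacency sets over a hypothetical friendship list; in a graph database the same question would be expressed as a traversal query (in Neo4j, for example, in Cypher).

```python
from collections import Counter

# Hypothetical undirected friendships.
friendships = [
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "dave"), ("carol", "dave"), ("carol", "erin"),
]

# Adjacency sets: each person maps to the set of their direct friends.
adjacency = {}
for a, b in friendships:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def suggest_friends(person):
    """Rank friends-of-friends by how many mutual friends they share with person."""
    direct = adjacency.get(person, set())
    counts = Counter()
    for friend in direct:
        for candidate in adjacency[friend]:
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Most mutual friends first, then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(suggest_friends("alice"))  # [('dave', 2), ('erin', 1)]
```

Because each node keeps direct links to its neighbors, the two-hop walk simply follows stored relationships; answering the same question over a relational schema would typically require self-joins on a friendships table.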