posted 4 years ago in Dev Platform category by Kyu Jae Lee
In a social network service where many people form relationships, a connection becomes the information. A fundamental method is thus to effectively control the data.
In this article, we will learn the origin of the NoSQL system and its subclasses.
Standard for NoSQL Classification
Reading and writing are the basic building blocks of a platform, whether it be a storage system or a service development system including a PC. Think about a football match. You must have two teams - and when one on the offense, the other must be on defense. Let's take a few examples.
The theoretical basis of all computers with built-in programs, in the John von Neumann structure, is a mathematical model called Turing machine. The model consists of the following components:
- A tape that is infinitely long and has an infinite number of cells, each of which contains a symbol.
- A header that can read a symbol from and write it to a cell, and can traverse the cells to the left or right.
- A state transition table
- (The current state, the symbol read in the current cell) ? (next state, the symbol to write in a cell, left/right)
Suppose a state transition table program is a service. Then what a platform does is reading and writing. Thus, read/write is the fundamental feature of any system of any size, from the micro architecture of a CPU to cloud services.
- Intel core2 micro architecture: In its hardware configuration, everything except the pipelining, out-of-order execution, and ALU is the configuration for reading and writing operations.
- MS Windows Azure: A storage that reads and writes is also the core part of a cloud system such as Azure.
A program repeats the process of reading-calculating-writing; it reads written data, processes it, and writes the intermediate result data. The same rule applies to Internet services. The reason for this lengthy explanation of the importance of reading and writing is to remind The Platform Magazine readers - who already know such basic knowledge - of the core features of a storage system such as NoSQL.
NoSQL System Standards
The following is a summary of standards that will be used to classify the NoSQL systems, which are the topic of our discussion.
Data model and query model
Although a Turing machine performs read/write in a single cell on a tape, on a practical system much more diverse and complex read/write operations are required. I will define a storage model that is the target of read/write operations as a data model, and the command for such operations as a query model.
A query model can be regarded as an API, but to emphasize that it is a reading/writing operation for a storage model, I will use the term query, which is often used in RDBMS. A data model and a query model are closely related to each other, like the two sides of a coin. In a sense, the relationship between a storage model and a query is similar to that of an instance and a method that is defined in a class in an object-oriented language.
Sometimes, the processing power of a single computer is not sufficient to process a query requested by a service. This happens because the resources that a computer can use to process the query, such as the CPU, memory, disk, network capacity, and throughput, are limited. There is a solution to this problem, called the scale-up approach, but it is costly and often ineffective when the scale of data to be processed is huge, due to the nature of its architecture. For this reason, the scale-out approach, an alternative, is used to distribute the resources required to process queries between two or more computers.
The basic principle of distribution in a large-scale service is that the greater the separation within the required level of service, the more throughput for the query is required. It would be great if there was a method to scale-out to infinity; however, as it is not just realistic to assume that the scale-out approach method will always satisfy all the scale-out requirements, we will limit it with a condition, "within the required level of service."
Replication to increase availability
Data stored in a system may not be read or written when a failure occurs in the program or the resource action that are necessary to process queries. To ensure high availability of a service, resources for data and query processing must be distributed to and stored in two or more locations. This is what we call 'replication.'
Replication Models: There are three models in replication, which are described below :
- Transactional replication: A model that replicates at the transaction-level by using the Two Phase Commit (2PC) protocol (CUBRID RDBMS uses the transactional replication in its HA feature).
- State machine replication: A model that replicates the events that occur from the source system to the target system by using atomic broadcast (transfers total ordering messages). In this model, the target system is considered a Finite State Machine.
- Virtual Synchrony: A model that is often used in group communication and the like. It ensures that events that concurrently occur in a process group are delivered in the same order in which they occur. (This is for in-memory data).
Data consistency model for operations
RDBMS operates on a concept of transaction that can be summed up by the ACID (Atomicity, Consistency, Isolation, Durability) attribute. A transaction logically holds one or more queries; when such a transaction is applied to a DBMS, the system protects a certain attribute from being changed to maintain data consistency. The NoSQL system is no exception. We will explore how it protects data and data consistency from user operations.
RDBMS vs. NoSQL
Before studying the features of a variety of NoSQL systems, it would be useful to learn the background history of the NoSQL storage system.
NoSQL as antithesis
Here is a little mnemonic device. If NoSQL is No SQL, it must be an anti-RDBMS system. The logical flow explaining why NoSQL is the opposite of RDBMS is as follows:
- RDBMS defines the SQL language by using a data manipulation method that can model relational data (relation and table), and supports transactions with the ACID attribute.
- The scale-out approach which distributes data before processing it, is in high demand due to the high data-throughput demand of Internet-scale services.
- The scale-out approach of existing RDBMSs is intended to maintain operational integrity of the relational model, which is the core of RDMBS, and transaction operations.
It is difficult to scale-out while maintaining integrity. The same problem arises when distributing or replicating data. So we make the ACID-based transaction attributes that are ensured by DBMS or the replication integrity model less strict before we use them to scale-out.
Scale-out of RDBMS
Scaling-out is difficult in RDBMS. If scaling-out to several thousand RDMBSs were easy, prominent database developers such as Oracle would have released a product or two for that a long time ago.
Now let us assume that the tables of RDBMS are distributed to several computers, and each piece of data is replicated before it is stored for high availability. First, executing distribution transaction while satisfying ACID is difficult in scale-out.
- To satisfy the atomicity attribute of ACID, the distribution transaction protocol such as the 2PC protocol must be used in all systems that are related to a specific transaction.
- To match the isolation level among ACID attributes, data must be locked in general. The units of Locking can be a record, a table, or an index.
- Therefore, to satisfy the Atomic and Isolated attributes in a distributed environment, all related locks must be applied to each system while the distribution transaction protocol is being processed; the higher the service load of the system, the heavier the lock competition becomes. This is what makes scaling-out difficult.
Another problem is that there is a limitation to scale-out by replicating and distributing data.
- The transactional replication method using the 2PC method has a problem in which a transaction fails and becomes unavailable when one of the systems related to the replication process fails. In addition, the performance degrades when several systems are involved in the replication.
- As an alternative, it is possible to pass the Write Ahead Logging (as known as WAL) data of a DBMS to the replication system and have it apply the data. If we consider the system in which replication occurs as a master (or primary), and the system to which the changes are applied as a slave (or backup), they are configured either as master-slave or multi-master.
- When configuring master-slave: This is the most commonly used replication method. The speed of the process is in inverse proportion to the number of systems involved in replication in this method.
- When configuring multi-master: It is difficult to solve the collision between data write processes or prevent it from happening when there are several masters. In The Danagers of Replication and a Solution Jim Gray conducted a study on this issue (Received Turing Award in 1998 for his contribution to the related database and transaction processing) .
Sharding by developers
Generally speaking, it is extremely difficult to scale-out while satisfying the ACID attribute in the DBMS data model. For this reason, to scale-out based on DBMS, one must simplify the data model itself, partition the data by the number of N, and then execute the query within a separate piece of data.
The unit of partitioned data is called a 'shard'. Distribute and service N number of shards to M number of DBMSs. DBMS does not manage shards. This is the responsibility of the service developer.
The sharding method is focused on developers and has the following difficulties:
- First, a shard must be defined.
- Shard mapping:
- The basic storage unit of a DBMS is the table. Because a table can contain one or more shards, there is a need to know which shard is mapped to which instance of a database. The locations of shard tables must be known to the application.
- Shard distribution/redistribution:
- As each shard is different, so are the throughput requirements and data size. As a result, a developer must add a new instance to or delete one from the database, and redistribute the shards manually. This is a painstaking and labor-intensive process.
- The mapping information that has been modified by the distribution/redistribution process must be applied to the application.
- Management, such as configuring for the replication, is necessary when modifying data.
Approach of NoSQL
The Internet-scale data storage system aims for the scale-out method through data distribution, and replicates data to ensure its high availability. We have already examined the fact that scaling-out is difficult to carry out under the process that supports a transaction of ACID attribute and the relation model of RDBMS. Now, we will look into the territory of NoSQL to see how it handles scaling-out.
Simplify data model for sharding
The data closure of a query operation will be the default unit of distribution, and the data model will be simplified so as to create an appropriate level of shard, which is a collection of a certain number of data closures. The default unit of distribution is key.
- The simplest model reads and writes the whole data by recognizing it as an immutable object. Of the key-value storages, Dynamo and Membase are examples of this model, both of which support blob type data models.
- Chunk, a component unit of a file in the Google file system, has a 64-bit identifier that is globally unique. Operations based on byte range (read/write/append) can be done with chunks. While the storage abstraction is provided to users as a file, the operation of this file is actually performed in chunk at the API level.
- Some systems provide slightly more complex data models than that of the blob-type immutable object.
- Key-Value DB: A database that provides a query containing information on what operations can be done on a specific structure, by modeling data in the Abstract Data Structure (as known as ADS) such as list and set. Redis and Tokyo Cabinet are most popular.
- Document-oriented: A database that stores random and irregular objects by using the object representing notation, and provides queries based on the object of a property, such as CouchDB or MongoDB.
- A system that supports the multi-dimensional map type data model in the row-column family-column or row-column-timestamp structure, such as Bigtable or Cassandra.
The default query supports reading and writing at this level of distribution. However, it may also support a separate query method, such as map-reduce in order to perform aggregation operations in multiple distribution units. In addition, document-oriented storage systems such as MongoDB allow users to apply an index other than document ID for a specific property of a document, thereby enabling users to specify additional quick access paths.
- The key has no meaning in this method. It distributes data natural to throughput demand or data size.
- To minimize mapping and redistribution problems resulting from the addition or deletion of nodes, it often distributes data based on consistent hashing.
- It ensures a range query by maintaining the key-based order-preserving.
- Metadata server-based:
- A method that is used to manage the location information of a distribution unit from a separate metadata server.
Makes the ACID attribute of a query less strict.
For the consistency of query operation, it uses a less strict attribute than the ACID attribute of RDBMS. The relaxation phenomenon of query attribute is noticeable if you consider replications as well. While there are systems that maintain the ACID attribute like CouchDB, in many cases, it makes Consistence or Isolated less strict for high availability.
We will now examine how some of the systems make attributes less strict in detail.
- In GFS, multiple clients may write or execute record append operations to a file simultaneously. Although concurrent writings of multiple clients may show the same value for all replicas, the write of each client is undefined (in the dissertation term). The record append operation ensures consistency in the file area of each replica that is used at least once for atomicity.
- Dynamo ensures view consistency for the replications and reads in the sloppy quorum method, as requested by a service that it to be 'always writeable.' The Quorum protocol determines the maximum number of systems to be used for R (read) , W (write), and each operation in N number of systems that are involved in the replication process, in order to prevent inconsistency of information from happening even when the network is partitioned (R + W > N, W > N/2). To cope with temporary system failure, Dynamo allows other systems to execute a key query that is not directly involved with them, by using the Hinted-handoff method (sloppy quorum). It also allows each system to manage data versions for possible data inconsistency and to determine the casual order of data modification among the systems that are involved in the replication by using the vector clock technique, so that each of them can automatically recognize which data is new (reconciled), and allows a client to determine such data when it is impossible to determine them, by transmitting versions of data to it.
- Cassandra adjusts the level of consistency by allowing users to determine how many reads and writes must be successful in a node with N number of replication relations. It provides specifiable options when reading (Zero, Any, One Quorum, and All) and when writing (One, Quorum, and All). It also provides consistency (view consistency) when reading, by read-repairing through the timestamp-based ordering, similar to what Dynamo does.
We will now classify the NoSQL systems.
The following is a classification of commonly used NoSQL storage systems:
|System||Data Model||Distribution Unit||Distribution Method||Replication|
|Big table||multi dimensional map||row||index (B+ tree like)||Dependon GFS|
|Cassandra||multi dimensional map||row||hash/index||optional|
a A method in which a master node that is unique in the whole system lends write authority to a chunk to one of the chunk servers (replicas) that are related to the replication is used. The replica that borrowed the authority from a master node is called primary. To reduce the burden of a master node, the primary continues to have the write authority as long as it is alive (piggybacking to a heartbeat message).
b Basically, it executes consistent hashing. However, users can allow it to decide which Partitioner it will use through configuration (random, order preserving, and collating order preserving).
c Able to create a user-defined index
d Index for ID and Sequence Number. Sequence Number increases per update
What Kind of Storage System Do I Need?
Now we will explore what needs to be considered when answering a question like "What kind of storage system should I use for the service I am developing?"
RDBMS is still the standard
It must be considered before all else. RDBMS is even more important when read/write operations require ACID-type consistency. In most cases, implementing an RDBMS and additional storage systems will solve the problem of distributing the service load.
- Read distribution through replication
- The RDBMS scale-out technology in OLTP domain, such as MySQL Cluster, can also be considered.
- RDBMS + Cache + Storage system
- If the performance of RDBMS is not terribly good, you can use a caching system such as Arcus for time-consuming query items or frequently used items. Of course, you must carefully select a consistency model that is appropriate for the data that is being cached.
- If a large data storage is required, you can also use a separate storage system like OwFS.
- Finally, there is a sharding technique implemented by developers. That is, if you're willing to deal with replicating, troubleshooting, and restoring processes as well as the distributing and redistributing of shards.
Do you think it is difficult to implement a service on an RDBMS platform? Then use NoSQL. NoSQL is a system that solves the difficulties associated with replication, troubleshooting, restoration, and shard distribution/redistribution, regardless of the type of a service. It can be an ideal choice if the data model and consistency of the service are supported by the NoSQL system.
Advantages and Disadvantages
Let's look into the advantages and weaknesses of NoSQL by data model.
- Key-Value (blob)
- Simple and fast.
- Often supports only atomic write/read at key level. In this case, there is no way to serialize the values of several keys as they are processed.
- Get/put is very fast when the memory serves as the storage (Ideal when the size of data fits in the memory).
- While the operation for a single Key is fast, an operation involving multiple keys may be slow as the network transmission is frequently delayed. (There is a huge difference in speed when selecting data with 100,000 rows in an RDMBS table and when reading a key-value 100,000 times. The latter is significantly slower. This difference can be perceived when there are only 100, or even 10 items to be processed).
- Key-Value (Structure)
- A model that provides ADT such as List or Set; able to hold multiple values for a single key. Unlike the Key-BLOB model in which a single value is assigned to a single key, using this allows users to store multiple sets of data with a single key.
- Processing takes a little longer than a model that performs a simple get/set (However, the difference is barely noticeable. The performance differences inherent to a data model can be overcome by effective implementation of system.)
- Document oriented
- A data model that can add a random property without a schema.
- As the name suggests, it is an optimal structure for storing document data such as JSon or XML.
- It often maintains the order by document id or the value of a property. (This enables efficient operation for the range of the corresponding key value, and thus provides queries.)
- The process overhead of this model may be larger than that of a key-value model or a key-structure model, as it must parse the data and compute it in the memory when processing a query. (Performance will degrade when handling a large document.)
- Multi-dimensional map
- For Bigtable, data is mapped by row-column-timestamp. The data is binary.
- For Cassandra, data is mapped in the form of row-column families-column. The data itself is binary.
- Both models allow data to be modeled by structuralizing the grouping and access methods for data (column). The detailed process of modeling is beyond the scope of this document. For more information on modeling, refer to the Cassandra homepage and.
If it is the mission of a scholar to find the unmovable truth from the uncertain, it is the mission of a developer to find a way to change what seems to be unchangeable.
For Not only SQL
By Gyujae Lee, Senior Engineer, the Storage System Development Team, NHN Corporation.
- . WIKI: Replication(computer science) - http://en.wikipedia.org/wiki/Replication_(computer_science)
- . Jim Gray, Pat Helland, Patrick O’Neil, Dennis Shasha – The Danagers of Replication and a Solution: In ACM SIGMOD ‘96
- . The problems with ACID, and how to fix them without going NoSQL - http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html
- . Werner Vogels, Amazon.com: Eventually Consistent. In ACM Queue Vol6, Issue 6
- . Dan Pritchett, E-bay: BASE: An ACID Alternative. In ACM Queue Vol6, Issue 3
- . Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung: The Google File System
- . Fay Chang , Jeffrey Dean , Sanjay Ghemawat , Wilson C. Hsieh , Deborah A. Wallach , Mike Burrows , Tushar Chandra , Andrew Fikes , Robert E. Gruber: Bigtable: A distributed storage system for structured data.
- . A Lakshman, P Malik – Cassandra: a decentralized structured storage system, In ACM SIGOPS Operating Systems Review, 2010
- . Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels – Dynamo: Amazon’s Highly Available Key-Value Store In ACM SIGOPS 2007
- . WTF is a SuperColumn? An Intro to the Cassandra Data Model - http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
- NoSQL Benchmark Open-Sourced - http://www.readwriteweb.com/hack/2011/02/nosql-benchmark-open-sourced.php
 Unless the database is separated by the number of N, N number of systems that are related to the replication are bound to suffer from a serious performance hit due to concurrency issues. “Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traffic gives a thousand fold increase in deadlocks or reconciliation.”
 A VoltDB that replaces Durability of ACID with Replication, stores data in the memory, executes queries (stored procedures) partition by partition in sequence for log shipping, and parallelizes such executions in a DBMS would also be good. (Such a database would exist in a grey area between RDBMS and NoSQL.)