
Database Technology for Large Scale Data

What do we know about the Database Technology used today for Large Scale Data?

A vast variety of services around the world are accessible to users anywhere, at any time, and the amount of data they process is growing at an unprecedented pace. For example, bus route finder applications for smartphones are overwhelmed by data traffic as the number of smartphone users rises. In addition, rapidly evolving Web service technologies challenge the traditional theories and approaches to databases. Overall efficiency and flexibility have become more crucial than the result of an individual transaction on an individual record, since these days it is no longer just a couple of DB servers accessing the SAN storage.

Many argue that the traditional database theories and the current RDBMSs are inappropriate for handling large-scale data (more commonly known as big data). Michael Stonebraker, a database guru, makes this point in The End of an Architectural Era, a paper he presented at VLDB 2007. In the paper, he claims that relational DBMS technology, which began to flourish in the late '70s and reached its height in the late '80s, was a project to develop a universal Database Management System (DBMS) that could be used in any field for any application, including Online Transaction Processing (OLTP), Data Warehousing (DW), science, and the Web. It is perhaps time, he argues, to leave this 25-year-old 'one size fits all' approach and its dusty code behind, and to come up with the new approach that is needed to handle large-scale data.

In the past five years many specialized DBMSs have been introduced, as well as database systems with new storage schemes [1]. These can be categorized by the type of technology applied. In this article we will explain the four most popular types of database systems currently used by top corporations such as Yahoo, Facebook, Google, Microsoft, and Amazon.

  • Massively Parallel Processing (MPP) or parallel DBMS – A system that parallelizes query execution: queries are split and allocated to multiple DBMS nodes so that massive amounts of data can be processed concurrently.
  • Column-oriented database – A system that stores the values of the same field together as a column, as opposed to the conventional row-oriented method that stores them as individual records.
  • Stream processing (ESP or CEP) – A system that processes a continuous stream of data (or events); conceptually, the content of the database changes continuously over time.
  • Key-value store (with the MapReduce programming model) – A storage system that focuses on enhancing single-record read performance by adopting the key-value data model, which is simpler than the relational data model.

RDBMS Characteristics

Before we go on, let's look at the characteristics of current relational database management systems. An RDBMS is characterized by:

  • A relational data model – All data is represented in tables, and correlations between the data can be expressed.
  • SQL – As a declarative language, it has enabled advanced query optimization technologies.
  • Disk-based storage – Records are stored in consecutive disk blocks; B-trees, a disk-based indexing method, are implemented.
  • Concurrency control and log-based recovery – Integrity is ensured while multiple clients access a single piece of data simultaneously, by means of locking or multi-versioning.

Universal DBMSs that pursue the 'one size fits all' approach while satisfying all the characteristics described above are somewhat lacking when compared to systems that specialize in a specific area, such as text processing or Data Warehousing. Whether running Web log analyses and BI over massive amounts of data, or operating through IDCs around the globe, the commercial RDBMS alone – the hero of conventional database technologies – cannot meet the demands of large Web services in which data response speed is crucial. New database technologies are being researched to find ways to analyze logs amounting to several hundred gigabytes each day, or to allow a user in the U.K. to read, in real time, a blog post (through a Web server) that was stored (in a DB server) by a user in the U.S.

Below, we explain each of the four types of contemporary solutions for large-scale data management.

Massively Parallel Processing / Parallel DBMS

A database management system running on a single CPU has inadequate processing power to handle terabytes of data at 10,000 TPS. For this reason, many technologies implementing 'parallel query processing' on SMP hardware have been introduced since the '80s.

For example, the sort and hash join operations are often parallelized, as they are among the most time-consuming operations in a DB.

Another important technique in parallel processing is pipelining, which parallelizes the individual steps of a long and complex series of SQL queries. Parallelization is widely used in data analysis such as BI. Parallel DBMS technology has since surpassed single-node SMP hardware and advanced into massively parallel database systems based on clusters with a 'shared-nothing' structure.
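
As a rough illustration (not from the original article), pipelined execution can be modeled with generators: each operator pulls tuples from the previous one, so the scan, filter, and aggregate below run interleaved without materializing intermediate results. The table and data are hypothetical.

    # A minimal sketch of pipelined query execution: each operator is a
    # generator that pulls tuples from its input, so a scan, a filter and
    # an aggregate all run interleaved without materializing intermediate
    # results. Table contents here are made up for illustration.

    def scan(table):
        for row in table:            # produce tuples one at a time
            yield row

    def filter_op(rows, predicate):
        for row in rows:             # pass through only matching tuples
            if predicate(row):
                yield row

    def sum_op(rows, column):
        total = 0
        for row in rows:             # consume tuples as they arrive
            total += row[column]
        return total

    orders = [{"region": "EU", "amount": 10},
              {"region": "US", "amount": 25},
              {"region": "EU", "amount": 7}]

    pipeline = filter_op(scan(orders), lambda r: r["region"] == "EU")
    print(sum_op(pipeline, "amount"))   # -> 17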

For example, eBay's 6.5-petabyte DW database and Yahoo's Everest architecture for 10-petabyte-scale data analysis are both based on the MPP structure and its technologies.

[Figure: Yahoo's DW system, the Everest architecture]

Other examples of MPP DBMSs are Greenplum and Aster Data, both of which were developed on top of PostgreSQL, an open-source DBMS. The Greenplum system has an advantage in the DW market; it gained popularity when it was adopted by eBay [2]. Both Greenplum and Aster Data process massive amounts of data with parallel clusters while retaining familiar tools such as the relational data model and SQL. In these systems, the data is distributed (partitioned or sharded) across the DB nodes, and the data-analysis SQL is executed at each node in parallel.

[Figure: Greenplum system configuration]

This approach is similar to the MapReduce programming model, which has been used frequently as of late. Both Greenplum and Aster Data provide a feature that combines SQL and MapReduce. The following describes the SQL used by Greenplum and the manner in which it is processed.

Each DB node executes the SQL against its local data (similar to what happens in the map phase), and the results are then merged in sequence; this is possible because the data is separated by condition (similar to what happens in the reduce phase).

[Figure: Greenplum system configuration (2)]
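
As a rough sketch of this scatter/gather execution (the partitions, data, and aggregation below are hypothetical, and the 'nodes' are simulated with threads): each node computes a partial aggregate over its local partition, and a coordinator merges the partial results.

    # Sketch of MPP-style scatter/gather: each DB node evaluates the same
    # aggregate over its local partition; the coordinator merges results.
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical horizontal partitions of one table, one per DB node.
    partitions = [
        [("cubrid", 3), ("mysql", 1)],       # data local to node 1
        [("cubrid", 2), ("postgres", 5)],    # data local to node 2
    ]

    def local_aggregate(partition):
        # Plays the role of "SELECT name, SUM(cnt) ... GROUP BY name"
        # executed by one node over its local data (the map-like step).
        result = {}
        for name, cnt in partition:
            result[name] = result.get(name, 0) + cnt
        return result

    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, partitions))

    # The coordinator merges the per-node results (the reduce-like step).
    merged = {}
    for partial in partials:
        for name, cnt in partial.items():
            merged[name] = merged.get(name, 0) + cnt
    print(merged)   # {'cubrid': 5, 'mysql': 1, 'postgres': 5}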

The MapReduce method [3], announced by Google, is, like MPP or parallel DBMS, one of the technologies used for analyzing massive amounts of data. Hadoop, used at Yahoo, is an implementation of the MapReduce framework. The following table compares a parallel database and the Hadoop system.

Aspect | Parallel Database | Hadoop (MapReduce)
Data | Designed for structured relational data | Designed for unstructured data
Query Interface | SQL | MapReduce programs (Java, C++, scripts, SQL-like languages)
Query Execution | Results pipelined between operators | Results materialized between the Map and Reduce phases
Job Granularity | Entire query | Determined by the data storage block size (runtime scheduler)
Strength | High performance (queries on structured data) | Scalability (thousands of nodes)
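
To make the programming model concrete, here is a minimal single-process sketch of MapReduce using word count, its canonical example; the documents are hypothetical, and a real framework would distribute the map and reduce tasks across many nodes.

    # Minimal single-process sketch of the MapReduce model: map() emits
    # (key, value) pairs, the framework groups pairs by key (the shuffle),
    # and reduce() folds each group into a final value.
    from itertools import groupby

    def map_phase(document):
        for word in document.split():
            yield (word, 1)                    # emit (word, 1) per occurrence

    def reduce_phase(word, counts):
        return (word, sum(counts))             # fold all counts for one word

    documents = ["big data big tables", "big clusters"]

    pairs = [kv for doc in documents for kv in map_phase(doc)]
    pairs.sort(key=lambda kv: kv[0])           # shuffle: group pairs by key
    results = [reduce_phase(word, (c for _, c in group))
               for word, group in groupby(pairs, key=lambda kv: kv[0])]
    print(results)  # [('big', 3), ('clusters', 1), ('data', 1), ('tables', 1)]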

HadoopDB is one of the noteworthy open-source projects that combine Hadoop and an RDBMS. Whereas Greenplum and Aster Data are SQL systems extended with MapReduce, HadoopDB is built from SQL, the Hadoop system, and an RDBMS: HadoopDB adds SQL functions on top of the MapReduce engine, while Aster Data uses MapReduce functions inside an SQL query, a method called 'In-Database MapReduce.' To be more precise, HadoopDB reuses the functions provided by Hive, an engine developed by Facebook that can execute SQL-like queries on the HDFS/Hadoop system, but it uses an RDBMS file, instead of an HDFS file, when executing the map function.

[Figure: HadoopDB structure]

[Figure: HadoopDB Map and Reduce phases]

The following is an example of SQL/MR as used by Aster Data. In this example, the sessionize() MR function is implemented in Java and then invoked from an SQL query.

[Figure: Aster Data SQL/MR query using sessionize()]

[Figure: Aster Data sessionize Java class]
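
Since the original screenshots cannot be reproduced here, the following Python sketch (with hypothetical data and a made-up timeout) shows the idea behind a sessionize() function: within each user's partition, ordered by timestamp, a new session begins whenever the gap between consecutive clicks exceeds the timeout.

    # Sketch of the sessionize logic: rows are partitioned by user and
    # ordered by timestamp; a new session starts whenever the gap between
    # consecutive clicks exceeds the timeout. Data below is made up.

    TIMEOUT = 60  # seconds

    def sessionize(clicks):
        # clicks: list of (userid, ts) pairs, ordered by (userid, ts)
        session, last_user, last_ts = 0, None, None
        for userid, ts in clicks:
            if userid != last_user:
                session = 0                    # new partition: reset
            elif ts - last_ts > TIMEOUT:
                session += 1                   # gap too large: new session
            yield (userid, ts, session)
            last_user, last_ts = userid, ts

    clicks = [("u1", 0), ("u1", 30), ("u1", 200), ("u2", 10)]
    print(list(sessionize(clicks)))
    # [('u1', 0, 0), ('u1', 30, 0), ('u1', 200, 1), ('u2', 10, 0)]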

The SCOPE project, announced by Microsoft, implements MapReduce-style processing, comparable to Hadoop, in the Windows and .NET environment.

Column-Oriented Databases

A column-oriented DBMS stores data as a sequence of columns instead of rows. For reference, the RDBMS storage system [4] stores and manages the rows (records) of a table as shown in the figure below.

[Figure: RDBMS row-based storage]

In this structure, several records are stored in a single disk page.

[Figure: Column-based storage]

A column store keeps the values of the same column of a table together and manages them in that form: a file is created for each column, and the values of that column are stored sequentially in its disk pages.

[Figure: Comparison of row storage and column storage]
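
As a small illustration with hypothetical data, the same table can be laid out row-wise or column-wise; an aggregate that touches only one column reads every record in the row layout but only a single contiguous array in the column layout.

    # The same table in row-oriented and column-oriented layouts.
    # Summing one column touches every record in the row layout but only
    # one contiguous array in the column layout.

    rows = [                                   # row store: one dict per record
        {"id": 1, "name": "ann", "salary": 100},
        {"id": 2, "name": "bob", "salary": 150},
    ]

    columns = {                                # column store: one array per column
        "id":     [1, 2],
        "name":   ["ann", "bob"],
        "salary": [100, 150],
    }

    print(sum(r["salary"] for r in rows))      # reads whole records
    print(sum(columns["salary"]))              # reads only the salary column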

The characteristics of row stores and column stores listed in the following table show that the latter is more suitable for massive databases in which most operations involve reading data. However, a column store is unsuitable when frequent writes are required. It is also inefficient when several columns must be processed, as the intermediate results of a query are larger than in a row store and more steps are involved; this is because a column store processes one column at a time even when a query has several conditions. The following table compares row stores and column stores.

Row Store | Column Store
Suitable for adding or deleting records. | Advantageous when reading related data.
Reads unnecessary (unused) column values, causing extra disk I/O. | Modifying one record requires changing several entries, causing extra disk I/O.
Relatively low record-level compression rate. | Good compression rate, as the values in a column share the same domain.
Advantageous for random access, as indexes (such as B+-trees) are built at record level. | Advantageous for range queries.

To overcome these drawbacks, a variety of methods are used, including:

  1. compression (see the sketch after this list)
  2. late materialization [5]
  3. block iteration [6]
  4. invisible joins
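
As a rough illustration of why compression suits column stores (the data and helper functions below are hypothetical): a sorted column contains long runs of repeated values, which run-length encoding (RLE) collapses into (value, run length) pairs, and some aggregates can be computed directly on the compressed form.

    # Sketch of run-length encoding (RLE), a compression scheme that suits
    # column stores: a sorted column collapses into (value, run_length) pairs.

    def rle_encode(column):
        runs = []
        for value in column:
            if runs and runs[-1][0] == value:
                runs[-1][1] += 1               # extend the current run
            else:
                runs.append([value, 1])        # start a new run
        return runs

    def rle_sum(runs):
        # Aggregates can run directly on the compressed form.
        return sum(value * length for value, length in runs)

    city = ["Seoul", "Seoul", "Seoul", "Tokyo", "Tokyo"]
    print(rle_encode(city))                    # [['Seoul', 3], ['Tokyo', 2]]
    print(rle_sum(rle_encode([7, 7, 7, 9])))   # 7*3 + 9*1 = 30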

The figures below illustrate the SQL execution process in a column store.

[Figure: SQL execution process in a column store (1)]

[Figure: SQL execution process in a column store (2)]

In conclusion, column stores and row stores are complementary; neither is superior to the other. For this reason, they are often used together: writes are executed in a row store DBMS, and when a certain amount of data has accumulated, it is moved to the column store (after compression and other tasks) for read-heavy analysis.

Stream Processing / ESP or CEP

Stream processing, event stream processing (ESP), and complex event processing (CEP) derive from an effort to process dynamic data in a database. The StreamSQL standard, an extension of SQL, has been proposed for this purpose. One of the forerunners is Truviso, which came up with the 'continuous analysis' concept, supporting both snapshot queries and continuous queries [7]. In contrast, other commonly used data processing techniques adopt the 'store first, query later' method, in which an external event is converted and stored in a database and only then queried.

Whereas SQL processes a table with a fixed number of records, StreamSQL processes a table into which an infinite stream of tuples is generated. That is, not all of the data is present in the table at any given time. The following figure illustrates the system architecture of Aurora, the forerunner of what is now StreamSQL.
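
A minimal sketch of this difference (not StreamSQL syntax; the event stream and window size are hypothetical): a continuous query keeps running state and emits a result as each tuple arrives, instead of storing the data first and querying it later.

    # Sketch of a continuous query over an unbounded stream: a per-window
    # count is emitted as events arrive, instead of storing first and
    # querying later. The event timestamps below are made up.
    from collections import deque

    WINDOW = 60  # seconds

    def windowed_count(events):
        window = deque()
        for ts in events:                      # events arrive indefinitely
            window.append(ts)
            while window and window[0] <= ts - WINDOW:
                window.popleft()               # expire old events
            yield (ts, len(window))            # emit a result per event

    for ts, count in windowed_count([0, 10, 50, 100]):
        print(ts, count)    # sliding 60-second counts: 1, 2, 3, 2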

[Figure: Aurora system architecture]

It has become common practice to apply the concept of stream processing to Web services. In the normal operation of a Web service, the DB is rarely modified, as the service merely presents prepared content from the DB or queries the UCC data that already exists in it.

[Figure: A Web service with a data I/O process]

(See the figure above.) However, it is not uncommon to build a Web service on a system that performs data I/O even before all of the input data is available, as it is preferable to handle massive amounts of data in real time when searching or analyzing logs.

(See the figure below.) Stream processing will become a necessity even for Web services, as is already the case for services in which sensor devices generate a never-ending stream of data, or smartphone services in which many user content providers create content for other user-consumers.

[Figure: Web services and stream processing]

Key-Value Store / MapReduce

The bywords of recent key-value stores are Google's BigTable and Amazon's Dynamo (SimpleDB). The key-value store itself is an old storage scheme that predates RDBMS: the UNIX DBM library and the more advanced BerkeleyDB came out quite some time ago, and some would go as far as to consider them the precursors of RDBMS. The most recent entry in this field is Tokyo Cabinet, a modern successor to the UNIX DBM library.

It is my opinion that the newer key-value stores differ from the traditional ones in that they are based on a DFS (distributed file system). BigTable, Dynamo, and Yahoo's PNUTS all work in a distributed environment. As the amount of processed data grew, it exceeded the performance and capacity of single-server systems and of SAN-like storage systems. The answer was distributed storage across a large number of servers, which is more service-ready and cost-effective, and has better load-distribution capacity, than an RDBMS. What was ultimately born is the distributed key-value store, in the form of an extendable hash, used to serve Web requests or single-record operations at the performance level of a single key-value lookup.

However, since the characteristics of the key-value store prevent it from offering well-developed declarative queries such as SQL, the MapReduce programming model was introduced to circumvent this limitation. A look at Amazon's Dynamo (SimpleDB) or Yahoo's PNUTS makes it clear that the key-value store has an architecture well suited to the cloud DB services used for developing Web services: a key-value store can be operated with only basic functions such as Get/Set, since most actions involve single-value retrieval, and it is convenient to expose interfaces over Web service protocols such as HTTP or REST. The key-value store is extremely useful when ease of operation, good performance, scalability, and availability are crucial. It is a database technology for areas that RDBMS cannot cover.
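
A minimal sketch of this style of interface (the client class and node names below are hypothetical): keys are hashed to one of several nodes, and the entire API is get/set. Real systems such as Dynamo add consistent hashing, replication, and failure handling on top of this idea.

    # Sketch of a distributed key-value store client: keys are hashed to
    # one of several storage nodes, and the interface is just get/set.
    import hashlib

    class KVClient:
        def __init__(self, nodes):
            # Each "node" is simulated here with a local dict.
            self.nodes = {name: {} for name in nodes}
            self.names = sorted(self.nodes)

        def _node_for(self, key):
            digest = hashlib.md5(key.encode()).hexdigest()
            return self.names[int(digest, 16) % len(self.names)]

        def set(self, key, value):
            self.nodes[self._node_for(key)][key] = value

        def get(self, key):
            return self.nodes[self._node_for(key)].get(key)

    kv = KVClient(["node-a", "node-b", "node-c"])
    kv.set("user:42:name", "kieun")
    print(kv.get("user:42:name"))   # 'kieun'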

It must also be noted that much effort has been invested across a variety of disciplines to enhance the performance of databases so that they can meet the ever-growing demand. However, this effort has not yet translated into the phasing-out of commercial or open-source RDBMSs such as Oracle or MySQL; they are being used in more areas than ever before, and many RDBMS-based solutions continue to be proposed. Still, the fact remains that many of the key-value stores, such as Google's BigTable, Amazon's Dynamo, BerkeleyDB, and Tokyo Cabinet, and distributed key-value stores in particular, are proving useful in many new ways.

  1. There are many other successful open-source and commercial systems that are past the research phase as well. It is also known that numerous large corporations develop and use in-house systems. However, these in-house systems can hardly challenge Oracle, the most popular commercial RDBMS of today.
  2. According to information provided by Greenplum, it processes about 17 trillion records (6.5 petabytes in size, with 150 billion new records inserted every day) on 96 DB nodes.
  3. It would be more appropriate to consider MapReduce a type of programming model, or a pattern and a framework, rather than a new technology or system. When Google patented MapReduce, the originality of the algorithm was challenged, as it looked much like the decades-old divide-and-conquer approach.
  4. A DBMS consists of a storage system and a query processor. The storage system has features such as transaction logging and concurrency control to maintain the effectiveness and integrity of its file structure.
  5. A technique that reduces physical data-read costs by passing a list of the positions of the matching values to the next operator, rather than creating individual intermediate results while processing a query.
  6. Also known as vectorized processing.
  7. Truviso is also based on an extended PostgreSQL, an open-source DBMS.

By Park Kieun, the CUBRID Cluster Architect.

References

Massively Parallel Processing / Parallel DBMS

  • Aster nCluster In-Database MapReduce, Aster Data Systems.
  • SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions, Eric Friedman (Aster Data), VLDB 2009.
  • Greenplum DW Appliance: Technical View, Sun.
  • Sun Oracle Exadata and Database Machine Overview, Oracle, 2009.
  • MAD Skills: New Analysis Practices for Big Data, Jeffrey Cohen (Greenplum), VLDB 2009.
  • HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, Azza Abouzeid (Yale), VLDB 2009.
  • Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database, Christopher Tang (MIT), http://www.dbms2.com.
  • Parallel Database Systems: The Future of Database Processing or a Passing Fad?, David J. DeWitt (UW), 1991.
  • Parallel Database Systems: The Future of High Performance Database Processing, David J. DeWitt (UW), 1992.
  • Hive - A Warehousing Solution Over a Map-Reduce Framework, Ashish Thusoo (Facebook), VLDB 2009.
  • HIVE Data warehousing using Hadoop, Facebook Data Team.

Column-Oriented Databases

  • C-Store: A Column-oriented DBMS, Mike Stonebraker (MIT), VLDB 2005.
  • Column-Oriented Database Systems, VLDB 2009 Tutorial.
  • Column-Stores vs. Row-Stores: How Different Are They Really?, Daniel J. Abadi (Yale) , SIGMOD 2008.
  • A Comparison of C-Store and Row-Store in a Common Framework, Alan Halverson (UW), VLDB 2006.
  • Read-Optimized Databases, In Depth, Allison L. Holloway (UW), VLDB 2008.
  • Integrating Compression and Execution in Column Oriented Database Systems, Daniel J. Abadi (MIT), SIGMOD 2006.
  • Query Execution in Column-Oriented Database Systems, Daniel J. Abadi (MIT), MIT 2008.
  • Performance Tradeoffs in Read-Optimized Databases, Stavros Harizopoulos (MIT), VLDB 2006.
  • Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct, Stefan Manegold (CWI), VLDB 2009.
  • MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database, M. Ivanova (CWI), SSDBM 2007.
  • Sybase IQ, An Evaluation paper by Bloor Research, 2006.
  • The Vertica® Analytic Database Technical Overview White Paper, Vertica Systems Inc, 2008.

Key-Value Store / MapReduce

  • MapReduce: Simplified Data Processing on Large Clusters, Google, OSDI 2004.
  • BigTable: A Distributed Storage System for Structured Data, Google, OSDI 2006.
  • Hive - A Warehousing Solution Over a Map-Reduce Framework, Ashish Thusoo (Facebook), VLDB 2009.
  • HIVE data warehousing using Hadoop, Facebook Data Team.
  • Hadoop Architecture and its Usage at Facebook, Dhruba Borthakur (Apache), 2009.
  • Data Intensive Super Computing, Randal E. Bryant (CMU).
  • Cloud Architecture, Jinesh Varia (Amazon Web Services).
  • GAIA & Neptune, Joon Kim.
  • Building a Business Intelligence Platform Using Hadoop and Open-Source Software, Youngwoo Kim (Daum Communications).

Stream Processing / ESP or CEP

  • Toward a Streaming SQL Standard, Namit Jain (Oracle), VLDB 2008.
  • Big Data and Real-time Structured Data Analytics, Ben Lorica (O'Reilly), 2009.
  • Aurora: A New Model and Architecture for Data Stream Management, Daniel J. Abadi (Brandeis University), VLDB 2003.
  • Continuous Analytics technical whitepaper, Truviso, 2010.

Other

  • Handling Large Datasets at Google: Current Systems and Future Directions, Jeff Dean (Google).
  • Challenges in Building Large-Scale Information Retrieval Systems, Jeff Dean (Google), WSDM 2009.
  • A Large Scale Search Engine System in 2010, Shihua Ming (Yanbian), 2010.
  • PNUTS: Yahoo!’s Hosted Data Serving Platform, Brian F. Cooper, (Yahoo! Research), VLDB 2008.
  • Sherpa: Cloud Computing of the Third Kind, Raghu Ramakrishnan (Yahoo! Research and Platform Engineering Team).
  • PNUTS - Platform for Nimble Universal Table Storage, http://research.yahoo.com/project/212.
  • Sherpa: Hosted Data Serving | Yahoo! Research, http://research.yahoo.com/node/2139.
  • http://www.microsoft.com/windowsazure/sqlazure.
  • Overview of Microsoft SQL Azure Database, Microsoft, 2009.
  • Similarities and Differences of SQL Azure and SQL Server, Microsoft, 2009.
  • Introducing the windows azure platform, David Chappell, 2009.
  • http://cassandra.apache.org.


