Open Source RDBMS - Seamless, Scalable, Stable and Free

한국어 | Login |Register

CUBRID 8.4.0 Key Features

The new CUBRID 8.4.0 features many significant improvements which includes Performance Improvements, Developer Productivity Improvements and HA Reliability Improvements. Below you can find overview of how each of these improvement dramatically boosts CUBRID's performance. 

Performance Improvement

Database volume reduction

In CUBRID 8.4.0 the database volume size has been decreased by whopping 218%. In this new release we have changed how indexes are stored to reduce the index volume size. As a result the improved storage structure allows CUBRID users to benefit from increased performance.

The following figure illustrates the database volume size comparison between the previous version 8.3.1 and the new CUBRID 8.4.0. In this example, both databases contain 64,000,000 records with a PRIMARY KEY column defined. The numbers represent gigabytes of data.

Database volume usage comparison.

Improved concurrency in Windows builds

CUBRID 8.4.0 provides improved concurrency on Windows OS through enhanced Mutex. The following graph illustrates the basic performance test comparison between the previous version and the new release.

Improved concurrency in Windows builds

Index optimizations

CUBRID 8.4.0 features twice faster database engine than previous 8.3.1 release. It provides several significant built-in index optimizations like:

  • Covering Index
  • LIMIT clause processing optimizations
    • Key Limit
    • Multi Range
  • GROUP BY clause processing optimizations
  • Descending Index Scan
  • Index Scan support in the LIKE clause

Let's see how the indexing structure is organized in CUBRID 8.4.0. In CUBRID indexing is implemented in B+ tree where index values are stored in leaf nodes.

For a practical example, let's consider the following table structure:

CREATE TABLE tbl (a INT, b STRING, c BIGINT);

And we create a multi-column index on columns a and b.

CREATE INDEX idx ON tbl (a, b);

The following picture shows index nodes pointing to data stored in the heap file on the disk.

  1. The index values (both a and b) are sorted in ascending order (by default).
  2. Each index node has a pointer to a corresponding data (table row) in the heap file illustrated by an arrow.
  3. The data in heap file is stored in random order.

CUBRID Index Structure

Index Scan

Let's see how the index scanning is usually performed. On the table defined above, we will execute the following SELECT query.

SELECT * FROM tbl
WHERE a > 1 AND a < 5
AND b < ‘K’
AND c > 10000
ORDER BY b;
  1. CUBRID will first find all index nodes where a >1 and a < 5.
  2. Then among these nodes, it will find all nodes where b < 'K'.
  3. Since column c is not indexed, to obtain its value, it is necessary to look up the heap file.
  4. Each index node has OID (Object Identifier) which tells CUBRID where exactly on the disk a particular row is located.
  5. Based on these OID, the server will look up the heap file to retrieve values of column c.
  6. Then CUBRID will find all those records where c > 10000.
  7. It will take all these records and sort by column b.
  8. And return to a client application.

CUBRID Index Scan

Covering Index

Now let's see how Covering Index improves CUBRID's performance. It allows to return the requested data immediately by skipping heap file look-up, which also reduced the number of I/O operations, which is the most expensive part of the process in terms of time spent.

However, the magic of Covering Index can be applied only when all columns, which appear in the query, are in the same compound index, in other words their values are stored in the same node in the indexing tree. For example, see the following query.

SELECT a, b FROM tbl
WHERE a > 1 AND a < 5
AND b < ‘K’
ORDER BY b;

  • This SQL statement contains only those columns which are in the same multi-column index.
  • The WHERE clause references only those columns which are in the same multi-column index.
  • And the ORDER BY clause reference only those columns which are in the same multi-column index.

So, if we execute the above query:

  1. As a part of the normal index scanning process, CUBRID will first find all index nodes where a >1 and a < 5.
  2. Then among these nodes, it will find all nodes where b < 'K'.
  3. Since the server already has both a and b column values in its index nodes, there is no need to look into the heap file again to obtain these values. So after the second step, it sort the result records by column b.
  4. And returns the data to the client application.

    CUBRID Covering Index

    Now let's see how much Covering Index can increase your database performance. For the same example defined above, we will assume the database table contains some large amount of records.

    Q1. Here is a query which references columns a and b that are covered by the same index.

    SELECT a, b FROM tbl WHERE a BETWEEN ? AND ?

    Q2. And this query uses an indexed column a and a column c without index.

    SELECT a, c FROM tbl WHERE a BETWEEN ? AND ?

    The following graph shows how fast queries can be executed if they use Covering Index.

    CUBRID Covering Index Performance

    LIMIT clause processing optimizations

    Key Limit

    CUBRID 8.4.0 has a way smarter LIMIT clause analyzer. It has been greatly optimized so that only the requested amount of records/indexes are analyzed and the data is returned right after that LIMIT is reached.

    For instance, let's run the following query.

    SELECT * FROM tbl
    WHERE a = 2
    AND b < ‘K’
    ORDER BY b
    LIMIT 3;

    1. CUBRID finds the index nodes where a = 2.
    2. It stops traversing further as it reaches the requested amount of nodes, i.e. 3. Thus it greatly improves the overall performance.
    3. Since the values of column b are already sorted in the index nodes, it looks up the heap file.
    4. And returns the required amount of requested data to a client application.

    CUBRID Key Limit

    Multi Range

    Multi Range optimization is another great future implemented in CUBRID 8.4.0. When users request data which lies within a certain fixed range like a > 0 AND a < 5, it is easy task for database systems. However, multiple ranges like a > 0 AND a < 5 AND a = 7 AND a > 10 AND a < 15 makes things more complicated. However, CUBRID has a great optimization for it called In-place sorting which allows to perform both:

    1. Key Limit
    2. And sort records in place.

    Considering the following for a Multi Range example.

    SELECT * FROM tbl
    WHERE a IN (2, 4, 5)
    AND b < ‘K’
    ORDER BY b
    LIMIT 3;

    1. CUBRID will spot the first index nodes where a IN (2, 4, 5) AND b < 'K'.
    2. As it requires only 3 records sorted by column b, it will perform in-place sorting starting from first nodes in the (2, 4, 5) range.
      1. It will find an index node (2, AAA), which makes 1.
      2. Then it will find an index node (2, ABC), which makes 2.
      3. Then it will find an index node (2, CCC), which makes 3.
      4. As it reaches the LIMIT of 3, it will search other ranges for values of b smaller than those already found.
        1. It will find an index node (4, DAA), which is bigger, so it will skip it.
        2. It will find an index node (5, AAA), which is smaller than ABC and CCC, so it will take it into its count, removing CCC from the final list.
        3. It will find an index node (5, BBB), which is bigger than ABC, so it will skip it and stop here.
    3. Finally, CUBRID will look up the heap file to retrieve all the values.

    This allows CUBRID to perform much faster on large amount of data.

    CUBRID Multi Range LIMIT

    Test results

    The following are the actual performance test comparison results obtained by analyzing the data from active Social Networking Service called Me2Day (Twitter analogue in Korea).

    SELECT * FROM posts 
    WHERE author_id IN (?, ?, ..., ?) AND registered < :from ORDER BY reg_date DESC
    LIMIT 20;

    The posts table has a multi-column index.

    INDEX (author_id, registered DESC)

    • Query rate:
      • 50% of the table data make users with 1~50 friends.
      • 40% of the table data make users with 51~2000 friends.
      • 10% of the table data make users with 2001+ friends.
    • The test randomly selects 20 post of user's friends.
    • The test was run for 10 minutes.

    The following graph compares MySQL UNION (which is faster that MySQL IN) performance with CUBRID IN performance.

    CUBRID IN operator test results

    GROUP BY processing optimizations

    CUBRID 8.4.0 has improved execution of queries containing ORDER BY and GROUP BY clauses. When CUBRID uses multi-column indexes in GROUP BY or ORDER BY, the values are already sorted in the index tree so there is no need to sort again. This new optimization allows CUBRID to perform much faster.

    SELECT COUNT(*) FROM tbl
    WHERE a > 1 AND a < 5
    AND b < ‘K’ AND c > 10000
    GROUP BY a;
    

    1. As a part of the normal index scanning process, CUBRID will first find all index nodes where a >1 and a < 5.
    2. Then among these nodes, it will find all nodes where b < 'K'.
    3. Since column c is not indexed. To obtain its value it is necessary to look up the heap file.
    4. It finds all those records where c > 10000.
    5. Since all the index values are already sorted, CUBRID performs GROUP BY without sorting which significantly increases overall performance.
    6. Finally it returns the grouped data to the client application.

    cubrid-group-by.png 

    Developer Productivity Improvement

    CUBRID 8.4.0 supports over 90% of MySQL SQL syntax. Additionally implicit type conversion has been improved much to let developers focus more on the functionality of their applications while CUBRID makes sure their queries run as expected. Here are just some of the new syntax developers can use in the new CUBRID 8.4.0.

    • Implicit type conversion
      CREATE TABLE x (a INT);
      INSERT INTO x VALUES (‘1’);
      
    • SHOW queries
      SHOW TABLES; SHOW COLUMNS; SHOW INDEX; …
      
    • ALTER TABLE ... CHANGE/MODIFY COLUMN ...
      CREATE TABLE t1 (a INTEGER);
      ALTER TABLE t1 CHANGE a b DOUBLE;
      ALTER TABLE t2 MODIFY col1 BIGINT DEFAULT 1;
      
    • UPDATE ... ORDER BY
      UPDATE t
      SET i = i + 1
      WHERE 1 = 1
      ORDER BY i
      LIMIT 10;
      
    • DROP TABLE IF EXISTS ...
      DROP TABLE IF EXISTS history;
      

    There are 23 DATE/TIME related, 5 TEXT, 5 Aggregate related function have been added. For the full of new SQL syntax, refer to Roadmap: What to expect in CUBRID 8.4.0.

    HA Reliability Improvement

    Next Key Locking Improvement

    In the new CUBRID 8.4.0 the locking mechanism has been greatly improved to minimize the occurrence of deadlock. For example, deadlocks will not occur between transactions that enter data to a table at the same time.

    See also

    CUBRID vs. MySQL Benchmark Test Results for SNS Data and Workload

    As we have recently rolled out the new 8.4.0 version of the CUBRID Database, one of our CUBRID users has approached us with a proposal t...

    Increasing Database Performance by Query Tuning

    It is common that applications use various kinds of SQL queries when communicating with the database server. If queries are not well structured, they wi...

    NBD Benchmark Result

    The NBD (NHN Internet Bulletin Board Application Database) Benchmark is a benchmark to measure the performance of a DBMS used for a Bullet...

    CUBRID 8.4.0 vs. 8.3.1 Volume Space Reusability Comparison

    CUBRID 2008 R4.0 has brought great achievements in overall engine performance and usability of tools. In the first of these blog posts we have a...




    You are either using a very old browser or a browser that is not supported.
    In order to browse cubrid.org you need to have one of the following browsers:



    Internet Explorer: Mozilla Firefox: Google Chrome: