
Increasing Database Performance by Query Tuning

1. About the Test

Outline

It is common that applications use various kinds of SQL queries when communicating with the database server. If queries are not well structured, they will definitely affect the overall performance of the database server and the web application as well. Thus, this test is intended to reveal the degree of database performance improvement when SQL queries are well tuned.

Test Scenario

This test uses several constant code tables (each holding a fixed number of rows) and one additional table that continuously accumulates a large amount of data.

Table Schema used in the test

Table Schema

The image above illustrates the table schema used in this test.

  • SN_SVC: a table which defines the service codes.
  • SN_CONF: a table which stores service configuration details managed by a user.
  • SN_MSG: a table which maintains the messages generated in every service configured by a user.
  • SN_SVC and SN_CONF are the code tables, while SN_MSG is the table that stores the large amount of data.

The process works like this: the service code for each message/event type that a user will create can be configured in SN_CONF. After the service code is set by the user, the event details are continuously accumulated in SN_MSG. When a user requests the details of a message, the message information is displayed by joining the three tables on the service code.
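Since the schema diagram may not always be visible, below is a minimal DDL sketch of the three tables, reconstructed only from the columns referenced in the queries in this article. The column types, lengths, and key definitions are assumptions, not the original schema.

-- Hypothetical reconstruction of the test schema: column names come from the
-- queries below; types, keys, and defaults are assumed.
CREATE TABLE SN_SVC (
  code        INTEGER PRIMARY KEY   -- service code definition
);

CREATE TABLE SN_CONF (
  user_id     INTEGER NOT NULL,     -- user who configured the service
  code        INTEGER NOT NULL,     -- service code, references SN_SVC.code
  is_received CHAR(1),              -- 'Y'/'N'
  PRIMARY KEY (user_id, code)
);

CREATE TABLE SN_MSG (
  user_id     INTEGER NOT NULL,     -- user who generated the message
  code        INTEGER NOT NULL,     -- service code, references SN_SVC.code
  is_feed     CHAR(1),              -- 'Y'/'N'
  is_deleted  CHAR(1),              -- 'Y'/'N'
  is_read     CHAR(1),              -- 'Y'/'N'
  create_time TIMESTAMP,            -- message creation time
  msg_cre_ms  INTEGER               -- assumed: millisecond part of the creation time
);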

Table Data Structure

In the test database described above there are 10 service codes and 4,000 registered users. In addition, it is fixed that every week each user generates 500 messages per service code (so one user generates 10 x 500 = 5,000 messages per week). Thus,

  • # of SN_SVC = 10
  • # of SN_CONF = 4,000
  • # of SN_MSG = 4,000 x 5,000 = 20,000,000

Test Workload

In order to measure the performance, three types of queries were created to access the table holding the large amount of data. Below are the three queries defined for this experiment:

  • MessageList: Displays the 10 most recent messages with a particular service code set by a user.
  • MessageTime: Displays the most recent message with a particular service code read by a user.
  • NewCount: Displays the number of the most recent messages generated by a user within the last week.

This test generates a database workload of 30 threads composed of the above-mentioned queries. The NBench2 BMT (Benchmark Test) tool was used to generate the load. In order to measure the change in TPS (Transactions Per Second), the two database systems listed below were run separately on the same hardware.

  • CUBRID R3.0
  • MySQL 5.1

Test Insight and Results

Test Results before Tuning

On average, CUBRID processed 25 TPS, while MySQL processed 10 TPS. Without query tuning, the test showed poor results for both database systems.

CUBRID Database Test Results (in TPS)

CUBRID TPS Before Tuning

MySQL Database Test Results (in TPS)

MySQL TPS Before Tuning

Query Analysis with Tuning

MessageList Query

The following is the original Query Statement and the Query Plan.

SELECT …
FROM SN_MSG msg ,
  SN_CONF cnfg ,
  SN_SVC svcobj
WHERE msg.user_id = cnfg.user_id
  AND msg.code = cnfg.code
  AND msg.code = svcobj.code
  AND msg.is_feed = ?
  AND msg.code = ?
  AND msg.user_id = ?
  AND msg.is_deleted = 'N'
  AND msg.create_time < DATE_ADD(CURRENT_TIMESTAMP, INTERVAL 1 DAY)
  AND cnfg.is_received = 'Y'
ORDER BY msg.create_time DESC
LIMIT 10

Query plan:
Sort(order by)
  Nested loops
    Nested loops
      Index scan(SN_CONF cnfg, ipk_svc_cnfg, cnfg.code=? and cnfg.user_id=?)
      Index scan(SN_MSG msg, fk_code, msg.code=? and msg.user_id=?)
    Index scan(SN_SVC svcobj, ipk_obj_svc, svcobj.code=?)

If we look at the above Query Plan,

  1. It retrieves one record of data from SN_CONF table with the given user_id.
  2. Then retrieves 500 records of data by joining with the SN_MSG table, using the code value from the previous results.
  3. It retrieves 500 records of data by joining with the SN_SVC table.
  4. Then all these records are sorted, with the latest displayed first, and the first 10 results are returned.

In this case, even though only 10 results are returned at the end, many disk I/O operations occur before the final results are returned, because 500 records are retrieved randomly from SN_MSG, a table holding 20,000,000 records. (Using CUBRID's statdump, it is possible to monitor how many fetches have occurred during the process.) To optimize this problem, we have to:

  1. Create the following index so that the 10 records can be retrieved directly, avoiding the ORDER BY sort.
    create index ink2_sn_user_msg on SN_MSG (user_id, code, is_feed, create_time DESC, is_deleted, is_read);
    
  2. Provide an additional hint when joining with the SN_MSG table to avoid unnecessary sorting.

Below is the revised Query Statement and the Query Plan. Using the newly created index, we can see that the ORDER BY sort no longer occurs in the Query Plan.

SELECT /*+ ORDERED */
  …
FROM SN_MSG msg ,
  SN_CONF cnfg ,
  SN_SVC svcobj
WHERE msg.user_id = cnfg.user_id
  AND msg.code = cnfg.code
  AND msg.code = svcobj.code
  AND msg.is_feed = ?
  AND msg.code = ?
  AND msg.user_id = ?
  AND msg.is_deleted = 'N'
  AND msg.create_time < DATE_ADD(CURRENT_TIMESTAMP, INTERVAL 1 DAY)
  AND cnfg.is_received = 'Y'
ORDER BY msg.create_time DESC
LIMIT 10

Query plan:
nl-join (inner join)
  outer: nl-join (cross join)
    outer: iscan
      class: msg
      index: ink2_sn_user_msg
    inner: iscan
      class: cnfg
      index: ipk_svc_conf
  inner: iscan
    class: svcobj
    index: ipk_svc
  sort: 4 desc --> skip order by

MessageTime Query

The following is the original Query Statement and the Query Plan.

SELECT create_time, msg_cre_ms
FROM SN_MSG msg
WHERE msg.is_feed = 'Y'
  AND msg.code = ?
  AND msg.user_id = ?
ORDER BY create_time DESC
LIMIT 1

Query plan:
Sort(order by)
  Index scan(SN_MSG msg, fk_code, msg.code=? and msg.user_id=?)

If we consider the above Query Plan, its structure is identical to that of MessageList. Therefore, we likewise use the index created above (note that no changes need to be applied to the Query Statement itself).

The following is the modified Query Plan. Here we can see that ORDER BY has not been used.

Query plan:
iscan
  class: msg
  index: ink2_sn_user_msg
  sort: 1 desc --> skip order by

NewCount Query

The following is the original Query Statement and the Query Plan.

SELECT msg.code, msg.is_feed, count(*)
FROM SN_MSG msg
WHERE msg.user_id = ?
  AND msg.is_deleted = 'N'
  AND msg.is_read = 'N'
  AND msg.code IN (?, ?)
GROUP BY msg.code, msg.is_feed

Query plan:
Sort(group by)
  Index scan(SN_MSG msg, fk_code, msg.user_id=? and msg.code=?)

If we consider the above Query Plan:

  1. It retrieves 1,000 records (500 per service code for the two codes requested) randomly from SN_MSG, a table holding an enormous number of records, thus causing many disk I/O operations.
  2. Then it groups these records and returns the final results.

In fact, every column the above query needs is contained in the index columns. Therefore, in order to avoid accessing the data pages at all, the index alone can be used to retrieve the results by:

  1. Preparing separate Query Statements, one for each group, i.e. each (code, is_feed) combination, as shown below.
  2. Preparing the final Query Statement by merging the decomposed statements. (In this case is_feed: Y/N and code: 2 values ==> 4 statements merged; see the merging sketch after the revised statement below.)

The following is the revised Query Statement (one of the four decomposed statements) and its Query Plan. Here we can see that count(*) is obtained using the index alone, without accessing the data pages.

SELECT ?, 'Y', count(*) AS mycount
FROM SN_MSG msg
WHERE
  msg.code = ?
  AND msg.user_id = ?
  AND msg.is_feed = 'Y'
  AND msg.is_deleted = 'N'
  AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg

Query plan:
iscan
  class: msg
  index: ink2_sn_user_msg
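
The statement above covers only one (code, is_feed) combination. As a rough sketch of the merging step described earlier, the four decomposed statements could be combined with UNION ALL along the following lines; the exact merged statement used in the test is not shown in the article, the placeholders (?) stand for the service codes and the user id, and it is assumed that the USING INDEX clause can be attached to each branch just as in the single statement above. Running the four statements separately from the application and combining the counts there would be equivalent.

-- Sketch only: four index-only counts merged with UNION ALL
-- (first service code with is_feed = 'Y' and 'N', then the second service code).
SELECT ?, 'Y', count(*) AS mycount
FROM SN_MSG msg
WHERE msg.code = ? AND msg.user_id = ?
  AND msg.is_feed = 'Y' AND msg.is_deleted = 'N' AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg
UNION ALL
SELECT ?, 'N', count(*) AS mycount
FROM SN_MSG msg
WHERE msg.code = ? AND msg.user_id = ?
  AND msg.is_feed = 'N' AND msg.is_deleted = 'N' AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg
UNION ALL
SELECT ?, 'Y', count(*) AS mycount
FROM SN_MSG msg
WHERE msg.code = ? AND msg.user_id = ?
  AND msg.is_feed = 'Y' AND msg.is_deleted = 'N' AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg
UNION ALL
SELECT ?, 'N', count(*) AS mycount
FROM SN_MSG msg
WHERE msg.code = ? AND msg.user_id = ?
  AND msg.is_feed = 'N' AND msg.is_deleted = 'N' AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg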

Test Results after Tuning

When the query tuning had been applied, the CUBRID database showed a 67-fold TPS increase, resulting in 1,721.97 Transactions Per Second, while the MySQL DBMS showed a 107-fold TPS increase, resulting in 1,114.22 Transactions Per Second. (In the table below, the min, max, avg, and std times are displayed in milliseconds.)

Test Results After Tuning

Conclusion

Whatever results you want from query tuning, it is necessary to know how to confirm and analyze the Query Plan of a given Query Statement. It is also important to find an efficient way to handle index addition and removal in order to avoid unnecessary operations.
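
For example, in MySQL a plan can be confirmed by prefixing a statement with EXPLAIN. The statement below is only an illustration: the select list is shortened and literal example values stand in for the bind parameters. CUBRID exposes comparable information through its own tools, as shown in the query plan listings above.

-- MySQL: confirm which index a MessageTime-style statement uses.
-- The values 3 and 1234 are arbitrary example values.
EXPLAIN
SELECT msg.create_time
FROM SN_MSG msg
WHERE msg.is_feed = 'Y'
  AND msg.code = 3
  AND msg.user_id = 1234
ORDER BY msg.create_time DESC
LIMIT 1;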

In this test it was necessary to take measures to avoid random access to the large amount of data, which would otherwise mean a large number of disk I/O operations. As a result of correctly leveraged indexing techniques, a significant performance increase was gained through simple query tuning. It can be difficult to precisely tune queries at the beginning, when a web service or application is first created: when the service starts, a tuned query may behave exactly as expected, but over time, as the data grows, it may no longer perform well enough. Therefore, consistent query verification and performance analysis have to be conducted.

See also

  • CUBRID 8.4.0 Key Features
  • CUBRID vs. MySQL Benchmark Test Results for SNS Data and Workload
  • NBD Benchmark Result
  • CUBRID 8.4.0 vs. 8.3.1 Volume Space Reusability Comparison



