<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
    <channel>
        <title>Increasing Database Performance by Query Tuning</title>
        <link>http://www.cubrid.org/?mid=query_tuning_results</link>
        <description>Increasing Database Performance by Query Tuning</description>
        <language>en</language>
        <pubDate>Thu, 02 Sep 2010 10:23:31 -0800</pubDate>
        <lastBuildDate>Tue, 19 Apr 2011 05:30:59 -0800</lastBuildDate>
        <generator>XpressEngine 1.4.4.1</generator>
        <item>
            <title>Increasing Database Performance by Query Tuning</title>
            <dc:creator>admin</dc:creator>
            <link>http://www.cubrid.org/query_tuning_results</link>
            <guid isPermaLink="true">http://www.cubrid.org/query_tuning_results</guid>
                                    <description><![CDATA[<h1>Increasing Database Performance by Query Tuning</h1>

<div class="contents-table">
<h3>Table of Contents</h3>
<ul>
<li><a href="#about">About the Test</a></li>
<li><a href="#results">Test Insight and Results</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>

<h2><a id="about"></a>About the Test</h2>

<h3>Outline</h3>
<p>
Applications typically issue many kinds of SQL queries to the database server. Poorly structured queries degrade the performance of both the database server and the web application. This test measures how much database performance improves when the SQL queries are well tuned.
</p>

<h3>Test Scenario</h3>
<p>
This test uses several static code tables, each holding a fixed number of rows, and one table that continuously accumulates a large amount of data.</p>

<h4>Table Schema used in the test</h4>
<p class="center">
<img src="http://www.cubrid.org/files/attach/images/49/887/001/Table-Schema-web.png" alt="Table Schema" width="698" height="374" editor_component="image_link"/>
</p>
<p>
The image above illustrates the table schema used in this test.
</p>
<ul>
<li><b>SN_SVC:</b> a table that defines the service codes.</li>
<li><b>SN_CONF:</b> a table that stores the service configuration details managed by each user.</li>
<li><b>SN_MSG:</b> a table that holds the messages generated in every service configured by a user.</li>
<li><b>SN_SVC</b> and <b>SN_CONF</b> are the code tables, while <b>SN_MSG</b> is the table that stores large amounts of data.</li>
</ul>
<p>
The process works as follows. A user configures, in <b>SN_CONF</b>, the service code for each message/event to be created. Once the service code is set, the event details accumulate continuously in <b>SN_MSG</b>. When the details of a user's messages are requested, the message information is displayed by <i>joining</i> the three tables on the service code.</p>

<h4>Table Data Structure</h4>
<p>
The test database described above contains 10 service codes and 4,000 registered users. In addition, each user generates a fixed 500 messages of each type every week, that is, 10 x 500 = 5,000 messages per user per week. Thus:
</p>
<ul><li># of <b>SN_SVC</b> = 10</li><li># of <b>SN_CONF</b> = 4,000</li><li># of <b>SN_MSG</b> = 4,000 x 5,000 = 20,000,000</li></ul>

<h4>Test Workload</h4>
<p>To measure performance, three types of queries were created, each accessing the table that holds the large amount of data. The three queries defined for this experiment are:
</p>
<ul><li><b>MessageList:</b>&nbsp;Displays the 10 most recent messages with a particular service code set by a user.</li><li><b>MessageTime:</b>&nbsp;Displays the most recent message with a particular service code read by a user.</li><li><b>NewCount:</b>&nbsp;Displays the number of the most recent messages generated by a user within the last week.</li></ul>
<p>
This test generates the database workload with 30 threads that run the queries described above. <b><a href="http://dev.naver.com/projects/cubrid-nbench" target="_blank">NBench2 BMT (Benchmark Test) Tool</a></b> was used to generate the load. To measure the change in TPS (Transactions Per Second), the two database systems listed below were tested separately on the same hardware.</p>
<ul><li><b>CUBRID R3.0</b></li><li><b>MySQL 5.1</b></li></ul>

<h2><a id="results"></a>Test Insight and Results</h2>

<h3>Test Results before Tuning</h3>
<p>
On average, CUBRID processed 25 TPS and MySQL processed 10 TPS. Without query tuning, both database systems performed poorly.
</p>

<h4>CUBRID Database Test Results (in TPS)</h4>
<p class="center"><img src="http://www.cubrid.org/files/attach/images/49/887/001/cubrid-tps-before-tuning.png" alt="CUBRID TPS Before Tuning" width="698" height="327" editor_component="image_link"/>
</p>

<h4>MySQL Database Test Results (in TPS)</h4>
<p class="center"><img src="http://www.cubrid.org/files/attach/images/49/887/001/mysql-tps-before-tuning.png" alt="MySQL TPS Before Tuning" width="698" height="327" editor_component="image_link"/>
</p>

<h3>Query Analysis with Tuning</h3>

<h4>MessageList Query</h4>
<p>
The following is the original Query Statement and the <b>Query Plan</b>.
</p>
<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
SELECT …
FROM SN_MSG msg ,
  SN_CONF cnfg ,
  SN_SVC svcobj
WHERE msg.user_id = cnfg.user_id
  AND msg.code = cnfg.code
  AND msg.code = svcobj.code
  AND msg.is_feed = ?
  AND msg.code = ?
  AND msg.user_id = ?
  AND msg.is_deleted = 'N'
  AND msg.create_time &lt; DATE_ADD(CURRENT_TIMESTAMP, INTERVAL 1 DAY)
  AND cnfg.is_received = 'Y'
ORDER BY msg.create_time DESC
LIMIT 10

Query plan:
Sort(order by)
  Nested loops
    Nested loops
      Index scan(SN_CONF cnfg, ipk_svc_cnfg, cnfg.code=? and cnfg.user_id=?)
      Index scan(SN_MSG msg, fk_code, msg.code=? and msg.user_id=?)
    Index scan(SN_SVC svcobj, ipk_obj_svc, svcobj.code=?)
</div>
</div>

<p>
The Query Plan above executes as follows:
</p>
<ol><li>It retrieves one record from the <b>SN_CONF</b> table for the given <b>user_id</b> and <b>code</b>.</li><li>It then retrieves 500 records by joining with the <b>SN_MSG</b> table, using the <b>code</b> value from the previous result.</li><li>It joins these 500 records with the <b>SN_SVC</b> table.</li><li>Finally, all these records are sorted with the latest first, and the first 10 results are returned.</li></ol>

<p>In this case, even though only 10 results are returned at the end, many Disk I/O operations occur before the final results are produced, because 500 records are fetched by random access from <b>SN_MSG</b>, a table holding 20,000,000 records. (CUBRID's <b>statdump</b> utility can be used to monitor how many <b>fetch</b> operations occur in the process.) To solve this problem, we have to:</p>
<ol><li>Create the following <b>Index</b> so that the 10 records can be retrieved directly, without the <b>ORDER BY</b> sort.

<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
CREATE INDEX ink2_sn_user_msg ON SN_MSG (user_id, code, is_feed, create_time DESC, is_deleted, is_read);
</div>
</div>

</li><li>Provide additional hints when joining with the <b>SN_MSG</b>&nbsp;table to avoid unnecessary sorting.</li></ol>
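<p>
To see why this column order lets the sort be skipped, it can help to look at the <b>SN_MSG</b> part of the query on its own. The statement below is only an illustrative sketch, not one of the test queries: it assumes the index above exists on <b>SN_MSG</b> and selects just <b>create_time</b>. The columns compared with equality (<b>user_id</b>, <b>code</b>, <b>is_feed</b>) are the leading index columns, so the matching index entries are already stored in <b>create_time DESC</b> order, and the first 10 entries of that range are the 10 most recent messages.
</p>

<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
-- Illustrative sketch only (not part of the benchmark workload).
-- Equality predicates on the leading index columns pin the scan to one
-- contiguous index range; inside it, entries are ordered by create_time DESC.
SELECT msg.create_time
FROM SN_MSG msg
WHERE msg.user_id = ?          -- 1st index column
  AND msg.code = ?             -- 2nd index column
  AND msg.is_feed = ?          -- 3rd index column
ORDER BY msg.create_time DESC  -- same order as the 4th index column
LIMIT 10                       -- the scan can stop after 10 entries
</div>
</div>

<p>
Whether the optimizer actually skips the sort depends on the DBMS and its version, so the Query Plan should still be checked after the index is created.
</p>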
<p>
Below are the revised query statement and its Query Plan. With the newly created <b>Index</b>, the Query Plan confirms that the <b>ORDER BY</b> sort is skipped.</p>

<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
SELECT /*+ ORDERED */
  …
FROM SN_MSG msg ,
  SN_CONF cnfg ,
  SN_SVC svcobj
WHERE msg.user_id = cnfg.user_id
  AND msg.code = cnfg.code
  AND msg.code = svcobj.code
  AND msg.is_feed = ?
  AND msg.code = ?
  AND msg.user_id = ?
  AND msg.is_deleted = 'N'
  AND msg.create_time &lt; DATE_ADD(CURRENT_TIMESTAMP, INTERVAL 1 DAY)
  AND cnfg.is_received = 'Y'
ORDER BY msg.create_time DESC
LIMIT 10

Query plan:
nl-join (inner join)
  outer: nl-join (cross join)
    outer: iscan
      class: msg
      index: ink2_sn_user_msg
    inner: iscan
      class: cnfg
      index: ipk_svc_conf
  inner: iscan
    class: svcobj
    index: ipk_svc
  sort: 4 desc --&gt; skip order by
</div>
</div>

<h4>MessageTime Query</h4>
<p>
The following is the original Query Statement and the Query Plan.
</p>

<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
SELECT create_time, msg_cre_ms
FROM SN_MSG msg
WHERE msg.is_feed = 'Y'
  AND msg.code = ?
  AND msg.user_id = ?
ORDER BY create_time DESC
LIMIT 1

Query plan:
Sort(order by)
  Index scan(SN_MSG msg, fk_code, msg.code=? and msg.user_id=?)
</div>
</div>

<p>
The structure of the above Query Plan is identical to that of <b>MessageList</b>. Therefore, the <b>Index</b> created above is reused here, and the query statement itself is left unchanged.</p><p>The following is the resulting Query Plan. Here we can see that the <b>ORDER BY</b> sort is skipped.</p>

<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
Query plan:
iscan
  class: msg
  index: ink2_sn_user_msg
  sort: 1 desc --&gt; skip order by
</div>
</div>

<h4>NewCount Query</h4>
<p>
The following is the original Query Statement and the Query Plan.
</p>

<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
SELECT msg.code, msg.is_feed, count(*)
FROM SN_MSG msg
WHERE msg.user_id = ?
  AND msg.is_deleted = 'N'
  AND msg.is_read = 'N'
  AND msg.code IN (?, ?)
GROUP BY msg.code, msg.is_feed

Query plan:
Sort(group by)
  Index scan(SN_MSG msg, fk_code, msg.user_id=? and msg.code=?)
</div>
</div>

<p>
The Query Plan above executes as follows:</p>
<ol><li>It retrieves 1,000 records by random access from <b>SN_MSG</b>, a table holding an enormous number of records, which causes many Disk I/O operations.</li><li>Then it returns the final results after grouping them.</li></ol>
<p>In fact, this query only returns grouped counts. Therefore, to avoid accessing the table data, the <b>Index</b> alone can be used to retrieve the results by:</p>
<ol><li>Decomposing the query into one statement per group, as shown below.</li><li>Building the final query statement by merging the decomposed statements. (In this case, <b>is_feed</b> has 2 values (Y/N) and <b>code</b> has 2 values, so 4 statements are merged.)</li></ol>
<p>
The following are the revised query statement and its Query Plan. They confirm that <b>count(*)</b> can be obtained from the <b>Index</b> alone, without accessing the table data.</p>

<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
SELECT ?, 'Y', count(*) AS mycount
FROM SN_MSG msg
WHERE
  msg.code = ?
  AND msg.user_id = ?
  AND msg.is_feed = 'Y'
  AND msg.is_deleted = 'N'
  AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg

Query plan:
iscan
  class: msg
  index: ink2_sn_user_msg
</div>
</div>
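
<p>
The statement above covers only one of the four groups. As a rough sketch of what the final merged statement might look like, the four decomposed statements could be combined as shown below. The source does not name the merging operator, so <b>UNION ALL</b> is an assumption here, as is the use of <b>USING INDEX</b> inside each branch; the sketch has not been run against the test database.
</p>

<div class="code">
<div class="code" editor_component="code_highlighter" code_type="sql" file_path="" description="" first_line="1" collapse="false" nogutter="false" nocontrols="false">
-- Sketch only: UNION ALL is an assumed way of merging the four statements.
-- Bind order in every branch: code (select list), code (WHERE), user_id.

-- first code, is_feed = 'Y'
SELECT ?, 'Y', count(*) AS mycount
FROM SN_MSG msg
WHERE msg.code = ?
  AND msg.user_id = ?
  AND msg.is_feed = 'Y'
  AND msg.is_deleted = 'N'
  AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg
UNION ALL
-- first code, is_feed = 'N'
SELECT ?, 'N', count(*) AS mycount
FROM SN_MSG msg
WHERE msg.code = ?
  AND msg.user_id = ?
  AND msg.is_feed = 'N'
  AND msg.is_deleted = 'N'
  AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg
UNION ALL
-- second code, is_feed = 'Y'
SELECT ?, 'Y', count(*) AS mycount
FROM SN_MSG msg
WHERE msg.code = ?
  AND msg.user_id = ?
  AND msg.is_feed = 'Y'
  AND msg.is_deleted = 'N'
  AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg
UNION ALL
-- second code, is_feed = 'N'
SELECT ?, 'N', count(*) AS mycount
FROM SN_MSG msg
WHERE msg.code = ?
  AND msg.user_id = ?
  AND msg.is_feed = 'N'
  AND msg.is_deleted = 'N'
  AND msg.is_read = 'N'
USING INDEX ink2_sn_user_msg
</div>
</div>

<p>
One difference from the original <b>GROUP BY</b> form is worth noting: a branch with no matching rows still returns one row with a count of 0, whereas <b>GROUP BY</b> omits empty groups, so the application may need to handle zero counts.
</p>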

<h3>Test Results after Tuning</h3>
<p>
After query tuning, CUBRID's throughput increased 67-fold to 1,721.97 TPS, and MySQL's increased 107-fold to 1,114.22 TPS. (In the table below, the <b>min</b>, <b>max</b>, <b>avg</b>, and <b>std</b> times are displayed in milliseconds.)</p>
<p class="center"><img src="http://www.cubrid.org/files/attach/images/49/887/001/test-results-after-tuning-web_1.png" alt="Test Results After Tuning" width="587" height="174" editor_component="image_link"/>
</p>

<h2><a id="conclusion"></a>Conclusion</h2>
<p>
Whatever the tuning goal, you need to know how to obtain the Query Plan of a given query statement and how to analyze it. It is also important to add and remove indexes efficiently so that unnecessary operations are avoided.
</p><p>In this test, the key was to avoid random access to a large amount of data, which translates into a large number of Disk I/O operations. By leveraging <b>Indexing</b> techniques correctly, simple query tuning produced a significant performance increase. Tuning queries precisely can be difficult when a web service or application is first created: a query may behave as expected at launch, yet perform poorly as the data grows over time. Therefore, query verification and performance analysis must be conducted continuously.</p>]]></description>
                        <pubDate>Thu, 02 Sep 2010 09:28:53 -0800</pubDate>
                        <category>performance</category>
                        <category>test</category>
                        <category>mysql</category>
                        <category>sql</category>
                        <category>query tuning</category>
                                </item>
            </channel>
</rss>
