<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
    <channel>
        <title>CUBRID vs. MySQL Benchmark Test Results for SNS Data and Workload</title>
        <link>http://www.cubrid.org/?mid=cubrid_mysql_sns_benchmark_test</link>
        <description>CUBRID vs. MySQL Benchmark Test Results for SNS Data and Workload</description>
        <language>en</language>
        <pubDate>Thu, 16 Jun 2011 10:01:57 -0800</pubDate>
        <lastBuildDate>Wed, 22 Jun 2011 10:43:40 -0800</lastBuildDate>
        <generator>XpressEngine 1.4.4.1</generator>
                        										        <item>
            <title>CUBRID vs. MySQL Benchmark Test Results for SNS Data and Workload</title>
            <dc:creator>admin</dc:creator>
            <link>http://www.cubrid.org/cubrid_mysql_sns_benchmark_test</link>
            <guid isPermaLink="true">http://www.cubrid.org/cubrid_mysql_sns_benchmark_test</guid>
                                    <description><![CDATA[<h1>CUBRID vs. MySQL Benchmark Test Results for SNS Data and Workload</h1>

<p>Following the recent release of version 8.4.0 of the CUBRID Database, one of our CUBRID users approached us with a proposal to test the new CUBRID on one of their live services, which currently uses MySQL as its back-end database. The service is an SNS (Social Networking Service) with a base of over 6 million active users. Its major queries mostly consist of <b>IN</b> and <b>UNION</b> operators, with UNION being the more common. They wanted to see whether CUBRID's IN and UNION operators perform faster than MySQL's, and which of the two operators is more efficient.</p><p>This article explains the entire flow of the test we conducted to determine the performance benefit of the new CUBRID 8.4.0 over MySQL 5.1, which is currently deployed on their live servers. The test environment, scenarios, and results are discussed below.</p>

<h2>Test Database Information</h2>

<h3>The tables used in the test</h3>

<ul><li>friends</li><li>posts</li></ul>

<h3>The indexes used in the test</h3>

<ul><li><b>friends </b>table</li><ul><li>unique(user_id,friend_id)</li></ul><li><b>posts </b>table</li><ul><li>primary key(id)</li><li>index(reg_date)</li><li>index(author_id,reg_date)</li></ul></ul>

<h2>Scenario</h2>

<ol><li>The following is a list of scenarios in which users are separated by the number of friends they have:</li><ul><li><b>T1:</b> users with 50 or fewer friends</li><li><b>T2:</b> users with 51 to 2000 friends</li><li><b>T3:</b> users with 2001 or more friends</li><li><b>T4:</b> a mix of 40% T1, 50% T2, and 10% T3 users</li><li><b>T5:</b> a mix of 10% T1, 50% T2, and 40% T3 users</li></ul><li>The scenarios are executed one by one. In each scenario, a single transaction retrieves the list of a user's friends and their posts.</li><li>Each scenario is run with two kinds of transactions: those that use an <b>IN</b> query and those that use a <b>UNION</b> query.</li></ol>
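<p>The scenario mixes listed above can be expressed as weights over the three user groups. The following is a minimal Python sketch, not the actual test driver: the names <code>SCENARIO_MIX</code> and <code>pick_group_table</code> are hypothetical, and it assumes that the mix is applied by choosing a user group independently for each transaction.</p>

```python
import random

# Share of T1, T2 and T3 users in each scenario, as listed above.
SCENARIO_MIX = {
    "T1": (1.0, 0.0, 0.0),
    "T2": (0.0, 1.0, 0.0),
    "T3": (0.0, 0.0, 1.0),
    "T4": (0.4, 0.5, 0.1),
    "T5": (0.1, 0.5, 0.4),
}
GROUP_TABLES = ("user_group_1", "user_group_2", "user_group_3")


def pick_group_table(scenario, rng=random):
    """Return the user group table to sample from for one transaction.

    Assumption: the group is drawn independently per transaction,
    weighted by the scenario mix.
    """
    weights = SCENARIO_MIX[scenario]
    return rng.choices(GROUP_TABLES, weights=weights, k=1)[0]
```

<p>For example, <code>pick_group_table("T4")</code> returns <code>user_group_2</code> for about half of the calls.</p>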

<h3>Test User Groups</h3>

<ol><li>In this test, users are divided into three groups based on the number of friends they have. The group sizes follow the real proportions retrieved from the live SNS service, and each group is stored in a separate test database table as explained below.</li><ul><li><b>user_group_1</b>&nbsp;table</li><ul><li>stores the IDs of 1,442,329 users who have 50 or fewer friends</li></ul><li><b>user_group_2</b>&nbsp;table</li><ul><li>stores the IDs of 85,568 users who have 51 to 2000 friends</li></ul><li><b>user_group_3</b>&nbsp;table</li><ul><li>stores the IDs of 836 users who have 2001 or more friends</li></ul></ul><li>Each user ID is mapped to a sequentially generated ID in its table. To pick a test user, we randomly choose one of the sequentially generated IDs and retrieve the user ID mapped to it.</li></ol>

<p>The following SQL statements were used to create and populate these user group tables.</p>

<div editor_component="code_highlighter" code_type="Sql" file_path="" first_line="1" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
mysql&gt; create table user_group_1 (id int auto_increment primary key, user_id int, num_friends int);
Query OK, 0 rows affected (0.03 sec)

mysql&gt; create table user_group_2 (id int auto_increment primary key, user_id int, num_friends int);
Query OK, 0 rows affected (0.01 sec)

mysql&gt; create table user_group_3 (id int auto_increment primary key, user_id int, num_friends int);
Query OK, 0 rows affected (0.00 sec)

mysql&gt; insert into user_group_1(user_id, num_friends) select user_id, count(friend_id) from friends group by user_id having count(friend_id) &lt; 51;
Query OK, 1442329 rows affected (38.28 sec)
Records: 1442329  Duplicates: 0  Warnings: 0

mysql&gt; insert into user_group_2(user_id, num_friends) select user_id, count(friend_id) from friends group by user_id having count(friend_id) &gt; 50 and count(friend_id) &lt; 2001;
Query OK, 85568 rows affected (28.95 sec)
Records: 85568  Duplicates: 0  Warnings: 0

mysql&gt; insert into user_group_3(user_id, num_friends) select user_id, count(friend_id) from friends group by user_id having count(friend_id) &gt; 2000;                          
Query OK, 836 rows affected (27.89 sec)
Records: 836  Duplicates: 0  Warnings: 0</div>
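<p>Picking a random test user through the sequentially generated <code>id</code> column can be sketched as follows. This is an illustration only: it uses Python's built-in <code>sqlite3</code> module with an in-memory database as a stand-in for MySQL or CUBRID, the sample rows are made up, and the helper name <code>random_user_id</code> is hypothetical.</p>

```python
import random
import sqlite3

# In-memory stand-in for the test database (assumption: SQLite instead of MySQL/CUBRID).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_group_3 ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INT, num_friends INT)"
)
# Made-up sample rows; the real table holds 836 users.
conn.executemany(
    "INSERT INTO user_group_3 (user_id, num_friends) VALUES (?, ?)",
    [(1001, 2500), (1002, 3100), (1003, 2001)],
)


def random_user_id(conn, table, rng=random):
    """Pick a random sequential id and return the user_id mapped to it.

    The table name is interpolated into the SQL text, so it must come
    from a fixed list of known tables, never from user input.
    """
    max_id = conn.execute(f"SELECT MAX(id) FROM {table}").fetchone()[0]
    seq_id = rng.randint(1, max_id)
    row = conn.execute(
        f"SELECT user_id FROM {table} WHERE id = ?", (seq_id,)
    ).fetchone()
    return row[0]
```

<p>Because the IDs are generated sequentially without gaps, a uniform random integer between 1 and the highest ID selects every user with equal probability.</p>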

<h3>UNION Transaction</h3>

<ol><li>Retrieve an ID of a random user from <b>user_group_#</b></li>
<div editor_component="code_highlighter" code_type="Sql" file_path="" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
SELECT user_id FROM user_group_1 WHERE id = ?
</div>
<li>Retrieve up to 100 friends of the user from <b>friends</b></li>
<div editor_component="code_highlighter" code_type="Sql" file_path="" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
SELECT friend_id FROM friends WHERE user_id = ? LIMIT 100
</div>
<li>Select the IDs of the 20 most recent posts written by these friends from <b>posts</b></li>
<div editor_component="code_highlighter" code_type="Sql" file_path="" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
(SELECT id, author_id, reg_date FROM posts
WHERE author_id = ?
ORDER BY reg_date DESC LIMIT 20)
UNION (SELECT …)
ORDER BY reg_date DESC LIMIT 20
</div>
<li>Retrieve these 20 posts of friends from <b>posts</b></li>
<div editor_component="code_highlighter" code_type="Sql" file_path="" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
SELECT * FROM posts 
WHERE id = ? OR id = ? OR … OR id = ?
ORDER BY reg_date DESC LIMIT 20
</div>
</ol>
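<p>Step 3 of the UNION transaction requires a statement with one branch per friend. The sketch below shows one way to assemble it in Python. It assumes that every elided branch has the same shape as the first one and differs only in the bound friend ID; the function name <code>build_union_query</code> is hypothetical.</p>

```python
def build_union_query(friend_ids, limit=20):
    """Build the UNION statement with one parenthesized branch per friend.

    Assumption: all branches are identical apart from the bound author_id.
    Returns the SQL text and the list of parameters to bind.
    """
    if not friend_ids:
        raise ValueError("at least one friend ID is required")
    branch = (
        "(SELECT id, author_id, reg_date FROM posts "
        "WHERE author_id = ? "
        f"ORDER BY reg_date DESC LIMIT {limit})"
    )
    sql = " UNION ".join([branch] * len(friend_ids))
    sql += f" ORDER BY reg_date DESC LIMIT {limit}"
    return sql, list(friend_ids)
```

<p>With 100 friends, the generated statement contains 100 branches, each of which is sorted and limited on its own before the combined result is sorted and limited again.</p>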

<h3>IN Transaction</h3>

<ol><li>Retrieve an ID of a random user from <b>user_group_#</b></li>
<div editor_component="code_highlighter" code_type="Sql" file_path="" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
SELECT user_id FROM user_group_1 WHERE id = ?
</div>
<li>Retrieve up to 100 friends of the user from <b>friends</b></li>
<div editor_component="code_highlighter" code_type="Sql" file_path="" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
SELECT friend_id FROM friends WHERE user_id = ? LIMIT 100
</div>
<li>Retrieve the 20 most recent posts of these friends from&nbsp;<b>posts</b></li>
<div editor_component="code_highlighter" code_type="Sql" file_path="" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
(SELECT * FROM posts
WHERE author_id IN (?, ?, …, ?)
ORDER BY reg_date DESC LIMIT 20)
</div>
</ol>
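<p>The IN transaction needs only one statement with one placeholder per friend. The following Python sketch builds it and runs it against a tiny in-memory SQLite table. SQLite, the sample rows, and the name <code>build_in_query</code> are assumptions made for illustration; the actual test ran against MySQL and CUBRID.</p>

```python
import sqlite3


def build_in_query(friend_ids, limit=20):
    """Build the IN statement with one placeholder per friend ID."""
    placeholders = ", ".join("?" for _ in friend_ids)
    sql = (
        "SELECT * FROM posts "
        f"WHERE author_id IN ({placeholders}) "
        f"ORDER BY reg_date DESC LIMIT {limit}"
    )
    return sql, list(friend_ids)


# Made-up sample data in an in-memory stand-in database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INT, reg_date TEXT)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (1, 10, "2011-06-01"),
        (2, 11, "2011-06-03"),
        (3, 12, "2011-06-02"),
        (4, 10, "2011-06-04"),
    ],
)

# The two most recent posts written by authors 10 and 11.
sql, params = build_in_query([10, 11], limit=2)
rows = conn.execute(sql, params).fetchall()
```

<p>Compared with the UNION transaction, this variant sends a single sorted and limited query instead of one branch per friend followed by a second lookup by post ID.</p>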

<h2>Test Environment</h2>

<p>The following are the characteristics of the test machine used in this setting.</p>

<table class="blackcap rowbg">
<thead>
	<tr>
		<th></th>
		<th>Characteristics</th>
	</tr>
</thead>
<tbody>
	<tr>
		<th>OS</th>
		<td>CentOS 5.2 x86_64</td>
	</tr>
	<tr>
		<th>CPU</th>
		<td>Xeon 2.5 GHz quad-core × 2</td>
	</tr>
	<tr>
		<th>HDD</th>
		<td>RAID 0+1, SAS 300 GB × 6</td>
	</tr>
	<tr>
		<th>Memory</th>
		<td>8 GB</td>
	</tr>
	<tr>
		<th>Buffer</th>
		<td>2 GB</td>
	</tr>
	<tr>
		<th>Execution time</th>
		<td>10 min</td>
	</tr>
</tbody>
</table>

<p>Both the CUBRID and MySQL databases store the 3 user group tables with about 1.53 million rows in total (table 1: 1,442,329 rows, table 2: 85,568 rows, and table 3: 836 rows). Each database was subjected to a load of 40 concurrent threads for 10 minutes.</p>
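<p>A load of 40 threads over a fixed duration, measured in transactions per second (TPS), can be sketched as below. This is not the tool used in the test: the function name <code>run_load</code> and its parameters are hypothetical, and <code>transaction</code> stands for any callable that executes one UNION or IN transaction.</p>

```python
import threading
import time


def run_load(transaction, num_threads=40, duration_sec=600.0):
    """Call `transaction` in a loop on each thread and return the overall TPS."""
    stop = threading.Event()
    counts = [0] * num_threads  # one counter per thread, so no lock is needed

    def worker(slot):
        while not stop.is_set():
            transaction()
            counts[slot] += 1

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    stop.wait(duration_sec)  # nothing sets the event earlier, so this sleeps
    stop.set()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return sum(counts) / elapsed
```

<p>Calling <code>run_load(my_transaction)</code> with the defaults corresponds to the 40 threads and 10 minutes used here, where <code>my_transaction</code> is a placeholder for the real transaction function.</p>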

<h2>Test Results</h2>

<p>The following table illustrates the test results we have obtained from this experiment.</p>

<table class="blackcap rowbg">
<thead>
	<tr>
		<th rowspan="2">TPS</th>
		<th colspan="2">MySQL 5.1</th>
		<th colspan="2">CUBRID 8.4.0</th>
	</tr>
	<tr>
		<th>UNION</th>
		<th>IN</th>
		<th>UNION</th>
		<th>IN</th>
	</tr>
</thead>
<tbody>
	<tr>
		<th>T1</th>
		<td>223</td>
		<td>254</td>
		<td>55</td>
		<td>277</td>
	</tr>
	<tr>
		<th>T2</th>
		<td>54</td>
		<td>9</td>
		<td>3</td>
		<td>128</td>
	</tr>
	<tr>
		<th>T3</th>
		<td>45</td>
		<td>1463</td>
		<td>20</td>
		<td>1192</td>
	</tr>
	<tr>
		<th>T4</th>
		<td>64</td>
		<td>15</td>
		<td>5</td>
		<td>118</td>
	</tr>
	<tr>
		<th>T5</th>
		<td>74</td>
		<td>15</td>
		<td>4</td>
		<td>176</td>
	</tr>
</tbody>
</table>
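<p>To summarize the table, the mean TPS of each column over the T1, T2, T4, and T5 scenarios can be compared. The following Python sketch does this with the figures copied from the table above. Taking the ratio of mean TPS values is one possible way of averaging and is an assumption here; averaging the per-scenario ratios would give different numbers.</p>

```python
from statistics import mean

# TPS per scenario: (MySQL UNION, MySQL IN, CUBRID UNION, CUBRID IN).
# T3 is left out because of its much larger scale.
TPS = {
    "T1": (223, 254, 55, 277),
    "T2": (54, 9, 3, 128),
    "T4": (64, 15, 5, 118),
    "T5": (74, 15, 4, 176),
}

# Transpose the rows into columns and average each column.
mysql_union, mysql_in, cubrid_union, cubrid_in = (
    mean(column) for column in zip(*TPS.values())
)

print("MySQL UNION vs. CUBRID UNION:", round(mysql_union / cubrid_union, 1))  # 6.2
print("CUBRID IN vs. MySQL IN:", round(cubrid_in / mysql_in, 1))  # 2.4
print("CUBRID IN vs. MySQL UNION:", round(cubrid_in / mysql_union, 1))  # 1.7
```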

<p>The following graph illustrates the above results, excluding T3, which is shown separately further below because of its much larger scale.</p>

<p><img src="http://www.cubrid.org/files/attach/images/49/505/196/union-in-sns-test-results.png" alt="SELECT … UNION/IN Tests on SNS Workloads" width="486" height="374" editor_component="image_link"/></p><p>The graph above compares the test results of the T1, T2, T4, and T5 scenarios.</p><ol><li>Averaged over these four scenarios, MySQL 5.1 <b>UNION</b> (illustrated in light blue) delivers about 6 times the TPS of CUBRID 8.4.0 <b>UNION</b> (illustrated in light green), with a mean of about 104 TPS versus about 17 TPS.</li><li>On the other hand, CUBRID 8.4.0 <b>IN</b> (illustrated in green) delivers on average about 2.4 times the TPS of MySQL 5.1 <b>IN</b> (illustrated in navy blue), with a mean of about 175 TPS versus about 73 TPS.</li><li>Overall, CUBRID 8.4.0 <b>IN</b> delivers on average roughly 2 times the TPS of both the MySQL 5.1 <b>UNION</b> and <b>IN</b> operations.</li></ol><div>From the graph below, which compares the test results of the T3 scenario, we can conclude that:</div><div><ol><li>The MySQL 5.1 <b>IN</b> operator performs somewhat better than the CUBRID 8.4.0 <b>IN</b> operator (1463 versus 1192 TPS).</li></ol></div><p><img src="http://www.cubrid.org/files/attach/images/49/505/196/union-in-sns-test-results-t3.png" alt="SELECT … UNION/IN Tests on SNS Workloads" width="249" height="374" editor_component="image_link"/></p><h2>Statistics Comparison</h2><p>The major goal of this experiment was to measure the performance difference of the <b>IN</b> and <b>UNION</b> operators between the new CUBRID 8.4.0 and MySQL 5.1. 
As we expected, the new CUBRID 8.4.0 leverages its improved database engine along with its new <a href="http://blog.cubrid.org/news/cubrid-8-4-0-has-arrived-w-x2-faster-database-engine/" target="_self" title="CUBRID 8.4.0 has arrived w/ x2 faster database engine!">index structure and algorithm</a>&nbsp;very well.</p><p>Besides this experiment, we also compared the resource usage patterns of the new CUBRID 8.4.0 and the previous CUBRID 8.3.1. Using the same configuration for both, we executed the following query.</p>
<div editor_component="code_highlighter" code_type="Sql" file_path="" collapse="false" nogutter="false" nocontrols="false" style="border: #666666 1px dotted; border-left: #22aaee 5px solid; padding: 5px; background: #FAFAFA url(modules/editor/components/code_highlighter/code.png) no-repeat top right;">
(SELECT * FROM posts
WHERE author_id IN (?, ?, …, ?)
ORDER BY reg_date DESC LIMIT 20)</div>
<p>The workload was generated with 40 threads, and we monitored the resource usage patterns for 10 minutes. The following figures were obtained (lower is better).</p>

<table class="blackcap rowbg">
<thead>
	<tr>
		<th></th>
		<th>CUBRID 8.4.0</th>
		<th>CUBRID 8.4.0 without in-place sorting</th>
		<th>CUBRID 8.3.1</th>
	</tr>
</thead>
<tbody>
	<tr>
		<th>Num_data_page_fetches</th>
		<td>378</td>
		<td>2836</td>
		<td>229316</td>
	</tr>
	<tr>
		<th>Num_data_page_dirties</th>
		<td>16</td>
		<td>2279</td>
		<td>206182</td>
	</tr>
	<tr>
		<th>Num_data_page_ioreads</th>
		<td>135</td>
		<td>2103</td>
		<td>121851</td>
	</tr>
</tbody>
</table>

<ol><li>The table above shows that the new CUBRID 8.4.0 uses far fewer resources than the previous version.</li><ol><li>CUBRID 8.4.0 fetches over 600 times fewer data pages than 8.3.1.</li><li>As a result, CUBRID 8.4.0 leaves roughly 12,900 times fewer dirty data pages than 8.3.1 (16 versus 206,182).</li><li>Moreover, CUBRID 8.4.0 performs over 900 times fewer IO read operations than 8.3.1.</li></ol><li>Furthermore, the new <b>in-place sorting</b> feature we implemented in 8.4.0 clearly has a significant impact on resource usage. This can be seen by comparing the "CUBRID 8.4.0" column, where <b>in-place sorting</b> is enabled, with the "CUBRID 8.4.0 without in-place sorting" column. Note that users cannot enable or disable in-place sorting, as it is an internal CUBRID feature that is always on. We disabled it ourselves only to measure its impact.</li></ol>
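<p>The reduction factors can be recomputed directly from the statistics table. The short Python sketch below uses the figures copied from that table; the names <code>STATS</code> and <code>reduction_factors</code> are hypothetical.</p>

```python
# Page statistics: (CUBRID 8.4.0, CUBRID 8.4.0 without in-place sorting, CUBRID 8.3.1).
STATS = {
    "Num_data_page_fetches": (378, 2836, 229316),
    "Num_data_page_dirties": (16, 2279, 206182),
    "Num_data_page_ioreads": (135, 2103, 121851),
}


def reduction_factors(stats):
    """Return, per counter, how many times lower the 8.4.0 figure is."""
    factors = {}
    for name, (v840, v840_no_sort, v831) in stats.items():
        factors[name] = {
            "vs_8.3.1": round(v831 / v840),
            "vs_no_in_place_sort": round(v840_no_sort / v840, 1),
        }
    return factors


for name, result in reduction_factors(STATS).items():
    print(name, result)
```

<p>For these figures, the factors relative to 8.3.1 come out at about 607 for fetches, 12886 for dirty pages, and 903 for IO reads.</p>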

<h2>Results Analysis</h2>

<ol><li>In the T4 test, which is the most realistic scenario, CUBRID’s <b>IN</b> query performed about twice as fast as the MySQL <b>UNION</b> query and about 8 times as fast as the MySQL <b>IN</b> query.</li><li>In the T3 scenario, performance was much higher than in the other scenarios because the <b>buffer hit ratio</b> was very close to 100%.</li><li>CUBRID 2008 R4.0 performance analysis:</li><ol><li><b>Key limit</b> reduced the number of <b>IO</b> operations by a factor of about 50.</li><li>With <b>Key limit</b> applied, <b>in-place sorting</b> reduced the number of <b>IO</b> operations by a further factor of approximately 10.</li><ol><li>If <b>in-place sorting</b> is not used, the T4 scenario yields 6 TPS, which is similar to the test result of the <b>UNION</b> query.</li></ol><li>For the <b>UNION</b> query, partial results were repeatedly written to a <b>temp file</b> during <b>sorting</b>, which hurt performance and also produced a high number of <b>dirty pages</b>.</li></ol></ol>]]></description>
                        <pubDate>Thu, 16 Jun 2011 09:03:18 -0800</pubDate>
                        <category>performance</category>
                        <category>test</category>
                        <category>mysql</category>
                        <category>sns</category>
                                </item>
            </channel>
</rss>
