Open Source RDBMS - Seamless, Scalable, Stable and Free

한국어 | Login |Register

CUBRID 8.4.0 vs. 8.3.1 Volume Space Reusability Comparison

CUBRID 2008 R4.0 has brought great achievements in overall engine performance and usability of tools. In the first of these blog posts we have announced that the new CUBRID 8.4.0 featured Index Size Reduction by 70%, while the data volume size - by 20%. In this article we will continue this discussion and give you a more detailed comparison between how much space CUBRID was able to reuse in previous version 8.3.1 and how much it can in the new 8.4.0.

Overview of this test

CUBRID 8.4.0 was designed to use less disk space and reuse them as much as possible and more efficiently. This setting exposes both versions (8.3.1 and 8.4.0) to a workload of series INSERT/DELETE tests and monitors the space reusability rate for both servers.

Test Method

Test environment

  • Linux Cent OS 5.3.
  • CUBRID 2008 R3.1 patch 2 (8.3.1.2003) referred in this document as 2008 R3.1.
     cubrid createdb -p 16384 --page-size=16384 testdb
  • CUBRID 2008 R4.0 (8.4.0.0228) referred in this document as 2008 R4.0.
     cubrid createdb --db-volume-size=256M --db-page-size=16K testdb

Database configuration

The generic database volume size is set 256MB which is composed of 16,000 pages of 16KB. The following is the table structure used in this test.

create table tbl1 (
    i int primary key,
    c varchar(100)
);

insert into tbl1 values(?, '12345678901234567890123456789012345678901234567890')

Test Scenario

This test is executed according to the following scenarios.

Scenario 1

  1. This scenario assumes there is a table which is created daily to store the intermediate data.
  2. The table has continuous INSERT/DELETE opertions during the day.
  3. At the end of the day all the data is DELETEd and the table is DROPped.
  4. The scenario then checks which database server shows higher space reusability.

Steps:

  1. Compare the space reusability between database servers by continuously executing INSERT/DELETE and INSERT/DROP/CREATE of 10,000 row data.
    1. Perform 10,000 record INSERT/DELETE for 10 times.
    2. Perform 10,000 record INSERT/DROP/CREATE for 10 times.

Scenario 2

  1. In this scenario throughout the days databases maintain the recent log data.
  2. At certain point of time, after new data is INSERTed, the oldest data gets removed.
  3. The scenario then checks which database server shows higher space reusability.

Steps:

  1. After the initial data has been INSERTed, repeatedly execute INSERT/DELETE and compare space reusability.
    (Here DELETE deletes the oldest records first.)
    1. After 100,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 10 times.
    2. After 1,000,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 100 times.
    3. After 1,000,000 records have been INSERTed, perform 100,000 record INSERT/DELETE operations for 10 times.
    4. After 3,000,000 records have been INSERTed, perform 300,000 record INSERT/DELETE operations for 10 times.

Scenario 3

  1. This scenario compares the space reusability difference between tables which use REUSE_OID option and tables which do not.
  2. OID is an object identifies which represent the information about the physical location of the data on the disk (in terms of volume number, page number, and slot number) that can be used to manage reference relationship of objects as well as perform view/store/delete operations.
  3. When OID is used, accessibility is improved as it allows to directly access the object located in the heap file on the disk without referring to a table.
  4. However, the problem of decreased reusability of the storage occurs when there are many DELETE/ INSERT operations because the object's OID is kept to maintain the reference relationship with the object even if it is deleted.

Steps:

  1. Compare the space reusability between database servers by creating a table with REISE_OID option and executing INSERT/DELETE and INSERT/DROP/CREATE operations.
    (Here DELETE deletes the oldest records first.)
    1. Perform 10,000 record INSERT/DELETE for 10 times as in 1.1 above.
    2. Perform 10,000 record INSERT/DROP/CREATE for 10 times as in 1.2 above.
    3. After 1,000,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 100 times as in 2.2.

Data Measurement Method

After every execution cubrid spacedb utility was run to obtain the amount of unreused space:

X = total size - free size

The following is a set of operations executed for Scenario 1.1: run for 10 times.

insert into tbl1 values(1, '12345678901234567890123456789012345678901234567890');
insert into tbl1 values(2, '12345678901234567890123456789012345678901234567890');
…
insert into tbl1 values(10000, '12345678901234567890123456789012345678901234567890');
delete from tbl1;

cubrid spacedb : find total_size – free_size

The following is a set of operations executed for Scenario 2.1: after 100,000 rows INSERTed, execute 10,000 row INSERT/DELETE from oldest for 10 times.

insert into tbl1 values(1, '12345678901234567890123456789012345678901234567890');
insert into tbl1 values(2, '12345678901234567890123456789012345678901234567890');
…
insert into tbl1 values(100000, '12345678901234567890123456789012345678901234567890');

insert into tbl1 values(100001, '12345678901234567890123456789012345678901234567890');
…
insert into tbl1 values(110000, '12345678901234567890123456789012345678901234567890');

delete from tbl1 where i between 1 and 10000
cubrid spacedb : find total_size – free_size

insert into tbl1 values(110001, '12345678901234567890123456789012345678901234567890');
…
insert into tbl1 values(120000, '12345678901234567890123456789012345678901234567890');

delete from tbl1 where i between 10001 and 20000
cubrid spacedb : find total_size – free_size

insert into tbl1 values(190001, '12345678901234567890123456789012345678901234567890');
…
insert into tbl1 values(200000, '12345678901234567890123456789012345678901234567890');

delete from tbl1 where i between 90001 and 100000
cubrid spacedb : find total_size – free_size

Test Results

When data is INSERTed/DELETEd, the space usage increases which allows to determine how much space can potentially be reused after the data is deleted. So every time there was an INSERT/DELETE operation, we could obtain the average database volume increase summarized below.

If the volume keeps increasing, after the same number of records have been INSERTed and then DELETEd, it means some space is not being reused. In other words, for the same number of records:

Space used after the data is inserted - space remained after the data deletion = reused space.

* As you will see in the graphs below, when INSERT/DELETE have been performed for the first time in the series of 10, the space reusage does not occur (the table is clean). For this reason, we have excluded those numbers of the first run from the calculation of the average.

Scenario 1

The following compares which of the methods result in higher space reusage during 1.1 and 1.2. In case of CUBRID 2008 R3.1, the INSERT/DROP/CREATE method shows 5.7% higher space reusability than the INSERT/DELETE method, while in CUBRID 2008 R4.0 there is no difference in space reusability between these two methods.

                          space reused after 10,000 records have been deleted
Space reusage           = ---------------------------------------------------
                          space used after 10,000 records have been inserted
Space reused            = space used after 10,000 records have been inserted -
after 10,000 records      space remained after 10,000 records have been deleted
have been deleted

Space used and wasted after the test

2008 R3.1 2008 R4.0
a) When the data has been initialized, record the average space usage after 10,000 row INSERT is executed for 10 times. 1.91MB 0.98MB
b) 1.1. Record the unreused space after 10,000 row INSERT/DELETE are executed for 9 times. 0.33MB 0.03MB
Reuse rate 82.72% 96.94%
c) 1.2. Record the unreused space after 10,000 row INSERT/DROP/CREATE are executed for 9 times. 0.24MB 0.03MB
d) Compare the difference in space reusability for 1.2 against 1.1. 5.7% 0%
d = ( ( a - c ) / ( a -b ) - 1 ) * 100

Results

Below, the left graph shows the result of Scenario 1.1 (10,000 row INSERT/DELETE operations for 10 times). The right one shows the results of Scenario 1.2 (10,000 row INSERT/DROP/CREATE operations for 10 times). Y-axis represents the space used in MB, while x-axis illustrates the number of test executions.

Scenario 1.1: Volume space usage after 10,000 INSERT/DELETE for 10 timesScenario 1.2: Volume space usage after 10,000 INSERT/DROP/CREATE for 10 times

  1. In the above graphs, you can notice that when the tests are executed for the first time, the space usage rates are high. This is due to the fact that the table is initially clean and no space reusage occurs at the beginning.
  2. Further, if we look at the trends of 2008 R3.1, the space reusability rate decreases sharply in the INSERT/DELETE approach after a certain point (space is not well reused), while the INSERT/DROP/CREATE approach maintains moderate rate of space reusage.
  3. In case of 2008 R4.0, there are no particular differences between these two approaches.
  4. In Scenario 1.1, if the inserted data is immediately deleted, we can observe the following results for each database version:
    1. CUBRID 2008 R3.1 reuses 82.72% of the space.
    2. CUBRID 2008 R4.0 reuses 96.94% of the space.

Scenario 2

After the initial data has been inserted, and new INSERT/DELETE operations take place, where the DELETE operation removes the oldest data first, we can observe the following results.

  1. If we look at the difference between the Scenario 1.1 and 2.1,
    1. the Scenario 1.1 executes INSERT and DELETE operations, where DELETE deletes ALL records;
    2. while the Scenario 2.1 executes INSERT and DELETE operations, where DELETE deletes the oldest data first.
  2. In Scenario 2.1, CUBRID 2008 R3.1 shows 58.64% of space reusability rate in contrast to 82.72% in Scenario 1.1, thus resulting in 24.08% volume space waste.
  3. On the other hand, 2008 R4.0 shows very little difference of 1.02% between 95.92% in Scenario 2.1 and 96.94% in 1.1.
    1. In other words, in CUBRID 2008 R4.0 entering the new data after deleting all the data will result in the same space reusage as entering the new data after partial delete.
  4. In addition, in Scenario 2.2, 2.3, and 2.4, the test results show that if the number of records and the amount of data increase, the space reusability rate will slightly decrease.

2.1. After 100,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 10 times

2008 R3.1 2008 R4.0
When the data has been initialized, record the average space usage after 10,000 row INSERT is executed for 10 times. 1.91MB 0.98MB
Record the unreused space after 10,000 row INSERT/DELETE are executed for 9 times. 0.79MB 0.04MB
Reusage rate 58.64% 95.92%

2.2. After 1,000,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 100 times.

2008 R3.1 2008 R4.0
When the data has been initialized, record the average space usage after 10,000 row INSERT is executed for 100 times. 1.91MB 0.98MB
Record the unreused space after 10,000 row INSERT/DELETE are executed for 99 times. 0.95MB 0.06MB
Reusage rate 50.26% 93.81%

2.3. After 1,000,000 records have been INSERTed, perform 100,000 record INSERT/DELETE operations for 10 times.

2008 R3.1 2008 R4.0
When the data has been initialized, record the space average usage after 100,000 row INSERT is executed for 10 times. 19.14MB 9.76MB
Record the unreused space after 10,000 row INSERT/DELETE are executed for 9 times. 9.57MB 0.42MB
Reusage rate 50.00% 95.70%

2.4. After 3,000,000 records have been INSERTed, perform 300,000 record INSERT/DELETE operations for 10 times.

2008 R3.1 2008 R4.0
When the data has been initialized, record the space average usage after 300,000 row INSERT is executed for 10 times. 57.42MB 29.29MB
Record the unreused space after 10,000 row INSERT/DELETE are executed for 9 times. 28.89MB 1.27MB
Reusage rate 49.69% 95.66%


2.1 2.2
2.1. After 100,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 10 times 2.2. After 1,000,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 100 times
2.3 2.4
2.3. After 1,000,000 records have been INSERTed, perform 100,000 record INSERT/DELETE operations for 10 times 2.4. After 3,000,000 records have been INSERTed, perform 300,000 record INSERT/DELETE operations for 10 times

Scenario 3

  1. The INSERT/DELETED space usage in Scenario 3.1 is compared with Scenario 1.1;
  2. while INSERT/DROP/CREATE space usage in Scenario 3.2 is compared with Scenario 1.2.
  3. The Scenario 3.3 is compared with Scenario 2.2,
    1. Having INSERTed the initial data of 1,000,000 records;
    2. And performing the DELETE operation, likewise, by removing the oldest data first.
  4. For all comparison, this Scenario will use the REUSE_OID option in the test table, while the tables in 1.1, 1.2, and 2.2 did not use this option.

Scenario 3.1 Results

  1. Performed 10,000 record INSERT/DELETE for 10 times using REUSE_OID.
  2. As can be observed from the table below, when REUSE_OID option is used in a table definition, the rate of space reusability increases substantially.
  3. While 2008 R3.1 shows 13% increase from 82.72% to 95.81%;
  4. Version 2008 R4.0 shows 100% of volume space reusability.
2008 R3.1 2008 R4.0
Whether REUSE_OID option has been used or not NO YES NO YES
Space occupied by 10,000 rows (AVG of 10 runs) 1.91MB 1.91MB 0.98MB 0.98MB
Space not reused after 10,000 row INSERT/DELETE are executed for 9 times. 0.33MB 0.08MB 0.03MB 0MB
Reusage rate 82.72% 95.81% 96.94% 100%
1.1 3.1
1.1. Perform 10,000 record INSERT/DELETE for 10 times. 3.1. Using REUSE_OID, perform 10,000 record INSERT/DELETE for 10 times.

Scenario 3.2 Results

  1. Performed 10,000 record INSERT/DROP/CREATE for 10 times using REUSE_OID.
  2. As in the previous test, when REUSE_OID option is used in a table definition, the rate of space reusability increases substantially, when compared to tables which do not use REUSE_OID.
  3. Moreover, 2008 R3.1 reaches 100% space reusability after from 5th run.
  4. 2008 R4.0, on the other hand, shows 100% volume space reusability from the very start.
2008 R3.1 2008 R4.0
Whether REUSE_OID option has been used or not NO YES NO YES
Space occupied by 10,000 rows (AVG of 10 runs) 1.91MB 1.91MB 0.98MB 0.98MB
Space not reused after 10,000 row INSERT/DROP/CREATE are executed for 9 times. 0.24MB 0.16MB 0.03MB 0MB
Reusage rate 87.43% 91.62% 96.94% 100%
1.2 3.2
1.2. Perform 10,000 record INSERT/DROP/CREATE for 10 times. 3.2. Using REUSE_OID, perform 10,000 record INSERT/DROP/CREATE for 10 times.

Scenario 3.3 Results

  1. After 1,000,000 records have been INSERTed, performed 10,000 record INSERT/DELETE operations for 100 times using REUSE_OID.
  2. In contrast to previous test, this Scenario 3.3 did not have show much space reusability increase when REUSE_OID was used.
  3. 2008 R3.1 has used less than 2MB of space to store 10,000 records in 100 runs.
  4. 2008 R4.0 showed no difference as well in using REUSE_OID option.
2008 R3.1 2008 R4.0
Whether REUSE_OID option has been used or not NO YES NO YES
Space occupied by 10,000 rows (AVG of 100 runs) 1.91MB 1.91MB 0.97MB 0.97MB
Space not reused after 10,000 row INSERT/DELETE are executed for 99 times. 0.95MB 0.93MB 0.06MB 0.06MB
Reusage rate 50.26% 51.31% 93.81% 93.81%
1.2 3.2
2.2. After 1,000,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 100 times. 3.3. Using REUSE_OID, after 1,000,000 records have been INSERTed, perform 10,000 record INSERT/DELETE operations for 100 times.

Conclusion

  1. In version 2008 R3.1, if the same number of records are INSERTed and then DELETEd, the space reusability ration is above 50%.
  2. Version 2008 R4.0 has improved space reusability, showing above 95% volume space reusability.
  3. In other words, if the same number of records are INSERTed and then DELETEd, in CUBRID 2008 R3.1 some 50% of space will not be reused, while in CUBRID 2008 R4.0 only 5% of space will not be reused.
  4. In case a table uses the REUSE_OID option,
    1. both database servers show substantially higher space reusability rate after both INSERT/DELETE ALL and INSERT/DROP/CREATE operations;
    2. especially INSERT/DROP/CREATE operation reaches 100% space reusability after a certain number of executions.
    3. 2008 R4.0 showed 100% space reusability in both cases.
  5. If REUSE_OID is used for INSERT/DELETE oldest records first operations,
    1. 2008 R3.1 has slightly higher rate of volume space reusability;
    2. while 2008 R4.0 has no improvements in space reusage.

As a disclaimer, it is worth mentioning that these results are limited to the above test cases. However, it is possible to conclude that if the number of INSERT/DELETE operations increase, the space reusability will slightly decrease. So, in conclusion:

  1. If your service is INSERT/DELETE heavy, we highly recommend you to upgrade to CUBRID 8.4.0, which is uses less space, reuses the space very efficient, has twice faster engine, supports over 90% of MySQL syntax, and provides extensive implicit data type conversion.
  2. However, if your service is mainly a SELECT heavy service and you do not want to upgrade, it is fine to work with CUBRID 8.3.1. But we still recommend you to upgrade to enjoy more optimized database system.

See also

CUBRID 8.4.0 Key Features

The new CUBRID 8.4.0 features many significant improvements which includes Performance Improvements, Developer Productivity Improvements and HA Reliab...

CUBRID vs. MySQL Benchmark Test Results for SNS Data and Workload

As we have recently rolled out the new 8.4.0 version of the CUBRID Database, one of our CUBRID users has approached us with a proposal t...

Increasing Database Performance by Query Tuning

It is common that applications use various kinds of SQL queries when communicating with the database server. If queries are not well structured, they wi...

NBD Benchmark Result

The NBD (NHN Internet Bulletin Board Application Database) Benchmark is a benchmark to measure the performance of a DBMS used for a Bullet...




You are either using a very old browser or a browser that is not supported.
In order to browse cubrid.org you need to have one of the following browsers:



Internet Explorer: Mozilla Firefox: Google Chrome: