Server

What's New in CUBRID 10: 'String Compression'

posted Jul 17, 2020

 

Author: Henry Kwon 

 

CUBRID 10 has added various new features. One of the significant new features is string compression function! Now, let’s take a look at it!

 

String compression is shown in the table below.

Annotation 2020-07-17 173728.png

CUBRID string compression is performed only for 255 bytes or more, and compression is not performed unless compression is efficient. In order to test how good the string compression ratio is, we entered 100,000 test data in CUBRID 9.3 and 10.1 without string compression and checked the table size.

  • “Case #1” entered non-overlapping string data and checked the compression rate.
  • “Case #2” entered duplicate data and checked the compression rate.

Classification

Scenario

Case #1

  • ‘ajhdghfjasodfajsdfaksfapsidpf…’ 100,000 inputs of 501-byte data
  •  Check the table capacity of CUBRID 9.3 and 10.1

Case #2

  • ‘aaaaaaaaaaaaaaaaaaaaaaaaa…’ 100,000 inputs of 501-byte data
  • Check the table capacity of CUBRID 9.3 and 10.1

 

The table below shows how to generate 100,000 data for each case. First, one test data was entered, and the test data was generated using the catalog table and the Cartesian Product in the “insert table select ...” statement.

Classification

SQL Statement

Case #1

create table str_comp( col1 string );

insert into str_comp values('aoisioeopsvnasdfpoasdfoasjd’)

insert into str_comp select a.col1  from str_comp a, db_attribute b, db_attribute c, db_attribute d limit 99999;

Case #2

create table str_comp2( col1 string );

insert into str_comp2 values(' aaaaaaaaaaaaaaaaaaaaaaaaaaa ');

insert into str_comp2 select a.col1  from str_comp2 a, db_attribute b, db_attribute c, db_attribute d limit 99999;

 

We entered 100,000 data into the SQL statement in the table above and checked the table size. The table size is “show heap capacity of table name;” I ran the command and checked the Num_pages value, and the results are shown in the table below.

 

 

No. of input

9.3

10.1

Compression rate

Case #1

100,000

3,704

3,704

0 %

Case #2

100,000

3,704

357

938 %

 

"Num_pages" entered and verified the data of the non-duplicated string of "case #1" and the result was the same. The reason is that compression was not performed because the data being entered was the same as the original value, and the data was the same because the original data was entered. In "Case #2", duplicate data was entered and the compression rate was treated as 938%.

 

When the CUBRID 10 string compression function decompresses the compressed string, it is said that "decompress the compressed string when reading from the database". However, a question arises: How much time does it take to extract data?

 

To answer that, we wrote and tested the following script. The script below is a script that receives the SQL statement to be executed as a parameter, executes the SQL statement, records the result in resut_sql.txt, and outputs the value minus the completion time from the time before execution.

 

Name

Script Content

run_time.sh

 #!/bin/bash

StartTime=$(date +%s.%N)

 

RUN_SQL=$1

#echo $RUN_SQL

csql -u dba  -c "$RUN_SQL" -o result_sql.txt  test_db

 

EndTime=$(date +%s.%N)

 

run_time=`echo "$EndTime - (($StartTime)) " | bc`

echo $run_time

 

The script was executed as follow:

sh run_time.sh "select * from str_comp limit 100000;"

 

The time required to extract 500,000 cases from the in-house test server equipment was measured as follows. The measurement time below is not an absolute value, and may be changed depending on equipment performance, OS setting, storage IO performance, etc.

 

Classification

9.3

10.1

Compression Rate

Time Difference

Number

Page Count

Time (sec)

Page Count

Time

(sec)

Case#2

3,704

1.04317

357

0.848578

938 %

-0.194592

 

The time taken to extract 100,000 data from CUBRID 9.3 was 1.04317 seconds, and 10.1 was 0.848578 seconds. The time difference was -0.194592 seconds. In another word, the processing speed is 0.19sec faster when decompressing and extracting 100,000 compressed 501 byte data.

 

 

In summary, the storage capacity in DISK was 57 MB (3,704 * page size 16K) in 9.3, and 5 MB (357 * page size 16K) in 10.1. The processing time is 0.19 seconds faster. The string compression function added in CUBRID 10 reduces disk usage and improves processing performance.