CUBRID Foundation: Tutorials - What's New in CUBRID 10: 'String Compression'

Author: Henry Kwon

CUBRID 10 has added various new features. One of the significant new features is string compression function! Now, let’s take a look at it!

String compression is shown in the table below.

Annotation 2020-07-17 173728.png

CUBRID string compression is performed only for 255 bytes or more, and compression is not performed unless compression is efficient. In order to test how good the string compression ratio is, we entered 100,000 test data in CUBRID 9.3 and 10.1 without string compression and checked the table size.

“Case #1” entered non-overlapping string data and checked the compression rate.
“Case #2” entered duplicate data and checked the compression rate.

Classification	Scenario
Case #1	‘ajhdghfjasodfajsdfaksfapsidpf…’ 100,000 inputs of 501-byte data Check the table capacity of CUBRID 9.3 and 10.1
Case #2	‘aaaaaaaaaaaaaaaaaaaaaaaaa…’ 100,000 inputs of 501-byte data Check the table capacity of CUBRID 9.3 and 10.1

The table below shows how to generate 100,000 data for each case. First, one test data was entered, and the test data was generated using the catalog table and the Cartesian Product in the “insert table select ...” statement.

Classification

SQL Statement

Case #1

create table str_comp( col1 string );

insert into str_comp values('aoisioeopsvnasdfpoasdfoasjd’)

insert into str_comp select a.col1 from str_comp a, db_attribute b, db_attribute c, db_attribute d limit 99999;

Case #2

create table str_comp2( col1 string );

insert into str_comp2 values(' aaaaaaaaaaaaaaaaaaaaaaaaaaa ');

insert into str_comp2 select a.col1 from str_comp2 a, db_attribute b, db_attribute c, db_attribute d limit 99999;

We entered 100,000 data into the SQL statement in the table above and checked the table size. The table size is “show heap capacity of table name;” I ran the command and checked the Num_pages value, and the results are shown in the table below.

	No. of input	9.3	10.1	Compression rate
Case #1	100,000	3,704	3,704	0 %
Case #2	100,000	3,704	357	938 %

"Num_pages" entered and verified the data of the non-duplicated string of "case #1" and the result was the same. The reason is that compression was not performed because the data being entered was the same as the original value, and the data was the same because the original data was entered. In "Case #2", duplicate data was entered and the compression rate was treated as 938%.

When the CUBRID 10 string compression function decompresses the compressed string, it is said that "decompress the compressed string when reading from the database". However, a question arises: How much time does it take to extract data?

To answer that, we wrote and tested the following script. The script below is a script that receives the SQL statement to be executed as a parameter, executes the SQL statement, records the result in resut_sql.txt, and outputs the value minus the completion time from the time before execution.

Name

Script Content

run_time.sh

#!/bin/bash

StartTime=$(date +%s.%N)

RUN_SQL=$1

#echo $RUN_SQL

csql -u dba -c "$RUN_SQL" -o result_sql.txt test_db

EndTime=$(date +%s.%N)

run_time=`echo "$EndTime - (($StartTime)) " | bc`

echo $run_time

The script was executed as follow:

sh run_time.sh "select * from str_comp limit 100000;"

The time required to extract 500,000 cases from the in-house test server equipment was measured as follows. The measurement time below is not an absolute value, and may be changed depending on equipment performance, OS setting, storage IO performance, etc.

Classification

9.3

10.1

Compression Rate

Time Difference

Number

Page Count

Time (sec)

Page Count

Time

(sec)

Case#2

3,704

1.04317

357

0.848578

938 %

-0.194592

The time taken to extract 100,000 data from CUBRID 9.3 was 1.04317 seconds, and 10.1 was 0.848578 seconds. The time difference was -0.194592 seconds. In another word, the processing speed is 0.19sec faster when decompressing and extracting 100,000 compressed 501 byte data.

In summary, the storage capacity in DISK was 57 MB (3,704 * page size 16K) in 9.3, and 5 MB (357 * page size 16K) in 10.1. The processing time is 0.19 seconds faster. The string compression function added in CUBRID 10 reduces disk usage and improves processing performance.

No.	Category	Subject	Date
»	Server	What's New in CUBRID 10: 'String Compression'	2020.07.17
6	Server	How to Start CUBRID Automatically upon LINUX Boot	2020.07.17
5	Tools	Running CUBRID Container Services in Docker, Kubernetes Environment	2020.07.23
4	Tools	CUBRID Manager Import Wizard Useful Tips!	2020.07.23
3	Tools	Utilizing the CUBRID Migration Tool (CMT)	2020.07.23
2	Tools	Let’s Take a Look at SQLGate for CUBRID!	2020.07.23
1	Tools	eXERD Which Deals with Logical/Physical Models, Supports CUBRID	2020.07.23

DOCUMENTATION

What's New in CUBRID 10: 'String Compression'

Shortcut

Shortcut

Join the CUBRID Project on

Site Map