How to Efficiently Import Large Files in CUBRID
Sometimes we need to import large data files to CUBRID tables which can go over several GB. In this article we will cover different ways to perform this task. They are:
- The easiest way to import data to CUBRID is using the CUBRID Manager.
- You can also insert using the data using PHP, Java, or other CUBRID drivers.
- Otherwise, you can use CSQL, CUBRID's command line SQL interpreter.
First, we will show you some basic test results so that you can have an overall idea why certain solution, among those listed above, is faster or slower. Then we will list the recommendation you are encouraged to follow when importing large data to CUBRID.
Test Overview
For each of these solution we will perform a test on a small amount of data (100,000 records) to provide some idea on each of the solution's performance. The setting will be executed on Windows 7 x86 with CUBRID 8.4.0 installed. We will use .
- CSQL
- -S (Stand-alone execution mode)
- -C (Client-server execution mode)
- CUBRID Manager
- PHP API
Additionally the following configurations will be applied to each of the solutions.
- Commit cycle will be set to 5,000.
- We will measure the execution time of only the INSERT statements, thus we will have the database and tables created in advance.
Test Scenario
Running CSQL in the command line (-S and -C modes)
CSQL is a command line tool which allows to execute SQL queries to CUBRID database server. When compared to CUBRID Manager, it is much lighter and faster. There are two modes in which CSQL can be executed. One is Stand-alone mode (-S), the other is Client-server mode (-C).
- In the Stand-alone mode, CSQL Interpreter directly accesses database files and executes commands including server process functions. That is, SQL statements can be sent and executed to a database without a separate database server process running for the task. Since the standalone mode allows only one user access at a given time, it is suitable for management tasks by Database Administrators (DBAs).
- In the Client-server mode, CSQL Interpreter usually operates as a client process and accesses the server process.
For more, read CSQL Execution Mode.
Now we will create a database for our test in the command line.
$> cubrid createdb dbtest
Then we need to connect to the dbtest database using CSQL and create the necessary tables.
$> csql demodb
CUBRID SQL Interpreter
Type `;help' for help messages.
csql> CREATE TABLE test1(a int, b TIMESTAMP, c int AUTO_INCREMENT)
csql> ;ru
Current transaction has been committed.
1 command(s) successfully processed.
csql> ;ex
- ;ru is a session command which directs CSQL to run the entered SQL statement;
- ;ex stands for exit. For more commands see Session Commands.
Since we have prepared everything, let's create a dbtest.sql file with the INSERT statements which will be used in our test.
INSERT INTO test1 VALUES (0 , SYS_TIMESTAMP, NULL); INSERT INTO test2 VALUES (1 , SYS_TIMESTAMP, NULL); …………………… INSERT INTO test1 VALUES (99998 , SYS_TIMESTAMP, NULL); INSERT INTO test1 VALUES (99999 , SYS_TIMESTAMP, NULL);
To run the test we direct CSQL to load the SQL statements from the file and execute them. To start CSQL in Stand-alone mode, we run the following command:
$> csql -u dba -p 1111 –S -i dbtest1.sql dbtest
To start CSQL in Client-server mode, we run the following command:
$> csql -u dba -p 1111 –C -i dbtest1.sql dbtest
Importing in PHP
Like in the previous example, we will use the same database and table information. Then we will run the following PHP code to insert 100,000 records.
$host_ip = "localhost";
$host_port = 33000;
$db_name = "dbtest";
$userId = "dba";
$password = "1111";
$cubrid_con = @cubrid_connect($host_ip, $host_port, $db_name, $userId, $password);
if($cubrid_con)
{
$sql = "insert into " . $db_name . " (a, b) values (?, SYS_TIMESTAMP)";
$reg = cubrid_prepare($cubrid_con, $sql, CUBRID_INCLUDE_OID);
// Insert 100,000 records in the loop.
for($i = 0; $i < 100000; $i++)
{
$res = cubrid_bind($reg, 1, $i);
$res = cubrid_execute($reg);
// Commit once in 5,000 times (commit cycle).
if (($i+1) % 5000 == 0)
{
cubrid_commit($cubrid_con);
echo $i, "
";
}
}
}
Importing in CM
We will import the data from the file we have previously created for CSQL. We will use CUBRID Manager's default Import Data feature (see the image below).

Test Results
The following are the results we obtained from this experimental test ran on a small amount of data. The following table and graph represent the data in seconds.
| 50,000 records | 100,000 records | 300,000 records | |
|---|---|---|---|
| csql-S | 5 | 10 | 29 |
| csql-C | 111 | 224 | 599 |
| PHP | 68 | 136 | 413 |
| CM | 17 | 33 | 96 |

Conclusion
Use CSQL in Stand-alone mode
As we can see from this test, CSQL in Stand-alone mode is the fastest solution to import the data to CUBRID. This is because it directly deals with the database without interacting with server processes. It behaves as if server itself is performing the operations, not the client which is connected to the server. For this reason it is the fastest solution.
However, there are cases when we cannot use the Stand-alone mode, since in Stand-alone mode no other database user has to interact with the database. This means that CSQL should be the only user which works with the database at that particular moment. Thus Stand-alone mode implies that the database should not be running. If it is running, this means another user (host) is using it. In this case, you will receive the following warning.
$> csql -S demodb ERROR: Unable to mount disk volume "C:CUBRIDdatabasesdemodbdemodb_lgat". The database "C:CUBRIDDATABA~1demodbdemodb", to which the disk volume belongs, is in use by user USER-PC$ on process 3096 of host user-PC since Thu Sep 22 11:04:01 2011.
In such cases, either you have to make sure no other user is interacting with the database, by stopping the database, or use other solutions. To stop the database in the command line, type the following command where demodb is the name of your database.
cubrid server stop dbtest1
Create CONSTRAINT after Importing
This is one of the foremost important recommendations for users who import large data into tables. Do not create any type of INDEXing before you import the data. This related to INDEX, UNIQUE, REVERSE INDEX, REVERSE UNIQUE, and even PRIMARY KEY (PK automatically creates INDEX). Otherwise, every INSERT during the importing process will force the server to reindex the new value which adds up to the overall time. So:
- Create your table.
- Define your columns with data types, but do not add any constraints event the
PRIMARY KEY. - Import your data.
- Only then create the constraints.
Turn OFF Logging
There are two types of logging:
- Client-side logging
- Server-side logging
Client-side logging
The client-size logging refers to SQL_LOG parameter of the Broker, the middleware of CUBRID. By default it is turned ON.
When SQL_LOG is configured to ON all SQL statements processed by the CUBRID Application Server (CAS) will be stored in the external log file. So this adds up some time, too. So unless you are sure that you need logging for your import, turn it OFF. It will significantly improve the import performance.
There are several ways to turn SQL_LOG OFF. We will show you how to change its value in the CUBRID Manager and in the command line.
CM Example
To turn OFF SQL_LOG in the CUBRID Manager, right click on the broker and select Properties.

In the modal window set the value of SQL_LOG to OFF. Press the OK button.

To apply the changes, we need to restart the Broker. Right click on the broker and select Broker Off. Then start it again by clicking on the Broker On menu. Then do your importing.
Command Line Example
Open in the text editor the cubrid_broker.conf default broker configuration file located in the conf directory where you have installed CUBRID. In the file set SQL_LOG = OFF and save it as shown below.
[broker] ... SQL_LOG =OFF ...
Then restart the broker.
cubrid broker restart
Server-side logging
The server-side logging is determined by the media_failure_support parameter of the CUBRID Server which specifies whether or not to store archive logs in case of storage media failure. If the parameter is configured to yes, which is the default value, all active logs are copied to archive logs when the active logs are full and the transaction is active. If it is configured to no, archive logs created after the active logs are full are deleted automatically. Note that archive logs are deleted automatically if the value of the parameter is configured to no.
By setting its value to no it is possible to gain some more time when performing mass import. To change its value in the CUBRID Manager, right click on the host and select Properties.

In the modal window set the value of media_failure_support parameter to no. Press the OK button.

To apply the changes, make sure to restart the server.
Utilize Threads and Commit Count
If you use CUBRID Manager to import the data, make sure you utilize threads and commit counts.
Thread will direct CM to use several concurrent connections to perform INSERTs. To specify the number of threads, set the Thread count value. However you should remember that too many thread may vice-versa deteriorate the performance. It all depends on your hardware spec. Normally we recommend to use somewhere between 5 and 10 threads.

Also, the execution time will depend on the commit cycle. It determines how often the inserted data should be committed. Committing frequently will deteriorate the performance. However, too high number might require too much memory. The value around 5,000 is recommended.
Utilize data_buffer_size
The data_buffer_size is the server parameter that configures the number of data pages to be cached in the memory by the Database Server. The greater the value of the data_buffer_size parameter, the more data pages can be cached in the buffer, which will provide the advantage of decreased disk I/O cost. However, if this parameter is too large, the buffer pool can be swapped out by the operating system because the system memory is excessively occupied. It is recommended to configure the data_buffer_size parameter in a way the required memory size is less than two-thirds of the system memory size. The default value is 512M.
Utilize insert_execution_mode
This insert_execution_mode is a very useful setting which allows to perform the INSERT operations on the server side instead of the client side. It has 7 modes (for more see the Database Server Parameters). By default it's value is 1, which means all INSERT INTO ... SELECT ... queries will be executed on the server side. When importing the data, you do not need this mode. Instead, you will need to set insert_execution_mode to 2, which will direct to perform all common INSERT statements like INSERT INTO ... VALUES ... to be processed by the server. See the following image.
This parameter is useful when the memory capacity of the client is limited or the dirty reading of INSERTed data is required.

In conclusion, let's wrap up all these.
- Use CSQL in Stand-alone mode.
- Create CONSTRAINT after importing.
- Turn OFF Logging.
- Utilize Threads and Commit Count.
- Utilize data_buffer_size.
- Utilize insert_execution_mode.
References
- cubrid_broker.conf Configuration File and Default Parameters
- Broker Logs
- Dynamically Changing Broker Parameters
- Checking Broker Status
- Parameter By Broker
- Issues importing a large MySQL table - Forum post
