Open Source RDBMS - Seamless, Scalable, Stable and Free

한국어 | Login |Register

Why prepare queries if auto-parameterization already does that?


There are two main reasons why you should prepare your SQL:

  1. You will protect youself from SQL injections.
  2. You can improve the performance of your application.

SQL injection

This is a very important part you should remember. Auto-parameterization feature, which is available in CUBRID, does NOT protect you from SQL injections. Even if all the queries get parameterized automatically, it happens on the server side after the query is already populated with values in your application long before it is processed by the Broker. Therefore, to avoid SQL injections, you must prepare your queries yourself.

Performance implications of prepared queries

If you have a query which is executed multiple times consequently in your application code, you are highly recommended to prepare your query once, then bind values and execute each query. There is big performance implication behind this. To understand this better, I will give you the following example.

Assume you have this query SELECT * FROM mytable WHERE id = 10 which you plan to execute using cubrid_execute() in your application several times consequently. This may be a part of your cron task or something like that.

  1. The first time you run cubrid_execute() with the above query, the Broker will perform all the query processing job we explained before:
    1. parse SQL
    2. check if this SQL has a cached execution plan. Since it’s the first time you execute it, there is no cache. Then:
    3. resolve names
    4. check column types
    5. check for semantics errors
    6. generate the Augmented Parse Tree.
    7. rewrite the query
    8. determine the most optimized access method
    9. calculate selectivity
    10. determine access sequence among tables
    11. classify terms
    12. create the execution plan.
    13. then the server finally execute and return the data.
  2. The second time you will execute the same query, the Broker will:
    1. parse SQL
    2. check if this SQL has a cached execution plan.
    3. retrieve the execution plan.
    4. then the server execute and return the data.
  3. If you executes the same query again, the Broker will again:
    1. parse SQL
    2. check if this SQL has a cached execution plan.
    3. retrieve the execution plan.
    4. then the server execute and return the data.
  4. and so on. The last few steps will always be performed if cubrid_execute is called directly without preparing the query.

Now asume you first cubrid_prepare() the query, then cubrid_bind() values and finally cubrid_execute().

  1. When you run cubrid_prepare(), the Broker will perform all the query processing job we explained above. The only difference is that it will not execute the query, but return the resource which can be used for further execution.
    1. parse SQL
    2. check if this SQL has a cached execution plan. Since it’s the first time you execute it, there is no cache. Then:
    3. resolve names
    4. check column types
    5. check for semantics errors
    6. generate the Augmented Parse Tree.
    7. rewrite the query
    8. determine the most optimized access method
    9. calculate selectivity
    10. determine access sequence among tables
    11. classify terms
    12. create the execution plan.
    13. return the resource.
  2. Now when you bind the values and execute the query for the first time, the server will directly:
    1. execute and return the data.
  3. If you execute the same query again, the server will only:
    1. execute and return the data.
  4. and so on.

Thus, when you prepare the query in advance and execute the query multiple times consequently, you save much time, CPU and memory resources required to parse the SQL statement and read from the cache.

I have created two small PHP programs to visually illustrate the performance implications of preparing the query. The first program directly executes SELECT * FROM zip_codes WHERE id = 50 in a loop of 5,000 without preparing it. And I have rerun this program 10 times after cold restart of CUBRID server.

The second program, first, prepares SELECT * FROM zip_codes WHERE id = ?, then in a loop of 5,000 it binds the value and calls cubrid_execute(). Also I have rerun the program 10 times after restarting CUBRID. Since in CUBRID prepared queries can be executed in synchronous and asynchronous modes, I have run the program in both modes.

So, below is the performance test results. What I would like to note here is that you should not look at the raw values displayed along the y-axis because the programs were executed in a low-end virtual machine environment. But what is important is the ratio between preparing the queries and not.

Performance implications of Prepared Queries in CUBRID

As you can see if queries, which are supposed to run multiple times, are prepared in advance, the execution can be 200% faster in sync_mode, or 60% in async_mode. Therefore, it is highly recommended to prepare queries where necessary.

You should also know that synchronous execution of SELECT quieres is not always faster than asynchronous execution. If you expect large result sets after each query execution, sync_mode can consume much memory and become slower, while in async_mode large results can be consumed right away after each query execution, thus may be faster. You should experiment this on your application and decide which one is better for your case.

comments powered by Disqus
Page info
viewed 4436 times
translations en
Author
posted 2 years ago by
CUBRID
Contributors
updated 2 years ago by
View revisions
Share this article