Open Source RDBMS - Seamless, Scalable, Stable and Free

한국어 | Login |Register

Shared Query Plan Caching in CUBRID


Summary

CUBRID Database is well known for its various optimizations for Web services. One more feature which makes CUBRID stand out is the support for Shared Query Plan Caching. We have got huge improvements in overall performance when we added this feature.

What is Query Execution Plan?

In simple words, the Query Execution Plan is a tree of instructions how should the database server execute the client’s SQL request.

When a SQL statement is sent for execution, the Broker, CUBRID’s middleware, performs the following steps to generate the Execution Plan for this particular query, and then relays this tree of instructions to the database server. More detailed explanation of each step is provided in CUBRID Query Processing tutorial.

  1. SQL Parser parses the received statement and generates the Parse Tree. This step validates the syntactical correctness of the statement.
  2. If the statement is syntactically correct, the Semantic Checker performs:
    1. name resolution (checks whether the referenced tables and columns actually exist in the database)
    2. node type checking for previously obtained Parse Tree (infers the type of columns)
    3. semantic checking (checks whether an operation that is not supported between types is used)
    4. generate the Augmented Parse Tree.
  3. Query Optimizer performs the following action on the Augmented Parse Tree:
    1. Cost-based Optimization (CBO)
      1. determine the most optimized access method to each table
      2. calculate selectivity for each table
      3. determine access sequence among tables
    2. Query Rewrite Optimization (QRO)
      1. view translation (translate the views or virtual classes into base classes; auto-parameterization happens here)
    3. classify terms to configure search conditions in table units
    4. generate XASL tree as the final execution plan.

The final XASL (eXtended Access Specification Language) tree, obtained from the Query Optimizer, is that particular query execution plan we are talking about. It tells the server engine how the incoming query should be executed.

There are 13 types of execution plans: UNION, DIFFERENCE, INTERSECTION, OBJFETCH, SETFETCH, BUILD_LIST, BUILD_VALUE, SCAN, MERGE_LIST, READ, UPDATE, DELETE, and INSERT. The query execution plan also includes the scan information (heap scan, index scan, list file scan, set scan, and method scan), the value list (values required for query results) and the predicate.

All these steps are always performed unless the execution plan for the query is already cached. In that case, the execution plan is retrieved from the cache.

As a result the Broker sends this Query Execution Plan (XASL) to the Query Manager, a server module that executes the client’s query according to the plan.

Why is Query Execution Plan Cache so crucial then?

Well, it is all about performance. Imagine the Broker no longer has to go through all the steps performed by the Semantic Checker and the Query Optimizer. Imagine all the Broker has to do is parse the query and get its execution plan from the cache as shown in the image below.

cubrid-query-plan-cache-impact.png

The Query Plan Caching does exactly what we want. It allows the Broker to save the execution plan of a query to the Shared Memory for further re-use in case the same query is requested again by the client.

More than that, the query plan cached in the Shared Memory can be literally Shared between transaction, sessions, and different clients concurrently. This means that once the query plan is cached, all clients benefit.

By default, the Broker caches 1,000 distinct SQL statements as set by the client-server paramater called MAX_PLAN_CACHE_ENTRIES. It is relatively rare for a Web application to have more than 1,000 distinct SQL statements. If that is the case, you may increase this value in cubrid.conf configuration file. But before you do, make sure you have that many distinct queries in your application. Also make sure that the size of the cache generated from query plans does not exceed the memory allowed by your system.

How to get query plan cache statistics?

There is a cubrid command line utility for this called plandump. Type the following command in the terminal to obtain the statistics for desired database.

cubrid plandump -o output_file.txt database_name

  • -o command followed by a file name allows to save the exection plan statistics to the indicated file.
  • -d command will force the utility to drop all query plans stored in the cache (hash table). Thus you can clear the cache whenever you feel necessary.

The following is a sample output of cubrid plandump command.

CACHE MAX NUM LOOKUP HIT MISS FULL
entry 1000 991 11013226 110649495 363731 9043

This header part of the output displays the following information:

  • MAX: this is the maximum number of plan caches allowed = MAX_PLAN_CACHE_ENTRIES.
  • NUM: the number of SQL statements cached in the system.
  • LOOKUP: the number of times the Broker looked into cache to see if there is a cached plan. LOOKUP = HIT + MISS.
  • HIT: the number of times the query plan was actually read from the cache.
  • MISS: the number of times the execution plan was not found in the cache. This happens when the query is executed for the first time, or when the plan is outdated and needs to be regenerated.
  • FULL: the number of query plans which were generated but not cached as the MAX limit is reached.

Further the plan dump follows a list of statistics for each SQL statement which was cached in the system.

Auto-parameterization in CUBRID

Besides the fact that CUBRID shares the Query Plan Cache, there is one more notable feature which further enhances the caching optimization. It is the auto-parameterization feature.

You can understand auto-parameterization as always-prepare functionality for your SQL statements. In other words the Broker always prepares all the queries it processes. Auto-parameterization happens in the Query Optimizer component right before the execution plan is generated. Thus, auto-parameterization is at the core of Shared Query Plan Cache in CUBRID.

So, even if you write a SQL query in plain and execute it , the statement will always get auto-parameterized. For example, if you write SELECT * FROM my_table WHERE id = 10, the Broker will always convert it to SELECT * FROM my_table WHERE id = ?:0. This is the query whose execution plan gets cached.

Auto-parameterization, however, must not be confused with prepared statements. I have posted an article Why prepare queries if auto-parameterization already does that? Please refer to it for more information.

What happens when Query Plan Cache is full?

When MAX number of query plans are already cached, there is an algorithm in the Broker which will drop 5% of the plans based on how old they are and their hit ratio.

Performance Implications

To show the performance impact of the Query Plan Cache in CUBRID, I have written a small PHP program which simply executes 16 different SQL statements for 5,000 times in a loop against CUBRID’s demo database (demodb). The purpose of the program is to illustrate what would be if cache was forced not to accept any plans.

The program was executed twice.

  1. MAX_PLAN_CACHE_ENTRIES = 1000 (default)
  2. MAX_PLAN_CACHE_ENTRIES = 1 (allows only 1 plan cache, practically turns off plan caching)

Before I ran the program, I recorded the initial status of the query plan cache.

Initial status of the query plan cache
CACHE MAX NUM LOOKUP HIT MISS FULL
entry 1000 0 0 0 0 0

As a result the following query plan cache results were obtained.

At MAX_PLAN_CACHE_ENTRIES = 1000
CACHE MAX NUM LOOKUP HIT MISS FULL
entry 1000 13 65013 64987 26 0
  1. We can see that 13 out of 16 query plans were cached.
  2. There were 65013 lookups into the cache.
  3. 64987 times query plans were reused from the cache.
  4. Some 26 times query plans were not found in the cache or required plan regeneration (perhaps for those 3 statements whose plans were not cached).
At MAX_PLAN_CACHE_ENTRIES = 1
CACHE MAX NUM LOOKUP HIT MISS FULL
entry 1 1 130000 0 130000 64999
  1. When we force the server not to cache query plans, we can witness that only 1 query out of 16 query plans was actually cached.
  2. There were 130000 lookups into the cache.
  3. No query plan was reused from the cache.
  4. All 130000 times query plans were not found in the cache.
  5. 64999 times query plans were not cached as MAX was reached.

Thus, we can receive the evidence that Query Plan Caching allows to significantly reduce the overall time, CPU and memory spent to process queries.

How do other RDBMS vendors approach to this task?

Now let’s have a look at how other RDBMS solution providers approach to this task. For this we will test each database server against a set of SQL statements and determine how well the servers utilize Query Plan Caching.

First, let’s define the base SQL statement:

SELECT * FROM tbl;

Now we will assume that we have executed this query at least once in each database server so that each could cache this query’s execution plan.

Before going any further I will identify several SQL statements almost identical to this base query. In fact they all return almost the same result. The difference is only in how queries are written:

  1. Additional space or new line exist:
    1. SELECT *    FROM tbl;
    2. SELECT *
      FROM tbl;
  2. Referenced names are capitalized:
    1. SELECT * FROM TBL;
  3. Comment exists:
    1. SELECT /* comment */ * FROM tbl;
  4. Table owner name is specified:
    1. SELECT * FROM dba.tbl;
  5. Hint is used:
    1. SELECT /*+ NO_STATS */ * FROM tbl;
    2. SELECT /*+ USE_IDX */ * FROM tbl;
    3. SELECT /*+ ORDERED */ * FROM tbl;
    4. SELECT /*+ USE_NL */ * FROM tbl;
    5. SELECT /*+ USE_MERGE */ * FROM tbl;
  6. Condition value is different:
    1. SELECT * FROM tbl WHERE id = 100;
    2. SELECT * FROM tbl WHERE id = 200;

The following table shows if RDBMSs use existing plan from cache in these situations.

If Query Plan Cache is reused
Queries Oracle 11g MSSQL 2008 CUBRID 8.4.1
Additional space or new line exist no no yes
Referenced names are capitalized no no no
Comment exists no no yes
Table owner name is specified no no yes
Hint is used no no depends on the hint
Condition value is different no yes yes

Depending on the hint type in a query, CUBRID behaves differently. For query types we listed above CUBRID behaves in the following way:

Queries CUBRID Remark
SELECT /*+ NO_STATS */ * FROM tbl; yes /*+ NO_STATS */ hint does not affect on how query plan is generated, therefore having it or not, CUBRID can use existing plan from cache
SELECT /+ USE_IDX */ * FROM tbl;

SELECT /
+ ORDERED / * FROM tbl;

SELECT /
+ USE_NL / * FROM tbl;

SELECT /
+ USE_MERGE */ * FROM tbl;
no All these hints indicate whether CUBRID should use indexes, should order the result set, use inner joins or merge the results. All these hints directly affect how query plan (instructions to execute the query) is generated. Thus, the Broker cannot use existing plan cache.

From the above table I have intentionally excludeed PostgreSQL and MySQL.

PostgreSQL provides a half-baked query plan caching. It caches query plans only for the lifetime of a prepared statement. In other words, the same query executed from different sessions, different database connection, or even different transaction, will not be able to use already created cache and the query planner will go through the plan generation all over again.

MySQL has spared itself from plan caching. It does not cache query plans at all (do not confuse with query results caching).

Thus, only Oracle, MSSQL and CUBRID provide complete support for Shared Query Plan Cache though with variable success to reuse cached plans. In CUBRID have got huge improvements in overall performance when we added this feature in the early stage of development.

In the next article I will cover other aspects of CUBRID’s smart optimizations for Web services. Stay tuned!

comments powered by Disqus
Page info
viewed 9678 times
translations en
Author
posted 2 years ago by
CUBRID
Contributors
updated 2 years ago by
View revisions
Share this article