CUBRID Database is well known for its various optimizations for Web services. One feature that makes CUBRID stand out is its support for Shared Query Plan Caching. We saw huge improvements in overall performance when we added this feature.
In simple words, a Query Execution Plan is a tree of instructions that tells the database server how to execute the client's SQL request.
When a SQL statement is sent for execution, the Broker, CUBRID's middleware, performs the following steps to generate the Execution Plan for this particular query, and then relays this tree of instructions to the database server. A more detailed explanation of each step is provided in the CUBRID Query Processing tutorial.
The final XASL (eXtended Access Specification Language) tree obtained from the Query Optimizer is the query execution plan we are talking about. It tells the server engine how the incoming query should be executed.
There are 13 types of execution plans:
INSERT. The query execution plan also includes the scan information (heap scan, index scan, list file scan, set scan, and method scan), the value list (the values required for query results), and the predicate.
All of these steps are performed every time unless the execution plan for the query is already cached, in which case the execution plan is retrieved from the cache.
As a result, the Broker sends this Query Execution Plan (XASL) to the Query Manager, a server module that executes the client's query according to the plan.
Well, it is all about performance. Imagine the Broker no longer has to go through all the steps performed by the Semantic Checker and the Query Optimizer. Imagine all the Broker has to do is parse the query and get its execution plan from the cache as shown in the image below.
The Query Plan Caching does exactly what we want. It allows the Broker to save the execution plan of a query to the Shared Memory for further re-use in case the same query is requested again by the client.
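The fast path described above can be sketched in a few lines of Python (a minimal illustration; the function and cache names are my own, not CUBRID's actual API):

```python
def get_plan(sql, cache, compile_plan):
    """Sketch of the Broker's lookup logic: reuse a cached XASL plan
    when one exists; otherwise run the full semantic check and
    optimization, then cache the result for every later client."""
    plan = cache.get(sql)      # in CUBRID the key is the auto-parameterized query
    if plan is None:           # cache MISS: full compilation is required
        plan = compile_plan(sql)
        cache[sql] = plan      # now shared by all sessions and transactions
    return plan                # cache HIT skips the optimizer entirely
```

On the second and every subsequent request for the same statement, `compile_plan` is never invoked, which is exactly the saving the Shared Query Plan Cache provides.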
More than that, a query plan cached in the Shared Memory can be literally shared between transactions, sessions, and different clients concurrently. This means that once a query plan is cached, all clients benefit from it.
By default, the Broker caches 1,000 distinct SQL statements, as set by the client-server parameter called MAX_PLAN_CACHE_ENTRIES. It is relatively rare for a Web application to have more than 1,000 distinct SQL statements. If that is the case, you may increase this value in the cubrid.conf configuration file. But before you do, make sure you really have that many distinct queries in your application. Also make sure that the size of the cache generated from the query plans does not exceed the memory available on your system.
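For instance, raising the limit is a one-line change in the configuration file (the value below is purely illustrative):

```
# cubrid.conf -- illustrative value only
MAX_PLAN_CACHE_ENTRIES=2000
```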
There is a CUBRID command line utility for this called plandump. Type the following command in the terminal to obtain the statistics for the desired database.
cubrid plandump -o output_file.txt database_name
The -o option, followed by a file name, saves the execution plan statistics to the indicated file.
The -d option forces the utility to drop all query plans stored in the cache (hash table). This lets you clear the cache whenever you feel it is necessary.
The following is a sample output of the cubrid plandump command.
This header part of the output displays the following information:
LOOKUP = HIT + MISS.
The header is followed by a list of statistics for each SQL statement cached in the system.
Besides the fact that CUBRID shares the Query Plan Cache, there is one more notable feature that further enhances this caching optimization: auto-parameterization.
You can think of auto-parameterization as always-prepare functionality for your SQL statements: the Broker always prepares every query it processes. Auto-parameterization happens in the Query Optimizer component right before the execution plan is generated. Thus, auto-parameterization is at the core of the Shared Query Plan Cache in CUBRID.
So even if you write a SQL query with plain literal values and execute it, the statement will always get auto-parameterized. For example, if you write
SELECT * FROM my_table WHERE id = 10, the Broker will always convert it to
SELECT * FROM my_table WHERE id = ?:0. This is the query whose execution plan gets cached.
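Conceptually, the transformation looks like this (a toy Python sketch; CUBRID actually performs this on the parse tree, not with regular expressions, and the function name is made up):

```python
import re

def auto_parameterize(sql):
    """Toy illustration of auto-parameterization: pull integer literals
    out of the statement and replace them with positional placeholders
    (?:0, ?:1, ...), returning the normalized text plus the bound values."""
    values = []
    def repl(match):
        values.append(int(match.group(0)))
        return "?:%d" % (len(values) - 1)
    # Replace integer literals that follow an "= " comparison.
    return re.sub(r"(?<==\s)\d+", repl, sql), values
```

Because `... WHERE id = 10` and `... WHERE id = 200` normalize to the same text, they can share a single cached plan.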
Auto-parameterization, however, must not be confused with prepared statements. I have posted an article, Why prepare queries if auto-parameterization already does that?, which covers this in more detail.
When the maximum number of query plans is already cached, the Broker runs an algorithm that drops 5% of the cached plans based on how old they are and on their hit ratio.
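A rough model of such an eviction policy could look like this (the scoring function below is an assumption for illustration; the article does not document CUBRID's exact formula):

```python
import heapq

def evict_plans(cache, now, fraction=0.05):
    """Drop the given fraction of cached plans, preferring entries that
    are old and rarely hit. Each cache entry is assumed to be a dict
    with 'hits' and 'last_used' fields (illustrative structure only)."""
    n = max(1, int(len(cache) * fraction))
    def score(key):
        entry = cache[key]
        # Few hits plus a long idle time make a plan a good eviction victim.
        return entry["hits"] / (now - entry["last_used"] + 1.0)
    victims = heapq.nsmallest(n, list(cache), key=score)
    for key in victims:
        del cache[key]
    return victims
```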
To show the performance impact of the Query Plan Cache in CUBRID, I have written a small PHP program which simply executes 16 different SQL statements 5,000 times in a loop against CUBRID's demo database (demodb). The purpose of the program is to illustrate what happens when the cache is forced not to accept any plans.
The program was executed twice.
MAX_PLAN_CACHE_ENTRIES = 1000 (default)
MAX_PLAN_CACHE_ENTRIES = 1 (allows only 1 cached plan, which practically turns off plan caching)
Before I ran the program, I recorded the initial status of the query plan cache.
As a result, the following query plan cache statistics were obtained.
This gives us clear evidence that Query Plan Caching significantly reduces the overall time, CPU, and memory spent processing queries.
Now let's have a look at how other RDBMS vendors approach this task. We will test each database server against a set of SQL statements and determine how well the servers utilize Query Plan Caching.
First, let’s define the base SQL statement:
SELECT * FROM tbl;
Now we will assume that we have executed this query at least once in each database server so that each could cache this query’s execution plan.
Before going any further, let me identify several SQL statements that are almost identical to this base query. In fact, they all return almost the same result; the difference is only in how the queries are written:
SELECT * FROM TBL;
SELECT /* comment */ * FROM tbl;
SELECT * FROM dba.tbl;
SELECT /*+ NO_STATS */ * FROM tbl;
SELECT /*+ USE_IDX */ * FROM tbl;
SELECT /*+ ORDERED */ * FROM tbl;
SELECT /*+ USE_NL */ * FROM tbl;
SELECT /*+ USE_MERGE */ * FROM tbl;
SELECT * FROM tbl WHERE id = 100;
SELECT * FROM tbl WHERE id = 200;
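Intuitively, each server derives some lookup key from the statement text, and a cached plan is reused only when two queries produce the same key. A toy sketch that mimics CUBRID's observed behavior for whitespace (the function name and approach are my own illustration, not CUBRID internals):

```python
import re

def cache_key(sql):
    """Collapse runs of whitespace so statements that differ only in
    spacing or line breaks map to the same plan-cache entry, while
    identifier case (TBL vs tbl) still yields a different key."""
    return re.sub(r"\s+", " ", sql).strip()
```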
The following table shows whether each RDBMS reuses an existing plan from the cache in these situations.
| Queries | Oracle 11g | MSSQL 2008 | CUBRID 8.4.1 |
| --- | --- | --- | --- |
| Additional space or new line exists | no | no | yes |
| Referenced names are capitalized | no | no | no |
| Table owner name is specified | no | no | yes |
| Hint is used | no | no | depends on the hint |
| Condition value is different | no | yes | yes |
Depending on the hint type in a query, CUBRID behaves differently. For the query types listed above, CUBRID behaves in the following way:
| Queries | Plan reused | Explanation |
| --- | --- | --- |
| SELECT /*+ NO_STATS */ * FROM tbl; | yes | The /*+ NO_STATS */ hint does not affect how the query plan is generated, so with or without it CUBRID can use an existing plan from the cache. |
| SELECT /*+ USE_IDX */ * FROM tbl;<br>SELECT /*+ ORDERED */ * FROM tbl;<br>SELECT /*+ USE_NL */ * FROM tbl;<br>SELECT /*+ USE_MERGE */ * FROM tbl; | no | All of these hints indicate whether CUBRID should use indexes, order the joined tables, use nested-loop joins, or use merge joins. They directly affect how the query plan (the instructions to execute the query) is generated, so the Broker cannot reuse an existing cached plan. |
I have intentionally excluded PostgreSQL and MySQL from the above table.
PostgreSQL provides a half-baked form of query plan caching: it caches query plans only for the lifetime of a prepared statement. In other words, the same query executed from a different session, a different database connection, or even a different transaction will not be able to use the already created cache, and the query planner will go through plan generation all over again.
MySQL has spared itself from plan caching: it does not cache query plans at all (not to be confused with query result caching).
Thus, only Oracle, MSSQL, and CUBRID provide complete support for a Shared Query Plan Cache, though with varying success at reusing cached plans. In CUBRID, we saw huge improvements in overall performance when we added this feature at an early stage of development.
In the next article I will cover other aspects of CUBRID’s smart optimizations for Web services. Stay tuned!