Open Source RDBMS - Seamless, Scalable, Stable and Free

한국어 | Login |Register

CUBRID Query Processing


Contents

 


Introduction to CUBRID Query Processing

This document mainly describes a statement’s query processing from parsing to result fetching with specified scenario. The whole procedure includes:

  • Parsing;
  • Semantic checking;
  • Plan generation from parsing tree;
  • Query optimization and XASL generation;
  • XASL execution;
  • Result fetching;

Scenario

CREATE TABLE wish (DB_year INT, nation CHAR(3), gold INT, silver INT, bronze INT);
CREATE TABLE wish1 (DB_year INT, nation CHAR(3), gold INT, silver INT, bronze INT);
INSERT INTO wish VALUES (1988, 'NZL', 3, 2, 8);
INSERT INTO wish VALUES (2004, 'JPN', 16, 9, 12);
INSERT INTO wish VALUES (1996, 'KOR', 7, 15, 5);
 
INSERT INTO wish1 VALUES (2000, 'RUS', 32, 28, 28);
INSERT INTO wish1 VALUES (1996, 'KOR', 7, 15, 5);
SELECT wish.DB_year,wish.gold, wish1.DB_year, wish1.gold 
FROM wish, wish1 
WHERE wish.gold<wish1.gold

Parsing

Parsing is the first step for query processing. This module uses lexer to scan statements input by user and then creates parse tree to store the statements’ information. Parse tree is identified by parser_ID. At the beginning of procedure, the parser_ID is initialized to 1 and increased by 1 according to each statement’s commit. Parser tree is stored in PT_NODE structure which is located at parser_Node__free_lists. The stack is as bellow:

  1. Ux_prepare()
    1. Db_open_buffer()
      1. Db_open_buffer_local()
        1. Db_open_local()
          1. Parser_create_parser()
            1. Pt_register_parser()

Session’ is created at beginning of query processing. ‘Session’ saves all of processing information including parsing and semantic check. ‘Session’ includes ‘dimension’, ‘parser’, and ‘statements’ and other fields.

  • Dimension indicates the number of SQL statements.
  • Parser includes ‘buffer’, ‘statements’, ’statement_number’, ‘id’ and other fields.
    • Buffer points to SQL statement
    • and statements indicate SQL statement parsing info.
    • Statement_number is SQL statement count
    • and id stores parser_ID.

Bellow is ‘session’ whole picture.


Session.png

Figure 1. Session.

We can see the detail ‘session’ structure. In our scenario, the ‘dimension’ is 1, ‘parser_id’ is 9, and ‘node_type’ is PT_SELECT. According to ‘node_type’, we can find ‘select’, ‘from’, ‘where’ segments which locate in statements.info.query.q.select.list (or from, where) as bellow.

Session_structure.png
Figure 2. Session structure.

Below is the detailed structures for ‘list’, ‘from’, and ‘where’:

Segment_structure.png
Figure 3. Segment structure.

The first line is linked from ‘list’ point. Each attribute has PT_NODE structure and each attribute is connected by each other. Attribute includes ‘next’, ‘node_type’, ‘parser_id’, ‘info’ and other fields. In our scenario, the attribute ‘node_type’ is PT_DOT_ and attribute is located at ‘name’ of ‘info’ field.

The second line is ‘from’ segment. Each class also have PT_NODE structure and each class connected by next point. Class structure includes ‘next’, ‘node_type’, ‘parser_id’, ‘info’ and other fields. In our scenario, the class ‘node_type’ is PT_SPEC and class name is locate at ‘name’ of ‘parts’.

The third line is ‘where’ segment. It includes ‘next’, ‘node_type’, ‘info’ and other fields. In our scenario, the ‘node_type’ is PT_EXPR, and the ‘dot’ structure of ‘info’ indicates ‘where’ condition. ‘op’ is PT_LT which means that ‘less than’ and ‘arg1’ and ‘arg2’ are the comparative objects that gold of wish and gold of wish1.

For summary, in PT_NODE, ‘Info’ includes detail SQL info. In case of SELECT statement, list, from, where, group_by, having field saves ‘SELECT’, ‘FROM’, ‘WHERE’, ‘GROUP BY’, ‘HAVING’ info.

Semantic Checker

After parsing process, PT_tree will be generated without syntax error. Then PT_tree will execute semantic check process. The detail is as bellow:

  • Name resolution and node type checking: checks if the class or attribute exist in the database, then infers the attribute type checking.
  • Semantic checking: validates whether legal calculation is performed or not.

According to semantic check process, stack status is as bellow.

Pt_apply_select() function executes ‘select’ statement’s name resolutions. In function, successively walk through ‘list’, ‘from’, ‘where’ node to fix schema info after semantic checking.

For example, in ‘from’ node walking procedure, class object will be found out from schema with class name and check user authentication for current class, etc.

  1. Ux_prepare()
    1. Db_compile_statement()
      1. Db_compile_statement_local()
        1. Pt_compile()
          1. Pt_semantic_check()
            1. Pt_check_with_info()
              1. Pt_resolve_names()
                1. ……
                2. Pt_apply_select()
                  1. Pt_walk_private(p->info.query.q.select.list)
                  2. Pt_walk_private(p->info.query.q.select.from)
                  3. Pt_walk_private(p->info.query.q.select.where)
                  4. ……
        2. Mq_translate()//translate views or virtual classes into base classes
          1. Mq_translate_helper()
            1. Mq_optimize()
              1. Qo_optimize_queries() //checks all subqueries for rewrite optimizations
                1. Qo_analyze_path_join() //analyze paths for possible optimizations
                2. ……
p->info.query.q.select.from
  1. ……
    1. Pt_make_flat_name_list()
      1. Class_op=db_find_class(class_name)//get class object
      2. Au_fetch_class()//fetch schema info of class
      3. Pt_find_users_class()
      4. Db_find_class()
      5. Pt_check_user_owns_class()
      6. ……

After name resolution, the ‘statements’ structure has the fixed schema info. The ‘list’ structure has fixed attribute name such as ‘DB_year’ at ‘original’ field, ‘wish’ at ‘resolved’ field. The ‘from’ structure has fixed class info such as ‘oid_info’, ‘class_mop’, ‘object’, ‘lock’, and other info. In our scenario, ‘select’ operation needs to ‘IS_LOCK’ objects. Detail is as bellow:

Segment_structure_details.png
Figure 4. Segment structure details.

Query Plan

After parsing and semantic checking the query statement input by user is changed to augmented parse tree which includes the catalog info. Based on augmented parse tree, query plan will be generated in this step. Before query plan generation, we need to mention some terminology for CUBRID. In CUBRID plan, there are several parts such as ‘join graph segments’, ‘join graph nodes’, ‘join graph equivalence classes’, ‘join graph edges’, ‘join graph partitions’, ‘query plan’, ‘query stmt’, etc.

Join graph segments

Segments indicate attributes and classes after ‘select’, ‘from’ and ‘where’. In our scenario, the segments are as bellow:

   seg[0]: [0]
   seg[1]: DB_year[0] (f)
   seg[2]: gold[0] (f)
   seg[3]: [1]
   seg[4]: DB_year[1] (f)
   seg[5]: gold[1] (f)

  1. seg[0] and seg[1] indicate classes: the first class is ‘wish’ and the second class is ‘wish1’.
  2. seg[1] DB_year[0](f) means that ‘DB_year’ attribute of the first class and ‘f’ means the attribute which will be selected out.

Other ‘seg’ can be explained the same way.

Join graph nodes

Nodes mean that class’s info.

   node[0]: wish wish(3/1)
   node[1]: wish1 wish1(2/1)
  • node[0] means the first class is ‘wish’ and ‘3’ is the records count and ‘1’ is the page count.

Join graph equivalence classes

Equivalence class is for equal condition after ‘where’. In our scenario, there is no equal condition.

Join graph edges

Edges mean that comparison between two or more classes.

   term[0]: wish.gold range (min inf_lt wish1.gold) (sel 0.1) (rank 2) (join term) (inner-join) (loc 0)

term’ is a statement condition which appears after ‘where’. For only one class ‘term’, we call it ‘scan term’ and for the two or more class ‘term’, we call it ‘join term’. In our scenario, it is join term. ‘sel’ is selectivity which means that the probability of tuples to be scaned out.

Join graph terms

As mentioned in ‘join graph edges’, there are 2 types of terms: ‘scan term’ and ‘join term’.

Join graph partitions

This is for scan group. Classes in one partition will be scaned together. We will provide more detail further.

Query plan

The Query Plan is the best execution plan for the current query which was selected based on the lowest cost. In our scenario, the best query plan is as bellow: nl-join (inner join)

   edge:  term[0]
   outer: sscan
              class: wish1 node[1]
              cost:  fixed 0(0.0/0.0) var 1(0.0/1.0) card 2
   inner: sscan
              class: wish node[0]
              sargs: term[0]
              cost:  fixed 0(0.0/0.0) var 1(0.0/1.0) card 3
   cost:  fixed 0(0.0/0.0) var 2(0.0/2.0) card 1

nl-join’ means nested loop join and ‘inner join’ is based on user input. In our scenario, there is no index file, so, it can just use a nested loop join. And the whole scan will execute outer loop scan for wish1 and inner loop scan for wish. In each scan, the scan method is sequential scan (sscan). For the outer loop scan, the cost can be split to ‘fixed cost’, ‘var cost’ and ‘card’. Each cost consist ‘CPU cost’ and ‘I/O cost’. ‘card’ means expected tuple count. Two class’s ‘I/O cost’ of ‘var cost’ are 1.0 according to the distribution of page count. And ‘wish1’ expects 2 ‘card’ and ‘wish’ expect 3 ‘card’. So set the less tuple class to outer loop considers of the less disk access cost.

Query stmt

This is the rewritten statement.

SELECT wish.DB_year, wish.gold, wish1.DB_year, wish1.gold FROM wish wish, wish1 wish1 WHERE (wish.gold<wish1.gold)

mq_transalte() function translate the views or virtual classes into base classes. Qo_optimize_queries() checks all subqueries for rewrite optimizations. And qo_optimize_query() function optimize a single select statement.

  1. Ux_prepare()
    1. Db_compile_statement()
      1. Db_compile_statement_local()
        1. Mq_translate()//translate views or virtual classes into base classes
          1. Mq_translate_helper()
            1. Mq_optimize()
              1. Qo_optimize_queries() //checks all subqueries for rewrite optimizations
                1. Qo_analyze_path_join() //analyze paths for possible optimizations
                2. ……
        2. Parser_generate_xasl()
          1. Pt_plan_query()
            1. Qo_optimize_query()//optimize a single select statement.
              1. Qo_env_init()//initialize an optimizer environment
              2. Qo_optimize_helper()
            2. Pt_to_buildlist_proc()
              1. Pt_gen_optimized_plan()
                1. Qo_to_xasl()
QO_ENV

QO_ENV’ is the repository of all optimizer data structures. ‘QO_ENV’ includes ‘PT_PARSER’ which contains parsing info, ‘PT_NODE, ‘QO_SEGMENT’, ‘QO_NODE’ which is related with query graph, ‘QO_EQCLASS’, ‘QO_TERM’, ‘QO_PLAN’ tree which is the final execution plan.

In QO_ENV’s init steps, parser point and pt_tree point will be set. Nnodes is the count of classes which indicated after ‘from’, nodes is assigned to classes indicated. Nsegs set to couple of Nnode and segs is assigned to QO_SEGMENT. Nterms is count of terms which are indicated after ‘where’, and terms are assigned to QO_TERM. After init the QO_ENV is as bellow:

After_inited_QO_ENV.png
Figure 5. After inited QO_ENV.
QO_PLAN

QO_PLAN is generated after query optimization. In QO_PLAN sarged_terms is scan condition. Vtb1 is pointed to functions which handle temporary result tables. In QO_PLAN, ‘plan_type’ indicates QO_PLAN’s type which has union value of QO_PLANTYPE_SCAN, QO_PLANTYPE_JOIN, QO_PLANTYPE_SORT, QO_PLANTYPE_FOLLOW, QO_PLANTYPE_WORST. When plan_type is QO_PLANTYPE_SCAN, we call this QO_PLAN as scan node which indicates the way of one class scan. In our scenario ‘plan_type’ is ‘QO_PLANTYPE_JOIN’.

The plan_un is scan type and when the value is sequential scan we set it as QO_SCANTYPE_SEG_SCAN, and when the value is index scan we set it as QO_SCANTYPE_INDEX_SCAN. When the plan type is QO_PLANTYPE_JOIN, we call the QO_PLAN as join node. Join node’s plan_type is Join type which value is union of QO_JOINTYPE_NL_JOIN, QO_JOINTYPE_MERGE_JOIN. And outer is the pointer of outer loop class and inner is the pointer of inner loop. Join_term is scan condition for two QO_PLAN’s join result. The QO_PLAN tree structure for our scenario is as bellow:

QO_PLAN_tree_structure.png
Figure 6. QO_PLAN tree structure.

XASL Generation

XASL_NODE
The XASL_NODE tree structure for scenario is as bellow:
XASL_NODE_tree_structure.png
Figure 7. XASL_NODE tree structure.

Query Manager

Query Manager execute XASL tree from client. In query processing, client send XASL tree to server through socket. Server execute statement through XASL tree and in case of select statement, the execution result will be stored in the list file.

Qexec_execute_mainblock() is the key function to perform scan process. In this key function, the XASL tree content will be analyzed to generate SCAN_ID which is sent to storage function to read data.

Cursor Manager

This module fetches tuples from result file (list file).

comments powered by Disqus
Page info
viewed 7610 times
translations en
Author
posted 3 years ago by
CUBRID
Contributors
updated 2 years ago by
View revisions
Share this article