This document mainly describes a statement’s query processing from parsing to result fetching with specified scenario. The whole procedure includes:
CREATE TABLE wish (DB_year INT, nation CHAR(3), gold INT, silver INT, bronze INT); CREATE TABLE wish1 (DB_year INT, nation CHAR(3), gold INT, silver INT, bronze INT);
INSERT INTO wish VALUES (1988, 'NZL', 3, 2, 8); INSERT INTO wish VALUES (2004, 'JPN', 16, 9, 12); INSERT INTO wish VALUES (1996, 'KOR', 7, 15, 5); INSERT INTO wish1 VALUES (2000, 'RUS', 32, 28, 28); INSERT INTO wish1 VALUES (1996, 'KOR', 7, 15, 5);
SELECT wish.DB_year,wish.gold, wish1.DB_year, wish1.gold FROM wish, wish1 WHERE wish.gold<wish1.gold
Parsing is the first step for query processing. This module uses lexer to scan statements input by user and then creates parse tree to store the statements’ information. Parse tree is identified by parser_ID. At the beginning of procedure, the parser_ID is initialized to 1 and increased by 1 according to each statement’s commit. Parser tree is stored in PT_NODE structure which is located at parser_Node__free_lists. The stack is as bellow:
‘Session’ is created at beginning of query processing. ‘Session’ saves all of processing information including parsing and semantic check. ‘Session’ includes ‘dimension’, ‘parser’, and ‘statements’ and other fields.
Bellow is ‘session’ whole picture.
We can see the detail ‘session’ structure. In our scenario, the ‘dimension’ is 1, ‘parser_id’ is 9, and ‘node_type’ is PT_SELECT. According to ‘node_type’, we can find ‘select’, ‘from’, ‘where’ segments which locate in statements.info.query.q.select.list (or from, where) as bellow.
Below is the detailed structures for ‘list’, ‘from’, and ‘where’:
The first line is linked from ‘list’ point. Each attribute has PT_NODE structure and each attribute is connected by each other. Attribute includes ‘next’, ‘node_type’, ‘parser_id’, ‘info’ and other fields. In our scenario, the attribute ‘node_type’ is PT_DOT_ and attribute is located at ‘name’ of ‘info’ field.
The second line is ‘from’ segment. Each class also have PT_NODE structure and each class connected by next point. Class structure includes ‘next’, ‘node_type’, ‘parser_id’, ‘info’ and other fields. In our scenario, the class ‘node_type’ is PT_SPEC and class name is locate at ‘name’ of ‘parts’.
The third line is ‘where’ segment. It includes ‘next’, ‘node_type’, ‘info’ and other fields. In our scenario, the ‘node_type’ is PT_EXPR, and the ‘dot’ structure of ‘info’ indicates ‘where’ condition. ‘op’ is PT_LT which means that ‘less than’ and ‘arg1’ and ‘arg2’ are the comparative objects that gold of wish and gold of wish1.
For summary, in PT_NODE, ‘Info’ includes detail SQL info. In case of SELECT statement, list, from, where, group_by, having field saves ‘SELECT’, ‘FROM’, ‘WHERE’, ‘GROUP BY’, ‘HAVING’ info.
After parsing process, PT_tree will be generated without syntax error. Then PT_tree will execute semantic check process. The detail is as bellow:
According to semantic check process, stack status is as bellow.
Pt_apply_select() function executes ‘select’ statement’s name resolutions. In function, successively walk through ‘list’, ‘from’, ‘where’ node to fix schema info after semantic checking.
For example, in ‘from’ node walking procedure, class object will be found out from schema with class name and check user authentication for current class, etc.
After name resolution, the ‘statements’ structure has the fixed schema info. The ‘list’ structure has fixed attribute name such as ‘DB_year’ at ‘original’ field, ‘wish’ at ‘resolved’ field. The ‘from’ structure has fixed class info such as ‘oid_info’, ‘class_mop’, ‘object’, ‘lock’, and other info. In our scenario, ‘select’ operation needs to ‘IS_LOCK’ objects. Detail is as bellow:
After parsing and semantic checking the query statement input by user is changed to augmented parse tree which includes the catalog info. Based on augmented parse tree, query plan will be generated in this step. Before query plan generation, we need to mention some terminology for CUBRID. In CUBRID plan, there are several parts such as ‘join graph segments’, ‘join graph nodes’, ‘join graph equivalence classes’, ‘join graph edges’, ‘join graph partitions’, ‘query plan’, ‘query stmt’, etc.
Segments indicate attributes and classes after ‘select’, ‘from’ and ‘where’. In our scenario, the segments are as bellow:
seg:  seg: DB_year (f) seg: gold (f) seg:  seg: DB_year (f) seg: gold (f)
Other ‘seg’ can be explained the same way.
Nodes mean that class’s info.
node: wish wish(3/1) node: wish1 wish1(2/1)
Equivalence class is for equal condition after ‘where’. In our scenario, there is no equal condition.
Edges mean that comparison between two or more classes.
term: wish.gold range (min inf_lt wish1.gold) (sel 0.1) (rank 2) (join term) (inner-join) (loc 0)
‘term’ is a statement condition which appears after ‘where’. For only one class ‘term’, we call it ‘scan term’ and for the two or more class ‘term’, we call it ‘join term’. In our scenario, it is join term. ‘sel’ is selectivity which means that the probability of tuples to be scaned out.
As mentioned in ‘join graph edges’, there are 2 types of terms: ‘scan term’ and ‘join term’.
This is for scan group. Classes in one partition will be scaned together. We will provide more detail further.
The Query Plan is the best execution plan for the current query which was selected based on the lowest cost. In our scenario, the best query plan is as bellow: nl-join (inner join)
edge: term outer: sscan class: wish1 node cost: fixed 0(0.0/0.0) var 1(0.0/1.0) card 2 inner: sscan class: wish node sargs: term cost: fixed 0(0.0/0.0) var 1(0.0/1.0) card 3 cost: fixed 0(0.0/0.0) var 2(0.0/2.0) card 1
‘nl-join’ means nested loop join and ‘inner join’ is based on user input. In our scenario, there is no index file, so, it can just use a nested loop join. And the whole scan will execute outer loop scan for wish1 and inner loop scan for wish. In each scan, the scan method is sequential scan (sscan). For the outer loop scan, the cost can be split to ‘fixed cost’, ‘var cost’ and ‘card’. Each cost consist ‘CPU cost’ and ‘I/O cost’. ‘card’ means expected tuple count. Two class’s ‘I/O cost’ of ‘var cost’ are 1.0 according to the distribution of page count. And ‘wish1’ expects 2 ‘card’ and ‘wish’ expect 3 ‘card’. So set the less tuple class to outer loop considers of the less disk access cost.
This is the rewritten statement.
SELECT wish.DB_year, wish.gold, wish1.DB_year, wish1.gold FROM wish wish, wish1 wish1 WHERE (wish.gold<wish1.gold)
mq_transalte() function translate the views or virtual classes into base classes. Qo_optimize_queries() checks all subqueries for rewrite optimizations. And qo_optimize_query() function optimize a single select statement.
‘QO_ENV’ is the repository of all optimizer data structures. ‘QO_ENV’ includes ‘PT_PARSER’ which contains parsing info, ‘PT_NODE, ‘QO_SEGMENT’, ‘QO_NODE’ which is related with query graph, ‘QO_EQCLASS’, ‘QO_TERM’, ‘QO_PLAN’ tree which is the final execution plan.
In QO_ENV’s init steps, parser point and pt_tree point will be set. Nnodes is the count of classes which indicated after ‘from’, nodes is assigned to classes indicated. Nsegs set to couple of Nnode and segs is assigned to QO_SEGMENT. Nterms is count of terms which are indicated after ‘where’, and terms are assigned to QO_TERM. After init the QO_ENV is as bellow:
QO_PLAN is generated after query optimization. In QO_PLAN sarged_terms is scan condition. Vtb1 is pointed to functions which handle temporary result tables. In QO_PLAN, ‘plan_type’ indicates QO_PLAN’s type which has union value of QO_PLANTYPE_SCAN, QO_PLANTYPE_JOIN, QO_PLANTYPE_SORT, QO_PLANTYPE_FOLLOW, QO_PLANTYPE_WORST. When plan_type is QO_PLANTYPE_SCAN, we call this QO_PLAN as scan node which indicates the way of one class scan. In our scenario ‘plan_type’ is ‘QO_PLANTYPE_JOIN’.
The plan_un is scan type and when the value is sequential scan we set it as QO_SCANTYPE_SEG_SCAN, and when the value is index scan we set it as QO_SCANTYPE_INDEX_SCAN. When the plan type is QO_PLANTYPE_JOIN, we call the QO_PLAN as join node. Join node’s plan_type is Join type which value is union of QO_JOINTYPE_NL_JOIN, QO_JOINTYPE_MERGE_JOIN. And outer is the pointer of outer loop class and inner is the pointer of inner loop. Join_term is scan condition for two QO_PLAN’s join result. The QO_PLAN tree structure for our scenario is as bellow:
Query Manager execute XASL tree from client. In query processing, client send XASL tree to server through socket. Server execute statement through XASL tree and in case of select statement, the execution result will be stored in the list file.
Qexec_execute_mainblock() is the key function to perform scan process. In this key function, the XASL tree content will be analyzed to generate SCAN_ID which is sent to storage function to read data.
This module fetches tuples from result file (list file).