Background Image

BLOG

Written by SeHun Park on 08/07/2021

 


What is Subquery 

A subquery is a query that appears inside another query statement. Subquery enables us to extract the desired data with a single query. For example, if you need to extract information about employees who have salary that is higher than last year’s average salary, you can use the following subquery: 

subquery1.png

 

It is possible to write a single query as above without writing another query statement to find out the average salary. Subquery like this has various special properties, and their properties vary depending on where they are written.

subquery2.png

  • scalar subquery: A subquery in a SELECT clause. Only one piece of data can be viewed.
  • inline view: A subquery in the FROM clause. Multiple data inquiry is possible.
  • subquery: A subquery in the WHERE clause. It depends on the operator and the properties of the scalar subquery or inline view.

The use of subqueries makes queries more versatile but it can adversely affect query performance.

 

 


The Sequence of Subquery Execution and Causes of Performance Degradation

 

A subquery is always executed before the main query to store temporary results. And as the main query is executed, the temporarily stored data of the subquery is retrieved to obtain the desired result.

 

For example, the subquery is executed first, the result is stored in the temporary storage, and the final result is extracted by checking the condition 'a.pk = 3'.

ccd42a160a189d65cd6b3d43a5e85199.jpg

 

 

As the number of subquery results increases, useless data will be stored during the process. And also, the index cannot be used because it is searched from the temporary data stored in the middle.

 

This kind of query process is very inefficient. Does DBMS execute the above query as we describe? No.

 


View Merging

Removing the in-line view and merging it into the main query is called view merging.

vewmerging_1.jpg

With view merging, there is no need to store temporary data, and index scans for pk are now possible. No matter how the user writes the subquery, if it is possible to merge with the main query, the DBMS proceeds with the merger. The merging of these views serves to remove the constraint of the execution order before going to the OPTIMIZER phase; which is, putting all the tables on the same level and finding the most optimal execution plan.

viewmerging2.jpg

In the case of the above query, the join of tbl2 and tbl3 in the subquery always proceeds together before the view is merged. For example, a join order such as tbl3->tbl1->tbl2 is not possible. The merging of views is ultimately aimed at removing the constraint of the execution order and allowing OPTIMIZER to find the optimal execution plan.

 

CUBRID supports view merging including in-line views from CUBRID 11.2. In the previous version, view merging was performed only for view objects, but this function has been expanded from CUBRID 11.2.

 

 


Subquery UNNSET

This is a rewriting technique applied to the subquery of the WHERE clause. Typically, it targets IN and EXISTS operators.

subquery unnset1.jpg

How is the above query executed?

 

In CUBRID, the IN operator extracts the data of the main query as the result of the subquery. If you think of it as table join order, tbl2 ==> tbl1 order.

 

Then what about other operators?

susbquery unnset2.jpg

The case for EXISTS operator is the opposite. If we express it in join order, the order will be tbl1 ==> tbl2. Depending on how the query is written, there are restrictions on the order of performance. Depending on the amount of data in the subquery and main query, IN operator may be advantageous or vice versa.

 

Overcoming this situation is the SUBQUERY UNNEST technique.

subqueryunnset3.jpg

If it is converted to a join as above, OPTIMIZER can choose which table to look up first. Rewriting the subquery of the WHERE clause as a join is called SUBQUERY UNNEST.

 

One peculiarity is that the IN and EXISTS operators must not affect the result of the main query even if there is duplicate data in the result of the subquery.

 

For that reason, a Semi Join is performed instead of a normal join. A Semi Join stops the search when the first data is found and proceeds to the next search. Semi Join is a join method used to achieve the same result as an operator such as IN.

(However, CUBRID has not supported SUBQUERY UNNEST yet. It is recommended to use IN and EXISTS operators according to the situation.)

 

By rewriting the query, the DBMS removes the constraint on the execution order implied in the query. Converting an OUTER JOIN to an INNER JOIN, or deleting an unnecessary table or query entry, serves the same purpose.

 

OPTIMIZER will be able to create an optimal execution plan in a situation where there is no restriction on the execution order as much as possible. And it will eventually increase query performance and allow DBMS users to get the data they want quickly.

 

Sometimes, when users check the execution plan of a query, it appears completely different from the query and users are confused. Understanding the query rewrite technique will be a great help. Currently, the CUBRID development team is working on rewriting these queries and improving OPTIMIZER. In the next blog, we will talk about what does OPTIMIZER do to find the optimal execution plan.


  1. Become a Jave GC Expert Series 5 : The Principles of Java Application Performance Tuning

    Written by Se Hoon Park on 06/30/2017 This is the fifth article in the series of "Become a Java GC Expert". In the first issue Understanding Java Garbage Collection we have learned about the processes for different GC algorithms, about how GC works, what Young and Old Generation is, what you should know about the 5 types of GC in the new JDK 7, and what the performance implications are for each of these GC types. In the second article How to Monitor Java Garbage Collection we have explained how JVM actually runs the Garbage Collection in the real time, how we can monitor GC, and which tools we can use to make this process faster and more effective. In the third article How to Tune Java Garbage Collection we have shown some of the best options based on real cases as our examples that you can...
    Read More
  2. CUBRID DBLink

    Written by DooHo Kang on 27/06/2022   What is CUBRID DBLink When retrieving information from a database, it is often necessary to retrieve information from an external database. Therefore, it is necessary to be able to search for information on other databases. CUBRID DBLink allows users to use the information on other databases.   CUBRID DBLink provides a function to inquire about information in the databases of homogeneous CUBRID and heterogeneous Oracle and MySQL.   * It is possible to set up multiple external databases, but when searching for information, it is possible to inquire about information from only one other database.   CUBRID DBLink Configuration CUBRID DBLink supports DBLink between homogeneous and heterogeneous DBLinks.   Homogeneous DBLink diagram   If you look at the conf...
    Read More
  3. CUBRID INSIDE: HASH SCAN Method

    Written by SeHun Park on 11/09/2021 - HASH SCAN Hash Scan is a scan method for hash join. Hash Scan is applied in view or hierarchical query. When a subquery such as view is joined as inner, index scan cannot be used. In this case, performance degradation occurs due to repeated inquiry of a lot of data. In this situation, Hash Scan is used. The picture above shows the difference between Nested Loop join and Hash Scan in the absence of an index. In the case of NL join, the entire data of INNER is scanned as many as the number of rows of OUTER. In contrast, Hash Scan scans INNER data once when building a hash data structure and scans OUTER once when searching. Therefore, you can search for the desired data relatively very quickly. Here, the internal structure of Hash Scan is written as the fl...
    Read More
  4. CUBRID INSIDE: Subquery and Query Rewriter (View Merging, Subquery Unnest)

    Written by SeHun Park on 08/07/2021   What is Subquery  A subquery is a query that appears inside another query statement. Subquery enables us to extract the desired data with a single query. For example, if you need to extract information about employees who have salary that is higher than last year’s average salary, you can use the following subquery:    It is possible to write a single query as above without writing another query statement to find out the average salary. Subquery like this has various special properties, and their properties vary depending on where they are written. scalar subquery: A subquery in a SELECT clause. Only one piece of data can be viewed. inline view: A subquery in the FROM clause. Multiple data inquiry is possible. subquery: A subquery in the WHERE clause. I...
    Read More
  5. CUBRID INTERNAL: CUBRID Double Write Buffer

    Written by MyungGyu Kim on 03/08/2022 INTRODUCTION Data in the database is allocated from disk to memory, some data is read and then modified, and some data is newly created and allocated to memory. Such data should eventually be stored on disk to ensure that it is permanently stored. In this article, we will introduce one of the methods of storing data on disk in CUBRID to help you understand the CUBRID database. The current version at the time of writing is CUBRID 11.2. DOUBLE WRITE BUFFER First of all, I would like to give a general description of the definition, purpose, and mechanism of Double Write Buffer. What is Double Write Buffer? By default, CUBRID stores data on disk through Double Write Buffer. Double Write Buffer is a buffer area composed of both memory and disk. By default, t...
    Read More
Board Pagination Prev 1 2 3 4 5 6 Next
/ 6

Join the CUBRID Project on