Background Image

BLOG

Written by SeHun Park on 08/07/2021

 


What is Subquery 

A subquery is a query that appears inside another query statement. Subquery enables us to extract the desired data with a single query. For example, if you need to extract information about employees who have salary that is higher than last year’s average salary, you can use the following subquery: 

subquery1.png

 

It is possible to write a single query as above without writing another query statement to find out the average salary. Subquery like this has various special properties, and their properties vary depending on where they are written.

subquery2.png

  • scalar subquery: A subquery in a SELECT clause. Only one piece of data can be viewed.
  • inline view: A subquery in the FROM clause. Multiple data inquiry is possible.
  • subquery: A subquery in the WHERE clause. It depends on the operator and the properties of the scalar subquery or inline view.

The use of subqueries makes queries more versatile but it can adversely affect query performance.

 

 


The Sequence of Subquery Execution and Causes of Performance Degradation

 

A subquery is always executed before the main query to store temporary results. And as the main query is executed, the temporarily stored data of the subquery is retrieved to obtain the desired result.

 

For example, the subquery is executed first, the result is stored in the temporary storage, and the final result is extracted by checking the condition 'a.pk = 3'.

ccd42a160a189d65cd6b3d43a5e85199.jpg

 

 

As the number of subquery results increases, useless data will be stored during the process. And also, the index cannot be used because it is searched from the temporary data stored in the middle.

 

This kind of query process is very inefficient. Does DBMS execute the above query as we describe? No.

 


View Merging

Removing the in-line view and merging it into the main query is called view merging.

vewmerging_1.jpg

With view merging, there is no need to store temporary data, and index scans for pk are now possible. No matter how the user writes the subquery, if it is possible to merge with the main query, the DBMS proceeds with the merger. The merging of these views serves to remove the constraint of the execution order before going to the OPTIMIZER phase; which is, putting all the tables on the same level and finding the most optimal execution plan.

viewmerging2.jpg

In the case of the above query, the join of tbl2 and tbl3 in the subquery always proceeds together before the view is merged. For example, a join order such as tbl3->tbl1->tbl2 is not possible. The merging of views is ultimately aimed at removing the constraint of the execution order and allowing OPTIMIZER to find the optimal execution plan.

 

CUBRID supports view merging including in-line views from CUBRID 11.2. In the previous version, view merging was performed only for view objects, but this function has been expanded from CUBRID 11.2.

 

 


Subquery UNNSET

This is a rewriting technique applied to the subquery of the WHERE clause. Typically, it targets IN and EXISTS operators.

subquery unnset1.jpg

How is the above query executed?

 

In CUBRID, the IN operator extracts the data of the main query as the result of the subquery. If you think of it as table join order, tbl2 ==> tbl1 order.

 

Then what about other operators?

susbquery unnset2.jpg

The case for EXISTS operator is the opposite. If we express it in join order, the order will be tbl1 ==> tbl2. Depending on how the query is written, there are restrictions on the order of performance. Depending on the amount of data in the subquery and main query, IN operator may be advantageous or vice versa.

 

Overcoming this situation is the SUBQUERY UNNEST technique.

subqueryunnset3.jpg

If it is converted to a join as above, OPTIMIZER can choose which table to look up first. Rewriting the subquery of the WHERE clause as a join is called SUBQUERY UNNEST.

 

One peculiarity is that the IN and EXISTS operators must not affect the result of the main query even if there is duplicate data in the result of the subquery.

 

For that reason, a Semi Join is performed instead of a normal join. A Semi Join stops the search when the first data is found and proceeds to the next search. Semi Join is a join method used to achieve the same result as an operator such as IN.

(However, CUBRID has not supported SUBQUERY UNNEST yet. It is recommended to use IN and EXISTS operators according to the situation.)

 

By rewriting the query, the DBMS removes the constraint on the execution order implied in the query. Converting an OUTER JOIN to an INNER JOIN, or deleting an unnecessary table or query entry, serves the same purpose.

 

OPTIMIZER will be able to create an optimal execution plan in a situation where there is no restriction on the execution order as much as possible. And it will eventually increase query performance and allow DBMS users to get the data they want quickly.

 

Sometimes, when users check the execution plan of a query, it appears completely different from the query and users are confused. Understanding the query rewrite technique will be a great help. Currently, the CUBRID development team is working on rewriting these queries and improving OPTIMIZER. In the next blog, we will talk about what does OPTIMIZER do to find the optimal execution plan.


  1. Getting Started With DBeaver for CUBRID

    Written by Thim Thorn at Phnom Voar Software, Cambodia Introduction DBeaver is a free multi-platform database tool for developers, database administrators, analysts and all people who need to work with databases. It supports all popular databases: MySQL, PostgreSQL, SQLite, Oracle, DB2, SQL Server, Sybase, MS Access, Teradata, Firebird, Apache Hive, Phoenix, Presto, etc. DBeaver is running on Windows, Mac OS X and Linux. This document summarizes how to get started with DBeaver for CUBRID for Windows users. Installing DBeaver on a Windows To install DBeaver, open a web browser and go to dbeaver.io/download/. Click Windows (Installer) under Community Edition. Follow the instructions on the installation screen. At “Choose Components,” if you already have a Java installed on your ma...
    Read More
  2. Getting Started with Cubrid Migration Toolkit Console Mode

    Written by Rathana Va at Phnom Voar Software, Cambodia Introduction CUBRID Migration Toolkit (CMT) Console is a tool to migrate the data and the schema from the source DB (MySQL, Oracle, CUBRID, etc) to the target DB (CUBRID). CMT Console mode is a separate product from the CMT GUI version. It could be useful for some cases like automating migration or linux command line mode. Installation Windows 1. Download through the link: http://ftp.cubrid.org/CUBRID_Tools/CUBRID_Migration_Toolkit/CUBRID-Migration-Toolkit-11.0-latest-windows-x64.zip 2. Extract the Zip file ​ Linux 1. Download through the link using web browser or wget command: http://ftp.cubrid.org/CUBRID_Tools/CUBRID_Migration_Toolkit/CUBRID-Migration-Toolkit-11.0-latest-linux-x86_64.tar.gz 2. Extract the tar.gz file ​ tar -xf CUBRID-...
    Read More
  3. CUBRID INSIDE: Subquery and Query Rewriter (View Merging, Subquery Unnest)

    Written by SeHun Park on 08/07/2021   What is Subquery  A subquery is a query that appears inside another query statement. Subquery enables us to extract the desired data with a single query. For example, if you need to extract information about employees who have salary that is higher than last year’s average salary, you can use the following subquery:    It is possible to write a single query as above without writing another query statement to find out the average salary. Subquery like this has various special properties, and their properties vary depending on where they are written. scalar subquery: A subquery in a SELECT clause. Only one piece of data can be viewed. inline view: A subquery in the FROM clause. Multiple data inquiry is possible. subquery: A subquery in the WHERE clause. I...
    Read More
  4. CUBRID DBLink

    Written by DooHo Kang on 27/06/2022   What is CUBRID DBLink When retrieving information from a database, it is often necessary to retrieve information from an external database. Therefore, it is necessary to be able to search for information on other databases. CUBRID DBLink allows users to use the information on other databases.   CUBRID DBLink provides a function to inquire about information in the databases of homogeneous CUBRID and heterogeneous Oracle and MySQL.   * It is possible to set up multiple external databases, but when searching for information, it is possible to inquire about information from only one other database.   CUBRID DBLink Configuration CUBRID DBLink supports DBLink between homogeneous and heterogeneous DBLinks.   Homogeneous DBLink diagram   If you look at the conf...
    Read More
  5. CUBRID INTERNAL: CUBRID Double Write Buffer

    Written by MyungGyu Kim on 03/08/2022 INTRODUCTION Data in the database is allocated from disk to memory, some data is read and then modified, and some data is newly created and allocated to memory. Such data should eventually be stored on disk to ensure that it is permanently stored. In this article, we will introduce one of the methods of storing data on disk in CUBRID to help you understand the CUBRID database. The current version at the time of writing is CUBRID 11.2. DOUBLE WRITE BUFFER First of all, I would like to give a general description of the definition, purpose, and mechanism of Double Write Buffer. What is Double Write Buffer? By default, CUBRID stores data on disk through Double Write Buffer. Double Write Buffer is a buffer area composed of both memory and disk. By default, t...
    Read More
Board Pagination Prev 1 2 3 4 5 6 Next
/ 6

Join the CUBRID Project on