Background Image

BLOG

?

Shortcut

PrevPrev Article

NextNext Article

Larger Font Smaller Font Up Down Go comment Print Attachment

Written by MyungGyu Kim on 03/08/2022

 

INTRODUCTION

Data in the database is allocated from disk to memory, some data is read and then modified, and some data is newly created and allocated to memory. Such data should eventually be stored on disk to ensure that it is permanently stored. In this article, we will introduce one of the methods of storing data on disk in CUBRID to help you understand the CUBRID database.

 

The current version at the time of writing is CUBRID 11.2.

 


DOUBLE WRITE BUFFER

First of all, I would like to give a general description of the definition, purpose, and mechanism of Double Write Buffer.

 

What is Double Write Buffer?

By default, CUBRID stores data on disk through Double Write Buffer. Double Write Buffer is a buffer area composed of both memory and disk. By default, the size is set to 2M, and the size can be adjusted up to 32M in the cubrid.conf file.

*Note: In CUBRID, the user can store the DB page directly to disk or using Double Write Buffer. In this article, we will only focus on the method of storing DB page using Double Write Buffer.

 

The Purpose 

When storing data on a disk, the Double Write Buffer generated in this way may prevent the corrupted DB page from being stored. The DB page-level corruption occurs for the following reasons:

 

1.jpg

 

In this document, it is assumed that 1 DB page existing in memory consists of 4 OS pages of Linux or Windows. When such a DB page is stored on disk, it is stored through a mechanism called flush.

 

When the DB page is stored to the disk through the flush mechanism, the DB page is divided and stored in units of OS pages. As shown in the figure above, if the system is shut down abruptly when the DB page is not stored on the disk, the DB page is broken. (Note: This disk write is called Partial Write.)

 

The Process 

The DB page of CUBRID is stored on disk as follows through the Double Write Buffer.

 

        2.jpg

 

As above, one DB page composed of 4 OS pages is stored in the Double Write Buffer in memory. After that, it is stored once in the DB internal Double Write Buffer and twice in the DB file. 

 

The process of storing in Double Write Buffer in memory can be described in detail as follows:

  • DB pages are stored in one slot in the Double Write Buffer of memory, and these slots are combined to form a block. (The green square in the figure below is one block).
    • These contiguous blocks are called Double Write Buffer in memory.
  • When one block is full of slots, it is stored to the disk through a mechanism called flush, which proceeds in blocks. 

 

         3.jpg

 

The process of storing DB page from the Double Write Buffer in memory to the DB on disk executes the following two processes in order:

  • A single block full of slots is stored in the DB internal Double Write Buffer.
  • Store the page to disk located in the same location as the stored page in a single block.

 

When data storage in the DB internal Double Write Buffer fails

In this case, in order to prevent the broken data from being stored on disk, the CUBRID server is stopped during the process of the server restart or creating a new Double Write Buffer, so that data not yet written to the disk is written.

 

When data storage in the disk internal DB fails

After allocating the Double Write Buffer stored in the DB to the memory, the page inside the block is compared with the page stored inside the DB. It prevents the DB page from being saved to the disk in a broken state by re-flushing only the broken DB page in the DB area.

 

When the DB page is stored in the DB 

The Double Write Buffer inside the DB remains the same, and the next block is overwritten when the DB page is stored on the disk through the flush. That is, the Double Write Buffer is continuously maintained. 

 


CONCLUSION

Through the above process, the DB page using the Double Write Buffer can be stored and the broken DB page can be prevented from being stored. 

 


REFERENCE

  1. CUBRID Manual - https://www.cubrid.org/manual/en/11.0/
  2. CUBRID Source Code - https://github.com/CUBRID/cubrid

 


  1. CUBRID License Model

    Written by Charis Chau on 06/08/2020   Why Licenses Matter?   Open source licenses allow software to be freely used, modified, and shared. Choosing a DBMS with suitable licenses could save the development cost of your application or the Total Cost of Ownership (TCO) for your company. Choosing a DBMS without a proper license, you might find yourself situate in a legal grey area!     CUBRID Licenses   Unlike other open source DBMS vendors, CUBRID is solely under open source license instead of having a dual license in both commercial license and open source license. Which means that for you, it is not mandatory to purchase a license or annual subscription; company/organizational users can achieve the saving from Total Cost of Ownership (TCO).   Since CUBRID has been open source DBMS from 2008,...
    Read More
  2. CUBRID Internal: Storage Management (Disk Manager, File Manager)

    Written by Jaeeun, Kim on 08/11/2021 Introduction Database, just as its name implies, it needs spaces to store data. CUBRID, the open source DBMS that operates for the operating system allocates as much space as needed from the operating system and uses it efficiently as needed. In this article, we will talk about how CUBRID internally manages the storage to store data in the persistent storage device. Through this article, we hope developers can access the open source database CUBRID more easily. - The content of this article is based on version 10.2.0-7094ba. (However, it seems to be no difference in the latest develop branch, 11.0.0-c83e33. ) CUBRID Storage Management The CUBRID server has multiple modules that operate and manage data complexly and sophisticatedly. Among them, there are ...
    Read More
  3. CUBRID INTERNAL: CUBRID Double Write Buffer

    Written by MyungGyu Kim on 03/08/2022 INTRODUCTION Data in the database is allocated from disk to memory, some data is read and then modified, and some data is newly created and allocated to memory. Such data should eventually be stored on disk to ensure that it is permanently stored. In this article, we will introduce one of the methods of storing data on disk in CUBRID to help you understand the CUBRID database. The current version at the time of writing is CUBRID 11.2. DOUBLE WRITE BUFFER First of all, I would like to give a general description of the definition, purpose, and mechanism of Double Write Buffer. What is Double Write Buffer? By default, CUBRID stores data on disk through Double Write Buffer. Double Write Buffer is a buffer area composed of both memory and disk. By default, t...
    Read More
  4. CUBRID INSIDE: Subquery and Query Rewriter (View Merging, Subquery Unnest)

    Written by SeHun Park on 08/07/2021   What is Subquery  A subquery is a query that appears inside another query statement. Subquery enables us to extract the desired data with a single query. For example, if you need to extract information about employees who have salary that is higher than last year’s average salary, you can use the following subquery:    It is possible to write a single query as above without writing another query statement to find out the average salary. Subquery like this has various special properties, and their properties vary depending on where they are written. scalar subquery: A subquery in a SELECT clause. Only one piece of data can be viewed. inline view: A subquery in the FROM clause. Multiple data inquiry is possible. subquery: A subquery in the WHERE clause. I...
    Read More
  5. CUBRID INSIDE: HASH SCAN Method

    Written by SeHun Park on 11/09/2021 - HASH SCAN Hash Scan is a scan method for hash join. Hash Scan is applied in view or hierarchical query. When a subquery such as view is joined as inner, index scan cannot be used. In this case, performance degradation occurs due to repeated inquiry of a lot of data. In this situation, Hash Scan is used. The picture above shows the difference between Nested Loop join and Hash Scan in the absence of an index. In the case of NL join, the entire data of INNER is scanned as many as the number of rows of OUTER. In contrast, Hash Scan scans INNER data once when building a hash data structure and scans OUTER once when searching. Therefore, you can search for the desired data relatively very quickly. Here, the internal structure of Hash Scan is written as the fl...
    Read More
Board Pagination Prev 1 2 3 4 5 6 Next
/ 6

Join the CUBRID Project on