Background Image

BLOG

?

Shortcut

PrevPrev Article

NextNext Article

Larger Font Smaller Font Up Down Go comment Print Attachment

Written by MyungGyu Kim on 03/08/2022

 

INTRODUCTION

Data in the database is allocated from disk to memory, some data is read and then modified, and some data is newly created and allocated to memory. Such data should eventually be stored on disk to ensure that it is permanently stored. In this article, we will introduce one of the methods of storing data on disk in CUBRID to help you understand the CUBRID database.

 

The current version at the time of writing is CUBRID 11.2.

 


DOUBLE WRITE BUFFER

First of all, I would like to give a general description of the definition, purpose, and mechanism of Double Write Buffer.

 

What is Double Write Buffer?

By default, CUBRID stores data on disk through Double Write Buffer. Double Write Buffer is a buffer area composed of both memory and disk. By default, the size is set to 2M, and the size can be adjusted up to 32M in the cubrid.conf file.

*Note: In CUBRID, the user can store the DB page directly to disk or using Double Write Buffer. In this article, we will only focus on the method of storing DB page using Double Write Buffer.

 

The Purpose 

When storing data on a disk, the Double Write Buffer generated in this way may prevent the corrupted DB page from being stored. The DB page-level corruption occurs for the following reasons:

 

1.jpg

 

In this document, it is assumed that 1 DB page existing in memory consists of 4 OS pages of Linux or Windows. When such a DB page is stored on disk, it is stored through a mechanism called flush.

 

When the DB page is stored to the disk through the flush mechanism, the DB page is divided and stored in units of OS pages. As shown in the figure above, if the system is shut down abruptly when the DB page is not stored on the disk, the DB page is broken. (Note: This disk write is called Partial Write.)

 

The Process 

The DB page of CUBRID is stored on disk as follows through the Double Write Buffer.

 

        2.jpg

 

As above, one DB page composed of 4 OS pages is stored in the Double Write Buffer in memory. After that, it is stored once in the DB internal Double Write Buffer and twice in the DB file. 

 

The process of storing in Double Write Buffer in memory can be described in detail as follows:

  • DB pages are stored in one slot in the Double Write Buffer of memory, and these slots are combined to form a block. (The green square in the figure below is one block).
    • These contiguous blocks are called Double Write Buffer in memory.
  • When one block is full of slots, it is stored to the disk through a mechanism called flush, which proceeds in blocks. 

 

         3.jpg

 

The process of storing DB page from the Double Write Buffer in memory to the DB on disk executes the following two processes in order:

  • A single block full of slots is stored in the DB internal Double Write Buffer.
  • Store the page to disk located in the same location as the stored page in a single block.

 

When data storage in the DB internal Double Write Buffer fails

In this case, in order to prevent the broken data from being stored on disk, the CUBRID server is stopped during the process of the server restart or creating a new Double Write Buffer, so that data not yet written to the disk is written.

 

When data storage in the disk internal DB fails

After allocating the Double Write Buffer stored in the DB to the memory, the page inside the block is compared with the page stored inside the DB. It prevents the DB page from being saved to the disk in a broken state by re-flushing only the broken DB page in the DB area.

 

When the DB page is stored in the DB 

The Double Write Buffer inside the DB remains the same, and the next block is overwritten when the DB page is stored on the disk through the flush. That is, the Double Write Buffer is continuously maintained. 

 


CONCLUSION

Through the above process, the DB page using the Double Write Buffer can be stored and the broken DB page can be prevented from being stored. 

 


REFERENCE

  1. CUBRID Manual - https://www.cubrid.org/manual/en/11.0/
  2. CUBRID Source Code - https://github.com/CUBRID/cubrid

 


  1. CUBRID INTERNAL: CUBRID Double Write Buffer

    Written by MyungGyu Kim on 03/08/2022 INTRODUCTION Data in the database is allocated from disk to memory, some data is read and then modified, and some data is newly created and allocated to memory. Such data should eventually be stored on disk to ensure that it is permanently stored. In this article, we will introduce one of the methods of storing data on disk in CUBRID to help you understand the CUBRID database. The current version at the time of writing is CUBRID 11.2. DOUBLE WRITE BUFFER First of all, I would like to give a general description of the definition, purpose, and mechanism of Double Write Buffer. What is Double Write Buffer? By default, CUBRID stores data on disk through Double Write Buffer. Double Write Buffer is a buffer area composed of both memory and disk. By default, t...
    Read More
  2. CUBRID INSIDE: HASH SCAN Method

    Written by SeHun Park on 11/09/2021 - HASH SCAN Hash Scan is a scan method for hash join. Hash Scan is applied in view or hierarchical query. When a subquery such as view is joined as inner, index scan cannot be used. In this case, performance degradation occurs due to repeated inquiry of a lot of data. In this situation, Hash Scan is used. The picture above shows the difference between Nested Loop join and Hash Scan in the absence of an index. In the case of NL join, the entire data of INNER is scanned as many as the number of rows of OUTER. In contrast, Hash Scan scans INNER data once when building a hash data structure and scans OUTER once when searching. Therefore, you can search for the desired data relatively very quickly. Here, the internal structure of Hash Scan is written as the fl...
    Read More
  3. CUBRID DBLink

    Written by DooHo Kang on 27/06/2022   What is CUBRID DBLink When retrieving information from a database, it is often necessary to retrieve information from an external database. Therefore, it is necessary to be able to search for information on other databases. CUBRID DBLink allows users to use the information on other databases.   CUBRID DBLink provides a function to inquire about information in the databases of homogeneous CUBRID and heterogeneous Oracle and MySQL.   * It is possible to set up multiple external databases, but when searching for information, it is possible to inquire about information from only one other database.   CUBRID DBLink Configuration CUBRID DBLink supports DBLink between homogeneous and heterogeneous DBLinks.   Homogeneous DBLink diagram   If you look at the conf...
    Read More
  4. Become a Jave GC Expert Series 5 : The Principles of Java Application Performance Tuning

    Written by Se Hoon Park on 06/30/2017 This is the fifth article in the series of "Become a Java GC Expert". In the first issue Understanding Java Garbage Collection we have learned about the processes for different GC algorithms, about how GC works, what Young and Old Generation is, what you should know about the 5 types of GC in the new JDK 7, and what the performance implications are for each of these GC types. In the second article How to Monitor Java Garbage Collection we have explained how JVM actually runs the Garbage Collection in the real time, how we can monitor GC, and which tools we can use to make this process faster and more effective. In the third article How to Tune Java Garbage Collection we have shown some of the best options based on real cases as our examples that you can...
    Read More
  5. Become a Jave GC Expert Series 4 : MaxClients in Apache and its effect on Tomcat during Full GC

    Written by Dongsoon Choi on 06/23/2017 This is the fourth article in the series of "Become a Java GC Expert". In the first issue Understanding Java Garbage Collection we have learned about the processes for different GC algorithms, about how GC works, what Young and Old Generation is, what you should know about the 5 types of GC in the new JDK 7, and what the performance implications are for each of these GC types. In the second article How to Monitor Java Garbage Collection we have explained how JVM actually runs the Garbage Collection in the real time, how we can monitor GC, and which tools we can use to make this process faster and more effective. In the third article How to Tune Java Garbage Collection we have shown some of the best options based on real cases as our examples that you c...
    Read More
Board Pagination Prev 1 2 3 4 5 Next
/ 5

Join the CUBRID Project on