Rebuilding Replication

Replication rebuilding is required in CUBRID HA when data in the CUBRID HA group is inconsistent because of multiple failures in a multiple-slave node structure, or because of a generic error. In CUBRID HA, replication is rebuilt with the ha_make_slavedb.sh script. With the cubrid applyinfo utility, you can check the replication progress; however, it does not detect replication inconsistency. To determine accurately whether replication is inconsistent, you must compare the data of the master and slave nodes yourself.

For rebuilding replications, the following environment must be the same in the slave, master, and replica nodes.

  • CUBRID version
  • Environment variables ($CUBRID, $CUBRID_DATABASES, $LD_LIBRARY_PATH, $PATH)
  • The paths of database volume, log, and replication
  • Username and password of the Linux server
  • HA-related parameters except for ha_mode, ha_copy_sync_mode, and ha_ping_hosts
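
The environment variables in the list above can be compared between nodes with a small helper. The following is a minimal sketch only: the function name ha_env_fingerprint is hypothetical and is not part of CUBRID. It prints the four variables in a fixed order so that the output collected on each node can be compared, for example with diff.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CUBRID): print the environment
# variables that must match on every HA node, one per line, in a fixed order.
ha_env_fingerprint() {
    printf 'CUBRID=%s\n' "${CUBRID:-}"
    printf 'CUBRID_DATABASES=%s\n' "${CUBRID_DATABASES:-}"
    printf 'LD_LIBRARY_PATH=%s\n' "${LD_LIBRARY_PATH:-}"
    printf 'PATH=%s\n' "${PATH:-}"
}
```

One possible use is to run the function on each node, save the output to a file, and diff the files. The remaining items in the list (CUBRID version, paths, accounts, and HA parameters) still have to be checked separately.
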

ha_make_slavedb.sh Script

To rebuild replications, use the ha_make_slavedb.sh script, which is located in $CUBRID/share/scripts/ha. Before rebuilding replications, the following items must be configured to match the user's environment. The script has been supported since version 2008 R2.2 Patch 9, and its configuration differs from that of 2008 R4.1 Patch 2 or earlier. This document describes the script as of CUBRID 2008 R4.1 Patch 2 or later.

  • target_host: The host name of the source node (usually the master node) for rebuilding replication. It must be registered in /etc/hosts. A slave node can be rebuilt by using the master node or a replica node as the source. A replica node can be rebuilt by using another replica node as the source.
  • repl_log_home: Specifies the home directory of the replication log of the master node. It is usually the same as $CUBRID_DATABASES. You must enter an absolute path and must not use a symbolic link. Do not add a trailing slash (/) to the path.
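
The three rules for repl_log_home (absolute path, no symbolic link, no trailing slash) can be checked before the script is run. The following is a minimal sketch; the function check_repl_log_home is hypothetical and is not part of ha_make_slavedb.sh. It only tests whether the final path component is a symbolic link, not the parent directories.

```shell
#!/bin/sh
# Hypothetical pre-check for a repl_log_home value (not part of CUBRID).
# Returns 0 and prints "ok: ..." when the value satisfies all three rules.
check_repl_log_home() {
    path=$1
    case $path in
        /*) ;;   # starts with a slash: absolute path
        *)  echo "not an absolute path: $path"; return 1 ;;
    esac
    case $path in
        */) echo "trailing slash not allowed: $path"; return 1 ;;
    esac
    # Only the last path component is tested here.
    if [ -L "$path" ]; then
        echo "symbolic link not allowed: $path"
        return 1
    fi
    echo "ok: $path"
}
```

For example, check_repl_log_home "$CUBRID_DATABASES" can be run on the slave node before editing the script.
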

The following are optional items:

  • db_name: Specifies the name of the database to be replicated. If not specified, the first name that appears in ha_db_list in $CUBRID/conf/cubrid_ha.conf is used.
  • backup_dest_path: Specifies the path in which the backup volume is created when backupdb is executed on the source node for rebuilding replication.
  • backup_option: Specifies the options to use when backupdb is executed on the source node for rebuilding replication.
  • restore_option: Specifies the options to use when restoredb is executed on the slave node whose replication is being rebuilt.
  • scp_option: Specifies the scp options used to copy the backup of the source node to the slave node. The default is -l 131072, which limits the transfer rate to 16 MB/s so that the network is not overloaded.
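
The scp -l option takes its limit in Kbit/s, so the default value corresponds to 16 MB per second. The following short sketch shows the conversion; the function name is illustrative only.

```shell
#!/bin/sh
# scp -l expects a bandwidth limit in Kbit/s.
#   Kbit/s / 8    = KB/s
#   KB/s   / 1024 = MB/s
kbit_to_mbyte_per_sec() {
    echo $(( $1 / 8 / 1024 ))
}

kbit_to_mbyte_per_sec 131072   # prints 16
```

To choose a different limit, multiply the desired rate in MB/s by 8192 to obtain the value for -l.
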

Once the script has been configured, execute the ha_make_slavedb.sh script on the slave node whose replication is to be rebuilt. The script rebuilds replication in a number of stages. To move to the next stage, you must enter an appropriate value. The available values are described below.

  • yes: Keeps going.
  • no: Stops; no further stages are executed.
  • skip: Skips to the next stage. Use this value to skip a stage that does not need to be executed again when retrying the script after it has failed.
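
The meaning of the three answers can be summarized as a case statement. This is an illustrative sketch, not the code of ha_make_slavedb.sh: the function name and the action names are hypothetical, and the handling of any other input is an assumption.

```shell
#!/bin/sh
# Hypothetical mapping from a prompt answer to an action (not CUBRID code).
step_action() {
    case $1 in
        y|yes)  echo run  ;;   # execute this stage, then continue
        n|no)   echo stop ;;   # stop; no further stages are executed
        s|skip) echo skip ;;   # do not execute this stage; go to the next one
        *)      echo ask  ;;   # anything else: ask again (assumed behavior)
    esac
}
```
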

Constraints

  • Remote ssh connection must be available when using the script because it executes commands on the remote node by using expect and ssh.
  • Online backup of the node used for rebuilding replication: An existing backup of the replica or slave nodes cannot be used for rebuilding replication. Therefore, you must use the online backup of the master node that is automatically created by the script.
  • Error while executing the rebuilding replication script: The script does not automatically roll back to the previous state when an error occurs during execution. This is because the slave node could not provide normal service even before the script was executed. To be able to return to the state before the script was executed, you must back up the existing replication logs and the db_ha_apply_info information, which is an internal catalog of the master and slave nodes, before rebuilding replication.
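
The backup mentioned in the last constraint can be taken with standard tools. The following is a minimal sketch under stated assumptions: both function names and the backup location are hypothetical, and the csql invocation simply mirrors the one that the script prints in step 7 of the example, so it assumes that the database is running on the node.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CUBRID); names and paths are assumptions.

# Archive a replication log directory (for example
# $CUBRID_DATABASES/testdb_nodeA) into DEST_DIR as <dirname>.tgz.
backup_repl_log() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar -czf "$dest/$(basename "$src").tgz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}

# Save the current db_ha_apply_info rows of a running database to a text file.
dump_ha_apply_info() {
    db=$1
    dest=$2
    csql -C -u dba --sysadm "$db@localhost" \
        -c "select * from db_ha_apply_info where db_name='$db'" \
        > "$dest/$db.db_ha_apply_info.txt"
}
```

For example, backup_repl_log "$CUBRID_DATABASES/testdb_nodeA" "$HOME/ha_backup" followed by dump_ha_apply_info testdb "$HOME/ha_backup" could be run on each node before starting the script.
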

Remark

To replicate, you must copy the physical image of the database volume on the target node to the database of the node to be replicated. However, cubrid unloaddb backs up only logical images, so replication cannot be rebuilt with cubrid unloaddb and cubrid loaddb. Because cubrid backupdb backs up physical images, replication is possible with this utility. The ha_make_slavedb.sh script rebuilds replication by using cubrid backupdb.

Example

The following example shows how to use the master node as the source node for rebuilding replication and rebuild a slave node from it.

[Figure: admin_ha_scenario_rebuild.png]

  • The host name in master node: nodeA
  • The host name in slave node: nodeB

Rebuilding replications can be performed while the master node is running; however, it is recommended to do this when there are only a few transactions per hour in order to minimize replication delay.

Before starting to rebuild replications by executing the ha_make_slavedb.sh script, stop the HA service of the slave node and configure the ha_make_slavedb.sh script as shown below. Set target_host to the host name of the master node to replicate from (nodeA), and set repl_log_home to the home directory of the replication log (default value: $CUBRID_DATABASES).

[nodeB]$ cubrid heartbeat stop

 

[nodeB]$ cd $CUBRID/share/scripts/ha

[nodeB]$ vi ha_make_slavedb.sh

target_host=nodeA

After configuration, execute the ha_make_slavedb.sh script on the slave node.

[nodeB]$ cd $CUBRID/share/scripts/ha

[nodeB]$ ./ha_make_slavedb.sh

If an error occurs while the script is running step by step, or if you stop the script by entering n and restart it later, you can enter s for the steps that have already succeeded and go to the next step.

  1. At this step, enter the password of the Linux account and the password of DBA, the CUBRID database account, for rebuilding HA replication. Enter y at the prompt.

    ##### step 1 ###################################################################

    #

    # get HA/replica user password and DBA password

    #

    #  * warning !!!

    #   - Because ha_make_slavedb.sh use expect (ssh, scp) to control HA/replica node,

    #     the script has to know these passwords.

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

    Enter the password of a Linux account of the HA node and the password of DBA, the CUBRID database account. If you have not changed the password of DBA after installing CUBRID, press the <Enter> key without entering the password of DBA.

    HA/replica cubrid_usr's password :

    HA/replica cubrid_usr's password :

     

    testdb's DBA password :

    Retype testdb's DBA password :

  2. At this step, check whether the environment variables of the slave node are correct. Enter y at the prompt.

    ##### step 2 ###################################################################

    #

    #  ha_make_slavedb.sh is the script for making slave database more easily

    #

    #  * environment

    #   - db_name           : testdb

    #

    #   - master_host       : nodeA

    #   - slave_host        : nodeB

    #   - replica_hosts     :

    #

    #   - current_host      : nodeB

    #   - current_state     : slave

    #

    #   - target_host       : nodeA

    #   - target_state      : master

    #

    #   - repl_log_home     : /home/cubrid_usr/CUBRID/databases

    #   - backup_dest_path  : /home/cubrid_usr/.ha/backup

    #   - backup_option     :

    #   - restore_option    :

    #

    #  * warning !!!

    #   - environment on slave must be same as master

    #   - database and replication log on slave will be deleted

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

  3. At this step, copy the HA-related scripts of the slave node to the master node. Enter y at the prompt. From this point on, you are asked for the password each time the script accesses the master node and each time it sends a file by using the scp command.

    ##### step 3 ###################################################################

    #

    #  copy scripts to master node

    #

    #  * details

    #   - scp scripts to '~/.ha' on nodeA(master).

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

    [nodeB]$ tar -zcf ha.tgz ha

    [nodeA]$ rm -rf /home/cubrid_usr/.ha

    cubrid_usr@nodeA's password:

    Connection to nodeA closed.

    [nodeB]$ scp -l 131072 -r CUBRID/share/scripts/ha/../ha.tgz nodeA:/home1/brightest

    cubrid_usr@nodeA's password:

    ha.tgz

    10KB  10.2KB/s   00:00

    [nodeA]$ tar -zxf ha.tgz

    cubrid_usr@nodeA's password:

    Connection to nodeA closed.

    [nodeA]$ mv ha /home/cubrid_usr/.ha

    cubrid_usr@nodeA's password:

    Connection to nodeA closed.

    [nodeA]$ mkdir /home/cubrid_usr/.ha/backup

    cubrid_usr@nodeA's password:

    Connection to nodeA closed.

    To skip the password entry while executing the scp command, place the private key used by scp on the slave node and the public key on the master node, as shown below. For more details, see How to Use ssh-keygen for Linux.

    1. Execute ssh-keygen -t rsa on the slave node and check that the .ssh/id_rsa and .ssh/id_rsa.pub files have been created under the home directory of the Linux user account.
    2. Copy the id_rsa.pub file to the master node as a file named authorized_keys under the .ssh directory in the home directory of the Linux user account.
    3. Run a test to check that a file is copied without a password prompt (scp test.txt cubrid_usr@:/home/cubrid_usr/).
  4. At this step, copy the HA-related scripts to the replica node. If there is no replica node, as in this scenario, you can skip this step and go to the next step by entering s.

    ##### step 4 #####################################

    #

    #  copy scripts to replication node

    #

    #  * details

    #   - scp scripts to '~/.ha' on replication node.

    #

    ##################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

    There is no replication server to copy scripts.

  5. At this step, check whether the environment variables of all nodes are correct. Enter y at the prompt.

    ##### step 5 ###################################################################

    #

    #  check environment of all ha node

    #

    #  * details

    #   - test $CUBRID == /home1/ha_qaf/CSUS-7524_Apricot

    #   - test $CUBRID_DATABASES == /home1/ha_qaf/DB

    #   - test -d /home1/ha_qaf/DB/demodb

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

  6. At this step, stop replication of the master node. Enter y at the prompt.

    ##### step 6 ###################################################################

    #

    #  suspend copylogdb/applylogdb on master if running

    #

    #  * details

    #   - deregister copylogdb/applylogdb on nodeA(master).

    #

    ################################################################################

       continue ? ([y]es / [n]o / [s]kip) : y

     

    [nodeA]$ sh /home/cubrid_usr/.ha/functions/ha_repl_suspend.sh -l /home/cubrid_usr/CUBRID/databases -d testdb -h nodeB -o /home/cubrid_usr/.ha/repl_utils.output

    cubrid_usr@nodeA's password:

    [nodeA]$ cubrid heartbeat deregister 9408

    suspend: (9408) cub_admin copylogdb -L /home/cubrid_usr/CUBRID/databases/testdb_nodeB -m sync testdb@nodeB

    [nodeA]$ cubrid heartbeat deregister 9410

    suspend: (9410) cub_admin applylogdb -L /home/cubrid_usr/CUBRID/databases/testdb_nodeB --max-mem-size=300 testdb@localhost

     

     

    3. heartbeat status on nodeA(master).

     

    [nodeA]$ cubrid heartbeat list

    @ cubrid heartbeat list

     

     HA-Node Info (current nodeA, state master)

       Node nodeB (priority 2, state unknown)

       Node nodeA (priority 1, state master)

     

     

     HA-Process Info (master 8362, state master)

       Copylogdb testdb@nodeB:/home/cubrid_usr/CUBRID/databases/testdb_nodeB (pid 9408, state deregistered)

       Server testdb (pid 9196, state registered_and_active)

     

    Connection to nodeA closed.

    Wait for 60s to deregister coppylogdb/applylogdb.

    ............................................................

  7. At this step, delete the old replication log from the slave node and initialize the HA meta information table of the master node. Enter y at the prompt.

    ##### step 7 ###################################################################

    #

    #  remove old copy log of slave and init db_ha_apply_info on master

    #

    #  * details

    #   - remove old copy log of slave

    #   - init db_ha_apply_info on master

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

    - 1. remove old copy log.

     

    [nodeA]$ rm -rf /home/cubrid_usr/CUBRID/databases/testdb\_nodeB/*

    cubrid_usr@nodeA's password:

    Connection to nodeA closed.

     

     - 2. init db_ha_apply_info.

     

    [nodeA]$ csql -C -u dba  --sysadm testdb@localhost -c "delete from db_ha_apply_info where db_name='testdb'"

    cubrid_usr@nodeA's password:

    Connection to nodeA closed.

    [nodeA]$ csql -C -u dba  --sysadm testdb@localhost -c "select * from db_ha_apply_info where db_name='testdb'"

    cubrid_usr@nodeA's password:

     

    === <Result of SELECT Command in Line 1> ===

     

    There are no results.

    Connection to nodeA closed.

  8. At this step, initialize the HA meta information table of the replica node. If there is no replica node, as in this scenario, you can skip this step and go to the next step by entering s.

    ##### step 8 ###################################################################

    #

    #  remove old copy log of slave and init db_ha_apply_info on replications

    #

    #  * details

    #   - remove old copy log of replica

    #   - init db_ha_apply_info on master

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

    There is no replication server to init ha_info

  9. At this step, create a backup volume on the master node (target_host) for rebuilding HA replication. If a backup volume already exists, you can skip this step and go to the next step by entering s. Rebuilding replication from an existing backup volume is subject to the following constraints:
    • The archive logs, including those for transactions that were in progress during the backup, must still exist on the master node (target_host); this means that a backup volume created long ago cannot be used.
    • The backup status information file must have been created with the -o option during backup, in the same path as the backup volume file. The file name must have the format db_name.bkup.output. If it does not, rename the file to match this format before executing the script.
    • The path of the existing backup volume and the status information file must be specified in the backup_dest_path parameter of the script. In other words, set this parameter to the absolute path of the directory containing the backup volume on the master node (target_host).

    ##### step 9 ###################################################################

    #

    #  online backup database  on master

    #

    #  * details

    #   - run 'cubrid backupdb -C -D ... -o ... testdb@localhost' on master

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

    [nodeA]$ cubrid backupdb  -C -D /home/cubrid_usr/.ha/backup -o /home/cubrid_usr/.ha/backup/testdb.bkup.output testdb@localhost

    cubrid_usr@nodeA's password:

    Backup Volume Label: Level: 0, Unit: 0, Database testdb, Backup Time: Thu Apr 19 18:52:03 2012

    Connection to nodeA closed.

    [cubrid_usr@nodeA]$ cat /home/cubrid_usr/.ha/backup/testdb.bkup.output

    cubrid_usr@nodeA's password:

    [ Database(testdb) Full Backup start ]

     

    - num-threads: 2

     

    - compression method: NONE

     

    - backup start time: Thu Apr 19 18:52:03 2012

     

    - number of permanent volumes: 1

     

    - HA apply info: testdb 1334739766 715 8680

     

    - backup progress status

     

    -----------------------------------------------------------------------------

     volume name                  | # of pages | backup progress status    | done

    -----------------------------------------------------------------------------

     testdb_vinf                  |          1 | ######################### | done

     testdb                       |       6400 | ######################### | done

     testdb_lgar000               |       6400 | ######################### | done

     testdb_lgar001               |       6400 | ######################### | done

     testdb_lginf                 |          1 | ######################### | done

     testdb_lgat                  |       6400 | ######################### | done

    -----------------------------------------------------------------------------

     

    # backup end time: Thu Apr 19 18:52:06 2012

     

    [ Database(testdb) Full Backup end ]

    Connection to nodeA closed.

  10. At this step, copy the database backup of the master node to the slave node. Enter y at the prompt.

    ##### step 10 ###################################################################

    #

    #  copy testdb databases backup to current host

    #

    #  * details

    #   - scp databases.txt from target host if there's no testdb info on current host

    #   - remove old database and replication log if exist

    #   - make new database volume and replication path

    #   - scp  database backup to current host

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

     

     - 1. check if the databases information is already registered.

     

     

     - thres's already testdb information in /home/cubrid_usr/CUBRID/databases/databases.txt

    [nodeB]$ grep testdb /home/cubrid_usr/CUBRID/databases/databases.txt

    testdb          /home/cubrid_usr/CUBRID/databases/testdb        nodeA:nodeB /home/cubrid_usr/CUBRID/databases/testdb/log file:/home/cubrid_usr/CUBRID/databases/testdb/lob

     

     - 2. get db_vol_path and db_log_path from databases.txt.

     

     

     - 3. remove old database and replication log.

     

    [nodeB]$ rm -rf /home/cubrid_usr/CUBRID/databases/testdb/log

    [nodeB]$ rm -rf /home/cubrid_usr/CUBRID/databases/testdb

    [nodeB]$ rm -rf /home/cubrid_usr/CUBRID/databases/testdb_*

     

     - 4. make new database volume and replication log directory.

     

    [nodeB]$ mkdir -p /home/cubrid_usr/CUBRID/databases/testdb

    [nodeB]$ mkdir -p /home/cubrid_usr/CUBRID/databases/testdb/log

    [nodeB]$ mkdir -p /home/cubrid_usr/.ha

    [nodeB]$ rm -rf /home/cubrid_usr/.ha/backup

    [nodeB]$ mkdir -p /home/cubrid_usr/.ha/backup

     

     - 5. copy backup volume and log from target host

     

    cubrid_usr@nodeA's password:

    testdb_bkvinf                                                                                     100%   49     0.1KB/s   00:00

    cubrid_usr@nodeA's password:

    testdb_bk0v000                                                                                    100% 1540MB   7.8MB/s   03:18

    testdb.bkup.output                                                                                100% 1023     1.0KB/s   00:00

  11. At this step, restore the copied database backup to the slave node. Enter y at the prompt.

    ##### step 11 ###################################################################

    #

    #  restore database testdb on current host

    #

    #  * details

    #   - cubrid restoredb -B ... testdb current host

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

    [nodeB]$ cubrid restoredb -B /home/cubrid_usr/.ha/backup  testdb

  12. At this step, configure the HA meta information table value of the slave node. Enter y at the prompt.

    ##### step 12 ###################################################################

    #

    #  set db_ha_apply_info on slave

    #

    #  * details

    #   - insert db_ha_apply_info on slave

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

     

     

    1. get db_ha_apply_info from backup output(/home/cubrid_usr/.ha/backup/testdb.bkup.output).

     

     - dn_name       : testdb

     - db_creation   : 1334841057

     - pageid        : 78

     - offset        : 7912

     - log_path      : /home/cubrid_usr/CUBRID/databases/testdb_nodeA

     

     

     

    2. select old db_ha_apply_info.

     

    [nodeB]$ csql -u dba -S testdb -l -c "SELECT db_name, db_creation_time, copied_log_path, page_id, offset, required_page_id FROM db_ha_apply_info WHERE db_name='testdb'"

     

    === <Result of SELECT Command in Line 1> ===

     

    There are no results.

     

     

     

    3. insert new db_ha_apply_info on slave.

     

    [nodeB]$ csql --sysadm -u dba -S testdb -c "DELETE FROM db_ha_apply_info WHERE db_name='testdb'"

    [nodeB]$ csql --sysadm -u dba -S testdb -c "INSERT INTO  db_ha_apply_info VALUES (       'testdb',       datetime '04/19/2012 22:10:57',         '/home/cubrid_usr/CUBRID/databases/testdb_nodeA',        -1, -1,         NULL,   NULL,   0,      0,      0,      0,      0,      0,      0,      78,     NULL )"

    [nodeB]$ csql -u dba -S testdb -l -c "SELECT db_name, db_creation_time, copied_log_path, page_id, offset, required_page_id FROM db_ha_apply_info WHERE db_name='testdb'"

     

    === <Result of SELECT Command in Line 1> ===

     

    <00001> db_name         : 'testdb'

            db_creation_time: 10:10:57.000 PM 04/19/2012

            copied_log_path : '/home/cubrid_usr/CUBRID/databases/testdb_nodeA'

            page_id         : -1

            offset          : -1

            required_page_id: 78

  13. At this step, initialize the replication log of the master node and then copy the archive logs of the master node to the slave node. Enter y at the prompt.

    ##### step 13 ###################################################################

    #

    #  make initial replication active log on master, and copy archive logs from

    #  master

    #

    #  * details

    #   - remove old replication log on master if exist

    #   - start copylogdb to make replication active log

    #   - copy archive logs from master

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

     

     - 1. remove old replicaton log.

     

    [nodeB]$ rm -rf /home/cubrid_usr/CUBRID/databases/testdb_nodeA

    [nodeB]$ mkdir -p /home/cubrid_usr/CUBRID/databases/testdb_nodeA

     

     - 2. start copylogdb to initiate active log.

     

     

     - cubrid service stop

    [nodeB]$ cubrid service stop >/dev/null 2>&1

     

     - start cub_master

    [nodeB]$ cub_master >/dev/null 2>&1

     

     - start copylogdb and wait until replication active log header to be initialized

    [nodeB]$ cub_admin copylogdb -L /home/cubrid_usr/CUBRID/databases/testdb_nodeA -m 3 testdb@nodeA >/dev/null 2>&1 &

     

    ...

     

     - cubrid service stop

    [nodeB]$ cubrid service stop >/dev/null 2>&1

     

     - check copied active log header

    [nodeB]$  cubrid applyinfo -L /home/cubrid_usr/CUBRID/databases/testdb_nodeA testdb | grep -wqs "DB name"

     

     - 3. copy archive log from target.

     

    cubrid_usr@nodeA's password:

    testdb_lgar000                                                                                    100%  512MB   3.9MB/s   02:11

  14. At this step, restart the copylogdb process and the applylogdb process of the master node. Enter y at the prompt.

    ##### step 14 ###################################################################

    #

    #  restart copylogdb/applylogdb on master

    #

    #  * details

    #   - restart copylogdb/applylogdb

    #

    ################################################################################

     

       continue ? ([y]es / [n]o / [s]kip) : y

     

    [nodeA]$ sh /home/cubrid_usr/.ha/functions/ha_repl_resume.sh -i /home/cubrid_usr/.ha/repl_utils.output

    cubrid_usr@nodeA's password:

    nodeA ]$ cub_admin copylogdb -L /home/cubrid_usr/CUBRID/databases/testdb_nodeB -m sync testdb@nodeB >/dev/null 2>&1 &

    resume: cub_admin copylogdb -L /home/cubrid_usr/CUBRID/databases/testdb_nodeB -m sync testdb@nodeB

    nodeA ]$ cub_admin applylogdb -L /home/cubrid_usr/CUBRID/databases/testdb_nodeB --max-mem-size=300 testdb@localhost >/dev/null 2>&1 &

    resume: cub_admin applylogdb -L /home/cubrid_usr/CUBRID/databases/testdb_nodeB --max-mem-size=300 testdb@localhost

     

     - check heartbeat list on (master).

     

    nodeA ]$ cubrid heartbeat list

    @ cubrid heartbeat list

     

     HA-Node Info (current nodeA, state master)

       Node nodeB (priority 2, state unknown)

       Node nodeA (priority 1, state master)

     

     HA-Process Info (master 11847, state master)

       Server testdb (pid 11853, state registered_and_active)

     

     

    Connection to nodeA closed.

  15. At this step, the result of rebuilding the slave node is printed so that you can check whether it succeeded or failed.

    ##### step 15 ##################################################################

    #

    #  completed

    #

    ################################################################################
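
Step 12 above reads the "HA apply info" line from the backup status file written in step 9. The following is a minimal sketch of extracting those values, assuming the line has the form shown in the step 9 output (database name, creation time, page ID, and offset, in that order). The function name parse_ha_apply_info is hypothetical and is not part of the script; the field labels follow the step 12 output.

```shell
#!/bin/sh
# Hypothetical helper (not part of CUBRID): print the four values of the
# "HA apply info" line in a db_name.bkup.output file as name=value pairs.
parse_ha_apply_info() {
    awk '/HA apply info:/ {
        # The last four whitespace-separated fields are, in order:
        # db_name, db_creation (epoch seconds), pageid, offset
        printf "db_name=%s\ndb_creation=%s\npageid=%s\noffset=%s\n", \
               $(NF-3), $(NF-2), $(NF-1), $NF
        exit
    }' "$1"
}
```

For example, parse_ha_apply_info /home/cubrid_usr/.ha/backup/testdb.bkup.output could be used to confirm that an existing backup status file contains the line before skipping step 9.
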

After the ha_make_slavedb.sh script has finished, check the HA status on the slave node and then start the HA service.

[nodeB]$ cubrid heartbeat status

@ cubrid heartbeat list

++ cubrid master is not running.

[nodeB]$ cubrid heartbeat start

@ cubrid heartbeat start

@ cubrid master start

++ cubrid master start: success

 

@ HA processes start

@ cubrid server start: testdb

 

This may take a long time depending on the amount of recovery works to do.

 

CUBRID 9.0

 

++ cubrid server start: success

@ copylogdb start

++ copylogdb start: success

@ applylogdb start

++ applylogdb start: success

++ HA processes start: success

++ cubrid heartbeat start: success

[nodeB ha]$ cubrid heartbeat status

@ cubrid heartbeat list

 

 HA-Node Info (current nodeB, state slave)

   Node nodeB (priority 2, state slave)

   Node nodeA (priority 1, state master)

 

 HA-Process Info (master 26611, state slave)

   Applylogdb testdb@localhost:/home/cubrid_usr/CUBRID/databases/testdb_nodeA (pid 26831, state registered)

   Copylogdb testdb@nodeA:/home/cubrid_usr/CUBRID/databases/testdb_nodeA (pid 26829, state registered)

   Server testdb (pid 26617, state registered)
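
The final status output above can also be checked with a small filter. The following is a minimal sketch; the function slave_ha_ok is hypothetical and is not part of CUBRID. It reads the output of cubrid heartbeat status on standard input and succeeds only if the node reports itself as a slave and both the applylogdb and copylogdb processes are listed as registered. It assumes the output format shown in this example.

```shell
#!/bin/sh
# Hypothetical check (not part of CUBRID): exit status 0 only if the
# heartbeat status text on stdin shows a slave node with both replication
# processes in the "registered" state.
slave_ha_ok() {
    awk '
        /HA-Node Info/ && /state slave\)/         { node = 1 }
        /^ *Applylogdb / && /state registered\)/  { apply = 1 }
        /^ *Copylogdb /  && /state registered\)/  { copy = 1 }
        END { exit !(node && apply && copy) }
    '
}
```

A possible use on the slave node is cubrid heartbeat status | slave_ha_ok && echo "replication processes registered". This confirms only that the processes are registered; it does not detect replication inconsistency, which still requires comparing the data of the master and slave nodes.
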