
Configure CUBRID SHARD with Vagrant and Chef Cookbook under 2 minutes


This is the third tutorial in the Vagrant/Chef series, where I will show how to install CUBRID Database in a multi-VM environment and automatically configure database sharding. If you haven't read the previous tutorials, at least read Create a CUBRID Database VM with Vagrant and Chef Cookbook under 5 minutes to learn about Vagrant and the Chef cookbook we will be using in this tutorial.

In this tutorial you will learn how to automatically create multiple VMs with CUBRID installed on each of them and CUBRID SHARD configured for database sharding. With Vagrant and a Chef cookbook it is as easy as running vagrant up and waiting less than 2 minutes until everything is set up.

If you want to learn more about CUBRID SHARD, refer to Database Sharding the Right Way: Easy, Reliable, and Open source, which we presented at the HighLoad++ 2012 conference.

Requirements

Requirements are the same as in the first tutorial. Refer to it for details. In short, you need VirtualBox and Vagrant installed.

Vagrantfile

If you have not already downloaded cubrid-vagrant-1.7.0.tar.gz (226KB), do it now (or check the latest version from Sf.net repo). You will find everything ready for you in that archive. Refer to the previous tutorial for details on what is included.

Now, for this tutorial to configure CUBRID SHARD on multiple VMs, we will add a few more configuration options to the Vagrantfile which comes by default in that archive.

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant::Config.run do |config|
  #config.vm.box_url = "http://files.vagrantup.com/precise64.box"
  config.vm.box = "precise64" # Ubuntu 12.04 x64

  config.vm.define :node1 do |node1_config|
    node1_config.vm.host_name = "node1"
    node1_config.vm.network :hostonly, "10.11.12.13"
  end

  config.vm.define :node2 do |node2_config|
    node2_config.vm.host_name = "node2"
    node2_config.vm.network :hostonly, "10.11.12.14"
  end

  config.vm.customize ["modifyvm", :id, "--memory", 800]

  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = "cookbooks"

    chef.json = {
        "cubrid" => {
            "version" => "8.4.3",

            "shard_db" => "sharddb",
            "shard_hosts" => [
              {"node1" => "10.11.12.13"},
              {"node2" => "10.11.12.14"}
            ]
        }
    }

    chef.add_recipe "cubrid"
    chef.add_recipe "cubrid::shard"
  end
end

I have removed the comments this time so that the actual code stands out. You can see all the comments in the Vagrantfile inside the archive.

Number of VMs

The first thing to take care of in this scenario, just like when configuring HA, is to define how many VMs (hosts) we would like to run in this CUBRID SHARD environment. No matter how many, each host will run an independent CUBRID Server node and hold a database which will serve as one of the shards of the single logical database.

In the above Vagrantfile you can notice the following lines:

config.vm.define :node1 do |node1_config|
  node1_config.vm.host_name = "node1"
  node1_config.vm.network :hostonly, "10.11.12.13"
end

config.vm.define :node2 do |node2_config|
  node2_config.vm.host_name = "node2"
  node2_config.vm.network :hostonly, "10.11.12.14"
end

This tells Vagrant to build two VMs for us. The first VM will be identified by the name node1 and assigned the IP address 10.11.12.13. The second VM will have the hostname node2 with the IP 10.11.12.14. In this example both the hostnames and IPs are arbitrary, i.e. you can set your own, but remember to keep the IPs within the same subnet (refer to Multiple Networks in Vagrant for details).

This is how you define the number of VMs to start up. Unlike the HA configuration, where the first VM to start becomes the "master" node, in CUBRID SHARD the last node to be configured starts the CUBRID SHARD Service and serves as the SHARD Broker. This is because all shard nodes must be up before the SHARD Service can start. Thus, configuring the SHARD Service to start on the last node ensures that all previous nodes are already up.

VM memory size

In this example we set each VM to have 800MB of RAM. This is enough for this tutorial, though you can set a lower or higher value.

config.vm.customize ["modifyvm", :id, "--memory", 800]

If you need a different memory size on each VM, you can specify this same property inside each VM's define block, as shown below.
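For example, to give node1 more memory than node2, you could move the customize call into each define block (a sketch using the same Vagrant 1.0-style API as the rest of the file; the memory values are illustrative):

config.vm.define :node1 do |node1_config|
  node1_config.vm.host_name = "node1"
  node1_config.vm.network :hostonly, "10.11.12.13"
  node1_config.vm.customize ["modifyvm", :id, "--memory", 1024]
end

config.vm.define :node2 do |node2_config|
  node2_config.vm.host_name = "node2"
  node2_config.vm.network :hostonly, "10.11.12.14"
  node2_config.vm.customize ["modifyvm", :id, "--memory", 512]
end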

Configure CUBRID SHARD

chef.json = {
    "cubrid" => {
        "version" => "8.4.3",

        "shard_db" => "sharddb",
        "shard_hosts" => [
          {"node1" => "10.11.12.13"},
          {"node2" => "10.11.12.14"}
        ]
    }
}

chef.add_recipe "cubrid"
chef.add_recipe "cubrid::shard"
  1. First of all, you need to add the "cubrid::shard" recipe to have CUBRID SHARD configured.
  2. Second, you need to provide, via the shard_db attribute, the name of the database you want to create and have sharded across all nodes.
  3. Third, you need to provide a list of hosts and IPs where each database shard will be located. For this, override the shard_hosts attribute. These hostnames and IPs must be identical to those you defined for each VM above.
  4. You can optionally override the shard_broker_port attribute, which defaults to 45011.
  5. Also optionally, you can specify the shard_user and shard_user_password attributes if you want to create a specific database user for your sharded database. By default, a shard user will be created with the shard123 password. See the example after this list.
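For instance, a chef.json that also overrides the optional attributes might look like this (the values for the optional attributes are illustrative; only shard_db and shard_hosts are required):

chef.json = {
    "cubrid" => {
        "version" => "8.4.3",

        "shard_db" => "sharddb",
        "shard_hosts" => [
          {"node1" => "10.11.12.13"},
          {"node2" => "10.11.12.14"}
        ],

        # illustrative optional overrides
        "shard_broker_port" => "45511",
        "shard_user" => "blog_user",
        "shard_user_password" => "blog_password"
    }
}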

That is all you need to configure CUBRID SHARD. Now let's bring it up.

Vagrant Box

I will assume that you have already added the precise64 (Ubuntu 12.04 LTS x64) Vagrant box. If you haven't, see the first tutorial.
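For reference, the box can be added with the following command, using the box URL commented out at the top of the Vagrantfile above:

$ vagrant box add precise64 http://files.vagrantup.com/precise64.box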

Start Up Vagrant

Run the following command to start Vagrant:

$ vagrant up

Wait about 2 minutes and you will have your VMs up and running with CUBRID SHARD configured.

Test CUBRID SHARD

Validate CUBRID SHARD

Let's first validate and see if CUBRID SHARD has been properly configured.

Open SSH connection

For this we will open an SSH connection and log in to the last node2 VM, because the SHARD Broker is configured on this node. Remember that the last VM to be started is the one that starts the CUBRID SHARD Service.

$ vagrant ssh node2
Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

100 packages can be updated.
50 updates are security updates.

Welcome to your Vagrant-built virtual machine.
Last login: Tue Jan 15 06:47:07 2013 from 10.0.2.2
vagrant@node2:~$

To connect to other nodes, simply change the node name:

$ vagrant ssh node1

Check CUBRID shard status

$ cubrid shard status -f
@ cubrid shard status
% shard  - shard_cas [12366,45011] /opt/cubrid/log/broker//shard.err 
 JOB QUEUE:0, AUTO_ADD_APPL_SERVER:ON, SQL_LOG_MODE:ALL:100000, SLOW_LOG:ON
 LONG_TRANSACTION_TIME:60.00, LONG_QUERY_TIME:60.00, SESSION_TIMEOUT:300
 KEEP_CONNECTION:ON, ACCESS_MODE:RW, MAX_QUERY_TIMEOUT:0
----------------------------------------------------------------------------------------------------------------------------------------------------------
PROXY_ID SHARD_ID   CAS_ID   PID   QPS   LQS PSIZE STATUS          LAST ACCESS TIME               DB             HOST   LAST CONNECT TIME    SQL_LOG_MODE 
----------------------------------------------------------------------------------------------------------------------------------------------------------
       1        0        1 12374     0     0 55088 IDLE         2013/01/15 04:32:21    sharddb@node1            node1 2013/01/15 04:32:21               - 
       1        1        1 12375     0     0 55088 IDLE         2013/01/15 04:32:21    sharddb@node2            node2 2013/01/15 04:32:21               - 

We can see that the CUBRID SHARD Service is running and that our sharddb database has been successfully started on both node1 and node2. You can also see that the SHARD Broker is listening on port 45011.

If you execute this command on the other nodes, you will not see a similar result because they are not configured to serve as a SHARD Broker, though they could be, in which case they would provide a Broker failover solution.

Create a common database schema

If you have already read Database Sharding the Right Way: Easy, Reliable, and Open source, you should know that in order to insert and retrieve data from multiple shards, we need to have an identical table schema on all shards. Tables which are present on only some of the shards will not be sharded; they will behave just like any other ordinary tables.

So, let's create a common table on all our nodes to store our blog posts.

CREATE TABLE posts (
    post_id INT PRIMARY KEY,
    post_title VARCHAR(255) NOT NULL
);

Execute this query in your favorite tool. You can use the CUBRID Manager or CUBRID Query Browser administration tools, or the CSQL command line tool. In this example I will use CSQL on our node1. Remember that you need to execute this query on all nodes to keep the schema identical.

vagrant@node1:~$ csql -u shard sharddb@localhost
Enter Password : 

CUBRID SQL Interpreter


Type `;help' for help messages.

csql> CREATE TABLE posts (
csql> post_id INT PRIMARY KEY,
csql> post_title VARCHAR(255) NOT NULL
csql> );
SQL statement execution time:     0.008398 sec

Current transaction has been committed.

1 command(s) successfully processed.
csql> ;ex
vagrant@node1:~$

Update shard_key.txt configuration file

Since we have created a new table which we want to shard across multiple nodes, we need to specify which column of this table must be used as the shard_key to identify the shard_id. For this we need to update the shard_key.txt configuration file in the /opt/cubrid/conf/ directory on node2, where our SHARD Service is started.

#
# Cookbook Name:: cubrid
# Templates:: shard_keys.txt
#
[%post_id]
#min	max	shard_id
0	31	0
32	63	1
64	95	0
96	127	1
128	159	0
160	191	1
192	223	0
224	255	1

The above instructs CUBRID to use the post_id column as the shard_key for our table. You can define one shard key per table.

By default CUBRID SHARD uses a modulus-256 sharding strategy. You can define your own, or even feed CUBRID SHARD an external library which calculates the shard id. In the above example, all records where post_id modulo 256 falls between 0 and 31 inclusive will be stored in shard 0, in our case node1. All records where it falls between 32 and 63 inclusive will be stored in shard 1, in our case node2. This goes on from 0 up to 255 (modulus - 1). In case you wonder, these shard_id to host name mappings are defined in the shard_connection.txt configuration file in the /opt/cubrid/conf/ directory.
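To make the mapping concrete, here is a minimal Java sketch of the lookup described above. It is illustrative only; the real routing happens inside the SHARD Broker. It simply mirrors the default modulus-256 strategy with the ranges from our shard_key.txt:

public class ShardKeyExample {
    // {min, max, shard_id} triples copied from shard_key.txt above
    private static final int[][] RANGES = {
        {0, 31, 0}, {32, 63, 1}, {64, 95, 0}, {96, 127, 1},
        {128, 159, 0}, {160, 191, 1}, {192, 223, 0}, {224, 255, 1}
    };

    public static int shardIdFor(int postId) {
        int hash = postId % 256; // default modulus-256 strategy
        for (int[] range : RANGES) {
            if (hash >= range[0] && hash <= range[1]) {
                return range[2];
            }
        }
        throw new IllegalStateException("No range covers hash " + hash);
    }

    public static void main(String[] args) {
        System.out.println(shardIdFor(1));   // 1 % 256 = 1    -> shard 0 (node1)
        System.out.println(shardIdFor(32));  // 32 % 256 = 32  -> shard 1 (node2)
        System.out.println(shardIdFor(300)); // 300 % 256 = 44 -> shard 1 (node2)
    }
}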

Once we have updated the shard_key.txt configuration file, we need to restart our SHARD Service.

vagrant@node2:~$ cubrid shard restart
@ cubrid shard stop
++ cubrid shard stop: success
@ cubrid shard start
++ cubrid shard start: success

Insert sample data

Now, let's insert data by connecting to our SHARD Broker on port 45011. For this we will write a short Java program which establishes a database connection using the CUBRID JDBC driver.

import cubrid.jdbc.driver.CUBRIDConnection;
import cubrid.jdbc.driver.CUBRIDPreparedStatement;
import cubrid.jdbc.driver.CUBRIDResultSet;

import java.sql.*;

public class SHARDBrokerInsertExample {
    static  {
        try {
            Class.forName("cubrid.jdbc.driver.CUBRIDDriver");
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static CUBRIDConnection connect() {
        Connection conn = null;

        try {
            conn = DriverManager.getConnection("jdbc:cubrid:10.11.12.14:45011:sharddb:::?charSet=utf8", "shard", "shard123");
            System.out.println("Connected!");
        } catch ( Exception e ) {
            System.err.println("Error: " + e.getMessage());
        }

        return (CUBRIDConnection)conn;
    }

    public static void printdata(ResultSet rs) {
        try {
            ResultSetMetaData rsmd = null;

            rsmd = rs.getMetaData();

            int numberofColumns = rsmd.getColumnCount();
            System.out.println("Number of columns: " + numberofColumns);

            for(int j = 1; j <= numberofColumns; ++j ) {
                System.out.print(rsmd.getColumnName(j) + ", ");
            }

            System.out.println("");

            while (rs.next ()) {
                for(int j = 1; j <= numberofColumns; ++j ) {
                    System.out.print(rs.getString(j) + "  ");
                }

                System.out.println("");
            }

        } catch ( Exception e ) {
            System.err.println("SQLException : " + e.getMessage());
        }
    }

    public static void main(String[] args) throws Exception {
        CUBRIDConnection conn = null;

        conn = connect();

        try {
            // The /*+ shard_key */ hint marks the first placeholder as the
            // shard key, so the broker can route each INSERT to the right shard.
            String sql = "INSERT INTO posts VALUES (/*+ shard_key */ ?, ?)";

            CUBRIDPreparedStatement preStmt = (CUBRIDPreparedStatement)conn.prepareStatement(sql);
            preStmt.setInt(1, 1);
            preStmt.setString(2, "SHARD 0: Post 1 Title");

            preStmt.executeInsert();

            preStmt.setInt(1, 32);
            preStmt.setString(2, "SHARD 1: Post 2 Title");

            preStmt.executeInsert();

            preStmt.setInt(1, 128);
            preStmt.setString(2, "SHARD 0: Post 3 Title");

            preStmt.executeInsert();

            preStmt.setInt(1, 224);
            preStmt.setString(2, "SHARD 1: Post 4 Title");

            preStmt.executeInsert();

            preStmt.close();

            // The same hint before the literal value tells the broker
            // which shard to route the SELECT to.
            String selectSql = "SELECT * FROM posts WHERE post_id = /*+ shard_key */ 128";

            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(selectSql);

            printdata(rs);

            rs.close();
            stmt.close();

            conn.close();

            System.out.println("Connection is closed");
        } catch ( Exception e ) {
            System.err.println("SQLException : " + e.getMessage());
            System.err.println(e);
        } finally {
            if ( conn != null ) conn.close();
        }
    }
}

There are two important things in this code which relate to database sharding.

  1. When inserting a record, give the SHARD Broker a hint about which column should be used as the shard_key to identify the shard_id. This is done by adding the /*+ shard_key */ SQL hint to the INSERT query before the value or the placeholder, as shown above.
  2. Likewise, when selecting data, give the SHARD Broker a hint about which column should be used as the shard_key. This is done by adding the /*+ shard_key */ SQL hint to the WHERE clause of the SELECT query before the value or the placeholder, as shown above.
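To compile and run the example, the CUBRID JDBC driver must be on the classpath. On the VMs it ships with the CUBRID installation as cubrid_jdbc.jar; the commands below assume the jar has been copied next to the source file:

$ javac -cp .:cubrid_jdbc.jar SHARDBrokerInsertExample.java
$ java -cp .:cubrid_jdbc.jar SHARDBrokerInsertExample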

When you run the program, you will see the following output:

Connected!
Number of columns: 2
post_id, post_title, 
128  SHARD 0: Post 3 Title  
Connection is closed

Conclusion

In this tutorial you have learned how to automatically create multiple VMs with CUBRID installed on each of them and CUBRID SHARD configured. As you can see, Vagrant along with a Chef cookbook allows us to work with CUBRID Database and other software very easily. It is convenient and saves a great deal of time. Creating a development environment is no longer painful.

If you have questions, feel free to ask at the CUBRID Q&A site, forum, our Facebook page, or Twitter. If you have issues or feature requests for cubrid-cookbook, create a new issue at its GitHub repo.
