Open Source RDBMS - Seamless, Scalable, Stable and Free


[Level:0]idan

Post subject: Scalability and Sharding in CUBRID

registered: 05/13/2012

IP: *.179.214.39

views: 9

I think that CUBRID might answer all of my scalability needs, and I would like to know more about it.

1) How can I deploy CUBRID on Amazon AWS?
2) How many servers do I need to start with?

3) How does CUBRID know which keys to shard against?
4) Can CUBRID automatically add Amazon EC2 servers, or do I need to add them myself and tell CUBRID about them?
5) How can I back up and restore a DB? Is it easy, and how long does it take to restore 1TB of data?
6) What is the optimal size of each shard (in GB)?
7) What is the recommended environment for running CUBRID (Windows or Linux), and what are the differences, if any?
8) Why haven't I heard about CUBRID after two weeks of intensive searching? From what I can tell, it should be super popular.

9) Does CUBRID support clustering?

10) Why should I use CUBRID over Amazon RDS, both feature-wise and price-wise?

Please help me out, I am just starting with CUBRID.

I've spent two weeks going over all the possible options, including all the available NoSQL and NewSQL options, all the middleware options, building my own sharding environment, and so on. There are solutions out there, but they are so expensive it hurts. I just can't believe I found you guys. I hope that I can develop my new app using CUBRID. I really want to find a solution already: a DB that can easily scale and has all the features I need.

[Level:8]CUBRID

# Post subject:Re: Scalability and Sharding in CUBRID


registered: 03/28/2010

IP: *.91.139.66

Hello idan,

Thank you for such great questions! Please give me some time. I will reply to you soon.

[Level:0]idan

# Post subject:Re: Scalability and Sharding in CUBRID


registered: 05/13/2012

IP: *.179.214.39

Sure, I will wait. I'm so happy that I came across this solution. Why didn't I hear about it during two weeks of searching for an appropriate solution?!

[Level:8]CUBRID

# Post subject:Re: Scalability and Sharding in CUBRID


registered: 03/28/2010

IP: *.91.139.66

Yes, CUBRID is exactly that kind of FREE database: it provides native scalability and availability features on top of competitive performance. Please find the answers to your questions below.

1) How can I deploy CUBRID on Amazon AWS?

You can deploy CUBRID on AWS just like any other software: launch an instance and follow the CUBRID installation instructions for its operating system.

2) How many servers do I need to start with?

It all depends on your application, database size, and usage. If you plan to use the CUBRID HA (High Availability) feature, you can start with 2 servers: one master and one slave.

For CUBRID SHARD, it again depends on the database size you plan to have. We will release the Sharding feature in a few weeks, perhaps at the beginning of June, 2012. The first version will not provide Data Rebalancing, which is required whenever shards are added or deleted (it will be implemented in the next version). Since this feature will be absent in the first version, administrators cannot add or delete shards later: all shards have to be created in advance.

Considering this, you should forecast what your data size will be after a given period of time and create enough shards for it up front. But to start with (to learn CUBRID), you can try 2 or 3 servers.
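To make the forecasting step concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of CUBRID; the function name, the compound growth model, and every number in it are assumptions chosen purely for illustration, so substitute your own measurements.

```python
import math

def shards_needed(current_gb: float, monthly_growth: float,
                  months: int, per_shard_gb: float) -> int:
    """Hypothetical helper: estimate how many shards to create in advance.

    Assumes the data grows by a fixed fraction every month (compound growth)
    and that each shard should hold at most per_shard_gb of data.
    """
    projected_gb = current_gb * (1 + monthly_growth) ** months
    return max(1, math.ceil(projected_gb / per_shard_gb))

# Assumed inputs: 200 GB today, 10% growth per month, a 24-month horizon,
# and a target of at most 500 GB per shard.
print(shards_needed(200, 0.10, 24, 500))
```

Because shards cannot be added later in the first version, it is safer to round the horizon and the growth rate up rather than down.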

3) How does CUBRID know which keys to shard against?

In shard.conf there are several parameters related to the Shard ID generation algorithm. They are:

  1. SHARD_KEY_FILE - this is of string type; defaults to shard_key.txt. This is where Shard Keys are defined in advance.
  2. SHARD_KEY_MODULAR - int; defaults to 256 (this is the default algorithm to determine shard based on shard key)
  3. SHARD_KEY_LIBRARY_NAME - string; in case you want to implement your own algorithm to determine the shard ID, you can write here the full path to the library which will generate the Shard ID.
  4. SHARD_KEY_FUNCTION_NAME - string; the function within that custom library.

For example, the shard_key.txt file located in $CUBRID/conf will contain the following information. Here student_no is the shard key, that is, a column in your table. Each row maps a range of hashed key values (min to max) to a shard ID.

[%student_no]
#min max shard_id
0 31 0
32 63 1
64 95 2
96 127 3
128 159 0
160 191 1
192 223 2
224 255 3
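To show how such a table could be read, here is a minimal Python sketch. It assumes the default algorithm is "shard key modulo SHARD_KEY_MODULAR, then look up the matching range"; that is my reading of the parameters above, not CUBRID's actual source code, and the helper name shard_id_for is hypothetical.

```python
# Ranges copied from the shard_key.txt example above: (min, max, shard_id).
SHARD_RANGES = [
    (0, 31, 0), (32, 63, 1), (64, 95, 2), (96, 127, 3),
    (128, 159, 0), (160, 191, 1), (192, 223, 2), (224, 255, 3),
]
SHARD_KEY_MODULAR = 256  # default value quoted above

def shard_id_for(student_no: int) -> int:
    """Hypothetical helper: reduce the key modulo 256, then find its range."""
    hashed = student_no % SHARD_KEY_MODULAR
    for low, high, shard_id in SHARD_RANGES:
        if low <= hashed <= high:
            return shard_id
    raise ValueError(f"no range covers hashed key {hashed}")

# 300 % 256 = 44, which falls in the range 32..63, so the row goes to shard 1.
print(shard_id_for(300))
```

Note that the eight ranges map onto only four shard IDs, so each shard receives two of the ranges.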

4) Can CUBRID automatically add Amazon EC2 servers, or do I need to add them myself and tell CUBRID about them?

When you add a new shard server, you need to "register" it with CUBRID in the shard.conf and shard_connection.txt files (the latter is where you provide the connection information).

Once Data Rebalancing is implemented, I suppose this process will no longer require manual configuration.

5) How can I back up and restore a DB? Is it easy, and how long does it take to restore 1TB of data?

If you configure CUBRID HA, you do not need backup/restore. HA replication can be configured in 3 modes: synchronous, semi-synchronous, and asynchronous. In every mode, all your data is kept redundantly on several servers, so there is no need for a separate backup/restore. This is how High Availability is achieved in CUBRID.

If you eventually decide to back up and restore the database manually (which I don't think you will need with CUBRID HA), CUBRID provides backup and restore utilities. I cannot tell you exactly how long it will take to back up and restore 1TB of data because it all depends on your hardware and server load. I can only guess that it may take a few hours.
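As a sanity check on the "few hours" guess, here is a simple Python estimate. The sustained throughput of 100 MB/s is purely an assumed figure, not a CUBRID benchmark, and the helper name transfer_hours is hypothetical; real restores also spend time on log recovery, which this ignores.

```python
def transfer_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Hypothetical helper: hours needed to move size_gb at a steady rate.

    Uses decimal units (1 GB = 1000 MB) and ignores any CPU-bound work.
    """
    seconds = size_gb * 1000 / throughput_mb_per_s
    return seconds / 3600

# 1 TB (1000 GB) at an assumed 100 MB/s is 10,000 seconds, about 2.8 hours.
print(round(transfer_hours(1000, 100), 1))
```

Doubling the throughput halves the estimate, so disk and network speed dominate the answer.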

6) What is the optimal size of each shard (in GB)?

Again, there is no exact answer to this. It depends on your hardware and database schema. When the database gets really big, the index volume in a shard can grow very large, and at that point sharding is recommended to reduce the index volume size. Otherwise, a shard can be as big as your server hardware can handle. In our company we have CUBRID databases with 3.5TB of data each running in an HA environment.

7) What is the recommended environment for running CUBRID (Windows or Linux), and what are the differences, if any?

Linux is, of course, preferable. It is the platform typically used for servers, and CUBRID performs better on Linux than on Windows.

8) Why haven't I heard about CUBRID after two weeks of intensive searching? From what I can tell, it should be super popular.

CUBRID is developed in South Korea, and the rest of the world is so focused on MySQL that not many people know about CUBRID. Here in Korea, CUBRID is very popular. I hope that one day the global open source community will wake up and understand that MySQL is not the only option, and that its community (free) version lacks many enterprise-level features. CUBRID, in contrast, provides HA, Sharding, Online Backup, ACID, true varchar, ANSI SQL, almost full MySQL compatibility, and many other features for free. It just takes some time.

9) Does CUBRID support clustering?

Right now we are in the middle of developing a clustering feature. CUBRID Sharding is the first step. Next we will add support for Data Rebalancing. After that, full support for clustering may be released either as part of CUBRID Database or as a separate CUBRID Cluster Database product.

10) Why should I use CUBRID over Amazon RDS, both feature-wise and price-wise?

Price-wise: the CUBRID software itself will cost you 0 USD, while Amazon RDS is a paid service that may cost a lot depending on your usage. On the other hand, running CUBRID on EC2 will also cost you, and administering CUBRID will require additional resources. With Amazon RDS you do not need to do much administration work: everything is automated for you. In addition, Amazon RDS is part of the AWS family, so it is integrated and plays well with the rest of the ecosystem. This is why you have to pay, sometimes a lot.

But if you are someone who likes to learn through experience, you may appreciate working with CUBRID because it is designed for large Web services. It will take time to learn it, to tune it, and to master it, but I am sure that at the end of the day you will like it. You may also be able to use this experience and knowledge in your future projects.

Let me know if you have any other questions. I will be glad to help you.
