Versions available for this page: CUBRID 9.0.0

Locale Setting

Step 1: Selecting a Locale

CUBRID supports the following locales: en_US, de_DE, es_ES, fr_FR, it_IT, ja_JP, km_KH, ko_KR, tr_TR, vi_VN, zh_CN. The language and country for each locale are shown in the following table.

Locale Name    Language - Country
en_US          English - U.S.A.
de_DE          German - Germany
es_ES          Spanish - Spain
fr_FR          French - France
it_IT          Italian - Italy
ja_JP          Japanese - Japan
km_KH          Khmer - Cambodia
ko_KR          Korean - Korea
tr_TR          Turkish - Turkey
vi_VN          Vietnamese - Vietnam
zh_CN          Chinese - China

This list is stored in $CUBRID/conf/cubrid_locales.all.txt. Specify the desired locales in $CUBRID/conf/cubrid_locales.txt. You can select all or only some of the supported locales.

The LDML files for the supported locales are named cubrid_<locale_name>.xml and they can be found in the $CUBRID/locales/data/ldml directory. If only a subset of these locales are to be supported by CUBRID, one must make sure their corresponding LDML files are present in the $CUBRID/locales/data/ldml folder.

A locale cannot be used by CUBRID unless it has an entry in the cubrid_locales.txt file and a corresponding cubrid_<locale_name>.xml file in the $CUBRID/locales/data/ldml directory.

Locale libraries are generated according to the contents of the $CUBRID/conf/cubrid_locales.txt configuration file. This file contains the language codes of the desired locales (all user-defined locales are generated with the UTF-8 charset). Optionally, the file paths of the LDML file and of the library can also be configured for each locale in this file:

<lang_name> <LDML file>                    <lib file>

ko_KR    /home/CUBRID/locales/data/ldml/cubrid_ko_KR.xml    /home/CUBRID/lib/libcubrid_ko_KR.so

By default, the LDML files are found in $CUBRID/locales/data/ldml and the locale libraries in $CUBRID/lib. The filenames for LDML files are formatted like: cubrid_<lang_name>.xml

The filenames for libraries: libcubrid_<lang_name>.dll (.so for Linux).
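
The requirement above (every locale listed in cubrid_locales.txt needs a matching LDML file) can be checked before compiling with a small shell function. This is only a sketch: check_locale_config is a hypothetical helper, not part of CUBRID, and it assumes one locale per line with the locale name in the first column, an optional LDML path in the second, and that blank lines and lines starting with # carry no locale.

```shell
# Hypothetical helper: report configured locales whose LDML file is missing.
# Usage: check_locale_config <cubrid_locales.txt> <default_ldml_dir>
check_locale_config() {
    conf_file=$1
    ldml_dir=$2
    missing=0
    # Fields per line: <lang_name> [<LDML file>] [<lib file>]
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r lang ldml_path _lib_path || [ -n "$lang" ]; do
        # Assumption: blank lines and '#' lines are not locale entries.
        case $lang in
            ''|'#'*) continue ;;
        esac
        # No explicit path: fall back to the default LDML file name.
        if [ -z "$ldml_path" ]; then
            ldml_path=$ldml_dir/cubrid_$lang.xml
        fi
        if [ ! -f "$ldml_path" ]; then
            echo "missing LDML for $lang: $ldml_path"
            missing=1
        fi
    done < "$conf_file"
    # 0 when every entry has its LDML file, 1 otherwise.
    return "$missing"
}
```

It could be called as check_locale_config "$CUBRID/conf/cubrid_locales.txt" "$CUBRID/locales/data/ldml"; a non-zero exit status means at least one configured locale has no LDML file.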

Step 2: Compiling Locale

Once the requirements described above are met, the locales can be compiled. To do so, run the make_locale utility script (.bat for Windows, .sh for Linux) from the command console. The script is installed in the $CUBRID/bin directory, so it can be found through the PATH environment variable. Here, $CUBRID and $PATH are the environment variables on Linux; %CUBRID% and %PATH% are the environment variables on Windows.

Usage can be displayed by running make_locale.sh -h (make_locale.bat /h in Windows).  

make_locale.sh [OPTIONS] [LOCALE]

 

OPTIONS ::= [-t 32|64 ] [-m debug|release]

LOCALE ::= [de_DE|es_ES|fr_FR|it_IT|ja_JP|km_KH|ko_KR|tr_TR|vi_VN|zh_CN]

  • OPTIONS
    • -t: Selects 32bit or 64bit (default value: 32).
    • -m: Selects release or debug. In general, release is selected (default value: release). The debug mode is provided for developers who would like to write the locale library themselves.
  • LOCALE: The locale name of the library to build. If LOCALE is not specified, the build includes data from all configured locales. In this case, library file is stored in $CUBRID/lib directory with the name of libcubrid_all_locales.so (.dll for Windows).

To create user defined locale shared libraries, two choices are available:

  • Creating a single library with all locales to be supported:
    make_locale.sh                         # Build and pack all locales (32/release)

  • Creating one library for each locale to be supported:
    make_locale.sh -t 64 -m release ko_KR

The first choice is recommended. In this scenario, some data may be shared among locales: a single library supporting all locales is smaller than 15 MB, whereas with the second choice each locale library takes from 1 MB to more than 5 MB. The first choice also avoids the runtime overhead that the second choice incurs when the servers are restarted.
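
The two choices can be wrapped in one shell function, sketched below. The script name and its -t and -m options come from the usage text above; the function name build_locale_libs and the MAKE_LOCALE override variable are hypothetical and exist only so that the commands can be previewed without running the compiler.

```shell
# Hypothetical wrapper around make_locale.sh.
# Usage: build_locale_libs <32|64> <debug|release> [LOCALE...]
# MAKE_LOCALE may be overridden, e.g. MAKE_LOCALE="echo make_locale.sh"
# prints the commands instead of running them (dry run).
build_locale_libs() {
    bits=$1
    mode=$2
    shift 2
    if [ "$#" -eq 0 ]; then
        # No locale given: one shared library with all configured locales.
        ${MAKE_LOCALE:-make_locale.sh} -t "$bits" -m "$mode"
        return
    fi
    # One library per listed locale; stop at the first failure.
    for locale in "$@"; do
        ${MAKE_LOCALE:-make_locale.sh} -t "$bits" -m "$mode" "$locale" || return 1
    done
}
```

For example, MAKE_LOCALE="echo make_locale.sh" build_locale_libs 64 release ko_KR de_DE prints the two commands that would be run, one per locale.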

Procedure of Executing make_locale.sh(.bat) Script

The make_locale.sh(.bat) script performs the following steps:

  1. Reads the LDML file corresponding to a language, along with other installed common data files such as $CUBRID/locales/data/ducet.txt, $CUBRID/locales/data/unicodedata.txt, and $CUBRID/locales/data/codepages/*.txt.
  2. After processing the raw data, writes the C constants and arrays that hold the locale data to a temporary file, $CUBRID/locales/loclib/locale.c.
  3. Passes the temporary locale.c file to the platform compiler to build a .dll/.so file. This step assumes that the machine has a C/C++ compiler and linker installed. Currently, only MS Visual Studio for Windows and gcc for Linux are supported.
  4. Removes the temporary files.

Limitations and Rules

  • Do not change the contents of $CUBRID/conf/cubrid_locales.txt after the locale libraries have been generated; the order of the languages within the file must also be preserved. During locale generation, increasing numeric identifiers are assigned to each newly encountered collation. These identifiers must be coherent at locale loading.
  • Do not change the contents of the $CUBRID/locales/data/*.txt files. All customization should be performed by changing the LDML files.

The locales embedded in CUBRID can be used without compiling a user locale library; that is, the compilation step (Step 2) can be skipped. However, there are two differences between an embedded locale and a library locale:

  • Embedded (built-in) locales and collations are not aware of Unicode data. For instance, casing (lower, upper) of accented letters such as (Á, á) is not available in embedded locales. The LDML locales provide data for Unicode codepoints up to 65535.
  • The embedded collations deal only with the ASCII range or, in the case of 'utf8_tr_cs', only with ASCII and the letters of the Turkish alphabet. Embedded UTF-8 locales are not Unicode compatible, while compiled (LDML) locales are.

Currently, the built-in locales which can be set by CUBRID_LANG environment variable are:

  • en_US.iso88591
  • en_US.utf8
  • ko_KR.utf8
  • ko_KR.euckr
  • ko_KR.iso88591: Uses Romanized Korean names for months and days.
  • tr_TR.utf8
  • tr_TR.iso88591: Uses Romanized Turkish names for months and days.

The order stated above is important; if no charset is given when configuring CUBRID_LANG, the charset of the first locale listed for that language is used. For example, if CUBRID_LANG=ko_KR, the locale is resolved to ko_KR.utf8, the first ko_KR locale in the above list. Locales of languages other than the built-in ones should end with .utf8. For example, specify CUBRID_LANG=de_DE.utf8 for German.

The names of months and days for ko_KR.iso88591 and tr_TR.iso88591 should be Romanized. For example, "일요일" for Korean (Sunday in English) is Romanized to "Iryoil". Only ISO-8859-1 characters may be used.
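
The charset defaulting rule described above can be mirrored in a short shell function. This is an illustrative sketch of the documented rule only, not CUBRID's own code; resolve_cubrid_lang and BUILTIN_LOCALES are hypothetical names. It can help to compute the fully qualified value to put in CUBRID_LANG.

```shell
# Built-in locales in the documented order; the first entry for a
# language supplies the charset when none is given.
BUILTIN_LOCALES="en_US.iso88591 en_US.utf8 ko_KR.utf8 ko_KR.euckr ko_KR.iso88591 tr_TR.utf8 tr_TR.iso88591"

# Hypothetical helper: print the fully qualified locale for a CUBRID_LANG value.
resolve_cubrid_lang() {
    value=$1
    case $value in
        *.*) echo "$value"; return 0 ;;   # charset already given: keep as-is
    esac
    for entry in $BUILTIN_LOCALES; do
        case $entry in
            "$value".*) echo "$entry"; return 0 ;;   # first built-in match wins
        esac
    done
    # Not a built-in language: such locales should end with .utf8.
    echo "$value.utf8"
}
```

With this sketch, resolve_cubrid_lang ko_KR yields ko_KR.utf8 and resolve_cubrid_lang de_DE yields de_DE.utf8, matching the two examples in the text.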

Step 3: Setting CUBRID to Use a Specific Locale

Several locales can be defined, but only one locale can be selected as the default locale, by using the CUBRID_LANG environment variable.

In addition to specifying a default locale, one can override the default calendar settings with the calendar settings from another locale, using the CUBRID_DATE_LANG environment variable.

  • CUBRID_LANG has the format <locale_name>.[utf8 | iso] (e.g. tr_TR.utf8, en_US.ISO, ko_KR.utf8).
  • CUBRID_DATE_LANG has the format <locale_name>. The possible values for <locale_name> are listed above, in Step 1: Selecting a Locale.

By default, if no charset is included in CUBRID_LANG, the ISO charset is assumed.

Step 4: Creating a Database with the Selected Locale Setting

Once the CUBRID_LANG and CUBRID_DATE_LANG environment variables have been set, one can create a new database (or delete and recreate an existing one). When issuing the command “cubrid createdb <db_name>”, a database will be created using the settings in the variables described above.

The charset and locale name are stored in the "db_root" system table. Once a database has been created with a language and charset, these settings cannot be changed.
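
Steps 3 and 4 can be combined in a shell function along the following lines. The environment variables and the cubrid createdb command are taken from the text; the function name create_db_with_locale and the CUBRID_CMD override variable are hypothetical and exist only to make the sketch easy to try without a CUBRID installation.

```shell
# Hypothetical wrapper: create a database under a given locale setting.
# Usage: create_db_with_locale <db_name> <CUBRID_LANG value> [<CUBRID_DATE_LANG value>]
# CUBRID_CMD is an illustrative override hook; it defaults to "cubrid".
create_db_with_locale() {
    db_name=$1
    lang=$2            # e.g. ko_KR.utf8
    date_lang=${3:-}   # optional, e.g. en_US
    (
        # Subshell: the exports do not leak into the calling shell.
        CUBRID_LANG=$lang
        export CUBRID_LANG
        if [ -n "$date_lang" ]; then
            CUBRID_DATE_LANG=$date_lang
            export CUBRID_DATE_LANG
        fi
        ${CUBRID_CMD:-cubrid} createdb "$db_name"
    )
}
```

Because the variables are exported only inside the subshell, this sketch covers database creation alone. For normal operation the same values would typically be set in the environment from which the server, broker, and CSQL are started, so that every process sees an identical setting.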

Step 5 (optional): Manually Verifying the Locale File

The contents of locale libraries can be displayed in human-readable form using the CUBRID dumplocale utility.

Execute cubrid dumplocale -h to output the usage. The syntax is as follows:

cubrid dumplocale [OPTION] [language-string]

 

OPTION ::= [-i|--input-file <shared_lib>] [-d|--calendar] [-n|--numeric] [{-a |--alphabet=}{l|lower|u|upper|both}] [-c|--codepoint-order] [-w|--weight-order] [{-s|--start-value} <starting_codepoint>] [{-e|--end-value} <ending_codepoint>] [-k] [-z]

 

language-string ::= de_DE|es_ES|fr_FR|it_IT|ja_JP|km_KH|ko_KR|tr_TR|vi_VN|zh_CN

  • OPTION
    • -i, --input-file: The name of the locale shared library file (<shared_lib>) created previously.
    • -d, --calendar: Dumps the calendar and date/time data. Default value: No
    • -n, --numeric: Dumps the number data. Default value: No
    • -a, --alphabet=l|lower|u|upper|both: Dumps the alphabet and case data. Default value: No
    • --identifier-alphabet=l|lower|u|upper|both: Dumps the alphabet and case data for the identifier. Default value: No
    • -c, --codepoint-order: Dumps the collation data sorted by the codepoint value (displayed data: cp, char, weight, next-cp, char and weight). Default value: No
    • -w, --weight-order: Dumps the collation data sorted by the weight value (displayed data: weight, cp, char). Default value: No
    • -s, --start-value: Specifies the dump scope. Starting codepoint for -a, --identifier-alphabet, -c, -w options. Default value: 0
    • -e, --end-value: Specifies the dump scope. Ending codepoint for -a, --identifier-alphabet, -c, -w options. Default value: Max value read from the locale shared library.
    • -k, --console-conversion: Dumps the console conversion data. Default value: No
    • -z, --normalization: Dumps the normalization data. Default value: No
  • language-string: Specifies the locale language to dump from the locale shared library. If no value is entered for language-string, all languages listed in cubrid_locales.txt are dumped.

The following example dumps the calendar data, number formatting, alphabet and case data, collation data sorted by codepoint and by weight, and normalization data of the ko_KR locale, redirecting the output to a file:

cubrid dumplocale -d -n -a both -c -w -z ko_KR > ko_KR_dump.txt

It is highly recommended to redirect the console output to a file, as it can exceed 15MB of data, and seeking information could prove to be difficult.

Step 6: Starting CUBRID-Related Processes

All CUBRID-related processes should be started with identical environment settings. The CUBRID server, the broker, CAS, and CSQL should use the same CUBRID_LANG value and the same version of the locale binary file. The same applies to CUBRID HA and CUBRID Shard. For example, in CUBRID HA, the master server, slave server, and replica server should use the same environment variable settings.

There is no check on the compatibility of the locale used by server and CAS (client) process, so the user should make sure the LDML files used are the same.
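
Since CUBRID does not perform this check itself, one way to compare the LDML files on two machines is to compute a fingerprint of the LDML directory on each and compare the results by hand. The sketch below is an assumption-laden illustration, not a CUBRID feature: ldml_fingerprint is a hypothetical helper that relies only on the POSIX cksum utility and on the cubrid_<locale_name>.xml naming described in Step 1.

```shell
# Hypothetical helper: print one checksum covering every LDML file in a directory.
# Usage: ldml_fingerprint <ldml_dir>
ldml_fingerprint() {
    (
        # Fixed collation so the glob expands in the same order on every host.
        LC_ALL=C
        export LC_ALL
        cd "$1" || exit 1
        set -- cubrid_*.xml
        # If nothing matched, the pattern is left unexpanded and is not a file.
        [ -f "$1" ] || { echo "no LDML files in $PWD" >&2; exit 1; }
        # Per-file checksums (with file names), reduced to a single checksum.
        for f in "$@"; do
            cksum "$f"
        done | cksum
    )
}
```

Running ldml_fingerprint "$CUBRID/locales/data/ldml" on the server host and on the CAS host should print the same line if the file names and contents match. The same idea could be applied to the compiled libraries in $CUBRID/lib.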

Locale library loading is one of the first steps in CUBRID start-up. Locale (collation) information is required for initializing database structures (indexes depend on collation).

This process is performed by each CUBRID process that requires locale information: server, CAS, CSQL, createdb, copydb, unloaddb, and loaddb.

The process of loading a locale library is as follows:

  • If no lib path is provided, CUBRID will try to load $CUBRID/lib/libcubrid_<lang_name>.so; if this file is not found, then CUBRID assumes all locales are found in a single library: $CUBRID/lib/libcubrid_all_locales.so.
  • If no suitable locale library can be found, or any other error occurs during loading, the CUBRID process stops.
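
The lookup order above can be sketched as a shell function for Linux (.so) library names. This mirrors the documented order only; find_locale_lib is a hypothetical helper, and treating an explicitly configured path as the sole candidate is an assumption, since the text only describes the case where no path is provided.

```shell
# Hypothetical helper that mirrors the documented lookup order.
# Usage: find_locale_lib <lang_name> [<lib path from cubrid_locales.txt>]
find_locale_lib() {
    lang=$1
    lib_path=${2:-}
    if [ -n "$lib_path" ]; then
        # Assumption: an explicitly configured path is the only candidate.
        if [ -f "$lib_path" ]; then
            echo "$lib_path"
            return 0
        fi
    elif [ -f "$CUBRID/lib/libcubrid_$lang.so" ]; then
        # Per-locale library takes precedence.
        echo "$CUBRID/lib/libcubrid_$lang.so"
        return 0
    elif [ -f "$CUBRID/lib/libcubrid_all_locales.so" ]; then
        # Otherwise all locales are assumed to be in one library.
        echo "$CUBRID/lib/libcubrid_all_locales.so"
        return 0
    fi
    # Nothing usable: a CUBRID process would stop at this point.
    echo "no locale library found for $lang" >&2
    return 1
}
```

Calling find_locale_lib ko_KR before starting the server shows which library file would be picked up, or fails if neither file exists.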

Remark

Setting the Month/Day in Characters, AM/PM, and Number Format

For functions that input and output date/time values, you can set the locale used for the month/day names, the AM/PM markers, and the number format with the intl_date_lang system parameter.

For functions that convert a string to a number or a number to a string, you can set the locale used for the string format with the intl_number_lang system parameter.

The Month/Day in Korean and Turkish Characters for ISO-8859-1 Charset

For Korean or Turkish with the UTF-8 charset, or Korean with the EUC-KR charset, the month/day names and the AM/PM markers are encoded according to the country. With the ISO-8859-1 charset, however, using the Korean or Turkish month/day names and AM/PM markers in their original encoding may cause unexpected behavior in the server process because of their complex representation, so the names should be Romanized. The default charset of CUBRID is ISO-8859-1, and this charset can be used for Korean and Turkish. The Romanized output format is as follows:

Day in Characters

Day in Characters Long/Short Format    Long/Short Romanized Korean    Long/Short Romanized Turkish
Sunday / Sun                           Iryoil / Il                    Pazar / Pz
Monday / Mon                           Woryoil / Wol                  Pazartesi / Pt
Tuesday / Tue                          Hwayoil / Hwa                  Sali / Sa
Wednesday / Wed                        Suyoil / Su                    Carsamba / Ca
Thursday / Thu                         Mogyoil / Mok                  Persembe / Pe
Friday / Fri                           Geumyoil / Geum                Cuma / Cu
Saturday / Sat                         Toyoil / To                    Cumartesi / Ct

Month in Characters

Month in Characters Long/Short Format    Long/Short Romanized Korean (Not Classified)    Long/Short Romanized Turkish
January / Jan                            1wol                                            Ocak / Ock
February / Feb                           2wol                                            Subat / Sbt
March / Mar                              3wol                                            Mart / Mrt
April / Apr                              4wol                                            Nisan / Nsn
May / May                                5wol                                            Mayis / Mys
June / Jun                               6wol                                            Haziran / Hzr
July / Jul                               7wol                                            Temmuz / Tmz
August / Aug                             8wol                                            Agustos / Ags
September / Sep                          9wol                                            Eylul / Eyl
October / Oct                            10wol                                           Ekim / Ekm
November / Nov                           11wol                                           Kasim / Ksm
December / Dec                           12wol                                           Aralik / Arl

AM/PM in Characters

 

      Romanized in Korean    Romanized in Turkish
AM    ojeon                  AM
PM    ohu                    PM