Open Source RDBMS - Seamless, Scalable, Stable and Free

한국어 | Login |Register

Current Events
Join our developers event to win one of the valuable prizes!
posted 2 years ago
viewed 6865 times
Share this article

When you should and should NOT use ENUM data type

enum_data_type.png

ENUM is a new enumerated data type introduced in CUBRID 9.0. Like in all programming languages, the ENUM type is a data type composed of a set of static, ordered values. Users can define numeric and string values for ENUM columns.

Working with ENUM types

Creating an ENUM column is done by specifying a static list of possible values:

CREATE TABLE person(
    name VARCHAR(255),
    gender ENUM('Male', 'Female')
);

CUBRID understands the ENUM type as an ordered set of constants which, in the above example, is a set of {NULL: NULL, 1: 'Male', 2: 'Female”}. To assign a value to the gender column, users may either use the index of the value ({NULL, 1, 2}) or the actual constant literal ({NULL}{'Male'}{'Female'}). CUBRID restricts the values that can be assigned to this column to only values from this set + NULL. Moreover, ENUM column is case-sensitive, i.e. it will raise an error if you try to enter 'female' in lower case. Also, an empty string is allowed if it is defined as one of the elements of the ENUM column. In our examples, it is not allowed.

csql> INSERT INTO person(name, gender) VALUES('Eugene', 'Male');

1 row affected.
1 command(s) successfully processed.
csql> INSERT INTO person(name, gender) VALUES('Anne', 2);

1 row affected.
1 command(s) successfully processed.
csql> SELECT * FROM person;

=== <Result of SELECT Command IN Line 1> ===

  name                  gender
============================================
  'Anne'                'Female'
  'Eugene'              'Male'


2 rows selected.

Any attempt to insert a value outside of the defined set will result in a coercion error. In the below case, trying to insert an empty string raises an error because it is not in the set of allowed values defined in the person table.

csql> INSERT INTO person(name, gender) VALUES('John', 'N/A');

IN line 1, COLUMN 44,
ERROR: before ' ); '
Cannot coerce 'N/A' TO type enum.
0 command(s) successfully processed.

csql> INSERT INTO person(name, gender) VALUES('John', 4);

IN line 1, COLUMN 45,
ERROR: before ' ); '
Cannot coerce 4 TO type enum.
0 command(s) successfully processed.

csql> INSERT INTO person(name, gender) VALUES('John', '');

IN line 1, COLUMN 44,
ERROR: before ' ); '
Cannot coerce '' TO type enum.
0 command(s) successfully processed.

Why you should use the ENUM type

There are three important reasons for which you should consider using the ENUM type:

  • Reduce storage space.
  • Reduce join complexity.
  • Create cheap values constraints.

Storage Space

CUBRID uses only 1 byte per tuple when 255 or less ENUM elements are defined or 2 bytes for 256~65535 elements. This is because, rather that storing the constant literal of the value, CUBRID stores the index in the ordered set of that value. For very large tables, this might prove to be a significant storage space save.

Take, for example, a table with 1,000,000,000 records which has an ENUM column defined as ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'). If you use a VARCHAR type instead of the ENUM type to store these values, the column would require anywhere between 5GB and 9GB of storage space. Using the ENUM type, you can reduce the required space to 2 bytes per tuple, adding up to a total of 2GB.

Reduce join complexity

JOIN way

The same effect of the ENUM type can be achieved by creating a one to many relationship on two or more tables. Considering the example above, you can store values for days of the week like this:

CREATE TABLE days_of_week(
    id SHORT PRIMARY KEY,
    name VARCHAR(9)
);
CREATE TABLE opening_hours(
    week_day SHORT,
    opening_time TIME,
    closing_time TIME,
    FOREIGN KEY fk_dow (week_day) REFERENCES days_of_week(id)
);

Then, when you wish to display the name of the week day, you would execute a query like:

SELECT d.name day_name, o.opening_time, o.closing_time
FROM days_of_week d, opening_hours o
WHERE d.id = o.week_day
ORDER BY d.id;

=== <Result of SELECT Command IN Line 4> ===
  day_name              opening_time  closing_time
==================================================
  'Monday'              09:00:00 AM   06:00:00 PM
  'Tuesday'             09:00:00 AM   06:00:00 PM
  'Wednesday'           09:00:00 AM   06:00:00 PM
  ...

ENUM way

You can achieve the same effect using an ENUM column:

CREATE TABLE opening_hours(
    week_day ENUM ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'),
    opening_time TIME,
    closing_time TIME
);

And there’s no JOIN required to select opening hours:

SELECT week_day, opening_time, closing_time
FROM opening_hours
ORDER BY week_day;
=== <Result of SELECT Command IN Line 1> ===

  week_day              opening_time  closing_time
==================================================
  'Monday'              09:00:00 AM   06:00:00 PM
  'Tuesday'             09:00:00 AM   06:00:00 PM
  'Wednesday'           09:00:00 AM   06:00:00 PM
  ...

This can prove to be very useful, especially if your queries join several tables.

Value constraints

ENUM columns behave like foreign key relationships in the sense that values from an ENUM column are restricted to the values specified in the column definition. For a short list of values, this is more efficient than creating foreign key relationships. While foreign key relationships use index scans to enforce the restriction, ENUM columns just have to go through a list of predefined values which is faster even for small indexes.

Why/When you should NOT use the ENUM type

Even though ENUM is a great feature, there are cases when you’d better not use it. For example:

  • When ENUM type is not fixed
  • When ENUM type has a long list of values
  • When your application does not know the list of ENUM values
  • ENUM type is not reusable
  • Portability is a concern

When ENUM type is not fixed

If you’re not sure if the ENUM type holds all possible values for that column, you should consider using a one to many relationship instead. The only way in which an ENUM column can be changed to handle more values is by using an ALTER statement. This is a very expensive operation in any RDBMS and requires administrator rights. Also, ALTER statements are maintenance operations and should, as much as possible, be performed offline.

When ENUM type has a long list of values

ENUM types should not be used if you cannot limit a set of possible values to a few elements.

When your application does not know the list of ENUM values

There are only two ways of getting a list of values you have defined for an ENUM type:

  • parsing the output of SHOW CREATE TABLE statement:

    csql> SHOW CREATE TABLE opening_hours;
    
    === <Result of SELECT Command IN Line 1> ===
    
      TABLE                 CREATE TABLE
    ============================================
    'opening_hours'       'CREATE TABLE [opening_hours] ([week_day] ENUM('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), [opening_time] TIME, [closing_time] TIME)

  • selecting information from CUBRID system tables:

    csql> SELECT d.enumeration
    FROM _db_domain d, _db_attribute a
    WHERE a.attr_name = 'week_day'
    AND a.class_of.class_name = 'opening_hours'
    AND d IN a.domains;
    
    === <Result of SELECT Command IN Line 1> ===
    enumeration
    ======================
    {'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'}

Both might require complex coding and selecting from system tables requires administrator privileges.

ENUM type is not reusable

If you have several tables which require the names of week days, you will have to create an ENUM type for each of them. If you create a table to hold week days names, you can join this table with whichever other table that requires this information.

Portability is a concern

The ENUM type is only supported by a few RDBMSs and each one has its own idea as to how ENUM type is supposed to work. Below is a list of a few notable differences between CUBRID, MySQL and PostgreSQL:

CUBRID PostgreSQL MySQL
Inserting out of range value Throws error Throws error Inserts special value index 0
Comparing to char literals Compare as strings Compare as ENUM elements Compare as strings
Comparing to values outside of the ENUM domain Compare as strings Throws error Compare as strings

These subtle differences will most probably break your application in interesting and hard to understand ways. If you’re migrating from PostgreSQL to CUBRID for example, and you expect comparisons with char literals to be performed as ENUM comparisons, you’ll have a hard time understanding why your query returns weird results.



comments powered by Disqus