Regular Expressions Functions and Operators

A regular expression is a powerful way to specify a pattern for a complex search. The functions and operators described in this section performs regular expression matching on a string.

Regular Expressions Pattern Syntax

To implement regular expressions, CUBRID uses the Google RE2 library and the C++ <regex> standard library. You can select the regular expression library to use by setting the regexp_engine system parameter, and the Google RE2 library is used as a default.

The following sub-sections describes supported regular expression grammars with several examples.

Note

Compatibility Considerations

  • In the prior version of CUBRID 11, CUBRID used Henry Spencer’s implementation of regular expressions.
  • From the CUBRID 11.0, CUBRID removes the Henry Spencer library and uses C++ <regex> standard library to support regular expression functions and operators.
  • From the CUBRID 11.2, CUBRID has added the Google RE2 library. Either C++ <regex> or Google RE2 can be used by setting the system parameter.

1. The Henry Spencer’s implementation of regular expressions operates in byte-wise fashion. So the REGEXP and RLIKE were not multibyte safe, they only worked as ASCII encoding without considering the collation of operands.

2. The Spencer library supports the POSIX collating sequence expressions ([.character.]). But it does not support anymore. Also, character equivalents ([=word=]) does not support. CUBRID occurs an error when these collating element syntax is given.

3. The Spencer library matches line-terminator characters for the dot operator (.). But it does not.

4. The word-beginning and word-end boundary ([[:<:]] and [[:>:]]) doesn’t support anymore. Instead, the word boundary notation (\b) can be used.

Warning

C++ <regex> library caveats

There is an issue with the C++ <regex> library that can cause excessive recursive calls when the input string is long or the regular expression pattern is complex. Therefore, it is recommended to use Google RE2 instead of C++ <regex>. C++ <regex> is reserved for backwards compatibility and has been deprecated.

Note

Multibyte Character Comparision Considerations

C++ <regex> performs multibyte comparision by C++ <locale> standard dependent on system-supplied locales. Therefore, system locale should be installed on your system for locale-sensitive functions.

Special Pattern Characters

Special pattern characters are characters (or sequences of characters) that have a special meaning when they appear in a regular expression pattern, either to represent a character that is difficult to express in a string, or to represent a category of characters. Each of these special pattern characters is matched in a string against a single character (unless a quantifier specifies otherwise).

Characters Description
. Any character except line terminators (LF, CR, LS, PS).
\t A horizontal tab character (same as \u0009).
\n A newline (line feed) character (same as \u000A).
\v A vertical tab character (same as \u000B).
\f A form feed character (same as \u000C).
\r A carriage return character (same as \u000D)
\cletter A control code character whose code unit value is the same as the remainder of dividing the code unit value of letter by 32.
\xhh A a character whose code unit value has an hex value equivalent to the two hex digits hh.
\uhhhh A character whose code unit value has an hex value equivalent to the four hex digits hhhh. Only supported in C++ <regex>
\x{digits} A hex character code corresponds to the value of digits Only supported in Google RE2
\0 A null character (same as \u0000). Only supported in C++ <regex>
\num The result of the submatch whose opening parenthesis is the num-th. See groups below for more info.
\d A decimal digit character (same as [[:digit:]]).
\D Any character that is not a decimal digit character (same as [^[:digit:]]).
\s A whitespace character (same as [[:space:]]).
\S Any character that is not a whitespace character (same as [^[:space:]]).
\w An alphanumeric or underscore character (same as [_[:alnum:]]).
\W Any character that is not an alphanumeric or underscore character (same as [^_[:alnum:]]).
\character
The character character as it is, without interpreting its special meaning within a regex expression.
Any character can be escaped except those which form any of the special character sequences above.
Needed for: ^ $ \ . * + ? ( ) [ ] { } |
[class] A string is part of the class. see POSIX-based character classes.
[^class] A string is not part of the class. see POSIX-based character classes.
\p{class} A string is range of Unicode range of class. see Unicode Character Class.
-- .: match any character
SELECT ('cubrid dbms' REGEXP '^c.*$');
('cubrid dbms' regexp '^c.*$')
================================
  1

To match special characters such as “\n”, “\t”, “\r”, and “\”, some must be escaped with the backslash (\) by specifying the value of no_backslash_escapes (default: yes) to no. For details on no_backslash_escapes, see Escape Special Characters.

-- \n : match a special character, when no_backslash_escapes=yes (default)
SELECT ('new\nline' REGEXP 'new\\nline');
('new\nline' REGEXP 'new\\nline');
=====================================
  1
-- \n : match a special character, when no_backslash_escapes=no
SELECT ('new\nline' REGEXP 'new
line');
('new
line' regexp 'new
line')
=====================================
  0

Quantifiers

Quantifiers follow a character or a special pattern character. They can modify the amount of times that character is repeated in the match:

Characters Description
* The preceding is matched 0 or more times.
+ The preceding is matched 1 or more times.
? The preceding is optional (matched either 0 times or once).
{num} The preceding is matched exactly num times.
{num,} The preceding is matched num or more times.
{min,max} The preceding is matched at least min times, but not more than max.
-- a+ : match any sequence of one or more a characters. case insensitive.
SELECT ('Aaaapricot' REGEXP '^A+pricot');
('Aaaapricot' regexp '^A+pricot')
================================
  1
-- a? : match either zero or one a character.
SELECT ('Apricot' REGEXP '^Aa?pricot');
('Apricot' regexp '^Aa?pricot')
==========================
  1
SELECT ('Aapricot' REGEXP '^Aa?pricot');
('Aapricot' regexp '^Aa?pricot')
===========================
  1
SELECT ('Aaapricot' REGEXP '^Aa?pricot');
('Aaapricot' regexp '^Aa?pricot')
============================
  0
-- (cub)* : match zero or more instances of the sequence abc.
SELECT ('cubcub' REGEXP '^(cub)*$');
('cubcub' regexp '^(cub)*$')
==========================
  1

By default, all these quantifiers perform in a greedy way which takes as many characters that meet the condition as possible. And this behavior can be overridden to non-greedy by adding a question mark (?) after the quantifier.

-- (a+), (a+?) : match with quantifiers performs greedy and ungreedy respectively.
SELECT REGEXP_SUBSTR ('aardvark', '(a+)'), REGEXP_SUBSTR ('aardvark', '(a+?)');
regexp_substr('aardvark', '(a+)')  regexp_substr('aardvark', '(a+?)')
============================================
  'aa'                  'a'

Groups

Groups allow to apply quantifiers to a sequence of characters (instead of a single character). There are two kinds of groups:

Warning

a backreference syntax using $int is supported only in C++ <regex>.

Characters Description
(subpattern) Group which creates a backreference.
(?:subpattern) Passive group which does not create a backreference.
-- The captured group can be referenced with $int
SELECT REGEXP_REPLACE ('hello cubrid','([[:alnum:]]+)','$1!');
regexp_replace('hello cubrid','([[:alnum:]]+)','$1!')
==========================
  'hello! cubrid!'

When a group creates a backreference, the characters that represent the subpattern in a string are stored as a submatch. Each submatch is numbered after the order of appearance of their opening parenthesis (the first submatch is number 1, the second is number 2, and so on…). These submatches can be used in the regular expression itself to specify that the entire subpattern should appear again somewhere else (see int in the special characters list). They can also be used in the replacement string in the REGEXP_REPLACE function.

-- performs regexp_substr without groups. the following is the case that fully matched.
SELECT REGEXP_SUBSTR ('abckabcjabc', '[a-c]{3}k[a-c]{3}j[a-c]{3}');

-- ([a-c]{3}) creates a backreference, \1
SELECT REGEXP_SUBSTR ('abckabcjabc', '([a-c]{3})k\1j\1');
regexp_substr('abckabcjabc', '[a-c]{3}k[a-c]{3}j[a-c]{3}')
======================
  'abckabcjabc'

regexp_substr('abckabcjabc', '([a-c]{3})k\1j\1')
======================
  'abckabcjabc'

Assertions

Assertions are conditions that do not consume characters in a string: they do not describe a character, but a condition that must be fulfilled before or after a character.

Characters Description
^ The beginning of a string, or follows a line terminator
$ The end of a string, or precedes a line terminator

Note

The following syntax is supported only in C++ <regex>.

Characters Description
\b The previous character is a word character and the next is a non-word character (or vice-versa).
\B The previous and next characters are both word characters or both are non-word characters.
(?=subpattern) Positive lookahead. The characters following the character must match subpattern, but no characters are consumed.
(?!subpattern) Negative lookahead. The characters following the assertion must not match subpattern, but no characters are consumed.
-- ^ : match the beginning of a string
SELECT ('cubrid dbms' REGEXP '^cub');
('cubrid dbms' regexp '^cub')
===============================
  1
-- $ : match the end of a string
SELECT ('this is cubrid dbms' REGEXP 'dbms$');
('this is cubrid dbms' regexp 'dbms$')
========================================
  1
-- (?=subpattern): positive lookahead
SELECT REGEXP_REPLACE ('cubrid dbms cubrid sql cubrid rdbms', 'cubrid(?= sql)', 'CUBRID');

-- (?!subpattern): nagative lookahead
SELECT REGEXP_REPLACE ('cubrid dbms cubrid sql cubrid rdbms', 'cubrid(?! sql)', 'CUBRID');
regexp_replace('cubrid dbms cubrid sql cubrid rdbms', 'cubrid(?= sql)', 'CUBRID')
======================
  'cubrid dbms CUBRID sql cubrid rdbms'

regexp_replace('cubrid dbms cubrid sql cubrid rdbms', 'cubrid(?! sql)', 'CUBRID')
======================
  'CUBRID dbms cubrid sql CUBRID rdbms'

Alternatives

A pattern can include different alternatives:

Characters Description
| Separates two alternative patterns or subpatterns.
-- a|b : matches any character that is either a or b.
SELECT ('a' REGEXP 'a|b');
SELECT ('d' REGEXP 'a|b');
('a' regexp 'a|b')
==============================
  1

('d' regexp 'a|b')
==============================
  0

A regular expression can contain multiple alternative patterns simply by separating them with the separator operator (|): The regular expression will match if any of the alternatives match, and as soon as one does. Subpatterns (in groups or assertions) can also use the separator operator to separate different alternatives.

-- a|b|c : matches any character that is either a, b or c.
SELECT ('a' REGEXP 'a|b|c');
SELECT ('d' REGEXP 'a|b|c');
('a' regexp 'a|b|c')
==============================
  1

('d' regexp 'a|b|c')
==============================
  0

Character classes

Character classes syntax matches one of characters or a category of characters within square brackets.

Individual characters

Any character specified is considered part of the class (except the characters \, [, ]).

-- [abc] : matches any character that is either a, b or c.
SELECT ('a' REGEXP '[abc]');
SELECT ('d' REGEXP '[abc]');
('a' regexp '[abc]')
==============================
  1

('d' regexp '[abc]')
==============================
  0

Ranges

To represent a range of characters, use the dash character (-) between two valid characters. For example, “[a-z]” matches any alphabet letter whereas “[0-9]” matches any single number.

SELECT ('adf' REGEXP '[a-f]');
SELECT ('adf' REGEXP '[g-z]');
('adf' regexp '[a-f]')
================================
  1

('adf' regexp '[g-z]')
================================
  0
-- [0-9]+: matches number sequence in a string
SELECT REGEXP_SUBSTR ('aas200gjb', '[0-9]+');
regexp_substr('aas200gjb', '[0-9]+')
======================
  '200'
SELECT ('strike' REGEXP '^[^a-dXYZ]+$');
('strike' regexp '^[^a-dXYZ]+$')
================================
  1

POSIX-based character classes

The POSIX-based character class ([:classname:]) defines categories of characters as shown below.

Note

Google RE2 matches only ASCII characters, and C++ <regex> matches Unicode characters as well.

Class Description
[:alnum:] Alpha-numerical character
[:alpha:] Alphabetic character
[:blank:] Blank character
[:cntrl:] Control character
[:digit:] Decimal digit character
[:graph:] Character with graphical representation
[:lower:] Lowercase letter
[:print:] Printable character
[:punct:] Punctuation mark character
[:space:] Whitespace character
[:upper:] Uppercase letter
[:xdigit:] Hexadecimal digit character

[:d:], [:w:] and [:s:] are an extension to the ECMAScript grammar. Only available in C++ <regex>.

Class Description
[:d:] Decimal digit character (0-9)
[:w:] Word character
[:s:] Whitespace character
SELECT REGEXP_SUBSTR ('Samseong-ro 86-gil, Gangnam-gu, Seoul 06178', '[[:digit:]]{5}');
regexp_substr('Samseong-ro 86-gil, Gangnam-gu, Seoul 06178', '[[:digit:]]{5}')
======================
  '06178'
-- ;set regexp_engine=cppstd
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_REPLACE ('가나다 가나 가나다라', '\b([[:alpha:]]{2})\b', '#');
regexp_replace('가나다 가나 가나다라' collate utf8_ko_cs, '\b([[:alpha:]]{2})\b' collate utf8_ko_cs, '#' collate utf8_ko_cs)
======================
'가나다 # 가나다라'

Unicode Character Class

Unicode character classes (\p{classname}) are only supported by Google RE2. The Unicode class name can be specified as shown in the table below. For example, to match Korean characters, you can use \p{Hangul}.

Unicode Character Class
Adlam
Ahom
Anatolian_Hieroglyphs
Arabic
Armenian
Avestan
Balinese
Bamum
Bassa_Vah
Batak
Bengali
Bhaiksuki
Bopomofo
Brahmi
Braille
Buginese
Buhid
Canadian_Aboriginal
Carian
Caucasian_Albanian
Chakma
Cham
Cherokee
Chorasmian
Common
Coptic
Cuneiform
Cypriot
Cypro_Minoan
Cyrillic
Deseret
Devanagari
Dives_Akuru
Dogra
Duployan
Egyptian_Hieroglyphs
Elbasan
Elymaic
Ethiopic
Georgian
Glagolitic
Gothic
Grantha
Greek
Gujarati
Gunjala_Gondi
Gurmukhi
Han
Hangul
Hanifi_Rohingya
Hanunoo
Hatran
Hebrew
Hiragana
Imperial_Aramaic
Inherited
Inscriptional_Pahlavi
Inscriptional_Parthian
Javanese
Kaithi
Kannada
Katakana
Kawi
Kayah_Li
Kharoshthi
Khitan_Small_Script
Khmer
Khojki
Khudawadi
Lao
Latin
Lepcha
Limbu
Linear_A
Linear_B
Lisu
Lycian
Lydian
Mahajani
Makasar
Malayalam
Mandaic
Manichaean
Marchen
Masaram_Gondi
Medefaidrin
Meetei_Mayek
Mende_Kikakui
Meroitic_Cursive
Meroitic_Hieroglyphs
Miao
Modi
Mongolian
Mro
Multani
Myanmar
Nabataean
Nag_Mundari
Nandinagari
New_Tai_Lue
Newa
Nko
Nushu
Nyiakeng_Puachue_Hmong
Ogham
Ol_Chiki
Old_Hungarian
Old_Italic
Old_North_Arabian
Old_Permic
Old_Persian
Old_Sogdian
Old_South_Arabian
Old_Turkic
Old_Uyghur
Oriya
Osage
Osmanya
Pahawh_Hmong
Palmyrene
Pau_Cin_Hau
Phags_Pa
Phoenician
Psalter_Pahlavi
Rejang
Runic
Samaritan
Saurashtra
Sharada
Shavian
Siddham
SignWriting
Sinhala
Sogdian
Sora_Sompeng
Soyombo
Sundanese
Syloti_Nagri
Syriac
Tagalog
Tagbanwa
Tai_Le
Tai_Tham
Tai_Viet
Takri
Tamil
Tangsa
Tangut
Telugu
Thaana
Thai
Tibetan
Tifinagh
Tirhuta
Toto
Ugaritic
Vai
Vithkuqi
Wancho
Warang_Citi
Yezidi
Yi
Zanabazar_Square
-- ;set regexp_engine=re2
SELECT REGEXP_COUNT('가나 가나다라 마바사아 자차카타 파하', '\p{Hangul}+');
regexp_count(_utf8'가나 가나다라 마바사아 자차카타 파하' collate utf8_ko_cs, _utf8'\p{Hangul}+' collate utf8_ko_cs)
==============================
5

REGEXP, RLIKE

The REGEXP and RLIKE are used interchangeably. It performs a regular expression matcinh of a string. In the below syntax, if expression matches pattern, 1 is returned; otherwise, 0 is returned. If either expression or pattern is NULL, NULL is returned. The second syntax has the same meaning as the third syntax, which both syntaxes are using NOT.

expression REGEXP | RLIKE [BINARY] pattern
expression NOT REGEXP | RLIKE pattern
NOT (expression REGEXP | RLIKE pattern)
  • expression : Column or input expression
  • pattern : Pattern used in regular expressions

The difference between REGEXP and LIKE are as follows:

  • The LIKE operator succeeds only if the pattern matches the entire value.
  • The REGEXP operator succeeds if the pattern matches anywhere in the value. To match the entire value, you should use “^” at the beginning and “$” at the end.
  • The LIKE operator is case sensitive, but patterns of regular expressions in REGEXP is not case sensitive. To enable case sensitive, you should use REGEXP BINARY statement.
-- [a-dX] : matches any character that is either a, b, c, d or X.
SELECT ('aXbc' REGEXP '[a-dX]');
('aXbc' regexp '[a-dX]')
==============================
1
-- When REGEXP is used in SELECT list, enclosing this with parentheses is required.
-- But used in WHERE clause, no need parentheses.
-- case insensitive, except when used with BINARY.
SELECT name FROM public.athlete where name REGEXP '^[a-d]';
name
======================
'Dziouba Irina'
'Dzieciol Iwona'
'Dzamalutdinov Kamil'
'Crucq Maurits'
'Crosta Daniele'
'Bukovec Brigita'
'Bukic Perica'
'Abdullayev Namik'

REGEXP_COUNT

REGEXP_COUNT(string, pattern_string[, position[, match_type]])

The REGEXP_COUNT function returns the number of occurrences of the regular expression pattern, pattern_string, within a given character string, string. If NULL is specified as an argument, NULL is returned.

Parameters:
  • string – Specifies the original string. If the value is NULL, NULL is returned.
  • pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
  • position – Specifies the position of the string to start the search. If the value is omitted, the default value 1 is applied. If the value is negative or zero, an error will be returned. If the value is NULL, NULL is returned
  • match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
Return type:

INT

-- it returns NULL when an argument is specified with NULL value
SELECT REGEXP_COUNT('ab123ab111a','[a-d]+',NULL);
regexp_count('ab123ab111a','[a-d]+',NULL)
======================
  NULL
-- an empty string pattern doesn't match with any string
SELECT REGEXP_COUNT('ab123ab111a','');
regexp_count('ab123ab111a','')
======================
  0
SELECT REGEXP_COUNT('ab123Ab111aAA','[a-d]', 3);
regexp_count('ab123Ab111aAA', '[a-d]', 3)
===========================================
                                        5
-- case insensitive ('i') is the default value
SELECT REGEXP_COUNT('ab123Ab111aAA','[a-d]', 3, 'i');

-- If case sensitive ('c') is specified as match_type, A is not matched.
SELECT REGEXP_COUNT('ab123Ab111aAA','[a-d]', 3, 'c');
regexp_count('ab123Ab111aAA', '[a-d]', 3, 'i')
================================================
                                             5

regexp_count('ab123Ab111aAA', '[a-d]', 3, 'c')
================================================
                                             2
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_COUNT('가나123abc가다abc가가','[가-나]+');
regexp_count('가나123abc가다abc가가' collate utf8_ko_cs, '[가-나]+' collate utf8_ko_cs)
=================================================================================================
                                                                                                3

REGEXP_INSTR

REGEXP_INSTR(string, pattern_string[, position[, occurrence[, return_option[, match_type]]]])

The REGEXP_INSTR function returns the beginning or ending position by searching for a regular expression pattern, pattern_string, within a given character string, string. If NULL is specified as an argument, NULL is returned.

Parameters:
  • string – Specifies the original string. If the value is NULL, NULL is returned.
  • pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
  • position – Specifies the position of the string to start the search. If the value is omitted, the default value 1 is applied. If the value is negative or zero, an error will be returned. If the value is NULL, NULL is returned
  • occurrence – Specifies the occurrence of the match to use. If the value is omitted, the default value 1 is applied. If the value is negative, an error will be returned. If the value is NULL, NULL is returned.
  • return_option – Specifies whether to return the position of the start or end of the matched string. If the value is 0, the position of the first character of the match is returned. If the value is 0, the starting position of the matched string is returned. If the value is 1, the end position of the matched string is returned. If the value is other than 0 or 1, an error will be returned. If the value is NULL, NULL is returned.
  • match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
Return type:

INT

-- it returns NULL when an argument is specified with NULL value
SELECT REGEXP_INSTR('12345abcdeabcde','[abc]',NULL);
regexp_instr('12345abcdeabcde', '[abc]', null)
======================
  NULL
-- an empty string pattern doesn't match with any string
SELECT REGEXP_INSTR('12345abcdeabcde','');
regexp_instr('12345abcdeabcde', '')
======================
  0
-- it returns the position of the first character of the match.
SELECT REGEXP_INSTR('12354abc5','[:alpha:]+',1,1,0);
regexp_instr('12354abc5','[:alpha:]+', 1, 1, 0);
======================
  6
-- it returns the position of the character following the match.
SELECT REGEXP_INSTR('12354abc5','[:alpha:]+',1,1,1);
regexp_instr('12354abc5','[:alpha:]+', 1, 1, 1);
======================
  7
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_INSTR('12345가나다라마가나다라마바','[가-다]+');
regexp_instr('12345가나다라마가나다라마바' collate utf8_ko_cs, '[가-다]+' collate utf8_ko_cs)
============================================================================================================
                                                                                                        6

REGEXP_LIKE

REGEXP_LIKE(string, pattern_string[, match_type])

The REGEXP_LIKE function searches for a regular expression pattern, pattern_string, within a given character string, string. If the pattern matched anywhere in the string, 1 is returned. Otherwise, 0 is returned. If NULL is specified as an argument, NULL is returned.

Parameters:
  • string – Specifies the original string. If the value is NULL, NULL is returned.
  • pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
  • match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
Return type:

INT

SELECT REGEXP_LIKE('abbbbc','ab+c');
regexp_like('abbbbc', 'ab+c');
======================
  1
-- an empty string pattern doesn't match with any string
SELECT REGEXP_LIKE('abbbbc','');
regexp_like('abbbbc', '');
======================
  0
SELECT REGEXP_LIKE('abbbbc','AB+C', 'c');
regexp_like('abbbbc', 'AB+C');
======================
  0
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_LIKE('가나다','가나?다');
SELECT REGEXP_LIKE('가나라다','가나?다');
regexp_like('가나다' collate utf8_ko_cs, '가나?다' collate utf8_ko_cs)
==============================================================================
                                                                            1

regexp_like('가나라다' collate utf8_ko_cs, '가나?다' collate utf8_ko_cs)
=================================================================================
                                                                                0

REGEXP_REPLACE

REGEXP_REPLACE(string, pattern_string, replacement_string[, position[, occurrence[, match_type]]])

The REGEXP_REPLACE function searches for a regular expression pattern, pattern_string, within a given character string, string, and replaces it with a character string, replacement_string. If NULL is specified as an argument, NULL is returned.

Parameters:
  • string – Specifies the original string. If the value is NULL, NULL is returned.
  • pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
  • replacement_string – Specifies the string to replace the matched string by pattern_string. If the value is NULL, NULL is returned.
  • position – Specifies the position of the string to start the search. If the value is omitted, the default value 1 is applied. If the value is negative or zero, an error will be returned. If the value is NULL, NULL is returned
  • occurrence – Specifies the occurrence of the pattern to use. If the value is omitted, the default value 0 is applied. If the value is negative, an error will be returned. If the value is NULL, NULL is returned.
  • match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
Return type:

STRING

-- it returns NULL when an argument is specified with NULL value
SELECT REGEXP_REPLACE('12345abcdeabcde','[a-d]',NULL);
regexp_replace('12345abcdeabcde', '[a-d]', null)
======================
NULL
-- an empty string pattern doesn't match with any string
SELECT REGEXP_REPLACE('12345abcdeabcde','','#');
regexp_replace('12345abcdeabcde', '', '#')
======================
  '12345abcdeabcde'
SELECT REGEXP_REPLACE('12345abDEKBcde','[a-d]','#');
regexp_replace('12345abDEKBcde', '[a-d]', '#')
======================
  '12345###EK###e'
-- case insensitive ('i') is the default value
SELECT REGEXP_REPLACE('12345abDEKBcde','[a-d]','#', 1, 0, 'i');

-- match_type is specified as case sensitive ('c'). 'B' and 'D' are not matched.
SELECT REGEXP_REPLACE('12345abDEKBcde','[a-d]','#', 1, 0, 'c');
regexp_replace('12345abDEKBcde', '[a-d]', '#', 1, 0, 'i')
======================
  '12345###EK###e'


regexp_replace('12345abDEKBcde', '[a-d]', '#', 1, 0, 'c')
======================
  '12345##DEKB##e'
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_REPLACE('a1가b2나다라','[가-다]','#',6);
regexp_replace('a1가b2나다라' collate utf8_ko_cs, '[가-다]' collate utf8_ko_cs, '#' collate utf8_ko_cs, 6)
======================
'a1가b2##라'

REGEXP_SUBSTR

REGEXP_SUBSTR(string, pattern_string[, position[, occurrence[, match_type]]])

The REGEXP_SUBSTR function extracts a character string matched for a regular expression pattern, pattern_string, within a given character string, string. If NULL is specified as an argument, NULL is returned.

Parameters:
  • string – Specifies the original string. If the value is NULL, NULL is returned.
  • pattern_string – Specifies the regular expression pattern string to be searched. If the value is NULL, NULL is returned.
  • position – Specifies the position of the string to start the search. If the value is omitted, the default value 1 is applied. If the value is negative or zero, an error will be returned. If the value is NULL, NULL is returned
  • occurrence – Specifies the occurrence of the pattern to use. If the value is omitted, the default value 0 is applied. If the value is negative, an error will be returned. If the value is NULL, NULL is returned.
  • match_type – Specifies the string to change default matching behavior of the function. If the value is omitted, the default value ‘i’ is applied. If the value is other than ‘c’ or ‘i’, an error will be returned. If the value is NULL, NULL is returned.
Return type:

STRING

-- if pattern is not matched, null is returned
SELECT REGEXP_SUBSTR('12345abcdeabcde','[k-z]+');
regexp_substr('12345abcdeabcde','[k-z]+');
======================
  NULL
-- an empty string pattern doesn't match with any string
SELECT REGEXP_SUBSTR('12345abcdeabcde','');
regexp_substr('12345abcdeabcde', '')
======================
  NULL
SELECT REGEXP_SUBSTR('Samseong-ro, Gangnam-gu, Seoul',',[^,]+,');
regexp_substr('Samseong-ro, Gangnam-gu, Seoul', ',[^,]+,')
======================
  ', Gangnam-gu,'
SET NAMES utf8 COLLATE utf8_ko_cs;
SELECT REGEXP_SUBSTR('삼성로, 강남구, 서울특별시','\p{Hangul}+',1,2);
regexp_substr('삼성로, 강남구, 서울특별시' collate utf8_ko_cs, '\p{Hangul}+' collate utf8_ko_cs, 1, 2)
======================
'강남구'