All functions in this section search case-sensitively by default. Case-insensitive search is usually provided by separate function variants.
Note
Case-insensitive search follows the lowercase-uppercase rules of the English language. E.g. Uppercased i in the English language is
I whereas in the Turkish language it is İ - results for languages other than English may be unexpected.
Functions in this section also assume that the searched string (referred to in this section as haystack) and the search string (referred to in this section as needle) are single-byte encoded text. If this assumption is
violated, no exception is thrown and results are undefined. Search with UTF-8 encoded strings is usually provided by separate function
variants. Likewise, if a UTF-8 function variant is used and the input strings are not UTF-8 encoded text, no exception is thrown and the
results are undefined. Note that no automatic Unicode normalization is performed, however you can use the
normalizeUTF8*() functions for that.
General strings functions and functions for replacing in strings are described separately.
Note
The documentation below is generated from the system.functions system table.
countMatches
Introduced in: v21.1.0
Returns number of matches of a regular expression in a string.
Version dependent behavior
The behavior of this function depends on the ClickHouse version:
- in versions < v25.6, the function stops counting at the first empty match even if a pattern accepts.
- in versions >= 25.6, the function continues execution when an empty match occurs. The legacy behavior can be restored using setting
count_matches_stop_at_empty_match = true;
Syntax
countMatches(haystack, pattern)
Arguments
Returned value
Returns the number of matches found. UInt64
Examples
Count digit sequences
SELECT countMatches('hello 123 world 456 test', '[0-9]+')
┌─countMatches('hello 123 world 456 test', '[0-9]+')─┐
│ 2 │
└─────────────────────────────────────────────────────┘
countMatchesCaseInsensitive
Introduced in: v21.1.0
Like countMatches but performs case-insensitive matching.
Syntax
countMatchesCaseInsensitive(haystack, pattern)
Arguments
haystack— The string to search in.Stringpattern— Regular expression pattern.const String
Returned value
Returns the number of matches found. UInt64
Examples
Case insensitive count
SELECT countMatchesCaseInsensitive('Hello HELLO world', 'hello')
┌─countMatchesCaseInsensitive('Hello HELLO world', 'hello')─┐
│ 2 │
└───────────────────────────────────────────────────────────┘
countSubstrings
Introduced in: v21.1.0
Returns how often a substring needle occurs in a string haystack.
Syntax
countSubstrings(haystack, needle[, start_pos])
Arguments
haystack— String in which the search is performed. String or Enum. -needle— Substring to be searched. String. -start_pos— Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returned value
The number of occurrences. UInt64
Examples
Usage example
SELECT countSubstrings('aaaa', 'aa');
┌─countSubstrings('aaaa', 'aa')─┐
│ 2 │
└───────────────────────────────┘
With start_pos argument
SELECT countSubstrings('abc___abc', 'abc', 4);
┌─countSubstrings('abc___abc', 'abc', 4)─┐
│ 1 │
└────────────────────────────────────────┘
countSubstringsCaseInsensitive
Introduced in: v21.1.0
Like countSubstrings but counts case-insensitively.
Syntax
countSubstringsCaseInsensitive(haystack, needle[, start_pos])
Arguments
haystack— String in which the search is performed.StringorEnumneedle— Substring to be searched.Stringstart_pos— Optional. Position (1-based) inhaystackat which the search starts.UInt*
Returned value
Returns the number of occurrences of the neddle in the haystack. UInt64
Examples
Usage example
SELECT countSubstringsCaseInsensitive('AAAA', 'aa');
┌─countSubstri⋯AAA', 'aa')─┐
│ 2 │
└──────────────────────────┘
With start_pos argument
SELECT countSubstringsCaseInsensitive('abc___ABC___abc', 'abc', 4);
┌─countSubstri⋯, 'abc', 4)─┐
│ 2 │
└──────────────────────────┘
countSubstringsCaseInsensitiveUTF8
Introduced in: v21.1.0
Like countSubstrings but counts case-insensitively and assumes that haystack is a UTF-8 string.
Syntax
countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])
Arguments
haystack— UTF-8 string in which the search is performed.StringorEnumneedle— Substring to be searched.Stringstart_pos— Optional. Position (1-based) inhaystackat which the search starts.UInt*
Returned value
Returns the number of occurrences of the needle in the haystack. UInt64
Examples
Usage example
SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА');
┌─countSubstri⋯шка', 'КА')─┐
│ 4 │
└──────────────────────────┘
With start_pos argument
SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА', 13);
┌─countSubstri⋯, 'КА', 13)─┐
│ 2 │
└──────────────────────────┘
Introduced in: v1.1.0
Extracts the first match of a regular expression in a string. If 'haystack' doesn't match 'pattern', an empty string is returned.
This function uses the RE2 regular expression library. Please refer to re2 for supported syntax.
If the regular expression has capturing groups (sub-patterns), the function matches the input string against the first capturing group.
Syntax
extract(haystack, pattern)
Arguments
haystack— String from which to extract.Stringpattern— Regular expression, typically containing a capturing group.const String
Returned value
Returns extracted fragment as a string. String
Examples
Extract domain from email
┌─extract('[email protected]', '.*@(.*)$')─┐
│ clickhouse.com │
└───────────────────────────────────────────┘
No match returns empty string
┌─extract('[email protected]', 'no_match')─┐
│ │
└────────────────────────────────────────────┘
Introduced in: v1.1.0
Like extract, but returns an array of all matches of a regular expression in a string.
If 'haystack' doesn't match the 'pattern' regex, an empty array is returned.
If the regular expression has capturing groups (sub-patterns), the function matches the input string against the first capturing group.
Syntax
extractAll(haystack, pattern)
Arguments
haystack— String from which to extract fragments.Stringpattern— Regular expression, optionally containing capturing groups.const String
Returned value
Returns array of extracted fragments. Array(String)
Examples
Extract all numbers
SELECT extractAll('hello 123 world 456', '[0-9]+')
┌─extractAll('hello 123 world 456', '[0-9]+')─┐
│ ['123','456'] │
└─────────────────────────────────────────────┘
Extract using capturing group
Introduced in: v20.5.0
Matches all groups of a string using the provided regular expression and returns an array of arrays, where each array contains all captures from the same capturing group, organized by group number.
Syntax
extractAllGroupsHorizontal(s, regexp)
Arguments
s— Input string to extract from.StringorFixedStringregexp— Regular expression to match by.const Stringorconst FixedString
Returned value
Returns an array of arrays, where each inner array contains all captures from one capturing group across all matches. The first inner array contains all captures from group 1, the second from group 2, etc. If no matches are found, returns an empty array. Array(Array(String))
Examples
Usage example
WITH '< Server: nginx
< Date: Tue, 22 Jan 2019 00:26:14 GMT
< Content-Type: text/html; charset=UTF-8
< Connection: keep-alive
' AS s
SELECT extractAllGroupsHorizontal(s, '< ([\\w\\-]+): ([^\\r\\n]+)');
[['Server','Date','Content-Type','Connection'],['nginx','Tue, 22 Jan 2019 00:26:14 GMT','text/html; charset=UTF-8','keep-alive']]
Introduced in: v20.5.0
Extracts all groups from non-overlapping substrings matched by a regular expression.
Syntax
Arguments
s— Input string to extract from.StringorFixedStringregexp— Regular expression. Constant.const Stringorconst FixedString
Returned value
If the function finds at least one matching group, it returns Array(Array(String)) column, clustered by group_id (1 to N, where N is number of capturing groups in regexp). If there is no matching group, it returns an empty array. Array(Array(String))
Examples
Usage example
WITH '< Server: nginx
< Date: Tue, 22 Jan 2019 00:26:14 GMT
< Content-Type: text/html; charset=UTF-8
< Connection: keep-alive
' AS s
SELECT extractGroups(s, '< ([\\w\\-]+): ([^\\r\\n]+)');
[['Server','nginx'],['Date','Tue, 22 Jan 2019 00:26:14 GMT'],['Content-Type','text/html; charset=UTF-8'],['Connection','keep-alive']]
hasAllTokens
Introduced in: v25.10.0
Like hasAnyTokens, but returns 1, if all tokens in the needle string or array match the input string, and 0 otherwise. If input is a column, returns all rows that satisfy this condition.
Note
Column input should have a text index defined for optimal performance.
If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.
Prior to searching, the function tokenizes
- the
inputargument (always), and - the
needleargument (if given as a String) using the tokenizer specified for the text index. If the column has no text index defined, thesplitByNonAlphatokenizer is used instead. If theneedleargument is of type Array(String), each array element is treated as a token — no additional tokenization takes place.
Duplicate tokens are ignored. For example, needles = ['ClickHouse', 'ClickHouse'] is treated the same as ['ClickHouse'].
Syntax
hasAllTokens(input, needles)
Aliases: hasAllToken
Arguments
input— The input column.StringorFixedStringorArray(String)orArray(FixedString)needles— Tokens to be searched.StringorArray(String)tokenizer— The tokenizer to use. Valid arguments aresplitByNonAlpha,ngrams,splitByString,array,sparseGrams, andasciiCJK. Optional, if not set explicitly, defaults tosplitByNonAlpha.const String
Returned value
Returns 1, if all needles match. 0, otherwise. UInt8
Examples
Basic usage with a string needle
CREATE TABLE table (
id UInt32,
msg String,
INDEX idx(msg) TYPE text(tokenizer = splitByString(['()', '\\']))
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO table VALUES (1, '()a,\\bc()d'), (2, '()\\a()bc\\d'), (3, ',()a\\,bc,(),d,');
SELECT count() FROM table WHERE hasAllTokens(msg, 'a\\d()');
┌─count()─┐
│ 1 │
└─────────┘
Specify needles to be searched for AS-IS (no tokenization) in an array
SELECT count() FROM table WHERE hasAllTokens(msg, ['a', 'd']);
┌─count()─┐
│ 1 │
└─────────┘
Generate needles using the tokens function
SELECT count() FROM table WHERE hasAllTokens(msg, tokens('a()d', 'splitByString', ['()', '\\']));
┌─count()─┐
│ 1 │
└─────────┘
Use a custom tokenizer via the 3rd argument
SELECT hasAllTokens('abcdef', 'abc', 'ngrams(3)');
┌─hasAllTokens('abcdef', 'abc', 'ngrams(3)')─┐
│ 1 │
└──────────────────────────────────────────────┘
Usage examples for array and map columns
CREATE TABLE log (
id UInt32,
tags Array(String),
attributes Map(String, String),
INDEX idx_tags (tags) TYPE text(tokenizer = splitByNonAlpha),
INDEX idx_attributes_keys mapKeys(attributes) TYPE text(tokenizer = array),
INDEX idx_attributes_vals mapValues(attributes) TYPE text(tokenizer = array)
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO log VALUES
(1, ['clickhouse', 'clickhouse cloud'], {'address': '192.0.0.1', 'log_level': 'INFO'}),
(2, ['chdb'], {'embedded': 'true', 'log_level': 'DEBUG'});
Example with an array column
SELECT count() FROM log WHERE hasAllTokens(tags, 'clickhouse');
┌─count()─┐
│ 1 │
└─────────┘
Example with mapKeys
SELECT count() FROM log WHERE hasAllTokens(mapKeys(attributes), ['address', 'log_level']);
┌─count()─┐
│ 1 │
└─────────┘
Example with mapValues
SELECT count() FROM log WHERE hasAllTokens(mapValues(attributes), ['192.0.0.1', 'DEBUG']);
┌─count()─┐
│ 0 │
└─────────┘
hasAnyTokens
Introduced in: v25.10.0
Returns 1, if at least one token in the needle string or array matches the input string, and 0 otherwise. If input is a column, returns all rows that satisfy this condition.
Note
Column input should have a text index defined for optimal performance.
If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.
Prior to searching, the function tokenizes
- the
inputargument (always), and - the
needleargument (if given as a String) using the tokenizer specified for the text index. If the column has no text index defined, thesplitByNonAlphatokenizer is used instead. If theneedleargument is of type Array(String), each array element is treated as a token — no additional tokenization takes place.
Duplicate tokens are ignored. For example, ['ClickHouse', 'ClickHouse'] is treated the same as ['ClickHouse'].
Syntax
hasAnyTokens(input, needles)
Aliases: hasAnyToken
Arguments
input— The input column.StringorFixedStringorNullable(String)orNullable(FixedString)orArray(String)orArray(FixedString)orArray(Nullable(String))orArray(Nullable(FixedString))needles— Tokens to be searched.StringorArray(String)tokenizer— The tokenizer to use. Valid arguments aresplitByNonAlpha,ngrams,splitByString,array,sparseGrams, andasciiCJK. Optional, if not set explicitly, defaults tosplitByNonAlpha.const String
Returned value
Returns 1, if there was at least one match. 0, otherwise. UInt8
Examples
Basic usage with a string needle
CREATE TABLE table (
id UInt32,
msg String,
INDEX idx(msg) TYPE text(tokenizer = splitByString(['()', '\\']))
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO table VALUES (1, '()a,\\bc()d'), (2, '()\\a()bc\\d'), (3, ',()a\\,bc,(),d,');
SELECT count() FROM table WHERE hasAnyTokens(msg, 'a\\d()');
┌─count()─┐
│ 3 │
└─────────┘
Specify needles to be searched for AS-IS (no tokenization) in an array
SELECT count() FROM table WHERE hasAnyTokens(msg, ['a', 'd']);
┌─count()─┐
│ 3 │
└─────────┘
Generate needles using the tokens function
SELECT count() FROM table WHERE hasAnyTokens(msg, tokens('a()d', 'splitByString', ['()', '\\']));
┌─count()─┐
│ 3 │
└─────────┘
Usage examples for array and map columns
CREATE TABLE log (
id UInt32,
tags Array(String),
attributes Map(String, String),
INDEX idx_tags (tags) TYPE text(tokenizer = splitByNonAlpha),
INDEX idx_attributes_keys mapKeys(attributes) TYPE text(tokenizer = array),
INDEX idx_attributes_vals mapValues(attributes) TYPE text(tokenizer = array)
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO log VALUES
(1, ['clickhouse', 'clickhouse cloud'], {'address': '192.0.0.1', 'log_level': 'INFO'}),
(2, ['chdb'], {'embedded': 'true', 'log_level': 'DEBUG'});
Example with an array column
SELECT count() FROM log WHERE hasAnyTokens(tags, 'clickhouse');
┌─count()─┐
│ 1 │
└─────────┘
Example with mapKeys
SELECT count() FROM log WHERE hasAnyTokens(mapKeys(attributes), ['address', 'log_level']);
┌─count()─┐
│ 2 │
└─────────┘
Example with mapValues
SELECT count() FROM log WHERE hasAnyTokens(mapValues(attributes), ['192.0.0.1', 'DEBUG']);
┌─count()─┐
│ 2 │
└─────────┘
hasSubsequence
Introduced in: v23.7.0
Checks if a needle is a subsequence of a haystack. A subsequence of a string is a sequence that can be derived from another string by deleting some or no characters without changing the order of the remaining characters.
Syntax
hasSubsequence(haystack, needle)
Arguments
haystack— String in which to search for the subsequence.Stringneedle— Subsequence to be searched.String
Returned value
Returns 1 if needle is a subsequence of haystack, 0 otherwise. UInt8
Examples
Basic subsequence check
SELECT hasSubsequence('Hello World', 'HlWrd')
┌─hasSubsequence('Hello World', 'HlWrd')─┐
│ 1 │
└────────────────────────────────────────┘
No subsequence found
SELECT hasSubsequence('Hello World', 'xyz')
┌─hasSubsequence('Hello World', 'xyz')─┐
│ 0 │
└──────────────────────────────────────┘
hasSubsequenceCaseInsensitive
Introduced in: v23.7.0
Like hasSubsequence but searches case-insensitively.
Syntax
hasSubsequenceCaseInsensitive(haystack, needle)
Arguments
haystack— String in which the search is performed.Stringneedle— Subsequence to be searched.String
Returned value
Returns 1, if needle is a subsequence of haystack, 0 otherwise. UInt8
Examples
Usage example
SELECT hasSubsequenceCaseInsensitive('garbage', 'ARG');
┌─hasSubsequenceCaseInsensitive('garbage', 'ARG')─┐
│ 1 │
└─────────────────────────────────────────────────┘
hasSubsequenceCaseInsensitiveUTF8
Introduced in: v23.7.0
Like hasSubsequenceUTF8 but searches case-insensitively.
Syntax
hasSubsequenceCaseInsensitiveUTF8(haystack, needle)
Arguments
haystack— UTF8-encoded string in which the search is performed.Stringneedle— UTF8-encoded subsequence string to be searched.String
Returned value
Returns 1, if needle is a subsequence of haystack, 0 otherwise. UInt8
Examples
Usage example
SELECT hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА');
┌─hasSubsequen⋯ 'СИСТЕМА')─┐
│ 1 │
└──────────────────────────┘
hasSubsequenceUTF8
Introduced in: v23.7.0
Like hasSubsequence but assumes haystack and needle are UTF-8 encoded strings.
Syntax
hasSubsequenceUTF8(haystack, needle)
Arguments
Returned value
Returns 1 if needle is a subsequence of haystack, otherwise 0. UInt8
Examples
Usage example
SELECT hasSubsequenceUTF8('картошка', 'кошка');
┌─hasSubsequen⋯', 'кошка')─┐
│ 1 │
└──────────────────────────┘
Non-matching subsequence
SELECT hasSubsequenceUTF8('картошка', 'апельсин');
┌─hasSubsequen⋯'апельсин')─┐
│ 0 │
└──────────────────────────┘
hasToken
Introduced in: v20.1.0
Checks if the given token is present in the haystack.
Uses splitByNonAlpha as tokenizer, i.e. a token is defined as the longest possible sub-sequence of consecutive characters [0-9A-Za-z_] (numbers, ASCII characters and underscore).
Syntax
hasToken(haystack, token)
Arguments
haystack— String to be searched.Stringtoken— Token to search for.const String
Returned value
Returns 1 if the token is found, 0 otherwise. UInt8
Examples
Token search
SELECT hasToken('clickhouse test', 'test')
┌─hasToken('clickhouse test', 'test')─┐
│ 1 │
└─────────────────────────────────────┘
hasTokenCaseInsensitive
Introduced in: v20.1.0
Performs case insensitive lookup of needle in haystack using tokenbf_v1 index.
Syntax
hasTokenCaseInsensitive(haystack, needle)
Arguments
- None.
Returned value
Examples
hasTokenCaseInsensitiveOrNull
Introduced in: v23.1.0
Performs case insensitive lookup of needle in haystack using tokenbf_v1 index. Returns null if needle is ill-formed.
Syntax
hasTokenCaseInsensitiveOrNull(haystack, needle)
Arguments
- None.
Returned value
Examples
hasTokenOrNull
Introduced in: v20.1.0
Like hasToken but returns null if token is ill-formed.
Syntax
hasTokenOrNull(haystack, token)
Arguments
haystack— String to be searched. Must be constant.Stringtoken— Token to search for.const String
Returned value
Returns 1 if the token is found, 0 otherwise, null if token is ill-formed. Nullable(UInt8)
Examples
Usage example
SELECT hasTokenOrNull('apple banana cherry', 'ban ana');
┌─hasTokenOrNu⋯ 'ban ana')─┐
│ ᴺᵁᴸᴸ │
└──────────────────────────┘
highlight
Introduced in: v26.4.0
Highlights occurrences of search terms in a text string by wrapping them with HTML tags.
The function performs ASCII case-insensitive matching. If multiple search terms overlap or are adjacent in the text, the matched regions are merged into a single highlighted span.
Syntax
highlight(haystack, needles[, open_tag, close_tag])
Arguments
haystack— The text to search in.StringorFixedStringneedles— An array of search terms to highlight.const Array(String)open_tag— The opening tag to insert before each match. Default:<em>.const Stringclose_tag— The closing tag to insert after each match. Default:</em>.const String
Returned value
Returns the input text with matched terms wrapped in the specified tags. String
Examples
Basic highlight
SELECT highlight('The quick brown fox', ['quick', 'fox'])
┌─highlight('The quick brown fox', ['quick', 'fox'])─┐
│ The <em>quick</em> brown <em>fox</em> │
└────────────────────────────────────────────────────┘
Custom tags
SELECT highlight('Hello World', ['hello'], '<b>', '</b>')
┌─highlight('Hello World', ['hello'], '<b>', '</b>')─┐
│ <b>Hello</b> World │
└────────────────────────────────────────────────────┘
ilike
Introduced in: v20.6.0
Like like but searches case-insensitively.
Syntax
ilike(haystack, pattern)
-- haystack ILIKE pattern
Arguments
haystack— String in which the search is performed.StringorFixedStringpattern— LIKE pattern to match against.String
Returned value
Returns 1 if the string matches the LIKE pattern (case-insensitive), otherwise 0. UInt8
Examples
Usage example
SELECT ilike('ClickHouse', '%house%');
┌─ilike('ClickHouse', '%house%')─┐
│ 1 │
└────────────────────────────────┘
like
Introduced in: v1.1.0
Returns whether string haystack matches the LIKE expression pattern.
A LIKE expression can contain normal characters and the following metasymbols:
%indicates an arbitrary number of arbitrary characters (including zero characters)._indicates a single arbitrary character.\is for escaping literals%,_and\.
Matching is based on UTF-8, e.g. _ matches the Unicode code point ¥ which is represented in UTF-8 using two bytes.
If the haystack or the LIKE expression are not valid UTF-8, the behavior is undefined.
No automatic Unicode normalization is performed, you can use the normalizeUTF8* functions for that.
To match against literal %, _ and \ (which are LIKE metacharacters), prepend them with a backslash: \%, \_ and \\.
The backslash loses its special meaning (i.e. is interpreted literally) if it prepends a character different than %, _ or \.
Note
ClickHouse requires backslashes in strings to be quoted as well, so you would actually need to write \\%, \\_ and \\\\.
For LIKE expressions of the form %needle%, the function is as fast as the position function.
All other LIKE expressions are internally converted to a regular expression and executed with a performance similar to function match.
Syntax
like(haystack, pattern)
-- haystack LIKE pattern
Arguments
haystack— String in which the search is performed.StringorFixedStringpattern—LIKEpattern to match against. Can contain%(matches any number of characters),_(matches single character), and\for escaping.String
Returned value
Returns 1 if the string matches the LIKE pattern, otherwise 0. UInt8
Examples
Usage example
SELECT like('ClickHouse', '%House');
┌─like('ClickHouse', '%House')─┐
│ 1 │
└──────────────────────────────┘
Single character wildcard
SELECT like('ClickHouse', 'Click_ouse');
┌─like('ClickH⋯lick_ouse')─┐
│ 1 │
└──────────────────────────┘
Non-matching pattern
SELECT like('ClickHouse', '%SQL%');
┌─like('ClickHouse', '%SQL%')─┐
│ 0 │
└─────────────────────────────┘
locate
Introduced in: v18.16.0
Like position but with arguments haystack and locate switched.
Version dependent behavior
The behavior of this function depends on the ClickHouse version:
- in versions < v24.3,
locatewas an alias of functionpositionand accepted arguments(haystack, needle[, start_pos]). - in versions >= 24.3,
locateis an individual function (for better compatibility with MySQL) and accepts arguments(needle, haystack[, start_pos]). The previous behavior can be restored using settingfunction_locate_has_mysql_compatible_argument_order = false.
Syntax
locate(needle, haystack[, start_pos])
Arguments
needle— Substring to be searched.Stringhaystack— String in which the search is performed.StringorEnumstart_pos— Optional. Position (1-based) inhaystackat which the search starts.UInt
Returned value
Returns starting position in bytes and counting from 1, if the substring was found, 0, if the substring was not found. UInt64
Examples
Basic usage
SELECT locate('ca', 'abcabc')
┌─locate('ca', 'abcabc')─┐
│ 3 │
└────────────────────────┘
match
Introduced in: v1.1.0
Checks if a provided string matches the provided regular expression pattern.
This function uses the RE2 regular expression library. Please refer to re2 for supported syntax.
Matching works under UTF-8 assumptions, e.g. ¥ uses two bytes internally but matching treats it as a single codepoint.
The regular expression must not contain NULL bytes.
If the haystack or the pattern are not valid UTF-8, the behavior is undefined.
Unlike re2's default behavior, . matches line breaks. To disable this, prepend the pattern with (?-s).
The pattern is not anchored. To match the entire string, anchor the pattern yourself using ^ and $.
If you just want to search for substrings, you can use functions like or position instead, which work much faster than this function.
Alternative operator syntax: haystack REGEXP pattern.
Syntax
Aliases: REGEXP_MATCHES
Arguments
haystack— String in which the pattern is searched.Stringpattern— Regular expression pattern.const String
Returned value
Returns 1 if the pattern matches, 0 otherwise. UInt8
Examples
Basic pattern matching
SELECT match('Hello World', 'Hello.*')
┌─match('Hello World', 'Hello.*')─┐
│ 1 │
└─────────────────────────────────┘
Pattern not matching
SELECT match('Hello World', 'goodbye.*')
┌─match('Hello World', 'goodbye.*')─┐
│ 0 │
└───────────────────────────────────┘
Matching a substring
SELECT match('abcde', 'b.*d'), match('abcde', '^b.*d$')
┌─match('abcde', 'b.*d')─┬─match('abcde', '^b.*d$')─┐
│ 1 │ 0 │
└─────────────────────────┴───────────────────────────┘
multiFuzzyMatchAllIndices
Introduced in: v20.1.0
Like multiFuzzyMatchAny but returns the array of all indices in any order that match the haystack within a constant edit distance.
Syntax
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, ..., patternN])
Arguments
haystack— String in which the search is performed.Stringdistance— The maximum edit distance for fuzzy matching.UInt8pattern— Array of patterns to match against.Array(String)
Returned value
Returns an array of all indices (starting from 1) that match the haystack within the specified edit distance in any order. Returns an empty array if no matches are found. Array(UInt64)
Examples
Usage example
SELECT multiFuzzyMatchAllIndices('ClickHouse', 2, ['ClickHouse', 'ClckHouse', 'ClickHose', 'House']);
┌─multiFuzzyMa⋯, 'House'])─┐
│ [3,1,4,2] │
└──────────────────────────┘
multiFuzzyMatchAny
Introduced in: v20.1.0
Like multiMatchAny but returns 1 if any pattern matches the haystack within a constant edit distance.
This function relies on the experimental feature of hyperscan library, and can be slow for some edge cases.
The performance depends on the edit distance value and patterns used, but it's always more expensive compared to non-fuzzy variants.
Note
multiFuzzyMatch*() function family do not support UTF-8 regular expressions (it treats them as a sequence of bytes) due to restrictions of hyperscan.
Syntax
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, ..., patternN])
Arguments
haystack— String in which the search is performed.Stringdistance— The maximum edit distance for fuzzy matching.UInt8pattern— Optional. An array of patterns to match against.Array(String)
Returned value
Returns 1 if any pattern matches the haystack within the specified edit distance, otherwise 0. UInt8
Examples
Usage example
SELECT multiFuzzyMatchAny('ClickHouse', 2, ['ClickHouse', 'ClckHouse', 'ClickHose']);
┌─multiFuzzyMa⋯lickHose'])─┐
│ 1 │
└──────────────────────────┘
multiFuzzyMatchAnyIndex
Introduced in: v20.1.0
Like multiFuzzyMatchAny but returns any index that matches the haystack within a constant edit distance.
Syntax
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, ..., patternn])
Arguments
haystack— String in which the search is performed.Stringdistance— The maximum edit distance for fuzzy matching.UInt8pattern— Array of patterns to match against.Array(String)
Returned value
Returns the index (starting from 1) of any pattern that matches the haystack within the specified edit distance, otherwise 0. UInt64
Examples
Usage example
SELECT multiFuzzyMatchAnyIndex('ClickHouse', 2, ['ClckHouse', 'ClickHose', 'ClickHouse']);
┌─multiFuzzyMa⋯ickHouse'])─┐
│ 2 │
└──────────────────────────┘
multiMatchAllIndices
Introduced in: v20.1.0
Like multiMatchAny but returns the array of all indices that match the haystack in any order.
Syntax
multiMatchAllIndices(haystack, [pattern1, pattern2, ..., patternn])
Arguments
haystack— String in which the search is performed.Stringpattern— Regular expressions to match against.String
Returned value
Array of all indices (starting from 1) that match the haystack in any order. Returns an empty array if no matches are found. Array(UInt64)
Examples
Usage example
SELECT multiMatchAllIndices('ClickHouse', ['[0-9]', 'House', 'Click', 'ouse']);
┌─multiMatchAl⋯', 'ouse'])─┐
│ [3, 2, 4] │
└──────────────────────────┘
multiMatchAny
Introduced in: v20.1.0
Check if at least one of multiple regular expression patterns matches a haystack.
If you only want to search multiple substrings in a string, you can use function multiSearchAny instead - it works much faster than this function.
Syntax
multiMatchAny(haystack, pattern1[, pattern2, ...])
Arguments
haystack— String in which patterns are searched.Stringpattern1[, pattern2, ...]— An array of one or more regular expression patterns.Array(String)
Returned value
Returns 1 if any pattern matches, 0 otherwise. UInt8
Examples
Multiple pattern matching
SELECT multiMatchAny('Hello World', ['Hello.*', 'foo.*'])
┌─multiMatchAny('Hello World', ['Hello.*', 'foo.*'])─┐
│ 1 │
└────────────────────────────────────────────────────┘
No patterns match
SELECT multiMatchAny('Hello World', ['goodbye.*', 'foo.*'])
┌─multiMatchAny('Hello World', ['goodbye.*', 'foo.*'])─┐
│ 0 │
└──────────────────────────────────────────────────────┘
multiMatchAnyIndex
Introduced in: v20.1.0
Like multiMatchAny but returns any index that matches the haystack.
Syntax
multiMatchAnyIndex(haystack, [pattern1, pattern2, ..., patternn])
Arguments
haystack— String in which the search is performed.Stringpattern— Regular expressions to match against.Array(String)
Returned value
Returns the index (starting from 1) of the first pattern that matches, or 0 if no match is found. UInt64
Examples
Usage example
SELECT multiMatchAnyIndex('ClickHouse', ['[0-9]', 'House', 'Click']);
┌─multiMatchAn⋯, 'Click'])─┐
│ 3 │
└──────────────────────────┘
multiSearchAllPositions
Introduced in: v20.1.0
Like position but returns an array of positions (in bytes, starting at 1) for multiple needle substrings in a haystack string.
All multiSearch*() functions only support up to 2^8 needles.
Syntax
multiSearchAllPositions(haystack, needle1[, needle2, ...])
Arguments
haystack— String in which the search is performed.Stringneedle1[, needle2, ...]— An array of one or more substrings to be searched.Array(String)
Returned value
Returns array of the starting position in bytes and counting from 1, if the substring was found, 0, if the substring was not found. Array(UInt64)
Examples
Multiple needle search
SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])
┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0] │
└───────────────────────────────────────────────────────────────────┘
multiSearchAllPositionsCaseInsensitive
Introduced in: v20.1.0
Like multiSearchAllPositions but ignores case.
Syntax
multiSearchAllPositionsCaseInsensitive(haystack, needle1[, needle2, ...])
Arguments
haystack— String in which the search is performed.Stringneedle1[, needle2, ...]— An array of one or more substrings to be searched.Array(String)
Returned value
Returns array of the starting position in bytes and counting from 1 (if the substring was found), 0 if the substring was not found. Array(UInt64)
Examples
Case insensitive multi-search
SELECT multiSearchAllPositionsCaseInsensitive('ClickHouse',['c','h'])
┌─multiSearchA⋯['c', 'h'])─┐
│ [1,6] │
└──────────────────────────┘
multiSearchAllPositionsCaseInsensitiveUTF8
Introduced in: v20.1.0
Like multiSearchAllPositionsUTF8 but ignores case.
Syntax
multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— UTF-8 encoded string in which the search is performed.Stringneedle— UTF-8 encoded substrings to be searched.Array(String)
Returned value
Array of the starting position in bytes and counting from 1 (if the substring was found). Returns 0 if the substring was not found. Array
Examples
Case-insensitive UTF-8 search
SELECT multiSearchAllPositionsCaseInsensitiveUTF8('Здравствуй, мир!', ['здравствуй', 'МИР']);
┌─multiSearchA⋯й', 'МИР'])─┐
│ [1, 13] │
└──────────────────────────┘
multiSearchAllPositionsUTF8
Introduced in: v20.1.0
Like multiSearchAllPositions but assumes haystack and the needle substrings are UTF-8 encoded strings.
Syntax
multiSearchAllPositionsUTF8(haystack, needle1[, needle2, ...])
Arguments
haystack— UTF-8 encoded string in which the search is performed.Stringneedle1[, needle2, ...]— An array of UTF-8 encoded substrings to be searched.Array(String)
Returned value
Returns array of the starting position in bytes and counting from 1 (if the substring was found), 0 if the substring was not found. Array
Examples
UTF-8 multi-search
SELECT multiSearchAllPositionsUTF8('ClickHouse',['C','H'])
┌─multiSearchAllPositionsUTF8('ClickHouse', ['C', 'H'])─┐
│ [1,6] │
└───────────────────────────────────────────────────────┘
multiSearchAny
Introduced in: v20.1.0
Checks if at least one of a number of needle strings matches the haystack string.
Functions multiSearchAnyCaseInsensitive, multiSearchAnyUTF8 and multiSearchAnyCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.
Syntax
multiSearchAny(haystack, needle1[, needle2, ...])
Arguments
haystack— String in which the search is performed.Stringneedle1[, needle2, ...]— An array of substrings to be searched.Array(String)
Returned value
Returns 1, if there was at least one match, otherwise 0, if there was not at least one match. UInt8
Examples
Any match search
SELECT multiSearchAny('ClickHouse',['C','H'])
┌─multiSearchAny('ClickHouse', ['C', 'H'])─┐
│ 1 │
└──────────────────────────────────────────┘
multiSearchAnyCaseInsensitive
Introduced in: v20.1.0
Like multiSearchAny but ignores case.
Syntax
multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— String in which the search is performed.Stringneedle— Substrings to be searched.Array(String)
Returned value
Returns 1, if there was at least one case-insensitive match, otherwise 0, if there was not at least one case-insensitive match. UInt8
Examples
Case insensitive search
SELECT multiSearchAnyCaseInsensitive('ClickHouse',['c','h'])
┌─multiSearchAnyCaseInsensitive('ClickHouse', ['c', 'h'])─┐
│ 1 │
└─────────────────────────────────────────────────────────┘
multiSearchAnyCaseInsensitiveUTF8
Introduced in: v20.1.0
Like multiSearchAnyUTF8 but ignores case.
Syntax
multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— UTF-8 string in which the search is performed.Stringneedle— UTF-8 substrings to be searched.Array(String)
Returned value
Returns 1, if there was at least one case-insensitive match, otherwise 0, if there was not at least one case-insensitive match. UInt8
Examples
Given a UTF-8 string 'Здравствуйте', check if character 'з' (lowercase) is present
SELECT multiSearchAnyCaseInsensitiveUTF8('Здравствуйте',['з'])
┌─multiSearchA⋯те', ['з'])─┐
│ 1 │
└──────────────────────────┘
multiSearchAnyUTF8
Introduced in: v20.1.0
Like multiSearchAny but assumes haystack and the needle substrings are UTF-8 encoded strings.
Syntax
multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— UTF-8 string in which the search is performed.Stringneedle— UTF-8 substrings to be searched.Array(String)
Returned value
Returns 1, if there was at least one match, otherwise 0, if there was not at least one match. UInt8
Examples
Given '你好,世界' ('Hello, world') as a UTF-8 string, check if there are any 你 or 界 characters in the string
SELECT multiSearchAnyUTF8('你好,世界', ['你', '界'])
┌─multiSearchA⋯你', '界'])─┐
│ 1 │
└──────────────────────────┘
multiSearchFirstIndex
Introduced in: v20.1.0
Searches for multiple needle strings in a haystack string (case-sensitive) and returns the 1-based index of the first needle found.
Syntax
multiSearchFirstIndex(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— The string to search in.Stringneedles— Array of strings to search for.Array(String)
Returned value
Returns the 1-based index (position in the needles array) of the first needle found in the haystack. Returns 0 if no needles are found. The search is case-sensitive. UInt64
Examples
Usage example
SELECT multiSearchFirstIndex('ClickHouse Database', ['Click', 'Database', 'Server']);
┌─multiSearchF⋯ 'Server'])─┐
│ 1 │
└──────────────────────────┘
Case-sensitive behavior
SELECT multiSearchFirstIndex('ClickHouse Database', ['CLICK', 'Database', 'Server']);
┌─multiSearchF⋯ 'Server'])─┐
│ 2 │
└──────────────────────────┘
No match found
SELECT multiSearchFirstIndex('Hello World', ['goodbye', 'test']);
┌─multiSearchF⋯', 'test'])─┐
│ 0 │
└──────────────────────────┘
multiSearchFirstIndexCaseInsensitive
Introduced in: v20.1.0
Returns the index i (starting from 1) of the leftmost found needle_i in the string haystack and 0 otherwise.
Ignores case.
Syntax
multiSearchFirstIndexCaseInsensitive(haystack, [needle1, needle2, ..., needleN]
Arguments
haystack— String in which the search is performed.Stringneedle— Substrings to be searched.Array(String)
Returned value
Returns the index (starting from 1) of the leftmost found needle. Otherwise 0, if there was no match. UInt8
Examples
Usage example
SELECT multiSearchFirstIndexCaseInsensitive('hElLo WoRlD', ['World', 'Hello']);
┌─multiSearchF⋯, 'Hello'])─┐
│ 1 │
└──────────────────────────┘
multiSearchFirstIndexCaseInsensitiveUTF8
Introduced in: v20.1.0
Searches for multiple needle strings in a haystack string, case-insensitively with UTF-8 encoding support, and returns the 1-based index of the first needle found.
Syntax
multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— The string to search in.Stringneedles— Array of strings to search for.Array(String)
Returned value
Returns the 1-based index (position in the needles array) of the first needle found in the haystack. Returns 0 if no needles are found. The search is case-insensitive and respects UTF-8 character encoding. UInt64
Examples
Usage example
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('ClickHouse Database', ['CLICK', 'data', 'server']);
┌─multiSearchF⋯ 'server'])─┐
│ 1 │
└──────────────────────────┘
UTF-8 case handling
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('Привет Мир', ['мир', 'ПРИВЕТ']);
┌─multiSearchF⋯ 'ПРИВЕТ'])─┐
│ 1 │
└──────────────────────────┘
No match found
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('Hello World', ['goodbye', 'test']);
┌─multiSearchF⋯', 'test'])─┐
│ 0 │
└──────────────────────────┘
multiSearchFirstIndexUTF8
Introduced in: v20.1.0
Returns the index i (starting from 1) of the leftmost found needle_i in the string haystack and 0 otherwise.
Assumes haystack and needle are UTF-8 encoded strings.
Syntax
multiSearchFirstIndexUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— UTF-8 string in which the search is performed.Stringneedle— Array of UTF-8 substrings to be searched.Array(String)
Returned value
Returns the index (starting from 1) of the leftmost found needle. Otherwise 0, if there was no match. UInt8
Examples
Usage example
SELECT multiSearchFirstIndexUTF8('Здравствуйте мир', ['мир', 'здравствуйте']);
┌─multiSearchF⋯вствуйте'])─┐
│ 1 │
└──────────────────────────┘
multiSearchFirstPosition
Introduced in: v20.1.0
Like position but returns the leftmost offset in a haystack string which matches any of multiple needle strings.
Functions multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8 and multiSearchFirstPositionCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.
Syntax
multiSearchFirstPosition(haystack, needle1[, needle2, ...])
Arguments
haystack— String in which the search is performed.Stringneedle1[, needle2, ...]— An array of one or more substrings to be searched.Array(String)
Returned value
Returns the leftmost offset in a haystack string which matches any of multiple needle strings, otherwise 0, if there was no match. UInt64
Examples
First position search
SELECT multiSearchFirstPosition('Hello World',['llo', 'Wor', 'ld'])
┌─multiSearchFirstPosition('Hello World', ['llo', 'Wor', 'ld'])─┐
│ 3 │
└───────────────────────────────────────────────────────────────┘
multiSearchFirstPositionCaseInsensitive
Introduced in: v20.1.0
Like multiSearchFirstPosition but ignores case.
Syntax
multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— String in which the search is performed.Stringneedle— Array of substrings to be searched.Array(String)
Returned value
Returns the leftmost offset in a haystack string which matches any of multiple needle strings. Returns 0, if there was no match. UInt64
Examples
Case insensitive first position
SELECT multiSearchFirstPositionCaseInsensitive('HELLO WORLD',['wor', 'ld', 'ello'])
┌─multiSearchFirstPositionCaseInsensitive('HELLO WORLD', ['wor', 'ld', 'ello'])─┐
│ 2 │
└───────────────────────────────────────────────────────────────────────────────┘
multiSearchFirstPositionCaseInsensitiveUTF8
Introduced in: v20.1.0
Like multiSearchFirstPosition but assumes haystack and needle to be UTF-8 strings and ignores case.
Syntax
multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— UTF-8 string in which the search is performed.Stringneedle— Array of UTF-8 substrings to be searched.Array(String)
Returned value
Returns the leftmost offset in a haystack string which matches any of multiple needle strings, ignoring case. Returns 0, if there was no match. UInt64
Examples
Find the leftmost offset in UTF-8 string 'Здравствуй, мир' ('Hello, world') which matches any of the given needles
SELECT multiSearchFirstPositionCaseInsensitiveUTF8('Здравствуй, мир', ['МИР', 'вст', 'Здра'])
┌─multiSearchFirstPositionCaseInsensitiveUTF8('Здравствуй, мир', ['мир', 'вст', 'Здра'])─┐
│ 3 │
└────────────────────────────────────────────────────────────────────────────────────────┘
multiSearchFirstPositionUTF8
Introduced in: v20.1.0
Like multiSearchFirstPosition but assumes haystack and needle to be UTF-8 strings.
Syntax
multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack— UTF-8 string in which the search is performed.Stringneedle— Array of UTF-8 substrings to be searched.Array(String)
Returned value
Leftmost offset in a haystack string which matches any of multiple needle strings. Returns 0, if there was no match. UInt64
Examples
Find the leftmost offset in UTF-8 string 'Здравствуй, мир' ('Hello, world') which matches any of the given needles
SELECT multiSearchFirstPositionUTF8('Здравствуй, мир',['мир', 'вст', 'авст'])
┌─multiSearchFirstPositionUTF8('Здравствуй, мир', ['мир', 'вст', 'авст'])─┐
│ 3 │
└─────────────────────────────────────────────────────────────────────────┘
ngramDistance
Introduced in: v20.1.0
Calculates the 4-gram distance between two strings. For this, it counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. The smaller the returned value, the more similar the strings are.
For case-insensitive search or/and in UTF8 format use functions ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8.
Syntax
ngramDistance(haystack, needle)
Arguments
Returned value
Returns a Float32 number between 0 and 1. The smaller the returned value, the more similar the strings are. Float32
Examples
Calculate 4-gram distance
SELECT ngramDistance('ClickHouse', 'ClickHouses')
┌─ngramDistance('ClickHouse', 'ClickHouses')─┐
│ 0.1 │
└────────────────────────────────────────────┘
ngramDistanceCaseInsensitive
Introduced in: v20.1.0
Provides a case-insensitive variant of ngramDistance.
Calculates the 4-gram distance between two strings, ignoring case.
The smaller the returned value, the more similar the strings are.
Syntax
ngramDistanceCaseInsensitive(haystack, needle)
Arguments
Returned value
Returns a Float32 number between 0 and 1. Float32
Examples
Case-insensitive 4-gram distance
SELECT ngramDistanceCaseInsensitive('ClickHouse','clickhouse')
┌─ngramDistanceCaseInsensitive('ClickHouse','clickhouse')─┐
│ 0 │
└─────────────────────────────────────────────────────────┘
ngramDistanceCaseInsensitiveUTF8
Introduced in: v20.1.0
Provides a case-insensitive UTF-8 variant of ngramDistance.
Assumes that needle and haystack strings are UTF-8 encoded strings and ignores case.
Calculates the 3-gram distance between two UTF-8 strings, ignoring case.
The smaller the returned value, the more similar the strings are.
Syntax
ngramDistanceCaseInsensitiveUTF8(haystack, needle)
Arguments
haystack— First UTF-8 encoded comparison string.Stringneedle— Second UTF-8 encoded comparison string.String
Returned value
Returns a Float32 number between 0 and 1. Float32
Examples
Case-insensitive UTF-8 3-gram distance
SELECT ngramDistanceCaseInsensitiveUTF8('abcde','CDE')
┌─ngramDistanceCaseInsensitiveUTF8('abcde','CDE')─┐
│ 0.5 │
└─────────────────────────────────────────────────┘
ngramDistanceUTF8
Introduced in: v20.1.0
Provides a UTF-8 variant of ngramDistance.
Assumes that needle and haystack strings are UTF-8 encoded strings.
Calculates the 3-gram distance between two UTF-8 strings.
The smaller the returned value, the more similar the strings are.
Syntax
ngramDistanceUTF8(haystack, needle)
Arguments
haystack— First UTF-8 encoded comparison string.Stringneedle— Second UTF-8 encoded comparison string.String
Returned value
Returns a Float32 number between 0 and 1. Float32
Examples
UTF-8 3-gram distance
SELECT ngramDistanceUTF8('abcde','cde')
┌─ngramDistanceUTF8('abcde','cde')─┐
│ 0.5 │
└───────────────────────────────────┘
ngramSearch
Introduced in: v20.1.0
Checks if the 4-gram distance between two strings is less than or equal to a given threshold.
For case-insensitive search or/and in UTF8 format use functions ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8.
Syntax
ngramSearch(haystack, needle)
Arguments
Returned value
Returns 1 if the 4-gram distance between the strings is less than or equal to a threshold (1.0 by default), 0 otherwise. UInt8
Examples
Search using 4-grams
SELECT ngramSearch('ClickHouse', 'Click')
┌─ngramSearch('ClickHouse', 'Click')─┐
│ 1 │
└────────────────────────────────────┘
ngramSearchCaseInsensitive
Introduced in: v20.1.0
Provides a case-insensitive variant of ngramSearch.
Calculates the non-symmetric difference between a needle string and a haystack string, i.e. the number of n-grams from the needle minus the common number of n-grams normalized by the number of needle n-grams.
Checks if the 4-gram distance between two strings is less than or equal to a given threshold, ignoring case.
Syntax
ngramSearchCaseInsensitive(haystack, needle)
Arguments
Returned value
Returns 1 if the 4-gram distance between the strings is less than or equal to a threshold (1.0 by default), 0 otherwise. UInt8
Examples
Case-insensitive search using 4-grams
SELECT ngramSearchCaseInsensitive('Hello World','hello')
┌─ngramSearchCaseInsensitive('Hello World','hello')─┐
│ 1 │
└────────────────────────────────────────────────────┘
ngramSearchCaseInsensitiveUTF8
Introduced in: v20.1.0
Provides a case-insensitive UTF-8 variant of ngramSearch.
Assumes haystack and needle to be UTF-8 strings and ignores case.
Checks if the 3-gram distance between two UTF-8 strings is less than or equal to a given threshold, ignoring case.
Syntax
ngramSearchCaseInsensitiveUTF8(haystack, needle)
Arguments
Returned value
Returns 1 if the 3-gram distance between the strings is less than or equal to a threshold (1.0 by default), 0 otherwise. UInt8
Examples
Case-insensitive UTF-8 search using 3-grams
SELECT ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз')
┌─ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз')─┐
│ 1 │
└──────────────────────────────────────────────────────────┘
ngramSearchUTF8
Introduced in: v20.1.0
Provides a UTF-8 variant of ngramSearch.
Assumes haystack and needle to be UTF-8 strings.
Checks if the 3-gram distance between two UTF-8 strings is less than or equal to a given threshold.
Syntax
ngramSearchUTF8(haystack, needle)
Arguments
Returned value
Returns 1 if the 3-gram distance between the strings is less than or equal to a threshold (1.0 by default), 0 otherwise. UInt8
Examples
UTF-8 search using 3-grams
SELECT ngramSearchUTF8('абвгдеёжз', 'гдеёзд')
┌─ngramSearchUTF8('абвгдеёжз', 'гдеёзд')─┐
│ 1 │
└────────────────────────────────────────┘
notILike
Introduced in: v20.6.0
Checks whether a string does not match a pattern, case-insensitive. The pattern can contain special characters % and _ for SQL LIKE matching.
Syntax
notILike(haystack, pattern)
Arguments
haystack— The input string to search in.StringorFixedStringpattern— The SQL LIKE pattern to match against.%matches any number of characters (including zero),_matches exactly one character.String
Returned value
Returns 1 if the string does not match the pattern (case-insensitive), otherwise 0. UInt8
Examples
Usage example
SELECT notILike('ClickHouse', '%house%');
┌─notILike('Cl⋯ '%house%')─┐
│ 0 │
└──────────────────────────┘
notLike
Introduced in: v1.1.0
Similar to like but negates the result.
Syntax
notLike(haystack, pattern)
-- haystack NOT LIKE pattern
Arguments
haystack— String in which the search is performed.StringorFixedStringpattern— LIKE pattern to match against.String
Returned value
Returns 1 if the string does not match the LIKE pattern, otherwise 0. UInt8
Examples
Usage example
SELECT notLike('ClickHouse', '%House%');
┌─notLike('Cli⋯ '%House%')─┐
│ 0 │
└──────────────────────────┘
Non-matching pattern
SELECT notLike('ClickHouse', '%SQL%');
┌─notLike('Cli⋯', '%SQL%')─┐
│ 1 │
└──────────────────────────┘
position
Introduced in: v1.1.0
Returns the position (in bytes, starting at 1) of a substring needle in a string haystack.
If substring needle is empty, these rules apply:
- if no
start_poswas specified: return1 - if
start_pos = 0: return1 - if
start_pos >= 1andstart_pos <= length(haystack) + 1: returnstart_pos - otherwise: return
0
The same rules also apply to functions locate, positionCaseInsensitive, positionUTF8 and positionCaseInsensitiveUTF8.
Syntax
position(haystack, needle[, start_pos])
Arguments
haystack— String in which the search is performed.StringorEnumneedle— Substring to be searched.Stringstart_pos— Position (1-based) inhaystackat which the search starts. Optional.UInt
Returned value
Returns starting position in bytes and counting from 1, if the substring was found, otherwise 0, if the substring was not found. UInt64
Examples
Basic usage
SELECT position('Hello, world!', '!')
┌─position('Hello, world!', '!')─┐
│ 13 │
└────────────────────────────────┘
With start_pos argument
SELECT position('Hello, world!', 'o', 1), position('Hello, world!', 'o', 7)
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│ 5 │ 9 │
└───────────────────────────────────┴───────────────────────────────────┘
Needle IN haystack syntax
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s)
┌─equals(6, position(s, '/'))─┐
│ 1 │
└─────────────────────────────┘
Empty needle substring
SELECT position('abc', ''), position('abc', '', 0), position('abc', '', 1), position('abc', '', 2), position('abc', '', 3), position('abc', '', 4), position('abc', '', 5)
┌─position('abc', '')─┬─position('abc', '', 0)─┬─position('abc', '', 1)─┬─position('abc', '', 2)─┬─position('abc', '', 3)─┬─position('abc', '', 4)─┬─position('abc', '', 5)─┐
│ 1 │ 1 │ 1 │ 2 │ 3 │ 4 │ 0 │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘
positionCaseInsensitive
Introduced in: v1.1.0
Like position but case-insensitive.
Syntax
positionCaseInsensitive(haystack, needle[, start_pos])
Aliases: instr
Arguments
haystack— String in which the search is performed.StringorEnumneedle— Substring to be searched.Stringstart_pos— Optional. Position (1-based) inhaystackat which the search starts.UInt*
Returned value
Returns starting position in bytes and counting from 1, if the substring was found, otherwise 0, if the substring was not found. UInt64
Examples
Case insensitive search
SELECT positionCaseInsensitive('Hello, world!', 'hello')
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│ 1 │
└───────────────────────────────────────────────────┘
positionCaseInsensitiveUTF8
Introduced in: v1.1.0
Like positionUTF8 but searches case-insensitively.
Syntax
positionCaseInsensitiveUTF8(haystack, needle[, start_pos])
Arguments
haystack— String in which the search is performed.StringorEnumneedle— Substring to be searched.Stringstart_pos— Optional. Position (1-based) inhaystackat which the search starts.UInt*
Returned value
Returns starting position in bytes and counting from 1, if the substring was found, otherwise 0, if the substring was not found. UInt64
Examples
Case insensitive UTF-8 search
SELECT positionCaseInsensitiveUTF8('Привет мир', 'МИР')
┌─positionCaseInsensitiveUTF8('Привет мир', 'МИР')─┐
│ 8 │
└──────────────────────────────────────────────────┘
positionUTF8
Introduced in: v1.1.0
Like position but assumes haystack and needle are UTF-8 encoded strings.
Syntax
positionUTF8(haystack, needle[, start_pos])
Arguments
haystack— String in which the search is performed.StringorEnumneedle— Substring to be searched.Stringstart_pos— Optional. Position (1-based) inhaystackat which the search starts.UInt*
Returned value
Returns starting position in bytes and counting from 1, if the substring was found, otherwise 0, if the substring was not found. UInt64
Examples
UTF-8 character counting
SELECT positionUTF8('Motörhead', 'r')
┌─position('Motörhead', 'r')─┐
│ 5 │
└────────────────────────────┘