Regular Expression Search

Regular expression (regex) queries search text using pattern matching. The following sections briefly describe V‑Spark regex syntax. To run a regex search, select the Regex option for search text in the Dashboard Files view. As with plain-text queries, regex queries in V‑Spark match whole terms only; there are no partial matches.

Note:

Regex queries operate on individual terms and cannot be used to match multi-word phrases. For each regex query, the search engine scans the list of terms in the inverted index to find all matching terms. It then retrieves all documents for each term.

As a result, a regex query that matches many unique terms can be very resource intensive. Avoid patterns that start with a wildcard (for example, .*foo).
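The cost described in this note can be illustrated with a simplified model. The sketch below is not V‑Spark or Elasticsearch code: the index contents, the INDEX name, and the regex_search function are hypothetical, and Python's re module stands in for the search engine's regex syntax.

```python
import re

# Hypothetical inverted index: term -> set of document IDs (illustrative data only).
INDEX = {
    "foo": {1, 2},
    "food": {2},
    "barfoo": {3},
    "bar": {1, 3},
}

def regex_search(index, pattern):
    """Return (matching terms, document IDs) for a whole-term regex query."""
    compiled = re.compile(pattern)
    matched_terms = []
    doc_ids = set()
    # Every term is tested, so the cost grows with the number of unique terms.
    for term, postings in index.items():
        if compiled.fullmatch(term):  # whole-term match only
            matched_terms.append(term)
            doc_ids |= postings       # collect the documents for each matching term
    return sorted(matched_terms), doc_ids

terms, docs = regex_search(INDEX, ".*foo")
print(terms, docs)
```

In this model, a pattern with a leading wildcard gives the scan nothing to narrow the candidate terms with, and each extra matching term adds another set of documents to retrieve.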

For more information on regex syntax as used in V‑Spark, refer to the documentation for Elasticsearch 1.4.

Allowed characters

Any Unicode characters may be used in the pattern, but certain characters are reserved. The reserved characters are:

. ? + * | { } [ ] ( ) # @ & < > ~ " \
Note: ^ and $ are not reserved characters.
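To search for a reserved character literally, it has to be escaped. The sketch below assumes backslash escaping, as described in the Elasticsearch documentation this section refers to. The RESERVED set and the escape_term helper are hypothetical names used only for illustration.

```python
# Reserved characters as listed above.
RESERVED = set('.?+*|{}[]()#@&<>~"\\')

def escape_term(text):
    """Backslash-escape reserved characters so the text is matched literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)

print(escape_term("a.b"))    # the dot is escaped
print(escape_term("^abc$"))  # ^ and $ are not reserved, so nothing changes
```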

Anchoring

Most regex search engines will match any part of a word. In these cases ^ and $ are used to anchor searches to the beginning and end of a word, respectively. However, since V‑Spark regex searches will only match whole words, these special anchors are not required and not valid except as literal characters. As an example, for the word "abcde":

ab.* # match
abcd # no match
^abcd # no match
abcd$ # no match
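The difference between substring matching and whole-word matching can be reproduced with Python's re module, used here only as an analogy. In Python, ^ and $ are anchors, so the sketch escapes them to mimic their literal meaning in V‑Spark. The variable names are illustrative.

```python
import re

word = "abcde"

# Substring-style matching, as in most regex engines: finds "abcd" inside the word.
substring_hit = re.search("abcd", word) is not None

# Whole-word matching: the pattern must cover the entire word.
whole_prefix_hit = re.fullmatch("abcd", word) is not None
whole_wildcard_hit = re.fullmatch("ab.*", word) is not None

# ^ and $ treated as literal characters (escaped, because Python reads them as anchors).
literal_caret_hit = re.fullmatch(re.escape("^") + "abcd", word) is not None
literal_dollar_hit = re.fullmatch("abcd" + re.escape("$"), word) is not None

print(substring_hit, whole_prefix_hit, whole_wildcard_hit)
print(literal_caret_hit, literal_dollar_hit)
```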

Characters and examples

Table 1. Regular Expression Examples

Character  Meaning                                                              Example Text  Example Query  Match?
.          Match any character                                                  ab            a.             Yes
+          Match the preceding shortest pattern 1 or more times                 aaabbb        a+b+           Yes
                                                                                aaabbb        a+b+c+         No
*          Match the preceding shortest pattern 0 or more times                 aaabbb        a*b*           Yes
                                                                                aaabbb        a*b*c*         Yes
?          Match the preceding shortest pattern 0 or 1 times                    aaabbb        aaabbbc?       Yes
{n}        Match the preceding shortest pattern exactly n times                 aaabbb        a{3}b{3}       Yes
                                                                                aaabbb        a{4}b{4}       No
{n,m}      Match the preceding shortest pattern at least n and at most m times  aaabbb        a{2,3}b{2,3}   Yes
( )        Group characters to form subpatterns                                 ababab        (ab)+          Yes
|          OR (applies to the longest pattern on either side)                   aaabbb        a+b+|ccc       Yes
                                                                                aaabbb        aaa|bbb        No
[ ]        Match a list or range of characters; a leading ^ negates the list    abc           [a-z]*         Yes
                                                                                abc           [^a-z]         No
~          Negate the following shortest pattern                                abcdef        abc~df         Yes
                                                                                abcdef        ab~cdef        No
< >        Match a range of numeric values                                      99            <1-100>        Yes
                                                                                99            <100-101>      No
                                                                                99            <99-100>       Yes
&          AND (text must meet the conditions on both sides)                    aaabbb        aaa.+&.+bbb    Yes
                                                                                aaabbb        aaa&bbb        No
@          Match any word                                                       hello         @              Yes
                                                                                hello         @&~(hello)     No
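The &, ~, and < > operators have no direct counterpart in many regex libraries. The sketch below models what they mean for a whole term using Python's re module. The helper names are hypothetical, the ~ helper covers only the case where the negation applies to the entire pattern, and the range helper assumes inclusive bounds, as the <99-100> example suggests.

```python
import re

def matches_and(term, left, right):
    """A&B: the whole term must match both patterns."""
    return bool(re.fullmatch(left, term)) and bool(re.fullmatch(right, term))

def matches_not(term, pattern):
    """~(A) applied to the whole term: the term must not match the pattern."""
    return re.fullmatch(pattern, term) is None

def matches_range(term, low, high):
    """<low-high>: the term is a number inside the inclusive range."""
    return term.isascii() and term.isdigit() and low <= int(term) <= high

print(matches_and("aaabbb", "aaa.+", ".+bbb"))  # both sides cover the whole term
print(matches_and("aaabbb", "aaa", "bbb"))      # neither side covers the whole term
print(matches_range("99", 1, 100))
# @&~(hello): any word, except "hello"
print(bool(re.fullmatch(".*", "hello")) and matches_not("hello", "hello"))
```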