Regular Expression Search
Regular expression (regex) queries search text using pattern matching. The following sections briefly describe V‑Spark regex syntax. To run a regex search, select the Regex option for search text in the Dashboard Files view. As with plain text queries, regex queries in V‑Spark match only whole terms; partial matches are not returned.
Regex queries operate on individual terms and cannot be used to match multi-word phrases. For each regex query, the search engine scans the list of terms in the inverted index to find all matching terms. It then retrieves all documents for each term.
As a result, a regex query that matches many unique terms can be very resource intensive. Avoid patterns that start with a wildcard (for example, .*foo).
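The term-by-term lookup described above can be sketched with a toy inverted index. This is a minimal illustration, not V‑Spark's implementation: the function names, the sample documents, and the use of Python's `re.fullmatch` as a stand-in for the search engine's regex matcher are all assumptions made for the example.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def regex_search(index, pattern):
    """Test every term in the index, then collect documents for whole-term matches."""
    compiled = re.compile(pattern)
    matching_terms = [term for term in index if compiled.fullmatch(term)]
    doc_ids = set()
    for term in matching_terms:
        doc_ids |= index[term]
    return sorted(matching_terms), sorted(doc_ids)

# Hypothetical sample documents, used only for illustration.
docs = {
    1: "please reset my password",
    2: "the passport office is closed",
    3: "reset the router",
}
index = build_inverted_index(docs)

terms, hits = regex_search(index, "pass.*")      # two terms match, in documents 1 and 2
phrase_terms, phrase_hits = regex_search(index, "reset my")  # no single term contains a space
```

Because the pattern is tested against individual terms, the phrase query returns nothing, and a pattern that matches many terms has to gather documents for each of them.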
For more information on regex syntax as used in V‑Spark, refer to the documentation for Elasticsearch 1.4.
Allowed characters
Any Unicode characters may be used in the pattern, but certain characters are reserved. The reserved characters are:
. ? + * | { } [ ] ( ) # @ & < > ~ " \
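To search for a reserved character literally, it typically has to be escaped. The sketch below assumes the backslash-escaping convention described in the Elasticsearch regex documentation; whether V‑Spark follows it in every case is an assumption, and `escape_reserved` is a hypothetical helper name.

```python
# Reserved characters as listed above, including the backslash itself.
RESERVED = '.?+*|{}[]()#@&<>~"\\'

def escape_reserved(text):
    """Prefix every reserved character with a backslash (assumed convention)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)

escaped = escape_reserved("a.b")  # the dot is now a literal dot, not "any character"
```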
Anchoring
Most regex search engines will match any part of a word. In these cases ^ and $ are used to anchor searches to the beginning and end of a word, respectively. However, because V‑Spark regex searches match only whole words, these anchors are not required and are treated only as literal characters. As an example, for the word "abcde":
ab.* # match
abcd # no match
^abcd # no match
abcd$ # no match
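The difference between whole-word and substring matching can be approximated in Python. This is only a sketch: Python's `re` module stands in for the V‑Spark matcher, both function names are hypothetical, and the literal handling of ^ and $ is a simplification that does not account for negated character classes such as [^a-z].

```python
import re

def whole_term_match(pattern, term):
    """Approximate whole-word matching, with ^ and $ treated as literal characters."""
    literal = pattern.replace("^", r"\^").replace("$", r"\$")
    return re.fullmatch(literal, term) is not None

def substring_match(pattern, term):
    """Conventional behavior: the pattern may match any part of the term."""
    return re.search(pattern, term) is not None

word = "abcde"
whole = {p: whole_term_match(p, word) for p in ["ab.*", "abcd", "^abcd", "abcd$"]}
partial = {p: substring_match(p, word) for p in ["abcd", "^abcd"]}
```

Here only `ab.*` matches under whole-word rules, while a substring matcher also accepts `abcd` and `^abcd`.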
Characters and examples
| Character | Meaning | Example Text | Example Query | Match? |
|---|---|---|---|---|
| `.` | Match any character | ab | `a.` | Yes |
| `+` | Match the preceding shortest pattern 1 or more times | aaabbb | `a+b+` | Yes |
| | | aaabbb | `a+b+c+` | No |
| `*` | Match the preceding shortest pattern 0 or more times | aaabbb | `a*b*` | Yes |
| | | aaabbb | `a*b*c*` | Yes (`c*` matches zero occurrences) |
| `?` | Match the preceding shortest pattern 0 or 1 times | aaabbb | `aaabbbc?` | Yes |
| `{n}` | Match the preceding shortest pattern exactly n times | aaabbb | `a{3}b{3}` | Yes |
| | | aaabbb | `a{4}b{4}` | No |
| `{n,m}` | Match the preceding shortest pattern at least n and at most m times | aaabbb | `a{2,3}b{2,3}` | Yes |
| `( )` | Group characters to form subpatterns | ababab | `(ab)+` | Yes |
| `\|` | OR (applies to the longest pattern on either side) | aaabbb | `a+b+\|ccc` | Yes |
| | | aaabbb | `aaa\|bbb` | No |
| `[ ]` | Indicate lists or ranges of characters. A leading ^ negates the list. | abc | `[a-z]*` | Yes |
| | | abc | `[^a-z]` | No |
| `~` | Negate the following shortest pattern | abcdef | `abc~df` | Yes |
| | | abcdef | `ab~cdef` | No |
| `< >` | Match a range of numeric values | 99 | `<1-100>` | Yes |
| | | 99 | `<100-101>` | No |
| | | 99 | `<99-100>` | Yes |
| `&` | AND (text must match the patterns on both sides) | aaabbb | `aaa.+&.+bbb` | Yes |
| | | aaabbb | `aaa&bbb` | No |
| `@` | Match any word | hello | `@` | Yes |
| | | hello | `@&~(hello)` | No |
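Several of the examples listed above can be checked with Python's `re` module, which shares the basic operators (`.`, `+`, `*`, `?`, `{n}`, groups, `|`, and character classes). The `~`, `&`, `< >`, and `@` operators are extensions that Python's `re` does not provide, so the sketch below emulates two of them with plain helper functions. All names here are hypothetical, and `re.fullmatch` is only an approximation of whole-term matching in V‑Spark.

```python
import re

# (term, pattern, expected) for operators that Python's re shares with the list above.
CASES = [
    ("ab", "a.", True),
    ("aaabbb", "a+b+", True),
    ("aaabbb", "a+b+c+", False),
    ("aaabbb", "a*b*", True),
    ("aaabbb", "aaabbbc?", True),
    ("aaabbb", "a{3}b{3}", True),
    ("ababab", "(ab)+", True),
    ("aaabbb", "aaa|bbb", False),
    ("abc", "[a-z]*", True),
    ("abc", "[^a-z]", False),
]

def failing_cases(cases):
    """Return the (term, pattern) pairs whose whole-term result differs from expected."""
    return [
        (term, pattern)
        for term, pattern, expected in cases
        if (re.fullmatch(pattern, term) is not None) != expected
    ]

def in_numeric_range(term, low, high):
    """Emulate <low-high>: the term must be an integer within the inclusive bounds."""
    return term.isdigit() and low <= int(term) <= high

def matches_all(term, *patterns):
    """Emulate &: the whole term must match every pattern."""
    return all(re.fullmatch(p, term) is not None for p in patterns)

mismatches = failing_cases(CASES)                      # empty if every expectation holds
range_ok = in_numeric_range("99", 1, 100)              # emulates <1-100> on "99"
both_ok = matches_all("aaabbb", "aaa.+", ".+bbb")      # emulates aaa.+&.+bbb
```

The `aaa|bbb` case fails because neither alternative covers the whole term, which is the same reason `aaa&bbb` cannot match.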