What does a question mark mean in a regular expression?

Regular Expressions (RegEx) Quick Reference

Basics

Match anywhere: By default, a regular expression matches a substring anywhere inside the string being searched. For example, the regular expression abc matches abc123, 123abc, and 123abcxyz. To require the match to occur only at the beginning or end of the string, use an anchor.
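
The "match anywhere" behavior can be sketched in Python, whose re module is used here only as a stand-in for RegExMatch() (an assumption; the two engines differ in option syntax, and Python reports 0-based positions). The helper name found_at is hypothetical.

```python
import re

# Assumption: Python's re.search stands in for RegExMatch() here.
# Like RegExMatch(), it scans the whole string for the first match.
PATTERN = re.compile(r"abc")

def found_at(text):
    """Return the 0-based index of the first match, or None if there is none."""
    match = PATTERN.search(text)
    return match.start() if match else None

for text in ["abc123", "123abc", "123abcxyz", "ab-c"]:
    print(text, found_at(text))
```

The first three strings contain abc at some position, so a match is reported; the last one does not.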

Escaped characters: Most characters, such as abc123, have no special function and can be used literally in a regular expression. The characters \.*?+[{|()^$, however, have a special function and must be preceded by a backslash to be treated literally. For example, \. is a literal period and \\ is a literal backslash. To treat a whole run of characters literally, enclose it in \Q...\E. For example: \QLiteral Text\E.
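
A rough illustration of escaping, again in Python as a stand-in. Python's re module does not implement \Q...\E, so this sketch uses re.escape to get a comparable effect; the sample text and the helper contains_literal are made up for the example.

```python
import re

# re.escape backslash-escapes the characters that are special in a pattern,
# which is roughly what wrapping text in \Q...\E achieves in PCRE.
LITERAL = "1+1=2 (approx.)"
ESCAPED = re.escape(LITERAL)

def contains_literal(text):
    """True if LITERAL occurs verbatim in text."""
    return re.search(ESCAPED, text) is not None

sample = "so 1+1=2 (approx.) holds"
print(contains_literal(sample))                # escaped pattern finds the text
print(re.search(LITERAL, sample) is not None)  # unescaped, + ( ) . are operators

# A single escaped character: \. is a literal period, . is any character.
print(re.search(r"a\.b", "axb") is not None)
print(re.search(r"a.b", "axb") is not None)
```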

Case sensitivity: By default, regular expressions are case-sensitive. This can be changed with the "i" option. For example, the search pattern i)abc searches for "abc" without regard to case. See the options below for other modifiers.

Options (case sensitive)

At the very beginning of a regular expression, specify zero or more of the following options, followed by a closing parenthesis. For example, the search pattern im)abc searches for "abc" with the options "case-insensitive" and "multiline" (the parenthesis can be omitted when no options are given). Unlike conventional syntax, this form needs no special delimiters (such as a slash), so delimiter characters do not have to be escaped inside the search pattern. The options are also easier to parse, which improves performance.

Option / Description

i

Case-insensitive matching. This option treats the letters A through Z as identical to their lowercase counterparts.

m

Multiline mode. haystack is not viewed as a continuous line, but as a collection of individual lines (if it contains line breaks). The following changes come into effect:

1) Circumflex (^) finds a match after every line break, as well as at the beginning of haystack (but not after a line break at the very end of haystack).

2) Dollar sign ($) matches before each line break (as well as at the very end of haystack).

For example, the search pattern m)^abc$ matches xyz`r`nabc. Without the "m" option, it would not.

The "D" option is ignored if the "m" option is present.

s

DotAll mode. Causes a period (.) to match every character, including line breaks (normally, a period does not match line breaks). Note that a line break is usually two characters (`r`n), so two periods are required to match it. This option does not affect negated classes such as [^a], which always match line breaks.

x

Ignores whitespace characters in the search pattern unless they are escaped or inside a character class. The characters `n and `t are among those ignored, because by the time they reach PCRE they are already raw whitespace characters (by contrast, \n and \t are not ignored because they are PCRE escape sequences). The "x" option also ignores everything from an unescaped hash (#) outside a character class up to the next line break, which makes it possible to add comments to a complicated search pattern. Note, however, that this applies only to data characters; special character sequences, such as (?(, which introduces a conditional subpattern, must not contain whitespace.

A

Forces the search pattern to be anchored; that is, it can match only at the start of haystack. This is essentially the same as explicitly anchoring the search pattern with "^".

D

Forces the dollar sign ($) to match only at the very end of haystack, even if the last item in haystack is a line break. Without this option, $ also matches immediately before the final line break (if there is one). Note: This option is ignored if the "m" option is present.

J

Allows duplicate named subpatterns. This can be useful for search patterns in which only one of a group of identically named subpatterns can match. Note: If more than one instance of a particular name matches something, only the leftmost one is stored. In addition, variable names are not case-sensitive.

U

Ungreedy mode. Makes the quantifiers *, ?, + and {min,max} consume as few characters as possible to form a match, leaving the remaining characters to the next part of the search pattern. When the "U" option is not in effect, an individual quantifier can be made ungreedy by following it with a question mark. When "U" is in effect, the question mark does the opposite: it makes an individual quantifier greedy.

X

PCRE_EXTRA. Enables PCRE features that are incompatible with Perl. Currently there is only one such feature: any backslash followed by a letter that has no special meaning causes the match to fail and ErrorLevel to be set accordingly. This option helps reserve unused PCRE escape sequences for future use. Without this option, a backslash followed by a letter with no special meaning is simply ignored (e.g. \g and g are both treated as a literal g). A backslash followed by a non-alphabetic character with no special meaning is always ignored, regardless of this option (e.g. \/ and / are both treated as a literal slash).

P

Position mode. Causes RegExMatch() to return the position and length of the match and its subpatterns rather than the substrings they matched. For details, see OutputVar.

O

Object mode. [v1.1.05+]: Causes RegExMatch() to store all information about the match and its subpatterns in a match object in OutputVar. For details, see OutputVar.

S

Studies the search pattern to try to improve its performance. This is useful when a particular search pattern (especially a complex one) will be executed many times. If PCRE finds a way to improve performance, that discovery is cached alongside the search pattern for use by later searches with the same pattern (those searches should also specify the S option, because a cached pattern is found only if the option letters match exactly, including their order).

C

Enables the auto-callout mode. For details, see Regular Expression Callouts.

`n

Switches from the default line break (`r`n) to a single LF character (`n), which is the standard line break on UNIX systems. The chosen line break affects the behavior of the anchors (^ and $) and the period.

`r

Switches from the default line break (`r`n) to a single CR character (`r).

`a

[v1.0.46.06+]: `a recognizes any type of line break, namely `r, `n, `r`n, `v/VT/vertical tab/chr(0xB), `f/FF/formfeed/chr(0xC), and NEL/next-line/chr(0x85). [v1.0.47.05+]: To restrict line breaks to CR, LF, and CRLF only, specify (*ANYCRLF) in uppercase at the beginning of the search pattern (after the options); e.g. im)(*ANYCRLF)^abc$.

Note: Optionally, spaces and tabs can be used to separate the options.
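
For readers more familiar with flag-style APIs, the i, m, s, and x options have close counterparts in Python's re module. The sketch below uses those counterparts as an assumption-laden stand-in: the option letters and the leading "im)" syntax are specific to AutoHotkey, and the sample string uses a bare \n because Python's anchors treat only \n as a line break.

```python
import re

haystack = "xyz\nABC"

# "i" option: ignore case (re.IGNORECASE in Python).
case_sensitive = re.search(r"abc", haystack)
case_insensitive = re.search(r"abc", haystack, re.IGNORECASE)

# "m" option: ^ and $ also match at line boundaries (re.MULTILINE).
single_line = re.search(r"^ABC$", haystack)
multi_line = re.search(r"^ABC$", haystack, re.MULTILINE)

# "s" option: the dot also matches line breaks (re.DOTALL).
dot_default = re.search(r"xyz.ABC", haystack)
dot_all = re.search(r"xyz.ABC", haystack, re.DOTALL)

# "x" option: whitespace and # comments in the pattern are ignored (re.VERBOSE).
verbose = re.search(
    r"""
    xyz   # first line
    \n    # an escape sequence is still significant
    ABC   # second line
    """,
    haystack,
    re.VERBOSE,
)

# Options written inside the pattern, comparable in spirit to im)abc.
inline = re.search(r"(?im)^abc$", haystack)

for name, result in [("i", case_insensitive), ("m", multi_line),
                     ("s", dot_all), ("x", verbose), ("inline", inline)]:
    print(name, repr(result.group()))
```

Each of the un-flagged searches (case_sensitive, single_line, dot_default) returns None, which is the behavior the corresponding option changes.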

Commonly used symbols and syntax

Element / Description

.

By default, a period matches any single character except the CR character (`r) of a line break (`r`n); this can be changed with the options DotAll (s), LF (`n), CR (`r), `a, or (*ANYCRLF). For example, ab. matches abc, abz, and ab_.

*

Matches zero or more occurrences of the preceding element (character, class, or subpattern). For example, a* matches ab and aaab, and it also matches (with an empty match) in a string that contains no "a" at all.

Dot-star pattern: .* is one of the most permissive patterns because it matches zero or more occurrences of any character (except line breaks: `r and `n). For example, abc.*123 matches both abcSomething123 and abc123.

?

Matches zero or one occurrence of the preceding element (character, class, or subpattern). In other words, the preceding element is optional. For example, colou?r matches both color and colour because the "u" is optional.

+

Matches one or more occurrences of the preceding element (character, class, or subpattern). For example, a+ matches ab and aaab. Unlike a* and a?, a+ does not match a string that contains no "a" at all.

{min,max}

Matches between min and max occurrences of the preceding element (character, class, or subpattern). For example, a{1,2} matches ab, but only the first two a's in aaab.

By contrast, {3} means exactly 3 occurrences, and {3,} means 3 or more occurrences. Note: The specified numbers must be less than 65536, and the first must be less than or equal to the second.
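
The quantifiers *, ?, +, and {min,max} behave the same way in most engines, so a small Python sketch (a stand-in for the PCRE behavior described here, with a hypothetical helper named full) can make the differences concrete.

```python
import re

def full(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ["b", "ab", "aaab"]

star = [full(r"a*b", s) for s in samples]   # zero or more a's
plus = [full(r"a+b", s) for s in samples]   # one or more a's
optional = [full(r"colou?r", s) for s in ["color", "colour", "colouur"]]

# {min,max} is greedy: it takes two a's here even though one would do.
bounded = re.search(r"a{1,2}", "aaab").group()

exact = [full(r"a{3}", s) for s in ["aa", "aaa", "aaaa"]]
at_least = [full(r"a{3,}", s) for s in ["aa", "aaa", "aaaa"]]

print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(optional)  # [True, True, False]
print(bounded)   # aa
print(exact)     # [False, True, False]
print(at_least)  # [False, True, True]
```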

[...]

Character classes: The square brackets enclose a list or range of characters (or both) and match a single character from it. For example, [abc] means "any single character that is either a, b, or c". A hyphen defines a range; for example, [a-z] means "any single character from a to z". Lists and ranges can be combined; for example, [a-zA-Z0-9_] means "any single character that is alphanumeric or an underscore".

A character class may be followed by *, ?, +, or {min,max}. For example, [0-9]+ matches one or more occurrences of any digit; it matches xyz123 but not abcxyz.

Predefined POSIX character classes can also be specified in the form [[:xxx:]], where xxx is one of the following words: alnum, alpha, ascii (0-127), blank (space or tab), cntrl (control character), digit (0-9), xdigit (hexadecimal digit), print, graph (print excluding space), punct, lower, upper, space (whitespace), word (same as \w).

Within a character class, only characters that have a special meaning inside a class need to be escaped; e.g. [\^a], [a\-b], [a\]], and [\\a].

[^...]

Matches any single character that is not in the class. For example, [^/]* matches zero or more occurrences of any character that is not a slash, such as the http: in http://. Similarly, [^0-9xyz] matches any single character that is neither a digit nor the letter x, y, or z.

\d

Matches any single digit (equivalent to the class [0-9]). Conversely, uppercase \D matches any single character that is not a digit. This and the two entries below can also be used inside a class; for example, [\d.-] means "any single digit, period, or minus sign".

\s

Matches any single whitespace character, mainly space, tab, CR (`r), and LF (`n). Conversely, uppercase \S matches any single character that is not whitespace.

\w

Matches any single word character, that is, a letter, digit, or underscore. This is equivalent to [a-zA-Z0-9_]. Conversely, uppercase \W matches any single character that is not a word character.

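
The class examples above can be tried out with a short Python sketch. Python is only a stand-in here (it does not support the POSIX [[:xxx:]] names, so those are left out), and the sample strings, including the example.com URL, are invented for illustration.

```python
import re

# A range followed by a quantifier.
digits = re.search(r"[0-9]+", "xyz123")
no_digits = re.search(r"[0-9]+", "abcxyz")

# Lists and ranges combined.
words = re.findall(r"[a-zA-Z0-9_]+", "foo_bar-42 baz")

# A negated class stops at the first excluded character.
before_slash = re.match(r"[^/]*", "http://example.com").group()

# A shorthand class used inside brackets: digit, period, or minus sign.
numbers = re.findall(r"[\d.-]+", "temp -3.5 to 12")

print(digits.group())  # 123
print(no_digits)       # None
print(words)           # ['foo_bar', '42', 'baz']
print(before_slash)    # http:
print(numbers)         # ['-3.5', '12']
```
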
^
$

Circumflex (^) and dollar sign ($) are called anchors because they do not consume any characters; instead, they tie the search pattern to the beginning or end of the string being searched.

^ is usually placed at the beginning of a search pattern to require the match to occur at the very beginning of a line. For example, ^abc matches abc123 but not 123abc.

$ is usually placed at the end of a search pattern to require the match to occur at the very end of a line. For example, abc$ matches 123abc but not abc123.

The two anchors can be combined. For example, ^abc$ matches only abc, not 123abc or abc123.

If the text being searched contains multiple lines, the "m" option makes the anchors apply to each line rather than to the text as a whole. For example, m)^abc$ matches 123`r`nabc`r`n789. Without the "m" option, it would not.
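
The anchor examples can be checked with a short Python sketch. As before, Python is an assumed stand-in: its re.MULTILINE flag plays the role of the "m" option, and the multi-line sample uses bare \n line breaks because that is what Python's anchors recognize.

```python
import re

def matches(pattern, text, flags=0):
    """True if pattern is found somewhere in text."""
    return re.search(pattern, text, flags) is not None

starts = [matches(r"^abc", s) for s in ["abc123", "123abc"]]
ends = [matches(r"abc$", s) for s in ["123abc", "abc123"]]
both = [matches(r"^abc$", s) for s in ["abc", "123abc", "abc123"]]

lines = "123\nabc\n789"
whole_text = re.findall(r"^abc$", lines)               # anchors see one long string
per_line = re.findall(r"^abc$", lines, re.MULTILINE)   # anchors apply to each line

print(starts)      # [True, False]
print(ends)        # [True, False]
print(both)        # [True, False, False]
print(whole_text)  # []
print(per_line)    # ['abc']
```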

\b

\b means "word boundary" and acts like an anchor because it does not consume any characters. It requires the current character's status as a word character (\w) to be the opposite of the previous character's. It is typically used to avoid accidentally matching a word that appears inside another word. For example, \bcat\b does not match catfish, but it matches cat regardless of the punctuation or whitespace around it. Uppercase \B is the opposite: it requires that the current character not be on a word boundary.

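
A brief Python sketch of \b and \B, offered as a stand-in for the behavior described above; the sample sentences are invented.

```python
import re

# Raw strings matter here: in a normal Python string "\b" is a backspace.
WORD = re.compile(r"\bcat\b")

inside_word = WORD.search("catfish")             # "cat" is followed by a word character
standalone = WORD.search("The cat, obviously.")  # punctuation counts as a boundary
not_on_boundary = re.search(r"\Bcat\B", "concatenate")  # "cat" buried inside a word

print(inside_word)              # None
print(standalone.start())       # 4
print(not_on_boundary.start())  # 3
```
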
|

The vertical bar separates two or more alternatives. A match occurs if any one of the alternatives is satisfied. For example, gray|grey matches both gray and grey. The search pattern gr(a|e)y does the same thing with the help of the parentheses described below.

(...)

Items enclosed in parentheses are most commonly used to:

  • Determine the order of evaluation. For example, (Sun|Mon|Tues|Wednes|Thurs|Fri|Satur)day matches the English name of any day of the week.
  • Apply *, ?, +, or {min,max} to a series of characters rather than just one. For example, (abc)+ matches one or more occurrences of the string "abc"; it matches abcabc123 but not ab123 or bc123.
  • Capture a subpattern, such as the dot-star in abc(.*)xyz. For example, RegExMatch() stores the substring matched by each subpattern in its output array, and RegExReplace() allows the substring matched by each subpattern to be reinserted into the result via backreferences like $1. To use parentheses without the side effect of capturing a subpattern, specify ?: as the first two characters inside the parentheses; for example: (?:.*)
  • Change options on the fly. For example, (?im) turns on the options "case-insensitive" and "multiline" for the remainder of the search pattern (or subpattern). Conversely, (?-im) would turn them both off. All options are supported except DPS`r`n`a.

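
The four uses of parentheses listed above can be sketched in Python. This is a stand-in with known differences, stated as assumptions: Python writes backreferences in the replacement as \1 rather than $1, and it scopes inline options to a group with (?i:...) instead of switching them on mid-pattern.

```python
import re

# 1) Grouping alternatives before a common suffix.
DAY = re.compile(r"(Sun|Mon|Tues|Wednes|Thurs|Fri|Satur)day")
day_match = DAY.search("See you on Wednesday")

# 2) Applying a quantifier to a whole group.
repeated = re.match(r"(abc)+", "abcabc123")
not_repeated = re.match(r"(abc)+", "ab123")

# 3) Capturing, and reusing the captures in a replacement.
captured = re.search(r"abc(.*)xyz", "abc-middle-xyz").group(1)
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

# (?:...) groups without capturing, so only one group is counted here.
NON_CAPTURING = re.compile(r"(?:abc)+(\d+)")

# 4) Options inside the pattern, limited to the group they appear in.
scoped = re.search(r"(?i:abc)def", "ABCdef")
scoped_miss = re.search(r"(?i:abc)def", "ABCDEF")

print(day_match.group(), day_match.group(1))  # Wednesday Wednes
print(repeated.group())                       # abcabc
print(captured)                               # -middle-
print(swapped)                                # value=key
print(NON_CAPTURING.groups)                   # 1
```
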
\t
\r
etc.

These escape sequences stand for special characters. The most common ones are \t (tab), \r (carriage return), and \n (linefeed). In AutoHotkey, an accent (`) can optionally be used in place of the backslash in these cases. Escape sequences in the form \xhh are also supported, where hh is the hexadecimal code of any ANSI character between 00 and FF.

[v1.0.46.06+]: \R means "any single line break of any type", namely those listed at the `a option (however, \R inside a character class is merely the letter "R"). [v1.0.47.05+]: \R can be restricted to CR, LF, and CRLF by specifying (*BSR_ANYCRLF) in uppercase at the beginning of the search pattern (after the options); e.g. im)(*BSR_ANYCRLF)abc\Rxyz

\p{xx}
\P{xx}
\X

[AHK_L 61+]: Unicode character properties. Not supported on ANSI versions. \p{xx} matches a character with the xx property, while \P{xx} matches any character without the xx property. For example, \pL matches any letter and \p{Lu} matches any uppercase letter. \X matches any number of characters that form an extended Unicode sequence.

For a full list of supported property names and other details, search for "\p{xx}" at www.pcre.org/pcre.txt.

(*UCP)

[AHK_L 61+]: For performance, \d, \D, \s, \S, \w, \W, \b and \B recognize only ASCII characters by default, even on Unicode versions. If the search pattern begins with (*UCP), Unicode properties are used to determine which characters match. For example, \w then becomes equivalent to [\p{L}\p{N}_] and \d becomes equivalent to \p{Nd}.

Greed: By default, the quantifiers *, ?, +, and {min,max} are greedy: they consume as many characters as possible while still allowing a match. To make them consume as few characters as possible instead, follow the quantifier with a question mark. For example, the search pattern <.+> (which lacks a question mark) means: "search for a <, followed by one or more of any character, followed by a >". To stop this pattern from matching the entire string <em>text</em>, append a question mark to the plus sign: <.+?>. The match then stops at the first '>' and therefore covers only the first tag, <em>.
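
The difference between the greedy and ungreedy forms is easy to see when both are run on the same input. A minimal Python sketch, used as a stand-in for the PCRE behavior described above:

```python
import re

html = "<em>text</em>"

greedy = re.search(r"<.+>", html).group()   # runs on to the last '>'
lazy = re.search(r"<.+?>", html).group()    # stops at the first '>'

print(greedy)  # <em>text</em>
print(lazy)    # <em>
```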

Look-ahead and look-behind assertions: The groups (?=...), (?!...), (?<=...), and (?<!...) are called assertions because they demand that a condition be met but do not consume any characters. For example, abc(?=.*xyz) is a look-ahead assertion that requires the string xyz to exist somewhere to the right of the string abc (if it does not, the entire search pattern is not considered a match). (?=...) is a positive look-ahead because it requires that the specified pattern exist. Conversely, (?!...) is a negative look-ahead because it requires that the specified pattern not exist. Similarly, (?<=...) and (?<!...) are positive and negative look-behinds (respectively) because they look to the left of the current position rather than the right. Look-behinds are more limited than look-aheads because they do not support quantifiers of varying size such as *, ?, and +. The escape sequence \K is similar to a look-behind assertion because it causes any previously matched characters to be omitted from the final matched string. For example, foo\Kbar matches "foobar" but reports that it has matched "bar".
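
The four assertion types can be sketched in Python as a stand-in. One assumption to note: \K is not available in Python's re module, so the last lines use a look-behind to approximate what foo\Kbar would report.

```python
import re

# Positive look-ahead: "abc" must be followed, somewhere later, by "xyz".
ahead_hit = re.search(r"abc(?=.*xyz)", "abc123xyz")
ahead_miss = re.search(r"abc(?=.*xyz)", "abc123")

# Negative look-ahead: "abc" must NOT be followed later by "xyz".
neg_ahead_hit = re.search(r"abc(?!.*xyz)", "abc123")
neg_ahead_miss = re.search(r"abc(?!.*xyz)", "abc123xyz")

# Positive look-behind: "bar" must be preceded by "foo".
# The reported match is only "bar", much like foo\Kbar.
behind_hit = re.search(r"(?<=foo)bar", "foobar")

# Negative look-behind: "bar" must NOT be preceded by "foo".
neg_behind_hit = re.search(r"(?<!foo)bar", "bazbar")
neg_behind_miss = re.search(r"(?<!foo)bar", "foobar")

print(ahead_hit.group())                      # abc (the look-ahead consumed nothing)
print(behind_hit.group(), behind_hit.start()) # bar 3
print(neg_behind_hit.group())                 # bar
```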

See also: RegExMatch(), RegExReplace(), and SetTitleMatchMode RegEx.

Final note: This page covers only the most commonly used RegEx features. Other features, such as conditional subpatterns, are omitted entirely. The complete PCRE manual is available at www.pcre.org/pcre.txt (in English).