Regex JS

Creating a regular expression

You construct a regular expression in one of two ways:

Using a regular expression literal, which consists of a pattern enclosed between slashes, as follows:

            
  const re = /ab+c/;

Regular expression literals provide compilation of the regular expression when the script is loaded. If the regular expression remains constant, using this can improve performance.

Calling the constructor function of the RegExp object, as follows:

            
  const re = new RegExp('ab+c');

Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.

MDN Web Docs
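
As a minimal sketch of the two forms side by side, the snippet below builds the same pattern both ways. The variable `userInput` is a hypothetical stand-in for a pattern obtained at runtime; it is not part of the original text.

```javascript
// Literal form: the pattern is fixed in the source text.
const literal = /ab+c/;

// Constructor form: the pattern is supplied as a string at runtime.
// `userInput` is a hypothetical value standing in for external input.
const userInput = 'ab+c';
const constructed = new RegExp(userInput);

console.log(literal.source === constructed.source); // true
console.log(literal.test('xabbbcx'));               // true
console.log(constructed.test('xabbbcx'));           // true
console.log(constructed.test('ac'));                // false: "b+" needs at least one "b"
```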

Patterns

Writing a regular expression pattern:

A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses, which are used as a memory device. The match made with this part of the pattern is remembered for later use, as described in Using groups.

Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when the exact sequence "abc" occurs (all characters together and in that order).

MDN Web Docs

Using special characters:

When the search for a match requires something more than a direct match, such as finding one or more b's, or finding white space, you can include special characters in the pattern. For example, to match a single "a" followed by zero or more "b"s followed by "c", you'd use the pattern /ab*c/: the * after "b" means "0 or more occurrences of the preceding item." In the string "cbbabbbbcdebc", this pattern will match the substring "abbbbc".

MDN Web Docs
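
The example string above can be checked directly. This is only a small sketch using String.prototype.match(); the variable names are illustrative.

```javascript
const text = 'cbbabbbbcdebc';

// "a", then zero or more "b"s, then "c"
const match = text.match(/ab*c/);

console.log(match[0]);         // "abbbbc"
console.log(match.index);      // 3
console.log(/ab*c/.test('ac')); // true: zero "b"s is allowed
```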

Using Parentheses:

Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use. See Groups and ranges for more details.

MDN Web Docs

Character Classes

\

Indicates that the following character should be treated specially, or "escaped". It behaves one of two ways.

For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character "b". By placing a backslash in front of "b", that is by using /\b/, the character becomes special to mean match a word boundary.
For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, "*" is a special character that means 0 or more occurrences of the preceding character should be matched; for example, /a*/ means match 0 or more "a"s. To match * literally, precede it with a backslash; for example, /a\*/ matches "a*".

MDN Web Docs

.

Has one of the following meanings:

Matches any single character except line terminators: \n, \r, \u2028 or \u2029. For example, /.y/ matches "my" and "ay", but not "yes", in "yes make my day".
Inside a character class, the dot loses its special meaning and matches a literal dot.

Note that the m multiline flag doesn't change the dot behavior. So to match a pattern across multiple lines, the character class [^] can be used — it will match any character including newlines.

ES2018 added the s "dotAll" flag, which allows the dot to also match line terminators.

MDN Web Docs
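
A short sketch of how the dot behaves around line breaks, assuming a runtime that supports the s flag (ES2018 or later). The string `twoLines` is made up for illustration.

```javascript
const twoLines = 'first\nsecond';

console.log(/first.second/.test(twoLines));   // false: "." does not match "\n"
console.log(/first.second/m.test(twoLines));  // false: the m flag does not change "."
console.log(/first[^]second/.test(twoLines)); // true: [^] matches any character
console.log(/first.second/s.test(twoLines));  // true: dotAll lets "." match "\n"

console.log('yes make my day'.match(/.y/g));  // ["my", "ay"]
```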

\d

Matches any digit (Arabic numeral). Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches "2" in "B2 is the suite number".

MDN Web Docs

\D

Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches "B" in "B2 is the suite number".

MDN Web Docs

\w

Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches "a" in "apple", "5" in "$5.28", "3" in "3D" and "m" in "Émanuel".

MDN Web Docs

\W

Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. For example, /\W/ or /[^A-Za-z0-9_]/ matches "%" in "50%" and "É" in "Émanuel".

MDN Web Docs

\s

Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\s\w*/ matches " bar" in "foo bar".

MDN Web Docs

\S

Matches a single character other than white space. Equivalent to [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\S\w*/ matches "foo" in "foo bar".

MDN Web Docs
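
The shorthand classes above can be tried out as follows. This is a sketch that reuses the sample strings from the descriptions; "É" is written as the escape \u00C9 so that it is a single precomposed character.

```javascript
console.log('B2 is the suite number'.match(/\d/)[0]); // "2"
console.log('B2 is the suite number'.match(/\D/)[0]); // "B"

console.log('$5.28'.match(/\w/)[0]);          // "5"
console.log('\u00C9manuel'.match(/\w/)[0]);   // "m": "É" is not a basic Latin letter
console.log('50%'.match(/\W/)[0]);            // "%"

console.log('foo bar'.match(/\s\w*/)[0]);     // " bar"
console.log('foo bar'.match(/\S\w*/)[0]);     // "foo"
```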

\t

Matches a horizontal tab.

MDN Web Docs

\n

Matches a linefeed.

MDN Web Docs

\r

Matches a carriage return.

MDN Web Docs

\v

Matches a vertical tab.

MDN Web Docs

\f

Matches a form-feed.

MDN Web Docs

\0

Matches a NUL character. Do not follow this with another digit.

MDN Web Docs

\cX

Matches a control character using caret notation, where "X" is a letter from A-Z (corresponding to codepoints U+0001-U+001A). For example, /\cM\cJ/ matches "\r\n".

MDN Web Docs

\xhh

Matches the character with the code hh (two hexadecimal digits).

MDN Web Docs

\uhhhh

Matches a UTF-16 code-unit with the value hhhh (four hexadecimal digits).

MDN Web Docs

\u{hhhh} or \u{hhhhh}

(Only when the u flag is set.) Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal digits).

MDN Web Docs

[\b]

Matches a backspace. If you're looking for the word-boundary character (\b), see Assertions.

MDN Web Docs
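
The escape forms above can be compared in a small sketch. The test strings are illustrative; the last two lines show that \u{...} is only read as a code point escape when the u flag is set.

```javascript
console.log(/\cM\cJ/.test('\r\n')); // true: Ctrl-M is "\r", Ctrl-J is "\n"
console.log(/\x41/.test('A'));      // true: 0x41 is "A"
console.log(/\u0041/.test('A'));    // true
console.log(/[\b]/.test('\b'));     // true: a backspace, not a word boundary

console.log(/^\u{1F600}$/u.test('\u{1F600}')); // true: one code point with the u flag
console.log(/\u{41}/u.test('A'));              // true
console.log(/\u{41}/.test('A'));               // false: without u this is not a code point escape
```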

Assertions

Assertions include boundaries, which indicate the beginnings and endings of lines and words, and other patterns indicating in some way that a match is possible (including look-ahead, look-behind, and conditional expressions).

^

Matches the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character. For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A".

MDN Web Docs

$

Matches the end of input. If the multiline flag is set to true, also matches immediately before a line break character. For example, /t$/ does not match the "t" in "eater", but does match it in "eat".

MDN Web Docs
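
A brief sketch of how the multiline flag changes ^ and $. The strings `poem` and `meals` are made up for illustration.

```javascript
const poem = 'An A\nA second line';

console.log(poem.match(/^A/g));  // ["A"]: only the start of input
console.log(poem.match(/^A/gm)); // ["A", "A"]: also after the line break

const meals = 'eat\neater';

console.log(/t$/.test(meals));    // false: the input ends with "r"
console.log(meals.match(/t$/gm)); // ["t"]: the "t" just before the line break
```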

\b

Matches a word boundary. This is the position where a word character is not followed or preceded by another word-character, such as between a letter and a space. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero.

Examples:

/\bm/ matches the "m" in "moon".
/oo\b/ does not match the "oo" in "moon", because "oo" is followed by "n" which is a word character.
/oon\b/ matches the "oon" in "moon", because "oon" is the end of the string, thus not followed by a word character.
/\w\b\w/ will never match anything, because a word character can never be followed by both a non-word and a word character.

MDN Web Docs

\B

Matches a non-word boundary. This is a position where the previous and next character are of the same type: Either both must be words, or both must be non-words, for example between two letters or between two spaces. The beginning and end of a string are considered non-words. Same as the matched word boundary, the matched non-word boundary is also not included in the match. For example, /\Bon/ matches "on" in "at noon", and /ye\B/ matches "ye" in "possibly yesterday".

MDN Web Docs
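
The word-boundary examples above can be verified with a short sketch. The replace() call at the end is an added illustration that marks each zero-width boundary with a "|".

```javascript
console.log('moon'.match(/\bm/)[0]);   // "m"
console.log(/oo\b/.test('moon'));      // false: "oo" is followed by "n"
console.log('moon'.match(/oon\b/)[0]); // "oon"
console.log(/\w\b\w/.test('any text at all')); // false: can never match

console.log('at noon'.match(/\Bon/)[0]);            // "on"
console.log('possibly yesterday'.match(/ye\B/)[0]); // "ye"

// Boundaries are zero-width positions, so nothing is consumed here.
console.log('at noon'.replace(/\b/g, '|')); // "|at| |noon|"
```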

x(?=y)

Lookahead assertion: Matches "x" only if "x" is followed by "y". For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat". /Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" is part of the match results.

MDN Web Docs

x(?!y)

Negative lookahead assertion: Matches "x" only if "x" is not followed by "y". For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec('3.141') matches "141" but not "3".

MDN Web Docs

(?<=y)x

Lookbehind assertion: Matches "x" only if "x" is preceded by "y". For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is preceded by "Jack". /(?<=Jack|Tom)Sprat/ matches "Sprat" only if it is preceded by "Jack" or "Tom". However, neither "Jack" nor "Tom" is part of the match results.

MDN Web Docs

(?<!y)x

Negative lookbehind assertion: Matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') finds no match because the number is preceded by the minus sign.

MDN Web Docs
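
The four lookaround forms can be tried together. This is a sketch that assumes a runtime with lookbehind support (ES2018 or later); the input strings are illustrative.

```javascript
// Lookahead: the text inside (?=...) is checked but not consumed.
console.log('JackSprat'.match(/Jack(?=Sprat)/)[0]);    // "Jack"
console.log(/Jack(?=Sprat|Frost)/.test('JackHorner')); // false

// Negative lookahead
console.log(/\d+(?!\.)/.exec('3.141')[0]); // "141"

// Lookbehind
console.log('JackSprat'.match(/(?<=Jack)Sprat/)[0]); // "Sprat"

// Negative lookbehind
console.log(/(?<!-)\d+/.exec('3')[0]); // "3"
console.log(/(?<!-)\d+/.exec('-3'));   // null
```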

Groups & Ranges

Groups and ranges indicate groups and ranges of expression characters.

x|y

Matches either "x" or "y". For example, /green|red/ matches "green" in "green apple" and "red" in "red apple".

MDN Web Docs

[xyz], [a-c]

A character class. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets it is taken as a literal hyphen to be included in the character class as a normal character.

For example, [abcd] is the same as [a-d]. They match the "b" in "brisket", and the "c" in "chop".

For example, [abcd-] and [-abcd] match the "b" in "brisket", the "c" in "chop", and the "-" (hyphen) in "non-profit".

For example, [\w-] is the same as [A-Za-z0-9_-]. They both match the "b" in "brisket", the "c" in "chop", and the "n" in "non-profit".

MDN Web Docs

[^xyz], [^a-c]

A negated or complemented character class. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets it is taken as a literal hyphen to be included in the character class as a normal character. For example, [^abc] is the same as [^a-c]. They initially match "o" in "bacon" and "h" in "chop".

MDN Web Docs

(x)

Capturing group: Matches x and remembers the match. For example, /(foo)/ matches and remembers "foo" in "foo bar".

A regular expression may have multiple capturing groups. In results, matches to capturing groups are typically in an array whose members are in the same order as the left parentheses of the capturing groups. This is usually just the order of the capturing groups themselves, which becomes important when capturing groups are nested. Matches are accessed using the index of the result's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9).

Capturing groups have a performance penalty. If you don't need the matched substring to be recalled, prefer non-capturing parentheses (see below).

String.match() won't return groups if the /.../g flag is set. However, you can still use String.matchAll() to get all matches.

MDN Web Docs
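
A small sketch of capturing groups in practice, including group order with nesting and the difference between match() with the g flag and matchAll(). The string `chapters` is a made-up example.

```javascript
const chapters = 'See Chapter 4.3 and Chapter 10.1';

// Without g: the full match followed by each captured group.
const first = chapters.match(/Chapter (\d+)\.(\d*)/);
console.log(first[0], first[1], first[2]); // "Chapter 4.3" "4" "3"

// With g: match() returns only the full matches, without groups.
console.log(chapters.match(/Chapter (\d+)\.(\d*)/g)); // ["Chapter 4.3", "Chapter 10.1"]

// matchAll() keeps the groups for every match.
const pairs = [...chapters.matchAll(/Chapter (\d+)\.(\d*)/g)].map((m) => [m[1], m[2]]);
console.log(pairs); // [["4", "3"], ["10", "1"]]

// Nested groups are numbered by their opening parenthesis.
const nested = 'ab'.match(/((a)(b))/);
console.log(nested[1], nested[2], nested[3]); // "ab" "a" "b"
```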

\n

Where "n" is a positive integer. A back reference to the last substring matching the n parenthetical in the regular expression (counting left parentheses). For example, /apple(,)\sorange\1/ matches "apple, orange," in "apple, orange, cherry, peach".

MDN Web Docs

\k<Name>

A back reference to the last substring matching the Named capture group specified by <Name>.

For example, /(?<title>\w+), yes \k<title>/ matches "Sir, yes Sir" in "Do you copy? Sir, yes Sir!".

MDN Web Docs

(?<Name>x)

Named capturing group: Matches "x" and stores it on the groups property of the returned matches under the name specified by <Name>. The angle brackets (< and >) are required for group name.

For example, to extract the United States area code from a phone number, we could use /\((?<area>\d\d\d)\)/. The resulting number would appear under matches.groups.area.

MDN Web Docs

(?:x)

Non-capturing group: Matches "x" but does not remember the match. The matched substring cannot be recalled from the resulting array's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9).

MDN Web Docs
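
The back reference, named group, and non-capturing forms can be sketched together. The phone number below is a made-up value, and the area-code pattern wraps the named group in escaped literal parentheses.

```javascript
// Numbered back reference: \1 must repeat what group 1 captured (",").
const fruit = 'apple, orange, cherry, peach';
console.log(fruit.match(/apple(,)\sorange\1/)[0]); // "apple, orange,"

// Named back reference
const radio = 'Do you copy? Sir, yes Sir!';
console.log(radio.match(/(?<title>\w+), yes \k<title>/)[0]); // "Sir, yes Sir"

// Named capturing group, read from the groups property.
const phone = '(555) 010-0199'; // hypothetical number
const areaMatch = phone.match(/\((?<area>\d\d\d)\)/);
console.log(areaMatch.groups.area); // "555"

// Non-capturing group: groups the pattern but adds no array element.
const repeated = 'foofoo'.match(/(?:foo)+/);
console.log(repeated[0], repeated.length); // "foofoo" 1
```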

Quantifiers

Quantifiers indicate numbers of characters or expressions to match.

x*

Matches the preceding item "x" 0 or more times. For example, /bo*/ matches "boooo" in "A ghost booooed" and "b" in "A bird warbled", but nothing in "A goat grunted".

MDN Web Docs

x+

Matches the preceding item "x" 1 or more times. Equivalent to {1,}. For example, /a+/ matches the "a" in "candy" and all the "a"'s in "caaaaaaandy".

MDN Web Docs

x?

Matches the preceding item "x" 0 or 1 times. For example, /e?le?/ matches the "el" in "angel" and the "le" in "angle."

If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times).

MDN Web Docs

x{n}

Where "n" is a positive integer, matches exactly "n" occurrences of the preceding item "x". For example, /a{2}/ doesn't match the "a" in "candy", but it matches all of the "a"'s in "caandy", and the first two "a"'s in "caaandy".

MDN Web Docs

x{n,}

Where "n" is a positive integer, matches at least "n" occurrences of the preceding item "x". For example, /a{2,}/ doesn't match the "a" in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy".

MDN Web Docs

x{n,m}

Where "n" is 0 or a positive integer, "m" is a positive integer, and m >= n, matches at least "n" and at most "m" occurrences of the preceding item "x". For example, /a{1,3}/ matches nothing in "cndy", the "a" in "candy", the two "a"'s in "caandy", and the first three "a"'s in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more "a"s in it.

MDN Web Docs
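
The quantifiers above can be exercised with a short sketch. Most strings come from the descriptions; the shorter "caaandy" and "caaaaaandy" inputs are variations chosen for illustration.

```javascript
console.log('A ghost booooed'.match(/bo*/)[0]); // "boooo"
console.log('A bird warbled'.match(/bo*/)[0]);  // "b"
console.log('A goat grunted'.match(/bo*/));     // null

console.log('angel'.match(/e?le?/)[0]); // "el"
console.log('angle'.match(/e?le?/)[0]); // "le"

console.log('caaandy'.match(/a{2}/)[0]);      // "aa": exactly two
console.log('caaandy'.match(/a{2,}/)[0]);     // "aaa": two or more
console.log('caaaaaandy'.match(/a{1,3}/)[0]); // "aaa": at most three
console.log('cndy'.match(/a{1,3}/));          // null
```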

x*?, x+?, x??, x{n}?, x{n,}?, x{n,m}?

By default quantifiers like * and + are "greedy", meaning that they try to match as much of the string as possible. The ? character after the quantifier makes the quantifier "non-greedy": meaning that it will stop as soon as it finds a match. For example, given a string like "some <foo> <bar> new </bar> </foo> thing":

/<.*>/ will match "<foo> <bar> new </bar> </foo>"
/<.*?>/ will match "<foo>"

MDN Web Docs
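
The greedy and non-greedy results above can be reproduced as follows. The final line, which adds the g flag, is an extra illustration.

```javascript
const markup = 'some <foo> <bar> new </bar> </foo> thing';

console.log(markup.match(/<.*>/)[0]);  // "<foo> <bar> new </bar> </foo>"
console.log(markup.match(/<.*?>/)[0]); // "<foo>"

// With g, the non-greedy form picks out each tag separately.
console.log(markup.match(/<.*?>/g)); // ["<foo>", "<bar>", "</bar>", "</foo>"]
```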

Unicode Escapes

Before ES2018 there was no performance-efficient way to match characters from different sets based on scripts (like Macedonian, Greek, Georgian etc.) or propertyName (like Emoji etc) in JavaScript. Check out tc39 Proposal on Unicode Property Escapes for more info.

Unicode property escapes in regular expressions allow for matching characters based on their Unicode properties. A character is described by several properties which are either binary ("boolean-like") or non-binary. For instance, Unicode property escapes can be used to match emojis, punctuation, letters (even letters from specific languages or scripts), etc.

Note:

For Unicode property escapes to work, a regular expression must use the u flag which indicates a string must be considered as a series of Unicode code points. See also RegExp.prototype.unicode.

MDN Web Docs
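
A minimal sketch of the u flag requirement, using the Cyrillic capital letter "Л" as an arbitrary test character. It assumes a runtime with Unicode property escape support (ES2018 or later).

```javascript
const letter = 'Л'; // Cyrillic capital letter El

console.log(/\p{L}/u.test(letter));  // true: a letter
console.log(/\p{Lu}/u.test(letter)); // true: an uppercase letter
console.log(/\p{Ll}/u.test(letter)); // false: not a lowercase letter
console.log(/\P{L}/u.test(letter));  // false: \P negates \p

// Without the u flag, \p is not treated as a property escape.
console.log(/\p{L}/.test(letter));   // false
```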

Note:

Some Unicode properties encompass many more characters than some character classes (such as \w, which matches only Latin letters, digits, and the underscore), but the latter are better supported among browsers (as of January 2020).

MDN Web Docs

Syntax

            
              // Non-binary values
              \p{UnicodePropertyValue}
              \p{UnicodePropertyName=UnicodePropertyValue}
              
              // Binary and non-binary values
              \p{UnicodeBinaryPropertyName}
              
              // Negation: \P is negated \p
              \P{UnicodePropertyValue}
              \P{UnicodeBinaryPropertyName}

MDN Web Docs

General categories

General categories are used to classify Unicode characters, and subcategories are available to define a more precise categorization. It is possible to use either short or long forms in Unicode property escapes. They can be used to match letters, numbers, symbols, punctuation, spaces, etc. For a more exhaustive list of general categories, please refer to the Unicode specification.

MDN Web Docs

          
  // finding all the letters of a text
  let story = "It's the Cheshire Cat: now I shall have somebody to talk to.";

  // Most explicit form
  story.match(/\p{General_Category=Letter}/gu);

  // It is not mandatory to use the property name for General categories
  story.match(/\p{Letter}/gu);

  // This is equivalent (short alias):
  story.match(/\p{L}/gu);

  // This is also equivalent (conjunction of all the subcategories using short aliases)
  story.match(/\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}/gu);

Unicode property escapes vs. character classes

With JavaScript regular expressions, it is also possible to use character classes, especially \w or \d, to match letters or digits. However, such forms only match characters from the basic Latin range (in other words, a to z, A to Z, 0 to 9, and the underscore for \w, and 0 to 9 for \d). As shown in this example, it can be clumsy to work with non-Latin texts.

Unicode property escape categories encompass many more characters, and \p{Letter} or \p{Number} will work for any script.

            
              // Trying to use ranges to avoid \w limitations:
              
              const nonEnglishText = "Приключения Алисы в Стране чудес";
              const regexpBMPWord = /([\u0000-\u0019\u0021-\uFFFF])+/gu;
              // BMP goes through U+0000 to U+FFFF but space is U+0020
              
              console.table(nonEnglishText.match(regexpBMPWord));
              
              // Using Unicode property escapes instead
              const regexpUPE = /\p{L}+/gu;
              console.table(nonEnglishText.match(regexpUPE));

MDN Web Docs

Scripts and script extensions

Some languages use different scripts for their writing system. For instance, English and Spanish are written using the Latin script while Arabic and Russian are written with other scripts (respectively Arabic and Cyrillic). The Script and Script_Extensions Unicode properties allow regular expression to match characters according to the script they are mainly used with (Script) or according to the set of scripts they belong to (Script_Extensions).

For example, A belongs to the Latin script and ε to the Greek script.

            
              let mixedCharacters = "aεЛ";
              
              // Using the canonical "long" name of the script
              mixedCharacters.match(/\p{Script=Latin}/u); // a
              
              // Using a short alias for the script
              mixedCharacters.match(/\p{Script=Grek}/u); // ε
              
              // Using the short name sc for the Script property
              mixedCharacters.match(/\p{sc=Cyrillic}/u); // Л

MDN Web Docs

Flags

Advanced searching with flags

Regular expressions have optional flags that allow for functionality like global searching and case-insensitive searching. These flags can be used separately or together in any order, and are included as part of the regular expression.

d

Generate indices for substring matches.

RegExp.prototype.hasIndices

MDN Web Docs

g

Global search.

RegExp.prototype.global

MDN Web Docs

i

Case-insensitive search

RegExp.prototype.ignoreCase

MDN Web Docs

m

Multi-line search

RegExp.prototype.multiline

MDN Web Docs

s

Allows . to match newline characters.

RegExp.prototype.dotAll

MDN Web Docs

u

"unicode"; treat a pattern as a sequence of unicode code points.

RegExp.prototype.unicode

MDN Web Docs

y

Perform a "sticky" search that matches starting at the current position in the target string. See sticky.

RegExp.prototype.sticky

MDN Web Docs
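
A sketch of several flags in action. The input strings are made up, and the d flag example assumes a runtime that supports match indices (ES2022 or later).

```javascript
const pets = 'Cat cat CAT';

console.log(pets.match(/cat/g));  // ["cat"]
console.log(pets.match(/cat/gi)); // ["Cat", "cat", "CAT"]

// Flags may be given in any order; the flags property reports them in a fixed order.
console.log(new RegExp('cat', 'ig').flags); // "gi"

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
console.log(sticky.test('barfoo')); // true
console.log(sticky.lastIndex);      // 6
sticky.lastIndex = 2;
console.log(sticky.test('barfoo')); // false: "foo" does not start at index 2

// d: start and end indices for the match and for each group.
const withIndices = /b(c)/d.exec('abcd');
console.log(withIndices.indices[0]); // [1, 3]
console.log(withIndices.indices[1]); // [2, 3]
```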

Methods

RegExp Methods

Regular expressions are used with the RegExp methods test() and exec().

exec()

Executes a search for a match in a string. It returns an array of information or null on a mismatch.

MDN Web Docs

test()

Tests for a match in a string. It returns true or false.

MDN Web Docs
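
A short sketch of exec() and test(). With the g flag, exec() resumes from the regular expression's lastIndex property, so calling it in a loop walks through every match. The names `vowels` and `found` are illustrative.

```javascript
const vowels = /o+/g;
const phrase = 'foo boo';

const found = [];
let result;
// exec() returns null once no further match exists, which ends the loop.
while ((result = vowels.exec(phrase)) !== null) {
  found.push([result[0], result.index]);
}
console.log(found);            // [["oo", 1], ["oo", 5]]
console.log(vowels.lastIndex); // 0: reset after the failed final call

console.log(/o+/.test(phrase)); // true
console.log(/o+/.test('fb'));   // false
```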

String Methods

Regular expressions are also used with the String methods match(), matchAll(), search(), replace(), replaceAll(), and split().

match()

Returns an array containing all of the matches, including capturing groups, or null if no match is found.

MDN Web Docs

matchAll()

Returns an iterator containing all of the matches, including capturing groups.

MDN Web Docs

search()

Tests for a match in a string. It returns the index of the match, or -1 if the search fails.

MDN Web Docs

replace()

Executes a search for a match in a string, and replaces the matched substring with a replacement substring.

MDN Web Docs

replaceAll()

Executes a search for all matches in a string, and replaces the matched substrings with a replacement substring.

MDN Web Docs

split()

Uses a regular expression or a fixed string to break a string into an array of substrings.

MDN Web Docs
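
The String methods can be compared on one sample string. This is a sketch with made-up input; note that replaceAll() throws a TypeError when given a regular expression without the g flag.

```javascript
const sentence = 'red apple, green apple';

console.log(sentence.search(/green/)); // 11
console.log(sentence.search(/blue/));  // -1

console.log(sentence.replace(/apple/, 'pear'));     // "red pear, green apple"
console.log(sentence.replaceAll(/apple/g, 'pear')); // "red pear, green pear"

// $1 and $2 in the replacement refer to the captured groups.
console.log(sentence.replace(/(\w+) (\w+)/, '$2 $1')); // "apple red, green apple"

console.log('a, b;c'.split(/[,;]\s*/)); // ["a", "b", "c"]
```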

Escaping

If you need to use any of the special characters literally (actually searching for a "*", for instance), you must escape it by putting a backslash in front of it. For instance, to search for "a" followed by "*" followed by "b", you'd use /a\*b/ — the backslash "escapes" the "*", making it literal instead of special.

Similarly, if you're writing a regular expression literal and need to match a slash ("/"), you need to escape that (otherwise, it terminates the pattern). For instance, to search for the string "/example/" followed by one or more alphabetic characters, you'd use /\/example\/[a-z]+/i—the backslashes before each slash make them literal.

To match a literal backslash, you need to escape the backslash. For instance, to match the string "C:\" where "C" can be any letter, you'd use /[A-Z]:\\/ — the first backslash escapes the one after it, so the expression searches for a single literal backslash.

If using the RegExp constructor with a string literal, remember that the backslash is an escape in string literals, so to use it in the regular expression, you need to escape it at the string literal level. /a\*b/ and new RegExp("a\\*b") create the same expression, which searches for "a" followed by a literal "*" followed by "b".

If escape strings are not already part of your pattern you can add them using String.replace:

            
  function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); 
    // $& means the whole matched string
  }

MDN Web Docs
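
A usage sketch for the escaping rules above. The function is repeated so the snippet runs on its own, and `userText` is a hypothetical input containing special characters.

```javascript
function escapeRegExp(string) {
  // $& inserts the whole matched character, preceded by a backslash
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userText = '1+1=2 (really?)'; // hypothetical user input

// Unescaped, "+", "(", "?" and ")" keep their special meaning.
console.log(new RegExp(userText).test(userText));               // false
// Escaped, the input is matched literally.
console.log(new RegExp(escapeRegExp(userText)).test(userText)); // true

console.log(escapeRegExp('a*b') === 'a\\*b');           // true
console.log(new RegExp('a\\*b').source === /a\*b/.source); // true
console.log(/[A-Z]:\\/.test('C:\\'));                    // true
console.log(/\/example\/[a-z]+/i.test('/example/Docs')); // true
```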
