# node-re2

> Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. Drop-in RegExp replacement that prevents ReDoS (Regular Expression Denial of Service). Works with strings and Buffers. C++ native addon built with node-gyp and nan.

- Drop-in replacement for RegExp with linear-time matching guarantee
- Prevents ReDoS by disallowing backreferences and lookahead assertions
- Full Unicode mode (always on)
- Buffer support for high-performance binary/UTF-8 processing
- Named capture groups
- Symbol-based methods (Symbol.match, Symbol.search, Symbol.replace, Symbol.split, Symbol.matchAll)
- RE2.Set for multi-pattern matching
- Prebuilt binaries for Linux, macOS, Windows (x64 + arm64)
- TypeScript declarations included

## Install

```bash
npm install re2
```

Prebuilt native binaries are downloaded automatically. Falls back to building from source via node-gyp if no prebuilt is available. Both paths run in re2's install script.

Under npm 12+ defaults (July 2026), install scripts require approval in the consuming project's `package.json`: run `npm pkg set allowScripts.re2=true --json` before `npm install re2`, otherwise the install fails with `ESTRICTALLOWSCRIPTS`. npm 11.16+ runs the script but prints a warning until approved.

## Quick start

```js
const RE2 = require('re2');

// Create and use like RegExp
const re = new RE2('a(b*)', 'i');
const result = re.exec('aBbC');
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"

// Works with ES6 string methods
'hello world'.match(new RE2('\\w+', 'g')); // ['hello', 'world']
'hello world'.replace(new RE2('world'), 'RE2'); // 'hello RE2'
```

## Importing

```js
// CommonJS
const RE2 = require('re2');

// ESM
import { RE2 } from 're2';
```

## Construction

`new RE2(pattern[, flags])` or `RE2(pattern[, flags])` (factory mode).

Pattern can be:

- **String**: `new RE2('\\d+')`
- **String with flags**: `new RE2('\\d+', 'gi')`
- **RegExp**: `new RE2(/ab*/ig)` — copies pattern and flags.
- **RE2**: `new RE2(existingRE2)` — copies pattern and flags.
- **Buffer**: `new RE2(Buffer.from('pattern'))` — pattern from UTF-8 buffer.

Supported flags:

- `g` — global (find all matches)
- `i` — ignoreCase
- `m` — multiline (`^`/`$` match line boundaries)
- `s` — dotAll (`.` matches `\n`)
- `u` — unicode (always on, added implicitly)
- `y` — sticky (match at lastIndex only)
- `d` — hasIndices (include index info for capture groups)

Invalid patterns throw `SyntaxError`. Patterns with backreferences or lookahead throw `SyntaxError`.

## Properties

### Instance properties

- `re.source` (string) — the pattern string, escaped for use in `new RE2(re.source)` or `new RegExp(re.source)`.
- `re.flags` (string) — the flags string (e.g., `'giu'`).
- `re.lastIndex` (number) — the index at which to start the next match (used with `g` or `y` flags).
- `re.global` (boolean) — whether the `g` flag is set.
- `re.ignoreCase` (boolean) — whether the `i` flag is set.
- `re.multiline` (boolean) — whether the `m` flag is set.
- `re.dotAll` (boolean) — whether the `s` flag is set.
- `re.unicode` (boolean) — always `true` (RE2 always operates in Unicode mode).
- `re.sticky` (boolean) — whether the `y` flag is set.
- `re.hasIndices` (boolean) — whether the `d` flag is set.
- `re.internalSource` (string) — the RE2-translated pattern (for debugging; may differ from `source`).

### Static properties

- `RE2.unicodeWarningLevel` (string) — controls behavior when a non-Unicode regexp is created:
  - `'nothing'` (default) — silently add `u` flag.
  - `'warnOnce'` — warn once, then silently add `u`. Assigning resets the one-time flag.
  - `'warn'` — warn every time.
  - `'throw'` — throw `SyntaxError` every time.

## RegExp methods

### re.exec(str)

Executes a search for a match. Returns a result array or `null`.
```js
const re = new RE2('a(b+)', 'g');
const result = re.exec('abbc abbc');
// result[0] === 'abb'
// result[1] === 'bb'
// result.index === 0
// result.input === 'abbc abbc'
// re.lastIndex === 3
```

With `d` flag (hasIndices), result has `.indices` property with `[start, end]` pairs for each group.

With `g` or `y` flag, advances `lastIndex`. Call repeatedly to iterate matches.

### re.test(str)

Returns `true` if the pattern matches, `false` otherwise.

```js
new RE2('\\d+').test('abc123'); // true
new RE2('\\d+').test('abcdef'); // false
```

With `g` or `y` flag, advances `lastIndex`.

### re.toString()

Returns `'/pattern/flags'` string representation.

```js
new RE2('abc', 'gi').toString(); // '/abc/giu'
```

## String methods (via Symbol)

RE2 instances implement well-known symbols, so they work directly with ES6 string methods:

### str.match(re) / re[Symbol.match](str)

```js
'test 123 test 456'.match(new RE2('\\d+', 'g')); // ['123', '456']
'test 123'.match(new RE2('(\\d+)')); // ['123', '123', index: 5, input: 'test 123']
```

### str.matchAll(re) / re[Symbol.matchAll](str)

Returns an iterator of all matches (requires `g` flag).

```js
const re = new RE2('\\d+', 'g');
for (const m of '1a2b3c'.matchAll(re)) {
  console.log(m[0]); // '1', '2', '3'
}
```

### str.search(re) / re[Symbol.search](str)

Returns the index of the first match, or `-1`.

```js
'hello world'.search(new RE2('world')); // 6
```

### str.replace(re, replacement) / re[Symbol.replace](str, replacement)

Returns a new string with matches replaced.

```js
'aabba'.replace(new RE2('b', 'g'), 'c'); // 'aacca'
```

Replacement string supports:

- `$1`, `$2`, ... — numbered capture groups.
- `$<name>` — named capture groups.
- `$&` — the matched substring.
- `` $` `` — portion before the match.
- `$'` — portion after the match.
- `$$` — literal `$`.
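
The replacement patterns listed above can be seen side by side in a short sketch. It assumes `re2` is installed; the `try`/`catch` fallback to the built-in `RegExp` is added here for illustration only, so the snippet still runs where the native addon is unavailable. The pattern, the group names `lo` and `hi`, and the variable names are all hypothetical.

```javascript
// Assumption: re2 is installed. If it is not, fall back to built-in RegExp,
// which applies the same replacement-pattern rules.
let RE2;
try {
  RE2 = require('re2');
} catch {
  RE2 = RegExp;
}

const input = 'range 10-20 end';
const re = new RE2('(?<lo>\\d+)-(?<hi>\\d+)');

const swapped = input.replace(re, '$2-$1');       // numbered groups: 'range 20-10 end'
const named = input.replace(re, '$<hi>-$<lo>');   // named groups:    'range 20-10 end'
const wrapped = input.replace(re, '[$&]');        // whole match:     'range [10-20] end'
const before = input.replace(re, '<$`>');         // text before:     'range <range > end'
const after = input.replace(re, "<$'>");          // text after:      'range < end> end'
const dollar = input.replace(re, '$$');           // literal dollar:  'range $ end'

console.log({ swapped, named, wrapped, before, after, dollar });
```

Without the `g` flag only the first match is replaced, which is why each result changes a single `10-20` occurrence.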

Replacement function receives `(match, ...groups, offset, input)`:

```js
'abc'.replace(new RE2('(b)'), (match, g1, offset) => `[${g1}@${offset}]`);
// 'a[b@1]c'
```

### str.split(re[, limit]) / re[Symbol.split](str[, limit])

Splits string by pattern.

```js
'a1b2c3'.split(new RE2('\\d')); // ['a', 'b', 'c', '']
'a1b2c3'.split(new RE2('\\d'), 2); // ['a', 'b']
```

## String methods (direct)

These are convenience methods on the RE2 instance with swapped argument order:

- `re.match(str)` — equivalent to `str.match(re)`.
- `re.search(str)` — equivalent to `str.search(re)`.
- `re.replace(str, replacement)` — equivalent to `str.replace(re, replacement)`.
- `re.split(str[, limit])` — equivalent to `str.split(re, limit)`.

```js
const re = new RE2('\\d+', 'g');
re.match('test 123 test 456'); // ['123', '456']
re.search('test 123'); // 5
re.replace('test 1 and 2', 'N'); // 'test N and N' (global replaces all)
re.split('a1b2c'); // ['a', 'b', 'c']
```

## Buffer support

All methods accept Node.js Buffers (UTF-8) instead of strings. When given Buffer input, they return Buffer output.

```js
const re = new RE2('матч', 'g');
const buf = Buffer.from('тест матч тест');
const result = re.exec(buf);
// result[0] is a Buffer containing 'матч' in UTF-8
// result.index is in bytes (not characters)
```

Differences from string mode:

- All offsets and lengths are in **bytes**, not characters.
- Results contain Buffers instead of strings.
- Use `buf.toString()` to convert results back to strings.

### useBuffers on replacer functions

When using `re.replace(buf, replacerFn)`, the replacer receives string arguments and character offsets by default.
Set `replacerFn.useBuffers = true` to receive byte offsets instead:

```js
function replacer(match, offset, input) {
  return '<' + offset + ' bytes>';
}
replacer.useBuffers = true;
new RE2('б').replace(Buffer.from('абв'), replacer);
```

## RE2.Set

Multi-pattern matching — compile many patterns into a single automaton and test/match against all of them at once. Faster than testing individual patterns when the number of patterns is large.

### Constructor

```js
new RE2.Set(patterns[, flagsOrOptions][, options])
```

- `patterns` — any iterable of strings, Buffers, RegExp, or RE2 instances.
- `flagsOrOptions` — optional string/Buffer with flags (apply to all patterns), or options object.
- `options.anchor` — `'unanchored'` (default), `'start'`, or `'both'`.
- `options.maxMem` — DFA memory budget in bytes (positive integer). Default 8 MiB; raise it when `new RE2.Set(...)` throws `"RE2.Set could not be compiled."` because the union DFA blew the budget.

```js
const set = new RE2.Set([
  '^/users/\\d+$',
  '^/posts/\\d+$',
  '^/api/.*$'
], 'i', {anchor: 'start'});
```

### set.test(str)

Returns `true` if any pattern matches, `false` otherwise.

```js
set.test('/users/42'); // true
set.test('/unknown'); // false
```

### set.match(str)

Returns an array of indices of matching patterns, sorted ascending. Empty array if none match.

```js
set.match('/users/42'); // [0]
set.match('/api/users'); // [2]
set.match('/unknown'); // []
```

### Properties

- `set.size` (number) — number of patterns.
- `set.source` (string) — all patterns joined with `|`.
- `set.sources` (string[]) — individual pattern sources.
- `set.flags` (string) — flags string.
- `set.anchor` (string) — anchor mode.
- `set.maxMem` (number) — effective DFA memory budget in bytes.

### set.toString()

Returns `'/pattern1|pattern2|.../flags'`.

```js
set.toString(); // '/^/users/\\d+$|^/posts/\\d+$|^/api/.*$/iu'
```

## Static helpers

### RE2.getUtf8Length(str)

Calculate the byte size needed to encode a UTF-16 string as UTF-8.
```js
RE2.getUtf8Length('hello'); // 5
RE2.getUtf8Length('привет'); // 12
```

### RE2.getUtf16Length(buf)

Calculate the character count needed to encode a UTF-8 buffer as a UTF-16 string.

```js
RE2.getUtf16Length(Buffer.from('hello')); // 5
RE2.getUtf16Length(Buffer.from('привет')); // 6
```

## Named groups

Named capture groups are supported:

```js
const re = new RE2('(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})');
const result = re.exec('2024-01-15');
result.groups.year; // '2024'
result.groups.month; // '01'
result.groups.day; // '15'
```

Named backreferences in replacement strings:

```js
'2024-01-15'.replace(
  new RE2('(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})'),
  '$<day>/$<month>/$<year>'
);
// '15/01/2024'
```

## Unicode classes

node-re2 accepts the same `\p{...}` escapes as JavaScript `RegExp` with the `u` flag. The MDN reference at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape is the canonical spec for what's accepted.

```js
// General_Category — long and short names
new RE2('\\p{Letter}+'); // \p{L}+
new RE2('\\p{Number}+'); // \p{N}+
new RE2('\\p{gc=Letter}+'); // gc= and General_Category= prefixes
new RE2('\\p{General_Category=Letter}+');

// Script and Script_Extensions
new RE2('\\p{Script=Latin}+'); // RE2 native
new RE2('\\p{sc=Cyrillic}+');
new RE2('\\p{Script_Extensions=Hani}+'); // expanded inline
new RE2('\\p{scx=Latn}+'); // ISO 15924 short code

// Binary properties — full ECMAScript set
new RE2('\\p{Alphabetic}+').test('héllo'); // true
new RE2('\\p{ASCII}+').test('Hi!'); // true
new RE2('\\p{ID_Start}\\p{ID_Continue}*').test('x1'); // true
new RE2('\\p{White_Space}+').test(' \\t\\n'); // true
new RE2('\\p{Emoji}').test('😀'); // true
new RE2('\\p{Math}').test('∑'); // true

// Short aliases from PropertyAliases.txt
new RE2('\\p{Alpha}+'); // == Alphabetic
new RE2('\\p{Hex}+'); // == Hex_Digit
new RE2('\\p{Lower}+'); // == Lowercase

// Negation and use inside character classes
new RE2('\\P{ASCII}+'); // non-ASCII
new RE2('[\\p{L}\\p{Emoji}]+'); // letters or emoji
new RE2('[^\\p{ASCII}]+'); // negated inside class
```

**Not supported:** *Properties of Strings* (`\p{Basic_Emoji}`, `\p{RGI_Emoji}`, etc.). These match multi-codepoint sequences and require the `v` flag, which RE2 does not model. Trying to use one throws a syntax error at compile time.

Tables are baked in from Unicode 17.0 (devDependency `@unicode/unicode-17.0.0`). Bump the package and run `node scripts/gen-unicode-properties.mjs` to target a newer Unicode version.

## Limitations

RE2 does **not** support:

- **Backreferences** (`\1`, `\2`, etc.) — throw `SyntaxError`.
- **Lookahead assertions** (`(?=...)`, `(?!...)`) — throw `SyntaxError`.
- **Lookbehind assertions** (`(?<=...)`, `(?<!...)`)

…

```js
  … 0 ? matches[0] : -1;
}

findRoute('/users/42'); // 0
findRoute('/posts/7'); // 1
findRoute('/api/v2/foo'); // 2
findRoute('/unknown'); // -1
```

### Validate user-supplied patterns safely

```js
const RE2 = require('re2');

function safeMatch(input, pattern, flags) {
  try {
    const re = new RE2(pattern, flags);
    return re.test(input);
  } catch (e) {
    return false; // invalid pattern
  }
}
```

## TypeScript

```ts
import RE2 from 're2';

const re: RE2 = new RE2('\\d+', 'g');
const result: RegExpExecArray | null = re.exec('test 123');

// Buffer overloads
const bufResult: RE2BufferExecArray | null = re.exec(Buffer.from('test 123'));

// RE2.Set
const set: RE2Set = new RE2.Set(['a', 'b'], 'i');
const matches: number[] = set.match('abc');
```

## Project structure notes

- Entry point: `re2.js` (loads native addon), types: `re2.d.ts`.
- C++ addon source: `lib/*.cc`, `lib/*.h`.
- Tests: `tests/test-*.mjs` (runtime), `ts-tests/test-*.ts` (type-checking).
- Vendored dependencies: `vendor/re2/`, `vendor/abseil-cpp/` (git submodules) — **never modify files under `vendor/`**.

## Links

- Docs: https://github.com/uhop/node-re2/wiki
- npm: https://www.npmjs.com/package/re2
- Repository: https://github.com/uhop/node-re2
- RE2 syntax: https://github.com/google/re2/wiki/Syntax