regex::bytes

Struct Regex

pub struct Regex { /* private fields */ }

A compiled regular expression for searching byte haystacks (&[u8]), which need not be valid UTF-8.

A Regex can be used to search haystacks, split haystacks into substrings or replace substrings in a haystack with a different substring. All searching is done with an implicit (?s:.)*? at the beginning and end of a pattern. To force an expression to match the whole string (or a prefix or a suffix), you must use an anchor like ^ or $ (or \A and \z).

Like the Regex type in the parent module, matches with this regex return byte offsets into the haystack. Unlike the parent Regex type, these byte offsets may not correspond to UTF-8 sequence boundaries since the regexes in this module can match arbitrary bytes.

The only methods that allocate new byte strings are the string replacement methods. All other methods (searching and splitting) return borrowed references into the haystack given.

§Example

Find the offsets of a US phone number:

use regex::bytes::Regex;

let re = Regex::new("[0-9]{3}-[0-9]{3}-[0-9]{4}").unwrap();
let m = re.find(b"phone: 111-222-3333").unwrap();
assert_eq!(7..19, m.range());

§Example: extracting capture groups

A common way to use regexes is with capture groups. That is, instead of just looking for matches of an entire regex, parentheses are used to create groups that represent part of the match.

For example, consider a haystack with multiple lines, and each line has three whitespace delimited fields where the second field is expected to be a number and the third field a boolean. To make this convenient, we use the Captures::extract API to put the strings that match each group into a fixed size array:

use regex::bytes::Regex;

let hay = b"
rabbit         54 true
groundhog 2 true
does not match
fox   109    false
";
let re = Regex::new(r"(?m)^\s*(\S+)\s+([0-9]+)\s+(true|false)\s*$").unwrap();
let mut fields: Vec<(&[u8], i64, bool)> = vec![];
for (_, [f1, f2, f3]) in re.captures_iter(hay).map(|caps| caps.extract()) {
    // These unwraps are OK because our pattern is written in a way where
    // all matches for f2 and f3 will be valid UTF-8.
    let f2 = std::str::from_utf8(f2).unwrap();
    let f3 = std::str::from_utf8(f3).unwrap();
    fields.push((f1, f2.parse().unwrap(), f3.parse().unwrap()));
}
assert_eq!(fields, vec![
    (&b"rabbit"[..], 54, true),
    (&b"groundhog"[..], 2, true),
    (&b"fox"[..], 109, false),
]);

§Example: matching invalid UTF-8

One of the reasons for searching &[u8] haystacks is that the &[u8] might not be valid UTF-8. Indeed, with a bytes::Regex, patterns that match invalid UTF-8 are explicitly allowed. Here’s one example that looks for valid UTF-8 fields that might be separated by invalid UTF-8. In this case, we use (?s-u:.), which matches any byte. Attempting to use it in a top-level Regex will result in the regex failing to compile. Notice also that we use . with Unicode mode enabled, in which case, only valid UTF-8 is matched. In this way, we can build one pattern where some parts only match valid UTF-8 while other parts are more permissive.

use regex::bytes::Regex;

// F0 9F 92 A9 is the UTF-8 encoding for a Pile of Poo.
let hay = b"\xFF\xFFfoo\xFF\xFF\xFF\xF0\x9F\x92\xA9\xFF";
// An equivalent to '(?s-u:.)' is '(?-u:[\x00-\xFF])'.
let re = Regex::new(r"(?s)(?-u:.)*?(?<f1>.+)(?-u:.)*?(?<f2>.+)").unwrap();
let caps = re.captures(hay).unwrap();
assert_eq!(&caps["f1"], &b"foo"[..]);
assert_eq!(&caps["f2"], "💩".as_bytes());

Implementations§

impl Regex

Core regular expression methods.

pub fn new(re: &str) -> Result<Regex, Error>

Compiles a regular expression. Once compiled, it can be used repeatedly to search, split or replace substrings in a haystack.

Note that regex compilation tends to be a somewhat expensive process, and unlike higher level environments, compilation is not automatically cached for you. One should endeavor to compile a regex once and then reuse it. For example, it’s a bad idea to compile the same regex repeatedly in a loop.

§Errors

If an invalid pattern is given, then an error is returned. An error is also returned if the pattern is valid, but would produce a regex that is bigger than the configured size limit via RegexBuilder::size_limit. (A reasonable size limit is enabled by default.)

§Example
use regex::bytes::Regex;

// An invalid pattern because of an unclosed parenthesis
assert!(Regex::new(r"foo(bar").is_err());
// An invalid pattern because the regex would be too big
// because Unicode tends to inflate things.
assert!(Regex::new(r"\w{1000}").is_err());
// Disabling Unicode can make the regex much smaller,
// potentially by an order of magnitude or more.
assert!(Regex::new(r"(?-u:\w){1000}").is_ok());

pub fn is_match(&self, haystack: &[u8]) -> bool

Returns true if and only if there is a match for the regex anywhere in the haystack given.

It is recommended to use this method if all you need to do is test whether a match exists, since the underlying matching engine may be able to do less work.

§Example

Test if some haystack contains at least one word with exactly 13 Unicode word characters:

use regex::bytes::Regex;

let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = b"I categorically deny having triskaidekaphobia.";
assert!(re.is_match(hay));

pub fn find<'h>(&self, haystack: &'h [u8]) -> Option<Match<'h>>

This routine searches for the first match of this regex in the haystack given, and if found, returns a Match. The Match provides access to both the byte offsets of the match and the actual substring that matched.

Note that this should only be used if you want to find the entire match. If instead you just want to test the existence of a match, it’s potentially faster to use Regex::is_match(hay) instead of Regex::find(hay).is_some().

§Example

Find the first word with exactly 13 Unicode word characters:

use regex::bytes::Regex;

let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = b"I categorically deny having triskaidekaphobia.";
let mat = re.find(hay).unwrap();
assert_eq!(2..15, mat.range());
assert_eq!(b"categorically", mat.as_bytes());

pub fn find_iter<'r, 'h>(&'r self, haystack: &'h [u8]) -> Matches<'r, 'h>

Returns an iterator that yields successive non-overlapping matches in the given haystack. The iterator yields values of type Match.

§Time complexity

Note that since find_iter runs potentially many searches on the haystack and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for iteration is O(m * n^2).

§Example

Find every word with exactly 13 Unicode word characters:

use regex::bytes::Regex;

let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = b"Retroactively relinquishing remunerations is reprehensible.";
let matches: Vec<_> = re.find_iter(hay).map(|m| m.as_bytes()).collect();
assert_eq!(matches, vec![
    &b"Retroactively"[..],
    &b"relinquishing"[..],
    &b"remunerations"[..],
    &b"reprehensible"[..],
]);

pub fn captures<'h>(&self, haystack: &'h [u8]) -> Option<Captures<'h>>

This routine searches for the first match of this regex in the haystack given, and if found, returns not only the overall match but also the matches of each capture group in the regex. If no match is found, then None is returned.

Capture group 0 always corresponds to an implicit unnamed group that includes the entire match. If a match is found, this group is always present. Subsequent groups may be named and are numbered, starting at 1, by the order in which the opening parenthesis appears in the pattern. For example, in the pattern (?<a>.(?<b>.))(?<c>.), a, b and c correspond to capture group indices 1, 2 and 3, respectively.

You should only use captures if you need access to the capture group matches. Otherwise, Regex::find is generally faster for discovering just the overall match.

§Example

Say you have some haystack with movie names and their release years, like “‘Citizen Kane’ (1941)”. It’d be nice if we could search for strings looking like that, while also extracting the movie name and its release year separately. The example below shows how to do that.

use regex::bytes::Regex;

let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let hay = b"Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(hay).unwrap();
assert_eq!(caps.get(0).unwrap().as_bytes(), b"'Citizen Kane' (1941)");
assert_eq!(caps.get(1).unwrap().as_bytes(), b"Citizen Kane");
assert_eq!(caps.get(2).unwrap().as_bytes(), b"1941");
// You can also access the groups by index using the Index notation.
// Note that this will panic on an invalid index. In this case, these
// accesses are always correct because the overall regex will only
// match when these capture groups match.
assert_eq!(&caps[0], b"'Citizen Kane' (1941)");
assert_eq!(&caps[1], b"Citizen Kane");
assert_eq!(&caps[2], b"1941");

Note that the full match is at capture group 0. Each subsequent capture group is indexed by the order of its opening (.

We can make this example a bit clearer by using named capture groups:

use regex::bytes::Regex;

let re = Regex::new(r"'(?<title>[^']+)'\s+\((?<year>\d{4})\)").unwrap();
let hay = b"Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(hay).unwrap();
assert_eq!(caps.get(0).unwrap().as_bytes(), b"'Citizen Kane' (1941)");
assert_eq!(caps.name("title").unwrap().as_bytes(), b"Citizen Kane");
assert_eq!(caps.name("year").unwrap().as_bytes(), b"1941");
// You can also access the groups by name using the Index notation.
// Note that this will panic on an invalid group name. In this case,
// these accesses are always correct because the overall regex will
// only match when these capture groups match.
assert_eq!(&caps[0], b"'Citizen Kane' (1941)");
assert_eq!(&caps["title"], b"Citizen Kane");
assert_eq!(&caps["year"], b"1941");

Here we name the capture groups, which we can access with the name method or the Index notation with a &str. Note that the named capture groups are still accessible with get or the Index notation with a usize.

The 0th capture group is always unnamed, so it must always be accessed with get(0) or [0].

Finally, one other way to get the matched substrings is with the Captures::extract API:

use regex::bytes::Regex;

let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let hay = b"Not my favorite movie: 'Citizen Kane' (1941).";
let (full, [title, year]) = re.captures(hay).unwrap().extract();
assert_eq!(full, b"'Citizen Kane' (1941)");
assert_eq!(title, b"Citizen Kane");
assert_eq!(year, b"1941");

pub fn captures_iter<'r, 'h>( &'r self, haystack: &'h [u8], ) -> CaptureMatches<'r, 'h>

Returns an iterator that yields successive non-overlapping matches in the given haystack. The iterator yields values of type Captures.

This is the same as Regex::find_iter, but instead of only providing access to the overall match, each value yielded includes access to the matches of all capture groups in the regex. Reporting this extra match data is potentially costly, so callers should only use captures_iter over find_iter when they actually need access to the capture group matches.

§Time complexity

Note that since captures_iter runs potentially many searches on the haystack and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for iteration is O(m * n^2).

§Example

We can use this to find all movie titles and their release years in some haystack, where the movie is formatted like “‘Title’ (xxxx)”:

use regex::bytes::Regex;

let re = Regex::new(r"'([^']+)'\s+\(([0-9]{4})\)").unwrap();
let hay = b"'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
let mut movies = vec![];
for (_, [title, year]) in re.captures_iter(hay).map(|c| c.extract()) {
    // OK because [0-9]{4} can only match valid UTF-8.
    let year = std::str::from_utf8(year).unwrap();
    movies.push((title, year.parse::<i64>().unwrap()));
}
assert_eq!(movies, vec![
    (&b"Citizen Kane"[..], 1941),
    (&b"The Wizard of Oz"[..], 1939),
    (&b"M"[..], 1931),
]);

Or with named groups:

use regex::bytes::Regex;

let re = Regex::new(r"'(?<title>[^']+)'\s+\((?<year>[0-9]{4})\)").unwrap();
let hay = b"'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
let mut it = re.captures_iter(hay);

let caps = it.next().unwrap();
assert_eq!(&caps["title"], b"Citizen Kane");
assert_eq!(&caps["year"], b"1941");

let caps = it.next().unwrap();
assert_eq!(&caps["title"], b"The Wizard of Oz");
assert_eq!(&caps["year"], b"1939");

let caps = it.next().unwrap();
assert_eq!(&caps["title"], b"M");
assert_eq!(&caps["year"], b"1931");

pub fn split<'r, 'h>(&'r self, haystack: &'h [u8]) -> Split<'r, 'h>

Returns an iterator of substrings of the haystack given, delimited by a match of the regex. Namely, each element of the iterator corresponds to a part of the haystack that isn’t matched by the regular expression.

§Time complexity

Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).

§Example

To split a string delimited by arbitrary amounts of spaces or tabs:

use regex::bytes::Regex;

let re = Regex::new(r"[ \t]+").unwrap();
let hay = b"a b \t  c\td    e";
let fields: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(fields, vec![
    &b"a"[..], &b"b"[..], &b"c"[..], &b"d"[..], &b"e"[..],
]);
§Example: more cases

Basic usage:

use regex::bytes::Regex;

let re = Regex::new(r" ").unwrap();
let hay = b"Mary had a little lamb";
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![
    &b"Mary"[..], &b"had"[..], &b"a"[..], &b"little"[..], &b"lamb"[..],
]);

let re = Regex::new(r"X").unwrap();
let hay = b"";
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![&b""[..]]);

let re = Regex::new(r"X").unwrap();
let hay = b"lionXXtigerXleopard";
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![
    &b"lion"[..], &b""[..], &b"tiger"[..], &b"leopard"[..],
]);

let re = Regex::new(r"::").unwrap();
let hay = b"lion::tiger::leopard";
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![&b"lion"[..], &b"tiger"[..], &b"leopard"[..]]);

If a haystack contains multiple contiguous matches, you will end up with empty spans yielded by the iterator:

use regex::bytes::Regex;

let re = Regex::new(r"X").unwrap();
let hay = b"XXXXaXXbXc";
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![
    &b""[..], &b""[..], &b""[..], &b""[..],
    &b"a"[..], &b""[..], &b"b"[..], &b"c"[..],
]);

let re = Regex::new(r"/").unwrap();
let hay = b"(///)";
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![&b"("[..], &b""[..], &b""[..], &b")"[..]]);

Separators at the start or end of a haystack are neighbored by empty substrings:

use regex::bytes::Regex;

let re = Regex::new(r"0").unwrap();
let hay = b"010";
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![&b""[..], &b"1"[..], &b""[..]]);

When the regex can match the empty string, it splits at every byte position in the haystack. This includes between all UTF-8 code units. (The top-level Regex::split will only split at valid UTF-8 boundaries.)

use regex::bytes::Regex;

let re = Regex::new(r"").unwrap();
let hay = "☃".as_bytes();
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![
    &[][..], &[b'\xE2'][..], &[b'\x98'][..], &[b'\x83'][..], &[][..],
]);

Contiguous separators (which commonly show up with whitespace) can lead to possibly surprising behavior. For example, this code is correct:

use regex::bytes::Regex;

let re = Regex::new(r" ").unwrap();
let hay = b"    a  b c";
let got: Vec<&[u8]> = re.split(hay).collect();
assert_eq!(got, vec![
    &b""[..], &b""[..], &b""[..], &b""[..],
    &b"a"[..], &b""[..], &b"b"[..], &b"c"[..],
]);

It does not give you ["a", "b", "c"]. For that behavior, you’d want to match contiguous space characters:

use regex::bytes::Regex;

let re = Regex::new(r" +").unwrap();
let hay = b"    a  b c";
let got: Vec<&[u8]> = re.split(hay).collect();
// N.B. This does still include a leading empty span because ' +'
// matches at the beginning of the haystack.
assert_eq!(got, vec![&b""[..], &b"a"[..], &b"b"[..], &b"c"[..]]);

pub fn splitn<'r, 'h>( &'r self, haystack: &'h [u8], limit: usize, ) -> SplitN<'r, 'h>

Returns an iterator of at most limit substrings of the haystack given, delimited by a match of the regex. (A limit of 0 will return no substrings.) Namely, each element of the iterator corresponds to a part of the haystack that isn’t matched by the regular expression. The remainder of the haystack that is not split will be the last element in the iterator.

§Time complexity

Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).

Although note that the worst case time here has an upper bound given by the limit parameter.

§Example

Get the first two words in some haystack:

use regex::bytes::Regex;

let re = Regex::new(r"\W+").unwrap();
let hay = b"Hey! How are you?";
let fields: Vec<&[u8]> = re.splitn(hay, 3).collect();
assert_eq!(fields, vec![&b"Hey"[..], &b"How"[..], &b"are you?"[..]]);
§Examples: more cases
use regex::bytes::Regex;

let re = Regex::new(r" ").unwrap();
let hay = b"Mary had a little lamb";
let got: Vec<&[u8]> = re.splitn(hay, 3).collect();
assert_eq!(got, vec![&b"Mary"[..], &b"had"[..], &b"a little lamb"[..]]);

let re = Regex::new(r"X").unwrap();
let hay = b"";
let got: Vec<&[u8]> = re.splitn(hay, 3).collect();
assert_eq!(got, vec![&b""[..]]);

let re = Regex::new(r"X").unwrap();
let hay = b"lionXXtigerXleopard";
let got: Vec<&[u8]> = re.splitn(hay, 3).collect();
assert_eq!(got, vec![&b"lion"[..], &b""[..], &b"tigerXleopard"[..]]);

let re = Regex::new(r"::").unwrap();
let hay = b"lion::tiger::leopard";
let got: Vec<&[u8]> = re.splitn(hay, 2).collect();
assert_eq!(got, vec![&b"lion"[..], &b"tiger::leopard"[..]]);

let re = Regex::new(r"X").unwrap();
let hay = b"abcXdef";
let got: Vec<&[u8]> = re.splitn(hay, 1).collect();
assert_eq!(got, vec![&b"abcXdef"[..]]);

let re = Regex::new(r"X").unwrap();
let hay = b"abcdef";
let got: Vec<&[u8]> = re.splitn(hay, 2).collect();
assert_eq!(got, vec![&b"abcdef"[..]]);

let re = Regex::new(r"X").unwrap();
let hay = b"abcXdef";
let got: Vec<&[u8]> = re.splitn(hay, 0).collect();
assert!(got.is_empty());

pub fn replace<'h, R: Replacer>( &self, haystack: &'h [u8], rep: R, ) -> Cow<'h, [u8]>

Replaces the leftmost-first match in the given haystack with the replacement provided. The replacement can be a regular string (where $N and $name are expanded to match capture groups) or a function that takes a Captures and returns the replaced string.

If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.

When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.

§Replacement string syntax

All instances of $ref in the replacement string are replaced with the substring corresponding to the capture group identified by ref.

ref may be an integer corresponding to the index of the capture group (counted by order of opening parenthesis where 0 is the entire match) or it can be a name (consisting of letters, digits or underscores) corresponding to a named capture group.

If ref isn’t a valid capture group (whether the name doesn’t exist or isn’t a valid index), then it is replaced with the empty string.

The longest possible name is used. For example, $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.

To write a literal $ use $$.

§Example

Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal string:

use regex::bytes::Regex;

let re = Regex::new(r"[^01]+").unwrap();
assert_eq!(re.replace(b"1078910", b""), &b"1010"[..]);

But anything satisfying the Replacer trait will work. For example, a closure of type |&Captures| -> Vec<u8> provides direct access to the captures corresponding to a match. This allows one to access capturing group matches easily:

use regex::bytes::{Captures, Regex};

let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
let result = re.replace(b"Springsteen, Bruce", |caps: &Captures| {
    let mut buf = vec![];
    buf.extend_from_slice(&caps[2]);
    buf.push(b' ');
    buf.extend_from_slice(&caps[1]);
    buf
});
assert_eq!(result, &b"Bruce Springsteen"[..]);

But this is a bit cumbersome to use all the time. Instead, a simple syntax is supported (as described above) that expands $name into the corresponding capture group. Here’s the last example, but using this expansion technique with named capture groups:

use regex::bytes::Regex;

let re = Regex::new(r"(?<last>[^,\s]+),\s+(?<first>\S+)").unwrap();
let result = re.replace(b"Springsteen, Bruce", b"$first $last");
assert_eq!(result, &b"Bruce Springsteen"[..]);

Note that using $2 instead of $first or $1 instead of $last would produce the same result. To write a literal $ use $$.

Sometimes the replacement string requires use of curly braces to delineate a capture group replacement when it is adjacent to some other literal text. For example, if we wanted to join two words together with an underscore:

use regex::bytes::Regex;

let re = Regex::new(r"(?<first>\w+)\s+(?<second>\w+)").unwrap();
let result = re.replace(b"deep fried", b"${first}_$second");
assert_eq!(result, &b"deep_fried"[..]);

Without the curly braces, the capture group name first_ would be used, and since it doesn’t exist, it would be replaced with the empty string.

Finally, sometimes you just want to replace a literal string with no regard for capturing group expansion. This can be done by wrapping a string with NoExpand:

use regex::bytes::{NoExpand, Regex};

let re = Regex::new(r"(?<last>[^,\s]+),\s+(\S+)").unwrap();
let result = re.replace(b"Springsteen, Bruce", NoExpand(b"$2 $last"));
assert_eq!(result, &b"$2 $last"[..]);

Using NoExpand may also be faster, since the replacement string won’t need to be parsed for the $ syntax.

pub fn replace_all<'h, R: Replacer>( &self, haystack: &'h [u8], rep: R, ) -> Cow<'h, [u8]>

Replaces all non-overlapping matches in the haystack with the replacement provided. This is the same as calling replacen with limit set to 0.

If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.

When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.

The documentation for Regex::replace goes into more detail about what kinds of replacement strings are supported.

§Time complexity

Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).

§Fallibility

If you need to write a replacement routine where any individual replacement might “fail,” doing so with this API isn’t really feasible because there’s no way to stop the search process if a replacement fails. Instead, if you need this functionality, you should consider implementing your own replacement routine:

use regex::bytes::{Captures, Regex};

fn replace_all<E>(
    re: &Regex,
    haystack: &[u8],
    replacement: impl Fn(&Captures) -> Result<Vec<u8>, E>,
) -> Result<Vec<u8>, E> {
    let mut new = Vec::with_capacity(haystack.len());
    let mut last_match = 0;
    for caps in re.captures_iter(haystack) {
        let m = caps.get(0).unwrap();
        new.extend_from_slice(&haystack[last_match..m.start()]);
        new.extend_from_slice(&replacement(&caps)?);
        last_match = m.end();
    }
    new.extend_from_slice(&haystack[last_match..]);
    Ok(new)
}

// Let's replace each word with the number of bytes in that word.
// But if we see a word that is "too long," we'll give up.
let re = Regex::new(r"\w+").unwrap();
let replacement = |caps: &Captures| -> Result<Vec<u8>, &'static str> {
    if caps[0].len() >= 5 {
        return Err("word too long");
    }
    Ok(caps[0].len().to_string().into_bytes())
};
assert_eq!(
    Ok(b"2 3 3 3?".to_vec()),
    replace_all(&re, b"hi how are you?", &replacement),
);
assert!(replace_all(&re, b"hi there", &replacement).is_err());
§Example

This example shows how to flip the order of whitespace (excluding line terminators) delimited fields, and normalize the whitespace that delimits the fields:

use regex::bytes::Regex;

let re = Regex::new(r"(?m)^(\S+)[\s--\r\n]+(\S+)$").unwrap();
let hay = b"
Greetings  1973
Wild\t1973
BornToRun\t\t\t\t1975
Darkness                    1978
TheRiver 1980
";
let new = re.replace_all(hay, b"$2 $1");
assert_eq!(new, &b"
1973 Greetings
1973 Wild
1975 BornToRun
1978 Darkness
1980 TheRiver
"[..]);

pub fn replacen<'h, R: Replacer>( &self, haystack: &'h [u8], limit: usize, rep: R, ) -> Cow<'h, [u8]>

Replaces at most limit non-overlapping matches in the haystack with the replacement provided. If limit is 0, then all non-overlapping matches are replaced. That is, Regex::replace_all(hay, rep) is equivalent to Regex::replacen(hay, 0, rep).

If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.

When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.

The documentation for Regex::replace goes into more detail about what kinds of replacement strings are supported.

§Time complexity

Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).

Although note that the worst case time here has an upper bound given by the limit parameter.

§Fallibility

See the corresponding section in the docs for Regex::replace_all for tips on how to deal with a replacement routine that can fail.

§Example

This example shows how to flip the order of whitespace (excluding line terminators) delimited fields, and normalize the whitespace that delimits the fields. But we only do it for the first two matches.

use regex::bytes::Regex;

let re = Regex::new(r"(?m)^(\S+)[\s--\r\n]+(\S+)$").unwrap();
let hay = b"
Greetings  1973
Wild\t1973
BornToRun\t\t\t\t1975
Darkness                    1978
TheRiver 1980
";
let new = re.replacen(hay, 2, b"$2 $1");
assert_eq!(new, &b"
1973 Greetings
1973 Wild
BornToRun\t\t\t\t1975
Darkness                    1978
TheRiver 1980
"[..]);

impl Regex

A group of advanced or “lower level” search methods. Some methods permit starting the search at a position greater than 0 in the haystack. Other methods permit reusing allocations, for example, when extracting the matches for capture groups.

pub fn shortest_match(&self, haystack: &[u8]) -> Option<usize>

Returns the end byte offset of the first match in the haystack given.

This method may have the same performance characteristics as is_match. Behaviorally, it doesn’t just report whether a match occurs, but also the end offset for a match. In particular, the offset returned may be shorter than the proper end of the leftmost-first match that you would find via Regex::find.

Note that it is not guaranteed that this routine finds the shortest or “earliest” possible match. Instead, the main idea of this API is that it returns the offset at the point at which the internal regex engine has determined that a match has occurred. This may vary depending on which internal regex engine is used, and thus, the offset itself may change based on internal heuristics.

§Example

Typically, a+ would match the entire first sequence of a in some haystack, but shortest_match may give up as soon as it sees the first a.

use regex::bytes::Regex;

let re = Regex::new(r"a+").unwrap();
let offset = re.shortest_match(b"aaaaa").unwrap();
assert_eq!(offset, 1);

pub fn shortest_match_at(&self, haystack: &[u8], start: usize) -> Option<usize>

Returns the same as shortest_match, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

If a match is found, the offset returned is relative to the beginning of the haystack, not the beginning of the search.

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::bytes::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = b"eschew";
// We get a match here, but it's probably not intended.
assert_eq!(re.shortest_match(&hay[2..]), Some(4));
// No match because the \b assertions take the context into account.
assert_eq!(re.shortest_match_at(hay, 2), None);

pub fn is_match_at(&self, haystack: &[u8], start: usize) -> bool

Returns the same as Regex::is_match, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::bytes::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = b"eschew";
// We get a match here, but it's probably not intended.
assert!(re.is_match(&hay[2..]));
// No match because the \b assertions take the context into account.
assert!(!re.is_match_at(hay, 2));

pub fn find_at<'h>(&self, haystack: &'h [u8], start: usize) -> Option<Match<'h>>

Returns the same as Regex::find, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::bytes::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = b"eschew";
// We get a match here, but it's probably not intended.
assert_eq!(re.find(&hay[2..]).map(|m| m.range()), Some(0..4));
// No match because the \b assertions take the context into account.
assert_eq!(re.find_at(hay, 2), None);

pub fn captures_at<'h>( &self, haystack: &'h [u8], start: usize, ) -> Option<Captures<'h>>

Returns the same as Regex::captures, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::bytes::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = b"eschew";
// We get a match here, but it's probably not intended.
assert_eq!(&re.captures(&hay[2..]).unwrap()[0], b"chew");
// No match because the \b assertions take the context into account.
assert!(re.captures_at(hay, 2).is_none());

pub fn captures_read<'h>( &self, locs: &mut CaptureLocations, haystack: &'h [u8], ) -> Option<Match<'h>>

This is like Regex::captures, but writes the byte offsets of each capture group match into the locations given.

A CaptureLocations stores the same byte offsets as a Captures, but does not store a reference to the haystack. This makes its API a bit lower level and less convenient. But in exchange, callers may allocate their own CaptureLocations and reuse it for multiple searches. This may be helpful if allocating a Captures shows up in a profile as too costly.

To create a CaptureLocations value, use the Regex::capture_locations method.

This also returns the overall match if one was found. When a match is found, its offsets are also always stored in locs at index 0.

§Example
use regex::bytes::Regex;

let re = Regex::new(r"^([a-z]+)=(\S*)$").unwrap();
let mut locs = re.capture_locations();
assert!(re.captures_read(&mut locs, b"id=foo123").is_some());
assert_eq!(Some((0, 9)), locs.get(0));
assert_eq!(Some((0, 2)), locs.get(1));
assert_eq!(Some((3, 9)), locs.get(2));

pub fn captures_read_at<'h>( &self, locs: &mut CaptureLocations, haystack: &'h [u8], start: usize, ) -> Option<Match<'h>>

Returns the same as Regex::captures_read, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::bytes::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = b"eschew";
let mut locs = re.capture_locations();
// We get a match here, but it's probably not intended.
assert!(re.captures_read(&mut locs, &hay[2..]).is_some());
// No match because the \b assertions take the context into account.
assert!(re.captures_read_at(&mut locs, hay, 2).is_none());

impl Regex

Auxiliary methods.

pub fn as_str(&self) -> &str

Returns the original string of this regex.

§Example
use regex::bytes::Regex;

let re = Regex::new(r"foo\w+bar").unwrap();
assert_eq!(re.as_str(), r"foo\w+bar");

pub fn capture_names(&self) -> CaptureNames<'_>

Returns an iterator over the capture names in this regex.

The iterator returned yields elements of type Option<&str>. That is, the iterator yields values for all capture groups, even ones that are unnamed. The order of the groups corresponds to the order of the group’s corresponding opening parenthesis.

The first element of the iterator always yields the group corresponding to the overall match, and this group is always unnamed. Therefore, the iterator always yields at least one group.

§Example

This shows basic usage with a mix of named and unnamed capture groups:

use regex::bytes::Regex;

let re = Regex::new(r"(?<a>.(?<b>.))(.)(?:.)(?<c>.)").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), Some(Some("a")));
assert_eq!(names.next(), Some(Some("b")));
assert_eq!(names.next(), Some(None));
// the '(?:.)' group is non-capturing and so doesn't appear here!
assert_eq!(names.next(), Some(Some("c")));
assert_eq!(names.next(), None);

The iterator always yields at least one element, even for regexes with no capture groups and even for regexes that can never match:

use regex::bytes::Regex;

let re = Regex::new(r"").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), None);

let re = Regex::new(r"[a&&b]").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), None);

pub fn captures_len(&self) -> usize

Returns the number of capture groups in this regex.

This includes all named and unnamed groups, including the implicit unnamed group that is always present and corresponds to the entire match.

Since the implicit unnamed group is always included in this length, the length returned is guaranteed to be greater than zero.

§Example
use regex::bytes::Regex;

let re = Regex::new(r"foo").unwrap();
assert_eq!(1, re.captures_len());

let re = Regex::new(r"(foo)").unwrap();
assert_eq!(2, re.captures_len());

let re = Regex::new(r"(?<a>.(?<b>.))(.)(?:.)(?<c>.)").unwrap();
assert_eq!(5, re.captures_len());

let re = Regex::new(r"[a&&b]").unwrap();
assert_eq!(1, re.captures_len());

pub fn static_captures_len(&self) -> Option<usize>

Returns the total number of capturing groups that appear in every possible match.

If the number of capture groups can vary depending on the match, then this returns None. That is, a value is only returned when the number of matching groups is invariant or “static.”

Note that like Regex::captures_len, this does include the implicit capturing group corresponding to the entire match. Therefore, when a non-None value is returned, it is guaranteed to be at least 1. Stated differently, a return value of Some(0) is impossible.

§Example

This shows a few cases where a static number of capture groups is available and a few cases where it is not.

use regex::bytes::Regex;

// Compile the pattern and report its static capture group count.
let len = |pattern: &str| {
    Regex::new(pattern).unwrap().static_captures_len()
};

assert_eq!(Some(1), len("a"));
assert_eq!(Some(2), len("(a)"));
assert_eq!(Some(2), len("(a)|(b)"));
assert_eq!(Some(3), len("(a)(b)|(c)(d)"));
assert_eq!(None, len("(a)|b"));
assert_eq!(None, len("a|(b)"));
assert_eq!(None, len("(b)*"));
assert_eq!(Some(2), len("(b)+"));

pub fn capture_locations(&self) -> CaptureLocations

Returns a freshly allocated set of capture locations that can be reused in multiple calls to Regex::captures_read or Regex::captures_read_at.

§Example
use regex::bytes::Regex;

let re = Regex::new(r"(.)(.)(\w+)").unwrap();
let mut locs = re.capture_locations();
assert!(re.captures_read(&mut locs, b"Padron").is_some());
assert_eq!(locs.get(0), Some((0, 6)));
assert_eq!(locs.get(1), Some((0, 1)));
assert_eq!(locs.get(2), Some((1, 2)));
assert_eq!(locs.get(3), Some((2, 6)));

Trait Implementations

impl Clone for Regex

fn clone(&self) -> Regex

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Shows the original regular expression.

impl Display for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Shows the original regular expression.

impl FromStr for Regex

fn from_str(s: &str) -> Result<Regex, Error>

Attempts to parse a string into a regular expression

type Err = Error

The associated error which can be returned from parsing.

impl TryFrom<&str> for Regex

fn try_from(s: &str) -> Result<Regex, Error>

Attempts to parse a string into a regular expression

type Error = Error

The type returned in the event of a conversion error.

impl TryFrom<String> for Regex

fn try_from(s: String) -> Result<Regex, Error>

Attempts to parse a string into a regular expression

type Error = Error

The type returned in the event of a conversion error.

Auto Trait Implementations

impl Freeze for Regex

impl RefUnwindSafe for Regex

impl Send for Regex

impl Sync for Regex

impl Unpin for Regex

impl UnwindSafe for Regex

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

default fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.