|
|
|
|
Quelle ahocorasick.rs
Sprache: unbekannt
|
|
use core::{
fmt::Debug,
panic::{RefUnwindSafe, UnwindSafe},
};
use alloc::{string::String, sync::Arc, vec::Vec};
use crate::{
automaton::{self, Automaton, OverlappingState},
dfa,
nfa::{contiguous, noncontiguous},
util::{
error::{BuildError, MatchError},
prefilter::Prefilter,
primitives::{PatternID, StateID},
search::{Anchored, Input, Match, MatchKind, StartKind},
},
};
/// An automaton for searching multiple strings in linear time.
///
/// The `AhoCorasick` type supports a few basic ways of constructing an
/// automaton, with the default being [`AhoCorasick::new`]. However, there
/// are a fair number of configurable options that can be set by using
/// [`AhoCorasickBuilder`] instead. Such options include, but are not limited
/// to, how matches are determined, simple case insensitivity, whether to use a
/// DFA or not and various knobs for controlling the space-vs-time trade offs
/// taken when building the automaton.
///
/// # Resource usage
///
/// Aho-Corasick automatons are always constructed in `O(p)` time, where
/// `p` is the combined length of all patterns being searched. With that
/// said, building an automaton can be fairly costly because of high constant
/// factors, particularly when enabling the [DFA](AhoCorasickKind::DFA) option
/// with [`AhoCorasickBuilder::kind`]. For this reason, it's generally a good
/// idea to build an automaton once and reuse it as much as possible.
///
/// Aho-Corasick automatons can also use a fair bit of memory. To get
/// a concrete idea of how much memory is being used, try using the
/// [`AhoCorasick::memory_usage`] method.
///
/// To give a quick idea of the differences between Aho-Corasick
/// implementations and their resource usage, here's a sample of construction
/// times and heap memory used after building an automaton from 100,000
/// randomly selected titles from Wikipedia:
///
/// * 99MB for a [`noncontiguous::NFA`] in 240ms.
/// * 21MB for a [`contiguous::NFA`] in 275ms.
/// * 1.6GB for a [`dfa::DFA`] in 1.88s.
///
/// (Note that the memory usage above reflects the size of each automaton and
/// not peak memory usage. For example, building a contiguous NFA requires
/// first building a noncontiguous NFA. Once the contiguous NFA is built, the
/// noncontiguous NFA is freed.)
///
/// This experiment very strongly argues that a contiguous NFA is often the
/// best balance in terms of resource usage. It takes a little longer to build,
/// but its memory usage is quite small. Its search speed (not listed) is
/// also often faster than a noncontiguous NFA, but a little slower than a
/// DFA. Indeed, when no specific [`AhoCorasickKind`] is used (which is the
/// default), a contiguous NFA is used in most cases.
///
/// The only "catch" to using a contiguous NFA is that, because of its variety
/// of compression tricks, it may not be able to support automatons as large as
/// what the noncontiguous NFA supports. In which case, building a contiguous
/// NFA will fail and (by default) `AhoCorasick` will automatically fall
/// back to a noncontiguous NFA. (This typically only happens when building
/// automatons from millions of patterns.) Otherwise, the small additional time
/// for building a contiguous NFA is almost certainly worth it.
///
/// # Cloning
///
/// The `AhoCorasick` type uses thread safe reference counting internally. It
/// is guaranteed that it is cheap to clone.
///
/// # Search configuration
///
/// Most of the search routines accept anything that can be cheaply converted
/// to an [`Input`]. This includes `&[u8]`, `&str` and `Input` itself.
///
/// # Construction failure
///
/// It is generally possible for building an Aho-Corasick automaton to fail.
/// Construction can fail in generally one way: when the inputs provided are
/// too big. Whether that's a pattern that is too long, too many patterns
/// or some combination of both. A first approximation for the scale at which
/// construction can fail is somewhere around "millions of patterns."
///
/// For that reason, if you're building an Aho-Corasick automaton from
/// untrusted input (or input that doesn't have any reasonable bounds on its
/// size), then it is strongly recommended to handle the possibility of an
/// error.
///
/// If you're constructing an Aho-Corasick automaton from static or trusted
/// data, then it is likely acceptable to panic (by calling `unwrap()` or
/// `expect()`) if construction fails.
///
/// # Fallibility
///
/// The `AhoCorasick` type provides a number of methods for searching, as one
/// might expect. Depending on how the Aho-Corasick automaton was built and
/// depending on the search configuration, it is possible for a search to
/// return an error. Since an error is _never_ dependent on the actual contents
/// of the haystack, this type provides both infallible and fallible methods
/// for searching. The infallible methods panic if an error occurs, and can be
/// used for convenience and when you know the search will never return an
/// error.
///
/// For example, the [`AhoCorasick::find_iter`] method is the infallible
/// version of the [`AhoCorasick::try_find_iter`] method.
///
/// Examples of errors that can occur:
///
/// * Running a search that requires [`MatchKind::Standard`] semantics (such
/// as a stream or overlapping search) with an automaton that was built with
/// [`MatchKind::LeftmostFirst`] or [`MatchKind::LeftmostLongest`] semantics.
/// * Running an anchored search with an automaton that only supports
/// unanchored searches. (By default, `AhoCorasick` only supports unanchored
/// searches. But this can be toggled with [`AhoCorasickBuilder::start_kind`].)
/// * Running an unanchored search with an automaton that only supports
/// anchored searches.
///
/// The common thread between the different types of errors is that they are
/// all rooted in the automaton construction and search configurations. If
/// those configurations are a static property of your program, then it is
/// reasonable to call infallible routines since you know an error will never
/// occur. And if one _does_ occur, then it's a bug in your program.
///
/// To re-iterate, if the patterns, build or search configuration come from
/// user or untrusted data, then you should handle errors at build or search
/// time. If only the haystack comes from user or untrusted data, then there
/// should be no need to handle errors anywhere and it is generally encouraged
/// to `unwrap()` (or `expect()`) both build and search time calls.
///
/// # Examples
///
/// This example shows how to search for occurrences of multiple patterns
/// simultaneously in a case insensitive fashion. Each match includes the
/// pattern that matched along with the byte offsets of the match.
///
/// ```
/// use aho_corasick::{AhoCorasick, PatternID};
///
/// let patterns = &["apple", "maple", "snapple"];
/// let haystack = "Nobody likes maple in their apple flavored Snapple.";
///
/// let ac = AhoCorasick::builder()
/// .ascii_case_insensitive(true)
/// .build(patterns)
/// .unwrap();
/// let mut matches = vec![];
/// for mat in ac.find_iter(haystack) {
/// matches.push((mat.pattern(), mat.start(), mat.end()));
/// }
/// assert_eq!(matches, vec![
/// (PatternID::must(1), 13, 18),
/// (PatternID::must(0), 28, 33),
/// (PatternID::must(2), 43, 50),
/// ]);
/// ```
///
/// This example shows how to replace matches with some other string:
///
/// ```
/// use aho_corasick::AhoCorasick;
///
/// let patterns = &["fox", "brown", "quick"];
/// let haystack = "The quick brown fox.";
/// let replace_with = &["sloth", "grey", "slow"];
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let result = ac.replace_all(haystack, replace_with);
/// assert_eq!(result, "The slow grey sloth.");
/// ```
#[derive(Clone)]
pub struct AhoCorasick {
/// The underlying Aho-Corasick automaton. It's one of
/// nfa::noncontiguous::NFA, nfa::contiguous::NFA or dfa::DFA.
aut: Arc<dyn AcAutomaton>,
/// The specific Aho-Corasick kind chosen. This makes it possible to
/// inspect any `AhoCorasick` and know what kind of search strategy it
/// uses.
kind: AhoCorasickKind,
/// The start kind of this automaton as configured by the caller.
///
/// We don't really *need* to put this here, since the underlying automaton
/// will correctly return errors if the caller requests an unsupported
/// search type. But we do keep this here for API behavior consistency.
/// Namely, the NFAs in this crate support both unanchored and anchored
/// searches unconditionally. There's no way to disable one or the other.
/// They always both work. But the DFA in this crate specifically only
/// supports both unanchored and anchored searches if it's configured to
/// do so. Why? Because for the DFA, supporting both essentially requires
/// two copies of the transition table: one generated by following failure
/// transitions from the original NFA and one generated by not following
/// those failure transitions.
///
/// So why record the start kind here? Well, consider what happens
/// when no specific 'AhoCorasickKind' is selected by the caller and
/// 'StartKind::Unanchored' is used (both are the default). It *might*
/// result in using a DFA or it might pick an NFA. If it picks an NFA, the
/// caller would then be able to run anchored searches, even though the
/// caller only asked for support for unanchored searches. Maybe that's
/// fine, but what if the DFA was chosen instead? Oops, the caller would
/// get an error.
///
/// Basically, it seems bad to return an error or not based on some
/// internal implementation choice. So we smooth things out and ensure
/// anchored searches *always* report an error when only unanchored support
/// was asked for (and vice versa), even if the underlying automaton
/// supports it.
start_kind: StartKind,
}
/// Convenience constructors for an Aho-Corasick searcher. To configure the
/// searcher, use an [`AhoCorasickBuilder`] instead.
impl AhoCorasick {
/// Create a new Aho-Corasick automaton using the default configuration.
///
/// The default configuration optimizes for less space usage, but at the
/// expense of longer search times. To change the configuration, use
/// [`AhoCorasickBuilder`].
///
/// This uses the default [`MatchKind::Standard`] match semantics, which
/// reports a match as soon as it is found. This corresponds to the
/// standard match semantics supported by textbook descriptions of the
/// Aho-Corasick algorithm.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use aho_corasick::{AhoCorasick, PatternID};
///
/// let ac = AhoCorasick::new(&["foo", "bar", "baz"]).unwrap();
/// assert_eq!(
/// Some(PatternID::must(1)),
/// ac.find("xxx bar xxx").map(|m| m.pattern()),
/// );
/// ```
pub fn new<I, P>(patterns: I) -> Result<AhoCorasick, BuildError>
where
I: IntoIterator<Item = P>,
P: AsRef<[u8]>,
{
AhoCorasickBuilder::new().build(patterns)
}
/// A convenience method for returning a new Aho-Corasick builder.
///
/// This usually permits one to just import the `AhoCorasick` type.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use aho_corasick::{AhoCorasick, Match, MatchKind};
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(&["samwise", "sam"])
/// .unwrap();
/// assert_eq!(Some(Match::must(0, 0..7)), ac.find("samwise"));
/// ```
pub fn builder() -> AhoCorasickBuilder {
AhoCorasickBuilder::new()
}
}
/// Infallible search routines. These APIs panic when the underlying search
/// would otherwise fail. Infallible routines are useful because the errors are
/// a result of both search-time configuration and what configuration is used
/// to build the Aho-Corasick searcher. Both of these things are not usually
/// the result of user input, and thus, an error is typically indicative of a
/// programmer error. In cases where callers want errors instead of panics, use
/// the corresponding `try` method in the section below.
impl AhoCorasick {
/// Returns true if and only if this automaton matches the haystack at any
/// position.
///
/// `input` may be any type that is cheaply convertible to an `Input`. This
/// includes, but is not limited to, `&str` and `&[u8]`.
///
/// Aside from convenience, when `AhoCorasick` was built with
/// leftmost-first or leftmost-longest semantics, this might result in a
/// search that visits less of the haystack than [`AhoCorasick::find`]
/// would otherwise. (For standard semantics, matches are always
/// immediately returned once they are seen, so there is no way for this to
/// do less work in that case.)
///
/// Note that there is no corresponding fallible routine for this method.
/// If you need a fallible version of this, then [`AhoCorasick::try_find`]
/// can be used with [`Input::earliest`] enabled.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use aho_corasick::AhoCorasick;
///
/// let ac = AhoCorasick::new(&[
/// "foo", "bar", "quux", "baz",
/// ]).unwrap();
/// assert!(ac.is_match("xxx bar xxx"));
/// assert!(!ac.is_match("xxx qux xxx"));
/// ```
pub fn is_match<'h, I: Into<Input<'h>>>(&self, input: I) -> bool {
self.aut
.try_find(&input.into().earliest(true))
.expect("AhoCorasick::try_find is not expected to fail")
.is_some()
}
/// Returns the location of the first match according to the match
/// semantics that this automaton was constructed with.
///
/// `input` may be any type that is cheaply convertible to an `Input`. This
/// includes, but is not limited to, `&str` and `&[u8]`.
///
/// This is the infallible version of [`AhoCorasick::try_find`].
///
/// # Panics
///
/// This panics when [`AhoCorasick::try_find`] would return an error.
///
/// # Examples
///
/// Basic usage, with standard semantics:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["b", "abc", "abcd"];
/// let haystack = "abcd";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::Standard) // default, not necessary
/// .build(patterns)
/// .unwrap();
/// let mat = ac.find(haystack).expect("should have a match");
/// assert_eq!("b", &haystack[mat.start()..mat.end()]);
/// ```
///
/// Now with leftmost-first semantics:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["b", "abc", "abcd"];
/// let haystack = "abcd";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let mat = ac.find(haystack).expect("should have a match");
/// assert_eq!("abc", &haystack[mat.start()..mat.end()]);
/// ```
///
/// And finally, leftmost-longest semantics:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["b", "abc", "abcd"];
/// let haystack = "abcd";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostLongest)
/// .build(patterns)
/// .unwrap();
/// let mat = ac.find(haystack).expect("should have a match");
/// ```
///
/// # Example: configuring a search
///
/// Because this method accepts anything that can be turned into an
/// [`Input`], it's possible to provide an `Input` directly in order to
/// configure the search. In this example, we show how to use the
/// `earliest` option to force the search to return as soon as it knows
/// a match has occurred.
///
/// ```
/// use aho_corasick::{AhoCorasick, Input, MatchKind};
///
/// let patterns = &["b", "abc", "abcd"];
/// let haystack = "abcd";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostLongest)
/// .build(patterns)
/// .unwrap();
/// let mat = ac.find(Input::new(haystack).earliest(true))
/// .expect("should have a match");
/// // The correct leftmost-longest match here is 'abcd', but since we
/// // told the search to quit as soon as it knows a match has occurred,
/// // we get a different match back.
/// assert_eq!("b", &haystack[mat.start()..mat.end()]);
/// ```
pub fn find<'h, I: Into<Input<'h>>>(&self, input: I) -> Option<Match> {
self.try_find(input)
.expect("AhoCorasick::try_find is not expected to fail")
}
/// Returns the location of the first overlapping match in the given
/// input with respect to the current state of the underlying searcher.
///
/// `input` may be any type that is cheaply convertible to an `Input`. This
/// includes, but is not limited to, `&str` and `&[u8]`.
///
/// Overlapping searches do not report matches in their return value.
/// Instead, matches can be accessed via [`OverlappingState::get_match`]
/// after a search call.
///
/// This is the infallible version of
/// [`AhoCorasick::try_find_overlapping`].
///
/// # Panics
///
/// This panics when [`AhoCorasick::try_find_overlapping`] would
/// return an error. For example, when the Aho-Corasick searcher
/// doesn't support overlapping searches. (Only searchers built with
/// [`MatchKind::Standard`] semantics support overlapping searches.)
///
/// # Example
///
/// This shows how we can repeatedly call an overlapping search without
/// ever needing to explicitly re-slice the haystack. Overlapping search
/// works this way because searches depend on state saved during the
/// previous search.
///
/// ```
/// use aho_corasick::{
/// automaton::OverlappingState,
/// AhoCorasick, Input, Match,
/// };
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let mut state = OverlappingState::start();
///
/// ac.find_overlapping(haystack, &mut state);
/// assert_eq!(Some(Match::must(2, 0..3)), state.get_match());
///
/// ac.find_overlapping(haystack, &mut state);
/// assert_eq!(Some(Match::must(0, 0..6)), state.get_match());
///
/// ac.find_overlapping(haystack, &mut state);
/// assert_eq!(Some(Match::must(2, 11..14)), state.get_match());
///
/// ac.find_overlapping(haystack, &mut state);
/// assert_eq!(Some(Match::must(2, 22..25)), state.get_match());
///
/// ac.find_overlapping(haystack, &mut state);
/// assert_eq!(Some(Match::must(0, 22..28)), state.get_match());
///
/// ac.find_overlapping(haystack, &mut state);
/// assert_eq!(Some(Match::must(1, 22..31)), state.get_match());
///
/// // No more match matches to be found.
/// ac.find_overlapping(haystack, &mut state);
/// assert_eq!(None, state.get_match());
/// ```
pub fn find_overlapping<'h, I: Into<Input<'h>>>(
&self,
input: I,
state: &mut OverlappingState,
) {
self.try_find_overlapping(input, state).expect(
"AhoCorasick::try_find_overlapping is not expected to fail",
)
}
/// Returns an iterator of non-overlapping matches, using the match
/// semantics that this automaton was constructed with.
///
/// `input` may be any type that is cheaply convertible to an `Input`. This
/// includes, but is not limited to, `&str` and `&[u8]`.
///
/// This is the infallible version of [`AhoCorasick::try_find_iter`].
///
/// # Panics
///
/// This panics when [`AhoCorasick::try_find_iter`] would return an error.
///
/// # Examples
///
/// Basic usage, with standard semantics:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind, PatternID};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::Standard) // default, not necessary
/// .build(patterns)
/// .unwrap();
/// let matches: Vec<PatternID> = ac
/// .find_iter(haystack)
/// .map(|mat| mat.pattern())
/// .collect();
/// assert_eq!(vec![
/// PatternID::must(2),
/// PatternID::must(2),
/// PatternID::must(2),
/// ], matches);
/// ```
///
/// Now with leftmost-first semantics:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind, PatternID};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let matches: Vec<PatternID> = ac
/// .find_iter(haystack)
/// .map(|mat| mat.pattern())
/// .collect();
/// assert_eq!(vec![
/// PatternID::must(0),
/// PatternID::must(2),
/// PatternID::must(0),
/// ], matches);
/// ```
///
/// And finally, leftmost-longest semantics:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind, PatternID};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostLongest)
/// .build(patterns)
/// .unwrap();
/// let matches: Vec<PatternID> = ac
/// .find_iter(haystack)
/// .map(|mat| mat.pattern())
/// .collect();
/// assert_eq!(vec![
/// PatternID::must(0),
/// PatternID::must(2),
/// PatternID::must(1),
/// ], matches);
/// ```
pub fn find_iter<'a, 'h, I: Into<Input<'h>>>(
&'a self,
input: I,
) -> FindIter<'a, 'h> {
self.try_find_iter(input)
.expect("AhoCorasick::try_find_iter is not expected to fail")
}
/// Returns an iterator of overlapping matches. Stated differently, this
/// returns an iterator of all possible matches at every position.
///
/// `input` may be any type that is cheaply convertible to an `Input`. This
/// includes, but is not limited to, `&str` and `&[u8]`.
///
/// This is the infallible version of
/// [`AhoCorasick::try_find_overlapping_iter`].
///
/// # Panics
///
/// This panics when `AhoCorasick::try_find_overlapping_iter` would return
/// an error. For example, when the Aho-Corasick searcher is built with
/// either leftmost-first or leftmost-longest match semantics. Stated
/// differently, overlapping searches require one to build the searcher
/// with [`MatchKind::Standard`] (it is the default).
///
/// # Example: basic usage
///
/// ```
/// use aho_corasick::{AhoCorasick, PatternID};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let matches: Vec<PatternID> = ac
/// .find_overlapping_iter(haystack)
/// .map(|mat| mat.pattern())
/// .collect();
/// assert_eq!(vec![
/// PatternID::must(2),
/// PatternID::must(0),
/// PatternID::must(2),
/// PatternID::must(2),
/// PatternID::must(0),
/// PatternID::must(1),
/// ], matches);
/// ```
pub fn find_overlapping_iter<'a, 'h, I: Into<Input<'h>>>(
&'a self,
input: I,
) -> FindOverlappingIter<'a, 'h> {
self.try_find_overlapping_iter(input).expect(
"AhoCorasick::try_find_overlapping_iter is not expected to fail",
)
}
/// Replace all matches with a corresponding value in the `replace_with`
/// slice given. Matches correspond to the same matches as reported by
/// [`AhoCorasick::find_iter`].
///
/// Replacements are determined by the index of the matching pattern.
/// For example, if the pattern with index `2` is found, then it is
/// replaced by `replace_with[2]`.
///
/// This is the infallible version of [`AhoCorasick::try_replace_all`].
///
/// # Panics
///
/// This panics when [`AhoCorasick::try_replace_all`] would return an
/// error.
///
/// This also panics when `replace_with.len()` does not equal
/// [`AhoCorasick::patterns_len`].
///
/// # Example: basic usage
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let result = ac.replace_all(haystack, &["x", "y", "z"]);
/// assert_eq!("x the z to the xage", result);
/// ```
pub fn replace_all<B>(&self, haystack: &str, replace_with: &[B]) -> String
where
B: AsRef<str>,
{
self.try_replace_all(haystack, replace_with)
.expect("AhoCorasick::try_replace_all is not expected to fail")
}
/// Replace all matches using raw bytes with a corresponding value in the
/// `replace_with` slice given. Matches correspond to the same matches as
/// reported by [`AhoCorasick::find_iter`].
///
/// Replacements are determined by the index of the matching pattern.
/// For example, if the pattern with index `2` is found, then it is
/// replaced by `replace_with[2]`.
///
/// This is the infallible version of
/// [`AhoCorasick::try_replace_all_bytes`].
///
/// # Panics
///
/// This panics when [`AhoCorasick::try_replace_all_bytes`] would return an
/// error.
///
/// This also panics when `replace_with.len()` does not equal
/// [`AhoCorasick::patterns_len`].
///
/// # Example: basic usage
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = b"append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let result = ac.replace_all_bytes(haystack, &["x", "y", "z"]);
/// assert_eq!(b"x the z to the xage".to_vec(), result);
/// ```
pub fn replace_all_bytes<B>(
&self,
haystack: &[u8],
replace_with: &[B],
) -> Vec<u8>
where
B: AsRef<[u8]>,
{
self.try_replace_all_bytes(haystack, replace_with)
.expect("AhoCorasick::try_replace_all_bytes should not fail")
}
/// Replace all matches using a closure called on each match.
/// Matches correspond to the same matches as reported by
/// [`AhoCorasick::find_iter`].
///
/// The closure accepts three parameters: the match found, the text of
/// the match and a string buffer with which to write the replaced text
/// (if any). If the closure returns `true`, then it continues to the next
/// match. If the closure returns `false`, then searching is stopped.
///
/// Note that any matches with boundaries that don't fall on a valid UTF-8
/// boundary are silently skipped.
///
/// This is the infallible version of
/// [`AhoCorasick::try_replace_all_with`].
///
/// # Panics
///
/// This panics when [`AhoCorasick::try_replace_all_with`] would return an
/// error.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let mut result = String::new();
/// ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
/// dst.push_str(&mat.pattern().as_usize().to_string());
/// true
/// });
/// assert_eq!("0 the 2 to the 0age", result);
/// ```
///
/// Stopping the replacement by returning `false` (continued from the
/// example above):
///
/// ```
/// # use aho_corasick::{AhoCorasick, MatchKind, PatternID};
/// # let patterns = &["append", "appendage", "app"];
/// # let haystack = "append the app to the appendage";
/// # let ac = AhoCorasick::builder()
/// # .match_kind(MatchKind::LeftmostFirst)
/// # .build(patterns)
/// # .unwrap();
/// let mut result = String::new();
/// ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
/// dst.push_str(&mat.pattern().as_usize().to_string());
/// mat.pattern() != PatternID::must(2)
/// });
/// assert_eq!("0 the 2 to the appendage", result);
/// ```
pub fn replace_all_with<F>(
&self,
haystack: &str,
dst: &mut String,
replace_with: F,
) where
F: FnMut(&Match, &str, &mut String) -> bool,
{
self.try_replace_all_with(haystack, dst, replace_with)
.expect("AhoCorasick::try_replace_all_with should not fail")
}
/// Replace all matches using raw bytes with a closure called on each
/// match. Matches correspond to the same matches as reported by
/// [`AhoCorasick::find_iter`].
///
/// The closure accepts three parameters: the match found, the text of
/// the match and a byte buffer with which to write the replaced text
/// (if any). If the closure returns `true`, then it continues to the next
/// match. If the closure returns `false`, then searching is stopped.
///
/// This is the infallible version of
/// [`AhoCorasick::try_replace_all_with_bytes`].
///
/// # Panics
///
/// This panics when [`AhoCorasick::try_replace_all_with_bytes`] would
/// return an error.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = b"append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let mut result = vec![];
/// ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
/// dst.extend(mat.pattern().as_usize().to_string().bytes());
/// true
/// });
/// assert_eq!(b"0 the 2 to the 0age".to_vec(), result);
/// ```
///
/// Stopping the replacement by returning `false` (continued from the
/// example above):
///
/// ```
/// # use aho_corasick::{AhoCorasick, MatchKind, PatternID};
/// # let patterns = &["append", "appendage", "app"];
/// # let haystack = b"append the app to the appendage";
/// # let ac = AhoCorasick::builder()
/// # .match_kind(MatchKind::LeftmostFirst)
/// # .build(patterns)
/// # .unwrap();
/// let mut result = vec![];
/// ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
/// dst.extend(mat.pattern().as_usize().to_string().bytes());
/// mat.pattern() != PatternID::must(2)
/// });
/// assert_eq!(b"0 the 2 to the appendage".to_vec(), result);
/// ```
pub fn replace_all_with_bytes<F>(
&self,
haystack: &[u8],
dst: &mut Vec<u8>,
replace_with: F,
) where
F: FnMut(&Match, &[u8], &mut Vec<u8>) -> bool,
{
self.try_replace_all_with_bytes(haystack, dst, replace_with)
.expect("AhoCorasick::try_replace_all_with_bytes should not fail")
}
/// Returns an iterator of non-overlapping matches in the given
/// stream. Matches correspond to the same matches as reported by
/// [`AhoCorasick::find_iter`].
///
/// The matches yielded by this iterator use absolute position offsets in
/// the stream given, where the first byte has index `0`. Matches are
/// yieled until the stream is exhausted.
///
/// Each item yielded by the iterator is an `Result<Match,
/// std::io::Error>`, where an error is yielded if there was a problem
/// reading from the reader given.
///
/// When searching a stream, an internal buffer is used. Therefore, callers
/// should avoiding providing a buffered reader, if possible.
///
/// This is the infallible version of
/// [`AhoCorasick::try_stream_find_iter`]. Note that both methods return
/// iterators that produce `Result` values. The difference is that this
/// routine panics if _construction_ of the iterator failed. The `Result`
/// values yield by the iterator come from whether the given reader returns
/// an error or not during the search.
///
/// # Memory usage
///
/// In general, searching streams will use a constant amount of memory for
/// its internal buffer. The one requirement is that the internal buffer
/// must be at least the size of the longest possible match. In most use
/// cases, the default buffer size will be much larger than any individual
/// match.
///
/// # Panics
///
/// This panics when [`AhoCorasick::try_stream_find_iter`] would return
/// an error. For example, when the Aho-Corasick searcher doesn't support
/// stream searches. (Only searchers built with [`MatchKind::Standard`]
/// semantics support stream searches.)
///
/// # Example: basic usage
///
/// ```
/// use aho_corasick::{AhoCorasick, PatternID};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let mut matches = vec![];
/// for result in ac.stream_find_iter(haystack.as_bytes()) {
/// let mat = result?;
/// matches.push(mat.pattern());
/// }
/// assert_eq!(vec![
/// PatternID::must(2),
/// PatternID::must(2),
/// PatternID::must(2),
/// ], matches);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
#[cfg(feature = "std")]
pub fn stream_find_iter<'a, R: std::io::Read>(
&'a self,
rdr: R,
) -> StreamFindIter<'a, R> {
self.try_stream_find_iter(rdr)
.expect("AhoCorasick::try_stream_find_iter should not fail")
}
}
/// Fallible search routines. These APIs return an error in cases where the
/// infallible routines would panic.
impl AhoCorasick {
/// Returns the location of the first match according to the match
/// semantics that this automaton was constructed with, and according
/// to the given `Input` configuration.
///
/// This is the fallible version of [`AhoCorasick::find`].
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the given `Input` configuration.
///
/// For example, if the Aho-Corasick searcher only supports anchored
/// searches or only supports unanchored searches, then providing an
/// `Input` that requests an anchored (or unanchored) search when it isn't
/// supported would result in an error.
///
/// # Example: leftmost-first searching
///
/// Basic usage with leftmost-first semantics:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind, Input};
///
/// let patterns = &["b", "abc", "abcd"];
/// let haystack = "foo abcd";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let mat = ac.try_find(haystack)?.expect("should have a match");
/// assert_eq!("abc", &haystack[mat.span()]);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Example: anchored leftmost-first searching
///
/// This shows how to anchor the search, so that even if the haystack
/// contains a match somewhere, a match won't be reported unless one can
/// be found that starts at the beginning of the search:
///
/// ```
/// use aho_corasick::{AhoCorasick, Anchored, Input, MatchKind, StartKind};
///
/// let patterns = &["b", "abc", "abcd"];
/// let haystack = "foo abcd";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .start_kind(StartKind::Anchored)
/// .build(patterns)
/// .unwrap();
/// let input = Input::new(haystack).anchored(Anchored::Yes);
/// assert_eq!(None, ac.try_find(input)?);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// If the beginning of the search is changed to where a match begins, then
/// it will be found:
///
/// ```
/// use aho_corasick::{AhoCorasick, Anchored, Input, MatchKind, StartKind};
///
/// let patterns = &["b", "abc", "abcd"];
/// let haystack = "foo abcd";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .start_kind(StartKind::Anchored)
/// .build(patterns)
/// .unwrap();
/// let input = Input::new(haystack).range(4..).anchored(Anchored::Yes);
/// let mat = ac.try_find(input)?.expect("should have a match");
/// assert_eq!("abc", &haystack[mat.span()]);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Example: earliest leftmost-first searching
///
/// This shows how to run an "earliest" search even when the Aho-Corasick
/// searcher was compiled with leftmost-first match semantics. In this
/// case, the search is stopped as soon as it is known that a match has
/// occurred, even if it doesn't correspond to the leftmost-first match.
///
/// ```
/// use aho_corasick::{AhoCorasick, Input, MatchKind};
///
/// let patterns = &["b", "abc", "abcd"];
/// let haystack = "foo abcd";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let input = Input::new(haystack).earliest(true);
/// let mat = ac.try_find(input)?.expect("should have a match");
/// assert_eq!("b", &haystack[mat.span()]);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn try_find<'h, I: Into<Input<'h>>>(
&self,
input: I,
) -> Result<Option<Match>, MatchError> {
let input = input.into();
enforce_anchored_consistency(self.start_kind, input.get_anchored())?;
self.aut.try_find(&input)
}
/// Returns the location of the first overlapping match in the given
/// input with respect to the current state of the underlying searcher.
///
/// Overlapping searches do not report matches in their return value.
/// Instead, matches can be accessed via [`OverlappingState::get_match`]
/// after a search call.
///
/// This is the fallible version of [`AhoCorasick::find_overlapping`].
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the given `Input` configuration or if overlapping search is not
/// supported.
///
/// One example is that only Aho-Corasicker searchers built with
/// [`MatchKind::Standard`] semantics support overlapping searches. Using
/// any other match semantics will result in this returning an error.
///
/// # Example: basic usage
///
/// This shows how we can repeatedly call an overlapping search without
/// ever needing to explicitly re-slice the haystack. Overlapping search
/// works this way because searches depend on state saved during the
/// previous search.
///
/// ```
/// use aho_corasick::{
/// automaton::OverlappingState,
/// AhoCorasick, Input, Match,
/// };
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let mut state = OverlappingState::start();
///
/// ac.try_find_overlapping(haystack, &mut state)?;
/// assert_eq!(Some(Match::must(2, 0..3)), state.get_match());
///
/// ac.try_find_overlapping(haystack, &mut state)?;
/// assert_eq!(Some(Match::must(0, 0..6)), state.get_match());
///
/// ac.try_find_overlapping(haystack, &mut state)?;
/// assert_eq!(Some(Match::must(2, 11..14)), state.get_match());
///
/// ac.try_find_overlapping(haystack, &mut state)?;
/// assert_eq!(Some(Match::must(2, 22..25)), state.get_match());
///
/// ac.try_find_overlapping(haystack, &mut state)?;
/// assert_eq!(Some(Match::must(0, 22..28)), state.get_match());
///
/// ac.try_find_overlapping(haystack, &mut state)?;
/// assert_eq!(Some(Match::must(1, 22..31)), state.get_match());
///
/// // No more match matches to be found.
/// ac.try_find_overlapping(haystack, &mut state)?;
/// assert_eq!(None, state.get_match());
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Example: implementing your own overlapping iteration
///
/// The previous example can be easily adapted to implement your own
/// iteration by repeatedly calling `try_find_overlapping` until either
/// an error occurs or no more matches are reported.
///
/// This is effectively equivalent to the iterator returned by
/// [`AhoCorasick::try_find_overlapping_iter`], with the only difference
/// being that the iterator checks for errors before construction and
/// absolves the caller of needing to check for errors on every search
/// call. (Indeed, if the first `try_find_overlapping` call succeeds and
/// the same `Input` is given to subsequent calls, then all subsequent
/// calls are guaranteed to succeed.)
///
/// ```
/// use aho_corasick::{
/// automaton::OverlappingState,
/// AhoCorasick, Input, Match,
/// };
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let mut state = OverlappingState::start();
/// let mut matches = vec![];
///
/// loop {
/// ac.try_find_overlapping(haystack, &mut state)?;
/// let mat = match state.get_match() {
/// None => break,
/// Some(mat) => mat,
/// };
/// matches.push(mat);
/// }
/// let expected = vec![
/// Match::must(2, 0..3),
/// Match::must(0, 0..6),
/// Match::must(2, 11..14),
/// Match::must(2, 22..25),
/// Match::must(0, 22..28),
/// Match::must(1, 22..31),
/// ];
/// assert_eq!(expected, matches);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Example: anchored iteration
///
/// The previous example can also be adapted to implement
/// iteration over all anchored matches. In particular,
/// [`AhoCorasick::try_find_overlapping_iter`] does not support this
/// because it isn't totally clear what the match semantics ought to be.
///
/// In this example, we will find all overlapping matches that start at
/// the beginning of our search.
///
/// ```
/// use aho_corasick::{
/// automaton::OverlappingState,
/// AhoCorasick, Anchored, Input, Match, StartKind,
/// };
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .start_kind(StartKind::Anchored)
/// .build(patterns)
/// .unwrap();
/// let input = Input::new(haystack).anchored(Anchored::Yes);
/// let mut state = OverlappingState::start();
/// let mut matches = vec![];
///
/// loop {
/// ac.try_find_overlapping(input.clone(), &mut state)?;
/// let mat = match state.get_match() {
/// None => break,
/// Some(mat) => mat,
/// };
/// matches.push(mat);
/// }
/// let expected = vec![
/// Match::must(2, 0..3),
/// Match::must(0, 0..6),
/// ];
/// assert_eq!(expected, matches);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn try_find_overlapping<'h, I: Into<Input<'h>>>(
&self,
input: I,
state: &mut OverlappingState,
) -> Result<(), MatchError> {
let input = input.into();
enforce_anchored_consistency(self.start_kind, input.get_anchored())?;
self.aut.try_find_overlapping(&input, state)
}
/// Returns an iterator of non-overlapping matches, using the match
/// semantics that this automaton was constructed with.
///
/// This is the fallible version of [`AhoCorasick::find_iter`].
///
/// Note that the error returned by this method occurs during construction
/// of the iterator. The iterator itself yields `Match` values. That is,
/// once the iterator is constructed, the iteration itself will never
/// report an error.
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the given `Input` configuration.
///
/// For example, if the Aho-Corasick searcher only supports anchored
/// searches or only supports unanchored searches, then providing an
/// `Input` that requests an anchored (or unanchored) search when it isn't
/// supported would result in an error.
///
/// # Example: leftmost-first searching
///
/// Basic usage with leftmost-first semantics:
///
/// ```
/// use aho_corasick::{AhoCorasick, Input, MatchKind, PatternID};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let matches: Vec<PatternID> = ac
/// .try_find_iter(Input::new(haystack))?
/// .map(|mat| mat.pattern())
/// .collect();
/// assert_eq!(vec![
/// PatternID::must(0),
/// PatternID::must(2),
/// PatternID::must(0),
/// ], matches);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Example: anchored leftmost-first searching
///
/// This shows how to anchor the search, such that all matches must begin
/// at the starting location of the search. For an iterator, an anchored
/// search implies that all matches are adjacent.
///
/// ```
/// use aho_corasick::{
/// AhoCorasick, Anchored, Input, MatchKind, PatternID, StartKind,
/// };
///
/// let patterns = &["foo", "bar", "quux"];
/// let haystack = "fooquuxbar foo";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .start_kind(StartKind::Anchored)
/// .build(patterns)
/// .unwrap();
/// let matches: Vec<PatternID> = ac
/// .try_find_iter(Input::new(haystack).anchored(Anchored::Yes))?
/// .map(|mat| mat.pattern())
/// .collect();
/// assert_eq!(vec![
/// PatternID::must(0),
/// PatternID::must(2),
/// PatternID::must(1),
/// // The final 'foo' is not found because it is not adjacent to the
/// // 'bar' match. It needs to be adjacent because our search is
/// // anchored.
/// ], matches);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn try_find_iter<'a, 'h, I: Into<Input<'h>>>(
&'a self,
input: I,
) -> Result<FindIter<'a, 'h>, MatchError> {
let input = input.into();
enforce_anchored_consistency(self.start_kind, input.get_anchored())?;
Ok(FindIter(self.aut.try_find_iter(input)?))
}
/// Returns an iterator of overlapping matches.
///
/// This is the fallible version of [`AhoCorasick::find_overlapping_iter`].
///
/// Note that the error returned by this method occurs during construction
/// of the iterator. The iterator itself yields `Match` values. That is,
/// once the iterator is constructed, the iteration itself will never
/// report an error.
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the given `Input` configuration or does not support overlapping
/// searches.
///
/// One example is that only Aho-Corasicker searchers built with
/// [`MatchKind::Standard`] semantics support overlapping searches. Using
/// any other match semantics will result in this returning an error.
///
/// # Example: basic usage
///
/// ```
/// use aho_corasick::{AhoCorasick, Input, PatternID};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let matches: Vec<PatternID> = ac
/// .try_find_overlapping_iter(Input::new(haystack))?
/// .map(|mat| mat.pattern())
/// .collect();
/// assert_eq!(vec![
/// PatternID::must(2),
/// PatternID::must(0),
/// PatternID::must(2),
/// PatternID::must(2),
/// PatternID::must(0),
/// PatternID::must(1),
/// ], matches);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Example: anchored overlapping search returns an error
///
/// It isn't clear what the match semantics for anchored overlapping
/// iterators *ought* to be, so currently an error is returned. Callers
/// may use [`AhoCorasick::try_find_overlapping`] to implement their own
/// semantics if desired.
///
/// ```
/// use aho_corasick::{AhoCorasick, Anchored, Input, StartKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "appendappendage app";
///
/// let ac = AhoCorasick::builder()
/// .start_kind(StartKind::Anchored)
/// .build(patterns)
/// .unwrap();
/// let input = Input::new(haystack).anchored(Anchored::Yes);
/// assert!(ac.try_find_overlapping_iter(input).is_err());
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn try_find_overlapping_iter<'a, 'h, I: Into<Input<'h>>>(
&'a self,
input: I,
) -> Result<FindOverlappingIter<'a, 'h>, MatchError> {
let input = input.into();
enforce_anchored_consistency(self.start_kind, input.get_anchored())?;
Ok(FindOverlappingIter(self.aut.try_find_overlapping_iter(input)?))
}
/// Replace all matches with a corresponding value in the `replace_with`
/// slice given. Matches correspond to the same matches as reported by
/// [`AhoCorasick::try_find_iter`].
///
/// Replacements are determined by the index of the matching pattern.
/// For example, if the pattern with index `2` is found, then it is
/// replaced by `replace_with[2]`.
///
/// # Panics
///
/// This panics when `replace_with.len()` does not equal
/// [`AhoCorasick::patterns_len`].
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the default `Input` configuration. More specifically, this occurs only
/// when the Aho-Corasick searcher does not support unanchored searches
/// since this replacement routine always does an unanchored search.
///
/// # Example: basic usage
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let result = ac.try_replace_all(haystack, &["x", "y", "z"])?;
/// assert_eq!("x the z to the xage", result);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn try_replace_all<B>(
&self,
haystack: &str,
replace_with: &[B],
) -> Result<String, MatchError>
where
B: AsRef<str>,
{
enforce_anchored_consistency(self.start_kind, Anchored::No)?;
self.aut.try_replace_all(haystack, replace_with)
}
/// Replace all matches using raw bytes with a corresponding value in the
/// `replace_with` slice given. Matches correspond to the same matches as
/// reported by [`AhoCorasick::try_find_iter`].
///
/// Replacements are determined by the index of the matching pattern.
/// For example, if the pattern with index `2` is found, then it is
/// replaced by `replace_with[2]`.
///
/// This is the fallible version of [`AhoCorasick::replace_all_bytes`].
///
/// # Panics
///
/// This panics when `replace_with.len()` does not equal
/// [`AhoCorasick::patterns_len`].
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the default `Input` configuration. More specifically, this occurs only
/// when the Aho-Corasick searcher does not support unanchored searches
/// since this replacement routine always does an unanchored search.
///
/// # Example: basic usage
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = b"append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let result = ac.try_replace_all_bytes(haystack, &["x", "y", "z"])?;
/// assert_eq!(b"x the z to the xage".to_vec(), result);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn try_replace_all_bytes<B>(
&self,
haystack: &[u8],
replace_with: &[B],
) -> Result<Vec<u8>, MatchError>
where
B: AsRef<[u8]>,
{
enforce_anchored_consistency(self.start_kind, Anchored::No)?;
self.aut.try_replace_all_bytes(haystack, replace_with)
}
/// Replace all matches using a closure called on each match.
/// Matches correspond to the same matches as reported by
/// [`AhoCorasick::try_find_iter`].
///
/// The closure accepts three parameters: the match found, the text of
/// the match and a string buffer with which to write the replaced text
/// (if any). If the closure returns `true`, then it continues to the next
/// match. If the closure returns `false`, then searching is stopped.
///
/// Note that any matches with boundaries that don't fall on a valid UTF-8
/// boundary are silently skipped.
///
/// This is the fallible version of [`AhoCorasick::replace_all_with`].
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the default `Input` configuration. More specifically, this occurs only
/// when the Aho-Corasick searcher does not support unanchored searches
/// since this replacement routine always does an unanchored search.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let mut result = String::new();
/// ac.try_replace_all_with(haystack, &mut result, |mat, _, dst| {
/// dst.push_str(&mat.pattern().as_usize().to_string());
/// true
/// })?;
/// assert_eq!("0 the 2 to the 0age", result);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// Stopping the replacement by returning `false` (continued from the
/// example above):
///
/// ```
/// # use aho_corasick::{AhoCorasick, MatchKind, PatternID};
/// # let patterns = &["append", "appendage", "app"];
/// # let haystack = "append the app to the appendage";
/// # let ac = AhoCorasick::builder()
/// # .match_kind(MatchKind::LeftmostFirst)
/// # .build(patterns)
/// # .unwrap();
/// let mut result = String::new();
/// ac.try_replace_all_with(haystack, &mut result, |mat, _, dst| {
/// dst.push_str(&mat.pattern().as_usize().to_string());
/// mat.pattern() != PatternID::must(2)
/// })?;
/// assert_eq!("0 the 2 to the appendage", result);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn try_replace_all_with<F>(
&self,
haystack: &str,
dst: &mut String,
replace_with: F,
) -> Result<(), MatchError>
where
F: FnMut(&Match, &str, &mut String) -> bool,
{
enforce_anchored_consistency(self.start_kind, Anchored::No)?;
self.aut.try_replace_all_with(haystack, dst, replace_with)
}
/// Replace all matches using raw bytes with a closure called on each
/// match. Matches correspond to the same matches as reported by
/// [`AhoCorasick::try_find_iter`].
///
/// The closure accepts three parameters: the match found, the text of
/// the match and a byte buffer with which to write the replaced text
/// (if any). If the closure returns `true`, then it continues to the next
/// match. If the closure returns `false`, then searching is stopped.
///
/// This is the fallible version of
/// [`AhoCorasick::replace_all_with_bytes`].
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the default `Input` configuration. More specifically, this occurs only
/// when the Aho-Corasick searcher does not support unanchored searches
/// since this replacement routine always does an unanchored search.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use aho_corasick::{AhoCorasick, MatchKind};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = b"append the app to the appendage";
///
/// let ac = AhoCorasick::builder()
/// .match_kind(MatchKind::LeftmostFirst)
/// .build(patterns)
/// .unwrap();
/// let mut result = vec![];
/// ac.try_replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
/// dst.extend(mat.pattern().as_usize().to_string().bytes());
/// true
/// })?;
/// assert_eq!(b"0 the 2 to the 0age".to_vec(), result);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// Stopping the replacement by returning `false` (continued from the
/// example above):
///
/// ```
/// # use aho_corasick::{AhoCorasick, MatchKind, PatternID};
/// # let patterns = &["append", "appendage", "app"];
/// # let haystack = b"append the app to the appendage";
/// # let ac = AhoCorasick::builder()
/// # .match_kind(MatchKind::LeftmostFirst)
/// # .build(patterns)
/// # .unwrap();
/// let mut result = vec![];
/// ac.try_replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
/// dst.extend(mat.pattern().as_usize().to_string().bytes());
/// mat.pattern() != PatternID::must(2)
/// })?;
/// assert_eq!(b"0 the 2 to the appendage".to_vec(), result);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn try_replace_all_with_bytes<F>(
&self,
haystack: &[u8],
dst: &mut Vec<u8>,
replace_with: F,
) -> Result<(), MatchError>
where
F: FnMut(&Match, &[u8], &mut Vec<u8>) -> bool,
{
enforce_anchored_consistency(self.start_kind, Anchored::No)?;
self.aut.try_replace_all_with_bytes(haystack, dst, replace_with)
}
/// Returns an iterator of non-overlapping matches in the given
/// stream. Matches correspond to the same matches as reported by
/// [`AhoCorasick::try_find_iter`].
///
/// The matches yielded by this iterator use absolute position offsets in
/// the stream given, where the first byte has index `0`. Matches are
/// yieled until the stream is exhausted.
///
/// Each item yielded by the iterator is an `Result<Match,
/// std::io::Error>`, where an error is yielded if there was a problem
/// reading from the reader given.
///
/// When searching a stream, an internal buffer is used. Therefore, callers
/// should avoiding providing a buffered reader, if possible.
///
/// This is the fallible version of [`AhoCorasick::stream_find_iter`].
/// Note that both methods return iterators that produce `Result` values.
/// The difference is that this routine returns an error if _construction_
/// of the iterator failed. The `Result` values yield by the iterator
/// come from whether the given reader returns an error or not during the
/// search.
///
/// # Memory usage
///
/// In general, searching streams will use a constant amount of memory for
/// its internal buffer. The one requirement is that the internal buffer
/// must be at least the size of the longest possible match. In most use
/// cases, the default buffer size will be much larger than any individual
/// match.
///
/// # Errors
///
/// This returns an error when this Aho-Corasick searcher does not support
/// the default `Input` configuration. More specifically, this occurs only
/// when the Aho-Corasick searcher does not support unanchored searches
/// since this stream searching routine always does an unanchored search.
///
/// This also returns an error if the searcher does not support stream
/// searches. Only searchers built with [`MatchKind::Standard`] semantics
/// support stream searches.
///
/// # Example: basic usage
///
/// ```
/// use aho_corasick::{AhoCorasick, PatternID};
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let mut matches = vec![];
/// for result in ac.try_stream_find_iter(haystack.as_bytes())? {
/// let mat = result?;
/// matches.push(mat.pattern());
/// }
/// assert_eq!(vec![
/// PatternID::must(2),
/// PatternID::must(2),
/// PatternID::must(2),
/// ], matches);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
#[cfg(feature = "std")]
pub fn try_stream_find_iter<'a, R: std::io::Read>(
&'a self,
rdr: R,
) -> Result<StreamFindIter<'a, R>, MatchError> {
enforce_anchored_consistency(self.start_kind, Anchored::No)?;
self.aut.try_stream_find_iter(rdr).map(StreamFindIter)
}
/// Search for and replace all matches of this automaton in
/// the given reader, and write the replacements to the given
/// writer. Matches correspond to the same matches as reported by
/// [`AhoCorasick::try_find_iter`].
///
/// Replacements are determined by the index of the matching pattern. For
/// example, if the pattern with index `2` is found, then it is replaced by
/// `replace_with[2]`.
///
/// After all matches are replaced, the writer is _not_ flushed.
///
/// If there was a problem reading from the given reader or writing to the
/// given writer, then the corresponding `io::Error` is returned and all
/// replacement is stopped.
///
/// When searching a stream, an internal buffer is used. Therefore, callers
/// should avoiding providing a buffered reader, if possible. However,
/// callers may want to provide a buffered writer.
///
/// Note that there is currently no infallible version of this routine.
///
/// # Memory usage
///
--> --------------------
--> maximum size reached
--> --------------------
[ Dauer der Verarbeitung: 0.43 Sekunden
(vorverarbeitet)
]
|
2026-04-04
|
|
|
|
|