Name Description Size
arithmetic.rs # Arithmetic This parses arithmetic expressions and directly evaluates them. ```rust 171
error.rs
# Custom Errors

A lot can be accomplished with the built-in error tools, like:
- [`ContextError`]
- [`Parser::context`]
- [`cut_err`]

*(see [tutorial][chapter_7])*

Most other needs can likely be met by using a custom context type with [`ContextError`] instead of [`StrContext`]. This will require implementing a custom renderer.

## `ParserError` Trait

When needed, you can also create your own type that implements [`ParserError`].

Optional traits include:
- [`AddContext`]
- [`FromExternalError`]
- [`ErrorConvert`]

There are multiple strategies for implementing support for [`AddContext`] and [`FromExternalError`]:
- Make your error type generic over the context or external error
- Require a trait for the context or external error and `Box` it
- Make the context an enum like [`StrContext`]
- Implement the trait multiple times, one for each concrete context or external error type, allowing custom behavior per type

Example: ```rust
1431
fromstr.rs # Implementing `FromStr` The [`FromStr` trait][std::str::FromStr] provides a common interface to parse from a string. ```rust 213
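As a rough illustration of the `FromStr` interface described for `fromstr.rs`, the sketch below implements the trait for a hypothetical `Color` type. The type, its `#rrggbb` syntax, and the `String` error are assumptions made for this example. The parsing is hand-written with the standard library so the block stays self-contained; in a `winnow`-based crate, the body of `from_str` would presumably delegate to a parser instead.

```rust
use std::str::FromStr;

/// Hypothetical type for illustration: an RGB color written as `#rrggbb`.
#[derive(Debug, PartialEq, Eq)]
pub struct Color {
    pub red: u8,
    pub green: u8,
    pub blue: u8,
}

impl FromStr for Color {
    // A plain `String` keeps the sketch short (an assumption, not a
    // recommendation); real code would likely use a richer error type.
    type Err = String;

    fn from_str(input: &str) -> Result<Self, Self::Err> {
        let hex = input
            .strip_prefix('#')
            .ok_or_else(|| format!("expected leading `#` in {input:?}"))?;
        // Requiring exactly six ASCII hex digits also guarantees that the
        // byte-index slicing below never splits a multi-byte character.
        if hex.len() != 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("expected 6 hex digits in {input:?}"));
        }
        // Converts one two-digit channel from hexadecimal.
        let channel = |range: std::ops::Range<usize>| {
            u8::from_str_radix(&hex[range], 16).map_err(|e| e.to_string())
        };
        Ok(Color {
            red: channel(0..2)?,
            green: channel(2..4)?,
            blue: channel(4..6)?,
        })
    }
}
```

With this in place, `"#ff8000".parse::<Color>()` yields `Ok(Color { red: 255, green: 128, blue: 0 })`, and callers do not need to know how the parsing is done internally.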
http.rs # HTTP ```rust 91
ini.rs # INI ```rust 89
json.rs # json ```rust,ignore 107
language.rs
# Elements of Programming Languages

These are short recipes for accomplishing common tasks.

* [Whitespace](#whitespace)
    + [Wrapper combinators that eat whitespace before and after a parser](#wrapper-combinators-that-eat-whitespace-before-and-after-a-parser)
* [Comments](#comments)
    + [`// C++/EOL-style comments`](#-ceol-style-comments)
    + [`/* C-style comments */`](#-c-style-comments-)
* [Identifiers](#identifiers)
    + [`Rust-Style Identifiers`](#rust-style-identifiers)
* [Literal Values](#literal-values)
    + [Escaped Strings](#escaped-strings)
    + [Integers](#integers)
        - [Hexadecimal](#hexadecimal)
        - [Octal](#octal)
        - [Binary](#binary)
        - [Decimal](#decimal)
    + [Floating Point Numbers](#floating-point-numbers)

## Whitespace

### Wrapper combinators that eat whitespace before and after a parser

```rust
use winnow::prelude::*;
use winnow::{
    error::ParserError,
    combinator::delimited,
    ascii::multispace0,
};

/// A combinator that takes a parser `inner` and produces a parser that also consumes both leading and
/// trailing whitespace, returning the output of `inner`.
fn ws<'a, F, O, E: ParserError<&'a str>>(inner: F) -> impl Parser<&'a str, O, E>
where
    F: Parser<&'a str, O, E>,
{
    delimited(
        multispace0,
        inner,
        multispace0
    )
}
```

To eat only trailing whitespace, replace `delimited(...)` with `terminated(&inner, multispace0)`. Likewise, to eat only leading whitespace, replace `delimited(...)` with `preceded(multispace0, &inner)`. You can use your own parser instead of `multispace0` if you want to skip a different set of lexemes.

## Comments

### `// C++/EOL-style comments`

This version uses `%` to start a comment, does not consume the newline character, and returns an output of `()`.

```rust
use winnow::prelude::*;
use winnow::{
    error::ParserError,
    token::take_till,
};

pub fn peol_comment<'a, E: ParserError<&'a str>>(i: &mut &'a str) -> ModalResult<(), E> {
    ('%', take_till(1.., ['\n', '\r']))
        .void() // Output is thrown away.
        .parse_next(i)
}
```

### `/* C-style comments */`

Inline comments surrounded with sentinel literals `(*` and `*)`. This version returns an output of `()` and does not handle nested comments.

```rust
use winnow::prelude::*;
use winnow::{
    error::ParserError,
    token::take_until,
};

pub fn pinline_comment<'a, E: ParserError<&'a str>>(i: &mut &'a str) -> ModalResult<(), E> {
    (
        "(*",
        take_until(0.., "*)"),
        "*)"
    )
        .void() // Output is thrown away.
        .parse_next(i)
}
```

## Identifiers

### `Rust-Style Identifiers`

Identifiers that start with a letter or underscore and continue with letters, digits, or underscores can be parsed like this:

```rust
use winnow::prelude::*;
use winnow::{
    stream::AsChar,
    token::take_while,
    token::one_of,
};

pub fn identifier<'s>(input: &mut &'s str) -> ModalResult<&'s str> {
    (
        one_of(|c: char| c.is_alpha() || c == '_'),
        take_while(0.., |c: char| c.is_alphanum() || c == '_')
    )
        .take()
        .parse_next(input)
}
```

Let's say we apply this to the identifier `hello_world123abc`. The first element of the tuple uses [`one_of`][crate::token::one_of], which takes `h`. The tuple ensures that `ello_world123abc` is piped to the next parser, [`take_while`][crate::token::take_while], which takes every remaining character. On its own, the tuple would return a tuple of the results of its sub-parsers. The [`take`][crate::Parser::take] parser instead produces a `&str` of the input text that was parsed, which in this case is the entire `&str` `hello_world123abc`.

## Literal Values

### Escaped Strings ```rust
9109
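The `hello_world123abc` walk-through above can be mirrored without any parser library. The following sketch is a standard-library stand-in, not the `winnow` API: `split_identifier` is a hypothetical helper, and it assumes ASCII letters and digits. It returns the matched text as a single slice, much as `.take()` does for the tuple, alongside the unconsumed remainder.

```rust
/// Splits a leading Rust-style identifier off `input`, returning
/// `(identifier, rest)`, or `None` when `input` does not start with one.
/// Hand-written stand-in for illustration; not part of any library.
pub fn split_identifier(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    // First character: an ASCII letter or an underscore.
    let (_, first) = chars.next()?;
    if !(first.is_ascii_alphabetic() || first == '_') {
        return None;
    }
    // Remaining characters: ASCII letters, digits, or underscores. The
    // identifier ends at the first character that does not qualify, or at
    // the end of the input if every character qualifies.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
        .map_or(input.len(), |(index, _)| index);
    Some(input.split_at(end))
}
```

For `hello_world123abc` the whole input is the identifier and the remainder is empty; for `_tmp = 1` the identifier is `_tmp` and the remainder is ` = 1`.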
lexing.rs # Lexing and Parsing ## Parse to AST The simplest way to write a parser is to parse directly to the AST. Example: ```rust 546
mod.rs
# Special Topics

These are short recipes for accomplishing common tasks.

- [Why `winnow`?][why]
- [For `nom` users][nom]
- Formats:
  - [Elements of Programming Languages][language]
  - [Arithmetic][arithmetic]
  - [s-expression][s_expression]
  - [json]
  - [INI][ini]
  - [HTTP][http]
- Special Topics:
  - [Implementing `FromStr`][fromstr]
  - [Performance][performance]
  - [Parsing Partial Input][partial]
  - [Lexing and Parsing][lexing]
  - [Custom stream or token][stream]
  - [Custom errors][error]
  - [Debugging][crate::_tutorial::chapter_8]

See also parsers written with `winnow`:
- [`toml_edit`](https://crates.io/crates/toml_edit)
- [`hcl-edit`](https://crates.io/crates/hcl-edit)

1068
nom.rs 9713
partial.rs
# Parsing Partial Input

Typically, the input being parsed is all in-memory, or is complete. Some data sources are too large to fit into memory, so only an incomplete or [`Partial`] subset of the data is available at a time and it must be parsed incrementally.

By wrapping a stream, like `&[u8]`, with [`Partial`], parsers will report when the data is [`Incomplete`] and more input is [`Needed`], allowing the caller to stream in additional data to be parsed. The data is then parsed a chunk at a time.

Chunks are typically defined by either:
- A header reporting the number of bytes, like with [`length_and_then`]
  - [`Partial`] can explicitly be changed to being complete once the specified bytes are acquired via [`StreamIsPartial::complete`].
- A delimiter, like with [ndjson](https://github.com/ndjson/ndjson-spec/)
  - You can parse up to the delimiter or do a `take_until(0.., delim).and_then(parser)`

If the chunks are not homogeneous, a state machine will be needed to track what the expected parser is for the next chunk.

Caveats:
- `winnow` takes the approach of re-parsing from scratch. Chunks should be relatively small to prevent the re-parsing overhead from dominating.
- Parsers like [`repeat`] do not know when an `eof` is from insufficient data or the end of the stream, causing them to always report [`Incomplete`].

# Example

`main.rs`: ```rust,ignore
1909
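To make the delimiter-based chunking above concrete, here is a minimal standard-library sketch of the buffering side only. `LineBuffer`, `feed`, and `finish` are hypothetical names, a newline is assumed as the delimiter (as in ndjson), and no `winnow` parser is involved: each returned record is a complete chunk that could then be handed to a parser for complete input.

```rust
/// Hypothetical helper for illustration: accumulates streamed bytes and
/// yields complete newline-delimited records.
#[derive(Default)]
pub struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Appends a freshly read chunk and returns every record completed by it.
    /// Bytes after the last delimiter stay buffered until more data arrives.
    pub fn feed(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(chunk);
        let mut records = Vec::new();
        while let Some(end) = self.pending.iter().position(|&b| b == b'\n') {
            // Take the record together with its delimiter out of the
            // buffer, then drop the trailing delimiter byte.
            let mut record: Vec<u8> = self.pending.drain(..=end).collect();
            record.pop();
            records.push(record);
        }
        records
    }

    /// Signals the end of the stream; whatever is left is the final record.
    pub fn finish(self) -> Option<Vec<u8>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(self.pending)
        }
    }
}
```

The separate `finish` step reflects the caveat above: the buffer cannot tell a missing delimiter from the end of the stream, so the caller has to say when no more data is coming.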
performance.rs
# Performance

## Runtime Performance

See also the general Rust [Performance Book](https://nnethercote.github.io/perf-book/)

Tips:
- Try `cargo add winnow -F simd`. For some it offers significant performance improvements
- When enough cases of an [`alt`] have unique prefixes, prefer [`dispatch`]
- When parsing text, try to parse as bytes (`u8`) rather than `char`s ([`BStr`] can make debugging easier)
- Find simplified subsets of the grammar to parse, falling back to the full grammar when it doesn't work. For example, when parsing json strings, parse them without support for escapes, falling back to escape support if it fails.
- Watch for large return types. A surprising place these can show up is when chaining parsers with a tuple.

## Build-time Performance

Returning complex types as `impl Trait` can negatively impact build times. This can hit in surprising cases like:

```rust
# use winnow::prelude::*;
fn foo<I, O, E>() -> impl Parser<I, O, E>
# where
#    I: winnow::stream::Stream<Token=O>,
#    I: winnow::stream::StreamIsPartial,
#    E: winnow::error::ParserError<I>,
{
    // ...some chained combinators...
# winnow::token::any
}
```

Instead, wrap the combinators in a closure to simplify the type:

```rust
# use winnow::prelude::*;
fn foo<I, O, E>() -> impl Parser<I, O, E>
# where
#    I: winnow::stream::Stream<Token=O>,
#    I: winnow::stream::StreamIsPartial,
#    E: winnow::error::ParserError<I>,
{
    move |input: &mut I| {
        // ...some chained combinators...
# winnow::token::any
            .parse_next(input)
    }
}
```

1878
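The tip about parsing a simplified subset first can be sketched with the standard library alone. `decode_string_body` is a hypothetical helper, and the three escapes it accepts are an assumption chosen for brevity rather than the full JSON set. The fast path borrows the input when no backslash is present; only strings that contain escapes pay for the allocating fallback. Unlike the tip, which retries after a failed parse, this sketch picks the path with an up-front check.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decodes the body of a string literal (the text
/// between the quotes), supporting only the `\"`, `\\`, and `\n` escapes.
pub fn decode_string_body(body: &str) -> Result<Cow<'_, str>, String> {
    // Fast path: the simplified grammar without escapes needs no allocation.
    if !body.contains('\\') {
        return Ok(Cow::Borrowed(body));
    }
    // Slow path: fall back to the fuller grammar and build an owned string.
    let mut decoded = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            decoded.push(c);
            continue;
        }
        // A backslash must be followed by one of the supported escapes.
        match chars.next() {
            Some('"') => decoded.push('"'),
            Some('\\') => decoded.push('\\'),
            Some('n') => decoded.push('\n'),
            Some(other) => return Err(format!("unsupported escape `\\{other}`")),
            None => return Err("dangling backslash".to_string()),
        }
    }
    Ok(Cow::Owned(decoded))
}
```

Returning `Cow<str>` lets callers treat both paths uniformly while keeping the common escape-free case free of copies.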
s_expression.rs # s-expression ```rust 107
stream.rs
# Custom [`Stream`]

`winnow` is batteries included with support for
- Basic inputs like `&str`
- Newtypes with improved debug output like [`Bytes`]
- [`Stateful`] for passing state through your parser, like tracking recursion depth
- [`LocatingSlice`] for looking up the absolute position of a token

## Implementing a custom token

The first level of customization is parsing [`&[MyItem]`][Stream#impl-Stream-for-%26%5BT%5D] or [`TokenSlice<MyItem>`].

The basic traits you may want for a custom token type are:

| trait | usage |
|---|---|
| [`AsChar`] | Transforms common types to a char for basic token parsing |
| [`ContainsToken`] | Look for the token in the given set |

See also [`TokenSlice<MyItem>`], [lexing].

## Implementing a custom stream

Let's assume we have an input type we'll call `MyStream`. `MyStream` is a sequence of `MyItem` tokens.

The goal is to define parsers with this signature: `&mut MyStream -> ModalResult<Output>`.

```rust
# use winnow::prelude::*;
# type MyStream<'i> = &'i str;
# type Output<'i> = &'i str;
fn parser<'s>(i: &mut MyStream<'s>) -> ModalResult<Output<'s>> {
    "test".parse_next(i)
}
```

Like above, you'll need to implement the related token traits for `MyItem`.

The traits you may want to implement for `MyStream` include:

| trait | usage |
|---|---|
| [`Stream`] | Core trait for driving parsing |
| [`StreamIsPartial`] | Marks the input as being the complete buffer or a partial buffer for streaming input |
| [`AsBytes`] | Casts the input type to a byte slice |
| [`AsBStr`] | Casts the input type to a slice of ASCII / UTF-8-like bytes |
| [`Compare`] | Character comparison operations |
| [`FindSlice`] | Look for a substring in self |
| [`Location`] | Calculate location within initial input |
| [`Offset`] | Calculate the offset between slices |

And for `&[MyItem]` (slices returned by [`Stream`]):

| trait | usage |
|---|---|
| [`SliceLen`] | Calculate the input length |
| [`ParseSlice`] | Used to integrate `&str`'s `parse()` method |

2305
why.rs 5479