arithmetic.rs |
# Arithmetic
This parses arithmetic expressions and directly evaluates them.
```rust |
171 |
error.rs |
# Custom Errors
A lot can be accomplished with the built-in error tools, like:
- [`ContextError`]
- [`Parser::context`]
- [`cut_err`]
*(see [tutorial][chapter_7])*
Most other needs can likely be met by using a custom context type with [`ContextError`] instead
of [`StrContext`].
This will require implementing a custom renderer.
## `ParserError` Trait
When needed, you can also create your own type that implements [`ParserError`].
Optional traits include:
- [`AddContext`]
- [`FromExternalError`]
- [`ErrorConvert`]
There are multiple strategies for implementing support for [`AddContext`] and [`FromExternalError`]:
- Make your error type generic over the context or external error
- Require a trait for the context or external error and `Box` it
- Make the context an enum like [`StrContext`]
- Implement the trait multiple times, one for each concrete context or external error type,
allowing custom behavior per type
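
As a rough illustration of the "context as an enum" strategy, here is a minimal `std`-only sketch. `MyContext`, `MyError`, `add_context`, and `render` are hypothetical names for this example; they do not implement the [`ParserError`] or [`AddContext`] traits, they only show the shape such an error type might take.

```rust
/// Hypothetical context enum, loosely modeled on the "enum like `StrContext`" strategy.
#[derive(Debug, Clone, PartialEq)]
enum MyContext {
    /// What was being parsed when the error occurred.
    Label(&'static str),
    /// What the parser expected to find.
    Expected(&'static str),
}

/// Hypothetical error type that accumulates context as the error bubbles up.
#[derive(Debug, Default, PartialEq)]
struct MyError {
    context: Vec<MyContext>,
}

impl MyError {
    /// Stand-in for what an `AddContext` implementation would do (assumption).
    fn add_context(mut self, ctx: MyContext) -> Self {
        self.context.push(ctx);
        self
    }

    /// A custom renderer: one place decides how each variant is displayed.
    fn render(&self) -> String {
        self.context
            .iter()
            .map(|c| match c {
                MyContext::Label(l) => format!("while parsing {l}"),
                MyContext::Expected(e) => format!("expected {e}"),
            })
            .collect::<Vec<_>>()
            .join(", ")
    }
}
```

Because the enum is closed, the renderer can match exhaustively on every kind of context, which is the main trade-off against the generic or boxed strategies.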
Example:
```rust |
1431 |
fromstr.rs |
# Implementing `FromStr`
The [`FromStr` trait][std::str::FromStr] provides
a common interface to parse from a string.
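
For orientation, here is a minimal `std`-only sketch of the trait's shape. The `Color` type and its `#rrggbb` format are assumptions made for this example; a parser-based implementation would presumably delegate the body of `from_str` to a parser instead of slicing by hand.

```rust
use std::str::FromStr;

/// Hypothetical type: an RGB color written as `#rrggbb`.
#[derive(Debug, PartialEq)]
struct Color {
    red: u8,
    green: u8,
    blue: u8,
}

impl FromStr for Color {
    // A plain `String` keeps the sketch short; a real error type is usually better.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let hex = s
            .strip_prefix('#')
            .ok_or_else(|| format!("missing `#` in {s:?}"))?;
        // Exactly six ASCII hex digits, so the byte-range slicing below is safe.
        if hex.len() != 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("expected 6 hex digits in {s:?}"));
        }
        let channel = |range: std::ops::Range<usize>| {
            u8::from_str_radix(&hex[range], 16).map_err(|e| e.to_string())
        };
        Ok(Color {
            red: channel(0..2)?,
            green: channel(2..4)?,
            blue: channel(4..6)?,
        })
    }
}
```

Callers can then write `"#ff8000".parse::<Color>()` without knowing how the parsing is implemented.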
```rust |
213 |
http.rs |
# HTTP
```rust |
91 |
ini.rs |
# INI
```rust |
89 |
json.rs |
# json
```rust,ignore |
107 |
language.rs |
# Elements of Programming Languages
These are short recipes for accomplishing common tasks.
* [Whitespace](#whitespace)
  + [Wrapper combinators that eat whitespace before and after a parser](#wrapper-combinators-that-eat-whitespace-before-and-after-a-parser)
* [Comments](#comments)
  + [`// C++/EOL-style comments`](#-ceol-style-comments)
  + [`/* C-style comments */`](#-c-style-comments-)
* [Identifiers](#identifiers)
  + [`Rust-Style Identifiers`](#rust-style-identifiers)
* [Literal Values](#literal-values)
  + [Escaped Strings](#escaped-strings)
  + [Integers](#integers)
    - [Hexadecimal](#hexadecimal)
    - [Octal](#octal)
    - [Binary](#binary)
    - [Decimal](#decimal)
  + [Floating Point Numbers](#floating-point-numbers)
## Whitespace
### Wrapper combinators that eat whitespace before and after a parser
```rust
use winnow::prelude::*;
use winnow::{
    error::ParserError,
    combinator::delimited,
    ascii::multispace0,
};

/// A combinator that takes a parser `inner` and produces a parser that also consumes both leading and
/// trailing whitespace, returning the output of `inner`.
fn ws<'a, F, O, E: ParserError<&'a str>>(inner: F) -> impl Parser<&'a str, O, E>
where
    F: Parser<&'a str, O, E>,
{
    delimited(
        multispace0,
        inner,
        multispace0
    )
}
```
To eat only trailing whitespace, replace `delimited(...)` with `terminated(&inner, multispace0)`.
Likewise, to eat only leading whitespace, replace `delimited(...)` with `preceded(multispace0,
&inner)`. You can use your own parser instead of `multispace0` if you want to skip a different set
of lexemes.
## Comments
### `// C++/EOL-style comments`
This version uses `%` to start a comment, does not consume the newline character, and returns an
output of `()`.
```rust
use winnow::prelude::*;
use winnow::{
    error::ParserError,
    token::take_till,
};

pub fn peol_comment<'a, E: ParserError<&'a str>>(i: &mut &'a str) -> ModalResult<(), E>
{
    ('%', take_till(1.., ['\n', '\r']))
        .void() // Output is thrown away.
        .parse_next(i)
}
```
### `/* C-style comments */`
Inline comments surrounded with sentinel literals `(*` and `*)`. This version returns an output of `()`
and does not handle nested comments.
```rust
use winnow::prelude::*;
use winnow::{
    error::ParserError,
    token::take_until,
};

pub fn pinline_comment<'a, E: ParserError<&'a str>>(i: &mut &'a str) -> ModalResult<(), E> {
    (
        "(*",
        take_until(0.., "*)"),
        "*)"
    )
        .void() // Output is thrown away.
        .parse_next(i)
}
```
## Identifiers
### `Rust-Style Identifiers`
Identifiers that start with a letter or underscore and may contain underscores,
letters, and numbers can be parsed like this:
```rust
use winnow::prelude::*;
use winnow::{
    stream::AsChar,
    token::take_while,
    token::one_of,
};

pub fn identifier<'s>(input: &mut &'s str) -> ModalResult<&'s str> {
    (
        one_of(|c: char| c.is_alpha() || c == '_'),
        take_while(0.., |c: char| c.is_alphanum() || c == '_')
    )
        .take()
        .parse_next(input)
}
```
Let's say we apply this to the identifier `hello_world123abc`. The first element of the tuple
uses [`one_of`][crate::token::one_of], which takes `h`. The tuple ensures that
`ello_world123abc` is passed to the next parser, [`take_while`][crate::token::take_while],
which takes every remaining character. However, the tuple returns a tuple of the results
of its sub-parsers. The [`take`][crate::Parser::take] parser instead produces a `&str` of the
input text that was parsed, which in this case is the entire `&str` `hello_world123abc`.
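
To make those three steps concrete, here is a hand-rolled, `std`-only sketch of roughly the same logic. `identifier_sketch` is a hypothetical helper for illustration only; it returns `Option` rather than `ModalResult` and is not how the combinators are implemented.

```rust
/// Hand-rolled approximation of the `identifier` parser above, using only `std`.
/// On success, returns the matched identifier and advances `input` past it.
/// On failure, returns `None` and leaves `input` untouched.
fn identifier_sketch<'s>(input: &mut &'s str) -> Option<&'s str> {
    // Copy the inner `&'s str` out so the returned slice keeps the full lifetime.
    let s: &'s str = *input;
    let mut chars = s.char_indices();

    // Step 1 (`one_of`): the first character must be a letter or underscore.
    let (_, first) = chars.next()?;
    if !(first.is_ascii_alphabetic() || first == '_') {
        return None;
    }

    // Step 2 (`take_while(0.., ...)`): find where the run of identifier characters ends.
    let end = chars
        .find(|(_, c)| !(c.is_ascii_alphanumeric() || *c == '_'))
        .map(|(idx, _)| idx)
        .unwrap_or(s.len());

    // Step 3 (`take`): return the whole consumed slice rather than the separate pieces.
    let (matched, rest) = s.split_at(end);
    *input = rest;
    Some(matched)
}
```

The last step is the interesting one: instead of returning a `(char, &str)` pair, the function returns the single slice covering everything both steps consumed.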
## Literal Values
### Escaped Strings
```rust |
9109 |
lexing.rs |
# Lexing and Parsing
## Parse to AST
The simplest way to write a parser is to parse directly to the AST.
Example:
```rust |
546 |
mod.rs |
# Special Topics
These are short recipes for accomplishing common tasks.
- [Why `winnow`?][why]
- [For `nom` users][nom]
- Formats:
  - [Elements of Programming Languages][language]
  - [Arithmetic][arithmetic]
  - [s-expression][s_expression]
  - [json]
  - [INI][ini]
  - [HTTP][http]
- Special Topics:
  - [Implementing `FromStr`][fromstr]
  - [Performance][performance]
  - [Parsing Partial Input][partial]
  - [Lexing and Parsing][lexing]
  - [Custom stream or token][stream]
  - [Custom errors][error]
  - [Debugging][crate::_tutorial::chapter_8]
See also parsers written with `winnow`:
- [`toml_edit`](https://crates.io/crates/toml_edit)
- [`hcl-edit`](https://crates.io/crates/hcl-edit) |
1068 |
nom.rs |
|
9713 |
partial.rs |
# Parsing Partial Input
Typically, the input being parsed is all in-memory, or is complete. Some data sources are too
large to fit into memory, only allowing parsing an incomplete or [`Partial`] subset of the
data, which requires incremental parsing.
By wrapping a stream, like `&[u8]`, with [`Partial`], parsers will report when the data is
[`Incomplete`] and more input is [`Needed`], allowing the caller to stream-in additional data
to be parsed. The data is then parsed a chunk at a time.
Chunks are typically defined by either:
- A header reporting the number of bytes, like with [`length_and_then`]
  - [`Partial`] can explicitly be changed to being complete once the specified bytes are
    acquired via [`StreamIsPartial::complete`].
- A delimiter, like with [ndjson](https://github.com/ndjson/ndjson-spec/)
  - You can parse up to the delimiter or do a `take_until(0.., delim).and_then(parser)`
If the chunks are not homogeneous, a state machine will be needed to track what the expected
parser is for the next chunk.
Caveats:
- `winnow` takes the approach of re-parsing from scratch. Chunks should be relatively small to
prevent the re-parsing overhead from dominating.
- Parsers like [`repeat`] do not know when an `eof` is from insufficient data or the end of the
stream, causing them to always report [`Incomplete`].
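
The re-parse-from-scratch approach can be sketched without any library at all. In this `std`-only illustration, `Step`, `parse_chunk`, and `drive` are hypothetical names, and the 1-byte length header is an assumed framing chosen to keep the example short; it mirrors the idea of reporting "incomplete, need N more bytes" rather than any real API.

```rust
/// Hypothetical outcome of one parse attempt over the buffered bytes.
#[derive(Debug, PartialEq)]
enum Step {
    /// A full chunk was parsed; `consumed` bytes can be dropped from the buffer.
    Done { payload: Vec<u8>, consumed: usize },
    /// Not enough data yet; at least this many more bytes are needed.
    Incomplete(usize),
}

/// Parse one chunk framed as a 1-byte length header followed by that many payload bytes.
fn parse_chunk(buf: &[u8]) -> Step {
    let Some((&len, rest)) = buf.split_first() else {
        return Step::Incomplete(1);
    };
    let len = len as usize;
    if rest.len() < len {
        return Step::Incomplete(len - rest.len());
    }
    Step::Done { payload: rest[..len].to_vec(), consumed: 1 + len }
}

/// Feed data in arbitrary pieces, re-parsing the buffer from its start after each piece.
fn drive(pieces: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::new();
    let mut chunks = Vec::new();
    for piece in pieces {
        buf.extend_from_slice(piece);
        // Keep parsing until the buffer no longer holds a full chunk.
        while let Step::Done { payload, consumed } = parse_chunk(&buf) {
            chunks.push(payload);
            buf.drain(..consumed);
        }
    }
    chunks
}
```

Each call to `parse_chunk` starts over at the front of the buffer, so the cost of a failed attempt grows with the chunk size, which is why small chunks matter.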
# Example
`main.rs`:
```rust,ignore |
1909 |
performance.rs |
# Performance
## Runtime Performance
See also the general Rust [Performance Book](https://nnethercote.github.io/perf-book/).
Tips:
- Try `cargo add winnow -F simd`. For some it offers significant performance improvements.
- When enough cases of an [`alt`] have unique prefixes, prefer [`dispatch`]
- When parsing text, try to parse as bytes (`u8`) rather than `char`s ([`BStr`] can make
debugging easier)
- Find simplified subsets of the grammar to parse, falling back to the full grammar when it
doesn't work. For example, when parsing json strings, parse them without support for escapes,
falling back to escape support if it fails.
- Watch for large return types. A surprising place these can show up is when chaining parsers
with a tuple.
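
The "simplified subset with fallback" tip can be illustrated with a `std`-only sketch of string parsing. The function names are hypothetical, the input is assumed to start just after the opening quote, and only the `\"`, `\\`, and `\n` escapes are handled, so this is not a complete json string parser.

```rust
use std::borrow::Cow;

/// Fast path: a string body with no escapes can be borrowed directly from the input.
/// Returns `(body, rest_after_closing_quote)`.
fn string_no_escapes(input: &str) -> Option<(&str, &str)> {
    let end = input.find(|c| c == '"' || c == '\\')?;
    // Bail out if an escape shows up before the closing quote.
    if input.as_bytes()[end] == b'\\' {
        return None;
    }
    Some((&input[..end], &input[end + 1..]))
}

/// Slow path: handles `\"`, `\\` and `\n`, allocating the unescaped string.
fn string_with_escapes(input: &str) -> Option<(String, &str)> {
    let mut out = String::new();
    let mut chars = input.char_indices();
    while let Some((idx, c)) = chars.next() {
        match c {
            '"' => return Some((out, &input[idx + 1..])),
            '\\' => match chars.next()?.1 {
                'n' => out.push('\n'),
                other @ ('"' | '\\') => out.push(other),
                _ => return None, // unsupported escape in this sketch
            },
            other => out.push(other),
        }
    }
    None // no closing quote
}

/// Try the cheap subset first and fall back to the full grammar only when it fails.
fn string_body(input: &str) -> Option<(Cow<'_, str>, &str)> {
    if let Some((s, rest)) = string_no_escapes(input) {
        return Some((Cow::Borrowed(s), rest));
    }
    let (s, rest) = string_with_escapes(input)?;
    Some((Cow::Owned(s), rest))
}
```

Strings without escapes never allocate here, and the slower path only runs for inputs that actually need it.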
## Build-time Performance
Returning complex types as `impl Trait` can negatively impact build times. This can hit in
surprising cases like:
```rust
# use winnow::prelude::*;
fn foo<I, O, E>() -> impl Parser<I, O, E>
# where
# I: winnow::stream::Stream<Token=O>,
# I: winnow::stream::StreamIsPartial,
# E: winnow::error::ParserError<I>,
{
    // ...some chained combinators...
# winnow::token::any
}
```
Instead, wrap the combinators in a closure to simplify the type:
```rust
# use winnow::prelude::*;
fn foo<I, O, E>() -> impl Parser<I, O, E>
# where
# I: winnow::stream::Stream<Token=O>,
# I: winnow::stream::StreamIsPartial,
# E: winnow::error::ParserError<I>,
{
    move |input: &mut I| {
        // ...some chained combinators...
# winnow::token::any
            .parse_next(input)
    }
}
``` |
1878 |
s_expression.rs |
# s-expression
```rust |
107 |
stream.rs |
# Custom [`Stream`]
`winnow` is batteries included with support for
- Basic inputs like `&str`, newtypes with
  - Improved debug output like [`Bytes`]
  - [`Stateful`] for passing state through your parser, like tracking recursion
    depth
  - [`LocatingSlice`] for looking up the absolute position of a token
## Implementing a custom token
The first level of customization is parsing [`&[MyItem]`][Stream#impl-Stream-for-%26%5BT%5D]
or [`TokenSlice<MyItem>`].
The basic traits you may want for a custom token type are:
| trait | usage |
|---|---|
| [`AsChar`] |Transforms common types to a char for basic token parsing|
| [`ContainsToken`] |Look for the token in the given set|
See also [`TokenSlice<MyItem>`], [lexing].
## Implementing a custom stream
Let's assume we have an input type we'll call `MyStream`.
`MyStream` is a sequence of `MyItem` tokens.
The goal is to define parsers with this signature: `&mut MyStream -> ModalResult<Output>`.
```rust
# use winnow::prelude::*;
# type MyStream<'i> = &'i str;
# type Output<'i> = &'i str;
fn parser<'s>(i: &mut MyStream<'s>) -> ModalResult<Output<'s>> {
    "test".parse_next(i)
}
```
Like above, you'll need to implement the related token traits for `MyItem`.
The traits you may want to implement for `MyStream` include:
| trait | usage |
|---|---|
| [`Stream`] |Core trait for driving parsing|
| [`StreamIsPartial`] | Marks the input as being the complete buffer or a partial buffer for streaming input |
| [`AsBytes`] |Casts the input type to a byte slice|
| [`AsBStr`] |Casts the input type to a slice of ASCII / UTF-8-like bytes|
| [`Compare`] |Character comparison operations|
| [`FindSlice`] |Look for a substring in self|
| [`Location`] |Calculate location within initial input|
| [`Offset`] |Calculate the offset between slices|
And for `&[MyItem]` (slices returned by [`Stream`]):
| trait | usage |
|---|---|
| [`SliceLen`] |Calculate the input length|
| [`ParseSlice`] |Used to integrate `&str`'s `parse()` method| |
2305 |
why.rs |
|
5479 |