lib.rs |
Normalizing text into Unicode Normalization Forms.
This module is published as its own crate ([`icu_normalizer`](https://docs.rs/icu_normalizer/latest/icu_normalizer/))
and as part of the [`icu`](https://docs.rs/icu/latest/icu/) crate. See the latter for more details on the ICU4X project.
# Functionality
The top level of the crate provides normalization of input into the four normalization forms defined in [UAX #15: Unicode
Normalization Forms](https://www.unicode.org/reports/tr15/): NFC, NFD, NFKC, and NFKD.
Three kinds of contiguous inputs are supported: known-well-formed UTF-8 (`&str`), potentially-not-well-formed UTF-8,
and potentially-not-well-formed UTF-16. Additionally, an iterator over `char` can be wrapped in a normalizing iterator.
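
As a rough illustration of what "potentially-not-well-formed" input means, the following standalone sketch uses only the Rust standard library, not this crate. It assumes that ill-formed sequences are replaced with U+FFFD REPLACEMENT CHARACTER before any further processing. The helper names `decode_utf16_lossy` and `decode_utf8_lossy` are hypothetical and are not part of `icu_normalizer`.

```rust
/// Hypothetical helper: decodes potentially ill-formed UTF-16,
/// replacing each unpaired surrogate with U+FFFD.
fn decode_utf16_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        // Each item is Ok(char) or Err(unpaired surrogate).
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

/// Hypothetical helper: decodes potentially ill-formed UTF-8,
/// replacing ill-formed byte sequences with U+FFFD.
fn decode_utf8_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

A `&str` never needs this step, because Rust guarantees that its contents are well-formed UTF-8.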
The `uts46` module provides the combination of mapping and normalization operations for [UTS #46: Unicode IDNA
Compatibility Processing](https://www.unicode.org/reports/tr46/). This functionality is not meant to be used by
applications directly. Instead, it is meant as a building block for a full implementation of UTS #46, such as the
[`idna`](https://docs.rs/idna/latest/idna/) crate.
The `properties` module provides the non-recursive canonical decomposition operation on a per-`char` basis and
the canonical composition operation given two `char`s. It also provides access to the Canonical Combining Class
property. These operations are primarily meant for [HarfBuzz](https://harfbuzz.github.io/) via the
[`icu_harfbuzz`](https://docs.rs/icu_harfbuzz/latest/icu_harfbuzz/) crate.
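
To make "non-recursive decomposition" and "composition given two `char`s" concrete, here is a standalone sketch of the one part of those operations that needs no data tables: the arithmetic for Hangul syllables defined by the Unicode Standard. It is not this crate's API or implementation. The function names `hangul_decompose_pair` and `hangul_compose_pair` are hypothetical, and all other characters (which require Unicode data) are simply reported as `None`.

```rust
// Constants from the Unicode Standard's Hangul syllable arithmetic.
const S_BASE: u32 = 0xAC00; // first precomposed syllable
const L_BASE: u32 = 0x1100; // first leading consonant
const V_BASE: u32 = 0x1161; // first vowel
const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

/// Hypothetical helper: one step (non-recursive) of canonical
/// decomposition for a precomposed Hangul syllable.
/// An LVT syllable yields (LV, T); an LV syllable yields (L, V).
fn hangul_decompose_pair(c: char) -> Option<(char, char)> {
    let s_index = (c as u32).checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None; // not a precomposed Hangul syllable
    }
    let t_index = s_index % T_COUNT;
    let (first, second) = if t_index == 0 {
        (L_BASE + s_index / N_COUNT, V_BASE + (s_index % N_COUNT) / T_COUNT)
    } else {
        (S_BASE + s_index - t_index, T_BASE + t_index)
    };
    Some((char::from_u32(first)?, char::from_u32(second)?))
}

/// Hypothetical helper: canonical composition of two Hangul `char`s.
/// Handles L + V -> LV and LV + T -> LVT; everything else is `None`.
fn hangul_compose_pair(a: char, b: char) -> Option<char> {
    let (a, b) = (a as u32, b as u32);
    if (L_BASE..L_BASE + L_COUNT).contains(&a) && (V_BASE..V_BASE + V_COUNT).contains(&b) {
        return char::from_u32(S_BASE + (a - L_BASE) * N_COUNT + (b - V_BASE) * T_COUNT);
    }
    let is_lv = (S_BASE..S_BASE + S_COUNT).contains(&a) && (a - S_BASE) % T_COUNT == 0;
    if is_lv && (T_BASE + 1..T_BASE + T_COUNT).contains(&b) {
        return char::from_u32(a + (b - T_BASE));
    }
    None
}
```

Because each step only splits off or joins one pair, a caller can stop after any step. That is what makes such an operation usable in a custom normalizer that checks, for example, whether a font has a glyph for the intermediate result.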
Notably, this normalizer does _not_ provide the normalization “quick check” that can result in “maybe” in
addition to “yes” and “no”. The normalization checks provided by this crate always give a definitive
non-“maybe” answer.
# Examples
```
let nfc = icu_normalizer::ComposingNormalizerBorrowed::new_nfc();
assert_eq!(nfc.normalize("a\u{0308}"), "ä");
assert!(nfc.is_normalized("ä"));
let nfd = icu_normalizer::DecomposingNormalizerBorrowed::new_nfd();
assert_eq!(nfd.normalize("ä"), "a\u{0308}");
assert!(!nfd.is_normalized("ä"));
``` |
123369 |
properties.rs |
Access to the Unicode properties or property-based operations that
are required for NFC and NFD.
Applications should generally use the full normalizers that are
provided at the top level of this crate. However, the APIs in this
module are provided for callers such as HarfBuzz that specifically
want access to the raw canonical composition operation, e.g. for use in a
glyph-availability-guided custom normalizer. |
26285 |
provider.rs |
🚧 \[Unstable\] Data provider struct definitions for this ICU4X component.
<div class="stab unstable">
🚧 This code is considered unstable; it may change at any time, in breaking or non-breaking ways,
including in SemVer minor releases. While the serde representation of data structs is guaranteed
to be stable, their Rust representation might not be. Use with caution.
</div>
Read more about data providers: [`icu_provider`] |
8158 |
uts46.rs |
Bundles the part of UTS 46 that makes sense to implement as a
normalization.
This is meant to be used as a building block of a UTS 46
implementation, such as the `idna` crate. |
6850 |