bidi.rs |
This module exposes tooling for running the [Unicode Bidirectional Algorithm](https://unicode.org/reports/tr9/) using ICU4X data.
`BidiClassAdapter` enables ICU4X to provide data to [`unicode-bidi`], an external crate implementing UAX #9.
✨ *Enabled with the `bidi` Cargo feature.*
# Examples
```
use icu::properties::bidi::BidiClassAdapter;
use icu::properties::maps;
use unicode_bidi::BidiInfo;
// This example text is defined using `concat!` because some browsers
// and text editors have trouble displaying bidi strings.
let text = concat![
    "א", // RTL#1
    "ב", // RTL#2
    "ג", // RTL#3
    "a", // LTR#1
    "b", // LTR#2
    "c", // LTR#3
]; //
let adapter = BidiClassAdapter::new(maps::bidi_class());
// Resolve embedding levels within the text. Pass `None` to detect the
// paragraph level automatically.
let bidi_info = BidiInfo::new_with_data_source(&adapter, text, None);
// This paragraph has embedding level 1 because its first strong character is RTL.
assert_eq!(bidi_info.paragraphs.len(), 1);
let para = &bidi_info.paragraphs[0];
assert_eq!(para.level.number(), 1);
assert!(para.level.is_rtl());
// Re-ordering is done after wrapping each paragraph into a sequence of
// lines. For this example, I'll just use a single line that spans the
// entire paragraph.
let line = para.range.clone();
let display = bidi_info.reorder_line(para, line);
assert_eq!(
    display,
    concat![
        "a", // LTR#1
        "b", // LTR#2
        "c", // LTR#3
        "ג", // RTL#3
        "ב", // RTL#2
        "א", // RTL#1
    ]
);
``` |
5760 |
bidi_data.rs |
Data and APIs for supporting specific Bidi property data in an efficient structure.
Supported properties are:
- `Bidi_Paired_Bracket`
- `Bidi_Paired_Bracket_Type`
- `Bidi_Mirrored`
- `Bidi_Mirroring_Glyph` |
8941 |
error.rs |
|
1329 |
exemplar_chars.rs |
This module provides APIs for getting exemplar characters for a locale.
Exemplars are characters used by a language, separated into different sets.
The sets are: main, auxiliary, punctuation, numbers, and index.
The sets define, according to typical usage in the language,
which characters occur in which contexts with which frequency.
For more information, see the documentation in the
[Exemplars section in Unicode Technical Standard #35](https://unicode.org/reports/tr35/tr35-general.html#Exemplars)
of the LDML specification.
# Examples
```
use icu::locid::locale;
use icu::properties::exemplar_chars;
let locale = locale!("en-001").into();
let data = exemplar_chars::exemplars_main(&locale)
    .expect("locale should be present");
let exemplars_main = data.as_borrowed();
assert!(exemplars_main.contains_char('a'));
assert!(exemplars_main.contains_char('z'));
assert!(exemplars_main.contains("a"));
assert!(!exemplars_main.contains("ä"));
assert!(!exemplars_main.contains("ng"));
``` |
8548 |
lib.rs |
Definitions of [Unicode Properties] and APIs for
retrieving property data in an appropriate data structure.
This module is published as its own crate ([`icu_properties`](https://docs.rs/icu_properties/latest/icu_properties/))
and as part of the [`icu`](https://docs.rs/icu/latest/icu/) crate. See the latter for more details on the ICU4X project.
APIs that return a [`CodePointSetData`] exist for binary properties and certain enumerated
properties. See the [`sets`] module for more details.
APIs that return a [`CodePointMapData`] exist for certain enumerated properties. See the
[`maps`] module for more details.
# Examples
## Property data as `CodePointSetData`s
```
use icu::properties::{maps, sets, GeneralCategory};
// A binary property as a `CodePointSetData`
assert!(sets::emoji().contains('🎃')); // U+1F383 JACK-O-LANTERN
assert!(!sets::emoji().contains('木')); // U+6728
// An individual enumerated property value as a `CodePointSetData`
let line_sep_data = maps::general_category()
    .get_set_for_value(GeneralCategory::LineSeparator);
let line_sep = line_sep_data.as_borrowed();
assert!(line_sep.contains32(0x2028));
assert!(!line_sep.contains32(0x2029));
```
## Property data as `CodePointMapData`s
```
use icu::properties::{maps, Script};
assert_eq!(maps::script().get('🎃'), Script::Common); // U+1F383 JACK-O-LANTERN
assert_eq!(maps::script().get('木'), Script::Han); // U+6728
```
[`ICU4X`]: ../icu/index.html
[Unicode Properties]: https://unicode-org.github.io/icu/userguide/strings/properties.html
[`CodePointSetData`]: crate::sets::CodePointSetData
[`CodePointMapData`]: crate::maps::CodePointMapData
[`sets`]: crate::sets |
3885 |
maps.rs |
The functions in this module return a [`CodePointMapData`] representing, for
each code point in the entire range of code points, the property values
for a particular Unicode property.
The descriptions of most properties are taken from [`TR44`], the documentation for the
Unicode Character Database.
[`TR44`]: https://www.unicode.org/reports/tr44 |
24693 |
props.rs |
A collection of property definitions shared across contexts
(e.g., representing trie values).
This module defines enums / newtypes for enumerated properties.
String properties are represented as newtypes if their
values represent code points. |
125777 |
provider |
|
|
provider.rs |
🚧 \[Unstable\] Data provider struct definitions for this ICU4X component.
<div class="stab unstable">
🚧 This code is considered unstable; it may change at any time, in breaking or non-breaking ways,
including in SemVer minor releases. While the serde representation of data structs is guaranteed
to be stable, their Rust representation might not be. Use with caution.
</div>
Read more about data providers: [`icu_provider`] |
39447 |
runtime.rs |
🚧 \[Experimental\] This module is experimental and currently crate-private. Let us know if you
have a use case for this!
This module contains utilities for working with properties where the specific property in use
is not known at compile time.
For regex engines, [`crate::sets::load_for_ecma262_unstable()`] is a convenient API for working
with properties at runtime tailored for the use case of ECMA262-compatible regex engines. |
18612 |
script.rs |
Data and APIs for supporting both Script and Script_Extensions property
values in an efficient structure. |
25602 |
sets.rs |
The functions in this module return a [`CodePointSetData`] containing
the set of characters with a particular Unicode property.
The descriptions of most properties are taken from [`TR44`], the documentation for the
Unicode Character Database. Some properties are instead defined in [`TR18`], the
documentation for Unicode regular expressions. In particular, Annex C of this document
defines properties for POSIX compatibility.
[`CodePointSetData`]: crate::sets::CodePointSetData
[`TR44`]: https://www.unicode.org/reports/tr44
[`TR18`]: https://www.unicode.org/reports/tr18 |
85091 |
trievalue.rs |
|
7863 |