Name Description Size
complex 791 10229 3776
Macros and utilities to help implement the various iterator types. 646

Segment strings by lines, graphemes, words, and sentences.

This module is published as its own crate ([`icu_segmenter`]) and as part of the [`icu`] crate. See the latter for more details on the ICU4X project.

This module contains segmenter implementations for the following rules:

- Line segmenter that is compatible with [Unicode Standard Annex #14][UAX14], _Unicode Line Breaking Algorithm_, with options to tailor line-breaking behavior for CSS [`line-break`] and [`word-break`] properties.
- Grapheme cluster segmenter, word segmenter, and sentence segmenter that are compatible with [Unicode Standard Annex #29][UAX29], _Unicode Text Segmentation_.

[UAX14]:
[UAX29]:
[`line-break`]:
[`word-break`]:

# Examples

## Line Break

Find line break opportunities:

```rust
use icu::segmenter::LineSegmenter;

let segmenter = LineSegmenter::new_auto();

let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 6, 13, 17, 23, 29, 36]);
```

See [`LineSegmenter`] for more examples.

## Grapheme Cluster Break

Find all grapheme cluster boundaries:

```rust
use icu::segmenter::GraphemeClusterSegmenter;

let segmenter = GraphemeClusterSegmenter::new();

let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(
    &breakpoints,
    &[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
        21, 22, 23, 24, 25, 28, 29, 30, 31, 34, 35, 36
    ]
);
```

See [`GraphemeClusterSegmenter`] for more examples.

## Word Break

Find all word boundaries:

```rust
use icu::segmenter::WordSegmenter;

let segmenter = WordSegmenter::new_auto();

let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(
    &breakpoints,
    &[0, 5, 6, 11, 12, 13, 16, 17, 22, 23, 28, 29, 35, 36]
);
```

See [`WordSegmenter`] for more examples.
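The breakpoints in these examples are byte offsets into the UTF-8 string, which is why the last one is 36 although the text has fewer characters. As a minimal sketch of how such a list can be turned back into substrings, the block below slices the text between each pair of adjacent breakpoints. It uses only the standard library; the helper name `segments_from_breakpoints` is hypothetical and not part of `icu_segmenter`, and the breakpoint list is copied from the word-break example rather than computed.

```rust
/// Hypothetical helper (not part of `icu_segmenter`): returns the substring
/// between every pair of adjacent byte-offset breakpoints.
fn segments_from_breakpoints<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        .windows(2) // adjacent pairs: [start, end]
        .map(|pair| &text[pair[0]..pair[1]])
        .collect()
}

fn main() {
    let text = "Hello World. Xin chào thế giới!";
    // Assumption: these offsets are copied from the word-break example,
    // not produced by a segmenter in this sketch.
    let breakpoints = [0, 5, 6, 11, 12, 13, 16, 17, 22, 23, 28, 29, 35, 36];

    for segment in segments_from_breakpoints(text, &breakpoints) {
        println!("{:?}", segment);
    }
}
```

Slicing a `&str` panics if an offset does not fall on a character boundary, so this approach relies on the offsets coming from a segmenter run over the same string.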
## Sentence Break

Segment the string into sentences:

```rust
use icu::segmenter::SentenceSegmenter;

let segmenter = SentenceSegmenter::new();

let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 13, 36]);
```

See [`SentenceSegmenter`] for more examples.

5348 58241
provider 12470 7528 21225