Name Description Size
Encoding.h Return value from `Decoder`/`Encoder` to indicate that input was exhausted. 57463
EncodingDetector.h A Web browser-oriented detector for guessing what character encoding a stream of bytes is encoded in. The bytes are fed to the detector incrementally using the `feed` method. The current guess of the detector can be queried using the `guess` method. The guessing parameters are arguments to the `guess` method rather than arguments to the constructor in order to enable the application to check if the arguments affect the guessing outcome. (The specific use case is to disable UI for re-running the detector with UTF-8 allowed and the top-level domain name ignored if those arguments don't change the guess.) 5673
JapaneseDetector.h A Japanese legacy encoding detector for detecting between Shift_JIS, EUC-JP, and, optionally, ISO-2022-JP _given_ the assumption that the encoding is one of those. # Principle of Operation The detector is based on two observations: 1. The ISO-2022-JP escape sequences don't normally occur in Shift_JIS or EUC-JP, so encountering such an escape sequence (before non-ASCII has been encountered) can be taken as an indication of ISO-2022-JP. 2. When normal (full-width) kana or common kanji encoded as Shift_JIS is decoded as EUC-JP, or vice versa, the result is either an error or half-width katakana, and it's very uncommon for Japanese HTML to have a half-width katakana character before a normal kana or common kanji character. Therefore, if decoding as Shift_JIS results in an error or half-width katakana, the detector decides that the content is EUC-JP, and vice versa. # Failure Modes The detector gives the wrong answer if the text has a half-width katakana character before normal kana or common kanji. Some uncommon kanji are undecidable. (All JIS X 0208 Level 1 kanji are decidable.) The half-width katakana issue is mainly relevant for old 8-bit JIS X 0201-only text files that would decode correctly as Shift_JIS but that the detector detects as EUC-JP. The undecidable kanji issue does not realistically show up when a full document is fed to the detector, because, realistically, in a full document, there is at least one kana or common kanji. It can occur, though, if the detector is only run on a prefix of a document and the prefix only contains the title of the document. It is possible for a document title to consist entirely of undecidable kanji. (Indeed, Japanese Wikipedia has articles with such titles.) If the detector is undecided, a fallback to Shift_JIS should be used. 4998
build 2
components 3
docs 6
encoding_glue 2
gtest 2
hyphenation 2
icu 3
icu-patches 9 11130
l10n The content of this directory is partially sourced from the fluent.js project. 17
locale 38
locales 37
lwbrk 19 1438
strres 6
tzdata 4
uconv 15
unicharutil 4 3046 5818
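
The two observations in the JapaneseDetector.h description can be illustrated with a short Python sketch. This is not the detector's actual implementation or API. It is a simplified, non-incremental approximation that relies on Python's built-in `shift_jis`, `euc_jp`, and `iso2022_jp` codecs and looks only at the first non-ASCII character. The function names, the `allow_2022` parameter, and the list of escape sequences are assumptions made for illustration.

```python
from typing import Optional

# Unicode range of half-width katakana and related half-width punctuation.
HALF_WIDTH_KATAKANA = range(0xFF61, 0xFFA0)

# Assumed subset of ISO-2022-JP escape sequences (illustrative, not exhaustive).
ISO_2022_JP_ESCAPES = (b"\x1b$@", b"\x1b$B", b"\x1b(J")


def first_non_ascii(text: str) -> Optional[str]:
    """Return the first character above U+007F, or None if the text is ASCII."""
    for ch in text:
        if ord(ch) > 0x7F:
            return ch
    return None


def is_disqualified(data: bytes, encoding: str) -> bool:
    """True if the first non-ASCII result is an error or half-width katakana."""
    ch = first_non_ascii(data.decode(encoding, errors="replace"))
    return ch is not None and (ch == "\ufffd" or ord(ch) in HALF_WIDTH_KATAKANA)


def guess_japanese(data: bytes, allow_2022: bool = True) -> Optional[str]:
    """Guess among Shift_JIS, EUC-JP, and optionally ISO-2022-JP.

    Returns None when undecided; the caller chooses the fallback.
    """
    # Observation 1: an escape sequence seen before any non-ASCII byte.
    ascii_len = next((i for i, b in enumerate(data) if b > 0x7F), len(data))
    prefix = data[:ascii_len]
    if allow_2022 and any(esc in prefix for esc in ISO_2022_JP_ESCAPES):
        return "iso2022_jp"

    # Observation 2: decoding with the wrong encoding yields an error or
    # half-width katakana before any normal kana or common kanji.
    sjis_bad = is_disqualified(data, "shift_jis")
    euc_bad = is_disqualified(data, "euc_jp")
    if sjis_bad and not euc_bad:
        return "euc_jp"
    if euc_bad and not sjis_bad:
        return "shift_jis"
    return None


def guess_with_fallback(data: bytes) -> str:
    """Apply the Shift_JIS fallback that the description recommends."""
    return guess_japanese(data) or "shift_jis"
```

For example, `guess_japanese("こんにちは".encode("euc_jp"))` decides on EUC-JP because the same bytes decoded as Shift_JIS start with a half-width character, while pure ASCII input leaves the sketch undecided and `guess_with_fallback` returns Shift_JIS.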