| Name | Description | Size (bytes) | Coverage |
|---|---|---|---|
| LICENSE | | 1091 | - |
| utf8_range.c | A wrapper for Google's range-sse.cc algorithm, which checks whether a byte sequence is valid UTF-8 and finds the longest valid UTF-8 prefix. The key difference is that it first consumes as many ASCII bytes as possible and only then falls back to the range-sse.cc algorithm. The changes to the algorithm are cosmetic, mostly to coax Clang into producing optimal code. For the API, see the utf8_validity.h header. | 6981 | 41% |
| utf8_range.h | | 562 | - |
| utf8_range_neon.inc | Almost the same as the SSE implementation; see utf8_range_sse.inc for a detailed explanation. The only difference is the range adjustment step, which is more straightforward in NEON. | 3924 | - |
| utf8_range_sse.inc | Checks that UTF-8 ranges are structurally valid 16 bytes at a time using SIMD (SSE) instructions. The file also documents the mapping between codepoint ranges and their corresponding UTF-8 byte sequences. | 11516 | - |
| utf8_validity.h | | 866 | 50% |
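The utf8_range.c entry describes an ASCII-first strategy: consume as many ASCII bytes as possible before handing the rest to the range algorithm. The following is a minimal sketch of only that first stage, assuming an x86 target where SSE2 may or may not be enabled. `ascii_prefix_len_sketch` is a hypothetical name, not part of the utf8_range.h or utf8_validity.h API. It tests 16 bytes per step with SSE2 `_mm_movemask_epi8` when available, then 8 bytes per step with a 64-bit mask, then single bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not the library API): returns the number of
   leading bytes in data[0..len) whose high bit is clear (ASCII). */
size_t ascii_prefix_len_sketch(const char *data, size_t len) {
    const unsigned char *s = (const unsigned char *)data;
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per step: movemask gathers the high bit of every byte,
       so a zero mask means all 16 bytes are ASCII. */
    while (len - i >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        if (_mm_movemask_epi8(v) != 0) break;
        i += 16;
    }
#endif
    /* 8 bytes per step: memcpy avoids unaligned-access issues. */
    while (len - i >= 8) {
        uint64_t w;
        memcpy(&w, s + i, 8);
        if (w & 0x8080808080808080ULL) break;
        i += 8;
    }
    /* Finish byte by byte, stopping at the first non-ASCII byte. */
    while (i < len && s[i] < 0x80) i++;
    return i;
}
```

A validator built this way would run the full range check only on `data + n` onward, where `n` is the returned count, so long ASCII runs skip the more expensive work entirely.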
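The utf8_range_sse.inc entry refers to the mapping between codepoint ranges and UTF-8 byte sequences. As a rough scalar illustration, assuming the well-formed byte ranges of Unicode Table 3-7, the sketch below checks one sequence at a time and returns the length of the longest valid prefix. `utf8_valid_prefix_sketch` and `utf8_is_valid_sketch` are hypothetical names; the actual .inc code checks these ranges for 16 bytes in parallel with SIMD instructions, which this sketch does not attempt.

```c
#include <stddef.h>

/* Well-formed UTF-8 byte ranges (Unicode Table 3-7):
   Codepoints            Byte 1  Byte 2  Byte 3  Byte 4
   U+0000..U+007F        00..7F
   U+0080..U+07FF        C2..DF  80..BF
   U+0800..U+0FFF        E0      A0..BF  80..BF
   U+1000..U+CFFF        E1..EC  80..BF  80..BF
   U+D000..U+D7FF        ED      80..9F  80..BF
   U+E000..U+FFFF        EE..EF  80..BF  80..BF
   U+10000..U+3FFFF      F0      90..BF  80..BF  80..BF
   U+40000..U+FFFFF      F1..F3  80..BF  80..BF  80..BF
   U+100000..U+10FFFF    F4      80..8F  80..BF  80..BF */

/* Hypothetical helper (not the library API): length of the longest prefix
   of data[0..len) made of complete, well-formed UTF-8 sequences. */
size_t utf8_valid_prefix_sketch(const char *data, size_t len) {
    const unsigned char *s = (const unsigned char *)data;
    size_t i = 0;
    while (i < len) {
        unsigned char b0 = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of byte 2 */
        size_t n;                           /* sequence length */
        if (b0 <= 0x7F) { i++; continue; }
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            n = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            n = 3;
            if (b0 == 0xE0) lo = 0xA0;      /* reject overlong forms */
            if (b0 == 0xED) hi = 0x9F;      /* reject surrogates D800..DFFF */
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            n = 4;
            if (b0 == 0xF0) lo = 0x90;      /* reject overlong forms */
            if (b0 == 0xF4) hi = 0x8F;      /* reject anything above 10FFFF */
        } else {
            break;                          /* 80..C1 and F5..FF never lead */
        }
        if (len - i < n) break;             /* truncated final sequence */
        if (s[i + 1] < lo || s[i + 1] > hi) break;
        size_t k;
        for (k = 2; k < n; k++)             /* bytes 3 and 4: always 80..BF */
            if (s[i + k] < 0x80 || s[i + k] > 0xBF) break;
        if (k < n) break;
        i += n;
    }
    return i;
}

/* Hypothetical helper: nonzero iff the whole buffer is valid UTF-8. */
int utf8_is_valid_sketch(const char *data, size_t len) {
    return utf8_valid_prefix_sketch(data, len) == len;
}
```

Only the second byte ever needs a range narrower than 80..BF, and that range depends solely on the lead byte. This is what lets range-based validators derive every check from the lead byte with small lookup tables.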