<!DOCTYPE html>
<html>
<meta charset="utf-8">
<!--
The test language content came from
where LANG is the language code.
The test content is the third paragraph of the first chapter of Alice's Adventures in Wonderland by Lewis Carroll from Project Gutenberg machine translated by Google.
The content came with the following license:
UNICODE LICENSE V3
COPYRIGHT AND PERMISSION NOTICE
Copyright © 2023-2024 Unicode, Inc.
NOTICE TO USER: Carefully read the following legal agreement. BY
DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING DATA FILES, AND/OR
SOFTWARE, YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE
TERMS AND CONDITIONS OF THIS AGREEMENT. IF YOU DO NOT AGREE, DO NOT
DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE THE DATA FILES OR SOFTWARE.
Permission is hereby granted, free of charge, to any person obtaining a
copy of data files and any associated documentation (the "Data Files") or
software and any associated documentation (the "Software") to deal in the
Data Files or Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, and/or sell
copies of the Data Files or Software, and to permit persons to whom the
Data Files or Software are furnished to do so, provided that either (a)
this copyright and permission notice appear with all copies of the Data
Files or Software, or (b) this copyright and permission notice appear in
associated Documentation.
THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
THIRD PARTY RIGHTS.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE
BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES,
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE DATA
FILES OR SOFTWARE.
Except as contained in this notice, the name of a copyright holder shall
not be used in advertising or otherwise to promote the sale, use or other
dealings in these Data Files or Software without prior written
authorization of the copyright holder.
SPDX-License-Identifier: Unicode-3.0
(End UNICODE LICENSE V3)
The following applies to parts of this file other than the test language content:
Any copyright is dedicated to the Public Domain.
-->
<title>Normalizer bench</title>
<body>
<h1>Normalizer Bench</h1>
<dl>
<dt>S</dt>
<dd>Short: NFD fits in 32 UTF-16 code units. French and German are adjusted to take a substring that contains a non-ASCII character. (Long input contains the same information in each language instead of having a fixed UTF-16 length.)</dd>
<dt>L</dt>
<dd>Latin1.</dd>
<dt>U</dt>
<dd>Forced UTF-16 form for Latin1 languages. (One non-Latin1 character added to the string.)</dd>
<dt>W</dt>
<dd>Forced write: In the UTF-16 case, a singleton is prepended to force the normalizer to start writing from the start. In the Latin1 case, a character with a compatibility decomposition is prepended, since there are no singletons in Latin1. This means the effect is seen only in the K forms.</dd>
<dt>C</dt>
<dd>Forced copy: In the UTF-16 case, a singleton is appended to force the normalizer to make a copy even when normalizing from NFC to a C form or from NFD to a D form. In the Latin1 case, a character with a compatibility decomposition is appended, since there are no singletons in Latin1. This means the effect is seen only in the K form corresponding to the input C or D form.</dd>
</dl>
<p>Bench not started.</p>
<table>
<thead><tr><th>Input</th><th>NFC</th><th>NFKC</th><th>NFD</th><th>NFKD</th></tr></thead>
<tbody><tr><td colspan="5">Bench not run.</td></tr></tbody>
</table>
<script>
// Inclusion is as follows:
//
// English represents ASCII
//
// Multiple high-population Latin1 languages due to competing
// diacritic frequencies.
//
// Multiple Latin2 languages due to competing diacritic frequencies.
//
// Vietnamese: Multi-diacritic Latin above the pass-through bound.
//
// Greek: Frequent-accent Latin-like non-Latin.
//
// Bengali: Combining starters
//
// Chinese: Normalization-invariant in the fast trie range.
//
// Japanese: Varies under normalization in the fast trie range.
//
// Korean: Arithmetic composition/decomposition.
let rawData = [
{
lang: "en",
text: "There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, \"Oh dear! Oh dear! I shall be late!\" (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.",
},
{
lang: "fr",
text: "Il n'y avait rien de si remarquable à cela ; et Alice ne trouvait pas non plus si étrange d'entendre le Lapin se dire : « Oh là là ! Oh là là ! Je vais être en retard ! » (En y repensant plus tard, il lui vint à l'esprit qu'elle aurait dû s'en étonner, mais sur le moment tout semblait tout naturel) ; mais lorsque le Lapin sortit effectivement une montre de la poche de son gilet , la regarda et se hâta de partir, Alice se leva d'un bond, car il lui traversa l'esprit qu'elle n'avait jamais vu auparavant un lapin avec une poche de gilet ou une montre à y prendre, et, brûlante de curiosité, elle traversa le champ à sa poursuite, et heureusement, elle arriva juste à temps pour le voir sauter dans un grand terrier sous la haie.",
},
{
lang: "es",
text: "No había nada de extraordinario en eso; ni a Alicia le pareció tan extraño oír al Conejo decirse a sí mismo: \"¡Ay, Dios mío! ¡Ay, Dios mío! ¡Llegaré tarde!\" (Cuando lo pensó después, se le ocurrió que debería haberse sorprendido, pero en ese momento todo le pareció bastante natural); pero cuando el Conejo sacó un reloj del bolsillo del chaleco , lo miró y luego se apresuró a seguir adelante, Alicia se puso de pie de un salto, pues recordó que nunca antes había visto un conejo con un bolsillo del chaleco ni un reloj que sacar de él, y, ardiendo de curiosidad, corrió por el campo tras él, y afortunadamente llegó justo a tiempo de verlo caer por una gran madriguera bajo el seto.",
},
{
lang: "pt",
text: "Não havia nada de tão extraordinário nisso; nem Alice achou tão fora do comum ouvir o Coelho dizer para si mesmo: \"Oh, céus! Oh, céus! Vou me atrasar!\" (quando ela pensou nisso depois, ocorreu-lhe que deveria ter se perguntado sobre isso, mas na hora tudo parecia bastante natural); mas quando o Coelho realmente tirou um relógio do bolso do colete , olhou para ele e então saiu apressado, Alice se levantou, pois lhe ocorreu que nunca tinha visto um coelho com um bolso no colete ou um relógio para tirar dele, e queimando de curiosidade, ela correu pelo campo atrás dele e, felizmente, chegou bem a tempo de vê-lo pular por uma grande toca de coelho sob a cerca viva.",
},
{
lang: "de",
text: "Daran war nichts besonders Bemerkenswertes; und Alice fand es auch nicht besonders ungewöhnlich, das Kaninchen zu sich selbst sagen zu hören: \"Oh je! Oh je! Ich werde zu spät kommen!\" (als sie später darüber nachdachte, kam ihr der Gedanke, dass sie sich darüber hätte wundern sollen, aber in dem Moment schien ihr alles ganz natürlich); aber als das Kaninchen tatsächlich eine Uhr aus seiner Westentasche nahm , sie ansah und dann weitereilte, sprang Alice auf, denn ihr schoss durch den Kopf, dass sie noch nie zuvor ein Kaninchen mit einer Westentasche oder einer Uhr gesehen hatte, die es herausnehmen konnte, und brennend vor Neugier rannte sie hinter ihm über das Feld her und kam glücklicherweise gerade noch rechtzeitig, um zu sehen, wie es in ein großes Kaninchenloch unter der Hecke verschwand.",
},
{
lang: "pl",
text: "Nie było w tym nic aż tak niezwykłego; Alicja też nie wydało się aż tak dziwne, że usłyszała Królika mówiącego do siebie: „Ojej! Ojej! Spóźnię się!” (kiedy później o tym pomyślała, przyszło jej do głowy, że powinna była się temu dziwić, ale wtedy wydawało się to całkiem naturalne); ale kiedy Królik rzeczywiście wyjął zegarek z kieszeni kamizelki i spojrzał na niego, a potem pospieszył dalej, Alicja zerwała się na równe nogi, bo błysnęła jej myśl, że nigdy wcześniej nie widziała królika z kieszenią kamizelki lub zegarkiem, który mógłby z niej wyjąć, i płonąc ciekawością, pobiegła za nim przez pole i na szczęście zdążyła zobaczyć, jak wpada do dużej króliczej nory pod żywopłotem.",
},
{
lang: "ro",
text: "Nu era nimic atât de remarcabil în asta; nici Alice nu s-a gândit atât de mult să-l audă pe Iepure spunându-și: „O, dragă! O, dragă! Voi întârzia!” (când s-a gândit după aceea, i-a trecut prin minte că ar fi trebuit să se întrebe de asta, dar la vremea aceea totul părea destul de firesc); dar când Iepurele a scos de fapt un ceas din buzunarul vestei și s-a uitat la el, apoi s-a grăbit mai departe, Alice a început să se ridice, căci îi trecu prin minte că nu mai văzuse niciodată un iepure cu buzunarul vestei sau cu un ceas de scos din el și, arzând de curiozitate, a fugit prin câmp după el, din fericire, să vadă că iepurașul a căzut într-un timp. gardul viu.",
},
{
lang: "hr",
text: "U tome nije bilo ničeg tako izuzetnog ; niti je Alice mislila da je toliko neobično čuti zeca kako govori sam sebi: \"O, Bože! O, Bože! Zakasnit ću!\" (kada je kasnije razmišljala, palo joj je na pamet da se tome trebala zapitati, ali tada je sve to izgledalo sasvim prirodno); ali kad je zec zaista izvadio sat iz džepa prsluka i pogledao ga, a zatim požurio dalje, Alice je krenula na noge, jer joj je sinulo u glavi da nikada prije nije vidjela zeca ni s džepom na prsluku, ni sa satom koji bi iz njega izvadio, i izgarajući od znatiželje, potrčala je preko polja za njim, i srećom stigla je baš na vrijeme da vidi kako iskače velika zečja rupa ispod živice.",
},
{
lang: "cs",
text: "Nebylo v tom nic tak pozoruhodného ; Ani Alice si nemyslela, že by to bylo tak od věci slyšet, jak si Králík říká: \"Ach miláčku! Ach drahá! Přijdu pozdě!\" (když si to potom rozmyslela, napadlo ji, že se tomu měla divit, ale v tu chvíli to všechno vypadalo docela přirozeně); ale když Králík skutečně vytáhl hodinky z kapsy vesty , podíval se na ně a pak spěchal dál, Alice se postavila na nohy, protože jí blesklo hlavou, že ještě nikdy neviděla králíka s kapsičkou ve vestě, ani s hodinkami, které by z ní vytáhla, a hořel zvědavostí, rozběhla se za nimi přes pole a on naštěstí viděl, jak to pod velkou dírou právě prasklo.",
},
{
lang: "vi",
text: "Chẳng có gì quá đáng chú ý trong chuyện đó; Alice cũng chẳng thấy có gì quá khác thường khi nghe Thỏ tự nhủ: \"Ôi trời! Ôi trời! Mình sẽ bị muộn mất!\" (sau này khi nghĩ lại, cô bé thấy lẽ ra mình phải ngạc nhiên về điều này, nhưng lúc đó mọi chuyện có vẻ hoàn toàn bình thường); nhưng khi Thỏ thực sự lấy một chiếc đồng hồ ra khỏi túi áo gi-lê , nhìn đồng hồ rồi vội vã chạy đi, Alice bật dậy, vì cô chợt nghĩ rằng mình chưa bao giờ thấy một con thỏ nào có túi áo gi-lê, hay lấy đồng hồ ra khỏi túi, và vì tò mò, cô bé chạy qua cánh đồng theo sau nó, và may mắn thay là vừa kịp lúc nhìn thấy nó chui xuống một cái hang thỏ lớn dưới hàng rào.",
},
{
lang: "el",
text: "Δεν υπήρχε τίποτα τόσο αξιοσημείωτο σε αυτό. Ούτε η Άλις πίστευε ότι ήταν τόσο παράξενο να ακούει το Κουνέλι να λέει στον εαυτό του: \"Ω αγάπη μου! Ω αγαπητέ! Θα αργήσω!\" (όταν το σκέφτηκε μετά, της πέρασε από το μυαλό ότι θα έπρεπε να αναρωτηθεί γι' αυτό, αλλά εκείνη τη στιγμή όλα φαινόταν αρκετά φυσιολογικά). αλλά όταν το κουνέλι έβγαλε ένα ρολόι από την τσέπη του γιλέκου του και το κοίταξε, και μετά βιάστηκε, η Αλίκη άρχισε να σηκώνεται, γιατί πέρασε από το μυαλό της ότι δεν είχε ξαναδεί κουνέλι με τσέπη γιλέκου ή ρολόι για να βγάλει από αυτό και καιγόταν από περιέργεια, έτρεξε λίγο μετά το κοίταξε στο χωράφι. κουνέλι-τρύπα κάτω από τον φράκτη.",
},
{
lang: "bn",
text: "এর মধ্যে খুব উল্লেখযোগ্য কিছু ছিল না; খরগোশ নিজেকে বলতে শুনে অ্যালিস খুব বেশি ভাবেনি, \"ওহ প্রিয়! ওহ প্রিয়! আমার দেরি হবে!\" (পরে যখন সে এটা ভেবেছিল, তখন তার মনে হয়েছিল যে এই বিষয়ে তার আশ্চর্য হওয়া উচিত ছিল, কিন্তু সেই সময়ে সবকিছুই স্বাভাবিক বলে মনে হয়েছিল); কিন্তু খরগোশটি যখন তার কোমরের পকেট থেকে একটি ঘড়ি বের করে সেটির দিকে তাকাল, এবং তারপরে তাড়াহুড়ো করে, অ্যালিস তার পায়ের দিকে যেতে শুরু করে, কারণ এটি তার মনের মধ্যে ছড়িয়ে পড়ে যে সে আগে কখনও একটি কোমর-পকেট বা ঘড়ি নিয়ে খরগোশকে দেখেনি, এবং কৌতূহলে জ্বলতে থাকে, সে ক্ষেতের দিকে ছুটে যায় এবং তা দেখার জন্য ঠিক সময় পায়ে চলে যায়। হেজের নীচে একটি বড় খরগোশের গর্ত।",
},
{
lang: "zh",
text: "这并没有什么特别之处;爱丽丝也不觉得听到兔子自言自语“哦,天哪!哦,天哪!我要迟到了!”有什么不寻常的(后来她仔细想了想,觉得她应该对此感到奇怪,但当时这一切都显得很自然);但是当兔子真的从背心口袋里掏出一块手表,看了看,然后匆匆走开时,爱丽丝跳了起来,因为她突然想到,她从来没有见过一只兔子有背心口袋,或者从里面掏出一块手表,她好奇心爆棚,追着它跑过田野,幸运的是,她正好看到它从树篱下的一个大兔子洞里钻了出来。",
},
{
lang: "ja",
text: "そこには特に驚くべきことは何もありませんでした。また、ウサギが「あらまあ!あらまあ!遅れちゃう!」と独り言を言っているのを聞いても、アリスはそれほど不自然だとは思いませんでした(あとで考えてみると、これは不思議に思うべきだったのだと気づきましたが、そのときはまったく当然のことのように思えました)。しかし、ウサギが実際にチョッキのポケットから時計を取り出し、それを見てから急いで歩き去ったとき、アリスは飛び上がりました。というのも、チョッキのポケットも、そこから取り出す時計も、ウサギを見たことがない、ということが頭をよぎったからです。好奇心に燃えて、アリスは野原を横切ってウサギを追って走りました。そして運よく、ウサギが垣根の下の大きなウサギの穴に飛び込むのを見るのにちょうど間に合いました。",
},
{
lang: "ko",
text: "영어: 그 안에는 그렇게 특별한 것이 없었다. 앨리스는 토끼가 \"아이고! 아이고! 늦겠다!\"라고 중얼거리는 것을 듣고도 그다지 이상하게 생각하지 않았다.(그녀가 나중에 생각해 보니, 그녀가 그것에 대해 궁금해해야 했지만 당시에는 모든 것이 아주 자연스러워 보였다.) 하지만 토끼가 조끼 주머니에서 시계를 꺼내 보고 서둘러 가자 앨리스는 일어섰다. 조끼 주머니나 시계를 꺼낼 토끼를 이전에 본 적이 없다는 생각이 번쩍 들었기 때문이다. 호기심에 불타는 앨리스는 들판을 가로질러 토끼를 쫓아갔고, 다행히 울타리 아래의 큰 토끼굴로 토끼가 뛰어드는 것을 볼 수 있었다.",
},
];
// Global variable for hopefully fooling side effect analysis.
let escapesScope = "";
let data = [];
function isAscii(s) {
for (const c of s) {
if (c > '\u007F') {
return false;
}
}
return true;
}
function isLatin1(s) {
for (const c of s) {
if (c > '\u00FF') {
return false;
}
}
return true;
}
function cat(a, b) {
let ab = a + b;
// Flatten the rope produced by the concatenation so that the benchmark
// operates on a linear string.
escapesScope = ab.toUpperCase();
return ab;
}
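// Returns two references to the same string. benchEntry() alternates
// between them by index so that the argument of normalize() is not
// trivially loop-invariant.
function asArr(s) {
return [s, s];
}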
function append(lang, text) {
let nfd = text.normalize("NFD");
if (isLatin1(text)) {
data.push({
label: lang + "_NFC_L",
text: asArr(text),
});
data.push({
label: lang + "_NFC_L_W",
text: asArr(cat("\u00A0", text)),
});
data.push({
label: lang + "_NFC_L_C",
text: asArr(cat(text, "\u00A0")),
});
data.push({
label: lang + "_NFC_U",
text: asArr(cat("\u2014", text)),
});
data.push({
label: lang + "_NFC_U_W",
text: asArr(cat("\u2126", text)),
});
data.push({
label: lang + "_NFC_U_C",
text: asArr(cat(text, "\u2126")),
});
if (nfd != text) {
data.push({
label: lang + "_NFD_U",
text: asArr(cat("\u2014", nfd)),
});
data.push({
label: lang + "_NFD_U_W",
text: asArr(cat("\u2126", nfd)),
});
data.push({
label: lang + "_NFD_U_C",
text: asArr(cat(nfd, "\u2126")),
});
}
} else {
data.push({
label: lang + "_NFC",
text: asArr(text),
});
data.push({
label: lang + "_NFC_W",
text: asArr(cat("\u2126", text)),
});
data.push({
label: lang + "_NFC_C",
text: asArr(cat(text, "\u2126")),
});
if (nfd != text) {
data.push({
label: lang + "_NFD",
text: asArr(nfd),
});
data.push({
label: lang + "_NFD_W",
text: asArr(cat("\u2126", nfd)),
});
data.push({
label: lang + "_NFD_C",
text: asArr(cat(nfd, "\u2126")),
});
}
}
}
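function makeShort(s) {
let wasAscii = isAscii(s);
// Take the window from the NFD form so that the NFD of the result,
// plus one forcing character, fits in 32 UTF-16 code units.
let nfd = s.normalize("NFD");
for (let start = 0; start < nfd.length; start += 10) {
// 31 to leave space for the write/copy forcing character.
let sub = nfd.substring(start, start + 31);
if (wasAscii || !isAscii(sub)) {
return sub.normalize("NFC");
}
}
// Fallback in case no window contains a non-ASCII character.
return nfd.substring(0, 31).normalize("NFC");
}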
for (const entry of rawData) {
if (entry.text != entry.text.normalize("NFC")) {
console.log("NOT NFC: " + entry.lang);
}
append(entry.lang, entry.text);
}
for (const entry of rawData) {
append(entry.lang + "_S", makeShort(entry.text));
}
function benchEntry(entry, iterations) {
let tr = document.createElement("tr");
let label = document.createElement("td");
label.textContent = entry.label;
tr.appendChild(label);
for (const f of normalizationForms) {
let arr = entry.text;
let t = Date.now();
for (let i = 0; i < iterations; ++i) {
escapesScope = arr[i & 1].normalize(f);
}
let d = Date.now() - t;
let td = document.createElement("td");
td.textContent = d;
tr.appendChild(td);
}
tbody.appendChild(tr);
}
let normalizationForms = [
"NFC",
"NFKC",
"NFD",
"NFKD"
];
let tbody = document.getElementsByTagName("tbody")[0];
tbody.removeChild(tbody.firstChild);
let p = document.getElementsByTagName("p")[0];
p.textContent = "Bench running.";
function pgo() {
for (const entry of data) {
benchEntry(entry, 10);
}
p.textContent = "Lower is better. Benching done.";
}
function bench() {
if (!data.length) {
p.textContent = "Lower is better. Benching done.";
return;
}
let entry = data.shift();
benchEntry(entry, 200_000);
setTimeout(bench);
}
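// Choose the entry point: bench() for the full benchmark (200,000
// iterations per table cell) or pgo() for a quick pass (10 iterations
// per table cell).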
pgo();
</script>
</body>
</html>