* [PATCH 0/8] vt: more Unicode handling changes
@ 2025-05-05 16:55 Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 1/8] vt: ucs.c: fix misappropriate in_range() usage Nicolas Pitre
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

The Linux VT console has many problems with regard to proper Unicode
handling. A first set of patches was submitted here:

https://lore.kernel.org/all/20250417184849.475581-1-nico@fluxnic.net/

Those patches are currently in Greg's tty-next branch.

The first 2 patches in the following series contain fixes for those
already-applied patches.

The remaining patches introduce tables that map complex Unicode characters
to simpler fallback characters for terminal display when the corresponding
glyphs are unavailable. Only the subset of Unicode that can reasonably
be substituted by ASCII/Latin-1 characters is covered. The substitution may
not be as good as the actual glyphs, but it is still far more helpful than
squared question marks.
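
For precomposed accented letters, a base-letter fallback can often be
recovered from the Unicode canonical decomposition (NFD), which the
generator script's comments allude to. The sketch below uses Python's
standard unicodedata module. The helper name is hypothetical, and this is
not the generator's actual code. Letters such as Ø or Ł have no canonical
decomposition, which is one reason explicit tables are still needed.

```python
import unicodedata


def decomposition_fallback(cp):
    """Return the ASCII base letter of a precomposed character, or None.

    NFD splits e.g. U+00E9 into U+0065 (e) + U+0301 (combining acute).
    Characters without a canonical decomposition come back unchanged,
    and so do plain ASCII letters; both yield None here.
    """
    decomposed = unicodedata.normalize("NFD", chr(cp))
    base = decomposed[0]
    if len(decomposed) > 1 and ord(base) < 0x80:
        return ord(base)
    return None


for cp in (0x00E9, 0x010C, 0x0219, 0x00D8, 0x0142):
    fb = decomposition_fallback(cp)
    print(f"U+{cp:04X} -> {chr(fb) if fb is not None else '(none)'}")
```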

This series applies on top of tty-next, currently at commit 5ee558c5d9e9.

diffstat:
 drivers/tty/vt/.gitignore                   |    1 +
 drivers/tty/vt/Makefile                     |    8 +-
 drivers/tty/vt/gen_ucs_fallback_table.py    |  881 ++++++++++++
 drivers/tty/vt/ucs.c                        |   89 +-
 drivers/tty/vt/ucs_fallback_table.h_shipped | 1498 +++++++++++++++++++++
 drivers/tty/vt/vt.c                         |   95 +-
 include/linux/consolemap.h                  |    6 +
 7 files changed, 2535 insertions(+), 43 deletions(-)


* [PATCH 1/8] vt: ucs.c: fix misappropriate in_range() usage
  2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
@ 2025-05-05 16:55 ` Nicolas Pitre
  2025-05-06  5:58   ` Jiri Slaby
  2025-05-05 16:55 ` [PATCH 2/8] vt: make sure displayed double-width characters are remembered as such Nicolas Pitre
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

From: Nicolas Pitre <npitre@baylibre.com>

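The in_range() helper accepts a start and a length, not a start and
an end. Passing the last code point as the length made the early check
test [first, first + last) rather than [first, last], so it no longer
matched the table bounds. Use explicit comparisons instead.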
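
The difference is easy to see with a small Python model of the kernel's
in_range32(), which, as I understand it, is an unsigned
"val - start < len" comparison. The function names below are local to
this sketch, and the table bounds are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def in_range(val, start, length):
    """Model of the kernel's in_range32(): start <= val < start + length.

    The subtraction wraps modulo 2**32 as unsigned C arithmetic would, so
    any val below start becomes a huge number and fails the comparison.
    """
    return ((val - start) & MASK32) < length


def in_bounds(val, first, last):
    """The explicit inclusive check the patch switches to: first <= val <= last."""
    return not (val < first or val > last)


# Hypothetical table spanning U+0300..U+036F inclusive.
first, last = 0x0300, 0x036F

# Misuse: 'last' passed as a length, so 0x0300..0x066E is accepted.
print(in_range(0x0400, first, last))   # True (outside the table!)
print(in_bounds(0x0400, first, last))  # False

# Misuse on a table starting at 0: the last entry itself is rejected.
print(in_range(0x7F, 0, 0x7F))         # False (should be inside)

# Correct use of in_range() needs a length of last - first + 1.
print(in_range(last, first, last - first + 1))  # True
```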

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
---
 drivers/tty/vt/ucs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c
index 0b58cb7344a3..b0b23830170d 100644
--- a/drivers/tty/vt/ucs.c
+++ b/drivers/tty/vt/ucs.c
@@ -46,7 +46,7 @@ static int interval32_cmp(const void *key, const void *element)
 
 static bool cp_in_range16(u16 cp, const struct ucs_interval16 *ranges, size_t size)
 {
-	if (!in_range(cp, ranges[0].first, ranges[size - 1].last))
+	if (cp < ranges[0].first || cp > ranges[size - 1].last)
 		return false;
 
 	return __inline_bsearch(&cp, ranges, size, sizeof(*ranges),
@@ -55,7 +55,7 @@ static bool cp_in_range16(u16 cp, const struct ucs_interval16 *ranges, size_t si
 
 static bool cp_in_range32(u32 cp, const struct ucs_interval32 *ranges, size_t size)
 {
-	if (!in_range(cp, ranges[0].first, ranges[size - 1].last))
+	if (cp < ranges[0].first || cp > ranges[size - 1].last)
 		return false;
 
 	return __inline_bsearch(&cp, ranges, size, sizeof(*ranges),
@@ -144,8 +144,8 @@ static int recomposition_cmp(const void *key, const void *element)
 u32 ucs_recompose(u32 base, u32 mark)
 {
 	/* Check if characters are within the range of our table */
-	if (!in_range(base, UCS_RECOMPOSE_MIN_BASE, UCS_RECOMPOSE_MAX_BASE) ||
-	    !in_range(mark, UCS_RECOMPOSE_MIN_MARK, UCS_RECOMPOSE_MAX_MARK))
+	if (base < UCS_RECOMPOSE_MIN_BASE || base > UCS_RECOMPOSE_MAX_BASE ||
+	    mark < UCS_RECOMPOSE_MIN_MARK || mark > UCS_RECOMPOSE_MAX_MARK)
 		return 0;
 
 	struct compare_key key = { base, mark };
-- 
2.49.0



* [PATCH 2/8] vt: make sure displayed double-width characters are remembered as such
  2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 1/8] vt: ucs.c: fix misappropriate in_range() usage Nicolas Pitre
@ 2025-05-05 16:55 ` Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 3/8] vt: move glyph determination to a separate function Nicolas Pitre
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

From: Nicolas Pitre <npitre@baylibre.com>

To do so, ensure the Unicode screen buffer is initialized whenever a
double-width character is encountered.
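
The lazy-allocation idea behind this change can be sketched as a toy
model in Python. Everything here is hypothetical. ToyConsole and
ensure_uniscr() only play the roles of struct vc_data and
vc_uniscr_check(), and the real buffer layout is not modelled.

```python
class ToyConsole:
    """Hypothetical model: glyph cells always exist, Unicode cells are lazy."""

    def __init__(self, cols):
        self.glyphs = [ord(" ")] * cols  # font glyph indexes, always present
        self.uniscr = None               # Unicode code points, allocated on demand

    def ensure_uniscr(self):
        # Plays the role of vc_uniscr_check(): allocate once, seeded here
        # from the glyph cells (a simplification of the real rebuild).
        if self.uniscr is None:
            self.uniscr = list(self.glyphs)

    def put(self, col, cp, width, glyph):
        if width == 2:
            # With only the font glyph stored, the fact that this cell
            # holds a double-width character would be lost, so allocate now.
            self.ensure_uniscr()
        self.glyphs[col] = glyph
        if self.uniscr is not None:
            self.uniscr[col] = cp


con = ToyConsole(80)
con.put(0, ord("a"), 1, ord("a"))
print(con.uniscr is None)   # True: nothing has required it yet
con.put(1, 0x4E2D, 2, ord("?"))
print(hex(con.uniscr[1]))   # 0x4e2d
```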

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
---
 drivers/tty/vt/vt.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 24c6cd2eed78..58fa1b285f22 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2930,8 +2930,15 @@ static int vc_process_ucs(struct vc_data *vc, int *c, int *tc)
 {
 	u32 prev_c, curr_c = *c;
 
-	if (ucs_is_double_width(curr_c))
+	if (ucs_is_double_width(curr_c)) {
+		/*
+		 * The Unicode screen memory is allocated only when
+		 * required. This is one such case as we need to remember
+		 * which displayed characters are double-width.
+		 */
+		vc_uniscr_check(vc);
 		return 2;
+	}
 
 	if (!ucs_is_zero_width(curr_c))
 		return 1;
-- 
2.49.0



* [PATCH 3/8] vt: move glyph determination to a separate function
  2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 1/8] vt: ucs.c: fix misappropriate in_range() usage Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 2/8] vt: make sure displayed double-width characters are remembered as such Nicolas Pitre
@ 2025-05-05 16:55 ` Nicolas Pitre
  2025-05-06  6:06   ` Jiri Slaby
  2025-05-05 16:55 ` [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create ucs_fallback_table.h Nicolas Pitre
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

From: Nicolas Pitre <npitre@baylibre.com>

No logical changes. This just makes upcoming enhancements easier.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
---
 drivers/tty/vt/vt.c | 73 +++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 58fa1b285f22..4e80384a419b 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2925,6 +2925,7 @@ static void vc_con_rewind(struct vc_data *vc)
 
 #define UCS_ZWS		0x200b	/* Zero Width Space */
 #define UCS_VS16	0xfe0f	/* Variation Selector 16 */
+#define UCS_REPLACEMENT	0xfffd	/* Replacement Character */
 
 static int vc_process_ucs(struct vc_data *vc, int *c, int *tc)
 {
@@ -2984,12 +2985,40 @@ static int vc_process_ucs(struct vc_data *vc, int *c, int *tc)
 	return 0;
 }
 
+static int vc_get_glyph(struct vc_data *vc, int tc)
+{
+	int glyph = conv_uni_to_pc(vc, tc);
+	int charmask = vc->vc_hi_font_mask ? 0x1ff : 0xff;
+
+	if (!(glyph & ~charmask))
+		return glyph;
+
+	if (glyph == -1)
+		return -1; /* nothing to display */
+
+	/* Glyph not found */
+
+	if ((!vc->vc_utf || vc->vc_disp_ctrl || tc < 128) && !(tc & ~charmask)) {
+		/*
+		 * In legacy mode use the glyph we get by a 1:1 mapping.
+		 * This would make absolutely no sense with Unicode in mind,
+		 * but do this for ASCII characters since a font may lack
+		 * Unicode mapping info and we don't want to end up with
+		 * having question marks only.
+		 */
+		return tc;
+	}
+
+	/* Display U+FFFD (Unicode Replacement Character). */
+	return conv_uni_to_pc(vc, UCS_REPLACEMENT);
+}
+
 static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 		struct vc_draw_region *draw)
 {
 	int next_c;
 	unsigned char vc_attr = vc->vc_attr;
-	u16 himask = vc->vc_hi_font_mask, charmask = himask ? 0x1ff : 0xff;
+	u16 himask = vc->vc_hi_font_mask;
 	u8 width = 1;
 	bool inverse = false;
 
@@ -3000,39 +3029,17 @@ static int vc_con_write_normal(struct vc_data *vc, int tc, int c,
 	}
 
 	/* Now try to find out how to display it */
-	tc = conv_uni_to_pc(vc, tc);
-	if (tc & ~charmask) {
-		if (tc == -1)
-			return -1; /* nothing to display */
+	tc = vc_get_glyph(vc, tc);
+	if (tc == -1)
+		return -1; /* nothing to display */
+	if (tc < 0) {
+		inverse = true;
+		tc = conv_uni_to_pc(vc, '?');
+		if (tc < 0)
+			tc = '?';
 
-		/* Glyph not found */
-		if ((!vc->vc_utf || vc->vc_disp_ctrl || c < 128) &&
-				!(c & ~charmask)) {
-			/*
-			 * In legacy mode use the glyph we get by a 1:1
-			 * mapping.
-			 * This would make absolutely no sense with Unicode in
-			 * mind, but do this for ASCII characters since a font
-			 * may lack Unicode mapping info and we don't want to
-			 * end up with having question marks only.
-			 */
-			tc = c;
-		} else {
-			/*
-			 * Display U+FFFD. If it's not found, display an inverse
-			 * question mark.
-			 */
-			tc = conv_uni_to_pc(vc, 0xfffd);
-			if (tc < 0) {
-				inverse = true;
-				tc = conv_uni_to_pc(vc, '?');
-				if (tc < 0)
-					tc = '?';
-
-				vc_attr = vc_invert_attr(vc);
-				con_flush(vc, draw);
-			}
-		}
+		vc_attr = vc_invert_attr(vc);
+		con_flush(vc, draw);
 	}
 
 	next_c = c;
-- 
2.49.0
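
The decision order implemented by vc_get_glyph() and its caller in this
patch can be modelled in a few lines of Python. This is a sketch, not
kernel code. lookup() is a hypothetical stand-in for conv_uni_to_pc(), and
both the -4 "not found" value and the tiny font map are made up for the
example. As in the diff, only -1 ("nothing to display") is treated
specially.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def get_glyph(lookup, tc, utf=True, disp_ctrl=False, hi_font=False):
    """Model of vc_get_glyph(); lookup() stands in for conv_uni_to_pc()."""
    charmask = 0x1FF if hi_font else 0xFF
    glyph = lookup(tc)
    if not (glyph & ~charmask):
        return glyph               # the font has a glyph for tc
    if glyph == -1:
        return -1                  # nothing to display
    if (not utf or disp_ctrl or tc < 128) and not (tc & ~charmask):
        return tc                  # legacy 1:1 mapping
    return lookup(REPLACEMENT)     # may itself be "not found"


def write_normal(lookup, tc, **mode):
    """Model of the caller in vc_con_write_normal().

    Returns None when there is nothing to display, otherwise a
    (glyph, inverse) pair; inverse is True for the '?' last resort.
    """
    glyph = get_glyph(lookup, tc, **mode)
    if glyph == -1:
        return None
    if glyph < 0:
        q = lookup(ord("?"))
        return (q if q >= 0 else ord("?"), True)
    return (glyph, False)


# Hypothetical font: only 'A' and '?' have Unicode mapping entries.
font = {ord("A"): 0x41, ord("?"): 0x3F}


def lookup(cp):
    return font.get(cp, -4)  # -4: arbitrary "not found" value for the model


print(write_normal(lookup, ord("A")))  # (65, False)
print(write_normal(lookup, ord("B")))  # (66, False): 1:1 ASCII fallback
print(write_normal(lookup, 0x4E2D))    # (63, True): inverse '?'
```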



* [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create ucs_fallback_table.h
  2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
                   ` (2 preceding siblings ...)
  2025-05-05 16:55 ` [PATCH 3/8] vt: move glyph determination to a separate function Nicolas Pitre
@ 2025-05-05 16:55 ` Nicolas Pitre
  2025-05-06  6:33   ` Jiri Slaby
  2025-05-05 16:55 ` [PATCH 5/8] vt: create ucs_fallback_table.h_shipped with gen_ucs_fallback_table.py Nicolas Pitre
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

From: Nicolas Pitre <npitre@baylibre.com>

The generated table maps complex characters to their simpler fallback
forms for terminal display when the corresponding glyphs are unavailable.
This includes diacritics and symbols, as well as many drawing characters.
Fallback characters are obviously not perfect replacements, but they are
still far more useful than a bunch of squared question marks.
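
This patch does not show how the kernel side consumes the generated
header. Assuming it ends up as a table sorted by code point and searched
by binary search, as ucs.c already does for its interval tables with
__inline_bsearch(), the lookup can be sketched in Python as follows.
build_table() and lookup_fallback() are hypothetical names.

```python
from bisect import bisect_left


def build_table(fallback_map):
    """Flatten a {code point: fallback} dict into parallel sorted lists."""
    keys = sorted(fallback_map)
    return keys, [fallback_map[k] for k in keys]


def lookup_fallback(table, cp):
    """Binary-search the sorted keys; return the fallback or None."""
    keys, values = table
    i = bisect_left(keys, cp)
    if i < len(keys) and keys[i] == cp:
        return values[i]
    return None


# A handful of entries taken from gen_ucs_fallback_table.py.
table = build_table({
    0x00E9: ord("e"),  # LATIN SMALL LETTER E WITH ACUTE
    0x2500: ord("-"),  # BOX DRAWINGS LIGHT HORIZONTAL
    0x2502: ord("|"),  # BOX DRAWINGS LIGHT VERTICAL
    0x253C: ord("+"),  # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
})

print(chr(lookup_fallback(table, 0x253C)))  # +
print(lookup_fallback(table, 0x2501))       # None: not in this subset
```

A sorted array keeps each lookup at O(log n) and needs no memory beyond
the table itself.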

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
---
 drivers/tty/vt/gen_ucs_fallback_table.py | 882 +++++++++++++++++++++++
 1 file changed, 882 insertions(+)
 create mode 100755 drivers/tty/vt/gen_ucs_fallback_table.py

diff --git a/drivers/tty/vt/gen_ucs_fallback_table.py b/drivers/tty/vt/gen_ucs_fallback_table.py
new file mode 100755
index 000000000000..cb4e75b454fe
--- /dev/null
+++ b/drivers/tty/vt/gen_ucs_fallback_table.py
@@ -0,0 +1,882 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Leverage Python's unicodedata module to generate ucs_fallback_table.h
+#
+# The generated table maps complex characters to their simpler fallback forms
+# for a terminal display when corresponding glyphs are unavailable.
+#
+# Usage:
+#   python3 gen_ucs_fallback_table.py         # Generate fallback tables
+#   python3 gen_ucs_fallback_table.py -o FILE # Specify output file
+
+import unicodedata
+import sys
+import argparse
+from collections import defaultdict
+
+# This script's file name
+from pathlib import Path
+this_file = Path(__file__).name
+
+# Default output file name
+DEFAULT_OUT_FILE = "ucs_fallback_table.h"
+
+def collect_accented_latin_letters():
+    """Collect already composed Latin letters with diacritics."""
+    fallback_map = {}
+
+    # Latin-1 Supplement (0x00C0-0x00FF)
+    # Capital letters with accents to their base forms
+    fallback_map[0x00C0] = ord('A')  # À LATIN CAPITAL LETTER A WITH GRAVE
+    fallback_map[0x00C1] = ord('A')  # Á LATIN CAPITAL LETTER A WITH ACUTE
+    fallback_map[0x00C2] = ord('A')  # Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+    fallback_map[0x00C3] = ord('A')  # Ã LATIN CAPITAL LETTER A WITH TILDE
+    fallback_map[0x00C4] = ord('A')  # Ä LATIN CAPITAL LETTER A WITH DIAERESIS
+    fallback_map[0x00C5] = ord('A')  # Å LATIN CAPITAL LETTER A WITH RING ABOVE
+    fallback_map[0x00C7] = ord('C')  # Ç LATIN CAPITAL LETTER C WITH CEDILLA
+    fallback_map[0x00C8] = ord('E')  # È LATIN CAPITAL LETTER E WITH GRAVE
+    fallback_map[0x00C9] = ord('E')  # É LATIN CAPITAL LETTER E WITH ACUTE
+    fallback_map[0x00CA] = ord('E')  # Ê LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+    fallback_map[0x00CB] = ord('E')  # Ë LATIN CAPITAL LETTER E WITH DIAERESIS
+    fallback_map[0x00CC] = ord('I')  # Ì LATIN CAPITAL LETTER I WITH GRAVE
+    fallback_map[0x00CD] = ord('I')  # Í LATIN CAPITAL LETTER I WITH ACUTE
+    fallback_map[0x00CE] = ord('I')  # Î LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+    fallback_map[0x00CF] = ord('I')  # Ï LATIN CAPITAL LETTER I WITH DIAERESIS
+    fallback_map[0x00D1] = ord('N')  # Ñ LATIN CAPITAL LETTER N WITH TILDE
+    fallback_map[0x00D2] = ord('O')  # Ò LATIN CAPITAL LETTER O WITH GRAVE
+    fallback_map[0x00D3] = ord('O')  # Ó LATIN CAPITAL LETTER O WITH ACUTE
+    fallback_map[0x00D4] = ord('O')  # Ô LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+    fallback_map[0x00D5] = ord('O')  # Õ LATIN CAPITAL LETTER O WITH TILDE
+    fallback_map[0x00D6] = ord('O')  # Ö LATIN CAPITAL LETTER O WITH DIAERESIS
+    fallback_map[0x00D9] = ord('U')  # Ù LATIN CAPITAL LETTER U WITH GRAVE
+    fallback_map[0x00DA] = ord('U')  # Ú LATIN CAPITAL LETTER U WITH ACUTE
+    fallback_map[0x00DB] = ord('U')  # Û LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+    fallback_map[0x00DC] = ord('U')  # Ü LATIN CAPITAL LETTER U WITH DIAERESIS
+    fallback_map[0x00DD] = ord('Y')  # Ý LATIN CAPITAL LETTER Y WITH ACUTE
+
+    # Lowercase letters with accents to their base forms
+    fallback_map[0x00E0] = ord('a')  # à LATIN SMALL LETTER A WITH GRAVE
+    fallback_map[0x00E1] = ord('a')  # á LATIN SMALL LETTER A WITH ACUTE
+    fallback_map[0x00E2] = ord('a')  # â LATIN SMALL LETTER A WITH CIRCUMFLEX
+    fallback_map[0x00E3] = ord('a')  # ã LATIN SMALL LETTER A WITH TILDE
+    fallback_map[0x00E4] = ord('a')  # ä LATIN SMALL LETTER A WITH DIAERESIS
+    fallback_map[0x00E5] = ord('a')  # å LATIN SMALL LETTER A WITH RING ABOVE
+    fallback_map[0x00E7] = ord('c')  # ç LATIN SMALL LETTER C WITH CEDILLA
+    fallback_map[0x00E8] = ord('e')  # è LATIN SMALL LETTER E WITH GRAVE
+    fallback_map[0x00E9] = ord('e')  # é LATIN SMALL LETTER E WITH ACUTE
+    fallback_map[0x00EA] = ord('e')  # ê LATIN SMALL LETTER E WITH CIRCUMFLEX
+    fallback_map[0x00EB] = ord('e')  # ë LATIN SMALL LETTER E WITH DIAERESIS
+    fallback_map[0x00EC] = ord('i')  # ì LATIN SMALL LETTER I WITH GRAVE
+    fallback_map[0x00ED] = ord('i')  # í LATIN SMALL LETTER I WITH ACUTE
+    fallback_map[0x00EE] = ord('i')  # î LATIN SMALL LETTER I WITH CIRCUMFLEX
+    fallback_map[0x00EF] = ord('i')  # ï LATIN SMALL LETTER I WITH DIAERESIS
+    fallback_map[0x00F1] = ord('n')  # ñ LATIN SMALL LETTER N WITH TILDE
+    fallback_map[0x00F2] = ord('o')  # ò LATIN SMALL LETTER O WITH GRAVE
+    fallback_map[0x00F3] = ord('o')  # ó LATIN SMALL LETTER O WITH ACUTE
+    fallback_map[0x00F4] = ord('o')  # ô LATIN SMALL LETTER O WITH CIRCUMFLEX
+    fallback_map[0x00F5] = ord('o')  # õ LATIN SMALL LETTER O WITH TILDE
+    fallback_map[0x00F6] = ord('o')  # ö LATIN SMALL LETTER O WITH DIAERESIS
+    fallback_map[0x00F9] = ord('u')  # ù LATIN SMALL LETTER U WITH GRAVE
+    fallback_map[0x00FA] = ord('u')  # ú LATIN SMALL LETTER U WITH ACUTE
+    fallback_map[0x00FB] = ord('u')  # û LATIN SMALL LETTER U WITH CIRCUMFLEX
+    fallback_map[0x00FC] = ord('u')  # ü LATIN SMALL LETTER U WITH DIAERESIS
+    fallback_map[0x00FD] = ord('y')  # ý LATIN SMALL LETTER Y WITH ACUTE
+    fallback_map[0x00FF] = ord('y')  # ÿ LATIN SMALL LETTER Y WITH DIAERESIS
+
+    # Special letters
+    fallback_map[0x00D0] = ord('D')  # Ð LATIN CAPITAL LETTER ETH
+    fallback_map[0x00F0] = ord('d')  # ð LATIN SMALL LETTER ETH
+    fallback_map[0x00DE] = ord('P')  # Þ LATIN CAPITAL LETTER THORN
+    fallback_map[0x00FE] = ord('p')  # þ LATIN SMALL LETTER THORN
+
+    # Ligatures to component letters
+    fallback_map[0x00C6] = ord('E')  # Æ LATIN CAPITAL LETTER AE -> E (could also be 'AE')
+    fallback_map[0x00E6] = ord('e')  # æ LATIN SMALL LETTER AE -> e (could also be 'ae')
+    fallback_map[0x0152] = ord('E')  # Œ LATIN CAPITAL LIGATURE OE -> E
+    fallback_map[0x0153] = ord('e')  # œ LATIN SMALL LIGATURE OE -> e
+    fallback_map[0x00DF] = ord('s')  # ß LATIN SMALL LETTER SHARP S -> s
+
+    # These could also be handled by decomposition, but including for completeness
+    fallback_map[0x00D8] = ord('O')  # Ø LATIN CAPITAL LETTER O WITH STROKE -> O
+    fallback_map[0x00F8] = ord('o')  # ø LATIN SMALL LETTER O WITH STROKE -> o
+
+    # Space variants - map all to regular ASCII space
+    fallback_map[0x00A0] = ord(' ')  # NO-BREAK SPACE
+    fallback_map[0x1680] = ord(' ')  # OGHAM SPACE MARK
+
+    # Various space widths (EN QUAD through HAIR SPACE)
+    for cp in range(0x2000, 0x200A+1):
+        fallback_map[cp] = ord(' ')
+
+    fallback_map[0x202F] = ord(' ')  # NARROW NO-BREAK SPACE
+    fallback_map[0x205F] = ord(' ')  # MEDIUM MATHEMATICAL SPACE
+
+    # Extended Latin
+    fallback_map[0x0141] = ord('L')  # Ł LATIN CAPITAL LETTER L WITH STROKE -> L
+    fallback_map[0x0142] = ord('l')  # ł LATIN SMALL LETTER L WITH STROKE -> l
+
+    # Additional characters with cedilla and similar marks
+    fallback_map[0x0122] = ord('G')  # Ģ LATIN CAPITAL LETTER G WITH CEDILLA -> G
+    fallback_map[0x0123] = ord('g')  # ģ LATIN SMALL LETTER G WITH CEDILLA -> g
+    fallback_map[0x0136] = ord('K')  # Ķ LATIN CAPITAL LETTER K WITH CEDILLA -> K
+    fallback_map[0x0137] = ord('k')  # ķ LATIN SMALL LETTER K WITH CEDILLA -> k
+    fallback_map[0x0145] = ord('N')  # Ņ LATIN CAPITAL LETTER N WITH CEDILLA -> N
+    fallback_map[0x0146] = ord('n')  # ņ LATIN SMALL LETTER N WITH CEDILLA -> n
+    fallback_map[0x0156] = ord('R')  # Ŗ LATIN CAPITAL LETTER R WITH CEDILLA -> R
+    fallback_map[0x0157] = ord('r')  # ŗ LATIN SMALL LETTER R WITH CEDILLA -> r
+    fallback_map[0x015E] = ord('S')  # Ş LATIN CAPITAL LETTER S WITH CEDILLA -> S
+    fallback_map[0x015F] = ord('s')  # ş LATIN SMALL LETTER S WITH CEDILLA -> s
+    fallback_map[0x0162] = ord('T')  # Ţ LATIN CAPITAL LETTER T WITH CEDILLA -> T
+    fallback_map[0x0163] = ord('t')  # ţ LATIN SMALL LETTER T WITH CEDILLA -> t
+
+    # Additional Romanian and Turkish specific letters with cedilla/comma below
+    fallback_map[0x0218] = ord('S')  # Ș LATIN CAPITAL LETTER S WITH COMMA BELOW -> S
+    fallback_map[0x0219] = ord('s')  # ș LATIN SMALL LETTER S WITH COMMA BELOW -> s
+    fallback_map[0x021A] = ord('T')  # Ț LATIN CAPITAL LETTER T WITH COMMA BELOW -> T
+    fallback_map[0x021B] = ord('t')  # ț LATIN SMALL LETTER T WITH COMMA BELOW -> t
+
+    # Letters with caron/háček (Czech, Slovak, Slovenian, Croatian, Lithuanian, Latvian)
+    fallback_map[0x010C] = ord('C')  # Č LATIN CAPITAL LETTER C WITH CARON -> C
+    fallback_map[0x010D] = ord('c')  # č LATIN SMALL LETTER C WITH CARON -> c
+    fallback_map[0x010E] = ord('D')  # Ď LATIN CAPITAL LETTER D WITH CARON -> D
+    fallback_map[0x010F] = ord('d')  # ď LATIN SMALL LETTER D WITH CARON -> d
+    fallback_map[0x011A] = ord('E')  # Ě LATIN CAPITAL LETTER E WITH CARON -> E
+    fallback_map[0x011B] = ord('e')  # ě LATIN SMALL LETTER E WITH CARON -> e
+    fallback_map[0x013D] = ord('L')  # Ľ LATIN CAPITAL LETTER L WITH CARON -> L
+    fallback_map[0x013E] = ord('l')  # ľ LATIN SMALL LETTER L WITH CARON -> l
+    fallback_map[0x0147] = ord('N')  # Ň LATIN CAPITAL LETTER N WITH CARON -> N
+    fallback_map[0x0148] = ord('n')  # ň LATIN SMALL LETTER N WITH CARON -> n
+    fallback_map[0x0158] = ord('R')  # Ř LATIN CAPITAL LETTER R WITH CARON -> R
+    fallback_map[0x0159] = ord('r')  # ř LATIN SMALL LETTER R WITH CARON -> r
+    fallback_map[0x0160] = ord('S')  # Š LATIN CAPITAL LETTER S WITH CARON -> S
+    fallback_map[0x0161] = ord('s')  # š LATIN SMALL LETTER S WITH CARON -> s
+    fallback_map[0x0164] = ord('T')  # Ť LATIN CAPITAL LETTER T WITH CARON -> T
+    fallback_map[0x0165] = ord('t')  # ť LATIN SMALL LETTER T WITH CARON -> t
+    fallback_map[0x017D] = ord('Z')  # Ž LATIN CAPITAL LETTER Z WITH CARON -> Z
+    fallback_map[0x017E] = ord('z')  # ž LATIN SMALL LETTER Z WITH CARON -> z
+
+    # Letters with acute (Polish, Hungarian, Czech, Slovak, Icelandic)
+    fallback_map[0x0139] = ord('L')  # Ĺ LATIN CAPITAL LETTER L WITH ACUTE -> L
+    fallback_map[0x013A] = ord('l')  # ĺ LATIN SMALL LETTER L WITH ACUTE -> l
+    fallback_map[0x0143] = ord('N')  # Ń LATIN CAPITAL LETTER N WITH ACUTE -> N
+    fallback_map[0x0144] = ord('n')  # ń LATIN SMALL LETTER N WITH ACUTE -> n
+    fallback_map[0x0154] = ord('R')  # Ŕ LATIN CAPITAL LETTER R WITH ACUTE -> R
+    fallback_map[0x0155] = ord('r')  # ŕ LATIN SMALL LETTER R WITH ACUTE -> r
+    fallback_map[0x015A] = ord('S')  # Ś LATIN CAPITAL LETTER S WITH ACUTE -> S
+    fallback_map[0x015B] = ord('s')  # ś LATIN SMALL LETTER S WITH ACUTE -> s
+    fallback_map[0x0179] = ord('Z')  # Ź LATIN CAPITAL LETTER Z WITH ACUTE -> Z
+    fallback_map[0x017A] = ord('z')  # ź LATIN SMALL LETTER Z WITH ACUTE -> z
+    fallback_map[0x017B] = ord('Z')  # Ż LATIN CAPITAL LETTER Z WITH DOT ABOVE -> Z
+    fallback_map[0x017C] = ord('z')  # ż LATIN SMALL LETTER Z WITH DOT ABOVE -> z
+
+    # Letters with diaeresis/umlaut (used in various languages)
+    fallback_map[0x0178] = ord('Y')  # Ÿ LATIN CAPITAL LETTER Y WITH DIAERESIS -> Y
+
+    # Other common European letters
+    fallback_map[0x00D0] = ord('D')  # Đ LATIN CAPITAL LETTER ETH -> D
+    fallback_map[0x0110] = ord('D')  # Đ LATIN CAPITAL LETTER D WITH STROKE -> D
+    fallback_map[0x0111] = ord('d')  # đ LATIN SMALL LETTER D WITH STROKE -> d
+    fallback_map[0x0126] = ord('H')  # Ħ LATIN CAPITAL LETTER H WITH STROKE -> H
+    fallback_map[0x0127] = ord('h')  # ħ LATIN SMALL LETTER H WITH STROKE -> h
+
+    return fallback_map
+
+def collect_drawing_character_mappings():
+    """Collect box drawing characters with ASCII mappings."""
+    fallback_map = {}
+
+    # Box drawing characters
+    # Horizontal lines
+    for cp in range(0x2500, 0x2501+1):  # ─ ━
+        fallback_map[cp] = ord('-')
+
+    # Vertical lines
+    for cp in range(0x2502, 0x2503+1):  # │ ┃
+        fallback_map[cp] = ord('|')
+
+    # Box corners and intersections
+
+    # ┌ ┍ ┎ ┏
+    for cp in range(0x250C, 0x250F+1):
+        fallback_map[cp] = ord('+')
+
+    # ┐ ┑ ┒ ┓
+    for cp in range(0x2510, 0x2513+1):
+        fallback_map[cp] = ord('+')
+
+    # └ ┕ ┖ ┗
+    for cp in range(0x2514, 0x2517+1):
+        fallback_map[cp] = ord('+')
+
+    # ┘ ┙ ┚ ┛
+    for cp in range(0x2518, 0x251B+1):
+        fallback_map[cp] = ord('+')
+
+    # ├ ┝ ┞ ┟ ┠ ┡ ┢ ┣
+    for cp in range(0x251C, 0x2523+1):
+        fallback_map[cp] = ord('+')
+
+    # ┤ ┥ ┦ ┧ ┨ ┩ ┪ ┫
+    for cp in range(0x2524, 0x252B+1):
+        fallback_map[cp] = ord('+')
+
+    # ┬ ┭ ┮ ┯ ┰ ┱ ┲ ┳
+    for cp in range(0x252C, 0x2533+1):
+        fallback_map[cp] = ord('+')
+
+    # ┴ ┵ ┶ ┷ ┸ ┹ ┺ ┻
+    for cp in range(0x2534, 0x253B+1):
+        fallback_map[cp] = ord('+')
+
+    # ┼ ┽ ┾ ┿ ╀ ╁ ╂ ╃ ╄ ╅ ╆ ╇ ╈ ╉ ╊ ╋
+    for cp in range(0x253C, 0x254B+1):
+        fallback_map[cp] = ord('+')
+
+    # Double box drawing characters
+    fallback_map[0x2550] = ord('-')  # ═ BOX DRAWINGS DOUBLE HORIZONTAL
+    fallback_map[0x2551] = ord('|')  # ║ BOX DRAWINGS DOUBLE VERTICAL
+
+    # Double and mixed box corners and intersections
+    # ╒ ╓ ╔ - top-left corners
+    for cp in range(0x2552, 0x2554+1):
+        fallback_map[cp] = ord('+')
+
+    # ╕ ╖ ╗ - top-right corners
+    for cp in range(0x2555, 0x2557+1):
+        fallback_map[cp] = ord('+')
+
+    # ╘ ╙ ╚ - bottom-left corners
+    for cp in range(0x2558, 0x255A+1):
+        fallback_map[cp] = ord('+')
+
+    # ╛ ╜ ╝ - bottom-right corners
+    for cp in range(0x255B, 0x255D+1):
+        fallback_map[cp] = ord('+')
+
+    # ╞ ╟ ╠ - left T-junctions
+    for cp in range(0x255E, 0x2560+1):
+        fallback_map[cp] = ord('+')
+
+    # ╡ ╢ ╣ - right T-junctions
+    for cp in range(0x2561, 0x2563+1):
+        fallback_map[cp] = ord('+')
+
+    # ╤ ╥ ╦ - top T-junctions
+    for cp in range(0x2564, 0x2566+1):
+        fallback_map[cp] = ord('+')
+
+    # ╧ ╨ ╩ - bottom T-junctions
+    for cp in range(0x2567, 0x2569+1):
+        fallback_map[cp] = ord('+')
+
+    # ╪ ╫ ╬ - crosses
+    for cp in range(0x256A, 0x256C+1):
+        fallback_map[cp] = ord('+')
+
+    # Box drawing with arcs
+    # ╭ ╮ ╯ ╰
+    for cp in range(0x256D, 0x2570+1):
+        fallback_map[cp] = ord('+')
+
+    # Box drawing partials
+    # Horizontal segments - map to dash
+    # ╴ ╶ ╸ ╺ ╼ ╾
+    fallback_map[0x2574] = ord('-')  # light left
+    fallback_map[0x2576] = ord('-')  # light right
+    fallback_map[0x2578] = ord('-')  # heavy left
+    fallback_map[0x257A] = ord('-')  # heavy right
+    fallback_map[0x257C] = ord('-')  # light left and heavy right
+    fallback_map[0x257E] = ord('-')  # heavy left and light right
+
+    # Vertical segments - map to pipe
+    # ╵ ╷ ╹ ╻ ╽ ╿
+    fallback_map[0x2575] = ord('|')  # light up
+    fallback_map[0x2577] = ord('|')  # light down
+    fallback_map[0x2579] = ord('|')  # heavy up
+    fallback_map[0x257B] = ord('|')  # heavy down
+    fallback_map[0x257D] = ord('|')  # light up and heavy down
+    fallback_map[0x257F] = ord('|')  # heavy up and light down
+
+    # Block elements
+    # █ ▉ ▊ ▋ ▌ ▍ ▎ ▏ - map to #
+    for cp in range(0x2588, 0x258F+1):
+        fallback_map[cp] = ord('#')
+
+    # ▀ ▁ ▂ ▃ ▄ ▅ ▆ ▇
+    for cp in range(0x2580, 0x2587+1):
+        fallback_map[cp] = ord('#')
+
+    # Right side blocks
+    fallback_map[0x2590] = ord('#')  # ▐ RIGHT HALF BLOCK
+    fallback_map[0x2595] = ord('#')  # ▕ RIGHT ONE EIGHTH BLOCK
+    fallback_map[0x2594] = ord('#')  # ▔ UPPER ONE EIGHTH BLOCK
+
+    # Quadrant blocks
+    for cp in range(0x2596, 0x259F+1):
+        fallback_map[cp] = ord('#')  # Quadrant blocks (▖ ▗ ▘ ▙ ▚ ▛ ▜ ▝ ▞ ▟)
+
+    # ▓ ▒ ░ - map to different densities of shading
+    fallback_map[0x2593] = ord('#')  # ▓ Dark shade
+    fallback_map[0x2592] = ord('%')  # ▒ Medium shade
+    fallback_map[0x2591] = ord('.')  # ░ Light shade
+
+    # Additional square/rectangle characters
+    fallback_map[0x25AA] = ord('.')  # ▪ BLACK SMALL SQUARE
+    fallback_map[0x25AB] = ord('.')  # ▫ WHITE SMALL SQUARE
+    fallback_map[0x25AC] = ord('#')  # ▬ BLACK RECTANGLE
+    fallback_map[0x25AD] = ord('-')  # ▭ WHITE RECTANGLE
+    fallback_map[0x25AE] = ord('|')  # ▮ BLACK VERTICAL RECTANGLE
+    fallback_map[0x25AF] = ord('|')  # ▯ WHITE VERTICAL RECTANGLE
+
+    # Technical corner/bracket characters
+    # Bottom corners
+    fallback_map[0x23A3] = ord('|')  # ⎣ RIGHT BOTTOM CORNER -> |
+    fallback_map[0x23A6] = ord('|')  # ⎦ RIGHT SQUARE BRACKET LOWER CORNER -> |
+    fallback_map[0x23A9] = ord('|')  # ⎩ RIGHT SQUARE BRACKET LOWER CORNER WITH UPPER CORNER -> |
+    fallback_map[0x23B3] = ord('|')  # ⎳ BOTTOM CURLY BRACKET -> |
+    fallback_map[0x23B8] = ord('|')  # ⎸ LEFT VERTICAL BOX LINE -> |
+    fallback_map[0x23B9] = ord('|')  # ⎹ RIGHT VERTICAL BOX LINE -> |
+    fallback_map[0x23BD] = ord('_')  # ⎽ BOTTOM SQUARE BRACKET -> _
+    fallback_map[0x23BF] = ord('L')  # ⎿ BOTTOM RIGHT CORNER -> L
+    fallback_map[0x23BE] = ord('L')  # ⎾ TOP RIGHT CORNER -> L
+    fallback_map[0x23BC] = ord('J')  # ⎼ TOP SQUARE BRACKET -> J
+
+    # Top corners
+    fallback_map[0x23A1] = ord('|')  # ⎡ LEFT SQUARE BRACKET UPPER CORNER -> |
+    fallback_map[0x23A4] = ord('|')  # ⎤ RIGHT SQUARE BRACKET UPPER CORNER -> |
+    fallback_map[0x23A7] = ord('|')  # ⎧ LEFT CURLY BRACKET UPPER HOOK -> |
+    fallback_map[0x23AB] = ord('|')  # ⎫ RIGHT CURLY BRACKET UPPER HOOK -> |
+    fallback_map[0x23B0] = ord('(')  # ⎰ UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION -> (
+    fallback_map[0x23B1] = ord(')')  # ⎱ UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION -> )
+
+    # Other useful box-drawing-like characters
+    # Diagonal lines
+    # ╱ ╲ ╳
+    fallback_map[0x2571] = ord('/')   # ╱ Diagonal up-right to down-left
+    fallback_map[0x2572] = ord('\\')  # ╲ Diagonal up-left to down-right
+    fallback_map[0x2573] = ord('X')   # ╳ Diagonal cross
+
+    # Arrows to ASCII equivalent
+    # → ⇒ ⟹ etc. to ->
+    for cp in [0x2192, 0x21D2, 0x27F9]:
+        fallback_map[cp] = ord('>')  # Treat as '>' for simplicity
+
+    # ← ⇐ ⟸ etc. to <-
+    for cp in [0x2190, 0x21D0, 0x27F8]:
+        fallback_map[cp] = ord('<')  # Treat as '<' for simplicity
+
+    # ↑ ⇑ etc. to ^
+    for cp in [0x2191, 0x21D1]:
+        fallback_map[cp] = ord('^')
+
+    # ↓ ⇓ etc. to v
+    for cp in [0x2193, 0x21D3]:
+        fallback_map[cp] = ord('v')
+
+    # Mathematical symbols
+    fallback_map[0x00B1] = ord('+')  # ± PLUS-MINUS SIGN -> +
+    fallback_map[0x00D7] = ord('x')  # × MULTIPLICATION SIGN -> x
+    fallback_map[0x00F7] = ord('/')  # ÷ DIVISION SIGN -> /
+    fallback_map[0x2212] = ord('-')  # − MINUS SIGN -> -
+    fallback_map[0x2213] = ord('+')  # ∓ MINUS-OR-PLUS SIGN -> +
+    fallback_map[0x2215] = ord('/')  # ∕ DIVISION SLASH -> /
+    fallback_map[0x2216] = ord('\\')  # ∖ SET MINUS -> \
+    fallback_map[0x2217] = ord('*')  # ∗ ASTERISK OPERATOR -> *
+    fallback_map[0x2218] = ord('o')  # ∘ RING OPERATOR -> o
+    fallback_map[0x2219] = ord('.')  # ∙ BULLET OPERATOR -> .
+    fallback_map[0x221A] = ord('v')  # √ SQUARE ROOT -> v
+    fallback_map[0x221E] = ord('8')  # ∞ INFINITY -> 8
+    fallback_map[0x2223] = ord('|')  # ∣ DIVIDES -> |
+    fallback_map[0x2225] = ord('|')  # ∥ PARALLEL TO -> |
+    fallback_map[0x2227] = ord('&')  # ∧ LOGICAL AND -> & (C-style)
+    fallback_map[0x2228] = ord('|')  # ∨ LOGICAL OR -> | (C-style)
+    fallback_map[0x2229] = ord('n')  # ∩ INTERSECTION -> n
+    fallback_map[0x222A] = ord('u')  # ∪ UNION -> u
+    fallback_map[0x222B] = ord('S')  # ∫ INTEGRAL -> S
+    fallback_map[0x2234] = ord(':')  # ∴ THEREFORE -> :
+    fallback_map[0x2235] = ord(':')  # ∵ BECAUSE -> :
+    fallback_map[0x2248] = ord('~')  # ≈ ALMOST EQUAL TO -> ~
+    fallback_map[0x2264] = ord('<')  # ≤ LESS-THAN OR EQUAL TO -> <
+    fallback_map[0x2265] = ord('>')  # ≥ GREATER-THAN OR EQUAL TO -> >
+    fallback_map[0x2282] = ord('c')  # ⊂ SUBSET OF -> c
+    fallback_map[0x2283] = ord('C')  # ⊃ SUPERSET OF -> C
+    fallback_map[0x2286] = ord('c')  # ⊆ SUBSET OF OR EQUAL TO -> c
+    fallback_map[0x2287] = ord('C')  # ⊇ SUPERSET OF OR EQUAL TO -> C
+    fallback_map[0x22C5] = ord('.')  # ⋅ DOT OPERATOR -> .
+
+    # Currency symbols
+    fallback_map[0x00A2] = ord('c')  # ¢ CENT SIGN -> c
+    fallback_map[0x00A3] = ord('L')  # £ POUND SIGN -> L
+    fallback_map[0x00A5] = ord('Y')  # ¥ YEN SIGN -> Y
+    fallback_map[0x20AC] = ord('E')  # € EURO SIGN -> E
+
+    # Common symbols
+    fallback_map[0x00A9] = ord('C')  # © COPYRIGHT SIGN -> C
+    fallback_map[0x00AE] = ord('R')  # ® REGISTERED SIGN -> R
+    fallback_map[0x2122] = ord('T')  # ™ TRADE MARK SIGN -> T
+    fallback_map[0x00A7] = ord('S')  # § SECTION SIGN -> S
+    fallback_map[0x00B6] = ord('P')  # ¶ PILCROW SIGN -> P
+    fallback_map[0x00A6] = ord('|')  # ¦ BROKEN BAR -> |
+    fallback_map[0x00B0] = ord('o')  # ° DEGREE SIGN -> o
+    fallback_map[0x00B5] = ord('u')  # µ MICRO SIGN -> u
+    fallback_map[0x2103] = ord('C')  # ℃ DEGREE CELSIUS -> C
+    fallback_map[0x2109] = ord('F')  # ℉ DEGREE FAHRENHEIT -> F
+
+    # Superscript and subscript numbers
+    fallback_map[0x00B2] = ord('2')  # ² SUPERSCRIPT TWO -> 2
+    fallback_map[0x00B3] = ord('3')  # ³ SUPERSCRIPT THREE -> 3
+    fallback_map[0x00B9] = ord('1')  # ¹ SUPERSCRIPT ONE -> 1
+    fallback_map[0x2070] = ord('0')  # ⁰ SUPERSCRIPT ZERO -> 0
+    fallback_map[0x2074] = ord('4')  # ⁴ SUPERSCRIPT FOUR -> 4
+    fallback_map[0x2075] = ord('5')  # ⁵ SUPERSCRIPT FIVE -> 5
+    fallback_map[0x2076] = ord('6')  # ⁶ SUPERSCRIPT SIX -> 6
+    fallback_map[0x2077] = ord('7')  # ⁷ SUPERSCRIPT SEVEN -> 7
+    fallback_map[0x2078] = ord('8')  # ⁸ SUPERSCRIPT EIGHT -> 8
+    fallback_map[0x2079] = ord('9')  # ⁹ SUPERSCRIPT NINE -> 9
+    fallback_map[0x2080] = ord('0')  # ₀ SUBSCRIPT ZERO -> 0
+    fallback_map[0x2081] = ord('1')  # ₁ SUBSCRIPT ONE -> 1
+    fallback_map[0x2082] = ord('2')  # ₂ SUBSCRIPT TWO -> 2
+    fallback_map[0x2083] = ord('3')  # ₃ SUBSCRIPT THREE -> 3
+    fallback_map[0x2084] = ord('4')  # ₄ SUBSCRIPT FOUR -> 4
+    fallback_map[0x2085] = ord('5')  # ₅ SUBSCRIPT FIVE -> 5
+    fallback_map[0x2086] = ord('6')  # ₆ SUBSCRIPT SIX -> 6
+    fallback_map[0x2087] = ord('7')  # ₇ SUBSCRIPT SEVEN -> 7
+    fallback_map[0x2088] = ord('8')  # ₈ SUBSCRIPT EIGHT -> 8
+    fallback_map[0x2089] = ord('9')  # ₉ SUBSCRIPT NINE -> 9
+
+    # Common Greek letters used in math/science
+    fallback_map[0x03B1] = ord('a')  # α GREEK SMALL LETTER ALPHA -> a
+    fallback_map[0x03B2] = ord('B')  # β GREEK SMALL LETTER BETA -> B
+    fallback_map[0x03B3] = ord('y')  # γ GREEK SMALL LETTER GAMMA -> y
+    fallback_map[0x0393] = ord('I')  # Γ GREEK CAPITAL LETTER GAMMA -> I
+    fallback_map[0x03B4] = ord('d')  # δ GREEK SMALL LETTER DELTA -> d
+    fallback_map[0x0394] = ord('A')  # Δ GREEK CAPITAL LETTER DELTA -> A
+    fallback_map[0x03B5] = ord('e')  # ε GREEK SMALL LETTER EPSILON -> e
+    fallback_map[0x03B6] = ord('z')  # ζ GREEK SMALL LETTER ZETA -> z
+    fallback_map[0x03B7] = ord('n')  # η GREEK SMALL LETTER ETA -> n
+    fallback_map[0x03B8] = ord('0')  # θ GREEK SMALL LETTER THETA -> 0
+    fallback_map[0x0398] = ord('O')  # Θ GREEK CAPITAL LETTER THETA -> O
+    fallback_map[0x03BB] = ord('l')  # λ GREEK SMALL LETTER LAMBDA -> l
+    fallback_map[0x039B] = ord('A')  # Λ GREEK CAPITAL LETTER LAMBDA -> A
+    fallback_map[0x03BC] = ord('u')  # μ GREEK SMALL LETTER MU -> u
+    fallback_map[0x03C0] = ord('n')  # π GREEK SMALL LETTER PI -> n
+    fallback_map[0x03A0] = ord('n')  # Π GREEK CAPITAL LETTER PI -> n
+    fallback_map[0x03C1] = ord('p')  # ρ GREEK SMALL LETTER RHO -> p
+    fallback_map[0x03C3] = ord('o')  # σ GREEK SMALL LETTER SIGMA -> o
+    fallback_map[0x03A3] = ord('E')  # Σ GREEK CAPITAL LETTER SIGMA -> E
+    fallback_map[0x03C4] = ord('t')  # τ GREEK SMALL LETTER TAU -> t
+    fallback_map[0x03C6] = ord('f')  # φ GREEK SMALL LETTER PHI -> f
+    fallback_map[0x03A6] = ord('O')  # Φ GREEK CAPITAL LETTER PHI -> O
+    fallback_map[0x03C7] = ord('X')  # χ GREEK SMALL LETTER CHI -> X
+    fallback_map[0x03C8] = ord('w')  # ψ GREEK SMALL LETTER PSI -> w
+    fallback_map[0x03A8] = ord('Y')  # Ψ GREEK CAPITAL LETTER PSI -> Y
+    fallback_map[0x03C9] = ord('w')  # ω GREEK SMALL LETTER OMEGA -> w
+    fallback_map[0x03A9] = ord('O')  # Ω GREEK CAPITAL LETTER OMEGA -> O
+
+    # Additional punctuation
+    fallback_map[0x2018] = ord('\'')  # ‘ LEFT SINGLE QUOTATION MARK -> '
+    fallback_map[0x2019] = ord('\'')  # ’ RIGHT SINGLE QUOTATION MARK -> '
+    fallback_map[0x201A] = ord(',')  # ‚ SINGLE LOW-9 QUOTATION MARK -> ,
+    fallback_map[0x201B] = ord('\'')  # ‛ SINGLE HIGH-REVERSED-9 QUOTATION MARK -> '
+    fallback_map[0x201C] = ord('"')  # “ LEFT DOUBLE QUOTATION MARK -> "
+    fallback_map[0x201D] = ord('"')  # ” RIGHT DOUBLE QUOTATION MARK -> "
+    fallback_map[0x201E] = ord('"')  # „ DOUBLE LOW-9 QUOTATION MARK -> "
+    fallback_map[0x201F] = ord('"')  # ‟ DOUBLE HIGH-REVERSED-9 QUOTATION MARK -> "
+    fallback_map[0x2026] = ord('.')  # … HORIZONTAL ELLIPSIS -> .
+    fallback_map[0x2039] = ord('<')  # ‹ SINGLE LEFT-POINTING ANGLE QUOTATION MARK -> <
+    fallback_map[0x203A] = ord('>')  # › SINGLE RIGHT-POINTING ANGLE QUOTATION MARK -> >
+    fallback_map[0x00AB] = ord('<')  # « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK -> <
+    fallback_map[0x00BB] = ord('>')  # » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK -> >
+
+    # Various dashes and hyphens - all map to ASCII hyphen-minus
+    for cp in range(0x2010, 0x2015+1):
+        fallback_map[cp] = ord('-')  # All forms of hyphens and dashes -> -
+    fallback_map[0x2043] = ord('-')  # ⁃ HYPHEN BULLET -> -
+    fallback_map[0x2052] = ord('-')  # ⁒ COMMERCIAL MINUS SIGN -> -
+
+    # Other punctuation and symbols
+    fallback_map[0x2023] = ord('>')  # ‣ TRIANGULAR BULLET -> >
+    fallback_map[0x2027] = ord('.')  # ‧ HYPHENATION POINT -> .
+    fallback_map[0x2032] = ord('\'')  # ′ PRIME -> '
+    fallback_map[0x2033] = ord('"')  # ″ DOUBLE PRIME -> "
+    # Note: Triple prime (U+2034) intentionally omitted to avoid misleading fallback
+    fallback_map[0x203B] = ord('*')  # ※ REFERENCE MARK -> *
+    fallback_map[0x203C] = ord('!')  # ‼ DOUBLE EXCLAMATION MARK -> !
+    fallback_map[0x203D] = ord('?')  # ‽ INTERROBANG -> ?
+    fallback_map[0x2044] = ord('/')  # ⁄ FRACTION SLASH -> /
+    fallback_map[0x2047] = ord('?')  # ⁇ DOUBLE QUESTION MARK -> ?
+    fallback_map[0x2048] = ord('?')  # ⁈ QUESTION EXCLAMATION MARK -> ?
+    fallback_map[0x2049] = ord('!')  # ⁉ EXCLAMATION QUESTION MARK -> !
+    fallback_map[0x204A] = ord('&')  # ⁊ TIRONIAN SIGN ET -> &
+    fallback_map[0x204B] = ord('P')  # ⁋ REVERSED PILCROW SIGN -> P
+    fallback_map[0x204C] = ord('<')  # ⁌ BLACK LEFTWARDS BULLET -> <
+    fallback_map[0x204D] = ord('>')  # ⁍ BLACK RIGHTWARDS BULLET -> >
+    fallback_map[0x204E] = ord('*')  # ⁎ LOW ASTERISK -> *
+    fallback_map[0x204F] = ord(';')  # ⁏ REVERSED SEMICOLON -> ;
+    fallback_map[0x2053] = ord('~')  # ⁓ SWUNG DASH -> ~
+    fallback_map[0x2055] = ord('*')  # ⁕ FLOWER PUNCTUATION MARK -> *
+    fallback_map[0x205B] = ord(':')  # ⁛ FOUR DOT MARK -> :
+
+    # Precomposed negated symbols require special handling. Standard Unicode
+    # decomposition would strip the negation and preserve only the base symbol,
+    # which would reverse the intended meaning (e.g., "not equal" would become
+    # "equal"). This could lead to confusion or errors when reading text with
+    # fallback characters. Let's override decomposition by providing explicit
+    # mappings in a best effort at avoiding misleading interpretations.
+
+    # Negated mathematical operators
+    fallback_map[0x2204] = ord('!')  # ∄ THERE DOES NOT EXIST -> !
+    fallback_map[0x2209] = ord('!')  # ∉ NOT AN ELEMENT OF -> !
+    fallback_map[0x220C] = ord('!')  # ∌ DOES NOT CONTAIN AS MEMBER -> !
+    fallback_map[0x2224] = ord('!')  # ∤ DOES NOT DIVIDE -> !
+    fallback_map[0x2226] = ord('!')  # ∦ NOT PARALLEL TO -> !
+    fallback_map[0x2241] = ord('#')  # ≁ NOT TILDE -> # (better than ~)
+    fallback_map[0x2244] = ord('#')  # ≄ NOT ASYMPTOTICALLY EQUAL TO -> # (better than ~)
+    fallback_map[0x2249] = ord('#')  # ≉ NOT ALMOST EQUAL TO -> # (better than ~)
+    fallback_map[0x2260] = ord('#')  # ≠ NOT EQUAL TO -> # (better than =)
+    fallback_map[0x2262] = ord('#')  # ≢ NOT IDENTICAL TO -> # (better than =)
+    fallback_map[0x2268] = ord('#')  # ≨ LESS-THAN BUT NOT EQUAL TO -> # (better than <)
+    fallback_map[0x2269] = ord('#')  # ≩ GREATER-THAN BUT NOT EQUAL TO -> # (better than >)
+    fallback_map[0x226D] = ord('#')  # ≭ NOT EQUIVALENT TO -> # (better than =)
+    fallback_map[0x226E] = ord('!')  # ≮ NOT LESS-THAN -> ! (better than <)
+    fallback_map[0x226F] = ord('!')  # ≯ NOT GREATER-THAN -> ! (better than >)
+
+    # Negated set operators
+    fallback_map[0x2280] = ord('!')  # ⊀ DOES NOT PRECEDE -> !
+    fallback_map[0x2281] = ord('!')  # ⊁ DOES NOT SUCCEED -> !
+    fallback_map[0x2284] = ord('!')  # ⊄ NOT A SUBSET OF -> ! (better than c)
+    fallback_map[0x2285] = ord('!')  # ⊅ NOT A SUPERSET OF -> ! (better than C)
+    fallback_map[0x228A] = ord('#')  # ⊊ SUBSET OF WITH NOT EQUAL TO -> #
+    fallback_map[0x228B] = ord('#')  # ⊋ SUPERSET OF WITH NOT EQUAL TO -> #
+
+    # Negated logical operators
+    fallback_map[0x22AC] = ord('!')  # ⊬ DOES NOT PROVE -> !
+    fallback_map[0x22AD] = ord('!')  # ⊭ NOT TRUE -> !
+    fallback_map[0x22AE] = ord('!')  # ⊮ DOES NOT FORCE -> !
+    fallback_map[0x22E0] = ord('!')  # ⋠ DOES NOT PRECEDE OR EQUAL -> !
+    fallback_map[0x22E1] = ord('!')  # ⋡ DOES NOT SUCCEED OR EQUAL -> !
+    fallback_map[0x22EA] = ord('!')  # ⋪ NOT NORMAL SUBGROUP OF -> !
+    fallback_map[0x22EB] = ord('!')  # ⋫ DOES NOT CONTAIN AS NORMAL SUBGROUP -> !
+
+    # Negated arrows
+    fallback_map[0x219A] = ord('!')  # ↚ LEFTWARDS ARROW WITH STROKE -> !
+    fallback_map[0x219B] = ord('!')  # ↛ RIGHTWARDS ARROW WITH STROKE -> !
+    fallback_map[0x21AE] = ord('!')  # ↮ LEFT RIGHT ARROW WITH STROKE -> !
+    fallback_map[0x21CD] = ord('!')  # ⇍ LEFTWARDS DOUBLE ARROW WITH STROKE -> !
+    fallback_map[0x21CE] = ord('!')  # ⇎ LEFT RIGHT DOUBLE ARROW WITH STROKE -> !
+    fallback_map[0x21CF] = ord('!')  # ⇏ RIGHTWARDS DOUBLE ARROW WITH STROKE -> !
+
+    # Bullets and geometric shapes
+    # • ◦ ○ ◎ ● ◆ ■ □ ▲ △ ▼ ▽
+    fallback_map[0x2022] = ord('*')  # • Bullet
+    fallback_map[0x25E6] = ord('o')  # ◦ White bullet
+    fallback_map[0x25CB] = ord('o')  # ○ White circle
+    fallback_map[0x25CE] = ord('o')  # ◎ Bullseye
+    fallback_map[0x25CF] = ord('*')  # ● Black circle
+    fallback_map[0x25C6] = ord('*')  # ◆ Black diamond
+    fallback_map[0x25A0] = ord('#')  # ■ Black square
+    fallback_map[0x25A1] = ord('o')  # □ White square
+    fallback_map[0x25B2] = ord('^')  # ▲ Black up-pointing triangle
+    fallback_map[0x25B3] = ord('^')  # △ White up-pointing triangle
+    fallback_map[0x25BC] = ord('v')  # ▼ Black down-pointing triangle
+    fallback_map[0x25BD] = ord('v')  # ▽ White down-pointing triangle
+
+    # Middle dot and other punctuation
+    fallback_map[0x00B7] = ord('.')  # · MIDDLE DOT
+    fallback_map[0x0387] = ord('.')  # · GREEK ANO TELEIA (identical to middle dot)
+    fallback_map[0x2027] = ord('.')  # ‧ HYPHENATION POINT
+    fallback_map[0x2219] = ord('.')  # ∙ BULLET OPERATOR
+    fallback_map[0x22C5] = ord('.')  # ⋅ DOT OPERATOR
+    fallback_map[0x00A1] = ord('!')  # ¡ INVERTED EXCLAMATION MARK
+    fallback_map[0x00BF] = ord('?')  # ¿ INVERTED QUESTION MARK
+    fallback_map[0x203D] = ord('?')  # ‽ INTERROBANG
+
+    # Note: Vulgar fractions (like ½, ¼, etc.) are intentionally not mapped to ASCII.
+    # Using just the numerator (e.g., ½ → 1) would be misleading, and there's no good
+    # single-character ASCII representation for fractions.
+
+    # Check marks and X marks
+    fallback_map[0x2713] = ord('v')  # ✓ CHECK MARK
+    fallback_map[0x2714] = ord('V')  # ✔ HEAVY CHECK MARK
+    fallback_map[0x2715] = ord('x')  # ✕ MULTIPLICATION X
+    fallback_map[0x2716] = ord('X')  # ✖ HEAVY MULTIPLICATION X
+    fallback_map[0x2717] = ord('x')  # ✗ BALLOT X
+    fallback_map[0x2718] = ord('X')  # ✘ HEAVY BALLOT X
+
+    # Asterism and asterisk variants
+    fallback_map[0x2042] = ord('*')  # ⁂ ASTERISM
+    fallback_map[0x204E] = ord('*')  # ⁎ LOW ASTERISK
+    fallback_map[0x2051] = ord('*')  # ⁑ TWO ASTERISKS ALIGNED VERTICALLY
+    fallback_map[0x2055] = ord('*')  # ⁕ FLOWER PUNCTUATION MARK
+    fallback_map[0x2217] = ord('*')  # ∗ ASTERISK OPERATOR
+    fallback_map[0x229B] = ord('*')  # ⊛ CIRCLED ASTERISK OPERATOR
+    fallback_map[0x22C6] = ord('*')  # ⋆ STAR OPERATOR
+    fallback_map[0x235F] = ord('*')  # ⍟ APL FUNCTIONAL SYMBOL CIRCLE STAR
+    fallback_map[0x2363] = ord('*')  # ⍣ APL FUNCTIONAL SYMBOL STAR DIAERESIS
+
+    # Stars
+    fallback_map[0x2605] = ord('*')  # ★ BLACK STAR
+    fallback_map[0x2606] = ord('*')  # ☆ WHITE STAR
+    fallback_map[0x262A] = ord('*')  # ☪ STAR AND CRESCENT
+    fallback_map[0x269D] = ord('*')  # ⚝ OUTLINED WHITE STAR
+    fallback_map[0x2721] = ord('*')  # ✡ STAR OF DAVID
+    fallback_map[0x2726] = ord('*')  # ✦ BLACK FOUR POINTED STAR
+    fallback_map[0x2727] = ord('*')  # ✧ WHITE FOUR POINTED STAR
+    fallback_map[0x2729] = ord('*')  # ✩ STRESS OUTLINED WHITE STAR
+    fallback_map[0x272A] = ord('*')  # ✪ CIRCLED WHITE STAR
+    fallback_map[0x272B] = ord('*')  # ✫ OPEN CENTRE BLACK STAR
+    fallback_map[0x272C] = ord('*')  # ✬ BLACK CENTRE WHITE STAR
+    fallback_map[0x272D] = ord('*')  # ✭ OUTLINED BLACK STAR
+    fallback_map[0x272E] = ord('*')  # ✮ HEAVY OUTLINED BLACK STAR
+    fallback_map[0x272F] = ord('*')  # ✯ PINWHEEL STAR
+    fallback_map[0x2730] = ord('*')  # ✰ SHADOWED WHITE STAR
+    fallback_map[0x2734] = ord('*')  # ✴ EIGHT POINTED BLACK STAR
+    fallback_map[0x2735] = ord('*')  # ✵ EIGHT POINTED PINWHEEL STAR
+    fallback_map[0x2736] = ord('*')  # ✶ SIX POINTED BLACK STAR
+    fallback_map[0x2737] = ord('*')  # ✷ EIGHT POINTED RECTILINEAR BLACK STAR
+    fallback_map[0x2738] = ord('*')  # ✸ HEAVY EIGHT POINTED RECTILINEAR BLACK STAR
+    fallback_map[0x2739] = ord('*')  # ✹ TWELVE POINTED BLACK STAR
+
+    # Asterisk variants
+    fallback_map[0x273A] = ord('*')  # ✺ SIXTEEN POINTED ASTERISK
+    fallback_map[0x273B] = ord('*')  # ✻ TEARDROP-SPOKED ASTERISK
+    fallback_map[0x273C] = ord('*')  # ✼ OPEN CENTRE TEARDROP-SPOKED ASTERISK
+    fallback_map[0x273D] = ord('*')  # ✽ HEAVY TEARDROP-SPOKED ASTERISK
+    fallback_map[0x2722] = ord('*')  # ✢ FOUR TEARDROP-SPOKED ASTERISK
+    fallback_map[0x2723] = ord('*')  # ✣ FOUR BALLOON-SPOKED ASTERISK
+    fallback_map[0x2724] = ord('*')  # ✤ HEAVY FOUR BALLOON-SPOKED ASTERISK
+    fallback_map[0x2725] = ord('*')  # ✥ FOUR CLUB-SPOKED ASTERISK
+    fallback_map[0x2731] = ord('*')  # ✱ HEAVY ASTERISK
+    fallback_map[0x2732] = ord('*')  # ✲ OPEN CENTRE ASTERISK
+    fallback_map[0x2733] = ord('*')  # ✳ EIGHT SPOKED ASTERISK
+    fallback_map[0x2749] = ord('*')  # ❉ BALLOON-SPOKED ASTERISK
+    fallback_map[0x274A] = ord('*')  # ❊ EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
+    fallback_map[0x274B] = ord('*')  # ❋ HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
+    fallback_map[0x2743] = ord('*')  # ❃ HEAVY TEARDROP-SPOKED PINWHEEL ASTERISK
+
+    # Florettes and snowflakes
+    fallback_map[0x273E] = ord('*')  # ✾ SIX PETALLED BLACK AND WHITE FLORETTE
+    fallback_map[0x273F] = ord('*')  # ✿ BLACK FLORETTE
+    fallback_map[0x2740] = ord('*')  # ❀ WHITE FLORETTE
+    fallback_map[0x2741] = ord('*')  # ❁ EIGHT PETALLED OUTLINED BLACK FLORETTE
+    fallback_map[0x2742] = ord('*')  # ❂ CIRCLED OPEN CENTRE EIGHT POINTED STAR
+    fallback_map[0x2744] = ord('*')  # ❄ SNOWFLAKE
+    fallback_map[0x2745] = ord('*')  # ❅ TIGHT TRIFOLIATE SNOWFLAKE
+    fallback_map[0x2746] = ord('*')  # ❆ HEAVY CHEVRON SNOWFLAKE
+    fallback_map[0x2698] = ord('*')  # ⚘ FLOWER
+
+    # Add special ASCII characters with full-width equivalents
+    # Map between full-width and ASCII forms
+    for i, cp in enumerate(range(0xFF01, 0xFF5E+1)):
+        # Full-width to ASCII mapping (covering all printable ASCII 33-126)
+        # 0xFF01 (!) to 0xFF5E (~) -> ASCII 33 (!) to 126 (~)
+        fallback_map[cp] = 33 + i
+
+    return fallback_map
+
+def collect_decomposition_pairs():
+    """Collect all possible decomposition pairs from the Unicode data."""
+    # Map to store decomposition pairs: composite -> base
+    fallback_map = {}
+
+    # Process all assigned Unicode code points in BMP (Basic Multilingual Plane)
+    # We limit to BMP (0x0000-0xFFFF) to keep our table smaller with uint16_t
+    for cp in range(0, 0x10000):
+        try:
+            char = chr(cp)
+
+            # Skip unassigned or control characters
+            if not unicodedata.name(char, ''):
+                continue
+
+            # Find decomposition
+            decomp = unicodedata.decomposition(char)
+            if not decomp or '<' in decomp:  # Skip compatibility decompositions
+                continue
+
+            # Parse the decomposition
+            parts = decomp.split()
+            if len(parts) == 2:  # Simple base + combining mark
+                base = int(parts[0], 16)
+                combining = int(parts[1], 16)
+
+                # Only store if base is in BMP
+                if base < 0x10000:
+                    fallback_map[cp] = base
+
+        except (ValueError, TypeError):
+            continue
+
+    return fallback_map
+
+def create_hybrid_tables(fallback_map):
+    """
+    Create optimized hybrid tables for fallback characters.
+
+    Args:
+        fallback_map: The original mapping of complex characters to base characters
+
+    Returns:
+        A tuple of (intervals, singles, dropped) where:
+        - intervals: List of (first, last, fallback) tuples
+        - singles: List of (codepoint, fallback) tuples
+        - dropped: List of (codepoint, fallback) tuples
+    """
+
+    # Create a map with fallbacks that fit in a single byte (≤ 0xFF)
+    # following fallback chains until a suitable byte-sized fallback is found.
+    # Using byte-sized fallbacks saves table space and Latin-1 glyphs are
+    # more likely to exist. Runtime code may recurse further if necessary.
+    byte_fallback_map = {}
+    dropped = []
+    for complex_char, fallback in fallback_map.items():
+        while fallback > 0xFF and fallback in fallback_map:
+            fallback = fallback_map[fallback]
+        if fallback <= 0xFF:
+            byte_fallback_map[complex_char] = fallback
+        else:
+            dropped.append((complex_char, fallback))
+
+    # Group characters by their base character
+    base_groups = defaultdict(list)
+    for complex_char, base in byte_fallback_map.items():
+        base_groups[base].append(complex_char)
+
+    # Sort complex characters in each group
+    for base in base_groups:
+        base_groups[base].sort()
+
+    # Create interval tables and single-entry tables
+    intervals = []
+    singles = []
+
+    for base, complex_char_list in base_groups.items():
+        # Identify continuous ranges
+        ranges = []
+        current_range = [complex_char_list[0], complex_char_list[0]]
+
+        for i in range(1, len(complex_char_list)):
+            if complex_char_list[i] == current_range[1] + 1:
+                # Extend current range
+                current_range[1] = complex_char_list[i]
+            else:
+                # Finish current range and start a new one
+                ranges.append(tuple(current_range))
+                current_range = [complex_char_list[i], complex_char_list[i]]
+
+        # Add the last range
+        ranges.append(tuple(current_range))
+
+        # Add to appropriate table
+        for first, last in ranges:
+            if first == last:
+                # Single entry
+                singles.append((first, base))
+            else:
+                # Range
+                intervals.append((first, last, base))
+
+    # Sort tables by first code point for binary search
+    intervals.sort()
+    singles.sort()
+
+    return intervals, singles, dropped
+
+def cp_name(cp):
+    try:
+        return unicodedata.name(chr(cp))
+    except ValueError:
+        return f"U+{cp:04X}"
+
+def generate_fallback_tables(out_file=DEFAULT_OUT_FILE):
+    """Generate the fallback character tables."""
+    # Collect standard decomposition pairs
+    decomposition_map = collect_decomposition_pairs()
+    print(f"Collected {len(decomposition_map)} standard decomposition pairs")
+
+    # Collect composed Latin letters
+    latin_map = collect_accented_latin_letters()
+    print(f"Collected {len(latin_map)} already composed Latin letter mappings")
+
+    # Collect drawing character mappings
+    drawing_map = collect_drawing_character_mappings()
+    print(f"Collected {len(drawing_map)} drawing character mappings")
+
+    # Combine maps - prioritize explicit mappings over decomposition
+    # This ensures that composed characters can be handled directly even if
+    # decomposition would also work
+    fallback_map = {**decomposition_map, **latin_map, **drawing_map}
+    print(f"Combined into {len(fallback_map)} total mappings")
+
+    # Create hybrid tables with fallback values limited to 1 byte (0xFF)
+    intervals, singles, dropped = create_hybrid_tables(fallback_map)
+    print(f"Dropped {len(dropped)} mappings whose fallback was larger than a byte")
+    print(f"Created {len(intervals)} intervals and {len(singles)} single entries")
+
+    # Generate C tables
+    with open(out_file, 'w') as f:
+        f.write(f"""\
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * {out_file} - Unicode character fallback table for display simplification
+ *
+ * Auto-generated by {this_file}
+ *
+ * Unicode Version: {unicodedata.unidata_version}
+ *
+ * This file contains tables that map complex Unicode characters to simpler
+ * fallback characters for terminal display when corresponding glyphs are
+ * unavailable.
+ */
+
+static const struct ucs_interval16 ucs_fallback_intervals[] = {{
+""")
+
+        # Write interval table
+        for first, last, fallback in intervals:
+            comment = f"/* {cp_name(first)} - {cp_name(last)} -> {cp_name(fallback)} */"
+            f.write(f"\t{{ 0x{first:04X}, 0x{last:04X}, }}, {comment}\n")
+
+        f.write("""\
+};
+
+static const u8 ucs_fallback_intervals_subs[] = {
+""")
+
+        # Write interval fallback character table
+        for first, last, fallback in intervals:
+            comment = f"/* {cp_name(first)} - {cp_name(last)} -> {cp_name(fallback)} */"
+            f.write(f"\t0x{fallback:02X}, {comment}\n")
+
+        f.write("""\
+};
+
+static const u16 ucs_fallback_singles[] = {
+""")
+
+        # Write single entry table
+        for codepoint, fallback in singles:
+            comment = f"/* {cp_name(codepoint)} -> {cp_name(fallback)} */"
+            f.write(f"\t0x{codepoint:04X}, {comment}\n")
+
+        f.write("""\
+};
+
+static const u8 ucs_fallback_singles_subs[] = {
+""")
+
+        # Write single fallback character table
+        for codepoint, fallback in singles:
+            comment = f"/* {cp_name(codepoint)} -> {cp_name(fallback)} */"
+            f.write(f"\t0x{fallback:02X}, {comment}\n")
+
+        f.write("};\n")
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Generate Unicode fallback character tables")
+    parser.add_argument("-o", "--output", dest="output_file", default=DEFAULT_OUT_FILE,
+                       help=f"Output file name (default: {DEFAULT_OUT_FILE})")
+    args = parser.parse_args()
+
+    generate_fallback_tables(out_file=args.output_file)
-- 
2.49.0



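The interval/singles split performed by create_hybrid_tables() can be illustrated in isolation with the minimal sketch below. It is a simplified illustration, not the generator itself: it assumes every fallback is already a byte-sized value, so the fallback-chain following and the `dropped` list are left out, and the helper name split_into_intervals() is hypothetical. The demo mappings mirror the dash and quote entries defined in the script.

```python
from collections import defaultdict


def split_into_intervals(fallback_map):
    """Hypothetical helper: split {codepoint: fallback} into intervals and singles.

    Code points sharing a fallback are grouped, then each run of consecutive
    code points becomes a (first, last, fallback) interval, while isolated
    code points become (codepoint, fallback) singles. Both lists are returned
    sorted by code point so a consumer can binary-search them.
    """
    by_fallback = defaultdict(list)
    for cp, fallback in fallback_map.items():
        by_fallback[fallback].append(cp)

    intervals, singles = [], []
    for fallback, cps in by_fallback.items():
        cps.sort()
        start = prev = cps[0]
        # None is a sentinel that forces the final run to be flushed.
        for cp in cps[1:] + [None]:
            if cp is not None and cp == prev + 1:
                prev = cp  # still consecutive: extend the current run
                continue
            if start == prev:
                singles.append((start, fallback))
            else:
                intervals.append((start, prev, fallback))
            if cp is not None:
                start = prev = cp  # begin a new run
    intervals.sort()
    singles.sort()
    return intervals, singles


if __name__ == "__main__":
    demo = {cp: ord('-') for cp in range(0x2010, 0x2015 + 1)}  # hyphens and dashes
    demo.update({0x2043: ord('-'), 0x2052: ord('-')})          # isolated dash-like signs
    demo.update({0x2018: ord("'"), 0x2019: ord("'")})          # single quotation marks
    intervals, singles = split_into_intervals(demo)
    assert intervals == [(0x2010, 0x2015, 0x2D), (0x2018, 0x2019, 0x27)]
    assert singles == [(0x2043, 0x2D), (0x2052, 0x2D)]
```

Grouping by fallback first is what lets a run such as U+2010..U+2015 collapse into one interval entry, matching the { 0x2010, 0x2015, } row of the generated header; a code point whose neighbours map elsewhere (U+2043 HYPHEN BULLET in this demo) has to go into the singles table.
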
* [PATCH 5/8] vt: create ucs_fallback_table.h_shipped with gen_ucs_fallback_table.py
  2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
                   ` (3 preceding siblings ...)
  2025-05-05 16:55 ` [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create ucs_fallback_table.h Nicolas Pitre
@ 2025-05-05 16:55 ` Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 6/8] vt: add ucs_get_fallback() Nicolas Pitre
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

From: Nicolas Pitre <npitre@baylibre.com>

The generated table maps complex characters to their simpler fallback
forms for terminal display when the corresponding glyphs are unavailable.
Fallback characters are limited to 8-bit Latin-1 values and are stored
in separate arrays, parallel to the code point tables, to save space.
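
As an illustration of that layout, here is a minimal Python sketch of a lookup over four parallel arrays: the interval or single code point at index i has its Latin-1 substitute at the same index i in the matching *_subs array. This is not the kernel code (the in-kernel lookup, ucs_get_fallback(), comes in a later patch of this series and is not reproduced here). The Python names are hypothetical stand-ins for ucs_fallback_intervals[], ucs_fallback_intervals_subs[], ucs_fallback_singles[] and ucs_fallback_singles_subs[], and the singles entries are assumed examples.

```python
from bisect import bisect_left, bisect_right

# Illustrative excerpt of the four parallel arrays, as Python lists.
# The interval rows and their substitutes follow the generated header;
# the singles are assumed examples modeled on the script's explicit
# mappings (COPYRIGHT SIGN -> C, REGISTERED SIGN -> R, HYPHEN BULLET -> -).
INTERVALS = [(0x00C0, 0x00C5), (0x2010, 0x2015), (0x2018, 0x2019)]
INTERVALS_SUBS = [0x41, 0x2D, 0x27]      # 'A', '-', "'"
SINGLES = [0x00A9, 0x00AE, 0x2043]
SINGLES_SUBS = [0x43, 0x52, 0x2D]        # 'C', 'R', '-'

_FIRSTS = [first for first, _ in INTERVALS]


def lookup_fallback(cp):
    """Return the byte-sized substitute for cp, or None if it has none."""
    # Intervals are sorted and non-overlapping: only the last interval
    # whose first code point is <= cp can contain cp.
    i = bisect_right(_FIRSTS, cp) - 1
    if i >= 0 and cp <= INTERVALS[i][1]:
        return INTERVALS_SUBS[i]
    # Singles need an exact match; the same index selects the substitute.
    j = bisect_left(SINGLES, cp)
    if j < len(SINGLES) and SINGLES[j] == cp:
        return SINGLES_SUBS[j]
    return None


if __name__ == "__main__":
    assert lookup_fallback(0x00C5) == ord('A')  # A WITH RING ABOVE
    assert lookup_fallback(0x2014) == ord('-')  # EM DASH
    assert lookup_fallback(0x00AE) == ord('R')  # REGISTERED SIGN
    assert lookup_fallback(0x2016) is None      # not in this excerpt
```

The space saving comes from avoiding struct padding: on common ABIs a struct holding two u16 fields plus a u8 is padded to 6 bytes, whereas the split layout costs 4 + 1 = 5 bytes per interval, and 2 + 1 = 3 bytes per single entry instead of a padded 4.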

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
---
 drivers/tty/vt/.gitignore                   |    1 +
 drivers/tty/vt/Makefile                     |    5 +-
 drivers/tty/vt/ucs_fallback_table.h_shipped | 1686 +++++++++++++++++++
 3 files changed, 1691 insertions(+), 1 deletion(-)
 create mode 100644 drivers/tty/vt/ucs_fallback_table.h_shipped

diff --git a/drivers/tty/vt/.gitignore b/drivers/tty/vt/.gitignore
index 49ce44edad65..a74859bab862 100644
--- a/drivers/tty/vt/.gitignore
+++ b/drivers/tty/vt/.gitignore
@@ -2,5 +2,6 @@
 /conmakehash
 /consolemap_deftbl.c
 /defkeymap.c
+/ucs_fallback_table.h
 /ucs_recompose_table.h
 /ucs_width_table.h
diff --git a/drivers/tty/vt/Makefile b/drivers/tty/vt/Makefile
index 8ba33cc942c7..509362a3e11e 100644
--- a/drivers/tty/vt/Makefile
+++ b/drivers/tty/vt/Makefile
@@ -12,7 +12,7 @@ obj-$(CONFIG_CONSOLE_TRANSLATIONS)	+= consolemap.o consolemap_deftbl.o \
 
 # Files generated that shall be removed upon make clean
 clean-files :=	consolemap_deftbl.c defkeymap.c \
-		ucs_width_table.h ucs_recompose_table.h
+		ucs_width_table.h ucs_recompose_table.h ucs_fallback_table.h
 
 hostprogs += conmakehash
 
@@ -58,4 +58,7 @@ endif
 $(obj)/ucs_recompose_table.h: $(src)/gen_ucs_recompose_table.py
 	$(PYTHON3) $< -o $@ $(gen_recomp_arg)
 
+$(obj)/ucs_fallback_table.h: $(src)/gen_ucs_fallback_table.py
+	$(PYTHON3) $< -o $@
+
 endif
diff --git a/drivers/tty/vt/ucs_fallback_table.h_shipped b/drivers/tty/vt/ucs_fallback_table.h_shipped
new file mode 100644
index 000000000000..d528d500ec9d
--- /dev/null
+++ b/drivers/tty/vt/ucs_fallback_table.h_shipped
@@ -0,0 +1,1686 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * ucs_fallback_table.h - Unicode character fallback table for display simplification
+ *
+ * Auto-generated by gen_ucs_fallback_table.py
+ *
+ * Unicode Version: 16.0.0
+ *
+ * This file contains tables that map complex Unicode characters to simpler
+ * fallback characters for terminal display when corresponding glyphs are
+ * unavailable.
+ */
+
+static const struct ucs_interval16 ucs_fallback_intervals[] = {
+	{ 0x00C0, 0x00C5, }, /* LATIN CAPITAL LETTER A WITH GRAVE - LATIN CAPITAL LETTER A WITH RING ABOVE -> LATIN CAPITAL LETTER A */
+	{ 0x00C8, 0x00CB, }, /* LATIN CAPITAL LETTER E WITH GRAVE - LATIN CAPITAL LETTER E WITH DIAERESIS -> LATIN CAPITAL LETTER E */
+	{ 0x00CC, 0x00CF, }, /* LATIN CAPITAL LETTER I WITH GRAVE - LATIN CAPITAL LETTER I WITH DIAERESIS -> LATIN CAPITAL LETTER I */
+	{ 0x00D2, 0x00D6, }, /* LATIN CAPITAL LETTER O WITH GRAVE - LATIN CAPITAL LETTER O WITH DIAERESIS -> LATIN CAPITAL LETTER O */
+	{ 0x00D9, 0x00DC, }, /* LATIN CAPITAL LETTER U WITH GRAVE - LATIN CAPITAL LETTER U WITH DIAERESIS -> LATIN CAPITAL LETTER U */
+	{ 0x00E0, 0x00E5, }, /* LATIN SMALL LETTER A WITH GRAVE - LATIN SMALL LETTER A WITH RING ABOVE -> LATIN SMALL LETTER A */
+	{ 0x00E8, 0x00EB, }, /* LATIN SMALL LETTER E WITH GRAVE - LATIN SMALL LETTER E WITH DIAERESIS -> LATIN SMALL LETTER E */
+	{ 0x00EC, 0x00EF, }, /* LATIN SMALL LETTER I WITH GRAVE - LATIN SMALL LETTER I WITH DIAERESIS -> LATIN SMALL LETTER I */
+	{ 0x00F2, 0x00F6, }, /* LATIN SMALL LETTER O WITH GRAVE - LATIN SMALL LETTER O WITH DIAERESIS -> LATIN SMALL LETTER O */
+	{ 0x00F9, 0x00FC, }, /* LATIN SMALL LETTER U WITH GRAVE - LATIN SMALL LETTER U WITH DIAERESIS -> LATIN SMALL LETTER U */
+	{ 0x03C8, 0x03C9, }, /* GREEK SMALL LETTER PSI - GREEK SMALL LETTER OMEGA -> LATIN SMALL LETTER W */
+	{ 0x1F00, 0x1F07, }, /* GREEK SMALL LETTER ALPHA WITH PSILI - GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI -> LATIN SMALL LETTER A */
+	{ 0x1F10, 0x1F15, }, /* GREEK SMALL LETTER EPSILON WITH PSILI - GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA -> LATIN SMALL LETTER E */
+	{ 0x1F20, 0x1F27, }, /* GREEK SMALL LETTER ETA WITH PSILI - GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI -> LATIN SMALL LETTER N */
+	{ 0x1F60, 0x1F67, }, /* GREEK SMALL LETTER OMEGA WITH PSILI - GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI -> LATIN SMALL LETTER W */
+	{ 0x1F68, 0x1F6F, }, /* GREEK CAPITAL LETTER OMEGA WITH PSILI - GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI -> LATIN CAPITAL LETTER O */
+	{ 0x1F80, 0x1F87, }, /* GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI - GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER A */
+	{ 0x1F90, 0x1F97, }, /* GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI - GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER N */
+	{ 0x1FA0, 0x1FA7, }, /* GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI - GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER W */
+	{ 0x1FA8, 0x1FAF, }, /* GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI - GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI -> LATIN CAPITAL LETTER O */
+	{ 0x1FB0, 0x1FB4, }, /* GREEK SMALL LETTER ALPHA WITH VRACHY - GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI -> LATIN SMALL LETTER A */
+	{ 0x1FB6, 0x1FB7, }, /* GREEK SMALL LETTER ALPHA WITH PERISPOMENI - GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER A */
+	{ 0x1FC2, 0x1FC4, }, /* GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI - GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI -> LATIN SMALL LETTER N */
+	{ 0x1FC6, 0x1FC7, }, /* GREEK SMALL LETTER ETA WITH PERISPOMENI - GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER N */
+	{ 0x1FE4, 0x1FE5, }, /* GREEK SMALL LETTER RHO WITH PSILI - GREEK SMALL LETTER RHO WITH DASIA -> LATIN SMALL LETTER P */
+	{ 0x1FF2, 0x1FF4, }, /* GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI - GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI -> LATIN SMALL LETTER W */
+	{ 0x1FF6, 0x1FF7, }, /* GREEK SMALL LETTER OMEGA WITH PERISPOMENI - GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER W */
+	{ 0x2000, 0x200A, }, /* EN QUAD - HAIR SPACE -> SPACE */
+	{ 0x2010, 0x2015, }, /* HYPHEN - HORIZONTAL BAR -> HYPHEN-MINUS */
+	{ 0x2018, 0x2019, }, /* LEFT SINGLE QUOTATION MARK - RIGHT SINGLE QUOTATION MARK -> APOSTROPHE */
+	{ 0x201C, 0x201F, }, /* LEFT DOUBLE QUOTATION MARK - DOUBLE HIGH-REVERSED-9 QUOTATION MARK -> QUOTATION MARK */
+	{ 0x2026, 0x2027, }, /* HORIZONTAL ELLIPSIS - HYPHENATION POINT -> FULL STOP */
+	{ 0x2047, 0x2048, }, /* DOUBLE QUESTION MARK - QUESTION EXCLAMATION MARK -> QUESTION MARK */
+	{ 0x219A, 0x219B, }, /* LEFTWARDS ARROW WITH STROKE - RIGHTWARDS ARROW WITH STROKE -> EXCLAMATION MARK */
+	{ 0x21CD, 0x21CF, }, /* LEFTWARDS DOUBLE ARROW WITH STROKE - RIGHTWARDS DOUBLE ARROW WITH STROKE -> EXCLAMATION MARK */
+	{ 0x2234, 0x2235, }, /* THEREFORE - BECAUSE -> COLON */
+	{ 0x2268, 0x2269, }, /* LESS-THAN BUT NOT EQUAL TO - GREATER-THAN BUT NOT EQUAL TO -> NUMBER SIGN */
+	{ 0x226E, 0x226F, }, /* NOT LESS-THAN - NOT GREATER-THAN -> EXCLAMATION MARK */
+	{ 0x2280, 0x2281, }, /* DOES NOT PRECEDE - DOES NOT SUCCEED -> EXCLAMATION MARK */
+	{ 0x2284, 0x2285, }, /* NOT A SUBSET OF - NOT A SUPERSET OF -> EXCLAMATION MARK */
+	{ 0x228A, 0x228B, }, /* SUBSET OF WITH NOT EQUAL TO - SUPERSET OF WITH NOT EQUAL TO -> NUMBER SIGN */
+	{ 0x22AC, 0x22AE, }, /* DOES NOT PROVE - DOES NOT FORCE -> EXCLAMATION MARK */
+	{ 0x22E0, 0x22E1, }, /* DOES NOT PRECEDE OR EQUAL - DOES NOT SUCCEED OR EQUAL -> EXCLAMATION MARK */
+	{ 0x22EA, 0x22EB, }, /* NOT NORMAL SUBGROUP OF - DOES NOT CONTAIN AS NORMAL SUBGROUP -> EXCLAMATION MARK */
+	{ 0x23A3, 0x23A4, }, /* LEFT SQUARE BRACKET LOWER CORNER - RIGHT SQUARE BRACKET UPPER CORNER -> VERTICAL LINE */
+	{ 0x23A6, 0x23A7, }, /* RIGHT SQUARE BRACKET LOWER CORNER - LEFT CURLY BRACKET UPPER HOOK -> VERTICAL LINE */
+	{ 0x23B8, 0x23B9, }, /* LEFT VERTICAL BOX LINE - RIGHT VERTICAL BOX LINE -> VERTICAL LINE */
+	{ 0x23BE, 0x23BF, }, /* DENTISTRY SYMBOL LIGHT VERTICAL AND TOP RIGHT - DENTISTRY SYMBOL LIGHT VERTICAL AND BOTTOM RIGHT -> LATIN CAPITAL LETTER L */
+	{ 0x2500, 0x2501, }, /* BOX DRAWINGS LIGHT HORIZONTAL - BOX DRAWINGS HEAVY HORIZONTAL -> HYPHEN-MINUS */
+	{ 0x2502, 0x2503, }, /* BOX DRAWINGS LIGHT VERTICAL - BOX DRAWINGS HEAVY VERTICAL -> VERTICAL LINE */
+	{ 0x250C, 0x254B, }, /* BOX DRAWINGS LIGHT DOWN AND RIGHT - BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL -> PLUS SIGN */
+	{ 0x2552, 0x2570, }, /* BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE - BOX DRAWINGS LIGHT ARC UP AND RIGHT -> PLUS SIGN */
+	{ 0x2580, 0x2590, }, /* UPPER HALF BLOCK - RIGHT HALF BLOCK -> NUMBER SIGN */
+	{ 0x2593, 0x25A0, }, /* DARK SHADE - BLACK SQUARE -> NUMBER SIGN */
+	{ 0x25AA, 0x25AB, }, /* BLACK SMALL SQUARE - WHITE SMALL SQUARE -> FULL STOP */
+	{ 0x25AE, 0x25AF, }, /* BLACK VERTICAL RECTANGLE - WHITE VERTICAL RECTANGLE -> VERTICAL LINE */
+	{ 0x25B2, 0x25B3, }, /* BLACK UP-POINTING TRIANGLE - WHITE UP-POINTING TRIANGLE -> CIRCUMFLEX ACCENT */
+	{ 0x25BC, 0x25BD, }, /* BLACK DOWN-POINTING TRIANGLE - WHITE DOWN-POINTING TRIANGLE -> LATIN SMALL LETTER V */
+	{ 0x2605, 0x2606, }, /* BLACK STAR - WHITE STAR -> ASTERISK */
+	{ 0x2721, 0x2727, }, /* STAR OF DAVID - WHITE FOUR POINTED STAR -> ASTERISK */
+	{ 0x2729, 0x2746, }, /* STRESS OUTLINED WHITE STAR - HEAVY CHEVRON SNOWFLAKE -> ASTERISK */
+	{ 0x2749, 0x274B, }, /* BALLOON-SPOKED ASTERISK - HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK -> ASTERISK */
+};
+
+static const u8 ucs_fallback_intervals_subs[] = {
+	0x41, /* LATIN CAPITAL LETTER A WITH GRAVE - LATIN CAPITAL LETTER A WITH RING ABOVE -> LATIN CAPITAL LETTER A */
+	0x45, /* LATIN CAPITAL LETTER E WITH GRAVE - LATIN CAPITAL LETTER E WITH DIAERESIS -> LATIN CAPITAL LETTER E */
+	0x49, /* LATIN CAPITAL LETTER I WITH GRAVE - LATIN CAPITAL LETTER I WITH DIAERESIS -> LATIN CAPITAL LETTER I */
+	0x4F, /* LATIN CAPITAL LETTER O WITH GRAVE - LATIN CAPITAL LETTER O WITH DIAERESIS -> LATIN CAPITAL LETTER O */
+	0x55, /* LATIN CAPITAL LETTER U WITH GRAVE - LATIN CAPITAL LETTER U WITH DIAERESIS -> LATIN CAPITAL LETTER U */
+	0x61, /* LATIN SMALL LETTER A WITH GRAVE - LATIN SMALL LETTER A WITH RING ABOVE -> LATIN SMALL LETTER A */
+	0x65, /* LATIN SMALL LETTER E WITH GRAVE - LATIN SMALL LETTER E WITH DIAERESIS -> LATIN SMALL LETTER E */
+	0x69, /* LATIN SMALL LETTER I WITH GRAVE - LATIN SMALL LETTER I WITH DIAERESIS -> LATIN SMALL LETTER I */
+	0x6F, /* LATIN SMALL LETTER O WITH GRAVE - LATIN SMALL LETTER O WITH DIAERESIS -> LATIN SMALL LETTER O */
+	0x75, /* LATIN SMALL LETTER U WITH GRAVE - LATIN SMALL LETTER U WITH DIAERESIS -> LATIN SMALL LETTER U */
+	0x77, /* GREEK SMALL LETTER PSI - GREEK SMALL LETTER OMEGA -> LATIN SMALL LETTER W */
+	0x61, /* GREEK SMALL LETTER ALPHA WITH PSILI - GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI -> LATIN SMALL LETTER A */
+	0x65, /* GREEK SMALL LETTER EPSILON WITH PSILI - GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA -> LATIN SMALL LETTER E */
+	0x6E, /* GREEK SMALL LETTER ETA WITH PSILI - GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI -> LATIN SMALL LETTER N */
+	0x77, /* GREEK SMALL LETTER OMEGA WITH PSILI - GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI -> LATIN SMALL LETTER W */
+	0x4F, /* GREEK CAPITAL LETTER OMEGA WITH PSILI - GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI -> LATIN CAPITAL LETTER O */
+	0x61, /* GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI - GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER A */
+	0x6E, /* GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI - GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER N */
+	0x77, /* GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI - GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER W */
+	0x4F, /* GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI - GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI -> LATIN CAPITAL LETTER O */
+	0x61, /* GREEK SMALL LETTER ALPHA WITH VRACHY - GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI -> LATIN SMALL LETTER A */
+	0x61, /* GREEK SMALL LETTER ALPHA WITH PERISPOMENI - GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER A */
+	0x6E, /* GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI - GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI -> LATIN SMALL LETTER N */
+	0x6E, /* GREEK SMALL LETTER ETA WITH PERISPOMENI - GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER N */
+	0x70, /* GREEK SMALL LETTER RHO WITH PSILI - GREEK SMALL LETTER RHO WITH DASIA -> LATIN SMALL LETTER P */
+	0x77, /* GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI - GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI -> LATIN SMALL LETTER W */
+	0x77, /* GREEK SMALL LETTER OMEGA WITH PERISPOMENI - GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI -> LATIN SMALL LETTER W */
+	0x20, /* EN QUAD - HAIR SPACE -> SPACE */
+	0x2D, /* HYPHEN - HORIZONTAL BAR -> HYPHEN-MINUS */
+	0x27, /* LEFT SINGLE QUOTATION MARK - RIGHT SINGLE QUOTATION MARK -> APOSTROPHE */
+	0x22, /* LEFT DOUBLE QUOTATION MARK - DOUBLE HIGH-REVERSED-9 QUOTATION MARK -> QUOTATION MARK */
+	0x2E, /* HORIZONTAL ELLIPSIS - HYPHENATION POINT -> FULL STOP */
+	0x3F, /* DOUBLE QUESTION MARK - QUESTION EXCLAMATION MARK -> QUESTION MARK */
+	0x21, /* LEFTWARDS ARROW WITH STROKE - RIGHTWARDS ARROW WITH STROKE -> EXCLAMATION MARK */
+	0x21, /* LEFTWARDS DOUBLE ARROW WITH STROKE - RIGHTWARDS DOUBLE ARROW WITH STROKE -> EXCLAMATION MARK */
+	0x3A, /* THEREFORE - BECAUSE -> COLON */
+	0x23, /* LESS-THAN BUT NOT EQUAL TO - GREATER-THAN BUT NOT EQUAL TO -> NUMBER SIGN */
+	0x21, /* NOT LESS-THAN - NOT GREATER-THAN -> EXCLAMATION MARK */
+	0x21, /* DOES NOT PRECEDE - DOES NOT SUCCEED -> EXCLAMATION MARK */
+	0x21, /* NOT A SUBSET OF - NOT A SUPERSET OF -> EXCLAMATION MARK */
+	0x23, /* SUBSET OF WITH NOT EQUAL TO - SUPERSET OF WITH NOT EQUAL TO -> NUMBER SIGN */
+	0x21, /* DOES NOT PROVE - DOES NOT FORCE -> EXCLAMATION MARK */
+	0x21, /* DOES NOT PRECEDE OR EQUAL - DOES NOT SUCCEED OR EQUAL -> EXCLAMATION MARK */
+	0x21, /* NOT NORMAL SUBGROUP OF - DOES NOT CONTAIN AS NORMAL SUBGROUP -> EXCLAMATION MARK */
+	0x7C, /* LEFT SQUARE BRACKET LOWER CORNER - RIGHT SQUARE BRACKET UPPER CORNER -> VERTICAL LINE */
+	0x7C, /* RIGHT SQUARE BRACKET LOWER CORNER - LEFT CURLY BRACKET UPPER HOOK -> VERTICAL LINE */
+	0x7C, /* LEFT VERTICAL BOX LINE - RIGHT VERTICAL BOX LINE -> VERTICAL LINE */
+	0x4C, /* DENTISTRY SYMBOL LIGHT VERTICAL AND TOP RIGHT - DENTISTRY SYMBOL LIGHT VERTICAL AND BOTTOM RIGHT -> LATIN CAPITAL LETTER L */
+	0x2D, /* BOX DRAWINGS LIGHT HORIZONTAL - BOX DRAWINGS HEAVY HORIZONTAL -> HYPHEN-MINUS */
+	0x7C, /* BOX DRAWINGS LIGHT VERTICAL - BOX DRAWINGS HEAVY VERTICAL -> VERTICAL LINE */
+	0x2B, /* BOX DRAWINGS LIGHT DOWN AND RIGHT - BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL -> PLUS SIGN */
+	0x2B, /* BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE - BOX DRAWINGS LIGHT ARC UP AND RIGHT -> PLUS SIGN */
+	0x23, /* UPPER HALF BLOCK - RIGHT HALF BLOCK -> NUMBER SIGN */
+	0x23, /* DARK SHADE - BLACK SQUARE -> NUMBER SIGN */
+	0x2E, /* BLACK SMALL SQUARE - WHITE SMALL SQUARE -> FULL STOP */
+	0x7C, /* BLACK VERTICAL RECTANGLE - WHITE VERTICAL RECTANGLE -> VERTICAL LINE */
+	0x5E, /* BLACK UP-POINTING TRIANGLE - WHITE UP-POINTING TRIANGLE -> CIRCUMFLEX ACCENT */
+	0x76, /* BLACK DOWN-POINTING TRIANGLE - WHITE DOWN-POINTING TRIANGLE -> LATIN SMALL LETTER V */
+	0x2A, /* BLACK STAR - WHITE STAR -> ASTERISK */
+	0x2A, /* STAR OF DAVID - WHITE FOUR POINTED STAR -> ASTERISK */
+	0x2A, /* STRESS OUTLINED WHITE STAR - HEAVY CHEVRON SNOWFLAKE -> ASTERISK */
+	0x2A, /* BALLOON-SPOKED ASTERISK - HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK -> ASTERISK */
+};
+
+static const u16 ucs_fallback_singles[] = {
+	0x00A0, /* NO-BREAK SPACE -> SPACE */
+	0x00A1, /* INVERTED EXCLAMATION MARK -> EXCLAMATION MARK */
+	0x00A2, /* CENT SIGN -> LATIN SMALL LETTER C */
+	0x00A3, /* POUND SIGN -> LATIN CAPITAL LETTER L */
+	0x00A5, /* YEN SIGN -> LATIN CAPITAL LETTER Y */
+	0x00A6, /* BROKEN BAR -> VERTICAL LINE */
+	0x00A7, /* SECTION SIGN -> LATIN CAPITAL LETTER S */
+	0x00A9, /* COPYRIGHT SIGN -> LATIN CAPITAL LETTER C */
+	0x00AB, /* LEFT-POINTING DOUBLE ANGLE QUOTATION MARK -> LESS-THAN SIGN */
+	0x00AE, /* REGISTERED SIGN -> LATIN CAPITAL LETTER R */
+	0x00B0, /* DEGREE SIGN -> LATIN SMALL LETTER O */
+	0x00B1, /* PLUS-MINUS SIGN -> PLUS SIGN */
+	0x00B2, /* SUPERSCRIPT TWO -> DIGIT TWO */
+	0x00B3, /* SUPERSCRIPT THREE -> DIGIT THREE */
+	0x00B5, /* MICRO SIGN -> LATIN SMALL LETTER U */
+	0x00B6, /* PILCROW SIGN -> LATIN CAPITAL LETTER P */
+	0x00B7, /* MIDDLE DOT -> FULL STOP */
+	0x00B9, /* SUPERSCRIPT ONE -> DIGIT ONE */
+	0x00BB, /* RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK -> GREATER-THAN SIGN */
+	0x00BF, /* INVERTED QUESTION MARK -> QUESTION MARK */
+	0x00C6, /* LATIN CAPITAL LETTER AE -> LATIN CAPITAL LETTER E */
+	0x00C7, /* LATIN CAPITAL LETTER C WITH CEDILLA -> LATIN CAPITAL LETTER C */
+	0x00D0, /* LATIN CAPITAL LETTER ETH -> LATIN CAPITAL LETTER D */
+	0x00D1, /* LATIN CAPITAL LETTER N WITH TILDE -> LATIN CAPITAL LETTER N */
+	0x00D7, /* MULTIPLICATION SIGN -> LATIN SMALL LETTER X */
+	0x00D8, /* LATIN CAPITAL LETTER O WITH STROKE -> LATIN CAPITAL LETTER O */
+	0x00DD, /* LATIN CAPITAL LETTER Y WITH ACUTE -> LATIN CAPITAL LETTER Y */
+	0x00DE, /* LATIN CAPITAL LETTER THORN -> LATIN CAPITAL LETTER P */
+	0x00DF, /* LATIN SMALL LETTER SHARP S -> LATIN SMALL LETTER S */
+	0x00E6, /* LATIN SMALL LETTER AE -> LATIN SMALL LETTER E */
+	0x00E7, /* LATIN SMALL LETTER C WITH CEDILLA -> LATIN SMALL LETTER C */
+	0x00F0, /* LATIN SMALL LETTER ETH -> LATIN SMALL LETTER D */
+	0x00F1, /* LATIN SMALL LETTER N WITH TILDE -> LATIN SMALL LETTER N */
+	0x00F7, /* DIVISION SIGN -> SOLIDUS */
+	0x00F8, /* LATIN SMALL LETTER O WITH STROKE -> LATIN SMALL LETTER O */
+	0x00FD, /* LATIN SMALL LETTER Y WITH ACUTE -> LATIN SMALL LETTER Y */
+	0x00FE, /* LATIN SMALL LETTER THORN -> LATIN SMALL LETTER P */
+	0x00FF, /* LATIN SMALL LETTER Y WITH DIAERESIS -> LATIN SMALL LETTER Y */
+	0x0100, /* LATIN CAPITAL LETTER A WITH MACRON -> LATIN CAPITAL LETTER A */
+	0x0101, /* LATIN SMALL LETTER A WITH MACRON -> LATIN SMALL LETTER A */
+	0x0102, /* LATIN CAPITAL LETTER A WITH BREVE -> LATIN CAPITAL LETTER A */
+	0x0103, /* LATIN SMALL LETTER A WITH BREVE -> LATIN SMALL LETTER A */
+	0x0104, /* LATIN CAPITAL LETTER A WITH OGONEK -> LATIN CAPITAL LETTER A */
+	0x0105, /* LATIN SMALL LETTER A WITH OGONEK -> LATIN SMALL LETTER A */
+	0x0106, /* LATIN CAPITAL LETTER C WITH ACUTE -> LATIN CAPITAL LETTER C */
+	0x0107, /* LATIN SMALL LETTER C WITH ACUTE -> LATIN SMALL LETTER C */
+	0x0108, /* LATIN CAPITAL LETTER C WITH CIRCUMFLEX -> LATIN CAPITAL LETTER C */
+	0x0109, /* LATIN SMALL LETTER C WITH CIRCUMFLEX -> LATIN SMALL LETTER C */
+	0x010A, /* LATIN CAPITAL LETTER C WITH DOT ABOVE -> LATIN CAPITAL LETTER C */
+	0x010B, /* LATIN SMALL LETTER C WITH DOT ABOVE -> LATIN SMALL LETTER C */
+	0x010C, /* LATIN CAPITAL LETTER C WITH CARON -> LATIN CAPITAL LETTER C */
+	0x010D, /* LATIN SMALL LETTER C WITH CARON -> LATIN SMALL LETTER C */
+	0x010E, /* LATIN CAPITAL LETTER D WITH CARON -> LATIN CAPITAL LETTER D */
+	0x010F, /* LATIN SMALL LETTER D WITH CARON -> LATIN SMALL LETTER D */
+	0x0110, /* LATIN CAPITAL LETTER D WITH STROKE -> LATIN CAPITAL LETTER D */
+	0x0111, /* LATIN SMALL LETTER D WITH STROKE -> LATIN SMALL LETTER D */
+	0x0112, /* LATIN CAPITAL LETTER E WITH MACRON -> LATIN CAPITAL LETTER E */
+	0x0113, /* LATIN SMALL LETTER E WITH MACRON -> LATIN SMALL LETTER E */
+	0x0114, /* LATIN CAPITAL LETTER E WITH BREVE -> LATIN CAPITAL LETTER E */
+	0x0115, /* LATIN SMALL LETTER E WITH BREVE -> LATIN SMALL LETTER E */
+	0x0116, /* LATIN CAPITAL LETTER E WITH DOT ABOVE -> LATIN CAPITAL LETTER E */
+	0x0117, /* LATIN SMALL LETTER E WITH DOT ABOVE -> LATIN SMALL LETTER E */
+	0x0118, /* LATIN CAPITAL LETTER E WITH OGONEK -> LATIN CAPITAL LETTER E */
+	0x0119, /* LATIN SMALL LETTER E WITH OGONEK -> LATIN SMALL LETTER E */
+	0x011A, /* LATIN CAPITAL LETTER E WITH CARON -> LATIN CAPITAL LETTER E */
+	0x011B, /* LATIN SMALL LETTER E WITH CARON -> LATIN SMALL LETTER E */
+	0x011C, /* LATIN CAPITAL LETTER G WITH CIRCUMFLEX -> LATIN CAPITAL LETTER G */
+	0x011D, /* LATIN SMALL LETTER G WITH CIRCUMFLEX -> LATIN SMALL LETTER G */
+	0x011E, /* LATIN CAPITAL LETTER G WITH BREVE -> LATIN CAPITAL LETTER G */
+	0x011F, /* LATIN SMALL LETTER G WITH BREVE -> LATIN SMALL LETTER G */
+	0x0120, /* LATIN CAPITAL LETTER G WITH DOT ABOVE -> LATIN CAPITAL LETTER G */
+	0x0121, /* LATIN SMALL LETTER G WITH DOT ABOVE -> LATIN SMALL LETTER G */
+	0x0122, /* LATIN CAPITAL LETTER G WITH CEDILLA -> LATIN CAPITAL LETTER G */
+	0x0123, /* LATIN SMALL LETTER G WITH CEDILLA -> LATIN SMALL LETTER G */
+	0x0124, /* LATIN CAPITAL LETTER H WITH CIRCUMFLEX -> LATIN CAPITAL LETTER H */
+	0x0125, /* LATIN SMALL LETTER H WITH CIRCUMFLEX -> LATIN SMALL LETTER H */
+	0x0126, /* LATIN CAPITAL LETTER H WITH STROKE -> LATIN CAPITAL LETTER H */
+	0x0127, /* LATIN SMALL LETTER H WITH STROKE -> LATIN SMALL LETTER H */
+	0x0128, /* LATIN CAPITAL LETTER I WITH TILDE -> LATIN CAPITAL LETTER I */
+	0x0129, /* LATIN SMALL LETTER I WITH TILDE -> LATIN SMALL LETTER I */
+	0x012A, /* LATIN CAPITAL LETTER I WITH MACRON -> LATIN CAPITAL LETTER I */
+	0x012B, /* LATIN SMALL LETTER I WITH MACRON -> LATIN SMALL LETTER I */
+	0x012C, /* LATIN CAPITAL LETTER I WITH BREVE -> LATIN CAPITAL LETTER I */
+	0x012D, /* LATIN SMALL LETTER I WITH BREVE -> LATIN SMALL LETTER I */
+	0x012E, /* LATIN CAPITAL LETTER I WITH OGONEK -> LATIN CAPITAL LETTER I */
+	0x012F, /* LATIN SMALL LETTER I WITH OGONEK -> LATIN SMALL LETTER I */
+	0x0130, /* LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN CAPITAL LETTER I */
+	0x0134, /* LATIN CAPITAL LETTER J WITH CIRCUMFLEX -> LATIN CAPITAL LETTER J */
+	0x0135, /* LATIN SMALL LETTER J WITH CIRCUMFLEX -> LATIN SMALL LETTER J */
+	0x0136, /* LATIN CAPITAL LETTER K WITH CEDILLA -> LATIN CAPITAL LETTER K */
+	0x0137, /* LATIN SMALL LETTER K WITH CEDILLA -> LATIN SMALL LETTER K */
+	0x0139, /* LATIN CAPITAL LETTER L WITH ACUTE -> LATIN CAPITAL LETTER L */
+	0x013A, /* LATIN SMALL LETTER L WITH ACUTE -> LATIN SMALL LETTER L */
+	0x013B, /* LATIN CAPITAL LETTER L WITH CEDILLA -> LATIN CAPITAL LETTER L */
+	0x013C, /* LATIN SMALL LETTER L WITH CEDILLA -> LATIN SMALL LETTER L */
+	0x013D, /* LATIN CAPITAL LETTER L WITH CARON -> LATIN CAPITAL LETTER L */
+	0x013E, /* LATIN SMALL LETTER L WITH CARON -> LATIN SMALL LETTER L */
+	0x0141, /* LATIN CAPITAL LETTER L WITH STROKE -> LATIN CAPITAL LETTER L */
+	0x0142, /* LATIN SMALL LETTER L WITH STROKE -> LATIN SMALL LETTER L */
+	0x0143, /* LATIN CAPITAL LETTER N WITH ACUTE -> LATIN CAPITAL LETTER N */
+	0x0144, /* LATIN SMALL LETTER N WITH ACUTE -> LATIN SMALL LETTER N */
+	0x0145, /* LATIN CAPITAL LETTER N WITH CEDILLA -> LATIN CAPITAL LETTER N */
+	0x0146, /* LATIN SMALL LETTER N WITH CEDILLA -> LATIN SMALL LETTER N */
+	0x0147, /* LATIN CAPITAL LETTER N WITH CARON -> LATIN CAPITAL LETTER N */
+	0x0148, /* LATIN SMALL LETTER N WITH CARON -> LATIN SMALL LETTER N */
+	0x014C, /* LATIN CAPITAL LETTER O WITH MACRON -> LATIN CAPITAL LETTER O */
+	0x014D, /* LATIN SMALL LETTER O WITH MACRON -> LATIN SMALL LETTER O */
+	0x014E, /* LATIN CAPITAL LETTER O WITH BREVE -> LATIN CAPITAL LETTER O */
+	0x014F, /* LATIN SMALL LETTER O WITH BREVE -> LATIN SMALL LETTER O */
+	0x0150, /* LATIN CAPITAL LETTER O WITH DOUBLE ACUTE -> LATIN CAPITAL LETTER O */
+	0x0151, /* LATIN SMALL LETTER O WITH DOUBLE ACUTE -> LATIN SMALL LETTER O */
+	0x0152, /* LATIN CAPITAL LIGATURE OE -> LATIN CAPITAL LETTER E */
+	0x0153, /* LATIN SMALL LIGATURE OE -> LATIN SMALL LETTER E */
+	0x0154, /* LATIN CAPITAL LETTER R WITH ACUTE -> LATIN CAPITAL LETTER R */
+	0x0155, /* LATIN SMALL LETTER R WITH ACUTE -> LATIN SMALL LETTER R */
+	0x0156, /* LATIN CAPITAL LETTER R WITH CEDILLA -> LATIN CAPITAL LETTER R */
+	0x0157, /* LATIN SMALL LETTER R WITH CEDILLA -> LATIN SMALL LETTER R */
+	0x0158, /* LATIN CAPITAL LETTER R WITH CARON -> LATIN CAPITAL LETTER R */
+	0x0159, /* LATIN SMALL LETTER R WITH CARON -> LATIN SMALL LETTER R */
+	0x015A, /* LATIN CAPITAL LETTER S WITH ACUTE -> LATIN CAPITAL LETTER S */
+	0x015B, /* LATIN SMALL LETTER S WITH ACUTE -> LATIN SMALL LETTER S */
+	0x015C, /* LATIN CAPITAL LETTER S WITH CIRCUMFLEX -> LATIN CAPITAL LETTER S */
+	0x015D, /* LATIN SMALL LETTER S WITH CIRCUMFLEX -> LATIN SMALL LETTER S */
+	0x015E, /* LATIN CAPITAL LETTER S WITH CEDILLA -> LATIN CAPITAL LETTER S */
+	0x015F, /* LATIN SMALL LETTER S WITH CEDILLA -> LATIN SMALL LETTER S */
+	0x0160, /* LATIN CAPITAL LETTER S WITH CARON -> LATIN CAPITAL LETTER S */
+	0x0161, /* LATIN SMALL LETTER S WITH CARON -> LATIN SMALL LETTER S */
+	0x0162, /* LATIN CAPITAL LETTER T WITH CEDILLA -> LATIN CAPITAL LETTER T */
+	0x0163, /* LATIN SMALL LETTER T WITH CEDILLA -> LATIN SMALL LETTER T */
+	0x0164, /* LATIN CAPITAL LETTER T WITH CARON -> LATIN CAPITAL LETTER T */
+	0x0165, /* LATIN SMALL LETTER T WITH CARON -> LATIN SMALL LETTER T */
+	0x0168, /* LATIN CAPITAL LETTER U WITH TILDE -> LATIN CAPITAL LETTER U */
+	0x0169, /* LATIN SMALL LETTER U WITH TILDE -> LATIN SMALL LETTER U */
+	0x016A, /* LATIN CAPITAL LETTER U WITH MACRON -> LATIN CAPITAL LETTER U */
+	0x016B, /* LATIN SMALL LETTER U WITH MACRON -> LATIN SMALL LETTER U */
+	0x016C, /* LATIN CAPITAL LETTER U WITH BREVE -> LATIN CAPITAL LETTER U */
+	0x016D, /* LATIN SMALL LETTER U WITH BREVE -> LATIN SMALL LETTER U */
+	0x016E, /* LATIN CAPITAL LETTER U WITH RING ABOVE -> LATIN CAPITAL LETTER U */
+	0x016F, /* LATIN SMALL LETTER U WITH RING ABOVE -> LATIN SMALL LETTER U */
+	0x0170, /* LATIN CAPITAL LETTER U WITH DOUBLE ACUTE -> LATIN CAPITAL LETTER U */
+	0x0171, /* LATIN SMALL LETTER U WITH DOUBLE ACUTE -> LATIN SMALL LETTER U */
+	0x0172, /* LATIN CAPITAL LETTER U WITH OGONEK -> LATIN CAPITAL LETTER U */
+	0x0173, /* LATIN SMALL LETTER U WITH OGONEK -> LATIN SMALL LETTER U */
+	0x0174, /* LATIN CAPITAL LETTER W WITH CIRCUMFLEX -> LATIN CAPITAL LETTER W */
+	0x0175, /* LATIN SMALL LETTER W WITH CIRCUMFLEX -> LATIN SMALL LETTER W */
+	0x0176, /* LATIN CAPITAL LETTER Y WITH CIRCUMFLEX -> LATIN CAPITAL LETTER Y */
+	0x0177, /* LATIN SMALL LETTER Y WITH CIRCUMFLEX -> LATIN SMALL LETTER Y */
+	0x0178, /* LATIN CAPITAL LETTER Y WITH DIAERESIS -> LATIN CAPITAL LETTER Y */
+	0x0179, /* LATIN CAPITAL LETTER Z WITH ACUTE -> LATIN CAPITAL LETTER Z */
+	0x017A, /* LATIN SMALL LETTER Z WITH ACUTE -> LATIN SMALL LETTER Z */
+	0x017B, /* LATIN CAPITAL LETTER Z WITH DOT ABOVE -> LATIN CAPITAL LETTER Z */
+	0x017C, /* LATIN SMALL LETTER Z WITH DOT ABOVE -> LATIN SMALL LETTER Z */
+	0x017D, /* LATIN CAPITAL LETTER Z WITH CARON -> LATIN CAPITAL LETTER Z */
+	0x017E, /* LATIN SMALL LETTER Z WITH CARON -> LATIN SMALL LETTER Z */
+	0x01A0, /* LATIN CAPITAL LETTER O WITH HORN -> LATIN CAPITAL LETTER O */
+	0x01A1, /* LATIN SMALL LETTER O WITH HORN -> LATIN SMALL LETTER O */
+	0x01AF, /* LATIN CAPITAL LETTER U WITH HORN -> LATIN CAPITAL LETTER U */
+	0x01B0, /* LATIN SMALL LETTER U WITH HORN -> LATIN SMALL LETTER U */
+	0x01CD, /* LATIN CAPITAL LETTER A WITH CARON -> LATIN CAPITAL LETTER A */
+	0x01CE, /* LATIN SMALL LETTER A WITH CARON -> LATIN SMALL LETTER A */
+	0x01CF, /* LATIN CAPITAL LETTER I WITH CARON -> LATIN CAPITAL LETTER I */
+	0x01D0, /* LATIN SMALL LETTER I WITH CARON -> LATIN SMALL LETTER I */
+	0x01D1, /* LATIN CAPITAL LETTER O WITH CARON -> LATIN CAPITAL LETTER O */
+	0x01D2, /* LATIN SMALL LETTER O WITH CARON -> LATIN SMALL LETTER O */
+	0x01D3, /* LATIN CAPITAL LETTER U WITH CARON -> LATIN CAPITAL LETTER U */
+	0x01D4, /* LATIN SMALL LETTER U WITH CARON -> LATIN SMALL LETTER U */
+	0x01D5, /* LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON -> LATIN CAPITAL LETTER U WITH DIAERESIS */
+	0x01D6, /* LATIN SMALL LETTER U WITH DIAERESIS AND MACRON -> LATIN SMALL LETTER U WITH DIAERESIS */
+	0x01D7, /* LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE -> LATIN CAPITAL LETTER U WITH DIAERESIS */
+	0x01D8, /* LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE -> LATIN SMALL LETTER U WITH DIAERESIS */
+	0x01D9, /* LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON -> LATIN CAPITAL LETTER U WITH DIAERESIS */
+	0x01DA, /* LATIN SMALL LETTER U WITH DIAERESIS AND CARON -> LATIN SMALL LETTER U WITH DIAERESIS */
+	0x01DB, /* LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE -> LATIN CAPITAL LETTER U WITH DIAERESIS */
+	0x01DC, /* LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE -> LATIN SMALL LETTER U WITH DIAERESIS */
+	0x01DE, /* LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON -> LATIN CAPITAL LETTER A WITH DIAERESIS */
+	0x01DF, /* LATIN SMALL LETTER A WITH DIAERESIS AND MACRON -> LATIN SMALL LETTER A WITH DIAERESIS */
+	0x01E0, /* LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON -> LATIN CAPITAL LETTER A */
+	0x01E1, /* LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON -> LATIN SMALL LETTER A */
+	0x01E2, /* LATIN CAPITAL LETTER AE WITH MACRON -> LATIN CAPITAL LETTER AE */
+	0x01E3, /* LATIN SMALL LETTER AE WITH MACRON -> LATIN SMALL LETTER AE */
+	0x01E6, /* LATIN CAPITAL LETTER G WITH CARON -> LATIN CAPITAL LETTER G */
+	0x01E7, /* LATIN SMALL LETTER G WITH CARON -> LATIN SMALL LETTER G */
+	0x01E8, /* LATIN CAPITAL LETTER K WITH CARON -> LATIN CAPITAL LETTER K */
+	0x01E9, /* LATIN SMALL LETTER K WITH CARON -> LATIN SMALL LETTER K */
+	0x01EA, /* LATIN CAPITAL LETTER O WITH OGONEK -> LATIN CAPITAL LETTER O */
+	0x01EB, /* LATIN SMALL LETTER O WITH OGONEK -> LATIN SMALL LETTER O */
+	0x01EC, /* LATIN CAPITAL LETTER O WITH OGONEK AND MACRON -> LATIN CAPITAL LETTER O */
+	0x01ED, /* LATIN SMALL LETTER O WITH OGONEK AND MACRON -> LATIN SMALL LETTER O */
+	0x01F0, /* LATIN SMALL LETTER J WITH CARON -> LATIN SMALL LETTER J */
+	0x01F4, /* LATIN CAPITAL LETTER G WITH ACUTE -> LATIN CAPITAL LETTER G */
+	0x01F5, /* LATIN SMALL LETTER G WITH ACUTE -> LATIN SMALL LETTER G */
+	0x01F8, /* LATIN CAPITAL LETTER N WITH GRAVE -> LATIN CAPITAL LETTER N */
+	0x01F9, /* LATIN SMALL LETTER N WITH GRAVE -> LATIN SMALL LETTER N */
+	0x01FA, /* LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE -> LATIN CAPITAL LETTER A WITH RING ABOVE */
+	0x01FB, /* LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE -> LATIN SMALL LETTER A WITH RING ABOVE */
+	0x01FC, /* LATIN CAPITAL LETTER AE WITH ACUTE -> LATIN CAPITAL LETTER AE */
+	0x01FD, /* LATIN SMALL LETTER AE WITH ACUTE -> LATIN SMALL LETTER AE */
+	0x01FE, /* LATIN CAPITAL LETTER O WITH STROKE AND ACUTE -> LATIN CAPITAL LETTER O WITH STROKE */
+	0x01FF, /* LATIN SMALL LETTER O WITH STROKE AND ACUTE -> LATIN SMALL LETTER O WITH STROKE */
+	0x0200, /* LATIN CAPITAL LETTER A WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER A */
+	0x0201, /* LATIN SMALL LETTER A WITH DOUBLE GRAVE -> LATIN SMALL LETTER A */
+	0x0202, /* LATIN CAPITAL LETTER A WITH INVERTED BREVE -> LATIN CAPITAL LETTER A */
+	0x0203, /* LATIN SMALL LETTER A WITH INVERTED BREVE -> LATIN SMALL LETTER A */
+	0x0204, /* LATIN CAPITAL LETTER E WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER E */
+	0x0205, /* LATIN SMALL LETTER E WITH DOUBLE GRAVE -> LATIN SMALL LETTER E */
+	0x0206, /* LATIN CAPITAL LETTER E WITH INVERTED BREVE -> LATIN CAPITAL LETTER E */
+	0x0207, /* LATIN SMALL LETTER E WITH INVERTED BREVE -> LATIN SMALL LETTER E */
+	0x0208, /* LATIN CAPITAL LETTER I WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER I */
+	0x0209, /* LATIN SMALL LETTER I WITH DOUBLE GRAVE -> LATIN SMALL LETTER I */
+	0x020A, /* LATIN CAPITAL LETTER I WITH INVERTED BREVE -> LATIN CAPITAL LETTER I */
+	0x020B, /* LATIN SMALL LETTER I WITH INVERTED BREVE -> LATIN SMALL LETTER I */
+	0x020C, /* LATIN CAPITAL LETTER O WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER O */
+	0x020D, /* LATIN SMALL LETTER O WITH DOUBLE GRAVE -> LATIN SMALL LETTER O */
+	0x020E, /* LATIN CAPITAL LETTER O WITH INVERTED BREVE -> LATIN CAPITAL LETTER O */
+	0x020F, /* LATIN SMALL LETTER O WITH INVERTED BREVE -> LATIN SMALL LETTER O */
+	0x0210, /* LATIN CAPITAL LETTER R WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER R */
+	0x0211, /* LATIN SMALL LETTER R WITH DOUBLE GRAVE -> LATIN SMALL LETTER R */
+	0x0212, /* LATIN CAPITAL LETTER R WITH INVERTED BREVE -> LATIN CAPITAL LETTER R */
+	0x0213, /* LATIN SMALL LETTER R WITH INVERTED BREVE -> LATIN SMALL LETTER R */
+	0x0214, /* LATIN CAPITAL LETTER U WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER U */
+	0x0215, /* LATIN SMALL LETTER U WITH DOUBLE GRAVE -> LATIN SMALL LETTER U */
+	0x0216, /* LATIN CAPITAL LETTER U WITH INVERTED BREVE -> LATIN CAPITAL LETTER U */
+	0x0217, /* LATIN SMALL LETTER U WITH INVERTED BREVE -> LATIN SMALL LETTER U */
+	0x0218, /* LATIN CAPITAL LETTER S WITH COMMA BELOW -> LATIN CAPITAL LETTER S */
+	0x0219, /* LATIN SMALL LETTER S WITH COMMA BELOW -> LATIN SMALL LETTER S */
+	0x021A, /* LATIN CAPITAL LETTER T WITH COMMA BELOW -> LATIN CAPITAL LETTER T */
+	0x021B, /* LATIN SMALL LETTER T WITH COMMA BELOW -> LATIN SMALL LETTER T */
+	0x021E, /* LATIN CAPITAL LETTER H WITH CARON -> LATIN CAPITAL LETTER H */
+	0x021F, /* LATIN SMALL LETTER H WITH CARON -> LATIN SMALL LETTER H */
+	0x0226, /* LATIN CAPITAL LETTER A WITH DOT ABOVE -> LATIN CAPITAL LETTER A */
+	0x0227, /* LATIN SMALL LETTER A WITH DOT ABOVE -> LATIN SMALL LETTER A */
+	0x0228, /* LATIN CAPITAL LETTER E WITH CEDILLA -> LATIN CAPITAL LETTER E */
+	0x0229, /* LATIN SMALL LETTER E WITH CEDILLA -> LATIN SMALL LETTER E */
+	0x022A, /* LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON -> LATIN CAPITAL LETTER O WITH DIAERESIS */
+	0x022B, /* LATIN SMALL LETTER O WITH DIAERESIS AND MACRON -> LATIN SMALL LETTER O WITH DIAERESIS */
+	0x022C, /* LATIN CAPITAL LETTER O WITH TILDE AND MACRON -> LATIN CAPITAL LETTER O WITH TILDE */
+	0x022D, /* LATIN SMALL LETTER O WITH TILDE AND MACRON -> LATIN SMALL LETTER O WITH TILDE */
+	0x022E, /* LATIN CAPITAL LETTER O WITH DOT ABOVE -> LATIN CAPITAL LETTER O */
+	0x022F, /* LATIN SMALL LETTER O WITH DOT ABOVE -> LATIN SMALL LETTER O */
+	0x0230, /* LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON -> LATIN CAPITAL LETTER O */
+	0x0231, /* LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON -> LATIN SMALL LETTER O */
+	0x0232, /* LATIN CAPITAL LETTER Y WITH MACRON -> LATIN CAPITAL LETTER Y */
+	0x0233, /* LATIN SMALL LETTER Y WITH MACRON -> LATIN SMALL LETTER Y */
+	0x0385, /* GREEK DIALYTIKA TONOS -> DIAERESIS */
+	0x0387, /* GREEK ANO TELEIA -> FULL STOP */
+	0x038F, /* GREEK CAPITAL LETTER OMEGA WITH TONOS -> LATIN CAPITAL LETTER O */
+	0x0393, /* GREEK CAPITAL LETTER GAMMA -> LATIN CAPITAL LETTER I */
+	0x0394, /* GREEK CAPITAL LETTER DELTA -> LATIN CAPITAL LETTER A */
+	0x0398, /* GREEK CAPITAL LETTER THETA -> LATIN CAPITAL LETTER O */
+	0x039B, /* GREEK CAPITAL LETTER LAMDA -> LATIN CAPITAL LETTER A */
+	0x03A0, /* GREEK CAPITAL LETTER PI -> LATIN SMALL LETTER N */
+	0x03A3, /* GREEK CAPITAL LETTER SIGMA -> LATIN CAPITAL LETTER E */
+	0x03A6, /* GREEK CAPITAL LETTER PHI -> LATIN CAPITAL LETTER O */
+	0x03A8, /* GREEK CAPITAL LETTER PSI -> LATIN CAPITAL LETTER Y */
+	0x03A9, /* GREEK CAPITAL LETTER OMEGA -> LATIN CAPITAL LETTER O */
+	0x03AC, /* GREEK SMALL LETTER ALPHA WITH TONOS -> LATIN SMALL LETTER A */
+	0x03AD, /* GREEK SMALL LETTER EPSILON WITH TONOS -> LATIN SMALL LETTER E */
+	0x03AE, /* GREEK SMALL LETTER ETA WITH TONOS -> LATIN SMALL LETTER N */
+	0x03B1, /* GREEK SMALL LETTER ALPHA -> LATIN SMALL LETTER A */
+	0x03B2, /* GREEK SMALL LETTER BETA -> LATIN CAPITAL LETTER B */
+	0x03B3, /* GREEK SMALL LETTER GAMMA -> LATIN SMALL LETTER Y */
+	0x03B4, /* GREEK SMALL LETTER DELTA -> LATIN SMALL LETTER D */
+	0x03B5, /* GREEK SMALL LETTER EPSILON -> LATIN SMALL LETTER E */
+	0x03B6, /* GREEK SMALL LETTER ZETA -> LATIN SMALL LETTER Z */
+	0x03B7, /* GREEK SMALL LETTER ETA -> LATIN SMALL LETTER N */
+	0x03B8, /* GREEK SMALL LETTER THETA -> DIGIT ZERO */
+	0x03BB, /* GREEK SMALL LETTER LAMDA -> LATIN SMALL LETTER L */
+	0x03BC, /* GREEK SMALL LETTER MU -> LATIN SMALL LETTER U */
+	0x03C0, /* GREEK SMALL LETTER PI -> LATIN SMALL LETTER N */
+	0x03C1, /* GREEK SMALL LETTER RHO -> LATIN SMALL LETTER P */
+	0x03C3, /* GREEK SMALL LETTER SIGMA -> LATIN SMALL LETTER O */
+	0x03C4, /* GREEK SMALL LETTER TAU -> LATIN SMALL LETTER T */
+	0x03C6, /* GREEK SMALL LETTER PHI -> LATIN SMALL LETTER F */
+	0x03C7, /* GREEK SMALL LETTER CHI -> LATIN CAPITAL LETTER X */
+	0x03CE, /* GREEK SMALL LETTER OMEGA WITH TONOS -> LATIN SMALL LETTER W */
+	0x1680, /* OGHAM SPACE MARK -> SPACE */
+	0x1E00, /* LATIN CAPITAL LETTER A WITH RING BELOW -> LATIN CAPITAL LETTER A */
+	0x1E01, /* LATIN SMALL LETTER A WITH RING BELOW -> LATIN SMALL LETTER A */
+	0x1E02, /* LATIN CAPITAL LETTER B WITH DOT ABOVE -> LATIN CAPITAL LETTER B */
+	0x1E03, /* LATIN SMALL LETTER B WITH DOT ABOVE -> LATIN SMALL LETTER B */
+	0x1E04, /* LATIN CAPITAL LETTER B WITH DOT BELOW -> LATIN CAPITAL LETTER B */
+	0x1E05, /* LATIN SMALL LETTER B WITH DOT BELOW -> LATIN SMALL LETTER B */
+	0x1E06, /* LATIN CAPITAL LETTER B WITH LINE BELOW -> LATIN CAPITAL LETTER B */
+	0x1E07, /* LATIN SMALL LETTER B WITH LINE BELOW -> LATIN SMALL LETTER B */
+	0x1E08, /* LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE -> LATIN CAPITAL LETTER C WITH CEDILLA */
+	0x1E09, /* LATIN SMALL LETTER C WITH CEDILLA AND ACUTE -> LATIN SMALL LETTER C WITH CEDILLA */
+	0x1E0A, /* LATIN CAPITAL LETTER D WITH DOT ABOVE -> LATIN CAPITAL LETTER D */
+	0x1E0B, /* LATIN SMALL LETTER D WITH DOT ABOVE -> LATIN SMALL LETTER D */
+	0x1E0C, /* LATIN CAPITAL LETTER D WITH DOT BELOW -> LATIN CAPITAL LETTER D */
+	0x1E0D, /* LATIN SMALL LETTER D WITH DOT BELOW -> LATIN SMALL LETTER D */
+	0x1E0E, /* LATIN CAPITAL LETTER D WITH LINE BELOW -> LATIN CAPITAL LETTER D */
+	0x1E0F, /* LATIN SMALL LETTER D WITH LINE BELOW -> LATIN SMALL LETTER D */
+	0x1E10, /* LATIN CAPITAL LETTER D WITH CEDILLA -> LATIN CAPITAL LETTER D */
+	0x1E11, /* LATIN SMALL LETTER D WITH CEDILLA -> LATIN SMALL LETTER D */
+	0x1E12, /* LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER D */
+	0x1E13, /* LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER D */
+	0x1E14, /* LATIN CAPITAL LETTER E WITH MACRON AND GRAVE -> LATIN CAPITAL LETTER E */
+	0x1E15, /* LATIN SMALL LETTER E WITH MACRON AND GRAVE -> LATIN SMALL LETTER E */
+	0x1E16, /* LATIN CAPITAL LETTER E WITH MACRON AND ACUTE -> LATIN CAPITAL LETTER E */
+	0x1E17, /* LATIN SMALL LETTER E WITH MACRON AND ACUTE -> LATIN SMALL LETTER E */
+	0x1E18, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER E */
+	0x1E19, /* LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER E */
+	0x1E1A, /* LATIN CAPITAL LETTER E WITH TILDE BELOW -> LATIN CAPITAL LETTER E */
+	0x1E1B, /* LATIN SMALL LETTER E WITH TILDE BELOW -> LATIN SMALL LETTER E */
+	0x1E1C, /* LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE -> LATIN CAPITAL LETTER E */
+	0x1E1D, /* LATIN SMALL LETTER E WITH CEDILLA AND BREVE -> LATIN SMALL LETTER E */
+	0x1E1E, /* LATIN CAPITAL LETTER F WITH DOT ABOVE -> LATIN CAPITAL LETTER F */
+	0x1E1F, /* LATIN SMALL LETTER F WITH DOT ABOVE -> LATIN SMALL LETTER F */
+	0x1E20, /* LATIN CAPITAL LETTER G WITH MACRON -> LATIN CAPITAL LETTER G */
+	0x1E21, /* LATIN SMALL LETTER G WITH MACRON -> LATIN SMALL LETTER G */
+	0x1E22, /* LATIN CAPITAL LETTER H WITH DOT ABOVE -> LATIN CAPITAL LETTER H */
+	0x1E23, /* LATIN SMALL LETTER H WITH DOT ABOVE -> LATIN SMALL LETTER H */
+	0x1E24, /* LATIN CAPITAL LETTER H WITH DOT BELOW -> LATIN CAPITAL LETTER H */
+	0x1E25, /* LATIN SMALL LETTER H WITH DOT BELOW -> LATIN SMALL LETTER H */
+	0x1E26, /* LATIN CAPITAL LETTER H WITH DIAERESIS -> LATIN CAPITAL LETTER H */
+	0x1E27, /* LATIN SMALL LETTER H WITH DIAERESIS -> LATIN SMALL LETTER H */
+	0x1E28, /* LATIN CAPITAL LETTER H WITH CEDILLA -> LATIN CAPITAL LETTER H */
+	0x1E29, /* LATIN SMALL LETTER H WITH CEDILLA -> LATIN SMALL LETTER H */
+	0x1E2A, /* LATIN CAPITAL LETTER H WITH BREVE BELOW -> LATIN CAPITAL LETTER H */
+	0x1E2B, /* LATIN SMALL LETTER H WITH BREVE BELOW -> LATIN SMALL LETTER H */
+	0x1E2C, /* LATIN CAPITAL LETTER I WITH TILDE BELOW -> LATIN CAPITAL LETTER I */
+	0x1E2D, /* LATIN SMALL LETTER I WITH TILDE BELOW -> LATIN SMALL LETTER I */
+	0x1E2E, /* LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE -> LATIN CAPITAL LETTER I WITH DIAERESIS */
+	0x1E2F, /* LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE -> LATIN SMALL LETTER I WITH DIAERESIS */
+	0x1E30, /* LATIN CAPITAL LETTER K WITH ACUTE -> LATIN CAPITAL LETTER K */
+	0x1E31, /* LATIN SMALL LETTER K WITH ACUTE -> LATIN SMALL LETTER K */
+	0x1E32, /* LATIN CAPITAL LETTER K WITH DOT BELOW -> LATIN CAPITAL LETTER K */
+	0x1E33, /* LATIN SMALL LETTER K WITH DOT BELOW -> LATIN SMALL LETTER K */
+	0x1E34, /* LATIN CAPITAL LETTER K WITH LINE BELOW -> LATIN CAPITAL LETTER K */
+	0x1E35, /* LATIN SMALL LETTER K WITH LINE BELOW -> LATIN SMALL LETTER K */
+	0x1E36, /* LATIN CAPITAL LETTER L WITH DOT BELOW -> LATIN CAPITAL LETTER L */
+	0x1E37, /* LATIN SMALL LETTER L WITH DOT BELOW -> LATIN SMALL LETTER L */
+	0x1E38, /* LATIN CAPITAL LETTER L WITH DOT BELOW AND MACRON -> LATIN CAPITAL LETTER L */
+	0x1E39, /* LATIN SMALL LETTER L WITH DOT BELOW AND MACRON -> LATIN SMALL LETTER L */
+	0x1E3A, /* LATIN CAPITAL LETTER L WITH LINE BELOW -> LATIN CAPITAL LETTER L */
+	0x1E3B, /* LATIN SMALL LETTER L WITH LINE BELOW -> LATIN SMALL LETTER L */
+	0x1E3C, /* LATIN CAPITAL LETTER L WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER L */
+	0x1E3D, /* LATIN SMALL LETTER L WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER L */
+	0x1E3E, /* LATIN CAPITAL LETTER M WITH ACUTE -> LATIN CAPITAL LETTER M */
+	0x1E3F, /* LATIN SMALL LETTER M WITH ACUTE -> LATIN SMALL LETTER M */
+	0x1E40, /* LATIN CAPITAL LETTER M WITH DOT ABOVE -> LATIN CAPITAL LETTER M */
+	0x1E41, /* LATIN SMALL LETTER M WITH DOT ABOVE -> LATIN SMALL LETTER M */
+	0x1E42, /* LATIN CAPITAL LETTER M WITH DOT BELOW -> LATIN CAPITAL LETTER M */
+	0x1E43, /* LATIN SMALL LETTER M WITH DOT BELOW -> LATIN SMALL LETTER M */
+	0x1E44, /* LATIN CAPITAL LETTER N WITH DOT ABOVE -> LATIN CAPITAL LETTER N */
+	0x1E45, /* LATIN SMALL LETTER N WITH DOT ABOVE -> LATIN SMALL LETTER N */
+	0x1E46, /* LATIN CAPITAL LETTER N WITH DOT BELOW -> LATIN CAPITAL LETTER N */
+	0x1E47, /* LATIN SMALL LETTER N WITH DOT BELOW -> LATIN SMALL LETTER N */
+	0x1E48, /* LATIN CAPITAL LETTER N WITH LINE BELOW -> LATIN CAPITAL LETTER N */
+	0x1E49, /* LATIN SMALL LETTER N WITH LINE BELOW -> LATIN SMALL LETTER N */
+	0x1E4A, /* LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER N */
+	0x1E4B, /* LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER N */
+	0x1E4C, /* LATIN CAPITAL LETTER O WITH TILDE AND ACUTE -> LATIN CAPITAL LETTER O WITH TILDE */
+	0x1E4D, /* LATIN SMALL LETTER O WITH TILDE AND ACUTE -> LATIN SMALL LETTER O WITH TILDE */
+	0x1E4E, /* LATIN CAPITAL LETTER O WITH TILDE AND DIAERESIS -> LATIN CAPITAL LETTER O WITH TILDE */
+	0x1E4F, /* LATIN SMALL LETTER O WITH TILDE AND DIAERESIS -> LATIN SMALL LETTER O WITH TILDE */
+	0x1E50, /* LATIN CAPITAL LETTER O WITH MACRON AND GRAVE -> LATIN CAPITAL LETTER O */
+	0x1E51, /* LATIN SMALL LETTER O WITH MACRON AND GRAVE -> LATIN SMALL LETTER O */
+	0x1E52, /* LATIN CAPITAL LETTER O WITH MACRON AND ACUTE -> LATIN CAPITAL LETTER O */
+	0x1E53, /* LATIN SMALL LETTER O WITH MACRON AND ACUTE -> LATIN SMALL LETTER O */
+	0x1E54, /* LATIN CAPITAL LETTER P WITH ACUTE -> LATIN CAPITAL LETTER P */
+	0x1E55, /* LATIN SMALL LETTER P WITH ACUTE -> LATIN SMALL LETTER P */
+	0x1E56, /* LATIN CAPITAL LETTER P WITH DOT ABOVE -> LATIN CAPITAL LETTER P */
+	0x1E57, /* LATIN SMALL LETTER P WITH DOT ABOVE -> LATIN SMALL LETTER P */
+	0x1E58, /* LATIN CAPITAL LETTER R WITH DOT ABOVE -> LATIN CAPITAL LETTER R */
+	0x1E59, /* LATIN SMALL LETTER R WITH DOT ABOVE -> LATIN SMALL LETTER R */
+	0x1E5A, /* LATIN CAPITAL LETTER R WITH DOT BELOW -> LATIN CAPITAL LETTER R */
+	0x1E5B, /* LATIN SMALL LETTER R WITH DOT BELOW -> LATIN SMALL LETTER R */
+	0x1E5C, /* LATIN CAPITAL LETTER R WITH DOT BELOW AND MACRON -> LATIN CAPITAL LETTER R */
+	0x1E5D, /* LATIN SMALL LETTER R WITH DOT BELOW AND MACRON -> LATIN SMALL LETTER R */
+	0x1E5E, /* LATIN CAPITAL LETTER R WITH LINE BELOW -> LATIN CAPITAL LETTER R */
+	0x1E5F, /* LATIN SMALL LETTER R WITH LINE BELOW -> LATIN SMALL LETTER R */
+	0x1E60, /* LATIN CAPITAL LETTER S WITH DOT ABOVE -> LATIN CAPITAL LETTER S */
+	0x1E61, /* LATIN SMALL LETTER S WITH DOT ABOVE -> LATIN SMALL LETTER S */
+	0x1E62, /* LATIN CAPITAL LETTER S WITH DOT BELOW -> LATIN CAPITAL LETTER S */
+	0x1E63, /* LATIN SMALL LETTER S WITH DOT BELOW -> LATIN SMALL LETTER S */
+	0x1E64, /* LATIN CAPITAL LETTER S WITH ACUTE AND DOT ABOVE -> LATIN CAPITAL LETTER S */
+	0x1E65, /* LATIN SMALL LETTER S WITH ACUTE AND DOT ABOVE -> LATIN SMALL LETTER S */
+	0x1E66, /* LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE -> LATIN CAPITAL LETTER S */
+	0x1E67, /* LATIN SMALL LETTER S WITH CARON AND DOT ABOVE -> LATIN SMALL LETTER S */
+	0x1E68, /* LATIN CAPITAL LETTER S WITH DOT BELOW AND DOT ABOVE -> LATIN CAPITAL LETTER S */
+	0x1E69, /* LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE -> LATIN SMALL LETTER S */
+	0x1E6A, /* LATIN CAPITAL LETTER T WITH DOT ABOVE -> LATIN CAPITAL LETTER T */
+	0x1E6B, /* LATIN SMALL LETTER T WITH DOT ABOVE -> LATIN SMALL LETTER T */
+	0x1E6C, /* LATIN CAPITAL LETTER T WITH DOT BELOW -> LATIN CAPITAL LETTER T */
+	0x1E6D, /* LATIN SMALL LETTER T WITH DOT BELOW -> LATIN SMALL LETTER T */
+	0x1E6E, /* LATIN CAPITAL LETTER T WITH LINE BELOW -> LATIN CAPITAL LETTER T */
+	0x1E6F, /* LATIN SMALL LETTER T WITH LINE BELOW -> LATIN SMALL LETTER T */
+	0x1E70, /* LATIN CAPITAL LETTER T WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER T */
+	0x1E71, /* LATIN SMALL LETTER T WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER T */
+	0x1E72, /* LATIN CAPITAL LETTER U WITH DIAERESIS BELOW -> LATIN CAPITAL LETTER U */
+	0x1E73, /* LATIN SMALL LETTER U WITH DIAERESIS BELOW -> LATIN SMALL LETTER U */
+	0x1E74, /* LATIN CAPITAL LETTER U WITH TILDE BELOW -> LATIN CAPITAL LETTER U */
+	0x1E75, /* LATIN SMALL LETTER U WITH TILDE BELOW -> LATIN SMALL LETTER U */
+	0x1E76, /* LATIN CAPITAL LETTER U WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER U */
+	0x1E77, /* LATIN SMALL LETTER U WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER U */
+	0x1E78, /* LATIN CAPITAL LETTER U WITH TILDE AND ACUTE -> LATIN CAPITAL LETTER U */
+	0x1E79, /* LATIN SMALL LETTER U WITH TILDE AND ACUTE -> LATIN SMALL LETTER U */
+	0x1E7A, /* LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS -> LATIN CAPITAL LETTER U */
+	0x1E7B, /* LATIN SMALL LETTER U WITH MACRON AND DIAERESIS -> LATIN SMALL LETTER U */
+	0x1E7C, /* LATIN CAPITAL LETTER V WITH TILDE -> LATIN CAPITAL LETTER V */
+	0x1E7D, /* LATIN SMALL LETTER V WITH TILDE -> LATIN SMALL LETTER V */
+	0x1E7E, /* LATIN CAPITAL LETTER V WITH DOT BELOW -> LATIN CAPITAL LETTER V */
+	0x1E7F, /* LATIN SMALL LETTER V WITH DOT BELOW -> LATIN SMALL LETTER V */
+	0x1E80, /* LATIN CAPITAL LETTER W WITH GRAVE -> LATIN CAPITAL LETTER W */
+	0x1E81, /* LATIN SMALL LETTER W WITH GRAVE -> LATIN SMALL LETTER W */
+	0x1E82, /* LATIN CAPITAL LETTER W WITH ACUTE -> LATIN CAPITAL LETTER W */
+	0x1E83, /* LATIN SMALL LETTER W WITH ACUTE -> LATIN SMALL LETTER W */
+	0x1E84, /* LATIN CAPITAL LETTER W WITH DIAERESIS -> LATIN CAPITAL LETTER W */
+	0x1E85, /* LATIN SMALL LETTER W WITH DIAERESIS -> LATIN SMALL LETTER W */
+	0x1E86, /* LATIN CAPITAL LETTER W WITH DOT ABOVE -> LATIN CAPITAL LETTER W */
+	0x1E87, /* LATIN SMALL LETTER W WITH DOT ABOVE -> LATIN SMALL LETTER W */
+	0x1E88, /* LATIN CAPITAL LETTER W WITH DOT BELOW -> LATIN CAPITAL LETTER W */
+	0x1E89, /* LATIN SMALL LETTER W WITH DOT BELOW -> LATIN SMALL LETTER W */
+	0x1E8A, /* LATIN CAPITAL LETTER X WITH DOT ABOVE -> LATIN CAPITAL LETTER X */
+	0x1E8B, /* LATIN SMALL LETTER X WITH DOT ABOVE -> LATIN SMALL LETTER X */
+	0x1E8C, /* LATIN CAPITAL LETTER X WITH DIAERESIS -> LATIN CAPITAL LETTER X */
+	0x1E8D, /* LATIN SMALL LETTER X WITH DIAERESIS -> LATIN SMALL LETTER X */
+	0x1E8E, /* LATIN CAPITAL LETTER Y WITH DOT ABOVE -> LATIN CAPITAL LETTER Y */
+	0x1E8F, /* LATIN SMALL LETTER Y WITH DOT ABOVE -> LATIN SMALL LETTER Y */
+	0x1E90, /* LATIN CAPITAL LETTER Z WITH CIRCUMFLEX -> LATIN CAPITAL LETTER Z */
+	0x1E91, /* LATIN SMALL LETTER Z WITH CIRCUMFLEX -> LATIN SMALL LETTER Z */
+	0x1E92, /* LATIN CAPITAL LETTER Z WITH DOT BELOW -> LATIN CAPITAL LETTER Z */
+	0x1E93, /* LATIN SMALL LETTER Z WITH DOT BELOW -> LATIN SMALL LETTER Z */
+	0x1E94, /* LATIN CAPITAL LETTER Z WITH LINE BELOW -> LATIN CAPITAL LETTER Z */
+	0x1E95, /* LATIN SMALL LETTER Z WITH LINE BELOW -> LATIN SMALL LETTER Z */
+	0x1E96, /* LATIN SMALL LETTER H WITH LINE BELOW -> LATIN SMALL LETTER H */
+	0x1E97, /* LATIN SMALL LETTER T WITH DIAERESIS -> LATIN SMALL LETTER T */
+	0x1E98, /* LATIN SMALL LETTER W WITH RING ABOVE -> LATIN SMALL LETTER W */
+	0x1E99, /* LATIN SMALL LETTER Y WITH RING ABOVE -> LATIN SMALL LETTER Y */
+	0x1EA0, /* LATIN CAPITAL LETTER A WITH DOT BELOW -> LATIN CAPITAL LETTER A */
+	0x1EA1, /* LATIN SMALL LETTER A WITH DOT BELOW -> LATIN SMALL LETTER A */
+	0x1EA2, /* LATIN CAPITAL LETTER A WITH HOOK ABOVE -> LATIN CAPITAL LETTER A */
+	0x1EA3, /* LATIN SMALL LETTER A WITH HOOK ABOVE -> LATIN SMALL LETTER A */
+	0x1EA4, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND ACUTE -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX */
+	0x1EA5, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE -> LATIN SMALL LETTER A WITH CIRCUMFLEX */
+	0x1EA6, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX */
+	0x1EA7, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE -> LATIN SMALL LETTER A WITH CIRCUMFLEX */
+	0x1EA8, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX */
+	0x1EA9, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN SMALL LETTER A WITH CIRCUMFLEX */
+	0x1EAA, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX */
+	0x1EAB, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE -> LATIN SMALL LETTER A WITH CIRCUMFLEX */
+	0x1EAC, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW -> LATIN CAPITAL LETTER A */
+	0x1EAD, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW -> LATIN SMALL LETTER A */
+	0x1EAE, /* LATIN CAPITAL LETTER A WITH BREVE AND ACUTE -> LATIN CAPITAL LETTER A */
+	0x1EAF, /* LATIN SMALL LETTER A WITH BREVE AND ACUTE -> LATIN SMALL LETTER A */
+	0x1EB0, /* LATIN CAPITAL LETTER A WITH BREVE AND GRAVE -> LATIN CAPITAL LETTER A */
+	0x1EB1, /* LATIN SMALL LETTER A WITH BREVE AND GRAVE -> LATIN SMALL LETTER A */
+	0x1EB2, /* LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE -> LATIN CAPITAL LETTER A */
+	0x1EB3, /* LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE -> LATIN SMALL LETTER A */
+	0x1EB4, /* LATIN CAPITAL LETTER A WITH BREVE AND TILDE -> LATIN CAPITAL LETTER A */
+	0x1EB5, /* LATIN SMALL LETTER A WITH BREVE AND TILDE -> LATIN SMALL LETTER A */
+	0x1EB6, /* LATIN CAPITAL LETTER A WITH BREVE AND DOT BELOW -> LATIN CAPITAL LETTER A */
+	0x1EB7, /* LATIN SMALL LETTER A WITH BREVE AND DOT BELOW -> LATIN SMALL LETTER A */
+	0x1EB8, /* LATIN CAPITAL LETTER E WITH DOT BELOW -> LATIN CAPITAL LETTER E */
+	0x1EB9, /* LATIN SMALL LETTER E WITH DOT BELOW -> LATIN SMALL LETTER E */
+	0x1EBA, /* LATIN CAPITAL LETTER E WITH HOOK ABOVE -> LATIN CAPITAL LETTER E */
+	0x1EBB, /* LATIN SMALL LETTER E WITH HOOK ABOVE -> LATIN SMALL LETTER E */
+	0x1EBC, /* LATIN CAPITAL LETTER E WITH TILDE -> LATIN CAPITAL LETTER E */
+	0x1EBD, /* LATIN SMALL LETTER E WITH TILDE -> LATIN SMALL LETTER E */
+	0x1EBE, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
+	0x1EBF, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE -> LATIN SMALL LETTER E WITH CIRCUMFLEX */
+	0x1EC0, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND GRAVE -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
+	0x1EC1, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRAVE -> LATIN SMALL LETTER E WITH CIRCUMFLEX */
+	0x1EC2, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
+	0x1EC3, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN SMALL LETTER E WITH CIRCUMFLEX */
+	0x1EC4, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND TILDE -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
+	0x1EC5, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE -> LATIN SMALL LETTER E WITH CIRCUMFLEX */
+	0x1EC6, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND DOT BELOW -> LATIN CAPITAL LETTER E */
+	0x1EC7, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW -> LATIN SMALL LETTER E */
+	0x1EC8, /* LATIN CAPITAL LETTER I WITH HOOK ABOVE -> LATIN CAPITAL LETTER I */
+	0x1EC9, /* LATIN SMALL LETTER I WITH HOOK ABOVE -> LATIN SMALL LETTER I */
+	0x1ECA, /* LATIN CAPITAL LETTER I WITH DOT BELOW -> LATIN CAPITAL LETTER I */
+	0x1ECB, /* LATIN SMALL LETTER I WITH DOT BELOW -> LATIN SMALL LETTER I */
+	0x1ECC, /* LATIN CAPITAL LETTER O WITH DOT BELOW -> LATIN CAPITAL LETTER O */
+	0x1ECD, /* LATIN SMALL LETTER O WITH DOT BELOW -> LATIN SMALL LETTER O */
+	0x1ECE, /* LATIN CAPITAL LETTER O WITH HOOK ABOVE -> LATIN CAPITAL LETTER O */
+	0x1ECF, /* LATIN SMALL LETTER O WITH HOOK ABOVE -> LATIN SMALL LETTER O */
+	0x1ED0, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND ACUTE -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX */
+	0x1ED1, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE -> LATIN SMALL LETTER O WITH CIRCUMFLEX */
+	0x1ED2, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND GRAVE -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX */
+	0x1ED3, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRAVE -> LATIN SMALL LETTER O WITH CIRCUMFLEX */
+	0x1ED4, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX */
+	0x1ED5, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN SMALL LETTER O WITH CIRCUMFLEX */
+	0x1ED6, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND TILDE -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX */
+	0x1ED7, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE -> LATIN SMALL LETTER O WITH CIRCUMFLEX */
+	0x1ED8, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND DOT BELOW -> LATIN CAPITAL LETTER O */
+	0x1ED9, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW -> LATIN SMALL LETTER O */
+	0x1EDA, /* LATIN CAPITAL LETTER O WITH HORN AND ACUTE -> LATIN CAPITAL LETTER O */
+	0x1EDB, /* LATIN SMALL LETTER O WITH HORN AND ACUTE -> LATIN SMALL LETTER O */
+	0x1EDC, /* LATIN CAPITAL LETTER O WITH HORN AND GRAVE -> LATIN CAPITAL LETTER O */
+	0x1EDD, /* LATIN SMALL LETTER O WITH HORN AND GRAVE -> LATIN SMALL LETTER O */
+	0x1EDE, /* LATIN CAPITAL LETTER O WITH HORN AND HOOK ABOVE -> LATIN CAPITAL LETTER O */
+	0x1EDF, /* LATIN SMALL LETTER O WITH HORN AND HOOK ABOVE -> LATIN SMALL LETTER O */
+	0x1EE0, /* LATIN CAPITAL LETTER O WITH HORN AND TILDE -> LATIN CAPITAL LETTER O */
+	0x1EE1, /* LATIN SMALL LETTER O WITH HORN AND TILDE -> LATIN SMALL LETTER O */
+	0x1EE2, /* LATIN CAPITAL LETTER O WITH HORN AND DOT BELOW -> LATIN CAPITAL LETTER O */
+	0x1EE3, /* LATIN SMALL LETTER O WITH HORN AND DOT BELOW -> LATIN SMALL LETTER O */
+	0x1EE4, /* LATIN CAPITAL LETTER U WITH DOT BELOW -> LATIN CAPITAL LETTER U */
+	0x1EE5, /* LATIN SMALL LETTER U WITH DOT BELOW -> LATIN SMALL LETTER U */
+	0x1EE6, /* LATIN CAPITAL LETTER U WITH HOOK ABOVE -> LATIN CAPITAL LETTER U */
+	0x1EE7, /* LATIN SMALL LETTER U WITH HOOK ABOVE -> LATIN SMALL LETTER U */
+	0x1EE8, /* LATIN CAPITAL LETTER U WITH HORN AND ACUTE -> LATIN CAPITAL LETTER U */
+	0x1EE9, /* LATIN SMALL LETTER U WITH HORN AND ACUTE -> LATIN SMALL LETTER U */
+	0x1EEA, /* LATIN CAPITAL LETTER U WITH HORN AND GRAVE -> LATIN CAPITAL LETTER U */
+	0x1EEB, /* LATIN SMALL LETTER U WITH HORN AND GRAVE -> LATIN SMALL LETTER U */
+	0x1EEC, /* LATIN CAPITAL LETTER U WITH HORN AND HOOK ABOVE -> LATIN CAPITAL LETTER U */
+	0x1EED, /* LATIN SMALL LETTER U WITH HORN AND HOOK ABOVE -> LATIN SMALL LETTER U */
+	0x1EEE, /* LATIN CAPITAL LETTER U WITH HORN AND TILDE -> LATIN CAPITAL LETTER U */
+	0x1EEF, /* LATIN SMALL LETTER U WITH HORN AND TILDE -> LATIN SMALL LETTER U */
+	0x1EF0, /* LATIN CAPITAL LETTER U WITH HORN AND DOT BELOW -> LATIN CAPITAL LETTER U */
+	0x1EF1, /* LATIN SMALL LETTER U WITH HORN AND DOT BELOW -> LATIN SMALL LETTER U */
+	0x1EF2, /* LATIN CAPITAL LETTER Y WITH GRAVE -> LATIN CAPITAL LETTER Y */
+	0x1EF3, /* LATIN SMALL LETTER Y WITH GRAVE -> LATIN SMALL LETTER Y */
+	0x1EF4, /* LATIN CAPITAL LETTER Y WITH DOT BELOW -> LATIN CAPITAL LETTER Y */
+	0x1EF5, /* LATIN SMALL LETTER Y WITH DOT BELOW -> LATIN SMALL LETTER Y */
+	0x1EF6, /* LATIN CAPITAL LETTER Y WITH HOOK ABOVE -> LATIN CAPITAL LETTER Y */
+	0x1EF7, /* LATIN SMALL LETTER Y WITH HOOK ABOVE -> LATIN SMALL LETTER Y */
+	0x1EF8, /* LATIN CAPITAL LETTER Y WITH TILDE -> LATIN CAPITAL LETTER Y */
+	0x1EF9, /* LATIN SMALL LETTER Y WITH TILDE -> LATIN SMALL LETTER Y */
+	0x1F70, /* GREEK SMALL LETTER ALPHA WITH VARIA -> LATIN SMALL LETTER A */
+	0x1F72, /* GREEK SMALL LETTER EPSILON WITH VARIA -> LATIN SMALL LETTER E */
+	0x1F74, /* GREEK SMALL LETTER ETA WITH VARIA -> LATIN SMALL LETTER N */
+	0x1F7C, /* GREEK SMALL LETTER OMEGA WITH VARIA -> LATIN SMALL LETTER W */
+	0x1FC1, /* GREEK DIALYTIKA AND PERISPOMENI -> DIAERESIS */
+	0x1FED, /* GREEK DIALYTIKA AND VARIA -> DIAERESIS */
+	0x1FFA, /* GREEK CAPITAL LETTER OMEGA WITH VARIA -> LATIN CAPITAL LETTER O */
+	0x1FFC, /* GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI -> LATIN CAPITAL LETTER O */
+	0x201A, /* SINGLE LOW-9 QUOTATION MARK -> COMMA */
+	0x201B, /* SINGLE HIGH-REVERSED-9 QUOTATION MARK -> APOSTROPHE */
+	0x2022, /* BULLET -> ASTERISK */
+	0x2023, /* TRIANGULAR BULLET -> GREATER-THAN SIGN */
+	0x202F, /* NARROW NO-BREAK SPACE -> SPACE */
+	0x2032, /* PRIME -> APOSTROPHE */
+	0x2033, /* DOUBLE PRIME -> QUOTATION MARK */
+	0x2039, /* SINGLE LEFT-POINTING ANGLE QUOTATION MARK -> LESS-THAN SIGN */
+	0x203A, /* SINGLE RIGHT-POINTING ANGLE QUOTATION MARK -> GREATER-THAN SIGN */
+	0x203B, /* REFERENCE MARK -> ASTERISK */
+	0x203C, /* DOUBLE EXCLAMATION MARK -> EXCLAMATION MARK */
+	0x203D, /* INTERROBANG -> QUESTION MARK */
+	0x2042, /* ASTERISM -> ASTERISK */
+	0x2043, /* HYPHEN BULLET -> HYPHEN-MINUS */
+	0x2044, /* FRACTION SLASH -> SOLIDUS */
+	0x2049, /* EXCLAMATION QUESTION MARK -> EXCLAMATION MARK */
+	0x204A, /* TIRONIAN SIGN ET -> AMPERSAND */
+	0x204B, /* REVERSED PILCROW SIGN -> LATIN CAPITAL LETTER P */
+	0x204C, /* BLACK LEFTWARDS BULLET -> LESS-THAN SIGN */
+	0x204D, /* BLACK RIGHTWARDS BULLET -> GREATER-THAN SIGN */
+	0x204E, /* LOW ASTERISK -> ASTERISK */
+	0x204F, /* REVERSED SEMICOLON -> SEMICOLON */
+	0x2051, /* TWO ASTERISKS ALIGNED VERTICALLY -> ASTERISK */
+	0x2052, /* COMMERCIAL MINUS SIGN -> HYPHEN-MINUS */
+	0x2053, /* SWUNG DASH -> TILDE */
+	0x2055, /* FLOWER PUNCTUATION MARK -> ASTERISK */
+	0x205B, /* FOUR DOT MARK -> COLON */
+	0x205F, /* MEDIUM MATHEMATICAL SPACE -> SPACE */
+	0x2070, /* SUPERSCRIPT ZERO -> DIGIT ZERO */
+	0x2074, /* SUPERSCRIPT FOUR -> DIGIT FOUR */
+	0x2075, /* SUPERSCRIPT FIVE -> DIGIT FIVE */
+	0x2076, /* SUPERSCRIPT SIX -> DIGIT SIX */
+	0x2077, /* SUPERSCRIPT SEVEN -> DIGIT SEVEN */
+	0x2078, /* SUPERSCRIPT EIGHT -> DIGIT EIGHT */
+	0x2079, /* SUPERSCRIPT NINE -> DIGIT NINE */
+	0x2080, /* SUBSCRIPT ZERO -> DIGIT ZERO */
+	0x2081, /* SUBSCRIPT ONE -> DIGIT ONE */
+	0x2082, /* SUBSCRIPT TWO -> DIGIT TWO */
+	0x2083, /* SUBSCRIPT THREE -> DIGIT THREE */
+	0x2084, /* SUBSCRIPT FOUR -> DIGIT FOUR */
+	0x2085, /* SUBSCRIPT FIVE -> DIGIT FIVE */
+	0x2086, /* SUBSCRIPT SIX -> DIGIT SIX */
+	0x2087, /* SUBSCRIPT SEVEN -> DIGIT SEVEN */
+	0x2088, /* SUBSCRIPT EIGHT -> DIGIT EIGHT */
+	0x2089, /* SUBSCRIPT NINE -> DIGIT NINE */
+	0x20AC, /* EURO SIGN -> LATIN CAPITAL LETTER E */
+	0x2103, /* DEGREE CELSIUS -> LATIN CAPITAL LETTER C */
+	0x2109, /* DEGREE FAHRENHEIT -> LATIN CAPITAL LETTER F */
+	0x2122, /* TRADE MARK SIGN -> LATIN CAPITAL LETTER T */
+	0x2190, /* LEFTWARDS ARROW -> LESS-THAN SIGN */
+	0x2191, /* UPWARDS ARROW -> CIRCUMFLEX ACCENT */
+	0x2192, /* RIGHTWARDS ARROW -> GREATER-THAN SIGN */
+	0x2193, /* DOWNWARDS ARROW -> LATIN SMALL LETTER V */
+	0x21AE, /* LEFT RIGHT ARROW WITH STROKE -> EXCLAMATION MARK */
+	0x21D0, /* LEFTWARDS DOUBLE ARROW -> LESS-THAN SIGN */
+	0x21D1, /* UPWARDS DOUBLE ARROW -> CIRCUMFLEX ACCENT */
+	0x21D2, /* RIGHTWARDS DOUBLE ARROW -> GREATER-THAN SIGN */
+	0x21D3, /* DOWNWARDS DOUBLE ARROW -> LATIN SMALL LETTER V */
+	0x2204, /* THERE DOES NOT EXIST -> EXCLAMATION MARK */
+	0x2209, /* NOT AN ELEMENT OF -> EXCLAMATION MARK */
+	0x220C, /* DOES NOT CONTAIN AS MEMBER -> EXCLAMATION MARK */
+	0x2212, /* MINUS SIGN -> HYPHEN-MINUS */
+	0x2213, /* MINUS-OR-PLUS SIGN -> PLUS SIGN */
+	0x2215, /* DIVISION SLASH -> SOLIDUS */
+	0x2216, /* SET MINUS -> REVERSE SOLIDUS */
+	0x2217, /* ASTERISK OPERATOR -> ASTERISK */
+	0x2218, /* RING OPERATOR -> LATIN SMALL LETTER O */
+	0x2219, /* BULLET OPERATOR -> FULL STOP */
+	0x221A, /* SQUARE ROOT -> LATIN SMALL LETTER V */
+	0x221E, /* INFINITY -> DIGIT EIGHT */
+	0x2223, /* DIVIDES -> VERTICAL LINE */
+	0x2224, /* DOES NOT DIVIDE -> EXCLAMATION MARK */
+	0x2225, /* PARALLEL TO -> VERTICAL LINE */
+	0x2226, /* NOT PARALLEL TO -> EXCLAMATION MARK */
+	0x2227, /* LOGICAL AND -> AMPERSAND */
+	0x2228, /* LOGICAL OR -> VERTICAL LINE */
+	0x2229, /* INTERSECTION -> LATIN SMALL LETTER N */
+	0x222A, /* UNION -> LATIN SMALL LETTER U */
+	0x222B, /* INTEGRAL -> LATIN CAPITAL LETTER S */
+	0x2241, /* NOT TILDE -> NUMBER SIGN */
+	0x2244, /* NOT ASYMPTOTICALLY EQUAL TO -> NUMBER SIGN */
+	0x2248, /* ALMOST EQUAL TO -> TILDE */
+	0x2249, /* NOT ALMOST EQUAL TO -> NUMBER SIGN */
+	0x2260, /* NOT EQUAL TO -> NUMBER SIGN */
+	0x2262, /* NOT IDENTICAL TO -> NUMBER SIGN */
+	0x2264, /* LESS-THAN OR EQUAL TO -> LESS-THAN SIGN */
+	0x2265, /* GREATER-THAN OR EQUAL TO -> GREATER-THAN SIGN */
+	0x226D, /* NOT EQUIVALENT TO -> NUMBER SIGN */
+	0x2270, /* NEITHER LESS-THAN NOR EQUAL TO -> LESS-THAN SIGN */
+	0x2271, /* NEITHER GREATER-THAN NOR EQUAL TO -> GREATER-THAN SIGN */
+	0x2282, /* SUBSET OF -> LATIN SMALL LETTER C */
+	0x2283, /* SUPERSET OF -> LATIN CAPITAL LETTER C */
+	0x2286, /* SUBSET OF OR EQUAL TO -> LATIN SMALL LETTER C */
+	0x2287, /* SUPERSET OF OR EQUAL TO -> LATIN CAPITAL LETTER C */
+	0x2288, /* NEITHER A SUBSET OF NOR EQUAL TO -> LATIN SMALL LETTER C */
+	0x2289, /* NEITHER A SUPERSET OF NOR EQUAL TO -> LATIN CAPITAL LETTER C */
+	0x229B, /* CIRCLED ASTERISK OPERATOR -> ASTERISK */
+	0x22C5, /* DOT OPERATOR -> FULL STOP */
+	0x22C6, /* STAR OPERATOR -> ASTERISK */
+	0x235F, /* APL FUNCTIONAL SYMBOL CIRCLE STAR -> ASTERISK */
+	0x2363, /* APL FUNCTIONAL SYMBOL STAR DIAERESIS -> ASTERISK */
+	0x23A1, /* LEFT SQUARE BRACKET UPPER CORNER -> VERTICAL LINE */
+	0x23A9, /* LEFT CURLY BRACKET LOWER HOOK -> VERTICAL LINE */
+	0x23AB, /* RIGHT CURLY BRACKET UPPER HOOK -> VERTICAL LINE */
+	0x23B0, /* UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION -> LEFT PARENTHESIS */
+	0x23B1, /* UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION -> RIGHT PARENTHESIS */
+	0x23B3, /* SUMMATION BOTTOM -> VERTICAL LINE */
+	0x23BC, /* HORIZONTAL SCAN LINE-7 -> LATIN CAPITAL LETTER J */
+	0x23BD, /* HORIZONTAL SCAN LINE-9 -> LOW LINE */
+	0x2550, /* BOX DRAWINGS DOUBLE HORIZONTAL -> HYPHEN-MINUS */
+	0x2551, /* BOX DRAWINGS DOUBLE VERTICAL -> VERTICAL LINE */
+	0x2571, /* BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT -> SOLIDUS */
+	0x2572, /* BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT -> REVERSE SOLIDUS */
+	0x2573, /* BOX DRAWINGS LIGHT DIAGONAL CROSS -> LATIN CAPITAL LETTER X */
+	0x2574, /* BOX DRAWINGS LIGHT LEFT -> HYPHEN-MINUS */
+	0x2575, /* BOX DRAWINGS LIGHT UP -> VERTICAL LINE */
+	0x2576, /* BOX DRAWINGS LIGHT RIGHT -> HYPHEN-MINUS */
+	0x2577, /* BOX DRAWINGS LIGHT DOWN -> VERTICAL LINE */
+	0x2578, /* BOX DRAWINGS HEAVY LEFT -> HYPHEN-MINUS */
+	0x2579, /* BOX DRAWINGS HEAVY UP -> VERTICAL LINE */
+	0x257A, /* BOX DRAWINGS HEAVY RIGHT -> HYPHEN-MINUS */
+	0x257B, /* BOX DRAWINGS HEAVY DOWN -> VERTICAL LINE */
+	0x257C, /* BOX DRAWINGS LIGHT LEFT AND HEAVY RIGHT -> HYPHEN-MINUS */
+	0x257D, /* BOX DRAWINGS LIGHT UP AND HEAVY DOWN -> VERTICAL LINE */
+	0x257E, /* BOX DRAWINGS HEAVY LEFT AND LIGHT RIGHT -> HYPHEN-MINUS */
+	0x257F, /* BOX DRAWINGS HEAVY UP AND LIGHT DOWN -> VERTICAL LINE */
+	0x2591, /* LIGHT SHADE -> FULL STOP */
+	0x2592, /* MEDIUM SHADE -> PERCENT SIGN */
+	0x25A1, /* WHITE SQUARE -> LATIN SMALL LETTER O */
+	0x25AC, /* BLACK RECTANGLE -> NUMBER SIGN */
+	0x25AD, /* WHITE RECTANGLE -> HYPHEN-MINUS */
+	0x25C6, /* BLACK DIAMOND -> ASTERISK */
+	0x25CB, /* WHITE CIRCLE -> LATIN SMALL LETTER O */
+	0x25CE, /* BULLSEYE -> LATIN SMALL LETTER O */
+	0x25CF, /* BLACK CIRCLE -> ASTERISK */
+	0x25E6, /* WHITE BULLET -> LATIN SMALL LETTER O */
+	0x262A, /* STAR AND CRESCENT -> ASTERISK */
+	0x2698, /* FLOWER -> ASTERISK */
+	0x269D, /* OUTLINED WHITE STAR -> ASTERISK */
+	0x2713, /* CHECK MARK -> LATIN SMALL LETTER V */
+	0x2714, /* HEAVY CHECK MARK -> LATIN CAPITAL LETTER V */
+	0x2715, /* MULTIPLICATION X -> LATIN SMALL LETTER X */
+	0x2716, /* HEAVY MULTIPLICATION X -> LATIN CAPITAL LETTER X */
+	0x2717, /* BALLOT X -> LATIN SMALL LETTER X */
+	0x2718, /* HEAVY BALLOT X -> LATIN CAPITAL LETTER X */
+	0x27F8, /* LONG LEFTWARDS DOUBLE ARROW -> LESS-THAN SIGN */
+	0x27F9, /* LONG RIGHTWARDS DOUBLE ARROW -> GREATER-THAN SIGN */
+	0xFF01, /* FULLWIDTH EXCLAMATION MARK -> EXCLAMATION MARK */
+	0xFF02, /* FULLWIDTH QUOTATION MARK -> QUOTATION MARK */
+	0xFF03, /* FULLWIDTH NUMBER SIGN -> NUMBER SIGN */
+	0xFF04, /* FULLWIDTH DOLLAR SIGN -> DOLLAR SIGN */
+	0xFF05, /* FULLWIDTH PERCENT SIGN -> PERCENT SIGN */
+	0xFF06, /* FULLWIDTH AMPERSAND -> AMPERSAND */
+	0xFF07, /* FULLWIDTH APOSTROPHE -> APOSTROPHE */
+	0xFF08, /* FULLWIDTH LEFT PARENTHESIS -> LEFT PARENTHESIS */
+	0xFF09, /* FULLWIDTH RIGHT PARENTHESIS -> RIGHT PARENTHESIS */
+	0xFF0A, /* FULLWIDTH ASTERISK -> ASTERISK */
+	0xFF0B, /* FULLWIDTH PLUS SIGN -> PLUS SIGN */
+	0xFF0C, /* FULLWIDTH COMMA -> COMMA */
+	0xFF0D, /* FULLWIDTH HYPHEN-MINUS -> HYPHEN-MINUS */
+	0xFF0E, /* FULLWIDTH FULL STOP -> FULL STOP */
+	0xFF0F, /* FULLWIDTH SOLIDUS -> SOLIDUS */
+	0xFF10, /* FULLWIDTH DIGIT ZERO -> DIGIT ZERO */
+	0xFF11, /* FULLWIDTH DIGIT ONE -> DIGIT ONE */
+	0xFF12, /* FULLWIDTH DIGIT TWO -> DIGIT TWO */
+	0xFF13, /* FULLWIDTH DIGIT THREE -> DIGIT THREE */
+	0xFF14, /* FULLWIDTH DIGIT FOUR -> DIGIT FOUR */
+	0xFF15, /* FULLWIDTH DIGIT FIVE -> DIGIT FIVE */
+	0xFF16, /* FULLWIDTH DIGIT SIX -> DIGIT SIX */
+	0xFF17, /* FULLWIDTH DIGIT SEVEN -> DIGIT SEVEN */
+	0xFF18, /* FULLWIDTH DIGIT EIGHT -> DIGIT EIGHT */
+	0xFF19, /* FULLWIDTH DIGIT NINE -> DIGIT NINE */
+	0xFF1A, /* FULLWIDTH COLON -> COLON */
+	0xFF1B, /* FULLWIDTH SEMICOLON -> SEMICOLON */
+	0xFF1C, /* FULLWIDTH LESS-THAN SIGN -> LESS-THAN SIGN */
+	0xFF1D, /* FULLWIDTH EQUALS SIGN -> EQUALS SIGN */
+	0xFF1E, /* FULLWIDTH GREATER-THAN SIGN -> GREATER-THAN SIGN */
+	0xFF1F, /* FULLWIDTH QUESTION MARK -> QUESTION MARK */
+	0xFF20, /* FULLWIDTH COMMERCIAL AT -> COMMERCIAL AT */
+	0xFF21, /* FULLWIDTH LATIN CAPITAL LETTER A -> LATIN CAPITAL LETTER A */
+	0xFF22, /* FULLWIDTH LATIN CAPITAL LETTER B -> LATIN CAPITAL LETTER B */
+	0xFF23, /* FULLWIDTH LATIN CAPITAL LETTER C -> LATIN CAPITAL LETTER C */
+	0xFF24, /* FULLWIDTH LATIN CAPITAL LETTER D -> LATIN CAPITAL LETTER D */
+	0xFF25, /* FULLWIDTH LATIN CAPITAL LETTER E -> LATIN CAPITAL LETTER E */
+	0xFF26, /* FULLWIDTH LATIN CAPITAL LETTER F -> LATIN CAPITAL LETTER F */
+	0xFF27, /* FULLWIDTH LATIN CAPITAL LETTER G -> LATIN CAPITAL LETTER G */
+	0xFF28, /* FULLWIDTH LATIN CAPITAL LETTER H -> LATIN CAPITAL LETTER H */
+	0xFF29, /* FULLWIDTH LATIN CAPITAL LETTER I -> LATIN CAPITAL LETTER I */
+	0xFF2A, /* FULLWIDTH LATIN CAPITAL LETTER J -> LATIN CAPITAL LETTER J */
+	0xFF2B, /* FULLWIDTH LATIN CAPITAL LETTER K -> LATIN CAPITAL LETTER K */
+	0xFF2C, /* FULLWIDTH LATIN CAPITAL LETTER L -> LATIN CAPITAL LETTER L */
+	0xFF2D, /* FULLWIDTH LATIN CAPITAL LETTER M -> LATIN CAPITAL LETTER M */
+	0xFF2E, /* FULLWIDTH LATIN CAPITAL LETTER N -> LATIN CAPITAL LETTER N */
+	0xFF2F, /* FULLWIDTH LATIN CAPITAL LETTER O -> LATIN CAPITAL LETTER O */
+	0xFF30, /* FULLWIDTH LATIN CAPITAL LETTER P -> LATIN CAPITAL LETTER P */
+	0xFF31, /* FULLWIDTH LATIN CAPITAL LETTER Q -> LATIN CAPITAL LETTER Q */
+	0xFF32, /* FULLWIDTH LATIN CAPITAL LETTER R -> LATIN CAPITAL LETTER R */
+	0xFF33, /* FULLWIDTH LATIN CAPITAL LETTER S -> LATIN CAPITAL LETTER S */
+	0xFF34, /* FULLWIDTH LATIN CAPITAL LETTER T -> LATIN CAPITAL LETTER T */
+	0xFF35, /* FULLWIDTH LATIN CAPITAL LETTER U -> LATIN CAPITAL LETTER U */
+	0xFF36, /* FULLWIDTH LATIN CAPITAL LETTER V -> LATIN CAPITAL LETTER V */
+	0xFF37, /* FULLWIDTH LATIN CAPITAL LETTER W -> LATIN CAPITAL LETTER W */
+	0xFF38, /* FULLWIDTH LATIN CAPITAL LETTER X -> LATIN CAPITAL LETTER X */
+	0xFF39, /* FULLWIDTH LATIN CAPITAL LETTER Y -> LATIN CAPITAL LETTER Y */
+	0xFF3A, /* FULLWIDTH LATIN CAPITAL LETTER Z -> LATIN CAPITAL LETTER Z */
+	0xFF3B, /* FULLWIDTH LEFT SQUARE BRACKET -> LEFT SQUARE BRACKET */
+	0xFF3C, /* FULLWIDTH REVERSE SOLIDUS -> REVERSE SOLIDUS */
+	0xFF3D, /* FULLWIDTH RIGHT SQUARE BRACKET -> RIGHT SQUARE BRACKET */
+	0xFF3E, /* FULLWIDTH CIRCUMFLEX ACCENT -> CIRCUMFLEX ACCENT */
+	0xFF3F, /* FULLWIDTH LOW LINE -> LOW LINE */
+	0xFF40, /* FULLWIDTH GRAVE ACCENT -> GRAVE ACCENT */
+	0xFF41, /* FULLWIDTH LATIN SMALL LETTER A -> LATIN SMALL LETTER A */
+	0xFF42, /* FULLWIDTH LATIN SMALL LETTER B -> LATIN SMALL LETTER B */
+	0xFF43, /* FULLWIDTH LATIN SMALL LETTER C -> LATIN SMALL LETTER C */
+	0xFF44, /* FULLWIDTH LATIN SMALL LETTER D -> LATIN SMALL LETTER D */
+	0xFF45, /* FULLWIDTH LATIN SMALL LETTER E -> LATIN SMALL LETTER E */
+	0xFF46, /* FULLWIDTH LATIN SMALL LETTER F -> LATIN SMALL LETTER F */
+	0xFF47, /* FULLWIDTH LATIN SMALL LETTER G -> LATIN SMALL LETTER G */
+	0xFF48, /* FULLWIDTH LATIN SMALL LETTER H -> LATIN SMALL LETTER H */
+	0xFF49, /* FULLWIDTH LATIN SMALL LETTER I -> LATIN SMALL LETTER I */
+	0xFF4A, /* FULLWIDTH LATIN SMALL LETTER J -> LATIN SMALL LETTER J */
+	0xFF4B, /* FULLWIDTH LATIN SMALL LETTER K -> LATIN SMALL LETTER K */
+	0xFF4C, /* FULLWIDTH LATIN SMALL LETTER L -> LATIN SMALL LETTER L */
+	0xFF4D, /* FULLWIDTH LATIN SMALL LETTER M -> LATIN SMALL LETTER M */
+	0xFF4E, /* FULLWIDTH LATIN SMALL LETTER N -> LATIN SMALL LETTER N */
+	0xFF4F, /* FULLWIDTH LATIN SMALL LETTER O -> LATIN SMALL LETTER O */
+	0xFF50, /* FULLWIDTH LATIN SMALL LETTER P -> LATIN SMALL LETTER P */
+	0xFF51, /* FULLWIDTH LATIN SMALL LETTER Q -> LATIN SMALL LETTER Q */
+	0xFF52, /* FULLWIDTH LATIN SMALL LETTER R -> LATIN SMALL LETTER R */
+	0xFF53, /* FULLWIDTH LATIN SMALL LETTER S -> LATIN SMALL LETTER S */
+	0xFF54, /* FULLWIDTH LATIN SMALL LETTER T -> LATIN SMALL LETTER T */
+	0xFF55, /* FULLWIDTH LATIN SMALL LETTER U -> LATIN SMALL LETTER U */
+	0xFF56, /* FULLWIDTH LATIN SMALL LETTER V -> LATIN SMALL LETTER V */
+	0xFF57, /* FULLWIDTH LATIN SMALL LETTER W -> LATIN SMALL LETTER W */
+	0xFF58, /* FULLWIDTH LATIN SMALL LETTER X -> LATIN SMALL LETTER X */
+	0xFF59, /* FULLWIDTH LATIN SMALL LETTER Y -> LATIN SMALL LETTER Y */
+	0xFF5A, /* FULLWIDTH LATIN SMALL LETTER Z -> LATIN SMALL LETTER Z */
+	0xFF5B, /* FULLWIDTH LEFT CURLY BRACKET -> LEFT CURLY BRACKET */
+	0xFF5C, /* FULLWIDTH VERTICAL LINE -> VERTICAL LINE */
+	0xFF5D, /* FULLWIDTH RIGHT CURLY BRACKET -> RIGHT CURLY BRACKET */
+	0xFF5E, /* FULLWIDTH TILDE -> TILDE */
+};
+
+static const u8 ucs_fallback_singles_subs[] = {
+	0x20, /* NO-BREAK SPACE -> SPACE */
+	0x21, /* INVERTED EXCLAMATION MARK -> EXCLAMATION MARK */
+	0x63, /* CENT SIGN -> LATIN SMALL LETTER C */
+	0x4C, /* POUND SIGN -> LATIN CAPITAL LETTER L */
+	0x59, /* YEN SIGN -> LATIN CAPITAL LETTER Y */
+	0x7C, /* BROKEN BAR -> VERTICAL LINE */
+	0x53, /* SECTION SIGN -> LATIN CAPITAL LETTER S */
+	0x43, /* COPYRIGHT SIGN -> LATIN CAPITAL LETTER C */
+	0x3C, /* LEFT-POINTING DOUBLE ANGLE QUOTATION MARK -> LESS-THAN SIGN */
+	0x52, /* REGISTERED SIGN -> LATIN CAPITAL LETTER R */
+	0x6F, /* DEGREE SIGN -> LATIN SMALL LETTER O */
+	0x2B, /* PLUS-MINUS SIGN -> PLUS SIGN */
+	0x32, /* SUPERSCRIPT TWO -> DIGIT TWO */
+	0x33, /* SUPERSCRIPT THREE -> DIGIT THREE */
+	0x75, /* MICRO SIGN -> LATIN SMALL LETTER U */
+	0x50, /* PILCROW SIGN -> LATIN CAPITAL LETTER P */
+	0x2E, /* MIDDLE DOT -> FULL STOP */
+	0x31, /* SUPERSCRIPT ONE -> DIGIT ONE */
+	0x3E, /* RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK -> GREATER-THAN SIGN */
+	0x3F, /* INVERTED QUESTION MARK -> QUESTION MARK */
+	0x45, /* LATIN CAPITAL LETTER AE -> LATIN CAPITAL LETTER E */
+	0x43, /* LATIN CAPITAL LETTER C WITH CEDILLA -> LATIN CAPITAL LETTER C */
+	0x44, /* LATIN CAPITAL LETTER ETH -> LATIN CAPITAL LETTER D */
+	0x4E, /* LATIN CAPITAL LETTER N WITH TILDE -> LATIN CAPITAL LETTER N */
+	0x78, /* MULTIPLICATION SIGN -> LATIN SMALL LETTER X */
+	0x4F, /* LATIN CAPITAL LETTER O WITH STROKE -> LATIN CAPITAL LETTER O */
+	0x59, /* LATIN CAPITAL LETTER Y WITH ACUTE -> LATIN CAPITAL LETTER Y */
+	0x50, /* LATIN CAPITAL LETTER THORN -> LATIN CAPITAL LETTER P */
+	0x73, /* LATIN SMALL LETTER SHARP S -> LATIN SMALL LETTER S */
+	0x65, /* LATIN SMALL LETTER AE -> LATIN SMALL LETTER E */
+	0x63, /* LATIN SMALL LETTER C WITH CEDILLA -> LATIN SMALL LETTER C */
+	0x64, /* LATIN SMALL LETTER ETH -> LATIN SMALL LETTER D */
+	0x6E, /* LATIN SMALL LETTER N WITH TILDE -> LATIN SMALL LETTER N */
+	0x2F, /* DIVISION SIGN -> SOLIDUS */
+	0x6F, /* LATIN SMALL LETTER O WITH STROKE -> LATIN SMALL LETTER O */
+	0x79, /* LATIN SMALL LETTER Y WITH ACUTE -> LATIN SMALL LETTER Y */
+	0x70, /* LATIN SMALL LETTER THORN -> LATIN SMALL LETTER P */
+	0x79, /* LATIN SMALL LETTER Y WITH DIAERESIS -> LATIN SMALL LETTER Y */
+	0x41, /* LATIN CAPITAL LETTER A WITH MACRON -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH MACRON -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH BREVE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH BREVE -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH OGONEK -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH OGONEK -> LATIN SMALL LETTER A */
+	0x43, /* LATIN CAPITAL LETTER C WITH ACUTE -> LATIN CAPITAL LETTER C */
+	0x63, /* LATIN SMALL LETTER C WITH ACUTE -> LATIN SMALL LETTER C */
+	0x43, /* LATIN CAPITAL LETTER C WITH CIRCUMFLEX -> LATIN CAPITAL LETTER C */
+	0x63, /* LATIN SMALL LETTER C WITH CIRCUMFLEX -> LATIN SMALL LETTER C */
+	0x43, /* LATIN CAPITAL LETTER C WITH DOT ABOVE -> LATIN CAPITAL LETTER C */
+	0x63, /* LATIN SMALL LETTER C WITH DOT ABOVE -> LATIN SMALL LETTER C */
+	0x43, /* LATIN CAPITAL LETTER C WITH CARON -> LATIN CAPITAL LETTER C */
+	0x63, /* LATIN SMALL LETTER C WITH CARON -> LATIN SMALL LETTER C */
+	0x44, /* LATIN CAPITAL LETTER D WITH CARON -> LATIN CAPITAL LETTER D */
+	0x64, /* LATIN SMALL LETTER D WITH CARON -> LATIN SMALL LETTER D */
+	0x44, /* LATIN CAPITAL LETTER D WITH STROKE -> LATIN CAPITAL LETTER D */
+	0x64, /* LATIN SMALL LETTER D WITH STROKE -> LATIN SMALL LETTER D */
+	0x45, /* LATIN CAPITAL LETTER E WITH MACRON -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH MACRON -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH BREVE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH BREVE -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH DOT ABOVE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH DOT ABOVE -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH OGONEK -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH OGONEK -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH CARON -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH CARON -> LATIN SMALL LETTER E */
+	0x47, /* LATIN CAPITAL LETTER G WITH CIRCUMFLEX -> LATIN CAPITAL LETTER G */
+	0x67, /* LATIN SMALL LETTER G WITH CIRCUMFLEX -> LATIN SMALL LETTER G */
+	0x47, /* LATIN CAPITAL LETTER G WITH BREVE -> LATIN CAPITAL LETTER G */
+	0x67, /* LATIN SMALL LETTER G WITH BREVE -> LATIN SMALL LETTER G */
+	0x47, /* LATIN CAPITAL LETTER G WITH DOT ABOVE -> LATIN CAPITAL LETTER G */
+	0x67, /* LATIN SMALL LETTER G WITH DOT ABOVE -> LATIN SMALL LETTER G */
+	0x47, /* LATIN CAPITAL LETTER G WITH CEDILLA -> LATIN CAPITAL LETTER G */
+	0x67, /* LATIN SMALL LETTER G WITH CEDILLA -> LATIN SMALL LETTER G */
+	0x48, /* LATIN CAPITAL LETTER H WITH CIRCUMFLEX -> LATIN CAPITAL LETTER H */
+	0x68, /* LATIN SMALL LETTER H WITH CIRCUMFLEX -> LATIN SMALL LETTER H */
+	0x48, /* LATIN CAPITAL LETTER H WITH STROKE -> LATIN CAPITAL LETTER H */
+	0x68, /* LATIN SMALL LETTER H WITH STROKE -> LATIN SMALL LETTER H */
+	0x49, /* LATIN CAPITAL LETTER I WITH TILDE -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH TILDE -> LATIN SMALL LETTER I */
+	0x49, /* LATIN CAPITAL LETTER I WITH MACRON -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH MACRON -> LATIN SMALL LETTER I */
+	0x49, /* LATIN CAPITAL LETTER I WITH BREVE -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH BREVE -> LATIN SMALL LETTER I */
+	0x49, /* LATIN CAPITAL LETTER I WITH OGONEK -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH OGONEK -> LATIN SMALL LETTER I */
+	0x49, /* LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN CAPITAL LETTER I */
+	0x4A, /* LATIN CAPITAL LETTER J WITH CIRCUMFLEX -> LATIN CAPITAL LETTER J */
+	0x6A, /* LATIN SMALL LETTER J WITH CIRCUMFLEX -> LATIN SMALL LETTER J */
+	0x4B, /* LATIN CAPITAL LETTER K WITH CEDILLA -> LATIN CAPITAL LETTER K */
+	0x6B, /* LATIN SMALL LETTER K WITH CEDILLA -> LATIN SMALL LETTER K */
+	0x4C, /* LATIN CAPITAL LETTER L WITH ACUTE -> LATIN CAPITAL LETTER L */
+	0x6C, /* LATIN SMALL LETTER L WITH ACUTE -> LATIN SMALL LETTER L */
+	0x4C, /* LATIN CAPITAL LETTER L WITH CEDILLA -> LATIN CAPITAL LETTER L */
+	0x6C, /* LATIN SMALL LETTER L WITH CEDILLA -> LATIN SMALL LETTER L */
+	0x4C, /* LATIN CAPITAL LETTER L WITH CARON -> LATIN CAPITAL LETTER L */
+	0x6C, /* LATIN SMALL LETTER L WITH CARON -> LATIN SMALL LETTER L */
+	0x4C, /* LATIN CAPITAL LETTER L WITH STROKE -> LATIN CAPITAL LETTER L */
+	0x6C, /* LATIN SMALL LETTER L WITH STROKE -> LATIN SMALL LETTER L */
+	0x4E, /* LATIN CAPITAL LETTER N WITH ACUTE -> LATIN CAPITAL LETTER N */
+	0x6E, /* LATIN SMALL LETTER N WITH ACUTE -> LATIN SMALL LETTER N */
+	0x4E, /* LATIN CAPITAL LETTER N WITH CEDILLA -> LATIN CAPITAL LETTER N */
+	0x6E, /* LATIN SMALL LETTER N WITH CEDILLA -> LATIN SMALL LETTER N */
+	0x4E, /* LATIN CAPITAL LETTER N WITH CARON -> LATIN CAPITAL LETTER N */
+	0x6E, /* LATIN SMALL LETTER N WITH CARON -> LATIN SMALL LETTER N */
+	0x4F, /* LATIN CAPITAL LETTER O WITH MACRON -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH MACRON -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH BREVE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH BREVE -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH DOUBLE ACUTE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH DOUBLE ACUTE -> LATIN SMALL LETTER O */
+	0x45, /* LATIN CAPITAL LIGATURE OE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LIGATURE OE -> LATIN SMALL LETTER E */
+	0x52, /* LATIN CAPITAL LETTER R WITH ACUTE -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH ACUTE -> LATIN SMALL LETTER R */
+	0x52, /* LATIN CAPITAL LETTER R WITH CEDILLA -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH CEDILLA -> LATIN SMALL LETTER R */
+	0x52, /* LATIN CAPITAL LETTER R WITH CARON -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH CARON -> LATIN SMALL LETTER R */
+	0x53, /* LATIN CAPITAL LETTER S WITH ACUTE -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH ACUTE -> LATIN SMALL LETTER S */
+	0x53, /* LATIN CAPITAL LETTER S WITH CIRCUMFLEX -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH CIRCUMFLEX -> LATIN SMALL LETTER S */
+	0x53, /* LATIN CAPITAL LETTER S WITH CEDILLA -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH CEDILLA -> LATIN SMALL LETTER S */
+	0x53, /* LATIN CAPITAL LETTER S WITH CARON -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH CARON -> LATIN SMALL LETTER S */
+	0x54, /* LATIN CAPITAL LETTER T WITH CEDILLA -> LATIN CAPITAL LETTER T */
+	0x74, /* LATIN SMALL LETTER T WITH CEDILLA -> LATIN SMALL LETTER T */
+	0x54, /* LATIN CAPITAL LETTER T WITH CARON -> LATIN CAPITAL LETTER T */
+	0x74, /* LATIN SMALL LETTER T WITH CARON -> LATIN SMALL LETTER T */
+	0x55, /* LATIN CAPITAL LETTER U WITH TILDE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH TILDE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH MACRON -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH MACRON -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH BREVE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH BREVE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH RING ABOVE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH RING ABOVE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH DOUBLE ACUTE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH DOUBLE ACUTE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH OGONEK -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH OGONEK -> LATIN SMALL LETTER U */
+	0x57, /* LATIN CAPITAL LETTER W WITH CIRCUMFLEX -> LATIN CAPITAL LETTER W */
+	0x77, /* LATIN SMALL LETTER W WITH CIRCUMFLEX -> LATIN SMALL LETTER W */
+	0x59, /* LATIN CAPITAL LETTER Y WITH CIRCUMFLEX -> LATIN CAPITAL LETTER Y */
+	0x79, /* LATIN SMALL LETTER Y WITH CIRCUMFLEX -> LATIN SMALL LETTER Y */
+	0x59, /* LATIN CAPITAL LETTER Y WITH DIAERESIS -> LATIN CAPITAL LETTER Y */
+	0x5A, /* LATIN CAPITAL LETTER Z WITH ACUTE -> LATIN CAPITAL LETTER Z */
+	0x7A, /* LATIN SMALL LETTER Z WITH ACUTE -> LATIN SMALL LETTER Z */
+	0x5A, /* LATIN CAPITAL LETTER Z WITH DOT ABOVE -> LATIN CAPITAL LETTER Z */
+	0x7A, /* LATIN SMALL LETTER Z WITH DOT ABOVE -> LATIN SMALL LETTER Z */
+	0x5A, /* LATIN CAPITAL LETTER Z WITH CARON -> LATIN CAPITAL LETTER Z */
+	0x7A, /* LATIN SMALL LETTER Z WITH CARON -> LATIN SMALL LETTER Z */
+	0x4F, /* LATIN CAPITAL LETTER O WITH HORN -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH HORN -> LATIN SMALL LETTER O */
+	0x55, /* LATIN CAPITAL LETTER U WITH HORN -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH HORN -> LATIN SMALL LETTER U */
+	0x41, /* LATIN CAPITAL LETTER A WITH CARON -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH CARON -> LATIN SMALL LETTER A */
+	0x49, /* LATIN CAPITAL LETTER I WITH CARON -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH CARON -> LATIN SMALL LETTER I */
+	0x4F, /* LATIN CAPITAL LETTER O WITH CARON -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH CARON -> LATIN SMALL LETTER O */
+	0x55, /* LATIN CAPITAL LETTER U WITH CARON -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH CARON -> LATIN SMALL LETTER U */
+	0xDC, /* LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON -> LATIN CAPITAL LETTER U WITH DIAERESIS */
+	0xFC, /* LATIN SMALL LETTER U WITH DIAERESIS AND MACRON -> LATIN SMALL LETTER U WITH DIAERESIS */
+	0xDC, /* LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE -> LATIN CAPITAL LETTER U WITH DIAERESIS */
+	0xFC, /* LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE -> LATIN SMALL LETTER U WITH DIAERESIS */
+	0xDC, /* LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON -> LATIN CAPITAL LETTER U WITH DIAERESIS */
+	0xFC, /* LATIN SMALL LETTER U WITH DIAERESIS AND CARON -> LATIN SMALL LETTER U WITH DIAERESIS */
+	0xDC, /* LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE -> LATIN CAPITAL LETTER U WITH DIAERESIS */
+	0xFC, /* LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE -> LATIN SMALL LETTER U WITH DIAERESIS */
+	0xC4, /* LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON -> LATIN CAPITAL LETTER A WITH DIAERESIS */
+	0xE4, /* LATIN SMALL LETTER A WITH DIAERESIS AND MACRON -> LATIN SMALL LETTER A WITH DIAERESIS */
+	0x41, /* LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON -> LATIN SMALL LETTER A */
+	0xC6, /* LATIN CAPITAL LETTER AE WITH MACRON -> LATIN CAPITAL LETTER AE */
+	0xE6, /* LATIN SMALL LETTER AE WITH MACRON -> LATIN SMALL LETTER AE */
+	0x47, /* LATIN CAPITAL LETTER G WITH CARON -> LATIN CAPITAL LETTER G */
+	0x67, /* LATIN SMALL LETTER G WITH CARON -> LATIN SMALL LETTER G */
+	0x4B, /* LATIN CAPITAL LETTER K WITH CARON -> LATIN CAPITAL LETTER K */
+	0x6B, /* LATIN SMALL LETTER K WITH CARON -> LATIN SMALL LETTER K */
+	0x4F, /* LATIN CAPITAL LETTER O WITH OGONEK -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH OGONEK -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH OGONEK AND MACRON -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH OGONEK AND MACRON -> LATIN SMALL LETTER O */
+	0x6A, /* LATIN SMALL LETTER J WITH CARON -> LATIN SMALL LETTER J */
+	0x47, /* LATIN CAPITAL LETTER G WITH ACUTE -> LATIN CAPITAL LETTER G */
+	0x67, /* LATIN SMALL LETTER G WITH ACUTE -> LATIN SMALL LETTER G */
+	0x4E, /* LATIN CAPITAL LETTER N WITH GRAVE -> LATIN CAPITAL LETTER N */
+	0x6E, /* LATIN SMALL LETTER N WITH GRAVE -> LATIN SMALL LETTER N */
+	0xC5, /* LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE -> LATIN CAPITAL LETTER A WITH RING ABOVE */
+	0xE5, /* LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE -> LATIN SMALL LETTER A WITH RING ABOVE */
+	0xC6, /* LATIN CAPITAL LETTER AE WITH ACUTE -> LATIN CAPITAL LETTER AE */
+	0xE6, /* LATIN SMALL LETTER AE WITH ACUTE -> LATIN SMALL LETTER AE */
+	0xD8, /* LATIN CAPITAL LETTER O WITH STROKE AND ACUTE -> LATIN CAPITAL LETTER O WITH STROKE */
+	0xF8, /* LATIN SMALL LETTER O WITH STROKE AND ACUTE -> LATIN SMALL LETTER O WITH STROKE */
+	0x41, /* LATIN CAPITAL LETTER A WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH DOUBLE GRAVE -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH INVERTED BREVE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH INVERTED BREVE -> LATIN SMALL LETTER A */
+	0x45, /* LATIN CAPITAL LETTER E WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH DOUBLE GRAVE -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH INVERTED BREVE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH INVERTED BREVE -> LATIN SMALL LETTER E */
+	0x49, /* LATIN CAPITAL LETTER I WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH DOUBLE GRAVE -> LATIN SMALL LETTER I */
+	0x49, /* LATIN CAPITAL LETTER I WITH INVERTED BREVE -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH INVERTED BREVE -> LATIN SMALL LETTER I */
+	0x4F, /* LATIN CAPITAL LETTER O WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH DOUBLE GRAVE -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH INVERTED BREVE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH INVERTED BREVE -> LATIN SMALL LETTER O */
+	0x52, /* LATIN CAPITAL LETTER R WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH DOUBLE GRAVE -> LATIN SMALL LETTER R */
+	0x52, /* LATIN CAPITAL LETTER R WITH INVERTED BREVE -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH INVERTED BREVE -> LATIN SMALL LETTER R */
+	0x55, /* LATIN CAPITAL LETTER U WITH DOUBLE GRAVE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH DOUBLE GRAVE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH INVERTED BREVE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH INVERTED BREVE -> LATIN SMALL LETTER U */
+	0x53, /* LATIN CAPITAL LETTER S WITH COMMA BELOW -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH COMMA BELOW -> LATIN SMALL LETTER S */
+	0x54, /* LATIN CAPITAL LETTER T WITH COMMA BELOW -> LATIN CAPITAL LETTER T */
+	0x74, /* LATIN SMALL LETTER T WITH COMMA BELOW -> LATIN SMALL LETTER T */
+	0x48, /* LATIN CAPITAL LETTER H WITH CARON -> LATIN CAPITAL LETTER H */
+	0x68, /* LATIN SMALL LETTER H WITH CARON -> LATIN SMALL LETTER H */
+	0x41, /* LATIN CAPITAL LETTER A WITH DOT ABOVE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH DOT ABOVE -> LATIN SMALL LETTER A */
+	0x45, /* LATIN CAPITAL LETTER E WITH CEDILLA -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH CEDILLA -> LATIN SMALL LETTER E */
+	0xD6, /* LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON -> LATIN CAPITAL LETTER O WITH DIAERESIS */
+	0xF6, /* LATIN SMALL LETTER O WITH DIAERESIS AND MACRON -> LATIN SMALL LETTER O WITH DIAERESIS */
+	0xD5, /* LATIN CAPITAL LETTER O WITH TILDE AND MACRON -> LATIN CAPITAL LETTER O WITH TILDE */
+	0xF5, /* LATIN SMALL LETTER O WITH TILDE AND MACRON -> LATIN SMALL LETTER O WITH TILDE */
+	0x4F, /* LATIN CAPITAL LETTER O WITH DOT ABOVE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH DOT ABOVE -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON -> LATIN SMALL LETTER O */
+	0x59, /* LATIN CAPITAL LETTER Y WITH MACRON -> LATIN CAPITAL LETTER Y */
+	0x79, /* LATIN SMALL LETTER Y WITH MACRON -> LATIN SMALL LETTER Y */
+	0xA8, /* GREEK DIALYTIKA TONOS -> DIAERESIS */
+	0x2E, /* GREEK ANO TELEIA -> FULL STOP */
+	0x4F, /* GREEK CAPITAL LETTER OMEGA WITH TONOS -> LATIN CAPITAL LETTER O */
+	0x49, /* GREEK CAPITAL LETTER GAMMA -> LATIN CAPITAL LETTER I */
+	0x41, /* GREEK CAPITAL LETTER DELTA -> LATIN CAPITAL LETTER A */
+	0x4F, /* GREEK CAPITAL LETTER THETA -> LATIN CAPITAL LETTER O */
+	0x41, /* GREEK CAPITAL LETTER LAMDA -> LATIN CAPITAL LETTER A */
+	0x6E, /* GREEK CAPITAL LETTER PI -> LATIN SMALL LETTER N */
+	0x45, /* GREEK CAPITAL LETTER SIGMA -> LATIN CAPITAL LETTER E */
+	0x4F, /* GREEK CAPITAL LETTER PHI -> LATIN CAPITAL LETTER O */
+	0x59, /* GREEK CAPITAL LETTER PSI -> LATIN CAPITAL LETTER Y */
+	0x4F, /* GREEK CAPITAL LETTER OMEGA -> LATIN CAPITAL LETTER O */
+	0x61, /* GREEK SMALL LETTER ALPHA WITH TONOS -> LATIN SMALL LETTER A */
+	0x65, /* GREEK SMALL LETTER EPSILON WITH TONOS -> LATIN SMALL LETTER E */
+	0x6E, /* GREEK SMALL LETTER ETA WITH TONOS -> LATIN SMALL LETTER N */
+	0x61, /* GREEK SMALL LETTER ALPHA -> LATIN SMALL LETTER A */
+	0x42, /* GREEK SMALL LETTER BETA -> LATIN CAPITAL LETTER B */
+	0x79, /* GREEK SMALL LETTER GAMMA -> LATIN SMALL LETTER Y */
+	0x64, /* GREEK SMALL LETTER DELTA -> LATIN SMALL LETTER D */
+	0x65, /* GREEK SMALL LETTER EPSILON -> LATIN SMALL LETTER E */
+	0x7A, /* GREEK SMALL LETTER ZETA -> LATIN SMALL LETTER Z */
+	0x6E, /* GREEK SMALL LETTER ETA -> LATIN SMALL LETTER N */
+	0x30, /* GREEK SMALL LETTER THETA -> DIGIT ZERO */
+	0x6C, /* GREEK SMALL LETTER LAMDA -> LATIN SMALL LETTER L */
+	0x75, /* GREEK SMALL LETTER MU -> LATIN SMALL LETTER U */
+	0x6E, /* GREEK SMALL LETTER PI -> LATIN SMALL LETTER N */
+	0x70, /* GREEK SMALL LETTER RHO -> LATIN SMALL LETTER P */
+	0x6F, /* GREEK SMALL LETTER SIGMA -> LATIN SMALL LETTER O */
+	0x74, /* GREEK SMALL LETTER TAU -> LATIN SMALL LETTER T */
+	0x66, /* GREEK SMALL LETTER PHI -> LATIN SMALL LETTER F */
+	0x58, /* GREEK SMALL LETTER CHI -> LATIN CAPITAL LETTER X */
+	0x77, /* GREEK SMALL LETTER OMEGA WITH TONOS -> LATIN SMALL LETTER W */
+	0x20, /* OGHAM SPACE MARK -> SPACE */
+	0x41, /* LATIN CAPITAL LETTER A WITH RING BELOW -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH RING BELOW -> LATIN SMALL LETTER A */
+	0x42, /* LATIN CAPITAL LETTER B WITH DOT ABOVE -> LATIN CAPITAL LETTER B */
+	0x62, /* LATIN SMALL LETTER B WITH DOT ABOVE -> LATIN SMALL LETTER B */
+	0x42, /* LATIN CAPITAL LETTER B WITH DOT BELOW -> LATIN CAPITAL LETTER B */
+	0x62, /* LATIN SMALL LETTER B WITH DOT BELOW -> LATIN SMALL LETTER B */
+	0x42, /* LATIN CAPITAL LETTER B WITH LINE BELOW -> LATIN CAPITAL LETTER B */
+	0x62, /* LATIN SMALL LETTER B WITH LINE BELOW -> LATIN SMALL LETTER B */
+	0xC7, /* LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE -> LATIN CAPITAL LETTER C WITH CEDILLA */
+	0xE7, /* LATIN SMALL LETTER C WITH CEDILLA AND ACUTE -> LATIN SMALL LETTER C WITH CEDILLA */
+	0x44, /* LATIN CAPITAL LETTER D WITH DOT ABOVE -> LATIN CAPITAL LETTER D */
+	0x64, /* LATIN SMALL LETTER D WITH DOT ABOVE -> LATIN SMALL LETTER D */
+	0x44, /* LATIN CAPITAL LETTER D WITH DOT BELOW -> LATIN CAPITAL LETTER D */
+	0x64, /* LATIN SMALL LETTER D WITH DOT BELOW -> LATIN SMALL LETTER D */
+	0x44, /* LATIN CAPITAL LETTER D WITH LINE BELOW -> LATIN CAPITAL LETTER D */
+	0x64, /* LATIN SMALL LETTER D WITH LINE BELOW -> LATIN SMALL LETTER D */
+	0x44, /* LATIN CAPITAL LETTER D WITH CEDILLA -> LATIN CAPITAL LETTER D */
+	0x64, /* LATIN SMALL LETTER D WITH CEDILLA -> LATIN SMALL LETTER D */
+	0x44, /* LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER D */
+	0x64, /* LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER D */
+	0x45, /* LATIN CAPITAL LETTER E WITH MACRON AND GRAVE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH MACRON AND GRAVE -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH MACRON AND ACUTE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH MACRON AND ACUTE -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH TILDE BELOW -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH TILDE BELOW -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH CEDILLA AND BREVE -> LATIN SMALL LETTER E */
+	0x46, /* LATIN CAPITAL LETTER F WITH DOT ABOVE -> LATIN CAPITAL LETTER F */
+	0x66, /* LATIN SMALL LETTER F WITH DOT ABOVE -> LATIN SMALL LETTER F */
+	0x47, /* LATIN CAPITAL LETTER G WITH MACRON -> LATIN CAPITAL LETTER G */
+	0x67, /* LATIN SMALL LETTER G WITH MACRON -> LATIN SMALL LETTER G */
+	0x48, /* LATIN CAPITAL LETTER H WITH DOT ABOVE -> LATIN CAPITAL LETTER H */
+	0x68, /* LATIN SMALL LETTER H WITH DOT ABOVE -> LATIN SMALL LETTER H */
+	0x48, /* LATIN CAPITAL LETTER H WITH DOT BELOW -> LATIN CAPITAL LETTER H */
+	0x68, /* LATIN SMALL LETTER H WITH DOT BELOW -> LATIN SMALL LETTER H */
+	0x48, /* LATIN CAPITAL LETTER H WITH DIAERESIS -> LATIN CAPITAL LETTER H */
+	0x68, /* LATIN SMALL LETTER H WITH DIAERESIS -> LATIN SMALL LETTER H */
+	0x48, /* LATIN CAPITAL LETTER H WITH CEDILLA -> LATIN CAPITAL LETTER H */
+	0x68, /* LATIN SMALL LETTER H WITH CEDILLA -> LATIN SMALL LETTER H */
+	0x48, /* LATIN CAPITAL LETTER H WITH BREVE BELOW -> LATIN CAPITAL LETTER H */
+	0x68, /* LATIN SMALL LETTER H WITH BREVE BELOW -> LATIN SMALL LETTER H */
+	0x49, /* LATIN CAPITAL LETTER I WITH TILDE BELOW -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH TILDE BELOW -> LATIN SMALL LETTER I */
+	0xCF, /* LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE -> LATIN CAPITAL LETTER I WITH DIAERESIS */
+	0xEF, /* LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE -> LATIN SMALL LETTER I WITH DIAERESIS */
+	0x4B, /* LATIN CAPITAL LETTER K WITH ACUTE -> LATIN CAPITAL LETTER K */
+	0x6B, /* LATIN SMALL LETTER K WITH ACUTE -> LATIN SMALL LETTER K */
+	0x4B, /* LATIN CAPITAL LETTER K WITH DOT BELOW -> LATIN CAPITAL LETTER K */
+	0x6B, /* LATIN SMALL LETTER K WITH DOT BELOW -> LATIN SMALL LETTER K */
+	0x4B, /* LATIN CAPITAL LETTER K WITH LINE BELOW -> LATIN CAPITAL LETTER K */
+	0x6B, /* LATIN SMALL LETTER K WITH LINE BELOW -> LATIN SMALL LETTER K */
+	0x4C, /* LATIN CAPITAL LETTER L WITH DOT BELOW -> LATIN CAPITAL LETTER L */
+	0x6C, /* LATIN SMALL LETTER L WITH DOT BELOW -> LATIN SMALL LETTER L */
+	0x4C, /* LATIN CAPITAL LETTER L WITH DOT BELOW AND MACRON -> LATIN CAPITAL LETTER L */
+	0x6C, /* LATIN SMALL LETTER L WITH DOT BELOW AND MACRON -> LATIN SMALL LETTER L */
+	0x4C, /* LATIN CAPITAL LETTER L WITH LINE BELOW -> LATIN CAPITAL LETTER L */
+	0x6C, /* LATIN SMALL LETTER L WITH LINE BELOW -> LATIN SMALL LETTER L */
+	0x4C, /* LATIN CAPITAL LETTER L WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER L */
+	0x6C, /* LATIN SMALL LETTER L WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER L */
+	0x4D, /* LATIN CAPITAL LETTER M WITH ACUTE -> LATIN CAPITAL LETTER M */
+	0x6D, /* LATIN SMALL LETTER M WITH ACUTE -> LATIN SMALL LETTER M */
+	0x4D, /* LATIN CAPITAL LETTER M WITH DOT ABOVE -> LATIN CAPITAL LETTER M */
+	0x6D, /* LATIN SMALL LETTER M WITH DOT ABOVE -> LATIN SMALL LETTER M */
+	0x4D, /* LATIN CAPITAL LETTER M WITH DOT BELOW -> LATIN CAPITAL LETTER M */
+	0x6D, /* LATIN SMALL LETTER M WITH DOT BELOW -> LATIN SMALL LETTER M */
+	0x4E, /* LATIN CAPITAL LETTER N WITH DOT ABOVE -> LATIN CAPITAL LETTER N */
+	0x6E, /* LATIN SMALL LETTER N WITH DOT ABOVE -> LATIN SMALL LETTER N */
+	0x4E, /* LATIN CAPITAL LETTER N WITH DOT BELOW -> LATIN CAPITAL LETTER N */
+	0x6E, /* LATIN SMALL LETTER N WITH DOT BELOW -> LATIN SMALL LETTER N */
+	0x4E, /* LATIN CAPITAL LETTER N WITH LINE BELOW -> LATIN CAPITAL LETTER N */
+	0x6E, /* LATIN SMALL LETTER N WITH LINE BELOW -> LATIN SMALL LETTER N */
+	0x4E, /* LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER N */
+	0x6E, /* LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER N */
+	0xD5, /* LATIN CAPITAL LETTER O WITH TILDE AND ACUTE -> LATIN CAPITAL LETTER O WITH TILDE */
+	0xF5, /* LATIN SMALL LETTER O WITH TILDE AND ACUTE -> LATIN SMALL LETTER O WITH TILDE */
+	0xD5, /* LATIN CAPITAL LETTER O WITH TILDE AND DIAERESIS -> LATIN CAPITAL LETTER O WITH TILDE */
+	0xF5, /* LATIN SMALL LETTER O WITH TILDE AND DIAERESIS -> LATIN SMALL LETTER O WITH TILDE */
+	0x4F, /* LATIN CAPITAL LETTER O WITH MACRON AND GRAVE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH MACRON AND GRAVE -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH MACRON AND ACUTE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH MACRON AND ACUTE -> LATIN SMALL LETTER O */
+	0x50, /* LATIN CAPITAL LETTER P WITH ACUTE -> LATIN CAPITAL LETTER P */
+	0x70, /* LATIN SMALL LETTER P WITH ACUTE -> LATIN SMALL LETTER P */
+	0x50, /* LATIN CAPITAL LETTER P WITH DOT ABOVE -> LATIN CAPITAL LETTER P */
+	0x70, /* LATIN SMALL LETTER P WITH DOT ABOVE -> LATIN SMALL LETTER P */
+	0x52, /* LATIN CAPITAL LETTER R WITH DOT ABOVE -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH DOT ABOVE -> LATIN SMALL LETTER R */
+	0x52, /* LATIN CAPITAL LETTER R WITH DOT BELOW -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH DOT BELOW -> LATIN SMALL LETTER R */
+	0x52, /* LATIN CAPITAL LETTER R WITH DOT BELOW AND MACRON -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH DOT BELOW AND MACRON -> LATIN SMALL LETTER R */
+	0x52, /* LATIN CAPITAL LETTER R WITH LINE BELOW -> LATIN CAPITAL LETTER R */
+	0x72, /* LATIN SMALL LETTER R WITH LINE BELOW -> LATIN SMALL LETTER R */
+	0x53, /* LATIN CAPITAL LETTER S WITH DOT ABOVE -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH DOT ABOVE -> LATIN SMALL LETTER S */
+	0x53, /* LATIN CAPITAL LETTER S WITH DOT BELOW -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH DOT BELOW -> LATIN SMALL LETTER S */
+	0x53, /* LATIN CAPITAL LETTER S WITH ACUTE AND DOT ABOVE -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH ACUTE AND DOT ABOVE -> LATIN SMALL LETTER S */
+	0x53, /* LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH CARON AND DOT ABOVE -> LATIN SMALL LETTER S */
+	0x53, /* LATIN CAPITAL LETTER S WITH DOT BELOW AND DOT ABOVE -> LATIN CAPITAL LETTER S */
+	0x73, /* LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE -> LATIN SMALL LETTER S */
+	0x54, /* LATIN CAPITAL LETTER T WITH DOT ABOVE -> LATIN CAPITAL LETTER T */
+	0x74, /* LATIN SMALL LETTER T WITH DOT ABOVE -> LATIN SMALL LETTER T */
+	0x54, /* LATIN CAPITAL LETTER T WITH DOT BELOW -> LATIN CAPITAL LETTER T */
+	0x74, /* LATIN SMALL LETTER T WITH DOT BELOW -> LATIN SMALL LETTER T */
+	0x54, /* LATIN CAPITAL LETTER T WITH LINE BELOW -> LATIN CAPITAL LETTER T */
+	0x74, /* LATIN SMALL LETTER T WITH LINE BELOW -> LATIN SMALL LETTER T */
+	0x54, /* LATIN CAPITAL LETTER T WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER T */
+	0x74, /* LATIN SMALL LETTER T WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER T */
+	0x55, /* LATIN CAPITAL LETTER U WITH DIAERESIS BELOW -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH DIAERESIS BELOW -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH TILDE BELOW -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH TILDE BELOW -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH CIRCUMFLEX BELOW -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH CIRCUMFLEX BELOW -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH TILDE AND ACUTE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH TILDE AND ACUTE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH MACRON AND DIAERESIS -> LATIN SMALL LETTER U */
+	0x56, /* LATIN CAPITAL LETTER V WITH TILDE -> LATIN CAPITAL LETTER V */
+	0x76, /* LATIN SMALL LETTER V WITH TILDE -> LATIN SMALL LETTER V */
+	0x56, /* LATIN CAPITAL LETTER V WITH DOT BELOW -> LATIN CAPITAL LETTER V */
+	0x76, /* LATIN SMALL LETTER V WITH DOT BELOW -> LATIN SMALL LETTER V */
+	0x57, /* LATIN CAPITAL LETTER W WITH GRAVE -> LATIN CAPITAL LETTER W */
+	0x77, /* LATIN SMALL LETTER W WITH GRAVE -> LATIN SMALL LETTER W */
+	0x57, /* LATIN CAPITAL LETTER W WITH ACUTE -> LATIN CAPITAL LETTER W */
+	0x77, /* LATIN SMALL LETTER W WITH ACUTE -> LATIN SMALL LETTER W */
+	0x57, /* LATIN CAPITAL LETTER W WITH DIAERESIS -> LATIN CAPITAL LETTER W */
+	0x77, /* LATIN SMALL LETTER W WITH DIAERESIS -> LATIN SMALL LETTER W */
+	0x57, /* LATIN CAPITAL LETTER W WITH DOT ABOVE -> LATIN CAPITAL LETTER W */
+	0x77, /* LATIN SMALL LETTER W WITH DOT ABOVE -> LATIN SMALL LETTER W */
+	0x57, /* LATIN CAPITAL LETTER W WITH DOT BELOW -> LATIN CAPITAL LETTER W */
+	0x77, /* LATIN SMALL LETTER W WITH DOT BELOW -> LATIN SMALL LETTER W */
+	0x58, /* LATIN CAPITAL LETTER X WITH DOT ABOVE -> LATIN CAPITAL LETTER X */
+	0x78, /* LATIN SMALL LETTER X WITH DOT ABOVE -> LATIN SMALL LETTER X */
+	0x58, /* LATIN CAPITAL LETTER X WITH DIAERESIS -> LATIN CAPITAL LETTER X */
+	0x78, /* LATIN SMALL LETTER X WITH DIAERESIS -> LATIN SMALL LETTER X */
+	0x59, /* LATIN CAPITAL LETTER Y WITH DOT ABOVE -> LATIN CAPITAL LETTER Y */
+	0x79, /* LATIN SMALL LETTER Y WITH DOT ABOVE -> LATIN SMALL LETTER Y */
+	0x5A, /* LATIN CAPITAL LETTER Z WITH CIRCUMFLEX -> LATIN CAPITAL LETTER Z */
+	0x7A, /* LATIN SMALL LETTER Z WITH CIRCUMFLEX -> LATIN SMALL LETTER Z */
+	0x5A, /* LATIN CAPITAL LETTER Z WITH DOT BELOW -> LATIN CAPITAL LETTER Z */
+	0x7A, /* LATIN SMALL LETTER Z WITH DOT BELOW -> LATIN SMALL LETTER Z */
+	0x5A, /* LATIN CAPITAL LETTER Z WITH LINE BELOW -> LATIN CAPITAL LETTER Z */
+	0x7A, /* LATIN SMALL LETTER Z WITH LINE BELOW -> LATIN SMALL LETTER Z */
+	0x68, /* LATIN SMALL LETTER H WITH LINE BELOW -> LATIN SMALL LETTER H */
+	0x74, /* LATIN SMALL LETTER T WITH DIAERESIS -> LATIN SMALL LETTER T */
+	0x77, /* LATIN SMALL LETTER W WITH RING ABOVE -> LATIN SMALL LETTER W */
+	0x79, /* LATIN SMALL LETTER Y WITH RING ABOVE -> LATIN SMALL LETTER Y */
+	0x41, /* LATIN CAPITAL LETTER A WITH DOT BELOW -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH DOT BELOW -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH HOOK ABOVE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH HOOK ABOVE -> LATIN SMALL LETTER A */
+	0xC2, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND ACUTE -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX */
+	0xE2, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE -> LATIN SMALL LETTER A WITH CIRCUMFLEX */
+	0xC2, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX */
+	0xE2, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE -> LATIN SMALL LETTER A WITH CIRCUMFLEX */
+	0xC2, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX */
+	0xE2, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN SMALL LETTER A WITH CIRCUMFLEX */
+	0xC2, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX */
+	0xE2, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE -> LATIN SMALL LETTER A WITH CIRCUMFLEX */
+	0x41, /* LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH BREVE AND ACUTE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH BREVE AND ACUTE -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH BREVE AND GRAVE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH BREVE AND GRAVE -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH BREVE AND TILDE -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH BREVE AND TILDE -> LATIN SMALL LETTER A */
+	0x41, /* LATIN CAPITAL LETTER A WITH BREVE AND DOT BELOW -> LATIN CAPITAL LETTER A */
+	0x61, /* LATIN SMALL LETTER A WITH BREVE AND DOT BELOW -> LATIN SMALL LETTER A */
+	0x45, /* LATIN CAPITAL LETTER E WITH DOT BELOW -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH DOT BELOW -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH HOOK ABOVE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH HOOK ABOVE -> LATIN SMALL LETTER E */
+	0x45, /* LATIN CAPITAL LETTER E WITH TILDE -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH TILDE -> LATIN SMALL LETTER E */
+	0xCA, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
+	0xEA, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE -> LATIN SMALL LETTER E WITH CIRCUMFLEX */
+	0xCA, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND GRAVE -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
+	0xEA, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRAVE -> LATIN SMALL LETTER E WITH CIRCUMFLEX */
+	0xCA, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
+	0xEA, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN SMALL LETTER E WITH CIRCUMFLEX */
+	0xCA, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND TILDE -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
+	0xEA, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE -> LATIN SMALL LETTER E WITH CIRCUMFLEX */
+	0x45, /* LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND DOT BELOW -> LATIN CAPITAL LETTER E */
+	0x65, /* LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW -> LATIN SMALL LETTER E */
+	0x49, /* LATIN CAPITAL LETTER I WITH HOOK ABOVE -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH HOOK ABOVE -> LATIN SMALL LETTER I */
+	0x49, /* LATIN CAPITAL LETTER I WITH DOT BELOW -> LATIN CAPITAL LETTER I */
+	0x69, /* LATIN SMALL LETTER I WITH DOT BELOW -> LATIN SMALL LETTER I */
+	0x4F, /* LATIN CAPITAL LETTER O WITH DOT BELOW -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH DOT BELOW -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH HOOK ABOVE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH HOOK ABOVE -> LATIN SMALL LETTER O */
+	0xD4, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND ACUTE -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX */
+	0xF4, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE -> LATIN SMALL LETTER O WITH CIRCUMFLEX */
+	0xD4, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND GRAVE -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX */
+	0xF4, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRAVE -> LATIN SMALL LETTER O WITH CIRCUMFLEX */
+	0xD4, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX */
+	0xF4, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE -> LATIN SMALL LETTER O WITH CIRCUMFLEX */
+	0xD4, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND TILDE -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX */
+	0xF4, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE -> LATIN SMALL LETTER O WITH CIRCUMFLEX */
+	0x4F, /* LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND DOT BELOW -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH HORN AND ACUTE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH HORN AND ACUTE -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH HORN AND GRAVE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH HORN AND GRAVE -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH HORN AND HOOK ABOVE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH HORN AND HOOK ABOVE -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH HORN AND TILDE -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH HORN AND TILDE -> LATIN SMALL LETTER O */
+	0x4F, /* LATIN CAPITAL LETTER O WITH HORN AND DOT BELOW -> LATIN CAPITAL LETTER O */
+	0x6F, /* LATIN SMALL LETTER O WITH HORN AND DOT BELOW -> LATIN SMALL LETTER O */
+	0x55, /* LATIN CAPITAL LETTER U WITH DOT BELOW -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH DOT BELOW -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH HOOK ABOVE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH HOOK ABOVE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH HORN AND ACUTE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH HORN AND ACUTE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH HORN AND GRAVE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH HORN AND GRAVE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH HORN AND HOOK ABOVE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH HORN AND HOOK ABOVE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH HORN AND TILDE -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH HORN AND TILDE -> LATIN SMALL LETTER U */
+	0x55, /* LATIN CAPITAL LETTER U WITH HORN AND DOT BELOW -> LATIN CAPITAL LETTER U */
+	0x75, /* LATIN SMALL LETTER U WITH HORN AND DOT BELOW -> LATIN SMALL LETTER U */
+	0x59, /* LATIN CAPITAL LETTER Y WITH GRAVE -> LATIN CAPITAL LETTER Y */
+	0x79, /* LATIN SMALL LETTER Y WITH GRAVE -> LATIN SMALL LETTER Y */
+	0x59, /* LATIN CAPITAL LETTER Y WITH DOT BELOW -> LATIN CAPITAL LETTER Y */
+	0x79, /* LATIN SMALL LETTER Y WITH DOT BELOW -> LATIN SMALL LETTER Y */
+	0x59, /* LATIN CAPITAL LETTER Y WITH HOOK ABOVE -> LATIN CAPITAL LETTER Y */
+	0x79, /* LATIN SMALL LETTER Y WITH HOOK ABOVE -> LATIN SMALL LETTER Y */
+	0x59, /* LATIN CAPITAL LETTER Y WITH TILDE -> LATIN CAPITAL LETTER Y */
+	0x79, /* LATIN SMALL LETTER Y WITH TILDE -> LATIN SMALL LETTER Y */
+	0x61, /* GREEK SMALL LETTER ALPHA WITH VARIA -> LATIN SMALL LETTER A */
+	0x65, /* GREEK SMALL LETTER EPSILON WITH VARIA -> LATIN SMALL LETTER E */
+	0x6E, /* GREEK SMALL LETTER ETA WITH VARIA -> LATIN SMALL LETTER N */
+	0x77, /* GREEK SMALL LETTER OMEGA WITH VARIA -> LATIN SMALL LETTER W */
+	0xA8, /* GREEK DIALYTIKA AND PERISPOMENI -> DIAERESIS */
+	0xA8, /* GREEK DIALYTIKA AND VARIA -> DIAERESIS */
+	0x4F, /* GREEK CAPITAL LETTER OMEGA WITH VARIA -> LATIN CAPITAL LETTER O */
+	0x4F, /* GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI -> LATIN CAPITAL LETTER O */
+	0x2C, /* SINGLE LOW-9 QUOTATION MARK -> COMMA */
+	0x27, /* SINGLE HIGH-REVERSED-9 QUOTATION MARK -> APOSTROPHE */
+	0x2A, /* BULLET -> ASTERISK */
+	0x3E, /* TRIANGULAR BULLET -> GREATER-THAN SIGN */
+	0x20, /* NARROW NO-BREAK SPACE -> SPACE */
+	0x27, /* PRIME -> APOSTROPHE */
+	0x22, /* DOUBLE PRIME -> QUOTATION MARK */
+	0x3C, /* SINGLE LEFT-POINTING ANGLE QUOTATION MARK -> LESS-THAN SIGN */
+	0x3E, /* SINGLE RIGHT-POINTING ANGLE QUOTATION MARK -> GREATER-THAN SIGN */
+	0x2A, /* REFERENCE MARK -> ASTERISK */
+	0x21, /* DOUBLE EXCLAMATION MARK -> EXCLAMATION MARK */
+	0x3F, /* INTERROBANG -> QUESTION MARK */
+	0x2A, /* ASTERISM -> ASTERISK */
+	0x2D, /* HYPHEN BULLET -> HYPHEN-MINUS */
+	0x2F, /* FRACTION SLASH -> SOLIDUS */
+	0x21, /* EXCLAMATION QUESTION MARK -> EXCLAMATION MARK */
+	0x26, /* TIRONIAN SIGN ET -> AMPERSAND */
+	0x50, /* REVERSED PILCROW SIGN -> LATIN CAPITAL LETTER P */
+	0x3C, /* BLACK LEFTWARDS BULLET -> LESS-THAN SIGN */
+	0x3E, /* BLACK RIGHTWARDS BULLET -> GREATER-THAN SIGN */
+	0x2A, /* LOW ASTERISK -> ASTERISK */
+	0x3B, /* REVERSED SEMICOLON -> SEMICOLON */
+	0x2A, /* TWO ASTERISKS ALIGNED VERTICALLY -> ASTERISK */
+	0x2D, /* COMMERCIAL MINUS SIGN -> HYPHEN-MINUS */
+	0x7E, /* SWUNG DASH -> TILDE */
+	0x2A, /* FLOWER PUNCTUATION MARK -> ASTERISK */
+	0x3A, /* FOUR DOT MARK -> COLON */
+	0x20, /* MEDIUM MATHEMATICAL SPACE -> SPACE */
+	0x30, /* SUPERSCRIPT ZERO -> DIGIT ZERO */
+	0x34, /* SUPERSCRIPT FOUR -> DIGIT FOUR */
+	0x35, /* SUPERSCRIPT FIVE -> DIGIT FIVE */
+	0x36, /* SUPERSCRIPT SIX -> DIGIT SIX */
+	0x37, /* SUPERSCRIPT SEVEN -> DIGIT SEVEN */
+	0x38, /* SUPERSCRIPT EIGHT -> DIGIT EIGHT */
+	0x39, /* SUPERSCRIPT NINE -> DIGIT NINE */
+	0x30, /* SUBSCRIPT ZERO -> DIGIT ZERO */
+	0x31, /* SUBSCRIPT ONE -> DIGIT ONE */
+	0x32, /* SUBSCRIPT TWO -> DIGIT TWO */
+	0x33, /* SUBSCRIPT THREE -> DIGIT THREE */
+	0x34, /* SUBSCRIPT FOUR -> DIGIT FOUR */
+	0x35, /* SUBSCRIPT FIVE -> DIGIT FIVE */
+	0x36, /* SUBSCRIPT SIX -> DIGIT SIX */
+	0x37, /* SUBSCRIPT SEVEN -> DIGIT SEVEN */
+	0x38, /* SUBSCRIPT EIGHT -> DIGIT EIGHT */
+	0x39, /* SUBSCRIPT NINE -> DIGIT NINE */
+	0x45, /* EURO SIGN -> LATIN CAPITAL LETTER E */
+	0x43, /* DEGREE CELSIUS -> LATIN CAPITAL LETTER C */
+	0x46, /* DEGREE FAHRENHEIT -> LATIN CAPITAL LETTER F */
+	0x54, /* TRADE MARK SIGN -> LATIN CAPITAL LETTER T */
+	0x3C, /* LEFTWARDS ARROW -> LESS-THAN SIGN */
+	0x5E, /* UPWARDS ARROW -> CIRCUMFLEX ACCENT */
+	0x3E, /* RIGHTWARDS ARROW -> GREATER-THAN SIGN */
+	0x76, /* DOWNWARDS ARROW -> LATIN SMALL LETTER V */
+	0x21, /* LEFT RIGHT ARROW WITH STROKE -> EXCLAMATION MARK */
+	0x3C, /* LEFTWARDS DOUBLE ARROW -> LESS-THAN SIGN */
+	0x5E, /* UPWARDS DOUBLE ARROW -> CIRCUMFLEX ACCENT */
+	0x3E, /* RIGHTWARDS DOUBLE ARROW -> GREATER-THAN SIGN */
+	0x76, /* DOWNWARDS DOUBLE ARROW -> LATIN SMALL LETTER V */
+	0x21, /* THERE DOES NOT EXIST -> EXCLAMATION MARK */
+	0x21, /* NOT AN ELEMENT OF -> EXCLAMATION MARK */
+	0x21, /* DOES NOT CONTAIN AS MEMBER -> EXCLAMATION MARK */
+	0x2D, /* MINUS SIGN -> HYPHEN-MINUS */
+	0x2B, /* MINUS-OR-PLUS SIGN -> PLUS SIGN */
+	0x2F, /* DIVISION SLASH -> SOLIDUS */
+	0x5C, /* SET MINUS -> REVERSE SOLIDUS */
+	0x2A, /* ASTERISK OPERATOR -> ASTERISK */
+	0x6F, /* RING OPERATOR -> LATIN SMALL LETTER O */
+	0x2E, /* BULLET OPERATOR -> FULL STOP */
+	0x76, /* SQUARE ROOT -> LATIN SMALL LETTER V */
+	0x38, /* INFINITY -> DIGIT EIGHT */
+	0x7C, /* DIVIDES -> VERTICAL LINE */
+	0x21, /* DOES NOT DIVIDE -> EXCLAMATION MARK */
+	0x7C, /* PARALLEL TO -> VERTICAL LINE */
+	0x21, /* NOT PARALLEL TO -> EXCLAMATION MARK */
+	0x26, /* LOGICAL AND -> AMPERSAND */
+	0x7C, /* LOGICAL OR -> VERTICAL LINE */
+	0x6E, /* INTERSECTION -> LATIN SMALL LETTER N */
+	0x75, /* UNION -> LATIN SMALL LETTER U */
+	0x53, /* INTEGRAL -> LATIN CAPITAL LETTER S */
+	0x23, /* NOT TILDE -> NUMBER SIGN */
+	0x23, /* NOT ASYMPTOTICALLY EQUAL TO -> NUMBER SIGN */
+	0x7E, /* ALMOST EQUAL TO -> TILDE */
+	0x23, /* NOT ALMOST EQUAL TO -> NUMBER SIGN */
+	0x23, /* NOT EQUAL TO -> NUMBER SIGN */
+	0x23, /* NOT IDENTICAL TO -> NUMBER SIGN */
+	0x3C, /* LESS-THAN OR EQUAL TO -> LESS-THAN SIGN */
+	0x3E, /* GREATER-THAN OR EQUAL TO -> GREATER-THAN SIGN */
+	0x23, /* NOT EQUIVALENT TO -> NUMBER SIGN */
+	0x3C, /* NEITHER LESS-THAN NOR EQUAL TO -> LESS-THAN SIGN */
+	0x3E, /* NEITHER GREATER-THAN NOR EQUAL TO -> GREATER-THAN SIGN */
+	0x63, /* SUBSET OF -> LATIN SMALL LETTER C */
+	0x43, /* SUPERSET OF -> LATIN CAPITAL LETTER C */
+	0x63, /* SUBSET OF OR EQUAL TO -> LATIN SMALL LETTER C */
+	0x43, /* SUPERSET OF OR EQUAL TO -> LATIN CAPITAL LETTER C */
+	0x63, /* NEITHER A SUBSET OF NOR EQUAL TO -> LATIN SMALL LETTER C */
+	0x43, /* NEITHER A SUPERSET OF NOR EQUAL TO -> LATIN CAPITAL LETTER C */
+	0x2A, /* CIRCLED ASTERISK OPERATOR -> ASTERISK */
+	0x2E, /* DOT OPERATOR -> FULL STOP */
+	0x2A, /* STAR OPERATOR -> ASTERISK */
+	0x2A, /* APL FUNCTIONAL SYMBOL CIRCLE STAR -> ASTERISK */
+	0x2A, /* APL FUNCTIONAL SYMBOL STAR DIAERESIS -> ASTERISK */
+	0x7C, /* LEFT SQUARE BRACKET UPPER CORNER -> VERTICAL LINE */
+	0x7C, /* LEFT CURLY BRACKET LOWER HOOK -> VERTICAL LINE */
+	0x7C, /* RIGHT CURLY BRACKET UPPER HOOK -> VERTICAL LINE */
+	0x28, /* UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION -> LEFT PARENTHESIS */
+	0x29, /* UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION -> RIGHT PARENTHESIS */
+	0x7C, /* SUMMATION BOTTOM -> VERTICAL LINE */
+	0x4A, /* HORIZONTAL SCAN LINE-7 -> LATIN CAPITAL LETTER J */
+	0x5F, /* HORIZONTAL SCAN LINE-9 -> LOW LINE */
+	0x2D, /* BOX DRAWINGS DOUBLE HORIZONTAL -> HYPHEN-MINUS */
+	0x7C, /* BOX DRAWINGS DOUBLE VERTICAL -> VERTICAL LINE */
+	0x2F, /* BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT -> SOLIDUS */
+	0x5C, /* BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT -> REVERSE SOLIDUS */
+	0x58, /* BOX DRAWINGS LIGHT DIAGONAL CROSS -> LATIN CAPITAL LETTER X */
+	0x2D, /* BOX DRAWINGS LIGHT LEFT -> HYPHEN-MINUS */
+	0x7C, /* BOX DRAWINGS LIGHT UP -> VERTICAL LINE */
+	0x2D, /* BOX DRAWINGS LIGHT RIGHT -> HYPHEN-MINUS */
+	0x7C, /* BOX DRAWINGS LIGHT DOWN -> VERTICAL LINE */
+	0x2D, /* BOX DRAWINGS HEAVY LEFT -> HYPHEN-MINUS */
+	0x7C, /* BOX DRAWINGS HEAVY UP -> VERTICAL LINE */
+	0x2D, /* BOX DRAWINGS HEAVY RIGHT -> HYPHEN-MINUS */
+	0x7C, /* BOX DRAWINGS HEAVY DOWN -> VERTICAL LINE */
+	0x2D, /* BOX DRAWINGS LIGHT LEFT AND HEAVY RIGHT -> HYPHEN-MINUS */
+	0x7C, /* BOX DRAWINGS LIGHT UP AND HEAVY DOWN -> VERTICAL LINE */
+	0x2D, /* BOX DRAWINGS HEAVY LEFT AND LIGHT RIGHT -> HYPHEN-MINUS */
+	0x7C, /* BOX DRAWINGS HEAVY UP AND LIGHT DOWN -> VERTICAL LINE */
+	0x2E, /* LIGHT SHADE -> FULL STOP */
+	0x25, /* MEDIUM SHADE -> PERCENT SIGN */
+	0x6F, /* WHITE SQUARE -> LATIN SMALL LETTER O */
+	0x23, /* BLACK RECTANGLE -> NUMBER SIGN */
+	0x2D, /* WHITE RECTANGLE -> HYPHEN-MINUS */
+	0x2A, /* BLACK DIAMOND -> ASTERISK */
+	0x6F, /* WHITE CIRCLE -> LATIN SMALL LETTER O */
+	0x6F, /* BULLSEYE -> LATIN SMALL LETTER O */
+	0x2A, /* BLACK CIRCLE -> ASTERISK */
+	0x6F, /* WHITE BULLET -> LATIN SMALL LETTER O */
+	0x2A, /* STAR AND CRESCENT -> ASTERISK */
+	0x2A, /* FLOWER -> ASTERISK */
+	0x2A, /* OUTLINED WHITE STAR -> ASTERISK */
+	0x76, /* CHECK MARK -> LATIN SMALL LETTER V */
+	0x56, /* HEAVY CHECK MARK -> LATIN CAPITAL LETTER V */
+	0x78, /* MULTIPLICATION X -> LATIN SMALL LETTER X */
+	0x58, /* HEAVY MULTIPLICATION X -> LATIN CAPITAL LETTER X */
+	0x78, /* BALLOT X -> LATIN SMALL LETTER X */
+	0x58, /* HEAVY BALLOT X -> LATIN CAPITAL LETTER X */
+	0x3C, /* LONG LEFTWARDS DOUBLE ARROW -> LESS-THAN SIGN */
+	0x3E, /* LONG RIGHTWARDS DOUBLE ARROW -> GREATER-THAN SIGN */
+	0x21, /* FULLWIDTH EXCLAMATION MARK -> EXCLAMATION MARK */
+	0x22, /* FULLWIDTH QUOTATION MARK -> QUOTATION MARK */
+	0x23, /* FULLWIDTH NUMBER SIGN -> NUMBER SIGN */
+	0x24, /* FULLWIDTH DOLLAR SIGN -> DOLLAR SIGN */
+	0x25, /* FULLWIDTH PERCENT SIGN -> PERCENT SIGN */
+	0x26, /* FULLWIDTH AMPERSAND -> AMPERSAND */
+	0x27, /* FULLWIDTH APOSTROPHE -> APOSTROPHE */
+	0x28, /* FULLWIDTH LEFT PARENTHESIS -> LEFT PARENTHESIS */
+	0x29, /* FULLWIDTH RIGHT PARENTHESIS -> RIGHT PARENTHESIS */
+	0x2A, /* FULLWIDTH ASTERISK -> ASTERISK */
+	0x2B, /* FULLWIDTH PLUS SIGN -> PLUS SIGN */
+	0x2C, /* FULLWIDTH COMMA -> COMMA */
+	0x2D, /* FULLWIDTH HYPHEN-MINUS -> HYPHEN-MINUS */
+	0x2E, /* FULLWIDTH FULL STOP -> FULL STOP */
+	0x2F, /* FULLWIDTH SOLIDUS -> SOLIDUS */
+	0x30, /* FULLWIDTH DIGIT ZERO -> DIGIT ZERO */
+	0x31, /* FULLWIDTH DIGIT ONE -> DIGIT ONE */
+	0x32, /* FULLWIDTH DIGIT TWO -> DIGIT TWO */
+	0x33, /* FULLWIDTH DIGIT THREE -> DIGIT THREE */
+	0x34, /* FULLWIDTH DIGIT FOUR -> DIGIT FOUR */
+	0x35, /* FULLWIDTH DIGIT FIVE -> DIGIT FIVE */
+	0x36, /* FULLWIDTH DIGIT SIX -> DIGIT SIX */
+	0x37, /* FULLWIDTH DIGIT SEVEN -> DIGIT SEVEN */
+	0x38, /* FULLWIDTH DIGIT EIGHT -> DIGIT EIGHT */
+	0x39, /* FULLWIDTH DIGIT NINE -> DIGIT NINE */
+	0x3A, /* FULLWIDTH COLON -> COLON */
+	0x3B, /* FULLWIDTH SEMICOLON -> SEMICOLON */
+	0x3C, /* FULLWIDTH LESS-THAN SIGN -> LESS-THAN SIGN */
+	0x3D, /* FULLWIDTH EQUALS SIGN -> EQUALS SIGN */
+	0x3E, /* FULLWIDTH GREATER-THAN SIGN -> GREATER-THAN SIGN */
+	0x3F, /* FULLWIDTH QUESTION MARK -> QUESTION MARK */
+	0x40, /* FULLWIDTH COMMERCIAL AT -> COMMERCIAL AT */
+	0x41, /* FULLWIDTH LATIN CAPITAL LETTER A -> LATIN CAPITAL LETTER A */
+	0x42, /* FULLWIDTH LATIN CAPITAL LETTER B -> LATIN CAPITAL LETTER B */
+	0x43, /* FULLWIDTH LATIN CAPITAL LETTER C -> LATIN CAPITAL LETTER C */
+	0x44, /* FULLWIDTH LATIN CAPITAL LETTER D -> LATIN CAPITAL LETTER D */
+	0x45, /* FULLWIDTH LATIN CAPITAL LETTER E -> LATIN CAPITAL LETTER E */
+	0x46, /* FULLWIDTH LATIN CAPITAL LETTER F -> LATIN CAPITAL LETTER F */
+	0x47, /* FULLWIDTH LATIN CAPITAL LETTER G -> LATIN CAPITAL LETTER G */
+	0x48, /* FULLWIDTH LATIN CAPITAL LETTER H -> LATIN CAPITAL LETTER H */
+	0x49, /* FULLWIDTH LATIN CAPITAL LETTER I -> LATIN CAPITAL LETTER I */
+	0x4A, /* FULLWIDTH LATIN CAPITAL LETTER J -> LATIN CAPITAL LETTER J */
+	0x4B, /* FULLWIDTH LATIN CAPITAL LETTER K -> LATIN CAPITAL LETTER K */
+	0x4C, /* FULLWIDTH LATIN CAPITAL LETTER L -> LATIN CAPITAL LETTER L */
+	0x4D, /* FULLWIDTH LATIN CAPITAL LETTER M -> LATIN CAPITAL LETTER M */
+	0x4E, /* FULLWIDTH LATIN CAPITAL LETTER N -> LATIN CAPITAL LETTER N */
+	0x4F, /* FULLWIDTH LATIN CAPITAL LETTER O -> LATIN CAPITAL LETTER O */
+	0x50, /* FULLWIDTH LATIN CAPITAL LETTER P -> LATIN CAPITAL LETTER P */
+	0x51, /* FULLWIDTH LATIN CAPITAL LETTER Q -> LATIN CAPITAL LETTER Q */
+	0x52, /* FULLWIDTH LATIN CAPITAL LETTER R -> LATIN CAPITAL LETTER R */
+	0x53, /* FULLWIDTH LATIN CAPITAL LETTER S -> LATIN CAPITAL LETTER S */
+	0x54, /* FULLWIDTH LATIN CAPITAL LETTER T -> LATIN CAPITAL LETTER T */
+	0x55, /* FULLWIDTH LATIN CAPITAL LETTER U -> LATIN CAPITAL LETTER U */
+	0x56, /* FULLWIDTH LATIN CAPITAL LETTER V -> LATIN CAPITAL LETTER V */
+	0x57, /* FULLWIDTH LATIN CAPITAL LETTER W -> LATIN CAPITAL LETTER W */
+	0x58, /* FULLWIDTH LATIN CAPITAL LETTER X -> LATIN CAPITAL LETTER X */
+	0x59, /* FULLWIDTH LATIN CAPITAL LETTER Y -> LATIN CAPITAL LETTER Y */
+	0x5A, /* FULLWIDTH LATIN CAPITAL LETTER Z -> LATIN CAPITAL LETTER Z */
+	0x5B, /* FULLWIDTH LEFT SQUARE BRACKET -> LEFT SQUARE BRACKET */
+	0x5C, /* FULLWIDTH REVERSE SOLIDUS -> REVERSE SOLIDUS */
+	0x5D, /* FULLWIDTH RIGHT SQUARE BRACKET -> RIGHT SQUARE BRACKET */
+	0x5E, /* FULLWIDTH CIRCUMFLEX ACCENT -> CIRCUMFLEX ACCENT */
+	0x5F, /* FULLWIDTH LOW LINE -> LOW LINE */
+	0x60, /* FULLWIDTH GRAVE ACCENT -> GRAVE ACCENT */
+	0x61, /* FULLWIDTH LATIN SMALL LETTER A -> LATIN SMALL LETTER A */
+	0x62, /* FULLWIDTH LATIN SMALL LETTER B -> LATIN SMALL LETTER B */
+	0x63, /* FULLWIDTH LATIN SMALL LETTER C -> LATIN SMALL LETTER C */
+	0x64, /* FULLWIDTH LATIN SMALL LETTER D -> LATIN SMALL LETTER D */
+	0x65, /* FULLWIDTH LATIN SMALL LETTER E -> LATIN SMALL LETTER E */
+	0x66, /* FULLWIDTH LATIN SMALL LETTER F -> LATIN SMALL LETTER F */
+	0x67, /* FULLWIDTH LATIN SMALL LETTER G -> LATIN SMALL LETTER G */
+	0x68, /* FULLWIDTH LATIN SMALL LETTER H -> LATIN SMALL LETTER H */
+	0x69, /* FULLWIDTH LATIN SMALL LETTER I -> LATIN SMALL LETTER I */
+	0x6A, /* FULLWIDTH LATIN SMALL LETTER J -> LATIN SMALL LETTER J */
+	0x6B, /* FULLWIDTH LATIN SMALL LETTER K -> LATIN SMALL LETTER K */
+	0x6C, /* FULLWIDTH LATIN SMALL LETTER L -> LATIN SMALL LETTER L */
+	0x6D, /* FULLWIDTH LATIN SMALL LETTER M -> LATIN SMALL LETTER M */
+	0x6E, /* FULLWIDTH LATIN SMALL LETTER N -> LATIN SMALL LETTER N */
+	0x6F, /* FULLWIDTH LATIN SMALL LETTER O -> LATIN SMALL LETTER O */
+	0x70, /* FULLWIDTH LATIN SMALL LETTER P -> LATIN SMALL LETTER P */
+	0x71, /* FULLWIDTH LATIN SMALL LETTER Q -> LATIN SMALL LETTER Q */
+	0x72, /* FULLWIDTH LATIN SMALL LETTER R -> LATIN SMALL LETTER R */
+	0x73, /* FULLWIDTH LATIN SMALL LETTER S -> LATIN SMALL LETTER S */
+	0x74, /* FULLWIDTH LATIN SMALL LETTER T -> LATIN SMALL LETTER T */
+	0x75, /* FULLWIDTH LATIN SMALL LETTER U -> LATIN SMALL LETTER U */
+	0x76, /* FULLWIDTH LATIN SMALL LETTER V -> LATIN SMALL LETTER V */
+	0x77, /* FULLWIDTH LATIN SMALL LETTER W -> LATIN SMALL LETTER W */
+	0x78, /* FULLWIDTH LATIN SMALL LETTER X -> LATIN SMALL LETTER X */
+	0x79, /* FULLWIDTH LATIN SMALL LETTER Y -> LATIN SMALL LETTER Y */
+	0x7A, /* FULLWIDTH LATIN SMALL LETTER Z -> LATIN SMALL LETTER Z */
+	0x7B, /* FULLWIDTH LEFT CURLY BRACKET -> LEFT CURLY BRACKET */
+	0x7C, /* FULLWIDTH VERTICAL LINE -> VERTICAL LINE */
+	0x7D, /* FULLWIDTH RIGHT CURLY BRACKET -> RIGHT CURLY BRACKET */
+	0x7E, /* FULLWIDTH TILDE -> TILDE */
+};
-- 
2.49.0


* [PATCH 6/8] vt: add ucs_get_fallback()
  2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
                   ` (4 preceding siblings ...)
  2025-05-05 16:55 ` [PATCH 5/8] vt: create ucs_fallback_table.h_shipped with gen_ucs_fallback_table.py Nicolas Pitre
@ 2025-05-05 16:55 ` Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 7/8] vt: make use of ucs_get_fallback() when glyph is unavailable Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 8/8] vt: process the full-width ASCII fallback range programmatically Nicolas Pitre
  7 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

From: Nicolas Pitre <npitre@baylibre.com>

This adds the code that queries the newly introduced fallback tables.
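
For readers outside the kernel tree, the lookup order can be sketched in userspace C: range table first, then the sorted singles table, and 0 when neither matches. This is a minimal sketch, assuming tiny hypothetical tables (the real ones are generated into ucs_fallback_table.h) and using the C library bsearch() in place of the kernel's __inline_bsearch(). The early out-of-range checks done by find_cp_in_range16() and find_cp_in_table16() are omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical mini-tables with the same shape as the generated ones. */
struct interval16 { uint16_t first, last; };

static const struct interval16 intervals[] = {
	{ 0x00C0, 0x00C5 },	/* A WITH GRAVE .. A WITH RING ABOVE */
	{ 0x00E0, 0x00E5 },	/* a with grave .. a with ring above */
};
static const uint8_t intervals_subs[] = { 'A', 'a' };	/* one sub per range */

static const uint16_t singles[] = { 0x2022, 0x2190, 0x2192 };	/* sorted */
static const uint8_t singles_subs[] = { '*', '<', '>' };

/* Returns 0 when the key falls inside the interval. */
static int interval_cmp(const void *key, const void *elem)
{
	uint16_t cp = *(const uint16_t *)key;
	const struct interval16 *e = elem;

	if (cp < e->first)
		return -1;
	if (cp > e->last)
		return 1;
	return 0;
}

/* Plain three-way compare for the singles table. */
static int u16_cmp(const void *key, const void *elem)
{
	uint16_t a = *(const uint16_t *)key;
	uint16_t b = *(const uint16_t *)elem;

	return (a > b) - (a < b);
}

uint32_t get_fallback(uint32_t cp)
{
	const struct interval16 *iv;
	const uint16_t *single;
	uint16_t cp16;

	if (cp > 0xFFFF)	/* only the BMP is covered */
		return 0;
	cp16 = (uint16_t)cp;

	iv = bsearch(&cp16, intervals, ARRAY_SIZE(intervals),
		     sizeof(intervals[0]), interval_cmp);
	if (iv)
		return intervals_subs[iv - intervals];

	single = bsearch(&cp16, singles, ARRAY_SIZE(singles),
			 sizeof(singles[0]), u16_cmp);
	if (single)
		return singles_subs[single - singles];

	return 0;	/* no fallback available */
}
```

The index of the matching entry doubles as the index into the parallel substitution array, which is why the key tables and the *_subs arrays must stay in the same order.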

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
---
 drivers/tty/vt/Makefile    |  3 +-
 drivers/tty/vt/ucs.c       | 73 ++++++++++++++++++++++++++++++++++++--
 include/linux/consolemap.h |  6 ++++
 3 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/vt/Makefile b/drivers/tty/vt/Makefile
index 509362a3e11e..ae746dcdeec8 100644
--- a/drivers/tty/vt/Makefile
+++ b/drivers/tty/vt/Makefile
@@ -36,7 +36,8 @@ $(obj)/defkeymap.c: $(obj)/%.c: $(src)/%.map
 
 endif
 
-$(obj)/ucs.o: $(src)/ucs.c $(obj)/ucs_width_table.h $(obj)/ucs_recompose_table.h
+$(obj)/ucs.o:	$(src)/ucs.c $(obj)/ucs_width_table.h \
+		$(obj)/ucs_recompose_table.h $(obj)/ucs_fallback_table.h
 
 # You may uncomment one of those to have the UCS tables be regenerated
 # during the build process. By default the _shipped versions are used.
diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c
index b0b23830170d..dcce733b80cb 100644
--- a/drivers/tty/vt/ucs.c
+++ b/drivers/tty/vt/ucs.c
@@ -44,13 +44,20 @@ static int interval32_cmp(const void *key, const void *element)
 	return 0;
 }
 
-static bool cp_in_range16(u16 cp, const struct ucs_interval16 *ranges, size_t size)
+static const struct ucs_interval16 *find_cp_in_range16(u16 cp,
+						       const struct ucs_interval16 *ranges,
+						       size_t size)
 {
 	if (cp < ranges[0].first || cp > ranges[size - 1].last)
-		return false;
+		return NULL;
 
 	return __inline_bsearch(&cp, ranges, size, sizeof(*ranges),
-				interval16_cmp) != NULL;
+				interval16_cmp);
+}
+
+static bool cp_in_range16(u16 cp, const struct ucs_interval16 *ranges, size_t size)
+{
+	return find_cp_in_range16(cp, ranges, size) != NULL;
 }
 
 static bool cp_in_range32(u32 cp, const struct ucs_interval32 *ranges, size_t size)
@@ -157,3 +164,63 @@ u32 ucs_recompose(u32 base, u32 mark)
 
 	return result ? result->recomposed : 0;
 }
+
+/*
+ * The fallback tables use either struct ucs_interval16 ranges or plain
+ * u16 code points. interval16_cmp() is reused for the former, but the
+ * singles table needs its own compare function.
+ */
+
+#include "ucs_fallback_table.h"
+
+static int u16_cmp(const void *key, const void *element)
+{
+	u16 cp = *(u16 *)key;
+	u16 entry = *(u16 *)element;
+
+	if (cp < entry)
+		return -1;
+	if (cp > entry)
+		return 1;
+	return 0;
+}
+
+static u16 *find_cp_in_table16(u16 cp, const u16 *table, size_t size)
+{
+	if (cp < table[0] || cp > table[size - 1])
+		return NULL;
+
+	return __inline_bsearch(&cp, table, size, sizeof(u16), u16_cmp);
+}
+
+/**
+ * ucs_get_fallback() - Get a substitution for the provided Unicode character
+ * @cp: Unicode code point (UCS-4)
+ *
+ * Get a simpler fallback character for the provided Unicode character.
+ * Used for terminal display when the corresponding glyph is unavailable.
+ * The substitution may not be as good as the actual glyph for the original
+ * character, but it is still far more helpful than a squared question mark.
+ *
+ * Return: Fallback Unicode code point, or 0 if none is available
+ */
+u32 ucs_get_fallback(u32 cp)
+{
+	const struct ucs_interval16 *interval;
+	u16 *single;
+
+	if (!UCS_IS_BMP(cp))
+		return 0;
+
+	interval = find_cp_in_range16(cp, ucs_fallback_intervals,
+				      ARRAY_SIZE(ucs_fallback_intervals));
+	if (interval)
+		return ucs_fallback_intervals_subs[interval - ucs_fallback_intervals];
+
+	single = find_cp_in_table16(cp, ucs_fallback_singles,
+				    ARRAY_SIZE(ucs_fallback_singles));
+	if (single)
+		return ucs_fallback_singles_subs[single - ucs_fallback_singles];
+
+	return 0;
+}
diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h
index 8167494229db..6180b803795c 100644
--- a/include/linux/consolemap.h
+++ b/include/linux/consolemap.h
@@ -31,6 +31,7 @@ void console_map_init(void);
 bool ucs_is_double_width(uint32_t cp);
 bool ucs_is_zero_width(uint32_t cp);
 u32 ucs_recompose(u32 base, u32 mark);
+u32 ucs_get_fallback(u32 cp);
 #else
 static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph,
 		bool use_unicode)
@@ -75,6 +76,11 @@ static inline u32 ucs_recompose(u32 base, u32 mark)
 {
 	return 0;
 }
+
+static inline u32 ucs_get_fallback(u32 cp)
+{
+	return 0;
+}
 #endif /* CONFIG_CONSOLE_TRANSLATIONS */
 
 #endif /* __LINUX_CONSOLEMAP_H__ */
-- 
2.49.0


* [PATCH 7/8] vt: make use of ucs_get_fallback() when glyph is unavailable
  2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
                   ` (5 preceding siblings ...)
  2025-05-05 16:55 ` [PATCH 6/8] vt: add ucs_get_fallback() Nicolas Pitre
@ 2025-05-05 16:55 ` Nicolas Pitre
  2025-05-05 16:55 ` [PATCH 8/8] vt: process the full-width ASCII fallback range programmatically Nicolas Pitre
  7 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

From: Nicolas Pitre <npitre@baylibre.com>

Attempt to display a fallback character when the given character has no
available glyph. The substitution may not be as good as the original
character, but it is still far more helpful than a squared question
mark.

Example substitutions: À -> A, ç -> c, ø -> o, ─ -> -, © -> C, etc.

See gen_ucs_fallback_table.py for a comprehensive list.
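
The control flow added to vc_get_glyph() can be modelled in userspace as below. This is a hypothetical sketch: font_lookup(), fallback_of() and REPLACEMENT_GLYPH stand in for the console font lookup, ucs_get_fallback() and the U+FFFD glyph, and the toy font covers printable ASCII only. It shows how the recursive call lets fallbacks chain (U+1EA4 to U+00C2 to 'A') when the first substitute has no glyph either; the recursion terminates as long as the fallback chain is acyclic, which holds for this toy table.

```c
#include <stdint.h>

#define REPLACEMENT_GLYPH 0xFE	/* hypothetical font slot for U+FFFD */

/* Hypothetical font: only printable ASCII has glyphs; -1 means none. */
static int font_lookup(uint32_t cp)
{
	return (cp >= 0x20 && cp <= 0x7E) ? (int)cp : -1;
}

/* Hypothetical stand-in for ucs_get_fallback(): a tiny subset. */
static uint32_t fallback_of(uint32_t cp)
{
	switch (cp) {
	case 0x1EA4: return 0x00C2;	/* A WITH CIRCUMFLEX AND ACUTE -> A WITH CIRCUMFLEX */
	case 0x00C2: return 'A';	/* A WITH CIRCUMFLEX -> A */
	case 0x2192: return '>';	/* RIGHTWARDS ARROW -> > */
	default:     return 0;		/* no fallback */
	}
}

/* Try the real glyph, then a simpler character, then U+FFFD. */
static int get_glyph(uint32_t cp)
{
	int glyph = font_lookup(cp);
	uint32_t fb;

	if (glyph >= 0)
		return glyph;

	fb = fallback_of(cp);
	if (fb)
		return get_glyph(fb);

	return REPLACEMENT_GLYPH;
}
```

The model leaves out the vc_uniscr_check() call; in the real code it makes sure the Unicode screen buffer keeps the original code point even though a substitute glyph is what ends up on screen.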

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
---
 drivers/tty/vt/vt.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 4e80384a419b..479a03647aab 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3009,6 +3009,19 @@ static int vc_get_glyph(struct vc_data *vc, int tc)
 		return tc;
 	}
 
+	/*
+	 * The Unicode screen memory is allocated only when required.
+	 * This is one such case: we're about to "cheat" with the displayed
+	 * character, meaning the simple screen buffer won't hold the original
+	 * information, whereas the Unicode screen buffer always does.
+	 */
+	vc_uniscr_check(vc);
+
+	/* Try getting a simpler fallback character. */
+	tc = ucs_get_fallback(tc);
+	if (tc)
+		return vc_get_glyph(vc, tc);
+
 	/* Display U+FFFD (Unicode Replacement Character). */
 	return conv_uni_to_pc(vc, UCS_REPLACEMENT);
 }
-- 
2.49.0


* [PATCH 8/8] vt: process the full-width ASCII fallback range programmatically
  2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
                   ` (6 preceding siblings ...)
  2025-05-05 16:55 ` [PATCH 7/8] vt: make use of ucs_get_fallback() when glyph is unavailable Nicolas Pitre
@ 2025-05-05 16:55 ` Nicolas Pitre
  2025-05-06  5:55   ` Jiri Slaby
  7 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-05 16:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Nicolas Pitre, linux-serial, linux-kernel

From: Nicolas Pitre <npitre@baylibre.com>

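Handling the 94 full-width ASCII forms (U+FF01..U+FF5E) arithmetically
instead of listing them in the singles table saves about 258 bytes of
text.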
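
The arithmetic that replaces those table entries can be checked in isolation. This is a minimal sketch with hypothetical names (fullwidth_to_ascii() and the two range macros are not kernel identifiers); it applies the same cp - 0xFF01 + 33 offset as the ucs.c hunk in this patch, and the static assertion confirms that the range holds exactly the 94 printable ASCII characters 33..126.

```c
#include <stdint.h>

#define FULLWIDTH_FIRST 0xFF01	/* FULLWIDTH EXCLAMATION MARK */
#define FULLWIDTH_LAST  0xFF5E	/* FULLWIDTH TILDE */

/* 94 code points, one per printable ASCII character 33 ('!') .. 126 ('~'). */
_Static_assert(FULLWIDTH_LAST - FULLWIDTH_FIRST + 1 == 126 - 33 + 1,
	       "full-width block must cover printable ASCII exactly");

/* Map a full-width form to ASCII, or return 0 if cp is outside the block. */
static uint32_t fullwidth_to_ascii(uint32_t cp)
{
	if (cp >= FULLWIDTH_FIRST && cp <= FULLWIDTH_LAST)
		return cp - FULLWIDTH_FIRST + 33;
	return 0;
}
```

Because the offset is constant (0xFF01 - 0x21 = 0xFEE0), a range check and one subtraction replace 94 key entries plus their 94 substitutions.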

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
---
 drivers/tty/vt/gen_ucs_fallback_table.py    |  11 +-
 drivers/tty/vt/ucs.c                        |   8 +
 drivers/tty/vt/ucs_fallback_table.h_shipped | 188 --------------------
 3 files changed, 13 insertions(+), 194 deletions(-)

diff --git a/drivers/tty/vt/gen_ucs_fallback_table.py b/drivers/tty/vt/gen_ucs_fallback_table.py
index cb4e75b454fe..3a725d47366d 100755
--- a/drivers/tty/vt/gen_ucs_fallback_table.py
+++ b/drivers/tty/vt/gen_ucs_fallback_table.py
@@ -666,12 +666,11 @@ def collect_drawing_character_mappings():
     fallback_map[0x2746] = ord('*')  # ❆ HEAVY CHEVRON SNOWFLAKE
     fallback_map[0x2698] = ord('*')  # ⚘ FLOWER
 
-    # Add special ASCII characters with full-width equivalents
-    # Map between full-width and ASCII forms
-    for i, cp in enumerate(range(0xFF01, 0xFF5E+1)):
-        # Full-width to ASCII mapping (covering all printable ASCII 33-126)
-        # 0xFF01 (!) to 0xFF5E (~) -> ASCII 33 (!) to 126 (~)
-        fallback_map[cp] = 33 + i
+    # Full-width to ASCII mapping (covering all printable ASCII 33-126)
+    # 0xFF01 (!) to 0xFF5E (~) -> ASCII 33 (!) to 126 (~)
+    # Those are not included here to reduce the table size.
+    # It is more efficient to process them programmatically in
+    # ucs.c:ucs_get_fallback().
 
     return fallback_map
 
diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c
index dcce733b80cb..ae3254302760 100644
--- a/drivers/tty/vt/ucs.c
+++ b/drivers/tty/vt/ucs.c
@@ -222,5 +222,13 @@ u32 ucs_get_fallback(u32 cp)
 	if (single)
 		return ucs_fallback_singles_subs[single - ucs_fallback_singles];
 
+	/*
+	 * Full-width to ASCII mapping (covering all printable ASCII 33-126)
+	 * 0xFF01 (!) to 0xFF5E (~) -> ASCII 33 (!) to 126 (~)
+	 * We process them programmatically to reduce the table size.
+	 */
+	if (cp >= 0xFF01 && cp <= 0xFF5E)
+		return cp - 0xFF01 + 33;
+
 	return 0;
 }
diff --git a/drivers/tty/vt/ucs_fallback_table.h_shipped b/drivers/tty/vt/ucs_fallback_table.h_shipped
index d528d500ec9d..fe61418497ee 100644
--- a/drivers/tty/vt/ucs_fallback_table.h_shipped
+++ b/drivers/tty/vt/ucs_fallback_table.h_shipped
@@ -817,100 +817,6 @@ static const u16 ucs_fallback_singles[] = {
 	0x2718, /* HEAVY BALLOT X -> LATIN CAPITAL LETTER X */
 	0x27F8, /* LONG LEFTWARDS DOUBLE ARROW -> LESS-THAN SIGN */
 	0x27F9, /* LONG RIGHTWARDS DOUBLE ARROW -> GREATER-THAN SIGN */
-	0xFF01, /* FULLWIDTH EXCLAMATION MARK -> EXCLAMATION MARK */
-	0xFF02, /* FULLWIDTH QUOTATION MARK -> QUOTATION MARK */
-	0xFF03, /* FULLWIDTH NUMBER SIGN -> NUMBER SIGN */
-	0xFF04, /* FULLWIDTH DOLLAR SIGN -> DOLLAR SIGN */
-	0xFF05, /* FULLWIDTH PERCENT SIGN -> PERCENT SIGN */
-	0xFF06, /* FULLWIDTH AMPERSAND -> AMPERSAND */
-	0xFF07, /* FULLWIDTH APOSTROPHE -> APOSTROPHE */
-	0xFF08, /* FULLWIDTH LEFT PARENTHESIS -> LEFT PARENTHESIS */
-	0xFF09, /* FULLWIDTH RIGHT PARENTHESIS -> RIGHT PARENTHESIS */
-	0xFF0A, /* FULLWIDTH ASTERISK -> ASTERISK */
-	0xFF0B, /* FULLWIDTH PLUS SIGN -> PLUS SIGN */
-	0xFF0C, /* FULLWIDTH COMMA -> COMMA */
-	0xFF0D, /* FULLWIDTH HYPHEN-MINUS -> HYPHEN-MINUS */
-	0xFF0E, /* FULLWIDTH FULL STOP -> FULL STOP */
-	0xFF0F, /* FULLWIDTH SOLIDUS -> SOLIDUS */
-	0xFF10, /* FULLWIDTH DIGIT ZERO -> DIGIT ZERO */
-	0xFF11, /* FULLWIDTH DIGIT ONE -> DIGIT ONE */
-	0xFF12, /* FULLWIDTH DIGIT TWO -> DIGIT TWO */
-	0xFF13, /* FULLWIDTH DIGIT THREE -> DIGIT THREE */
-	0xFF14, /* FULLWIDTH DIGIT FOUR -> DIGIT FOUR */
-	0xFF15, /* FULLWIDTH DIGIT FIVE -> DIGIT FIVE */
-	0xFF16, /* FULLWIDTH DIGIT SIX -> DIGIT SIX */
-	0xFF17, /* FULLWIDTH DIGIT SEVEN -> DIGIT SEVEN */
-	0xFF18, /* FULLWIDTH DIGIT EIGHT -> DIGIT EIGHT */
-	0xFF19, /* FULLWIDTH DIGIT NINE -> DIGIT NINE */
-	0xFF1A, /* FULLWIDTH COLON -> COLON */
-	0xFF1B, /* FULLWIDTH SEMICOLON -> SEMICOLON */
-	0xFF1C, /* FULLWIDTH LESS-THAN SIGN -> LESS-THAN SIGN */
-	0xFF1D, /* FULLWIDTH EQUALS SIGN -> EQUALS SIGN */
-	0xFF1E, /* FULLWIDTH GREATER-THAN SIGN -> GREATER-THAN SIGN */
-	0xFF1F, /* FULLWIDTH QUESTION MARK -> QUESTION MARK */
-	0xFF20, /* FULLWIDTH COMMERCIAL AT -> COMMERCIAL AT */
-	0xFF21, /* FULLWIDTH LATIN CAPITAL LETTER A -> LATIN CAPITAL LETTER A */
-	0xFF22, /* FULLWIDTH LATIN CAPITAL LETTER B -> LATIN CAPITAL LETTER B */
-	0xFF23, /* FULLWIDTH LATIN CAPITAL LETTER C -> LATIN CAPITAL LETTER C */
-	0xFF24, /* FULLWIDTH LATIN CAPITAL LETTER D -> LATIN CAPITAL LETTER D */
-	0xFF25, /* FULLWIDTH LATIN CAPITAL LETTER E -> LATIN CAPITAL LETTER E */
-	0xFF26, /* FULLWIDTH LATIN CAPITAL LETTER F -> LATIN CAPITAL LETTER F */
-	0xFF27, /* FULLWIDTH LATIN CAPITAL LETTER G -> LATIN CAPITAL LETTER G */
-	0xFF28, /* FULLWIDTH LATIN CAPITAL LETTER H -> LATIN CAPITAL LETTER H */
-	0xFF29, /* FULLWIDTH LATIN CAPITAL LETTER I -> LATIN CAPITAL LETTER I */
-	0xFF2A, /* FULLWIDTH LATIN CAPITAL LETTER J -> LATIN CAPITAL LETTER J */
-	0xFF2B, /* FULLWIDTH LATIN CAPITAL LETTER K -> LATIN CAPITAL LETTER K */
-	0xFF2C, /* FULLWIDTH LATIN CAPITAL LETTER L -> LATIN CAPITAL LETTER L */
-	0xFF2D, /* FULLWIDTH LATIN CAPITAL LETTER M -> LATIN CAPITAL LETTER M */
-	0xFF2E, /* FULLWIDTH LATIN CAPITAL LETTER N -> LATIN CAPITAL LETTER N */
-	0xFF2F, /* FULLWIDTH LATIN CAPITAL LETTER O -> LATIN CAPITAL LETTER O */
-	0xFF30, /* FULLWIDTH LATIN CAPITAL LETTER P -> LATIN CAPITAL LETTER P */
-	0xFF31, /* FULLWIDTH LATIN CAPITAL LETTER Q -> LATIN CAPITAL LETTER Q */
-	0xFF32, /* FULLWIDTH LATIN CAPITAL LETTER R -> LATIN CAPITAL LETTER R */
-	0xFF33, /* FULLWIDTH LATIN CAPITAL LETTER S -> LATIN CAPITAL LETTER S */
-	0xFF34, /* FULLWIDTH LATIN CAPITAL LETTER T -> LATIN CAPITAL LETTER T */
-	0xFF35, /* FULLWIDTH LATIN CAPITAL LETTER U -> LATIN CAPITAL LETTER U */
-	0xFF36, /* FULLWIDTH LATIN CAPITAL LETTER V -> LATIN CAPITAL LETTER V */
-	0xFF37, /* FULLWIDTH LATIN CAPITAL LETTER W -> LATIN CAPITAL LETTER W */
-	0xFF38, /* FULLWIDTH LATIN CAPITAL LETTER X -> LATIN CAPITAL LETTER X */
-	0xFF39, /* FULLWIDTH LATIN CAPITAL LETTER Y -> LATIN CAPITAL LETTER Y */
-	0xFF3A, /* FULLWIDTH LATIN CAPITAL LETTER Z -> LATIN CAPITAL LETTER Z */
-	0xFF3B, /* FULLWIDTH LEFT SQUARE BRACKET -> LEFT SQUARE BRACKET */
-	0xFF3C, /* FULLWIDTH REVERSE SOLIDUS -> REVERSE SOLIDUS */
-	0xFF3D, /* FULLWIDTH RIGHT SQUARE BRACKET -> RIGHT SQUARE BRACKET */
-	0xFF3E, /* FULLWIDTH CIRCUMFLEX ACCENT -> CIRCUMFLEX ACCENT */
-	0xFF3F, /* FULLWIDTH LOW LINE -> LOW LINE */
-	0xFF40, /* FULLWIDTH GRAVE ACCENT -> GRAVE ACCENT */
-	0xFF41, /* FULLWIDTH LATIN SMALL LETTER A -> LATIN SMALL LETTER A */
-	0xFF42, /* FULLWIDTH LATIN SMALL LETTER B -> LATIN SMALL LETTER B */
-	0xFF43, /* FULLWIDTH LATIN SMALL LETTER C -> LATIN SMALL LETTER C */
-	0xFF44, /* FULLWIDTH LATIN SMALL LETTER D -> LATIN SMALL LETTER D */
-	0xFF45, /* FULLWIDTH LATIN SMALL LETTER E -> LATIN SMALL LETTER E */
-	0xFF46, /* FULLWIDTH LATIN SMALL LETTER F -> LATIN SMALL LETTER F */
-	0xFF47, /* FULLWIDTH LATIN SMALL LETTER G -> LATIN SMALL LETTER G */
-	0xFF48, /* FULLWIDTH LATIN SMALL LETTER H -> LATIN SMALL LETTER H */
-	0xFF49, /* FULLWIDTH LATIN SMALL LETTER I -> LATIN SMALL LETTER I */
-	0xFF4A, /* FULLWIDTH LATIN SMALL LETTER J -> LATIN SMALL LETTER J */
-	0xFF4B, /* FULLWIDTH LATIN SMALL LETTER K -> LATIN SMALL LETTER K */
-	0xFF4C, /* FULLWIDTH LATIN SMALL LETTER L -> LATIN SMALL LETTER L */
-	0xFF4D, /* FULLWIDTH LATIN SMALL LETTER M -> LATIN SMALL LETTER M */
-	0xFF4E, /* FULLWIDTH LATIN SMALL LETTER N -> LATIN SMALL LETTER N */
-	0xFF4F, /* FULLWIDTH LATIN SMALL LETTER O -> LATIN SMALL LETTER O */
-	0xFF50, /* FULLWIDTH LATIN SMALL LETTER P -> LATIN SMALL LETTER P */
-	0xFF51, /* FULLWIDTH LATIN SMALL LETTER Q -> LATIN SMALL LETTER Q */
-	0xFF52, /* FULLWIDTH LATIN SMALL LETTER R -> LATIN SMALL LETTER R */
-	0xFF53, /* FULLWIDTH LATIN SMALL LETTER S -> LATIN SMALL LETTER S */
-	0xFF54, /* FULLWIDTH LATIN SMALL LETTER T -> LATIN SMALL LETTER T */
-	0xFF55, /* FULLWIDTH LATIN SMALL LETTER U -> LATIN SMALL LETTER U */
-	0xFF56, /* FULLWIDTH LATIN SMALL LETTER V -> LATIN SMALL LETTER V */
-	0xFF57, /* FULLWIDTH LATIN SMALL LETTER W -> LATIN SMALL LETTER W */
-	0xFF58, /* FULLWIDTH LATIN SMALL LETTER X -> LATIN SMALL LETTER X */
-	0xFF59, /* FULLWIDTH LATIN SMALL LETTER Y -> LATIN SMALL LETTER Y */
-	0xFF5A, /* FULLWIDTH LATIN SMALL LETTER Z -> LATIN SMALL LETTER Z */
-	0xFF5B, /* FULLWIDTH LEFT CURLY BRACKET -> LEFT CURLY BRACKET */
-	0xFF5C, /* FULLWIDTH VERTICAL LINE -> VERTICAL LINE */
-	0xFF5D, /* FULLWIDTH RIGHT CURLY BRACKET -> RIGHT CURLY BRACKET */
-	0xFF5E, /* FULLWIDTH TILDE -> TILDE */
 };
 
 static const u8 ucs_fallback_singles_subs[] = {
@@ -1589,98 +1495,4 @@ static const u8 ucs_fallback_singles_subs[] = {
 	0x58, /* HEAVY BALLOT X -> LATIN CAPITAL LETTER X */
 	0x3C, /* LONG LEFTWARDS DOUBLE ARROW -> LESS-THAN SIGN */
 	0x3E, /* LONG RIGHTWARDS DOUBLE ARROW -> GREATER-THAN SIGN */
-	0x21, /* FULLWIDTH EXCLAMATION MARK -> EXCLAMATION MARK */
-	0x22, /* FULLWIDTH QUOTATION MARK -> QUOTATION MARK */
-	0x23, /* FULLWIDTH NUMBER SIGN -> NUMBER SIGN */
-	0x24, /* FULLWIDTH DOLLAR SIGN -> DOLLAR SIGN */
-	0x25, /* FULLWIDTH PERCENT SIGN -> PERCENT SIGN */
-	0x26, /* FULLWIDTH AMPERSAND -> AMPERSAND */
-	0x27, /* FULLWIDTH APOSTROPHE -> APOSTROPHE */
-	0x28, /* FULLWIDTH LEFT PARENTHESIS -> LEFT PARENTHESIS */
-	0x29, /* FULLWIDTH RIGHT PARENTHESIS -> RIGHT PARENTHESIS */
-	0x2A, /* FULLWIDTH ASTERISK -> ASTERISK */
-	0x2B, /* FULLWIDTH PLUS SIGN -> PLUS SIGN */
-	0x2C, /* FULLWIDTH COMMA -> COMMA */
-	0x2D, /* FULLWIDTH HYPHEN-MINUS -> HYPHEN-MINUS */
-	0x2E, /* FULLWIDTH FULL STOP -> FULL STOP */
-	0x2F, /* FULLWIDTH SOLIDUS -> SOLIDUS */
-	0x30, /* FULLWIDTH DIGIT ZERO -> DIGIT ZERO */
-	0x31, /* FULLWIDTH DIGIT ONE -> DIGIT ONE */
-	0x32, /* FULLWIDTH DIGIT TWO -> DIGIT TWO */
-	0x33, /* FULLWIDTH DIGIT THREE -> DIGIT THREE */
-	0x34, /* FULLWIDTH DIGIT FOUR -> DIGIT FOUR */
-	0x35, /* FULLWIDTH DIGIT FIVE -> DIGIT FIVE */
-	0x36, /* FULLWIDTH DIGIT SIX -> DIGIT SIX */
-	0x37, /* FULLWIDTH DIGIT SEVEN -> DIGIT SEVEN */
-	0x38, /* FULLWIDTH DIGIT EIGHT -> DIGIT EIGHT */
-	0x39, /* FULLWIDTH DIGIT NINE -> DIGIT NINE */
-	0x3A, /* FULLWIDTH COLON -> COLON */
-	0x3B, /* FULLWIDTH SEMICOLON -> SEMICOLON */
-	0x3C, /* FULLWIDTH LESS-THAN SIGN -> LESS-THAN SIGN */
-	0x3D, /* FULLWIDTH EQUALS SIGN -> EQUALS SIGN */
-	0x3E, /* FULLWIDTH GREATER-THAN SIGN -> GREATER-THAN SIGN */
-	0x3F, /* FULLWIDTH QUESTION MARK -> QUESTION MARK */
-	0x40, /* FULLWIDTH COMMERCIAL AT -> COMMERCIAL AT */
-	0x41, /* FULLWIDTH LATIN CAPITAL LETTER A -> LATIN CAPITAL LETTER A */
-	0x42, /* FULLWIDTH LATIN CAPITAL LETTER B -> LATIN CAPITAL LETTER B */
-	0x43, /* FULLWIDTH LATIN CAPITAL LETTER C -> LATIN CAPITAL LETTER C */
-	0x44, /* FULLWIDTH LATIN CAPITAL LETTER D -> LATIN CAPITAL LETTER D */
-	0x45, /* FULLWIDTH LATIN CAPITAL LETTER E -> LATIN CAPITAL LETTER E */
-	0x46, /* FULLWIDTH LATIN CAPITAL LETTER F -> LATIN CAPITAL LETTER F */
-	0x47, /* FULLWIDTH LATIN CAPITAL LETTER G -> LATIN CAPITAL LETTER G */
-	0x48, /* FULLWIDTH LATIN CAPITAL LETTER H -> LATIN CAPITAL LETTER H */
-	0x49, /* FULLWIDTH LATIN CAPITAL LETTER I -> LATIN CAPITAL LETTER I */
-	0x4A, /* FULLWIDTH LATIN CAPITAL LETTER J -> LATIN CAPITAL LETTER J */
-	0x4B, /* FULLWIDTH LATIN CAPITAL LETTER K -> LATIN CAPITAL LETTER K */
-	0x4C, /* FULLWIDTH LATIN CAPITAL LETTER L -> LATIN CAPITAL LETTER L */
-	0x4D, /* FULLWIDTH LATIN CAPITAL LETTER M -> LATIN CAPITAL LETTER M */
-	0x4E, /* FULLWIDTH LATIN CAPITAL LETTER N -> LATIN CAPITAL LETTER N */
-	0x4F, /* FULLWIDTH LATIN CAPITAL LETTER O -> LATIN CAPITAL LETTER O */
-	0x50, /* FULLWIDTH LATIN CAPITAL LETTER P -> LATIN CAPITAL LETTER P */
-	0x51, /* FULLWIDTH LATIN CAPITAL LETTER Q -> LATIN CAPITAL LETTER Q */
-	0x52, /* FULLWIDTH LATIN CAPITAL LETTER R -> LATIN CAPITAL LETTER R */
-	0x53, /* FULLWIDTH LATIN CAPITAL LETTER S -> LATIN CAPITAL LETTER S */
-	0x54, /* FULLWIDTH LATIN CAPITAL LETTER T -> LATIN CAPITAL LETTER T */
-	0x55, /* FULLWIDTH LATIN CAPITAL LETTER U -> LATIN CAPITAL LETTER U */
-	0x56, /* FULLWIDTH LATIN CAPITAL LETTER V -> LATIN CAPITAL LETTER V */
-	0x57, /* FULLWIDTH LATIN CAPITAL LETTER W -> LATIN CAPITAL LETTER W */
-	0x58, /* FULLWIDTH LATIN CAPITAL LETTER X -> LATIN CAPITAL LETTER X */
-	0x59, /* FULLWIDTH LATIN CAPITAL LETTER Y -> LATIN CAPITAL LETTER Y */
-	0x5A, /* FULLWIDTH LATIN CAPITAL LETTER Z -> LATIN CAPITAL LETTER Z */
-	0x5B, /* FULLWIDTH LEFT SQUARE BRACKET -> LEFT SQUARE BRACKET */
-	0x5C, /* FULLWIDTH REVERSE SOLIDUS -> REVERSE SOLIDUS */
-	0x5D, /* FULLWIDTH RIGHT SQUARE BRACKET -> RIGHT SQUARE BRACKET */
-	0x5E, /* FULLWIDTH CIRCUMFLEX ACCENT -> CIRCUMFLEX ACCENT */
-	0x5F, /* FULLWIDTH LOW LINE -> LOW LINE */
-	0x60, /* FULLWIDTH GRAVE ACCENT -> GRAVE ACCENT */
-	0x61, /* FULLWIDTH LATIN SMALL LETTER A -> LATIN SMALL LETTER A */
-	0x62, /* FULLWIDTH LATIN SMALL LETTER B -> LATIN SMALL LETTER B */
-	0x63, /* FULLWIDTH LATIN SMALL LETTER C -> LATIN SMALL LETTER C */
-	0x64, /* FULLWIDTH LATIN SMALL LETTER D -> LATIN SMALL LETTER D */
-	0x65, /* FULLWIDTH LATIN SMALL LETTER E -> LATIN SMALL LETTER E */
-	0x66, /* FULLWIDTH LATIN SMALL LETTER F -> LATIN SMALL LETTER F */
-	0x67, /* FULLWIDTH LATIN SMALL LETTER G -> LATIN SMALL LETTER G */
-	0x68, /* FULLWIDTH LATIN SMALL LETTER H -> LATIN SMALL LETTER H */
-	0x69, /* FULLWIDTH LATIN SMALL LETTER I -> LATIN SMALL LETTER I */
-	0x6A, /* FULLWIDTH LATIN SMALL LETTER J -> LATIN SMALL LETTER J */
-	0x6B, /* FULLWIDTH LATIN SMALL LETTER K -> LATIN SMALL LETTER K */
-	0x6C, /* FULLWIDTH LATIN SMALL LETTER L -> LATIN SMALL LETTER L */
-	0x6D, /* FULLWIDTH LATIN SMALL LETTER M -> LATIN SMALL LETTER M */
-	0x6E, /* FULLWIDTH LATIN SMALL LETTER N -> LATIN SMALL LETTER N */
-	0x6F, /* FULLWIDTH LATIN SMALL LETTER O -> LATIN SMALL LETTER O */
-	0x70, /* FULLWIDTH LATIN SMALL LETTER P -> LATIN SMALL LETTER P */
-	0x71, /* FULLWIDTH LATIN SMALL LETTER Q -> LATIN SMALL LETTER Q */
-	0x72, /* FULLWIDTH LATIN SMALL LETTER R -> LATIN SMALL LETTER R */
-	0x73, /* FULLWIDTH LATIN SMALL LETTER S -> LATIN SMALL LETTER S */
-	0x74, /* FULLWIDTH LATIN SMALL LETTER T -> LATIN SMALL LETTER T */
-	0x75, /* FULLWIDTH LATIN SMALL LETTER U -> LATIN SMALL LETTER U */
-	0x76, /* FULLWIDTH LATIN SMALL LETTER V -> LATIN SMALL LETTER V */
-	0x77, /* FULLWIDTH LATIN SMALL LETTER W -> LATIN SMALL LETTER W */
-	0x78, /* FULLWIDTH LATIN SMALL LETTER X -> LATIN SMALL LETTER X */
-	0x79, /* FULLWIDTH LATIN SMALL LETTER Y -> LATIN SMALL LETTER Y */
-	0x7A, /* FULLWIDTH LATIN SMALL LETTER Z -> LATIN SMALL LETTER Z */
-	0x7B, /* FULLWIDTH LEFT CURLY BRACKET -> LEFT CURLY BRACKET */
-	0x7C, /* FULLWIDTH VERTICAL LINE -> VERTICAL LINE */
-	0x7D, /* FULLWIDTH RIGHT CURLY BRACKET -> RIGHT CURLY BRACKET */
-	0x7E, /* FULLWIDTH TILDE -> TILDE */
 };
-- 
2.49.0


* Re: [PATCH 8/8] vt: process the full-width ASCII fallback range programmatically
  2025-05-05 16:55 ` [PATCH 8/8] vt: process the full-width ASCII fallback range programmatically Nicolas Pitre
@ 2025-05-06  5:55   ` Jiri Slaby
  2025-05-07 14:04     ` Nicolas Pitre
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Slaby @ 2025-05-06  5:55 UTC (permalink / raw)
  To: Nicolas Pitre, Greg Kroah-Hartman
  Cc: Nicolas Pitre, linux-serial, linux-kernel

On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> From: Nicolas Pitre <npitre@baylibre.com>
> 
> This saves about 258 bytes of text.

You mean .rodata, actually?

> --- a/drivers/tty/vt/ucs.c
> +++ b/drivers/tty/vt/ucs.c
> @@ -222,5 +222,13 @@ u32 ucs_get_fallback(u32 cp)
>   	if (single)
>   		return ucs_fallback_singles_subs[single - ucs_fallback_singles];
>   
> +	/*
> +	 * Full-width to ASCII mapping (covering all printable ASCII 33-126)
> +	 * 0xFF01 (!) to 0xFF5E (~) -> ASCII 33 (!) to 126 (~)
> +	 * We process them programmatically to reduce the table size.
> +	 */
> +	if (cp >= 0xFF01 && cp <= 0xFF5E)
> +		return cp - 0xFF01 + 33;

So do really »+ '!'« instead of »+ 33«.

thanks,
-- 
js
suse labs

* Re: [PATCH 1/8] vt: ucs.c: fix misappropriate in_range() usage
  2025-05-05 16:55 ` [PATCH 1/8] vt: ucs.c: fix misappropriate in_range() usage Nicolas Pitre
@ 2025-05-06  5:58   ` Jiri Slaby
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2025-05-06  5:58 UTC (permalink / raw)
  To: Nicolas Pitre, Greg Kroah-Hartman
  Cc: Nicolas Pitre, linux-serial, linux-kernel

On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> From: Nicolas Pitre <npitre@baylibre.com>
> 
> The in_range() helper accepts a start and a length, not a start and
> an end.

Indeed.

> Signed-off-by: Nicolas Pitre <npitre@baylibre.com>

Reviewed-by: Jiri Slaby <jirislaby@kernel.org>


-- 
js
suse labs

* Re: [PATCH 3/8] vt: move glyph determination to a separate function
  2025-05-05 16:55 ` [PATCH 3/8] vt: move glyph determination to a separate function Nicolas Pitre
@ 2025-05-06  6:06   ` Jiri Slaby
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2025-05-06  6:06 UTC (permalink / raw)
  To: Nicolas Pitre, Greg Kroah-Hartman
  Cc: Nicolas Pitre, linux-serial, linux-kernel

On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> From: Nicolas Pitre <npitre@baylibre.com>
> 
> No logical changes. Make it easier for enhancements to come.
...
> @@ -2984,12 +2985,40 @@ static int vc_process_ucs(struct vc_data *vc, int *c, int *tc)
>   	return 0;
>   }
>   
> +static int vc_get_glyph(struct vc_data *vc, int tc)
> +{
> +	int glyph = conv_uni_to_pc(vc, tc);
> +	int charmask = vc->vc_hi_font_mask ? 0x1ff : 0xff;

Could you keep charmask unsigned? It used to be u16.

> +
> +	if (!(glyph & ~charmask))
> +		return glyph;
> +
> +	if (glyph == -1)
> +		return -1; /* nothing to display */
> +
> +	/* Glyph not found */
> +

Do not add an additional \n here ^^.

> +	if ((!vc->vc_utf || vc->vc_disp_ctrl || tc < 128) && !(tc & ~charmask)) {
> +		/*
> +		 * In legacy mode use the glyph we get by a 1:1 mapping.
> +		 * This would make absolutely no sense with Unicode in mind,
> +		 * but do this for ASCII characters since a font may lack
> +		 * Unicode mapping info and we don't want to end up with
> +		 * having question marks only.

Generally: feel free to use 100 characters per line.

> +		 */
> +		return tc;
> +	}
> +
> +	/* Display U+FFFD (Unicode Replacement Character). */
> +	return conv_uni_to_pc(vc, UCS_REPLACEMENT);
> +}

thanks,
-- 
js
suse labs

* Re: [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create ucs_fallback_table.h
  2025-05-05 16:55 ` [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create ucs_fallback_table.h Nicolas Pitre
@ 2025-05-06  6:33   ` Jiri Slaby
  2025-05-07 14:11     ` Nicolas Pitre
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Slaby @ 2025-05-06  6:33 UTC (permalink / raw)
  To: Nicolas Pitre, Greg Kroah-Hartman
  Cc: Nicolas Pitre, linux-serial, linux-kernel

On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> From: Nicolas Pitre <npitre@baylibre.com>
> 
> The generated table maps complex characters to their simpler fallback
> forms for a terminal display when corresponding glyphs are unavailable.
> This includes diacritics, symbols as well as many drawing characters.
> Fallback characters aren't perfect replacements, obviously. But they are
> still far more useful than a bunch of squared question marks.
> 
> Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
> ---
>   drivers/tty/vt/gen_ucs_fallback_table.py | 882 +++++++++++++++++++++++
>   1 file changed, 882 insertions(+)
>   create mode 100755 drivers/tty/vt/gen_ucs_fallback_table.py
> 
> diff --git a/drivers/tty/vt/gen_ucs_fallback_table.py b/drivers/tty/vt/gen_ucs_fallback_table.py
> new file mode 100755
> index 000000000000..cb4e75b454fe
> --- /dev/null
> +++ b/drivers/tty/vt/gen_ucs_fallback_table.py
> @@ -0,0 +1,882 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Leverage Python's unicodedata module to generate ucs_fallback_table.h
> +#
> +# The generated table maps complex characters to their simpler fallback forms
> +# for a terminal display when corresponding glyphs are unavailable.
> +#
> +# Usage:
> +#   python3 gen_ucs_fallback_table.py         # Generate fallback tables
> +#   python3 gen_ucs_fallback_table.py -o FILE # Specify output file
> +
> +import unicodedata
> +import sys
> +import argparse
> +from collections import defaultdict
> +
> +# This script's file name
> +from pathlib import Path
> +this_file = Path(__file__).name
> +
> +# Default output file name
> +DEFAULT_OUT_FILE = "ucs_fallback_table.h"
> +
> +def collect_accented_latin_letters():
> +    """Collect already composed Latin letters with diacritics."""
> +    fallback_map = {}
> +
> +    # Latin-1 Supplement (0x00C0-0x00FF)
> +    # Capital letters with accents to their base forms
> +    fallback_map[0x00C0] = ord('A')  # À LATIN CAPITAL LETTER A WITH GRAVE
> +    fallback_map[0x00C1] = ord('A')  # Á LATIN CAPITAL LETTER A WITH ACUTE
> +    fallback_map[0x00C2] = ord('A')  # Â LATIN CAPITAL LETTER A WITH CIRCUMFLEX
> +    fallback_map[0x00C3] = ord('A')  # Ã LATIN CAPITAL LETTER A WITH TILDE
> +    fallback_map[0x00C4] = ord('A')  # Ä LATIN CAPITAL LETTER A WITH DIAERESIS
> +    fallback_map[0x00C5] = ord('A')  # Å LATIN CAPITAL LETTER A WITH RING ABOVE
> +    fallback_map[0x00C7] = ord('C')  # Ç LATIN CAPITAL LETTER C WITH CEDILLA
> +    fallback_map[0x00C8] = ord('E')  # È LATIN CAPITAL LETTER E WITH GRAVE
> +    fallback_map[0x00C9] = ord('E')  # É LATIN CAPITAL LETTER E WITH ACUTE
> +    fallback_map[0x00CA] = ord('E')  # Ê LATIN CAPITAL LETTER E WITH CIRCUMFLEX
> +    fallback_map[0x00CB] = ord('E')  # Ë LATIN CAPITAL LETTER E WITH DIAERESIS
> +    fallback_map[0x00CC] = ord('I')  # Ì LATIN CAPITAL LETTER I WITH GRAVE
> +    fallback_map[0x00CD] = ord('I')  # Í LATIN CAPITAL LETTER I WITH ACUTE
> +    fallback_map[0x00CE] = ord('I')  # Î LATIN CAPITAL LETTER I WITH CIRCUMFLEX
> +    fallback_map[0x00CF] = ord('I')  # Ï LATIN CAPITAL LETTER I WITH DIAERESIS
> +    fallback_map[0x00D1] = ord('N')  # Ñ LATIN CAPITAL LETTER N WITH TILDE
> +    fallback_map[0x00D2] = ord('O')  # Ò LATIN CAPITAL LETTER O WITH GRAVE
> +    fallback_map[0x00D3] = ord('O')  # Ó LATIN CAPITAL LETTER O WITH ACUTE
> +    fallback_map[0x00D4] = ord('O')  # Ô LATIN CAPITAL LETTER O WITH CIRCUMFLEX
> +    fallback_map[0x00D5] = ord('O')  # Õ LATIN CAPITAL LETTER O WITH TILDE
> +    fallback_map[0x00D6] = ord('O')  # Ö LATIN CAPITAL LETTER O WITH DIAERESIS
> +    fallback_map[0x00D9] = ord('U')  # Ù LATIN CAPITAL LETTER U WITH GRAVE
> +    fallback_map[0x00DA] = ord('U')  # Ú LATIN CAPITAL LETTER U WITH ACUTE
> +    fallback_map[0x00DB] = ord('U')  # Û LATIN CAPITAL LETTER U WITH CIRCUMFLEX
> +    fallback_map[0x00DC] = ord('U')  # Ü LATIN CAPITAL LETTER U WITH DIAERESIS
> +    fallback_map[0x00DD] = ord('Y')  # Ý LATIN CAPITAL LETTER Y WITH ACUTE


So you are in fact doing iconv's utf-8 -> ascii//translit conversion. 
Does python not have an iconv lib?

 > perl -e 'use Text::Iconv; print Text::Iconv->new("UTF8", "ASCII//TRANSLIT")->convert("áąà"), "\n";'
aaa

/me digging

Ah, unidecode:
 > python3 -c 'from unidecode import unidecode; print(unidecode("áąà"))'
aaa

Perhaps use that instead of manual table?

-- 
js
suse labs

* Re: [PATCH 8/8] vt: process the full-width ASCII fallback range programmatically
  2025-05-06  5:55   ` Jiri Slaby
@ 2025-05-07 14:04     ` Nicolas Pitre
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-07 14:04 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Greg Kroah-Hartman, linux-serial, linux-kernel

On Tue, 6 May 2025, Jiri Slaby wrote:

> On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> > From: Nicolas Pitre <npitre@baylibre.com>
> > 
> > This saves about 258 bytes of text.
> 
> You mean .rodata, actually?

I used the `size` command, which lumps .text and .rodata together. In
reality, .rodata goes down and .text goes up a little; the figure quoted
is the net change in that combined text size.

> > --- a/drivers/tty/vt/ucs.c
> > +++ b/drivers/tty/vt/ucs.c
> > @@ -222,5 +222,13 @@ u32 ucs_get_fallback(u32 cp)
> >   	if (single)
> >   		return ucs_fallback_singles_subs[single - ucs_fallback_singles];
> >   
> > +	/*
> > +	 * Full-width to ASCII mapping (covering all printable ASCII 33-126)
> > +	 * 0xFF01 (!) to 0xFF5E (~) -> ASCII 33 (!) to 126 (~)
> > +	 * We process them programmatically to reduce the table size.
> > +	 */
> > +	if (cp >= 0xFF01 && cp <= 0xFF5E)
> > +		return cp - 0xFF01 + 33;
> 
> So do really »+ '!'« instead of »+ 33«.

Uh... why? Having both as numerical values is more meaningful and clearer
here IMHO.


Nicolas

* Re: [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create ucs_fallback_table.h
  2025-05-06  6:33   ` Jiri Slaby
@ 2025-05-07 14:11     ` Nicolas Pitre
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2025-05-07 14:11 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Greg Kroah-Hartman, linux-serial, linux-kernel

On Tue, 6 May 2025, Jiri Slaby wrote:

> On 05. 05. 25, 18:55, Nicolas Pitre wrote:
> > From: Nicolas Pitre <npitre@baylibre.com>
> > 
> > The generated table maps complex characters to their simpler fallback
> > forms for a terminal display when corresponding glyphs are unavailable.
> > This includes diacritics, symbols as well as many drawing characters.
> > Fallback characters aren't perfect replacements, obviously. But they are
> > still far more useful than a bunch of squared question marks.
> > 
> > Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
> > ---
> >   drivers/tty/vt/gen_ucs_fallback_table.py | 882 +++++++++++++++++++++++
> >   1 file changed, 882 insertions(+)
> >   create mode 100755 drivers/tty/vt/gen_ucs_fallback_table.py
> > 
> > diff --git a/drivers/tty/vt/gen_ucs_fallback_table.py b/drivers/tty/vt/gen_ucs_fallback_table.py
> > new file mode 100755
> > index 000000000000..cb4e75b454fe
> > --- /dev/null
> > +++ b/drivers/tty/vt/gen_ucs_fallback_table.py
> > @@ -0,0 +1,882 @@
> > +    fallback_map[0x00D9] = ord('U')  # Ù LATIN CAPITAL LETTER U WITH GRAVE
> > +    fallback_map[0x00DA] = ord('U')  # Ú LATIN CAPITAL LETTER U WITH ACUTE
> > +    fallback_map[0x00DB] = ord('U')  # Û LATIN CAPITAL LETTER U WITH CIRCUMFLEX
> > +    fallback_map[0x00DC] = ord('U')  # Ü LATIN CAPITAL LETTER U WITH DIAERESIS
> > +    fallback_map[0x00DD] = ord('Y')  # Ý LATIN CAPITAL LETTER Y WITH ACUTE
> 
> 
> So you are in fact doing iconv's utf-8 -> ascii//translit conversion. Does
> python not have an iconv lib?
> 
> > perl -e 'use Text::Iconv; print Text::Iconv->new("UTF8", "ASCII//TRANSLIT")->convert("áąà"), "\n";'
> aaa
> 
> /me digging
> 
> Ah, unidecode:
> > python3 -c 'from unidecode import unidecode; print(unidecode("áąà"))'
> aaa
> 
> Perhaps use that instead of manual table?

Good idea! Go figure why I didn't think of that.

Some overrides are still needed, but the script is much smaller now
(though the table is somewhat bigger).


Nicolas

end of thread

Thread overview: 15+ messages
2025-05-05 16:55 [PATCH 0/8] vt: more Unicode handling changes Nicolas Pitre
2025-05-05 16:55 ` [PATCH 1/8] vt: ucs.c: fix misappropriate in_range() usage Nicolas Pitre
2025-05-06  5:58   ` Jiri Slaby
2025-05-05 16:55 ` [PATCH 2/8] vt: make sure displayed double-width characters are remembered as such Nicolas Pitre
2025-05-05 16:55 ` [PATCH 3/8] vt: move glyph determination to a separate function Nicolas Pitre
2025-05-06  6:06   ` Jiri Slaby
2025-05-05 16:55 ` [PATCH 4/8] vt: introduce gen_ucs_fallback_table.py to create ucs_fallback_table.h Nicolas Pitre
2025-05-06  6:33   ` Jiri Slaby
2025-05-07 14:11     ` Nicolas Pitre
2025-05-05 16:55 ` [PATCH 5/8] vt: create ucs_fallback_table.h_shipped with gen_ucs_fallback_table.py Nicolas Pitre
2025-05-05 16:55 ` [PATCH 6/8] vt: add ucs_get_fallback() Nicolas Pitre
2025-05-05 16:55 ` [PATCH 7/8] vt: make use of ucs_get_fallback() when glyph is unavailable Nicolas Pitre
2025-05-05 16:55 ` [PATCH 8/8] vt: process the full-width ASCII fallback range programmatically Nicolas Pitre
2025-05-06  5:55   ` Jiri Slaby
2025-05-07 14:04     ` Nicolas Pitre
