From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from phobos.denx.de (phobos.denx.de [85.214.62.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1A444C433EF for ; Mon, 10 Jan 2022 11:08:50 +0000 (UTC) Received: from h2850616.stratoserver.net (localhost [IPv6:::1]) by phobos.denx.de (Postfix) with ESMTP id 8916181DCA; Mon, 10 Jan 2022 12:08:48 +0100 (CET) Authentication-Results: phobos.denx.de; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: phobos.denx.de; spf=pass smtp.mailfrom=u-boot-bounces@lists.denx.de Received: by phobos.denx.de (Postfix, from userid 109) id 59AC881761; Mon, 10 Jan 2022 12:08:47 +0100 (CET) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by phobos.denx.de (Postfix) with ESMTP id 9A90B81DCA for ; Mon, 10 Jan 2022 12:08:41 +0100 (CET) Authentication-Results: phobos.denx.de; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: phobos.denx.de; spf=pass smtp.mailfrom=andre.przywara@arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C025F2B; Mon, 10 Jan 2022 03:08:40 -0800 (PST) Received: from donnerap.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8E9EF3F5A1; Mon, 10 Jan 2022 03:08:39 -0800 (PST) Date: Mon, 10 Jan 2022 11:08:34 +0000 From: Andre Przywara To: Heinrich Schuchardt Cc: Anatolij Gustschin , Simon Glass , Tom Rini , Alexander Graf , Mark Kettenis , u-boot@lists.denx.de, Ilias Apalodimas Subject: Re: [PATCH 8/8] video: Convert UTF-8 input stream to the 437 code page Message-ID: <20220110110834.2f751c7b@donnerap.cambridge.arm.com> In-Reply-To: <2e09d00b-d2b0-a719-38a8-5b657e4ca2a4@canonical.com> References: <20220110005638.21599-1-andre.przywara@arm.com> <20220110005638.21599-9-andre.przywara@arm.com> <2e09d00b-d2b0-a719-38a8-5b657e4ca2a4@canonical.com> Organization: ARM X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; aarch64-unknown-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: u-boot@lists.denx.de X-Mailman-Version: 2.1.39 Precedence: list List-Id: U-Boot discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: u-boot-bounces@lists.denx.de Sender: "U-Boot" X-Virus-Scanned: clamav-milter 0.103.2 at phobos.denx.de X-Virus-Status: Clean On Mon, 10 Jan 2022 09:18:45 +0100 Heinrich Schuchardt wrote: Hi Heinrich, > /On 1/10/22 01:56, Andre Przywara wrote: > > The bitmap fonts (VGA 8x16 and friends) we import from Linux use the > > 437 code page to map their glyphs. For U-Boot's own purposes this is > > probably fine, but UEFI applications output Unicode, which only matches > > in the very basic first 127 characters. > >=20 > > Add a function that converts UTF-8 character sequences into the > > respective CP437 code point, as far as the characters defined in there > > allow this. This includes quite some international and box drawing > > characters, which are used by UEFI applications. > >=20 > > Signed-off-by: Andre Przywara > > --- > > drivers/video/Makefile | 1 + > > drivers/video/utf8_cp437.c | 169 ++++++++++++++++++++++++++++++ > > drivers/video/vidconsole-uclass.c | 6 +- > > include/video_console.h | 9 ++ > > 4 files changed, 184 insertions(+), 1 deletion(-) > > create mode 100644 drivers/video/utf8_cp437.c > >=20 > > diff --git a/drivers/video/Makefile b/drivers/video/Makefile > > index 8956b5f9b00..5f9823dff9e 100644 > > --- a/drivers/video/Makefile > > +++ b/drivers/video/Makefile > > @@ -14,6 +14,7 @@ obj-$(CONFIG_DISPLAY) +=3D display-uclass.o > > obj-$(CONFIG_VIDEO_MIPI_DSI) +=3D dsi-host-uclass.o > > obj-$(CONFIG_DM_VIDEO) +=3D video-uclass.o vidconsole-uclass.o > > obj-$(CONFIG_DM_VIDEO) +=3D video_bmp.o > > +obj-$(CONFIG_DM_VIDEO) +=3D utf8_cp437.o > > obj-$(CONFIG_PANEL) +=3D panel-uclass.o > > obj-$(CONFIG_DM_PANEL_HX8238D) +=3D hx8238d.o > > obj-$(CONFIG_SIMPLE_PANEL) +=3D simple_panel.o > > diff --git a/drivers/video/utf8_cp437.c b/drivers/video/utf8_cp437.c > > new file mode 100644 > > index 00000000000..cab68b92b6e > > --- /dev/null > > +++ b/drivers/video/utf8_cp437.c =20 >=20 > A translation from Unicode to CP437 is needed in the FAT driver (but=20 > missing), in Unicode Collation Protocol and here in video. So this=20 > functionality should live in lib/charset.c. > Please, have a look at efi_fat_to_str() in=20 > lib/efi_loader/efi_unicode_collation.c. We should avoid code duplication. Ah, didn't know, thanks for the heads up! Happy to move the code in there, and will also look for prior art. > Maybe we should drop CP1250 support to make our effort simpler? No=20 > defconfig uses it. I need to have a closer look. >=20 > > @@ -0,0 +1,169 @@ > > +/* > > + * Convert UTF-8 bytes into a code page 437 character. > > + * Based on the table in the Code_page_437 Wikipedia page. > > + */ > > + > > +#include > > + > > +uint8_t code_points_00a0[] =3D { > > + 255, 173, 155, 156, 7, 157, 7, 21, > > + 7, 7, 166, 174, 170, 7, 7, 7, > > + 248, 241, 253, 7, 7, 230, 20, 250, > > + 7, 7, 167, 175, 172, 171, 7, 168, > > + 7, 7, 7, 7, 142, 143, 146, 128, > > + 7, 144, 7, 7, 7, 7, 7, 7, > > + 7, 165, 7, 7, 7, 7, 153, 7, > > + 7, 7, 7, 7, 154, 7, 7, 225, > > + 133, 160, 131, 7, 132, 134, 145, 135, > > + 138, 130, 136, 137, 141, 161, 140, 139, > > + 7, 164, 149, 162, 147, 7, 148, 246, > > + 7, 151, 163, 150, 129, 7, 7, 152, > > +}; > > + > > +uint8_t code_points_2550[] =3D { > > + 205, 186, 213, 214, 201, 184, 183, 187, > > + 212, 211, 200, 190, 189, 188, 198, 199, > > + 204, 181, 182, 185, 209, 210, 203, 207, > > + 208, 202, 216, 215, 206 > > +}; > > + > > +static uint8_t utf8_convert_11bit(uint16_t code) > > +{ > > + switch (code) { > > + case 0x0192: return 159; > > + case 0x0393: return 226; > > + case 0x0398: return 233; > > + case 0x03A3: return 228; > > + case 0x03A6: return 232; > > + case 0x03A9: return 234; > > + case 0x03B1: return 224; > > + case 0x03B4: return 235; > > + case 0x03B5: return 238; > > + case 0x03C0: return 227; > > + case 0x03C3: return 229; > > + case 0x03C4: return 231; > > + case 0x03C6: return 237; > > + } > > + > > + return 0; > > +}; > > + > > +static uint8_t utf8_convert_2xxx(uint16_t code) =20 >=20 >=20 > This is duplicate to include/cp437.h But this is the other way around, isn't it? The above is from CP437 to UCS-2, this one here is from UCS-2 to CP437 (and it's misnamed). But I can certainly move it in there. > > +{ > > + switch (code) { > > + case 0x2022: return 7; > > + case 0x203C: return 19; > > + case 0x207F: return 252; > > + case 0x20A7: return 158; > > + case 0x2190: return 27; > > + case 0x2191: return 24; > > + case 0x2192: return 26; > > + case 0x2193: return 25; > > + case 0x2194: return 29; > > + case 0x2195: return 18; > > + case 0x21A8: return 23; > > + case 0x2219: return 249; > > + case 0x221A: return 251; > > + case 0x221E: return 236; > > + case 0x221F: return 28; > > + case 0x2229: return 239; > > + case 0x2248: return 247; > > + case 0x2261: return 240; > > + case 0x2264: return 243; > > + case 0x2265: return 242; > > + case 0x2310: return 169; > > + case 0x2320: return 244; > > + case 0x2321: return 245; > > + case 0x2500: return 196; > > + case 0x2502: return 179; > > + case 0x250C: return 218; > > + case 0x2510: return 191; > > + case 0x2514: return 192; > > + case 0x2518: return 217; > > + case 0x251C: return 195; > > + case 0x2524: return 180; > > + case 0x252C: return 194; > > + case 0x2534: return 193; > > + case 0x253C: return 197; > > + case 0x2580: return 223; > > + case 0x2584: return 220; > > + case 0x2588: return 219; > > + case 0x258C: return 221; > > + case 0x2590: return 222; > > + case 0x2591: return 176; > > + case 0x2592: return 177; > > + case 0x2593: return 178; > > + case 0x25A0: return 254; > > + case 0x25AC: return 22; > > + case 0x25B2: return 30; > > + case 0x25BA: return 16; > > + case 0x25BC: return 31; > > + case 0x25C4: return 17; > > + case 0x25CB: return 9; > > + case 0x25D8: return 8; > > + case 0x25D9: return 10; > > + case 0x263A: return 1; > > + case 0x263B: return 2; > > + case 0x263C: return 15; > > + case 0x2640: return 12; > > + case 0x2642: return 11; > > + case 0x2660: return 6; > > + case 0x2663: return 5; > > + case 0x2665: return 3; > > + case 0x2666: return 4; > > + case 0x266A: return 13; > > + case 0x266B: return 14; > > + } > > + > > + return 0; > > +} > > + > > +uint8_t convert_uc16_to_cp437(uint16_t code) =20 >=20 > We should not duplicate efi_fat_to_str() but use a common function. Definitely, I wasn't aware of this code, thanks for the heads up. > > +{ > > + if (code < 0x7f) // ASCII > > + return code; > > + if (code < 0xa0) // high control characters > > + return code; > > + if (code < 0x100) // international characters > > + return code_points_00a0[code - 0xa0]; > > + if (code < 0x800) > > + return utf8_convert_11bit(code); > > + if (code >=3D 0x2550 && code < 0x256d) // block graphics > > + return code_points_2550[code - 0x2550]; =20 >=20 > How about =C3=84=C3=96=C3=9C=C3=9F and other European letters? Any "=C3=9Cbergr=C3=B6=C3=9Fengesch=C3=A4ft" should be covered by the inter= national characters section above, between 160 and 255. Do you miss anything in particular? > > + > > + return utf8_convert_2xxx(code); > > +} > > + > > +uint8_t convert_utf8_to_cp437(uint8_t c, uint32_t *esc) > > +{ > > + int shift; > > + uint16_t ucs; > > + > > + if (c < 127) // ASCII > > + return c; > > + if (c =3D=3D 127) > > + return 8; // DEL (?) > > + > > + switch (c & 0xf0) { > > + case 0xc0: case 0xd0: // two bytes sequence > > + *esc =3D (1U << 24) | ((c & 0x1f) << 6); > > + return 0; > > + case 0xe0: // three bytes sequence > > + *esc =3D (2U << 24) | ((c & 0x0f) << 12); > > + return 0; > > + case 0xf0: // four bytes sequence > > + *esc =3D (3U << 24) | ((c & 0x07) << 18); > > + return 0; > > + case 0x80: case 0x90: case 0xa0: case 0xb0: // continuation > > + shift =3D (*esc >> 24) - 1; > > + ucs =3D *esc & 0xffffff; > > + if (shift) { > > + *esc =3D (shift << 24) | ucs | (c & 0x3f) << (shift * 6); > > + return 0; > > + } > > + *esc =3D 0; > > + return convert_uc16_to_cp437(ucs | (c & 0x3f)); > > + } > > + > > + return 0; > > +} > > diff --git a/drivers/video/vidconsole-uclass.c b/drivers/video/vidconso= le-uclass.c > > index 420fd86f9ac..ca6e1a2620c 100644 > > --- a/drivers/video/vidconsole-uclass.c > > +++ b/drivers/video/vidconsole-uclass.c > > @@ -546,6 +546,7 @@ static int vidconsole_output_glyph(struct udevice *= dev, char ch) > > int vidconsole_put_char(struct udevice *dev, char ch) > > { > > struct vidconsole_priv *priv =3D dev_get_uclass_priv(dev); > > + uint8_t cp437; > > int ret; > > =20 > > /* > > @@ -587,7 +588,10 @@ int vidconsole_put_char(struct udevice *dev, char = ch) > > priv->last_ch =3D 0; > > break; > > default: > > - ret =3D vidconsole_output_glyph(dev, ch); > > + cp437 =3D convert_utf8_to_cp437(ch, &priv->ucs); > > + if (cp437 =3D=3D 0) > > + return 0; > > + ret =3D vidconsole_output_glyph(dev, cp437); > > if (ret < 0) > > return ret; > > break; > > diff --git a/include/video_console.h b/include/video_console.h =20 >=20 > This should go to include/charset.h. >=20 > Best regards >=20 > Heinrich >=20 > > index a908f1412e8..f2d05e7f4e7 100644 > > --- a/include/video_console.h > > +++ b/include/video_console.h > > @@ -83,6 +83,7 @@ struct vidconsole_priv { > > int escape_len; > > int row_saved; > > int col_saved; > > + u32 ucs; > > bool cursor_visible; > > char escape_buf[32]; > > }; > > @@ -304,6 +305,14 @@ static inline int vidconsole_memmove(struct udevic= e *dev, void *dst, > > return 0; > > } > > =20 > > +/* > > + * Convert an UTF-8 byte into the corresponding character in the CP437 > > + * code page. Returns 0 if that character is part of a multi-byte sequ= ence. > > + * for which *esc holds the state of. Repeatedly feed in more bytes un= til > > + * the return value returns a non-0 character. =20 >=20 > Please, follow the style described in=20 > https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#function= -documentation.=20 > This will allow inclusion in the HTML documentation (see=20 > https://u-boot.readthedocs.io/en/latest/api/index.html). >=20 Sure, I love and use kerneldoc, just missed that in that three-year old patch ;-) Cheers, Andre >=20 > > + */ > > +uint8_t convert_utf8_to_cp437(uint8_t c, uint32_t *esc); > > + > > #endif > > =20 > > #endif =20 >=20