From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: Jiri Slaby <jirislaby@kernel.org>,
Nicolas Pitre <npitre@baylibre.com>,
linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 00/14] vt: implement proper Unicode handling
Date: Fri, 25 Apr 2025 16:29:52 +0200
Message-ID: <2025042517-defacing-lushly-10d5@gregkh>
In-Reply-To: <20250417184849.475581-1-nico@fluxnic.net>
On Thu, Apr 17, 2025 at 02:45:02PM -0400, Nicolas Pitre wrote:
> The Linux VT console has many problems with regard to proper Unicode
> handling:
>
> - All new double-width Unicode code points which have been introduced since
> Unicode 5.0 are not recognized as such (we're at Unicode 16.0 now).
>
> - Zero-width code points are not recognized at all. If you try to edit files
> containing a lot of emojis, you will see the rendering issues. When there
> are a lot of zero-width characters (like "variation selectors"), long
> lines get wrapped, but any Unicode-aware editor thinks that the content
> was rendered properly and its rendering logic starts to work in very bad
> ways. Combine this with tmux or screen, and there is a huge mess going on
> in the terminal.
>
> - Also, text which uses combining diacritics has the same effect as text
> with zero-width characters, as programs expect the characters to take fewer
> columns than they actually do.
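The width mismatch described in the list above can be illustrated with a rough sketch built on Python's unicodedata module. This is not code from the series: cell_width() and text_width() are hypothetical names, and the rule used here (general categories Mn, Me and Cf take no column, East Asian Wide and Fullwidth take two) is only an approximation of what a terminal does.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate column count for one code point (assumed rule, not the kernel's)."""
    # Combining marks (Mn, Me) and format characters (Cf) occupy no column.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) code points occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def text_width(s: str) -> int:
    """Sum the approximate widths of all code points in s."""
    return sum(cell_width(ch) for ch in s)


if __name__ == "__main__":
    samples = [
        "e\u0301",       # 'e' followed by COMBINING ACUTE ACCENT
        "\u2764\ufe0f",  # HEAVY BLACK HEART followed by VARIATION SELECTOR-16
        "\u4e2d",        # a CJK ideograph
    ]
    for s in samples:
        # A console that counts one column per code point would use len(s).
        print(f"code points: {len(s)}  approximate columns: {text_width(s)}")
```

A console that advances one column per code point disagrees with an editor using a rule like this one whenever the two numbers differ, which is when line wrapping and cursor positioning go wrong.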
>
> Some may argue that the Linux VT console is unmaintained and/or not used
> much any longer and that one should consider a user space terminal
> alternative instead. But every such alternative that is at least as well
> maintained as the Linux VT console requires a full, heavy graphical
> environment, and that is the exact antithesis of what the Linux console is
> meant to be.
>
> Furthermore, there is a significant Linux console user base represented by
> blind users (which I'm a member of) for whom the alternatives are way more
> cumbersome to use, reducing our productivity. So it has to stay and
> be maintained to the best of our abilities.
>
> That being said...
>
> This patch series is about fixing all the above issues. This is accomplished
> with some Python scripts leveraging Python's unicodedata module to generate
> C code with lookup tables that is suitable for the kernel. In summary:
>
> - The double-width code point table is updated to the latest Unicode version
> and the table itself is optimized to reduce its size.
>
> - A zero-width code point table is created and the console code is modified
> to properly use it.
>
> - A table with base character + combining mark pairs is created to convert
> them into their precomposed equivalents when they're encountered.
> By default the generated table contains most commonly used Latin, Greek,
> and Cyrillic recomposition pairs only, but one can execute the provided
> script with the --full argument to create a table that covers all
> possibilities. Combining marks that are not listed in the table are simply
> treated like zero-width code points and properly ignored.
>
> - All those tables plus related lookup code require about 3500 additional
> bytes of text which is not very significant these days. Yet, one
> can still set CONFIG_CONSOLE_TRANSLATIONS=n to configure this all out
> if need be.
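The table-driven approach summarized above might look roughly like the following sketch. It is not the series' gen_ucs_width_table.py or gen_ucs_recompose_table.py: build_ranges(), in_ranges() and recompose() are hypothetical helpers, the zero-width predicate (general categories Mn and Me) is an assumption, and NFC normalization stands in for whatever pair selection the real scripts perform.

```python
import bisect
import unicodedata


def build_ranges(predicate, limit=0x3000):
    """Collapse matching code points below limit into sorted (first, last) ranges."""
    ranges = []
    start = None
    for cp in range(limit):
        if predicate(chr(cp)):
            if start is None:
                start = cp
        elif start is not None:
            ranges.append((start, cp - 1))
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges


def in_ranges(ranges, cp):
    """Binary search over sorted, non-overlapping (first, last) ranges."""
    i = bisect.bisect_right(ranges, (cp, 0x10FFFF)) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]


def recompose(base, mark):
    """Return the precomposed code point for base + mark, or None if there is none."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None


# Assumed zero-width predicate: nonspacing and enclosing marks only.
zero_width = build_ranges(lambda ch: unicodedata.category(ch) in ("Mn", "Me"))

if __name__ == "__main__":
    print(in_ranges(zero_width, 0x0301))  # COMBINING ACUTE ACCENT
    print(in_ranges(zero_width, ord("e")))
    print(recompose("e", "\u0301"))       # has a precomposed form
    print(recompose("q", "\u0301"))       # has none, so the mark stays zero-width
```

Storing ranges rather than individual code points keeps such a table small, and a sorted table allows a binary search at lookup time. A generator script would emit the ranges and pairs as C arrays instead of using them directly.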
>
> Note: The generated C code makes scripts/checkpatch.pl complain about
> "... exceeds 100 columns" because the inserted comments with code
> point names, well, make some lines exceed 100 columns. Please make
> an exception for those files and disregard those warnings. When
> checkpatch.pl is used on those files directly with -f then it doesn't
> complain.
>
> This series was tested on top of v6.15-rc2.

I've already taken the first version of this series. Should I revert all
of those patches and then apply these, or do you want to send a diff
between this version and what is in the tty-next tree?

thanks,

greg k-h

Thread overview: 18+ messages
2025-04-17 18:45 [PATCH v3 00/14] vt: implement proper Unicode handling Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 01/14] vt: minor cleanup to vc_translate_unicode() Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 02/14] vt: move unicode processing to a separate file Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 03/14] vt: properly support zero-width Unicode code points Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 04/14] vt: introduce gen_ucs_width_table.py to create ucs_width_table.h Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 05/14] vt: create ucs_width_table.h with gen_ucs_width_table.py Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 06/14] vt: use new tables in ucs.c Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 07/14] vt: introduce gen_ucs_recompose_table.py to create ucs_recompose_table.h Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 08/14] vt: create ucs_recompose_table.h with gen_ucs_recompose_table.py Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 09/14] vt: support Unicode recomposition Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 10/14] vt: pad double-width code points with a zero-width space Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 11/14] vt: remove zero-width-space handling from conv_uni_to_pc() Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 12/14] vt: update gen_ucs_width_table.py to make tables more space efficient Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 13/14] vt: refresh ucs_width_table.h and adjust code in ucs.c accordingly Nicolas Pitre
2025-04-17 18:45 ` [PATCH v3 14/14] vt: move UCS tables to the "shipped" form Nicolas Pitre
2025-04-25 14:29 ` Greg Kroah-Hartman [this message]
2025-04-25 16:13 ` [PATCH v3 00/14] vt: implement proper Unicode handling Nicolas Pitre
2025-04-26 9:24 ` Greg Kroah-Hartman