From: Greg KH <gregkh@linuxfoundation.org>
To: "Roman Žilka" <roman.zilka@gmail.com>
Cc: jirislaby@kernel.org, linux-serial@vger.kernel.org
Subject: Re: [PATCH v3] tty/vt: UTF-8 parsing update according to RFC 3629, modern Unicode
Date: Thu, 4 Jan 2024 16:28:47 +0100
Message-ID: <2024010413-quickly-crinkly-6c5b@gregkh>
In-Reply-To: <e5e7fd4f-acac-41a0-8a36-1f4f71eb7c18@gmail.com>

On Tue, Dec 12, 2023 at 09:26:53PM +0100, Roman Žilka wrote:
> vc_translate_unicode() and vc_sanitize_unicode() parse input to the
> UTF-8-enabled console, marking invalid byte sequences and producing Unicode
> codepoints. The current algorithm follows ancient Unicode and may accept
> invalid byte sequences, pass on non-existent codepoints and reject valid
> sequences.
>
> The patch restores the functions' compliance with modern Unicode (v15.1 [1]
> + many previous versions) as well as RFC 3629 [2].
> 1. Codepoint space is limited to 0x10FFFF.
Wait, why? And shouldn't this be an individual patch on its own? What
is wrong with the checking we currently have?
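
As background for item 1, here is a minimal sketch of what RFC 3629
conformance asks of a decoder, written as standalone C rather than
kernel code. The helper name utf8_decode_one() is hypothetical and is
not taken from the patch or from vt.c. It only illustrates the checks
involved: no 5- or 6-byte sequences, no overlong encodings, no
surrogates, and nothing above 0x10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the kernel's vc_translate_unicode().
 * Decode one UTF-8 sequence from buf[0..len) following RFC 3629.
 * Returns the number of bytes consumed and stores the codepoint in *cp,
 * or returns 0 if the sequence is invalid or truncated.
 */
static size_t utf8_decode_one(const unsigned char *buf, size_t len,
			      uint32_t *cp)
{
	size_t need, i;
	uint32_t c, min;

	if (len == 0)
		return 0;

	if (buf[0] < 0x80) {			/* 0xxxxxxx: ASCII */
		*cp = buf[0];
		return 1;
	} else if ((buf[0] & 0xE0) == 0xC0) {	/* 110xxxxx: 2 bytes */
		need = 2; c = buf[0] & 0x1F; min = 0x80;
	} else if ((buf[0] & 0xF0) == 0xE0) {	/* 1110xxxx: 3 bytes */
		need = 3; c = buf[0] & 0x0F; min = 0x800;
	} else if ((buf[0] & 0xF8) == 0xF0) {	/* 11110xxx: 4 bytes */
		need = 4; c = buf[0] & 0x07; min = 0x10000;
	} else {
		/* stray continuation byte, or an old-style 5/6-byte lead */
		return 0;
	}

	if (len < need)
		return 0;			/* truncated sequence */

	for (i = 1; i < need; i++) {
		if ((buf[i] & 0xC0) != 0x80)	/* must be 10xxxxxx */
			return 0;
		c = (c << 6) | (buf[i] & 0x3F);
	}

	if (c < min)				/* overlong encoding */
		return 0;
	if (c >= 0xD800 && c <= 0xDFFF)		/* UTF-16 surrogates */
		return 0;
	if (c > 0x10FFFF)			/* beyond the Unicode range */
		return 0;

	*cp = c;
	return need;
}
```

With this sketch, the bytes F4 8F BF BF decode to U+10FFFF, while
F4 90 80 80 (which would be 0x110000) and any sequence starting with
F8 or above are rejected.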
> 2. "Noncharacters", such as U+FFFE, U+FFFF, are no longer invalid in
> Unicode and will be accepted.
Accepted when?
> Another option was to complete the set of
> noncharacters (used to be just those two, now there's more) and preserve
> the rejection step. This is indeed what Unicode suggests (v15.1, chap.
> 23.7) (not requires), but most codepoints are !iswprint(), so selecting
> just the noncharacters seemed arbitrary and futile (and unnecessary).
What is this change going to break with existing systems that were
thinking these were invalid characters?
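
For reference on item 2, the "complete the set of noncharacters" option
mentioned in the quoted text can be sketched as below. Unicode defines
66 noncharacters: U+FDD0..U+FDEF, plus the last two codepoints of each
of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFF). The
function name is_noncharacter() is hypothetical; this is standalone C
for illustration, not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT part of the patch under review.
 * Returns true if cp is one of the 66 Unicode noncharacters.
 */
static bool is_noncharacter(uint32_t cp)
{
	/* contiguous block of 32 noncharacters in the BMP */
	if (cp >= 0xFDD0 && cp <= 0xFDEF)
		return true;

	/* U+nFFFE and U+nFFFF for every plane n = 0x00..0x10 */
	return cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
}
```

A decoder that keeps the rejection step would call such a check after
decoding. The v3 patch as described above instead drops the check and
passes these codepoints through.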
> On the side:
> 3. Corrected/improved the doc of the two functions (esp. @rescan).
Again, a separate commit. When you have to list the changes out, that
is a huge hint it needs to be broken up into smaller pieces.
thanks,
greg k-h
Thread overview: 7+ messages
2023-12-12 15:13 [PATCH] tty/vt: UTF-8 parsing update according to RFC 3629, modern Unicode Roman Žilka
2023-12-12 15:36 ` Greg KH
2023-12-12 16:23 ` [PATCH v2] " Roman Žilka
2023-12-12 20:26 ` [PATCH v3] " Roman Žilka
2024-01-04 15:28 ` Greg KH [this message]
2024-01-09 10:28 ` Roman Žilka
2024-01-09 10:43 ` [PATCH v4] " Roman Žilka