From: Greg KH <gregkh@linuxfoundation.org>
To: "Roman Žilka" <roman.zilka@gmail.com>
Cc: jirislaby@kernel.org, linux-serial@vger.kernel.org
Subject: Re: [PATCH v3] tty/vt: UTF-8 parsing update according to RFC 3629, modern Unicode
Date: Thu, 4 Jan 2024 16:28:47 +0100
Message-ID: <2024010413-quickly-crinkly-6c5b@gregkh>
In-Reply-To: <e5e7fd4f-acac-41a0-8a36-1f4f71eb7c18@gmail.com>
On Tue, Dec 12, 2023 at 09:26:53PM +0100, Roman Žilka wrote:
> vc_translate_unicode() and vc_sanitize_unicode() parse input to the
> UTF-8-enabled console, marking invalid byte sequences and producing Unicode
> codepoints. The current algorithm follows ancient Unicode and may accept
> invalid byte sequences, pass on non-existent codepoints and reject valid
> sequences.
>
> The patch restores the functions' compliance with modern Unicode (v15.1 [1]
> + many previous versions) as well as RFC 3629 [2].
> 1. Codepoint space is limited to 0x10FFFF.
Wait, why? And shouldn't this be an individual patch on its own? What
is wrong with the checking we currently have?
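For context on point 1: RFC 3629 restricts UTF-8 to the range U+0000..U+10FFFF and excludes the surrogate range U+D800..U+DFFF. A minimal sketch of such a range check follows. The helper name is hypothetical and this is not code from the patch or from the current vt implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not from the patch).
 * Returns true if c is a Unicode scalar value, i.e. a codepoint that
 * RFC 3629 allows a UTF-8 sequence to encode.
 */
static bool is_unicode_scalar_value(uint32_t c)
{
	/* RFC 3629 caps the codepoint space at U+10FFFF. */
	if (c > 0x10FFFF)
		return false;
	/* Surrogates are reserved for UTF-16 and are invalid in UTF-8. */
	if (c >= 0xD800 && c <= 0xDFFF)
		return false;
	return true;
}
```

A decoder would apply a check like this after assembling the codepoint from its bytes. Overlong encodings need a separate check and are not covered here.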
> 2. "Noncharacters", such as U+FFFE, U+FFFF, are no longer invalid in
> Unicode and will be accepted.
Accepted when?
> Another option was to complete the set of
> noncharacters (used to be just those two, now there's more) and preserve
> the rejection step. This is indeed what Unicode suggests (v15.1, chap.
> 23.7) (not requires), but most codepoints are !iswprint(), so selecting
> just the noncharacters seemed arbitrary and futile (and unnecessary).
What is this change going to break with existing systems that were
thinking these were invalid characters?
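For context on point 2: the Unicode standard defines 66 noncharacters, namely U+FDD0..U+FDEF plus the last two codepoints of each of the 17 planes (U+xxFFFE and U+xxFFFF). The alternative of completing the set and keeping the rejection step could be sketched as below. The helper name is hypothetical and this is not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not from the patch).
 * Returns true if c is one of the 66 Unicode noncharacters.
 */
static bool is_unicode_noncharacter(uint32_t c)
{
	/* Contiguous block of 32 noncharacters in the BMP. */
	if (c >= 0xFDD0 && c <= 0xFDEF)
		return true;
	/* Last two codepoints of every plane: low 16 bits are FFFE or FFFF. */
	return c <= 0x10FFFF && (c & 0xFFFE) == 0xFFFE;
}
```

The old behaviour described in the quoted text corresponds to rejecting only U+FFFE and U+FFFF, which is a subset of what this check matches.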
> On the side:
> 3. Corrected/improved the doc of the two functions (esp. @rescan).
Again, a separate commit. When you have to list the changes out, that
is a huge hint it needs to be broken up into smaller pieces.
thanks,
greg k-h
Thread overview: 7+ messages
2023-12-12 15:13 [PATCH] tty/vt: UTF-8 parsing update according to RFC 3629, modern Unicode Roman Žilka
2023-12-12 15:36 ` Greg KH
2023-12-12 16:23 ` [PATCH v2] " Roman Žilka
2023-12-12 20:26 ` [PATCH v3] " Roman Žilka
2024-01-04 15:28 ` Greg KH [this message]
2024-01-09 10:28 ` Roman Žilka
2024-01-09 10:43 ` [PATCH v4] " Roman Žilka