From: Pavel Machek <pavel@ucw.cz>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Eric Wong" <e@80x24.org>,
users@linux.kernel.org, tools@linux.kernel.org,
git@vger.kernel.org
Subject: Re: b4: unicode control characters -- warn or remove?
Date: Mon, 1 Nov 2021 21:49:14 +0100
Message-ID: <20211101204914.GA16445@duo.ucw.cz>
In-Reply-To: <20211101202220.dlcebvckeoz6c26k@meerkat.local>
Hi!
> > It checks whitespace because that's something that's commonly a source
> > of patch corruption. I'm not averse to adding this to core.whitespace,
> > but trying to catch malicious injected code seems like a rather big
> > expansion of its scope, particularly since:
> >
> > "[...]sending patches for docs actually written in RTL languages[...]"
> >
> > Or just code? People write comments, and even code, in their native languages,
> > and not all projects are as anglo-centric as those hosted on kernel.org.
>
> My comment about docs was purely within the scope of the Linux kernel.
>
> I think the following would be a sane check:
>
> 1. are there unicode control characters (CCs) present?
> 2. are there other characters from RTL languages present in the same line?
>
> if both 1 && 2 are true, this is a legitimate use of Unicode CCs. If only 1 is
> true, then it's likely worth a warning.
>
> Maybe even relax #2 to just check for unicode characters above a certain
> barrier where RTL languages live. I think everyone will agree that if there
> are unicode CCs and no other unicode characters in that same line, it's likely
> not a legitimate use of control characters.
If you are worried about malicious patches, then it should be easy for
attackers to add some RTL characters and escape the check...
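
To make that concrete, here is a minimal Python sketch of the two-step check
quoted above. Everything in it is an assumption for illustration: the names
(BIDI_CONTROLS, has_bidi_control, has_rtl_text, suspicious_lines) are
hypothetical, the set of control characters is one plausible choice, and none
of this is b4's or git's actual code.

```python
import unicodedata

# Assumed set of bidirectional formatting characters: embeddings, overrides,
# isolates, and their terminators. LRM/RLM/ALM are deliberately left out.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)

# Bidi classes that unicodedata reports for strong right-to-left letters
# (Hebrew and similar scripts are "R", Arabic-script letters are "AL").
RTL_CLASSES = frozenset({"R", "AL"})


def has_bidi_control(line):
    """Step 1: does the line contain any bidi control character?"""
    return any(ch in BIDI_CONTROLS for ch in line)


def has_rtl_text(line):
    """Step 2: does the line contain any strong right-to-left letter?"""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in line)


def suspicious_lines(text):
    """Return 1-based numbers of lines with bidi controls but no RTL text."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if has_bidi_control(line) and not has_rtl_text(line)
    ]


# A line with a right-to-left override and only Latin text is flagged.
flagged = suspicious_lines('x = "user\u202e // admin"\n')

# The same line with one Hebrew letter appended is no longer flagged.
evaded = suspicious_lines('x = "user\u202e // admin"  # \u05d0\n')
```

In this sketch `flagged` names line 1 while `evaded` is empty: a single RTL
letter anywhere on the line is enough to satisfy step 2, which is the escape
described above.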
Best regards,
Pavel
--
http://www.livejournal.com/~pavelmachek