git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH v3] ci: disallow directional formatting
Date: Thu, 04 Nov 2021 14:48:29 +0100	[thread overview]
Message-ID: <211104.86ee7whvoz.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <pull.1071.v3.git.1636031609982.gitgitgadget@gmail.com>


On Thu, Nov 04 2021, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> [...]
>      * Corrected a code comment: my custom git grep was not PCRE-enabled,
>        but Ubuntu's isn't. But git grep -P still does not understand \uNNNN.
>      * Even if the *.po files currently pass the check, the script is now
>        future-proof by excluding them.
> [...]

> +# This is intended to run on an Ubuntu agent in a GitHub workflow.
> +#
> +# To allow translated messages to introduce such directional formatting in the
> +# future, we exclude the `.po` files from this validation.
> +#
> +# Neither GNU grep nor `git grep` (not even with `-P`) handle `\u` as a way to
> +# specify UTF-8.
> +#
> +# To work around that, we use `printf` to produce the pattern as a byte
> +# sequence, and then feed that to `git grep` as a byte sequence (setting
> +# `LC_CTYPE` to make sure that the arguments are interpreted as intended).
> +#
> +# Note: we need to use Bash here because its `printf` interprets `\uNNNN` as
> +# UTF-8 code points, as desired. Running this script through Ubuntu's `dash`,
> +# for example, would use a `printf` that does not understand that syntax.
> +
> +# U+202a..U+2a2e: LRE, RLE, PDF, LRO and RLO
> +# U+2066..U+2069: LRI, RLI, FSI and PDI
> +regex='(\u202a|\u202b|\u202c|\u202d|\u202e|\u2066|\u2067|\u2068|\u2069)'
> +
> +! LC_CTYPE=C git grep -El "$(LC_CTYPE=C.UTF-8 printf "$regex")" \
> +	-- ':(exclude,attr:binary)' ':(exclude)*.po'
>
> base-commit: 0cddd84c9f3e9c3d793ec93034ef679335f35e49

So this is still using not-PCRE instead of the much simpler PCRE-enabled
one I suggested in[1] because your locally-built git doesn't link to
libpcre?

At the very least that comment is still quite confusing. I.e. it starts
out by saying that it's meant to run in CI where we've got PCRE, but
then goes on to describe an elaborate workaround that's only needed with
a not-PCRE grep.

Would be less confusing to understand why it's so complex as:

   # This is intended to run on an Ubuntu agent in a GitHub
   # workflow. We've got PCRE there, so all of the below could also be:
   #
   #    ! git -P grep -nP '[\N{U+202a}-\N{U+202e}\N{U+2066}-\N{U+2069}]' ':!(attr:binary)' ':!**po/**'
   #
   #
   # But the author wanted to run this locally on a system that didn't
   # have PCRE, So [go on to describe elaborate bash / "git grep -E" /
   # LC_* tweaking workaround....]
   [...]

B.t.w. I think that **po/** exclusion is closer to what you want,
i.e. "a 'po' dir anywhere in our tree". It'll also exclude this if we
e.g. end up using these in language-specific README files there or
whatever.

1. https://lore.kernel.org/git/211103.86zgqlhzvz.gmgdl@evledraar.gmail.com/ or:

   ! git -P grep -nP '[\N{U+202a}-\N{U+202e}\N{U+2066}-\N{U+2069}]' ':!(attr:binary)'

  reply	other threads:[~2021-11-04 13:55 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-02 12:58 [PATCH] ci: disallow directional formatting Johannes Schindelin via GitGitGadget
2021-11-02 15:01 ` Ævar Arnfjörð Bjarmason
2021-11-02 15:48   ` Taylor Blau
2021-11-02 16:03     ` Ævar Arnfjörð Bjarmason
2021-11-02 16:12     ` Johannes Schindelin
2021-11-02 16:38       ` Ævar Arnfjörð Bjarmason
2021-11-03 17:20     ` Junio C Hamano
2021-11-03 12:23 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
2021-11-03 16:36   ` Ævar Arnfjörð Bjarmason
2021-11-03 18:00   ` Junio C Hamano
2021-11-03 23:38   ` Junio C Hamano
2021-11-04 10:19     ` Johannes Schindelin
2021-11-04 13:13   ` [PATCH v3] " Johannes Schindelin via GitGitGadget
2021-11-04 13:48     ` Ævar Arnfjörð Bjarmason [this message]
2021-11-04 17:19       ` Junio C Hamano
2021-11-08 18:49         ` Ævar Arnfjörð Bjarmason
2021-11-08 20:08           ` Junio C Hamano
2021-11-09 13:34             ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211104.86ee7whvoz.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).