From: Junio C Hamano <gitster@pobox.com>
To: Mathias Krause <minipli@grsecurity.net>
Cc: git@vger.kernel.org, "Stephane Odul" <stephane@clumio.com>,
"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: Suspected git grep regression in git 2.40.0 - proposed fix
Date: Thu, 23 Mar 2023 09:19:17 -0700 [thread overview]
Message-ID: <xmqq355va1a2.fsf@gitster.g> (raw)
In-Reply-To: <20230323144000.21146-1-minipli@grsecurity.net> (Mathias Krause's message of "Thu, 23 Mar 2023 15:40:00 +0100")
Mathias Krause <minipli@grsecurity.net> writes:
> ... or fall-back to the previous behaviour and
> ignore Unicode properties (and reintroduce the bug commit acabd2048ee0
> ("grep: correctly identify utf-8 characters with \{b,w} in -P") wanted
> to fix).
>
> I went with the second option and could confirm the below patch fixes
> the segfault on Ubuntu 20.04 which is shipping the broken version.
>
> Junio, what's your call on it? Below patch or simply a revert of commit
> acabd2048ee0? Other ideas?
Thanks for all the investigation and a prompt fix. Very much
appreciated.
As you identified where the breakage ended in the versions of pcre,
I think that the approach the patch I am responding to takes is more
sensible than a straight revert.
But I somehow suspect that it may be better to have the "workaround"
independent of the line acabd204 (grep: correctly identify utf-8
characters with \{b,w} in -P, 2023-01-08) touched, more like how we
"Work around ... fixed in 10.36".
IOW, not like this
> options |= PCRE2_CASELESS;
> }
> - if (!opt->ignore_locale && is_utf8_locale() && !literal)
> + if (!opt->ignore_locale && is_utf8_locale() && !literal) {
> options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
> +#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
> + /*
> + * Work around a JIT bug related to invalid Unicode character
> + * handling fixed in 10.35:
> + * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
> + */
> + options ^= PCRE2_UCP;
> +#endif
> + }
but more like
if (!opt->ignore_locale && is_utf8_locale() && !literal)
options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
+#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
+ /*
+ * Work around a JIT bug related to invalid Unicode character
+ * handling fixed in 10.35:
+ * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
+ */
+ options ^= PCRE2_UCP;
+#endif
+
#ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS))
That way, no matter how we ended up setting the UCP bit in the
options variable, we would avoid broken UCP handling on problematic
versions, no?
> #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
> /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
> diff --git a/grep.h b/grep.h
> index 6075f997e68f..c59592e3bdba 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -7,6 +7,9 @@
> #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
> #define GIT_PCRE2_VERSION_10_36_OR_HIGHER
> #endif
> +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 35) || PCRE2_MAJOR >= 11
> +#define GIT_PCRE2_VERSION_10_35_OR_HIGHER
> +#endif
> #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11
> #define GIT_PCRE2_VERSION_10_34_OR_HIGHER
> #endif
next prev parent reply other threads:[~2023-03-23 16:19 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-21 8:04 Suspected git grep regression in git 2.40.0 Stephane Odul
2023-03-21 12:33 ` Bagas Sanjaya
2023-03-21 16:33 ` Junio C Hamano
2023-03-21 19:20 ` Mathias Krause
2023-03-21 20:46 ` [EXTERNAL SENDER] " Stephane Odul
2023-03-22 19:52 ` Mathias Krause
2023-03-22 20:04 ` [EXTERNAL SENDER] " Stephane Odul
2023-03-23 14:40 ` Suspected git grep regression in git 2.40.0 - proposed fix Mathias Krause
2023-03-23 16:19 ` Junio C Hamano [this message]
2023-03-23 16:36 ` Mathias Krause
2023-03-23 17:25 ` [PATCH v2] grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34 Mathias Krause
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq355va1a2.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=avarab@gmail.com \
--cc=carenas@gmail.com \
--cc=git@vger.kernel.org \
--cc=minipli@grsecurity.net \
--cc=stephane@clumio.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.