From: Hamza Mahfooz <someguy@effective-light.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
"René Scharfe" <l.s.r@web.de>,
"Andreas Schwab" <schwab@linux-m68k.org>,
"Hamza Mahfooz" <someguy@effective-light.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH 1/2] grep/pcre2: limit the instances in which UTF mode is enabled
Date: Thu, 18 Nov 2021 03:41:42 -0500
Message-ID: <20211118084143.279174-1-someguy@effective-light.com>
UTF mode is currently enabled in cases that cause older versions of PCRE
to break. This is primarily because we can't make as many assumptions
about the kind of data that is fed to "git grep". So, limit when UTF mode
can be enabled by introducing "is_log" to struct grep_opt, checking
whether it is non-zero in compile_pcre2_pattern(), and setting it only in
cmd_log(), so that a non-zero value tells us that "git log" was invoked.
Fixes: ae39ba431a (grep/pcre2: fix an edge case concerning ascii patterns
and UTF-8 data, 2021-10-15)
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
---
builtin/log.c | 1 +
grep.c | 2 +-
grep.h | 1 +
t/t7812-grep-icase-non-ascii.sh | 2 +-
4 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index f75d87e8d7..040b0b533f 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -751,6 +751,7 @@ int cmd_log(int argc, const char **argv, const char *prefix)
git_config(git_log_config, NULL);
repo_init_revisions(the_repository, &rev, prefix);
+ rev.grep_filter.is_log = 1;
rev.always_show_header = 1;
memset(&opt, 0, sizeof(opt));
opt.def = "HEAD";
diff --git a/grep.c b/grep.c
index f6e113e9f0..665d86f007 100644
--- a/grep.c
+++ b/grep.c
@@ -382,7 +382,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
}
options |= PCRE2_CASELESS;
}
- if ((!opt->ignore_locale && !has_non_ascii(p->pattern)) ||
+ if ((opt->is_log && !opt->ignore_locale && !has_non_ascii(p->pattern)) ||
(!opt->ignore_locale && is_utf8_locale() &&
has_non_ascii(p->pattern) && !(!opt->ignore_case &&
(p->fixed || p->is_fixed))))
diff --git a/grep.h b/grep.h
index 3e8815c347..64634c6a3f 100644
--- a/grep.h
+++ b/grep.h
@@ -167,6 +167,7 @@ struct grep_opt {
int extended_regexp_option;
int pattern_type_option;
int ignore_locale;
+ int is_log;
char colors[NR_GREP_COLORS][COLOR_MAXLEN];
unsigned pre_context;
unsigned post_context;
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 22487d90fd..1da6b07a57 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -60,7 +60,7 @@ test_expect_success GETTEXT_LOCALE,PCRE 'log --author with an ascii pattern on U
test_write_lines "forth" >file4 &&
git add file4 &&
git commit --author="À Ú Thor <author@example.com>" -m sécond &&
- git log -1 --color=always --perl-regexp --author=".*Thor" >log &&
+ git log -1 --color=always --perl-regexp --author=". . Thor" >log &&
grep Author log >actual.raw &&
test_decode_color <actual.raw >actual &&
test_cmp expected actual
--
2.34.0
Thread overview: 8+ messages
2021-11-18 8:41 Hamza Mahfooz [this message]
2021-11-18 8:41 ` [PATCH 2/2] ci: add a job for PCRE2 Hamza Mahfooz
2021-11-18 9:53 ` [PATCH v2 " Hamza Mahfooz
2021-11-18 10:32 ` [PATCH " Ævar Arnfjörð Bjarmason
2021-11-22 22:26 ` Hamza Mahfooz
2021-11-18 10:04 ` [PATCH 1/2] grep/pcre2: limit the instances in which UTF mode is enabled Carlo Arenas
2021-11-18 19:40 ` Carlo Marcelo Arenas Belón
2021-11-18 20:53 ` Junio C Hamano