From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
orygaw <orygaw@protonmail.com>,
rsbecker@nexbridge.com, "Taylor Blau" <me@ttaylorr.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v2] grep.c: remove "extended" in favor of "pattern_expression", fix segfault
Date: Tue, 11 Oct 2022 11:48:45 +0200 [thread overview]
Message-ID: <patch-v2-1.1-6ad7627706f-20221011T094715Z-avarab@gmail.com> (raw)
In-Reply-To: <patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com>
Since 79d3696cfb4 (git-grep: boolean expression on pattern matching.,
2006-06-30) the "pattern_expression" member has been used for complex
queries (AND/OR...), with "pattern_list" being used for the simple OR
queries. Since then we've used both "pattern_expression" and its
associated boolean "extended" member to see if we have a complex
expression.
Since f41fb662f57 (revisions API: have release_revisions() release
"grep_filter", 2022-04-13) we've had a subtle bug relating to that: If
we supplied options that were only used for "complex queries", but
didn't supply the query itself we'd set "opt->extended", but would
have a NULL "pattern_expression". As a result these would segfault as
we tried to call "free_grep_patterns()" from "release_revisions()":
git -P log -1 --invert-grep
git -P log -1 --all-match
The root cause of this is that we were conflating the state management
we needed in "compile_grep_patterns()" itself with whether or not we
had an "opt->pattern_expression" later on.
In this cases as we're going through "compile_grep_patterns()" we have
no "opt->pattern_list" but have "opt->no_body_match" or
"opt->all_match". So we'd set "opt->extended = 1", but not "return" on
"opt->extended" as that's an "else if" in the same "if" statement.
That behavior is intentional and required, as the common case is that
we have an "opt->pattern_list" that we're about to parse into the
"opt->pattern_expression".
But we don't need to keep track of this "extended" flag beyond the
state management in compile_grep_patterns() itself. It needs it, but
once we're out of that function we can rely on
"opt->pattern_expression" being non-NULL instead for using these
extended patterns.
As 79d3696cfb4 itself shows we've assumed that there's a one-to-one
mapping between the two since the very beginning. I.e. "match_line()"
would check "opt->extended" to see if it should call "match_expr()",
and the first thing we do in that function is assume that we have a
"opt->pattern_expression". We'd then call "match_expr_eval()", which
would have died if that "opt->pattern_expression" was NULL.
The "die" was added in c922b01f54c (grep: fix segfault when "git grep
'('" is given, 2009-04-27), and can now be removed as it's now clearly
unreachable. We still do the right thing in the case that prompted
that fix:
git grep '('
fatal: unmatched parenthesis
Arguably neither the "--invert-grep" option added in [1] nor the
earlier "--all-match" option added in [2] were intended to be used
stand-alone, and another approach[3] would be to error out in those
cases. But since we've been treating them as a NOOP when given without
--grep for a long time let's keep doing that.
We could also return in "free_pattern_expr()" if the argument is
non-NULL, as an alternative fix for this segfault does [4]. That would
be more elegant in making the "free_*()" function behave like
"free()", but it would also remove a sanity check: The
"free_pattern_expr()" function calls itself recursively, and only the
top-level is allowed to be NULL, let's not conflate those two
conditions.
1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12)
2. 0ab7befa31d (grep --all-match, 2006-09-27)
3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/
4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com
Reported-by: orygaw <orygaw@protonmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
Per the v1 feedback this v2 doesn't change any user-observable
behavior, except to address the segfault(s).
Per https://lore.kernel.org/git/Y0Rg9My2EaWl%2FWCU@nand.local/ it
sounds like Taylor's preparing his own v2 with a more narrow fix
focused on https://lore.kernel.org/git/Y0Rg9My2EaWl%2FWCU@nand.local/;
which we may or may not want to have instead of this as a quicker
band-aid.
But this attempts to address the root cause of the problem, which is
that grep.[ch] is effectively juggling two struct members that mean
the same thing, but whose state drifted apart.
Passing CI at: https://github.com/avar/git/actions/runs/3225227595
(actually https://github.com/avar/git/actions/runs/3225813488, but
when I was about to submit this I made a trivial whitespace fix in the
t/*.sh change).
Range-diff against v1:
1: f4b90799fce ! 1: 6ad7627706f log: require --grep for --invert-grep and --all-match, fix segfault
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- log: require --grep for --invert-grep and --all-match, fix segfault
+ grep.c: remove "extended" in favor of "pattern_expression", fix segfault
- Neither the "--invert-grep" option added in [1] nor the earlier
- "--all-match" option added in [2] were intended to be used
- stand-alone.
+ Since 79d3696cfb4 (git-grep: boolean expression on pattern matching.,
+ 2006-06-30) the "pattern_expression" member has been used for complex
+ queries (AND/OR...), with "pattern_list" being used for the simple OR
+ queries. Since then we've used both "pattern_expression" and its
+ associated boolean "extended" member to see if we have a complex
+ expression.
- But due to how the built-in and the revision API interacted those
- options without the corresponding --grep would be ignored.
+ Since f41fb662f57 (revisions API: have release_revisions() release
+ "grep_filter", 2022-04-13) we've had a subtle bug relating to that: If
+ we supplied options that were only used for "complex queries", but
+ didn't supply the query itself we'd set "opt->extended", but would
+ have a NULL "pattern_expression". As a result these would segfault as
+ we tried to call "free_grep_patterns()" from "release_revisions()":
- Then in f41fb662f57 (revisions API: have release_revisions() release
- "grep_filter", 2022-04-13) this turned into a segfault, as we'd
- attempt to free() the non-existing --grep patterns.
+ git -P log -1 --invert-grep
+ git -P log -1 --all-match
- Arguably it makes more sense to add this check to
- compile_grep_patterns(), since it's possible to use the C API in the
- same way and trigger this segfault. But in practice the revision.c API
- is the only user of "no_body_match", and by placing the check here we
- can more sensibly emit a message that assumes that the user used
- "--invert-grep" without "--grep".
+ The root cause of this is that we were conflating the state management
+ we needed in "compile_grep_patterns()" itself with whether or not we
+ had an "opt->pattern_expression" later on.
+
+ In this cases as we're going through "compile_grep_patterns()" we have
+ no "opt->pattern_list" but have "opt->no_body_match" or
+ "opt->all_match". So we'd set "opt->extended = 1", but not "return" on
+ "opt->extended" as that's an "else if" in the same "if" statement.
+
+ That behavior is intentional and required, as the common case is that
+ we have an "opt->pattern_list" that we're about to parse into the
+ "opt->pattern_expression".
+
+ But we don't need to keep track of this "extended" flag beyond the
+ state management in compile_grep_patterns() itself. It needs it, but
+ once we're out of that function we can rely on
+ "opt->pattern_expression" being non-NULL instead for using these
+ extended patterns.
+
+ As 79d3696cfb4 itself shows we've assumed that there's a one-to-one
+ mapping between the two since the very beginning. I.e. "match_line()"
+ would check "opt->extended" to see if it should call "match_expr()",
+ and the first thing we do in that function is assume that we have a
+ "opt->pattern_expression". We'd then call "match_expr_eval()", which
+ would have died if that "opt->pattern_expression" was NULL.
+
+ The "die" was added in c922b01f54c (grep: fix segfault when "git grep
+ '('" is given, 2009-04-27), and can now be removed as it's now clearly
+ unreachable. We still do the right thing in the case that prompted
+ that fix:
+
+ git grep '('
+ fatal: unmatched parenthesis
+
+ Arguably neither the "--invert-grep" option added in [1] nor the
+ earlier "--all-match" option added in [2] were intended to be used
+ stand-alone, and another approach[3] would be to error out in those
+ cases. But since we've been treating them as a NOOP when given without
+ --grep for a long time let's keep doing that.
+
+ We could also return in "free_pattern_expr()" if the argument is
+ non-NULL, as an alternative fix for this segfault does [4]. That would
+ be more elegant in making the "free_*()" function behave like
+ "free()", but it would also remove a sanity check: The
+ "free_pattern_expr()" function calls itself recursively, and only the
+ top-level is allowed to be NULL, let's not conflate those two
+ conditions.
1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12)
2. 0ab7befa31d (grep --all-match, 2006-09-27)
+ 3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/
+ 4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com
Reported-by: orygaw <orygaw@protonmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
- ## revision.c ##
-@@ revision.c: int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
- die(_("options '%s' and '%s' cannot be used together"), "--no-walk", "--graph");
- if (!revs->reflog_info && revs->grep_filter.use_reflog_filter)
- die(_("the option '%s' requires '%s'"), "--grep-reflog", "--walk-reflogs");
-+ if (!revs->grep_filter.pattern_expression) {
-+ if (revs->grep_filter.no_body_match)
-+ die(_("the option '%s' requires '%s'"), "--invert-grep", "--grep");
-+ if (revs->grep_filter.all_match)
-+ die(_("the option '%s' requires '%s'"), "--all-match", "--grep");
-+ }
+ ## grep.c ##
+@@ grep.c: void compile_grep_patterns(struct grep_opt *opt)
+ {
+ struct grep_pat *p;
+ struct grep_expr *header_expr = prep_header_patterns(opt);
++ int extended = 0;
+
+ for (p = opt->pattern_list; p; p = p->next) {
+ switch (p->token) {
+@@ grep.c: void compile_grep_patterns(struct grep_opt *opt)
+ compile_regexp(p, opt);
+ break;
+ default:
+- opt->extended = 1;
++ extended = 1;
+ break;
+ }
+ }
+
+ if (opt->all_match || opt->no_body_match || header_expr)
+- opt->extended = 1;
+- else if (!opt->extended)
++ extended = 1;
++ else if (!extended)
+ return;
+
+ p = opt->pattern_list;
+@@ grep.c: void free_grep_patterns(struct grep_opt *opt)
+ free(p);
+ }
- if (revs->line_level_traverse &&
- (revs->diffopt.output_format & ~(DIFF_FORMAT_PATCH | DIFF_FORMAT_NO_OUTPUT)))
+- if (!opt->extended)
++ if (!opt->pattern_expression)
+ return;
+ free_pattern_expr(opt->pattern_expression);
+ }
+@@ grep.c: static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x,
+ {
+ int h = 0;
+
+- if (!x)
+- die("Not a valid grep expression");
+ switch (x->node) {
+ case GREP_NODE_TRUE:
+ h = 1;
+@@ grep.c: static int match_line(struct grep_opt *opt,
+ struct grep_pat *p;
+ int hit = 0;
+
+- if (opt->extended)
++ if (opt->pattern_expression)
+ return match_expr(opt, bol, eol, ctx, col, icol,
+ collect_hits);
+
+@@ grep.c: static int should_lookahead(struct grep_opt *opt)
+ {
+ struct grep_pat *p;
+
+- if (opt->extended)
++ if (opt->pattern_expression)
+ return 0; /* punt for too complex stuff */
+ if (opt->invert)
+ return 0;
+
+ ## grep.h ##
+@@ grep.h: struct grep_opt {
+ #define GREP_BINARY_TEXT 2
+ int binary;
+ int allow_textconv;
+- int extended;
+ int use_reflog_filter;
+ int relative;
+ int pathname;
## t/t4202-log.sh ##
@@ t/t4202-log.sh: test_expect_success 'log --grep' '
test_cmp expect actual
'
-+test_expect_success 'log --invert-grep usage' '
-+ test_expect_code 128 git log --invert-grep
-+'
++for noop_opt in --invert-grep --all-match
++do
++ test_expect_success "log $noop_opt without --grep is a NOOP" '
++ git log >expect &&
++ git log $noop_opt >actual &&
++ test_cmp expect actual
++ '
++done
+
cat > expect << EOF
second
initial
-
- ## t/t7810-grep.sh ##
-@@ t/t7810-grep.sh: test_expect_success 'log with multiple --grep uses union' '
- test_cmp expect actual
- '
-
-+test_expect_success 'log --all-match usage' '
-+ test_expect_code 128 git log --all-match
-+'
-+
- test_expect_success 'log --all-match with multiple --grep uses intersection' '
- git log --all-match --grep=i --grep=r --format=%s >actual &&
- {
grep.c | 15 +++++++--------
grep.h | 1 -
t/t4202-log.sh | 9 +++++++++
3 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/grep.c b/grep.c
index 52a894c9890..06eed694936 100644
--- a/grep.c
+++ b/grep.c
@@ -708,6 +708,7 @@ void compile_grep_patterns(struct grep_opt *opt)
{
struct grep_pat *p;
struct grep_expr *header_expr = prep_header_patterns(opt);
+ int extended = 0;
for (p = opt->pattern_list; p; p = p->next) {
switch (p->token) {
@@ -717,14 +718,14 @@ void compile_grep_patterns(struct grep_opt *opt)
compile_regexp(p, opt);
break;
default:
- opt->extended = 1;
+ extended = 1;
break;
}
}
if (opt->all_match || opt->no_body_match || header_expr)
- opt->extended = 1;
- else if (!opt->extended)
+ extended = 1;
+ else if (!extended)
return;
p = opt->pattern_list;
@@ -790,7 +791,7 @@ void free_grep_patterns(struct grep_opt *opt)
free(p);
}
- if (!opt->extended)
+ if (!opt->pattern_expression)
return;
free_pattern_expr(opt->pattern_expression);
}
@@ -971,8 +972,6 @@ static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x,
{
int h = 0;
- if (!x)
- die("Not a valid grep expression");
switch (x->node) {
case GREP_NODE_TRUE:
h = 1;
@@ -1052,7 +1051,7 @@ static int match_line(struct grep_opt *opt,
struct grep_pat *p;
int hit = 0;
- if (opt->extended)
+ if (opt->pattern_expression)
return match_expr(opt, bol, eol, ctx, col, icol,
collect_hits);
@@ -1370,7 +1369,7 @@ static int should_lookahead(struct grep_opt *opt)
{
struct grep_pat *p;
- if (opt->extended)
+ if (opt->pattern_expression)
return 0; /* punt for too complex stuff */
if (opt->invert)
return 0;
diff --git a/grep.h b/grep.h
index bdcadce61b8..6075f997e68 100644
--- a/grep.h
+++ b/grep.h
@@ -151,7 +151,6 @@ struct grep_opt {
#define GREP_BINARY_TEXT 2
int binary;
int allow_textconv;
- int extended;
int use_reflog_filter;
int relative;
int pathname;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index cc15cb4ff62..2ce2b41174d 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -249,6 +249,15 @@ test_expect_success 'log --grep' '
test_cmp expect actual
'
+for noop_opt in --invert-grep --all-match
+do
+ test_expect_success "log $noop_opt without --grep is a NOOP" '
+ git log >expect &&
+ git log $noop_opt >actual &&
+ test_cmp expect actual
+ '
+done
+
cat > expect << EOF
second
initial
--
2.38.0.971.ge79ff6d20e7
next prev parent reply other threads:[~2022-10-11 9:48 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-10 14:33 Git BUG 2.37.3 and 2.38.0 orygaw
2022-10-10 15:40 ` rsbecker
2022-10-10 15:48 ` Taylor Blau
2022-10-10 17:02 ` Junio C Hamano
2022-10-10 16:57 ` [PATCH] log: require --grep for --invert-grep and --all-match, fix segfault Ævar Arnfjörð Bjarmason
2022-10-10 17:34 ` Junio C Hamano
2022-10-10 17:45 ` Junio C Hamano
2022-10-10 18:48 ` Ævar Arnfjörð Bjarmason
2022-10-10 19:00 ` Taylor Blau
2022-10-11 9:48 ` Ævar Arnfjörð Bjarmason [this message]
2022-10-11 15:46 ` [PATCH v2] grep.c: remove "extended" in favor of "pattern_expression", " Junio C Hamano
2022-10-10 17:41 ` [PATCH 0/2] grep: tolerate NULL argument to free_grep_expr() Taylor Blau
2022-10-10 17:41 ` [PATCH 1/2] t4202: demonstrate `git log --invert-grep` segfault Taylor Blau
2022-10-10 17:41 ` [PATCH 2/2] grep.c: tolerate NULL grep_expr in free_pattern_expr() Taylor Blau
2022-10-10 17:54 ` Junio C Hamano
2022-10-10 18:10 ` Taylor Blau
2022-10-10 18:11 ` Junio C Hamano
2022-10-10 18:14 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=patch-v2-1.1-6ad7627706f-20221011T094715Z-avarab@gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=orygaw@protonmail.com \
--cc=rsbecker@nexbridge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).