* [RFC/PATCH] Fix "grep -w"
@ 2006-08-05 5:16 Junio C Hamano
2006-08-05 19:19 ` Morten Welinder
0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2006-08-05 5:16 UTC (permalink / raw)
To: git
We used to find the first match of the pattern and then if the
match is not for the entire word, declared that the whole line
does not match.
But that is wrong. The command "git grep -w -e mmap" should
find that a line "foo_mmap bar mmap baz" matches, by tring the
second instance of pattern "mmap" on the same line.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
builtin-grep.c | 10 ++++++++++
t/t7002-grep.sh | 45 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/builtin-grep.c b/builtin-grep.c
index 69b7c48..b5feda4 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -412,6 +412,7 @@ static int match_one_pattern(struct grep
int hit = 0;
regmatch_t pmatch[10];
+ again:
if (!opt->fixed) {
regex_t *exp = &p->regexp;
hit = !regexec(exp, bol, ARRAY_SIZE(pmatch),
@@ -438,6 +439,15 @@ static int match_one_pattern(struct grep
if (pmatch[0].rm_eo != (eol-bol) &&
word_char(bol[pmatch[0].rm_eo]))
hit = 0;
+
+ if (!hit && pmatch[0].rm_eo + bol < eol) {
+ /* there could be more than one match on the
+ * line, and the first match might not be
+ * strict word match. But later ones could be!
+ */
+ bol += pmatch[0].rm_eo;
+ goto again;
+ }
}
return hit;
}
diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
new file mode 100755
index 0000000..0a0e302
--- /dev/null
+++ b/t/t7002-grep.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+#
+# Copyright (c) 2006 Junio C Hamano
+#
+
+test_description='git grep -w
+'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+ {
+ echo foo mmap bar
+ echo foo_mmap bar
+ echo foo_mmap bar mmap
+ echo foo mmap bar_mmap
+ echo foo_mmap bar mmap baz
+ } >file &&
+ git add file &&
+ git commit -m initial
+'
+
+test_expect_success 'grep -w HEAD' '
+ git grep -n -w -e mmap HEAD >actual &&
+ {
+ echo HEAD:file:1:foo mmap bar
+ echo HEAD:file:3:foo_mmap bar mmap
+ echo HEAD:file:4:foo mmap bar_mmap
+ echo HEAD:file:5:foo_mmap bar mmap baz
+ } >expected &&
+ diff expected actual
+'
+
+test_expect_success 'grep -w in working tree' '
+ git grep -n -w -e mmap >actual &&
+ {
+ echo file:1:foo mmap bar
+ echo file:3:foo_mmap bar mmap
+ echo file:4:foo mmap bar_mmap
+ echo file:5:foo_mmap bar mmap baz
+ } >expected &&
+ diff expected actual
+'
+
+test_done
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC/PATCH] Fix "grep -w"
2006-08-05 5:16 [RFC/PATCH] Fix "grep -w" Junio C Hamano
@ 2006-08-05 19:19 ` Morten Welinder
2006-08-05 21:08 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: Morten Welinder @ 2006-08-05 19:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
1. Are you sure that going to the end of the first match is correct?
It seems to
me that this will skip matches. Say you search for ".*" on a line that reads
" xxx".
2. What about "^"?
3. What about empty matches? That could take a while...
Morten
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC/PATCH] Fix "grep -w"
2006-08-05 19:19 ` Morten Welinder
@ 2006-08-05 21:08 ` Junio C Hamano
2006-08-06 8:39 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2006-08-05 21:08 UTC (permalink / raw)
To: Morten Welinder; +Cc: git
"Morten Welinder" <mwelinder@gmail.com> writes:
> 1. Are you sure that going to the end of the first match is correct?
> It seems to me that this will skip matches. Say you search
> for ".*" on a line that reads
> " xxx".
It is fine for your example, I think. .* matches the entire
line the first time, and BOL and EOL are defined to be word
boundaries. But you are right. If the pattern is "x xx* x" and
the line is "x x xx x", the first round would match the first 5
bytes, we find that 6th byte 'x' makes it not a word boundary,
and redoing the match starting at 6th is a wrong thing to do.
We should find "x xx x" starting at the 3rd byte.
> 2. What about "^"?
The pattern would not match the second time anyway, so I do not
think it is such a big deal.
But there is another bug I just spotted. git grep -w -e '^x'
matches line "xxx" (when not cheating with external grep).
> 3. What about empty matches? That could take a while...
True. So we would need to make sure we advance at least one.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC/PATCH] Fix "grep -w"
2006-08-05 21:08 ` Junio C Hamano
@ 2006-08-06 8:39 ` Junio C Hamano
0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2006-08-06 8:39 UTC (permalink / raw)
To: git; +Cc: Morten Welinder
We used to find the first match of the pattern and then if the
match is not for the entire word, declared that the whole line
does not match.
But that is wrong. The command "git grep -w -e mmap" should
find that a line "foo_mmap bar mmap baz" matches, by tring the
second instance of pattern "mmap" on the same line.
Problems an earlier round of "fix" had were pointed out by Morten
Welinder, which have been incorporated in the t7002 tests.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
* Second try.
builtin-grep.c | 35 ++++++++++++++++-------
t/t7002-grep.sh | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 110 insertions(+), 10 deletions(-)
diff --git a/builtin-grep.c b/builtin-grep.c
index 69b7c48..93b7e07 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -410,8 +410,10 @@ static int fixmatch(const char *pattern,
static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol, char *eol)
{
int hit = 0;
+ int at_true_bol = 1;
regmatch_t pmatch[10];
+ again:
if (!opt->fixed) {
regex_t *exp = &p->regexp;
hit = !regexec(exp, bol, ARRAY_SIZE(pmatch),
@@ -422,22 +424,35 @@ static int match_one_pattern(struct grep
}
if (hit && opt->word_regexp) {
- /* Match beginning must be either
- * beginning of the line, or at word
- * boundary (i.e. the last char must
- * not be alnum or underscore).
- */
if ((pmatch[0].rm_so < 0) ||
(eol - bol) <= pmatch[0].rm_so ||
(pmatch[0].rm_eo < 0) ||
(eol - bol) < pmatch[0].rm_eo)
die("regexp returned nonsense");
- if (pmatch[0].rm_so != 0 &&
- word_char(bol[pmatch[0].rm_so-1]))
- hit = 0;
- if (pmatch[0].rm_eo != (eol-bol) &&
- word_char(bol[pmatch[0].rm_eo]))
+
+ /* Match beginning must be either beginning of the
+ * line, or at word boundary (i.e. the last char must
+ * not be a word char). Similarly, match end must be
+ * either end of the line, or at word boundary
+ * (i.e. the next char must not be a word char).
+ */
+ if ( ((pmatch[0].rm_so == 0 && at_true_bol) ||
+ !word_char(bol[pmatch[0].rm_so-1])) &&
+ ((pmatch[0].rm_eo == (eol-bol)) ||
+ !word_char(bol[pmatch[0].rm_eo])) )
+ ;
+ else
hit = 0;
+
+ if (!hit && pmatch[0].rm_so + bol + 1 < eol) {
+ /* There could be more than one match on the
+ * line, and the first match might not be
+ * strict word match. But later ones could be!
+ */
+ bol = pmatch[0].rm_so + bol + 1;
+ at_true_bol = 0;
+ goto again;
+ }
}
return hit;
}
diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
new file mode 100755
index 0000000..00a7d76
--- /dev/null
+++ b/t/t7002-grep.sh
@@ -0,0 +1,85 @@
+#!/bin/sh
+#
+# Copyright (c) 2006 Junio C Hamano
+#
+
+test_description='git grep -w
+'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+ {
+ echo foo mmap bar
+ echo foo_mmap bar
+ echo foo_mmap bar mmap
+ echo foo mmap bar_mmap
+ echo foo_mmap bar mmap baz
+ } >file &&
+ echo x x xx x >x &&
+ echo y yy >y &&
+ echo zzz > z &&
+ git add file x y z &&
+ git commit -m initial
+'
+
+for H in HEAD ''
+do
+ case "$H" in
+ HEAD) HC='HEAD:' L='HEAD' ;;
+ '') HC= L='in working tree' ;;
+ esac
+
+ test_expect_success "grep -w $L" '
+ {
+ echo ${HC}file:1:foo mmap bar
+ echo ${HC}file:3:foo_mmap bar mmap
+ echo ${HC}file:4:foo mmap bar_mmap
+ echo ${HC}file:5:foo_mmap bar mmap baz
+ } >expected &&
+ git grep -n -w -e mmap $H >actual &&
+ diff expected actual
+ '
+
+ test_expect_success "grep -w $L (x)" '
+ {
+ echo ${HC}x:1:x x xx x
+ } >expected &&
+ git grep -n -w -e "x xx* x" $H >actual &&
+ diff expected actual
+ '
+
+ test_expect_success "grep -w $L (y-1)" '
+ {
+ echo ${HC}y:1:y yy
+ } >expected &&
+ git grep -n -w -e "^y" $H >actual &&
+ diff expected actual
+ '
+
+ test_expect_success "grep -w $L (y-2)" '
+ : >expected &&
+ if git grep -n -w -e "^y y" $H >actual
+ then
+ echo should not have matched
+ cat actual
+ false
+ else
+ diff expected actual
+ fi
+ '
+
+ test_expect_success "grep -w $L (z)" '
+ : >expected &&
+ if git grep -n -w -e "^z" $H >actual
+ then
+ echo should not have matched
+ cat actual
+ false
+ else
+ diff expected actual
+ fi
+ '
+done
+
+test_done
--
1.4.2.rc3.g19a8a
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2006-08-06 8:39 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-05 5:16 [RFC/PATCH] Fix "grep -w" Junio C Hamano
2006-08-05 19:19 ` Morten Welinder
2006-08-05 21:08 ` Junio C Hamano
2006-08-06 8:39 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).