* [PATCH 1/7] grep: add test script for binary file handling
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
@ 2010-05-13 20:34 ` René Scharfe
2010-05-13 20:36 ` [PATCH 2/7] grep: refactor handling of binary mode options René Scharfe
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:34 UTC (permalink / raw)
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
t/t7008-grep-binary.sh | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)
create mode 100755 t/t7008-grep-binary.sh
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
new file mode 100755
index 0000000..f9fd5e6
--- /dev/null
+++ b/t/t7008-grep-binary.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='git grep in binary files'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' "
+ printf 'binary\000file\n' >a &&
+ git add a &&
+ git commit -m.
+"
+
+test_expect_success 'git grep ina a' '
+ echo Binary file a matches >expect &&
+ git grep ina a >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'git grep -ah ina a' '
+ git grep -ah ina a >actual &&
+ test_cmp a actual
+'
+
+test_expect_success 'git grep -I ina a' '
+ : >expect &&
+ test_must_fail git grep -I ina a >actual &&
+ test_cmp expect actual
+'
+
+test_done
--
1.7.1
* [PATCH 2/7] grep: refactor handling of binary mode options
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
2010-05-13 20:34 ` [PATCH 1/7] grep: add test script for binary file handling René Scharfe
@ 2010-05-13 20:36 ` René Scharfe
2010-05-13 20:37 ` [PATCH 3/7] grep: --count over binary René Scharfe
` (4 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:36 UTC (permalink / raw)
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Turn the switch inside-out and add labels for each possible value
of ->binary. This makes the code easier to read and avoids calling
buffer_is_binary() if the option -a was given.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 20 +++++++++++---------
1 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/grep.c b/grep.c
index 543b1d5..2a8e879 100644
--- a/grep.c
+++ b/grep.c
@@ -800,17 +800,19 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	opt->show_hunk_mark = 1;
 	opt->last_shown = 0;
 
-	if (buffer_is_binary(buf, size)) {
-		switch (opt->binary) {
-		case GREP_BINARY_DEFAULT:
+	switch (opt->binary) {
+	case GREP_BINARY_DEFAULT:
+		if (buffer_is_binary(buf, size))
 			binary_match_only = 1;
-			break;
-		case GREP_BINARY_NOMATCH:
+		break;
+	case GREP_BINARY_NOMATCH:
+		if (buffer_is_binary(buf, size))
 			return 0; /* Assume unmatch */
-			break;
-		default:
-			break;
-		}
+		break;
+	case GREP_BINARY_TEXT:
+		break;
+	default:
+		die("bug: unknown binary handling mode");
 	}
 
 	memset(&xecfg, 0, sizeof(xecfg));
--
1.7.1
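For readers following along, the reshaped control flow can be sketched outside of git roughly as below. This is only an illustration under stated assumptions, not the code from grep.c: the enum names, looks_binary() (a simplified NUL-byte check standing in for buffer_is_binary()), classify() and the probe_calls counter are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the modes and outcomes discussed in the patch. */
enum binary_mode { BINARY_DEFAULT, BINARY_NOMATCH, BINARY_TEXT };
enum verdict { SKIP_FILE, BINARY_MATCH_ONLY, SEARCH_AS_TEXT };

static int probe_calls; /* counts how often the buffer was inspected */

/* Simplified check (an assumption): any NUL byte marks the buffer as binary. */
static int looks_binary(const char *buf, size_t size)
{
	probe_calls++;
	return memchr(buf, '\0', size) != NULL;
}

/* One label per mode; the buffer is only probed where the answer matters. */
static enum verdict classify(enum binary_mode mode, const char *buf, size_t size)
{
	switch (mode) {
	case BINARY_DEFAULT:
		return looks_binary(buf, size) ? BINARY_MATCH_ONLY : SEARCH_AS_TEXT;
	case BINARY_NOMATCH:
		return looks_binary(buf, size) ? SKIP_FILE : SEARCH_AS_TEXT;
	case BINARY_TEXT:
		return SEARCH_AS_TEXT; /* like -a: no probe needed */
	}
	assert(0 && "unknown binary handling mode");
	return SEARCH_AS_TEXT;
}
```

With the switch on the outside, the text mode never touches the buffer, which is the saving the commit message mentions for -a.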
* [PATCH 3/7] grep: --count over binary
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
2010-05-13 20:34 ` [PATCH 1/7] grep: add test script for binary file handling René Scharfe
2010-05-13 20:36 ` [PATCH 2/7] grep: refactor handling of binary mode options René Scharfe
@ 2010-05-13 20:37 ` René Scharfe
2010-05-14 9:34 ` Dmitry Potapov
2010-05-13 20:38 ` [PATCH 4/7] grep: use memmem() for fixed string search René Scharfe
` (3 subsequent siblings)
6 siblings, 1 reply; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:37 UTC (permalink / raw)
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
The intent of showing the message "Binary file xyz matches" for
binary files is to avoid annoying users by potentially messing up
their terminals by printing control characters. In --count mode,
this precaution isn't necessary.
Display counts of matches if -c/--count was specified, even if -a
was not given. GNU grep does the same.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 2 +-
t/t7008-grep-binary.sh | 6 ++++++
2 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/grep.c b/grep.c
index 2a8e879..4b6c02e 100644
--- a/grep.c
+++ b/grep.c
@@ -802,7 +802,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 
 	switch (opt->binary) {
 	case GREP_BINARY_DEFAULT:
-		if (buffer_is_binary(buf, size))
+		if (!opt->count && buffer_is_binary(buf, size))
 			binary_match_only = 1;
 		break;
 	case GREP_BINARY_NOMATCH:
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index f9fd5e6..5449dd9 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -27,4 +27,10 @@ test_expect_success 'git grep -I ina a' '
test_cmp expect actual
'
+test_expect_success 'git grep -c ina a' '
+ echo a:1 >expect &&
+ git grep -c ina a >actual &&
+ test_cmp expect actual
+'
+
test_done
--
1.7.1
* Re: [PATCH 3/7] grep: --count over binary
2010-05-13 20:37 ` [PATCH 3/7] grep: --count over binary René Scharfe
@ 2010-05-14 9:34 ` Dmitry Potapov
2010-05-16 17:45 ` René Scharfe
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Potapov @ 2010-05-14 9:34 UTC (permalink / raw)
To: René Scharfe; +Cc: Git Mailing List, Phil Lawrence, Junio C Hamano
On Thu, May 13, 2010 at 10:37:10PM +0200, René Scharfe wrote:
> The intent of showing the message "Binary file xyz matches" for
> binary files is to avoid annoying users by potentially messing up
> their terminals by printing control characters. In --count mode,
> this precaution isn't necessary.
>
> Display counts of matches if -c/--count was specified, even if -a
> was not given. GNU grep does the same.
It is also not necessary with the '-l' and '-L' options (at least if
we follow GNU grep).
> --- a/grep.c
> +++ b/grep.c
> @@ -802,7 +802,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
>
>  	switch (opt->binary) {
>  	case GREP_BINARY_DEFAULT:
> -		if (buffer_is_binary(buf, size))
> +		if (!opt->count && buffer_is_binary(buf, size))
>  			binary_match_only = 1;
So, I believe it should be:
	if (!opt->count && !opt->name_only && !opt->unmatch_name_only &&
	    buffer_is_binary(buf, size))
Dmitry
* Re: [PATCH 3/7] grep: --count over binary
2010-05-14 9:34 ` Dmitry Potapov
@ 2010-05-16 17:45 ` René Scharfe
0 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-16 17:45 UTC (permalink / raw)
To: Dmitry Potapov; +Cc: Git Mailing List, Phil Lawrence, Junio C Hamano
On 14.05.2010 11:34, Dmitry Potapov wrote:
> On Thu, May 13, 2010 at 10:37:10PM +0200, René Scharfe wrote:
>> The intent of showing the message "Binary file xyz matches" for
>> binary files is to avoid annoying users by potentially messing up
>> their terminals by printing control characters. In --count mode,
>> this precaution isn't necessary.
>>
>> Display counts of matches if -c/--count was specified, even if -a
>> was not given. GNU grep does the same.
>
> It is also not necessary with '-l' and '-L' options. (At least, if
> we follow GNU grep).
Good point. The same is true for -q, too. ->unmatch_name_only (-L)
and ->status_only (-q) are already handled correctly because they are
checked before binary_match_only. We can do the same for ->name_only.
-- >8 --
Subject: grep: --name-only over binary
As with the option -c/--count, git grep with the option -l/--name-only
should work the same with binary files as with text files because
there is no danger of messing up the terminal with control characters
from the contents of matching files. GNU grep does the same.
Move the check for ->name_only before the one for binary_match_only,
thus making the latter irrelevant for git grep -l. Also add a simple
test for each of -l, -L and -q. The latter two options were already
handled before binary_match_only, so no code changes were needed to
make them pass.
Reported-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 8 ++++----
t/t7008-grep-binary.sh | 18 ++++++++++++++++++
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/grep.c b/grep.c
index 2b2f70e..f292e25 100644
--- a/grep.c
+++ b/grep.c
@@ -885,6 +885,10 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 			count++;
 			if (opt->status_only)
 				return 1;
+			if (opt->name_only) {
+				show_name(opt, name);
+				return 1;
+			}
 			if (binary_match_only) {
 				opt->output(opt, "Binary file ", 12);
 				output_color(opt, name, strlen(name),
@@ -892,10 +896,6 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 				opt->output(opt, " matches\n", 9);
 				return 1;
 			}
-			if (opt->name_only) {
-				show_name(opt, name);
-				return 1;
-			}
 			/* Hit at this line. If we haven't shown the
 			 * pre-context lines, we would need to show them.
 			 * When asked to do "count", this still show
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index d8fde18..4f5e74f 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -33,6 +33,24 @@ test_expect_success 'git grep -c ina a' '
test_cmp expect actual
'
+test_expect_success 'git grep -l ina a' '
+ echo a >expect &&
+ git grep -l ina a >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'git grep -L bar a' '
+ echo a >expect &&
+ git grep -L bar a >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'git grep -q ina a' '
+ : >expect &&
+ git grep -q ina a >actual &&
+ test_cmp expect actual
+'
+
test_expect_success 'git grep -F ile a' '
git grep -F ile a
'
--
1.7.1
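The effect of moving the ->name_only check can be illustrated with a small standalone sketch. It is not taken from grep.c; struct hit_flags, enum report and on_hit() are hypothetical names that only mirror the order of the checks after this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags mirroring the options discussed in this thread. */
struct hit_flags {
	int status_only;       /* -q */
	int name_only;         /* -l */
	int binary_match_only; /* set when a binary buffer is searched by default */
};

enum report { REPORT_NOTHING, REPORT_NAME, REPORT_BINARY_NOTICE, REPORT_LINE };

/* Decide what to print for the first hit; earlier checks take precedence. */
static enum report on_hit(const struct hit_flags *f)
{
	assert(f != NULL);
	if (f->status_only)
		return REPORT_NOTHING;
	if (f->name_only)
		return REPORT_NAME; /* now wins over the binary notice */
	if (f->binary_match_only)
		return REPORT_BINARY_NOTICE;
	return REPORT_LINE;
}
```

Because the -q and -l checks come first, binary_match_only only decides the output when neither of them is set.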
* [PATCH 4/7] grep: use memmem() for fixed string search
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
` (2 preceding siblings ...)
2010-05-13 20:37 ` [PATCH 3/7] grep: --count over binary René Scharfe
@ 2010-05-13 20:38 ` René Scharfe
2010-05-13 20:39 ` [PATCH 5/7] grep: continue case insensitive fixed string search after NUL chars René Scharfe
` (2 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:38 UTC (permalink / raw)
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Allow searching beyond NUL characters by using memmem() instead of
strstr().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 16 +++++++++-------
t/t7008-grep-binary.sh | 4 ++++
2 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/grep.c b/grep.c
index 4b6c02e..4633b63 100644
--- a/grep.c
+++ b/grep.c
@@ -329,14 +329,15 @@ static void show_name(struct grep_opt *opt, const char *name)
 	opt->output(opt, opt->null_following_name ? "\0" : "\n", 1);
 }
 
-
-static int fixmatch(const char *pattern, char *line, int ignore_case, regmatch_t *match)
+static int fixmatch(const char *pattern, char *line, char *eol,
+		    int ignore_case, regmatch_t *match)
 {
 	char *hit;
+
 	if (ignore_case)
 		hit = strcasestr(line, pattern);
 	else
-		hit = strstr(line, pattern);
+		hit = memmem(line, eol - line, pattern, strlen(pattern));
 
 	if (!hit) {
 		match->rm_so = match->rm_eo = -1;
@@ -399,7 +400,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 
 again:
 	if (p->fixed)
-		hit = !fixmatch(p->pattern, bol, p->ignore_case, pmatch);
+		hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch);
 	else
 		hit = !regexec(&p->regexp, bol, 1, pmatch, eflags);
 
@@ -725,9 +726,10 @@ static int look_ahead(struct grep_opt *opt,
 		int hit;
 		regmatch_t m;
 
-		if (p->fixed)
-			hit = !fixmatch(p->pattern, bol, p->ignore_case, &m);
-		else {
+		if (p->fixed) {
+			hit = !fixmatch(p->pattern, bol, bol + *left_p,
+					p->ignore_case, &m);
+		} else {
 #ifdef REG_STARTEND
 			m.rm_so = 0;
 			m.rm_eo = *left_p;
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 5449dd9..ad97720 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -33,4 +33,8 @@ test_expect_success 'git grep -c ina a' '
test_cmp expect actual
'
+test_expect_success 'git grep -F ile a' '
+ git grep -F ile a
+'
+
test_done
--
1.7.1
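The difference between the two calls can be seen in isolation with the sketch below. It is a standalone illustration, not part of the patch: find_with_strstr() and find_with_memmem() are hypothetical helpers, and memmem() is assumed to be available as a GNU/BSD extension (hence _GNU_SOURCE and the fallback prototype).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* memmem() is an extension, not ISO C */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fallback prototype in case feature macros were fixed by an earlier include. */
void *memmem(const void *haystack, size_t haystacklen,
	     const void *needle, size_t needlelen);

/* Offset of needle in line, or -1; strstr() stops at the first NUL byte. */
static long find_with_strstr(const char *line, const char *needle)
{
	const char *hit = strstr(line, needle);
	return hit ? (long)(hit - line) : -1;
}

/* Offset of needle in [line, eol), or -1; the length is explicit, NULs are data. */
static long find_with_memmem(const char *line, const char *eol, const char *needle)
{
	const char *hit;

	assert(eol >= line);
	hit = memmem(line, (size_t)(eol - line), needle, strlen(needle));
	return hit ? (long)(hit - line) : -1;
}
```

For the test file's contents, "binary\0file\n", the strstr() variant only sees "binary", while the memmem() variant searches all twelve bytes.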
* [PATCH 5/7] grep: continue case insensitive fixed string search after NUL chars
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
` (3 preceding siblings ...)
2010-05-13 20:38 ` [PATCH 4/7] grep: use memmem() for fixed string search René Scharfe
@ 2010-05-13 20:39 ` René Scharfe
2010-05-13 20:40 ` [PATCH 6/7] grep: add regmatch(), a wrapper for REG_STARTEND handling René Scharfe
2010-05-13 20:41 ` [PATCH 7/7] grep: use regmatch() for line matching René Scharfe
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:39 UTC (permalink / raw)
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Functions for C strings, like strcasestr(), can't see beyond NUL
characters. Check if there is such an obstacle on the line and search
again after it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 12 +++++++++---
t/t7008-grep-binary.sh | 4 ++++
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/grep.c b/grep.c
index 4633b63..20a02a2 100644
--- a/grep.c
+++ b/grep.c
@@ -334,9 +334,15 @@ static int fixmatch(const char *pattern, char *line, char *eol,
 {
 	char *hit;
 
-	if (ignore_case)
-		hit = strcasestr(line, pattern);
-	else
+	if (ignore_case) {
+		char *s = line;
+		do {
+			hit = strcasestr(s, pattern);
+			if (hit)
+				break;
+			s += strlen(s) + 1;
+		} while (s < eol);
+	} else
 		hit = memmem(line, eol - line, pattern, strlen(pattern));
 
 	if (!hit) {
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index ad97720..1143903 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -37,4 +37,8 @@ test_expect_success 'git grep -F ile a' '
git grep -F ile a
'
+test_expect_success 'git grep -Fi iLE a' '
+ git grep -Fi iLE a
+'
+
test_done
--
1.7.1
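The retry loop can be tried outside of git with the sketch below. It is an illustration only: find_nocase() is a hypothetical helper, strcasestr() is assumed to be available as a GNU/BSD extension, and the sketch additionally assumes that eol points at a terminating NUL so that no search runs past the end of the buffer.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* strcasestr() is an extension, not ISO C */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fallback prototype in case feature macros were fixed by an earlier include. */
char *strcasestr(const char *haystack, const char *needle);

/*
 * Offset of the first case-insensitive match of pattern in [line, eol),
 * or -1. Each NUL-terminated chunk is searched in turn.
 */
static long find_nocase(const char *line, const char *eol, const char *pattern)
{
	const char *s = line;

	assert(eol >= line && *eol == '\0'); /* assumption of this sketch */
	do {
		const char *hit = strcasestr(s, pattern);
		if (hit)
			return (long)(hit - line);
		s += strlen(s) + 1; /* step over the chunk and its NUL */
	} while (s < eol);
	return -1;
}
```

Each pass costs one strcasestr() and one strlen() over the current chunk, so a line without NUL bytes is still searched only once.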
* [PATCH 6/7] grep: add regmatch(), a wrapper for REG_STARTEND handling
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
` (4 preceding siblings ...)
2010-05-13 20:39 ` [PATCH 5/7] grep: continue case insensitive fixed string search after NUL chars René Scharfe
@ 2010-05-13 20:40 ` René Scharfe
2010-05-13 20:41 ` [PATCH 7/7] grep: use regmatch() for line matching René Scharfe
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:40 UTC (permalink / raw)
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Refactor REG_STARTEND handling into a new helper, regmatch().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 22 +++++++++++++---------
1 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/grep.c b/grep.c
index 31c0e38..5be72cf 100644
--- a/grep.c
+++ b/grep.c
@@ -359,6 +359,17 @@ static int fixmatch(const char *pattern, char *line, char *eol,
 	}
 }
 
+static int regmatch(const regex_t *preg, char *line, char *eol,
+		    regmatch_t *match, int eflags)
+{
+#ifdef REG_STARTEND
+	match->rm_so = 0;
+	match->rm_eo = eol - line;
+	eflags |= REG_STARTEND;
+#endif
+	return regexec(preg, line, 1, match, eflags);
+}
+
 static int strip_timestamp(char *bol, char **eol_p)
 {
 	char *eol = *eol_p;
@@ -738,15 +749,8 @@ static int look_ahead(struct grep_opt *opt,
 		if (p->fixed) {
 			hit = !fixmatch(p->pattern, bol, bol + *left_p,
 					p->ignore_case, &m);
-		} else {
-#ifdef REG_STARTEND
-			m.rm_so = 0;
-			m.rm_eo = *left_p;
-			hit = !regexec(&p->regexp, bol, 1, &m, REG_STARTEND);
-#else
-			hit = !regexec(&p->regexp, bol, 1, &m, 0);
-#endif
-		}
+		} else
+			hit = !regmatch(&p->regexp, bol, bol + *left_p, &m, 0);
 		if (!hit || m.rm_so < 0 || m.rm_eo < 0)
 			continue;
 		if (earliest < 0 || m.rm_so < earliest)
--
1.7.1
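A standalone version of the wrapper, together with a small caller, might look like the sketch below. The wrapper follows the patch closely (with const added to the pointers); match_offset() is a hypothetical helper added only to show how it is called. Where REG_STARTEND is not defined, the sketch falls back to plain regexec() and the search ends at the first NUL.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Run regexec() on [line, eol); the range is honored only with REG_STARTEND. */
static int regmatch(const regex_t *preg, const char *line, const char *eol,
		    regmatch_t *match, int eflags)
{
	assert(eol >= line);
#ifdef REG_STARTEND
	match->rm_so = 0;
	match->rm_eo = eol - line;
	eflags |= REG_STARTEND;
#endif
	return regexec(preg, line, 1, match, eflags);
}

/* Hypothetical helper: offset of the first match in [line, eol), or -1. */
static long match_offset(const char *pattern, const char *line, const char *eol)
{
	regex_t re;
	regmatch_t m;
	long off = -1;

	if (regcomp(&re, pattern, 0))
		return -1; /* pattern did not compile */
	if (!regmatch(&re, line, eol, &m, 0))
		off = (long)m.rm_so;
	regfree(&re);
	return off;
}
```

Keeping the #ifdef inside one helper means callers pass the end of the range unconditionally and do not need to know whether the platform supports the flag.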
* [PATCH 7/7] grep: use regmatch() for line matching
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
` (5 preceding siblings ...)
2010-05-13 20:40 ` [PATCH 6/7] grep: add regmatch(), a wrapper for REG_STARTEND handling René Scharfe
@ 2010-05-13 20:41 ` René Scharfe
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:41 UTC (permalink / raw)
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Use regmatch() in match_one_pattern(), allowing regex matching
beyond NUL characters if regexec() supports the flag REG_STARTEND.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 2 +-
t/t7008-grep-binary.sh | 10 ++++++++++
2 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/grep.c b/grep.c
index 5be72cf..9dd2471 100644
--- a/grep.c
+++ b/grep.c
@@ -422,7 +422,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 	if (p->fixed)
 		hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch);
 	else
-		hit = !regexec(&p->regexp, bol, 1, pmatch, eflags);
+		hit = !regmatch(&p->regexp, bol, eol, pmatch, eflags);
 
 	if (hit && p->word_regexp) {
 		if ((pmatch[0].rm_so < 0) ||
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 1143903..d8fde18 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -41,4 +41,14 @@ test_expect_success 'git grep -Fi iLE a' '
git grep -Fi iLE a
'
+# This test actually passes on platforms where regexec() supports the
+# flag REG_STARTEND.
+test_expect_failure 'git grep ile a' '
+ git grep ile a
+'
+
+test_expect_failure 'git grep .fi a' '
+ git grep .fi a
+'
+
test_done
--
1.7.1