* [PATCH 0/7] grep: better support for binary files
@ 2010-05-13 20:33 René Scharfe
2010-05-13 20:34 ` [PATCH 1/7] grep: add test script for binary file handling René Scharfe
` (6 more replies)
0 siblings, 7 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:33 UTC
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
This series improves git grep's support for binary files. It picks the
low-hanging fruit: at the end you can search _in_ files that contain
NUL characters, but you can't yet search _for_ NULs.
[PATCH 1/7] grep: add test script for binary file handling
This patch adds a simple test script documenting what git grep
can do with binary files.
[PATCH 2/7] grep: refactor handling of binary mode options
[PATCH 3/7] grep: --count over binary
These two patches make git grep handle counting in binary files like
GNU grep does.
[PATCH 4/7] grep: use memmem() for fixed string search
[PATCH 5/7] grep: continue case insensitive fixed string search after NUL chars
These two patches make git grep -F work on binary files.
[PATCH 6/7] grep: add regmatch(), a wrapper for REG_STARTEND handling
[PATCH 7/7] grep: use regmatch() for line matching
The final two patches make regex searches work on binary files if
the platform's regexec() supports the flag REG_STARTEND. Our own
version in compat/ doesn't, unfortunately.
grep.c | 70 ++++++++++++++++++++++++++++-------------------
t/t7008-grep-binary.sh | 54 +++++++++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+), 28 deletions(-)
* [PATCH 1/7] grep: add test script for binary file handling
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
@ 2010-05-13 20:34 ` René Scharfe
2010-05-13 20:36 ` [PATCH 2/7] grep: refactor handling of binary mode options René Scharfe
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:34 UTC
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
t/t7008-grep-binary.sh | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)
create mode 100755 t/t7008-grep-binary.sh
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
new file mode 100755
index 0000000..f9fd5e6
--- /dev/null
+++ b/t/t7008-grep-binary.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='git grep in binary files'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' "
+ printf 'binary\000file\n' >a &&
+ git add a &&
+ git commit -m.
+"
+
+test_expect_success 'git grep ina a' '
+ echo Binary file a matches >expect &&
+ git grep ina a >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'git grep -ah ina a' '
+ git grep -ah ina a >actual &&
+ test_cmp a actual
+'
+
+test_expect_success 'git grep -I ina a' '
+ : >expect &&
+ test_must_fail git grep -I ina a >actual &&
+ test_cmp expect actual
+'
+
+test_done
--
1.7.1
* [PATCH 2/7] grep: refactor handling of binary mode options
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
2010-05-13 20:34 ` [PATCH 1/7] grep: add test script for binary file handling René Scharfe
@ 2010-05-13 20:36 ` René Scharfe
2010-05-13 20:37 ` [PATCH 3/7] grep: --count over binary René Scharfe
` (4 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:36 UTC
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Turn the switch inside-out and add labels for each possible value
of ->binary. This makes the code easier to read and avoids calling
buffer_is_binary() if the option -a was given.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 20 +++++++++++---------
1 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/grep.c b/grep.c
index 543b1d5..2a8e879 100644
--- a/grep.c
+++ b/grep.c
@@ -800,17 +800,19 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
opt->show_hunk_mark = 1;
opt->last_shown = 0;
- if (buffer_is_binary(buf, size)) {
- switch (opt->binary) {
- case GREP_BINARY_DEFAULT:
+ switch (opt->binary) {
+ case GREP_BINARY_DEFAULT:
+ if (buffer_is_binary(buf, size))
binary_match_only = 1;
- break;
- case GREP_BINARY_NOMATCH:
+ break;
+ case GREP_BINARY_NOMATCH:
+ if (buffer_is_binary(buf, size))
return 0; /* Assume unmatch */
- break;
- default:
- break;
- }
+ break;
+ case GREP_BINARY_TEXT:
+ break;
+ default:
+ die("bug: unknown binary handling mode");
}
memset(&xecfg, 0, sizeof(xecfg));
--
1.7.1
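The inside-out switch from this patch can be sketched in isolation as follows. This is only an illustration: the enum names, classify() and looks_binary() are hypothetical stand-ins, and the NUL-byte check is a simplified assumption about what buffer_is_binary() does. The point it shows is that the -a mode never inspects the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the GREP_BINARY_* values used in the patch. */
enum binary_mode { BINARY_DEFAULT, BINARY_NOMATCH, BINARY_TEXT };

/* What the caller should do with the buffer. */
enum binary_action { SEARCH_AS_TEXT, REPORT_MATCH_ONLY, SKIP_BUFFER, BAD_MODE };

/* Simplified heuristic (assumption): any NUL byte marks the buffer as binary. */
static int looks_binary(const char *buf, size_t size)
{
	assert(buf != NULL);
	return memchr(buf, '\0', size) != NULL;
}

/* Switch on the mode first; only modes that care look at the content. */
static enum binary_action classify(enum binary_mode mode,
				   const char *buf, size_t size)
{
	switch (mode) {
	case BINARY_DEFAULT:
		return looks_binary(buf, size) ? REPORT_MATCH_ONLY : SEARCH_AS_TEXT;
	case BINARY_NOMATCH:
		return looks_binary(buf, size) ? SKIP_BUFFER : SEARCH_AS_TEXT;
	case BINARY_TEXT:
		/* -a: treat everything as text without scanning the buffer. */
		return SEARCH_AS_TEXT;
	}
	return BAD_MODE; /* the patch calls die() for unknown modes */
}
```

Calling classify(BINARY_TEXT, buf, size) returns SEARCH_AS_TEXT for any buffer, which mirrors the saved buffer_is_binary() call mentioned in the commit message.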
* [PATCH 3/7] grep: --count over binary
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
2010-05-13 20:34 ` [PATCH 1/7] grep: add test script for binary file handling René Scharfe
2010-05-13 20:36 ` [PATCH 2/7] grep: refactor handling of binary mode options René Scharfe
@ 2010-05-13 20:37 ` René Scharfe
2010-05-14 9:34 ` Dmitry Potapov
2010-05-13 20:38 ` [PATCH 4/7] grep: use memmem() for fixed string search René Scharfe
` (3 subsequent siblings)
6 siblings, 1 reply; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:37 UTC
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
The intent of showing the message "Binary file xyz matches" for
binary files is to avoid annoying users by potentially messing up
their terminals by printing control characters. In --count mode,
this precaution isn't necessary.
Display counts of matches if -c/--count was specified, even if -a
was not given. GNU grep does the same.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 2 +-
t/t7008-grep-binary.sh | 6 ++++++
2 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/grep.c b/grep.c
index 2a8e879..4b6c02e 100644
--- a/grep.c
+++ b/grep.c
@@ -802,7 +802,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
switch (opt->binary) {
case GREP_BINARY_DEFAULT:
- if (buffer_is_binary(buf, size))
+ if (!opt->count && buffer_is_binary(buf, size))
binary_match_only = 1;
break;
case GREP_BINARY_NOMATCH:
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index f9fd5e6..5449dd9 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -27,4 +27,10 @@ test_expect_success 'git grep -I ina a' '
test_cmp expect actual
'
+test_expect_success 'git grep -c ina a' '
+ echo a:1 >expect &&
+ git grep -c ina a >actual &&
+ test_cmp expect actual
+'
+
test_done
--
1.7.1
* [PATCH 4/7] grep: use memmem() for fixed string search
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
` (2 preceding siblings ...)
2010-05-13 20:37 ` [PATCH 3/7] grep: --count over binary René Scharfe
@ 2010-05-13 20:38 ` René Scharfe
2010-05-13 20:39 ` [PATCH 5/7] grep: continue case insensitive fixed string search after NUL chars René Scharfe
` (2 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:38 UTC
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Allow searching beyond NUL characters by using memmem() instead of
strstr().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 16 +++++++++-------
t/t7008-grep-binary.sh | 4 ++++
2 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/grep.c b/grep.c
index 4b6c02e..4633b63 100644
--- a/grep.c
+++ b/grep.c
@@ -329,14 +329,15 @@ static void show_name(struct grep_opt *opt, const char *name)
opt->output(opt, opt->null_following_name ? "\0" : "\n", 1);
}
-
-static int fixmatch(const char *pattern, char *line, int ignore_case, regmatch_t *match)
+static int fixmatch(const char *pattern, char *line, char *eol,
+ int ignore_case, regmatch_t *match)
{
char *hit;
+
if (ignore_case)
hit = strcasestr(line, pattern);
else
- hit = strstr(line, pattern);
+ hit = memmem(line, eol - line, pattern, strlen(pattern));
if (!hit) {
match->rm_so = match->rm_eo = -1;
@@ -399,7 +400,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
again:
if (p->fixed)
- hit = !fixmatch(p->pattern, bol, p->ignore_case, pmatch);
+ hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch);
else
hit = !regexec(&p->regexp, bol, 1, pmatch, eflags);
@@ -725,9 +726,10 @@ static int look_ahead(struct grep_opt *opt,
int hit;
regmatch_t m;
- if (p->fixed)
- hit = !fixmatch(p->pattern, bol, p->ignore_case, &m);
- else {
+ if (p->fixed) {
+ hit = !fixmatch(p->pattern, bol, bol + *left_p,
+ p->ignore_case, &m);
+ } else {
#ifdef REG_STARTEND
m.rm_so = 0;
m.rm_eo = *left_p;
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 5449dd9..ad97720 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -33,4 +33,8 @@ test_expect_success 'git grep -c ina a' '
test_cmp expect actual
'
+test_expect_success 'git grep -F ile a' '
+ git grep -F ile a
+'
+
test_done
--
1.7.1
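The difference between the two calls in this patch can be seen outside of git with a small sketch. The helper names find_with_strstr() and find_with_memmem() are hypothetical; memmem() itself is an extension rather than ISO C, which is why the sketch assumes a libc that exposes it under _GNU_SOURCE.

```c
#define _GNU_SOURCE /* assumption: libc declares memmem() under this macro */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset of needle in buf, or -1. strstr() stops at the first NUL. */
static long find_with_strstr(const char *buf, const char *needle)
{
	const char *hit;

	assert(needle != NULL);
	hit = strstr(buf, needle);
	return hit ? (long)(hit - buf) : -1;
}

/* Offset of needle in the first len bytes of buf, or -1. NULs are data. */
static long find_with_memmem(const char *buf, size_t len, const char *needle)
{
	const char *hit;

	assert(needle != NULL);
	hit = memmem(buf, len, needle, strlen(needle));
	return hit ? (long)(hit - buf) : -1;
}
```

For the 12-byte test content "binary\0file\n", the strstr() variant cannot find "ile" because the search ends at the NUL after "binary", while the memmem() variant is given the explicit length and finds it in "file".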
* [PATCH 5/7] grep: continue case insensitive fixed string search after NUL chars
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
` (3 preceding siblings ...)
2010-05-13 20:38 ` [PATCH 4/7] grep: use memmem() for fixed string search René Scharfe
@ 2010-05-13 20:39 ` René Scharfe
2010-05-13 20:40 ` [PATCH 6/7] grep: add regmatch(), a wrapper for REG_STARTEND handling René Scharfe
2010-05-13 20:41 ` [PATCH 7/7] grep: use regmatch() for line matching René Scharfe
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:39 UTC
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Functions for C strings, like strcasestr(), can't see beyond NUL
characters. If the line contains one, resume the search right
after it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 12 +++++++++---
t/t7008-grep-binary.sh | 4 ++++
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/grep.c b/grep.c
index 4633b63..20a02a2 100644
--- a/grep.c
+++ b/grep.c
@@ -334,9 +334,15 @@ static int fixmatch(const char *pattern, char *line, char *eol,
{
char *hit;
- if (ignore_case)
- hit = strcasestr(line, pattern);
- else
+ if (ignore_case) {
+ char *s = line;
+ do {
+ hit = strcasestr(s, pattern);
+ if (hit)
+ break;
+ s += strlen(s) + 1;
+ } while (s < eol);
+ } else
hit = memmem(line, eol - line, pattern, strlen(pattern));
if (!hit) {
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index ad97720..1143903 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -37,4 +37,8 @@ test_expect_success 'git grep -F ile a' '
git grep -F ile a
'
+test_expect_success 'git grep -Fi iLE a' '
+ git grep -Fi iLE a
+'
+
test_done
--
1.7.1
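The retry-after-NUL loop can be tried on its own with the sketch below. find_nocase() is a hypothetical helper, not the patch's fixmatch(), and it adds one assumption the patch does not spell out: the byte at eol must be a NUL, so that no strcasestr() call can run past the end of the line.

```c
#define _GNU_SOURCE /* assumption: libc declares strcasestr() under this macro */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Case-insensitive search for needle in [line, eol).
 * Returns the offset of the first hit, or -1.
 * Assumption for this sketch: *eol is readable and is '\0'.
 */
static long find_nocase(const char *line, const char *eol, const char *needle)
{
	const char *s = line;

	assert(*eol == '\0');
	while (s < eol) {
		const char *hit = strcasestr(s, needle);
		if (hit)
			return (long)(hit - line);
		/* Nothing in this NUL-terminated segment: skip it and its NUL. */
		s += strlen(s) + 1;
	}
	return -1;
}
```

Each pass searches one NUL-terminated segment; stepping by strlen(s) + 1 lands on the first byte after the NUL, which is the "try again" step described in the commit message.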
* [PATCH 6/7] grep: add regmatch(), a wrapper for REG_STARTEND handling
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
` (4 preceding siblings ...)
2010-05-13 20:39 ` [PATCH 5/7] grep: continue case insensitive fixed string search after NUL chars René Scharfe
@ 2010-05-13 20:40 ` René Scharfe
2010-05-13 20:41 ` [PATCH 7/7] grep: use regmatch() for line matching René Scharfe
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:40 UTC
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Refactor REG_STARTEND handling into a new helper, regmatch().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 22 +++++++++++++---------
1 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/grep.c b/grep.c
index 31c0e38..5be72cf 100644
--- a/grep.c
+++ b/grep.c
@@ -359,6 +359,17 @@ static int fixmatch(const char *pattern, char *line, char *eol,
}
}
+static int regmatch(const regex_t *preg, char *line, char *eol,
+ regmatch_t *match, int eflags)
+{
+#ifdef REG_STARTEND
+ match->rm_so = 0;
+ match->rm_eo = eol - line;
+ eflags |= REG_STARTEND;
+#endif
+ return regexec(preg, line, 1, match, eflags);
+}
+
static int strip_timestamp(char *bol, char **eol_p)
{
char *eol = *eol_p;
@@ -738,15 +749,8 @@ static int look_ahead(struct grep_opt *opt,
if (p->fixed) {
hit = !fixmatch(p->pattern, bol, bol + *left_p,
p->ignore_case, &m);
- } else {
-#ifdef REG_STARTEND
- m.rm_so = 0;
- m.rm_eo = *left_p;
- hit = !regexec(&p->regexp, bol, 1, &m, REG_STARTEND);
-#else
- hit = !regexec(&p->regexp, bol, 1, &m, 0);
-#endif
- }
+ } else
+ hit = !regmatch(&p->regexp, bol, bol + *left_p, &m, 0);
if (!hit || m.rm_so < 0 || m.rm_eo < 0)
continue;
if (earliest < 0 || m.rm_so < earliest)
--
1.7.1
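A standalone sketch of the same REG_STARTEND pattern follows. regex_find() is a hypothetical helper that compiles and frees the pattern itself, which the patch's regmatch() does not do. On platforms without REG_STARTEND it falls back to plain regexec(), which stops at the first NUL.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Offset of the first match of pattern in buf[0..len), or -1 if there is
 * no match or the pattern does not compile. buf must be NUL-terminated so
 * that the fallback path without REG_STARTEND stays in bounds.
 */
static long regex_find(const char *pattern, const char *buf, size_t len)
{
	regex_t re;
	regmatch_t m;
	int eflags = 0;
	long result = -1;

	assert(pattern != NULL && buf != NULL);
	if (regcomp(&re, pattern, 0))
		return -1;
#ifdef REG_STARTEND
	/* Pass the search range explicitly instead of relying on a NUL. */
	m.rm_so = 0;
	m.rm_eo = (regoff_t)len;
	eflags |= REG_STARTEND;
#else
	(void)len;
#endif
	if (!regexec(&re, buf, 1, &m, eflags))
		result = (long)m.rm_so;
	regfree(&re);
	return result;
}
```

With the test content "binary\0file\n", a search for "ina" succeeds either way, while a search for "ile" can only succeed when the platform's regexec() honors REG_STARTEND.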
* [PATCH 7/7] grep: use regmatch() for line matching
2010-05-13 20:33 [PATCH 0/7] grep: better support for binary files René Scharfe
` (5 preceding siblings ...)
2010-05-13 20:40 ` [PATCH 6/7] grep: add regmatch(), a wrapper for REG_STARTEND handling René Scharfe
@ 2010-05-13 20:41 ` René Scharfe
6 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-13 20:41 UTC
To: Git Mailing List; +Cc: Phil Lawrence, Junio C Hamano
Use regmatch() in match_one_pattern(), allowing regex matching
beyond NUL characters if regexec() supports the flag REG_STARTEND.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 2 +-
t/t7008-grep-binary.sh | 10 ++++++++++
2 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/grep.c b/grep.c
index 5be72cf..9dd2471 100644
--- a/grep.c
+++ b/grep.c
@@ -422,7 +422,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
if (p->fixed)
hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch);
else
- hit = !regexec(&p->regexp, bol, 1, pmatch, eflags);
+ hit = !regmatch(&p->regexp, bol, eol, pmatch, eflags);
if (hit && p->word_regexp) {
if ((pmatch[0].rm_so < 0) ||
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 1143903..d8fde18 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -41,4 +41,14 @@ test_expect_success 'git grep -Fi iLE a' '
git grep -Fi iLE a
'
+# This test actually passes on platforms where regexec() supports the
+# flag REG_STARTEND.
+test_expect_failure 'git grep ile a' '
+ git grep ile a
+'
+
+test_expect_failure 'git grep .fi a' '
+ git grep .fi a
+'
+
test_done
--
1.7.1
* Re: [PATCH 3/7] grep: --count over binary
2010-05-13 20:37 ` [PATCH 3/7] grep: --count over binary René Scharfe
@ 2010-05-14 9:34 ` Dmitry Potapov
2010-05-16 17:45 ` René Scharfe
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Potapov @ 2010-05-14 9:34 UTC
To: René Scharfe; +Cc: Git Mailing List, Phil Lawrence, Junio C Hamano
On Thu, May 13, 2010 at 10:37:10PM +0200, René Scharfe wrote:
> The intent of showing the message "Binary file xyz matches" for
> binary files is to avoid annoying users by potentially messing up
> their terminals by printing control characters. In --count mode,
> this precaution isn't necessary.
>
> Display counts of matches if -c/--count was specified, even if -a
> was not given. GNU grep does the same.
It is also not necessary with '-l' and '-L' options. (At least, if
we follow GNU grep).
> --- a/grep.c
> +++ b/grep.c
> @@ -802,7 +802,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
>
> switch (opt->binary) {
> case GREP_BINARY_DEFAULT:
> - if (buffer_is_binary(buf, size))
> + if (!opt->count && buffer_is_binary(buf, size))
> binary_match_only = 1;
So, I believe it should be:
if (!opt->count && !opt->name_only && !opt->unmatch_name_only &&
buffer_is_binary(buf, size))
Dmitry
* Re: [PATCH 3/7] grep: --count over binary
2010-05-14 9:34 ` Dmitry Potapov
@ 2010-05-16 17:45 ` René Scharfe
0 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2010-05-16 17:45 UTC
To: Dmitry Potapov; +Cc: Git Mailing List, Phil Lawrence, Junio C Hamano
On 14.05.2010 11:34, Dmitry Potapov wrote:
> On Thu, May 13, 2010 at 10:37:10PM +0200, René Scharfe wrote:
>> The intent of showing the message "Binary file xyz matches" for
>> binary files is to avoid annoying users by potentially messing up
>> their terminals by printing control characters. In --count mode,
>> this precaution isn't necessary.
>>
>> Display counts of matches if -c/--count was specified, even if -a
>> was not given. GNU grep does the same.
>
> It is also not necessary with '-l' and '-L' options. (At least, if
> we follow GNU grep).
Good point. The same is true for -q, too. ->unmatch_name_only (-L)
and ->status_only (-q) are already handled correctly because they are
checked before binary_match_only. We can do the same for ->name_only.
-- >8 --
Subject: grep: --name-only over binary
As with the option -c/--count, git grep with the option -l/--name-only
should work the same with binary files as with text files because
there is no danger of messing up the terminal with control characters
from the contents of matching files. GNU grep does the same.
Move the check for ->name_only before the one for binary_match_only,
thus making the latter irrelevant for git grep -l. Also add a simple
test for each of -l, -L and -q. The latter two options were already
handled before binary_match_only, so no code changes were needed to
make them pass.
Reported-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
grep.c | 8 ++++----
t/t7008-grep-binary.sh | 18 ++++++++++++++++++
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/grep.c b/grep.c
index 2b2f70e..f292e25 100644
--- a/grep.c
+++ b/grep.c
@@ -885,6 +885,10 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
count++;
if (opt->status_only)
return 1;
+ if (opt->name_only) {
+ show_name(opt, name);
+ return 1;
+ }
if (binary_match_only) {
opt->output(opt, "Binary file ", 12);
output_color(opt, name, strlen(name),
@@ -892,10 +896,6 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
opt->output(opt, " matches\n", 9);
return 1;
}
- if (opt->name_only) {
- show_name(opt, name);
- return 1;
- }
/* Hit at this line. If we haven't shown the
* pre-context lines, we would need to show them.
* When asked to do "count", this still show
diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index d8fde18..4f5e74f 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -33,6 +33,24 @@ test_expect_success 'git grep -c ina a' '
test_cmp expect actual
'
+test_expect_success 'git grep -l ina a' '
+ echo a >expect &&
+ git grep -l ina a >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'git grep -L bar a' '
+ echo a >expect &&
+ git grep -L bar a >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'git grep -q ina a' '
+ : >expect &&
+ git grep -q ina a >actual &&
+ test_cmp expect actual
+'
+
test_expect_success 'git grep -F ile a' '
git grep -F ile a
'
--
1.7.1
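The order of checks discussed in this message can be modelled with a small decision function. Everything here is a hypothetical simplification: struct report_opts and report_for_hit() are not git's names, only the default binary mode is covered (no -a or -I), and -L is left out because it is decided before a hit is counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down option set; git's struct grep_opt has more. */
struct report_opts {
	int status_only; /* -q */
	int name_only;   /* -l */
	int count;       /* -c */
};

enum report {
	REPORT_STATUS,        /* exit status only, no output */
	REPORT_NAME,          /* just the file name */
	REPORT_BINARY_NOTICE, /* "Binary file xyz matches" */
	REPORT_COUNT,         /* name:count */
	REPORT_LINE           /* the matching line itself */
};

/* How a hit is reported, checking options in the order the patch sets up. */
static enum report report_for_hit(const struct report_opts *o, int is_binary)
{
	int binary_match_only;

	assert(o != NULL);
	/* --count suppresses the binary notice, as in patch 3/7. */
	binary_match_only = is_binary && !o->count;

	if (o->status_only)
		return REPORT_STATUS;
	if (o->name_only)
		return REPORT_NAME;
	if (binary_match_only)
		return REPORT_BINARY_NOTICE;
	if (o->count)
		return REPORT_COUNT;
	return REPORT_LINE;
}
```

Because the -q and -l checks come first, a binary file is reported the same way as a text file for those options, and the notice only remains for output that would print file contents.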