git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Conrad Irwin <conrad.irwin@gmail.com>,
	git@vger.kernel.org, Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Dov Grobgeld <dov.grobgeld@gmail.com>
Subject: Re: [PATCH] Don't search files with an unset "grep" attribute
Date: Wed, 1 Feb 2012 03:20:05 -0500	[thread overview]
Message-ID: <20120201082005.GA32348@sigill.intra.peff.net> (raw)
In-Reply-To: <7vhazb3rtm.fsf@alter.siamese.dyndns.org>

On Wed, Feb 01, 2012 at 12:01:41AM -0800, Junio C Hamano wrote:

> > So if this was all spelled:
> >
> >   $ cat .gitattributes
> >   *.pdf filetype=pdf
> >   $ cat .git/config
> >   [filetype "pdf"]
> >           binary = true
> >           textconv = pdf2txt
> >
> > I think it would be a no-brainer that those type attributes should apply
> > to "git grep".
> 
> I think this discussion has, instead of forking into two equally
> interesting subthreads, veered to a more intellectually stimulating
> tangent and we ended up losing focus.

That's what I'm here for.

> Regardless of what to do with "I do not want to grep in these types of
> files" and "I want textconv applied when grepping in these types", which
> would be new attributes to implement two new features, I would like to see
> us first concentrate on fixing the "binary" issue.  When somebody tells us
> "Your autodetection may screw it up, but this file is binary; just show
> 'Binary files differ.' when comparing." with "-diff" (or "binary"), we
> should honor that when "git grep" decides if it should take the 'Binary
> file matches' codepath.  We currently do not, and it clearly is a bug.

Right. It may have been lost in the verbosity of what I wrote in my
previous email, but I completely agree. With the caveat that one should
also respect "diff=foo" coupled with "diff.foo.binary = true" as making
something binary. But that is already handled transparently by the
userdiff.[ch] code (which seems like the obvious entry point for
grep to use for attribute lookup, and which we already use there for
funcname lookup).

The trivial-ish patch is below.

> We should have to teach the underlying machinery that matches pathspec
> about negative pathspec entries only once. After we have done so, all the
> callers, not just "git grep", should be able to take advantage of the
> change by just learning to place negative pathspec entries in the "struct
> pathspec" they pass to the machinery.  Doing anything else will lead to
> madness of adding ad-hoc "here we should further filter with the other
> negative 'struct pathspec'" in each and every application.

Yes, I agree.

> But I suspect that it would not materialize anytime soon.  And I also
> suspect that the correct handling of 'Binary file matches', which is a
> pure bugfix, should solve the original issue started these threads 90% in
> practice.

Also agree. Let's fix the bug and then give it some time to see whether
people really want more explicit exclusions.

Here's the bug-fix patch. Not quite ready for inclusion, as it obviously
needs tests and a commit message. Also, we can cache the result of the
userdiff lookup so the funcname code doesn't have to look it up again.

---
diff --git a/grep.c b/grep.c
index b29d09c..d7ab054 100644
--- a/grep.c
+++ b/grep.c
@@ -960,6 +960,15 @@ static void std_output(struct grep_opt *opt, const void *buf, size_t size)
 	fwrite(buf, size, 1, stdout);
 }
 
+static int grep_buffer_is_binary(const char *path,
+				 char *buf, unsigned long size)
+{
+	struct userdiff_driver *drv = userdiff_find_by_path(path);
+	if (drv && drv->binary != -1)
+		return drv->binary;
+	return buffer_is_binary(buf, size);
+}
+
 static int grep_buffer_1(struct grep_opt *opt, const char *name,
 			 char *buf, unsigned long size, int collect_hits)
 {
@@ -994,11 +1003,11 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 
 	switch (opt->binary) {
 	case GREP_BINARY_DEFAULT:
-		if (buffer_is_binary(buf, size))
+		if (grep_buffer_is_binary(name, buf, size))
 			binary_match_only = 1;
 		break;
 	case GREP_BINARY_NOMATCH:
-		if (buffer_is_binary(buf, size))
+		if (grep_buffer_is_binary(name, buf, size))
 			return 0; /* Assume unmatch */
 		break;
 	case GREP_BINARY_TEXT:

  reply	other threads:[~2012-02-01  8:20 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-17  9:14 git-grep while excluding files in a blacklist Dov Grobgeld
2012-01-17  9:19 ` Nguyen Thai Ngoc Duy
2012-01-17 20:09   ` Junio C Hamano
2012-01-18  1:24     ` Nguyen Thai Ngoc Duy
2012-01-23  9:37       ` [PATCH] Don't search files with an unset "grep" attribute conrad.irwin
2012-01-23 18:33         ` Junio C Hamano
2012-01-23 22:59           ` Conrad Irwin
2012-01-24  6:59             ` Junio C Hamano
2012-01-25 21:46               ` Jeff King
2012-01-26 13:51                 ` Stephen Bash
2012-01-26 17:29                   ` Jeff King
2012-01-26 16:45                 ` Michael Haggerty
2012-01-27  6:35                   ` Jeff King
2012-02-01  8:01                 ` Junio C Hamano
2012-02-01  8:20                   ` Jeff King [this message]
2012-02-01  9:10                     ` Jeff King
2012-02-01  9:28                       ` Conrad Irwin
2012-02-01 22:14                         ` Jeff King
2012-02-01 23:20                           ` Jeff King
2012-02-02  2:03                             ` Junio C Hamano
2012-02-01 23:21                           ` [PATCH 1/2] grep: let grep_buffer callers specify a binary flag Jeff King
2012-02-02  0:47                             ` Junio C Hamano
2012-02-02  0:52                               ` Jeff King
2012-02-02  8:17                                 ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02  8:18                                   ` [PATCH 1/9] grep: make locking flag global Jeff King
2012-02-02  8:18                                   ` [PATCH 2/9] grep: move sha1-reading mutex into low-level code Jeff King
2012-02-02  8:19                                   ` [PATCH 3/9] grep: refactor the concept of "grep source" into an object Jeff King
2012-02-02  8:19                                   ` [PATCH 4/9] convert git-grep to use grep_source interface Jeff King
2012-02-02  8:20                                   ` [PATCH 5/9] grep: drop grep_buffer's "name" parameter Jeff King
2012-02-02  8:20                                   ` [PATCH 6/9] grep: cache userdiff_driver in grep_source Jeff King
2012-02-02 18:34                                     ` Junio C Hamano
2012-02-02 19:37                                       ` Jeff King
2012-02-02  8:21                                   ` [PATCH 7/9] grep: respect diff attributes for binary-ness Jeff King
2012-02-02  8:21                                   ` [PATCH 8/9] grep: load file data after checking binary-ness Jeff King
2012-02-02  8:24                                   ` [PATCH 9/9] grep: pre-load userdiff drivers when threaded Jeff King
2012-02-02  8:30                                   ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02 11:00                                   ` Thomas Rast
2012-02-02 11:07                                     ` Jeff King
2012-02-02 18:39                                   ` Junio C Hamano
2012-02-04 19:22                                   ` Pete Wyckoff
2012-02-04 23:18                                     ` Jeff King
2012-02-01 23:21                           ` [PATCH 2/2] grep: respect diff attributes for binary-ness Jeff King
2012-02-01 16:28                       ` [PATCH] Don't search files with an unset "grep" attribute Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120201082005.GA32348@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=conrad.irwin@gmail.com \
    --cc=dov.grobgeld@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).