git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Pete Wyckoff <pw@padd.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Thomas Rast <trast@student.ethz.ch>,
	Conrad Irwin <conrad.irwin@gmail.com>,
	git@vger.kernel.org, Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Dov Grobgeld <dov.grobgeld@gmail.com>
Subject: Re: [PATCH 0/9] respect binary attribute in grep
Date: Sat, 4 Feb 2012 18:18:25 -0500	[thread overview]
Message-ID: <20120204231825.GA1170@sigill.intra.peff.net> (raw)
In-Reply-To: <20120204192252.GA15319@padd.com>

On Sat, Feb 04, 2012 at 02:22:52PM -0500, Pete Wyckoff wrote:

> I took a look at this series.  It's nice.  My worry was that the
> extra open() of non-existent .gitattributes files in all the
> directories would cause performance problems across networked
> filesystems like NFS.

Yeah, I was able to measure a small slow-down on a quick grep even with
a warm cache. So it does take some extra effort, but I think the
correctness is worth it (and note that the slow down is in the tens of
milliseconds if you have a reasonable stat()).

If people have big trees on NFS (or some other slow-stat system) where
these lookups are actually a problem, I'd rather see a global option to
disable .gitattributes lookups for both diff and grep (i.e., a "trust
me, I'm not using gitattributes, and don't bother with stat" flag). In
practice, though, I think such a thing is not necessary because the
stat() is local to the file being examined (e.g., for "foo/bar/baz", we
look only at "foo/bar/.gitattributes", "foo/.gitattributes", and
".gitattributes", without having to touch other parts of the tree).

Anyway, thanks for doing some performance testing. More data is always
good.

> It could be plausible that deep directory structures with few
> grep-able files will suffer with this change.  For example, many
> big binary blobs in deep directory hierarchies, but also some
> useful files here and there.
>
> One could argue that with the use of .gitattributes to specify
> which blobs should not be searched, this series makes this faster
> by not having to to read the binary blobs at all.  And I'd be
> okay with that.

Yes, exactly. I think this will end up being a big win for such cases,
because the cost of loading even one large binary file from disk will
dwarf all of the stats. But it does depend on people marking their
binaries and using "-I".

-Peff

  reply	other threads:[~2012-02-04 23:18 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-17  9:14 git-grep while excluding files in a blacklist Dov Grobgeld
2012-01-17  9:19 ` Nguyen Thai Ngoc Duy
2012-01-17 20:09   ` Junio C Hamano
2012-01-18  1:24     ` Nguyen Thai Ngoc Duy
2012-01-23  9:37       ` [PATCH] Don't search files with an unset "grep" attribute conrad.irwin
2012-01-23 18:33         ` Junio C Hamano
2012-01-23 22:59           ` Conrad Irwin
2012-01-24  6:59             ` Junio C Hamano
2012-01-25 21:46               ` Jeff King
2012-01-26 13:51                 ` Stephen Bash
2012-01-26 17:29                   ` Jeff King
2012-01-26 16:45                 ` Michael Haggerty
2012-01-27  6:35                   ` Jeff King
2012-02-01  8:01                 ` Junio C Hamano
2012-02-01  8:20                   ` Jeff King
2012-02-01  9:10                     ` Jeff King
2012-02-01  9:28                       ` Conrad Irwin
2012-02-01 22:14                         ` Jeff King
2012-02-01 23:20                           ` Jeff King
2012-02-02  2:03                             ` Junio C Hamano
2012-02-01 23:21                           ` [PATCH 1/2] grep: let grep_buffer callers specify a binary flag Jeff King
2012-02-02  0:47                             ` Junio C Hamano
2012-02-02  0:52                               ` Jeff King
2012-02-02  8:17                                 ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02  8:18                                   ` [PATCH 1/9] grep: make locking flag global Jeff King
2012-02-02  8:18                                   ` [PATCH 2/9] grep: move sha1-reading mutex into low-level code Jeff King
2012-02-02  8:19                                   ` [PATCH 3/9] grep: refactor the concept of "grep source" into an object Jeff King
2012-02-02  8:19                                   ` [PATCH 4/9] convert git-grep to use grep_source interface Jeff King
2012-02-02  8:20                                   ` [PATCH 5/9] grep: drop grep_buffer's "name" parameter Jeff King
2012-02-02  8:20                                   ` [PATCH 6/9] grep: cache userdiff_driver in grep_source Jeff King
2012-02-02 18:34                                     ` Junio C Hamano
2012-02-02 19:37                                       ` Jeff King
2012-02-02  8:21                                   ` [PATCH 7/9] grep: respect diff attributes for binary-ness Jeff King
2012-02-02  8:21                                   ` [PATCH 8/9] grep: load file data after checking binary-ness Jeff King
2012-02-02  8:24                                   ` [PATCH 9/9] grep: pre-load userdiff drivers when threaded Jeff King
2012-02-02  8:30                                   ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02 11:00                                   ` Thomas Rast
2012-02-02 11:07                                     ` Jeff King
2012-02-02 18:39                                   ` Junio C Hamano
2012-02-04 19:22                                   ` Pete Wyckoff
2012-02-04 23:18                                     ` Jeff King [this message]
2012-02-01 23:21                           ` [PATCH 2/2] grep: respect diff attributes for binary-ness Jeff King
2012-02-01 16:28                       ` [PATCH] Don't search files with an unset "grep" attribute Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120204231825.GA1170@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=conrad.irwin@gmail.com \
    --cc=dov.grobgeld@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=pw@padd.com \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).