From: Jeff King <peff@peff.net>
To: Conrad Irwin <conrad.irwin@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
Dov Grobgeld <dov.grobgeld@gmail.com>
Subject: Re: [PATCH] Don't search files with an unset "grep" attribute
Date: Wed, 1 Feb 2012 18:20:27 -0500
Message-ID: <20120201232027.GA32119@sigill.intra.peff.net>
In-Reply-To: <20120201221437.GA19044@sigill.intra.peff.net>

On Wed, Feb 01, 2012 at 05:14:37PM -0500, Jeff King wrote:
> > The first time I introduced this behaviour[1], I made it conditional
> > on a preference — those who wanted "good" grep could set the
> > preference, while those who wanted "fast" grep could not. I think
> > that's not a good idea, though if the performance issues are
> > show-stoppers, I'd suggest the opposite preference (so speed-freaks
> > can disable the checks).
>
> I've been able to get somewhat better performance by hoisting the
> attribute lookup into the parent thread. That means it happens in order
> (which lets the attr code's stack optimizations work), and there's no
> lock contention.
>
> I'll post finished patches with numbers in a few minutes.

OK, here they are. After playing with some options, I'm satisfied this
is a sane way to do it. I don't think it's worth having a config option.
There is a measurable slowdown, but it's simply not that big.

  [1/2]: grep: let grep_buffer callers specify a binary flag
  [2/2]: grep: respect diff attributes for binary-ness

There are a few optimizations I didn't do that you could put on top:

  1. When "-a" is given, we can avoid the attribute lookup altogether.

  2. When "-I" is given, we can actually check attributes _before_
     loading the file or blob into memory. This can help with very large
     binaries.

  3. When "-I" is given but we have no attribute, we can stream the
     beginning of the file or blob to check for binary-ness, and then
     avoid loading the whole thing if it turns out to be binary.
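
To make the three cases above concrete, here is a rough sketch of the
decision logic only. All names (binary_mode, attr_binary, and the three
helpers) are hypothetical and are not taken from the patches or from
git's source; the sketch assumes "-a" means "treat everything as text"
and "-I" means "do not match in binary files".

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not git's real identifiers. */
enum binary_mode {
	BINARY_DEFAULT,  /* neither -a nor -I given */
	BINARY_TEXT,     /* -a: treat every file as text */
	BINARY_NOMATCH   /* -I: never match inside binary files */
};

enum attr_binary {
	ATTR_UNSPECIFIED = -1, /* no attribute says either way */
	ATTR_TEXT = 0,
	ATTR_BINARY = 1
};

/* (1) With -a the answer is always "text", so no lookup is needed. */
bool need_attr_lookup(enum binary_mode mode)
{
	return mode != BINARY_TEXT;
}

/* (2) With -I, an attribute marking the file binary lets us skip it
 * before any file or blob data is loaded. */
bool can_skip_without_loading(enum binary_mode mode, enum attr_binary attr)
{
	return mode == BINARY_NOMATCH && attr == ATTR_BINARY;
}

/* (3) With no attribute, the content itself has to be inspected;
 * ideally only its beginning. */
bool need_content_sniff(enum binary_mode mode, enum attr_binary attr)
{
	return mode != BINARY_TEXT && attr == ATTR_UNSPECIFIED;
}
```

A caller would check need_attr_lookup() first, then
can_skip_without_loading(), and only fall back to reading data when
need_content_sniff() says so.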

I think (1) and (2) should be easy. Doing (3) is a little messier,
because binary detection happens inside grep_buffer, but we can hoist it
out. However, for large files, it might be nice to have a streaming grep
interface anyway, and (3) could be part of that.
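
As a minimal sketch of what "stream the beginning and check" in (3)
could look like for a file on disk: read only a fixed-size prefix and
apply a binary heuristic to it. The heuristic used here (a NUL byte in
the prefix) and the 8000-byte prefix size are assumptions for
illustration, and prefix_looks_binary() and file_looks_binary() are
hypothetical names, not functions from the patches.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SNIFF_BYTES 8000 /* assumed prefix size, for illustration */

/* Assumed heuristic: a NUL byte within the prefix means "binary". */
bool prefix_looks_binary(const char *buf, size_t len)
{
	if (len > SNIFF_BYTES)
		len = SNIFF_BYTES;
	return memchr(buf, 0, len) != NULL;
}

/*
 * Read at most SNIFF_BYTES from the file instead of loading it whole.
 * Returns 1 if the prefix looks binary, 0 if it looks like text, and
 * -1 if the file could not be opened or read.
 */
int file_looks_binary(const char *path)
{
	char buf[SNIFF_BYTES];
	FILE *f = fopen(path, "rb");
	size_t n;
	int err;

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);
	err = ferror(f);
	fclose(f);
	if (err)
		return -1;
	return prefix_looks_binary(buf, n) ? 1 : 0;
}
```

Only when file_looks_binary() returns 0 would the caller go on to load
the full contents for matching; blobs would need an equivalent
streaming read, which this sketch does not cover.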

-Peff
Thread overview: 43+ messages
2012-01-17 9:14 git-grep while excluding files in a blacklist Dov Grobgeld
2012-01-17 9:19 ` Nguyen Thai Ngoc Duy
2012-01-17 20:09 ` Junio C Hamano
2012-01-18 1:24 ` Nguyen Thai Ngoc Duy
2012-01-23 9:37 ` [PATCH] Don't search files with an unset "grep" attribute conrad.irwin
2012-01-23 18:33 ` Junio C Hamano
2012-01-23 22:59 ` Conrad Irwin
2012-01-24 6:59 ` Junio C Hamano
2012-01-25 21:46 ` Jeff King
2012-01-26 13:51 ` Stephen Bash
2012-01-26 17:29 ` Jeff King
2012-01-26 16:45 ` Michael Haggerty
2012-01-27 6:35 ` Jeff King
2012-02-01 8:01 ` Junio C Hamano
2012-02-01 8:20 ` Jeff King
2012-02-01 9:10 ` Jeff King
2012-02-01 9:28 ` Conrad Irwin
2012-02-01 22:14 ` Jeff King
2012-02-01 23:20 ` Jeff King [this message]
2012-02-02 2:03 ` Junio C Hamano
2012-02-01 23:21 ` [PATCH 1/2] grep: let grep_buffer callers specify a binary flag Jeff King
2012-02-02 0:47 ` Junio C Hamano
2012-02-02 0:52 ` Jeff King
2012-02-02 8:17 ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02 8:18 ` [PATCH 1/9] grep: make locking flag global Jeff King
2012-02-02 8:18 ` [PATCH 2/9] grep: move sha1-reading mutex into low-level code Jeff King
2012-02-02 8:19 ` [PATCH 3/9] grep: refactor the concept of "grep source" into an object Jeff King
2012-02-02 8:19 ` [PATCH 4/9] convert git-grep to use grep_source interface Jeff King
2012-02-02 8:20 ` [PATCH 5/9] grep: drop grep_buffer's "name" parameter Jeff King
2012-02-02 8:20 ` [PATCH 6/9] grep: cache userdiff_driver in grep_source Jeff King
2012-02-02 18:34 ` Junio C Hamano
2012-02-02 19:37 ` Jeff King
2012-02-02 8:21 ` [PATCH 7/9] grep: respect diff attributes for binary-ness Jeff King
2012-02-02 8:21 ` [PATCH 8/9] grep: load file data after checking binary-ness Jeff King
2012-02-02 8:24 ` [PATCH 9/9] grep: pre-load userdiff drivers when threaded Jeff King
2012-02-02 8:30 ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02 11:00 ` Thomas Rast
2012-02-02 11:07 ` Jeff King
2012-02-02 18:39 ` Junio C Hamano
2012-02-04 19:22 ` Pete Wyckoff
2012-02-04 23:18 ` Jeff King
2012-02-01 23:21 ` [PATCH 2/2] grep: respect diff attributes for binary-ness Jeff King
2012-02-01 16:28 ` [PATCH] Don't search files with an unset "grep" attribute Junio C Hamano