git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <trast@student.ethz.ch>,
	Conrad Irwin <conrad.irwin@gmail.com>,
	git@vger.kernel.org, Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Dov Grobgeld <dov.grobgeld@gmail.com>
Subject: [PATCH 8/9] grep: load file data after checking binary-ness
Date: Thu, 2 Feb 2012 03:21:11 -0500	[thread overview]
Message-ID: <20120202082111.GH6786@sigill.intra.peff.net> (raw)
In-Reply-To: <20120202081747.GA10271@sigill.intra.peff.net>

Usually we load each file to grep into memory, check whether
it's binary, and then either grep it (the default) or not
(if "-I" was given).

In the "-I" case, we can skip loading the file entirely if
it is marked as binary via gitattributes. On my giant
3-gigabyte media repository, doing "git grep -I foo" went
from:

  real    0m0.712s
  user    0m0.044s
  sys     0m4.780s

to:

  real    0m0.026s
  user    0m0.016s
  sys     0m0.020s

Obviously this is an extreme example. The repo is almost
entirely binary files, and you can see that we spent all of
our time asking the kernel to read() the data. However, with
a cold disk cache, even avoiding a few binary files can have
an impact.

Signed-off-by: Jeff King <peff@peff.net>
---
 grep.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/grep.c b/grep.c
index a50d161..3821400 100644
--- a/grep.c
+++ b/grep.c
@@ -1019,9 +1019,6 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 	}
 	opt->last_shown = 0;
 
-	if (grep_source_load(gs) < 0)
-		return 0;
-
 	switch (opt->binary) {
 	case GREP_BINARY_DEFAULT:
 		if (grep_source_is_binary(gs))
@@ -1042,6 +1039,9 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 
 	try_lookahead = should_lookahead(opt);
 
+	if (grep_source_load(gs) < 0)
+		return 0;
+
 	bol = gs->buf;
 	left = gs->size;
 	while (left) {
-- 
1.7.9.3.gc3fce1.dirty

  parent reply	other threads:[~2012-02-02  8:21 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-17  9:14 git-grep while excluding files in a blacklist Dov Grobgeld
2012-01-17  9:19 ` Nguyen Thai Ngoc Duy
2012-01-17 20:09   ` Junio C Hamano
2012-01-18  1:24     ` Nguyen Thai Ngoc Duy
2012-01-23  9:37       ` [PATCH] Don't search files with an unset "grep" attribute conrad.irwin
2012-01-23 18:33         ` Junio C Hamano
2012-01-23 22:59           ` Conrad Irwin
2012-01-24  6:59             ` Junio C Hamano
2012-01-25 21:46               ` Jeff King
2012-01-26 13:51                 ` Stephen Bash
2012-01-26 17:29                   ` Jeff King
2012-01-26 16:45                 ` Michael Haggerty
2012-01-27  6:35                   ` Jeff King
2012-02-01  8:01                 ` Junio C Hamano
2012-02-01  8:20                   ` Jeff King
2012-02-01  9:10                     ` Jeff King
2012-02-01  9:28                       ` Conrad Irwin
2012-02-01 22:14                         ` Jeff King
2012-02-01 23:20                           ` Jeff King
2012-02-02  2:03                             ` Junio C Hamano
2012-02-01 23:21                           ` [PATCH 1/2] grep: let grep_buffer callers specify a binary flag Jeff King
2012-02-02  0:47                             ` Junio C Hamano
2012-02-02  0:52                               ` Jeff King
2012-02-02  8:17                                 ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02  8:18                                   ` [PATCH 1/9] grep: make locking flag global Jeff King
2012-02-02  8:18                                   ` [PATCH 2/9] grep: move sha1-reading mutex into low-level code Jeff King
2012-02-02  8:19                                   ` [PATCH 3/9] grep: refactor the concept of "grep source" into an object Jeff King
2012-02-02  8:19                                   ` [PATCH 4/9] convert git-grep to use grep_source interface Jeff King
2012-02-02  8:20                                   ` [PATCH 5/9] grep: drop grep_buffer's "name" parameter Jeff King
2012-02-02  8:20                                   ` [PATCH 6/9] grep: cache userdiff_driver in grep_source Jeff King
2012-02-02 18:34                                     ` Junio C Hamano
2012-02-02 19:37                                       ` Jeff King
2012-02-02  8:21                                   ` [PATCH 7/9] grep: respect diff attributes for binary-ness Jeff King
2012-02-02  8:21                                   ` Jeff King [this message]
2012-02-02  8:24                                   ` [PATCH 9/9] grep: pre-load userdiff drivers when threaded Jeff King
2012-02-02  8:30                                   ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02 11:00                                   ` Thomas Rast
2012-02-02 11:07                                     ` Jeff King
2012-02-02 18:39                                   ` Junio C Hamano
2012-02-04 19:22                                   ` Pete Wyckoff
2012-02-04 23:18                                     ` Jeff King
2012-02-01 23:21                           ` [PATCH 2/2] grep: respect diff attributes for binary-ness Jeff King
2012-02-01 16:28                       ` [PATCH] Don't search files with an unset "grep" attribute Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120202082111.GH6786@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=conrad.irwin@gmail.com \
    --cc=dov.grobgeld@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).