From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <trast@student.ethz.ch>,
Conrad Irwin <conrad.irwin@gmail.com>,
git@vger.kernel.org, Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
Dov Grobgeld <dov.grobgeld@gmail.com>
Subject: [PATCH 9/9] grep: pre-load userdiff drivers when threaded
Date: Thu, 2 Feb 2012 03:24:28 -0500 [thread overview]
Message-ID: <20120202082428.GI6786@sigill.intra.peff.net> (raw)
In-Reply-To: <20120202081747.GA10271@sigill.intra.peff.net>
The low-level grep_source code will automatically load the
userdiff driver to see whether a file is binary. However,
when we are threaded, it will load the drivers in a
non-deterministic order, handling each one as its assigned
thread happens to be scheduled.
Meanwhile, the attribute lookup code (which underlies the
userdiff driver lookup) is optimized to handle paths in
sequential order (because they tend to share the same
gitattributes files). Multi-threading the lookups destroys
the locality and makes this optimization less effective.
We can fix this by pre-loading the userdiff driver in the
main thread, before we hand off the file to a worker thread.
My best-of-five for "git grep foo" on the linux-2.6
repository went from:
real 0m0.391s
user 0m1.708s
sys 0m0.584s
to:
real 0m0.360s
user 0m1.576s
sys 0m0.572s
Not a huge speedup, but it's quite easy to do. The only
trick is that we shouldn't perform this optimization if "-a"
was used, in which case we won't bother checking whether
the files are binary at all.
Signed-off-by: Jeff King <peff@peff.net>
---
The speedup is especially unimpressive when you consider that it won't
grow as the grep load grows. This is a pretty fast grep. If you used a
real regex, the whole thing would take even longer, and you will still
only be shaving off a few tens of milliseconds. So I wouldn't be
heart-broken if this patch was dropped. I included it because it's easy
to do, and maybe somebody with a slower machine would find the absolute
time difference more noticeable.
builtin/grep.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index bc85a20..9fc3e95 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -85,8 +85,8 @@ static pthread_cond_t cond_result;
static int skip_first_line;
-static void add_work(enum grep_source_type type, const char *name,
- const void *id)
+static void add_work(struct grep_opt *opt, enum grep_source_type type,
+ const char *name, const void *id)
{
grep_lock();
@@ -95,6 +95,8 @@ static void add_work(enum grep_source_type type, const char *name,
}
grep_source_init(&todo[todo_end].source, type, name, id);
+ if (opt->binary != GREP_BINARY_TEXT)
+ grep_source_load_driver(&todo[todo_end].source);
todo[todo_end].done = 0;
strbuf_reset(&todo[todo_end].out);
todo_end = (todo_end + 1) % ARRAY_SIZE(todo);
@@ -333,7 +335,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
#ifndef NO_PTHREADS
if (use_threads) {
- add_work(GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+ add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
strbuf_release(&pathbuf);
return 0;
} else
@@ -362,7 +364,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
#ifndef NO_PTHREADS
if (use_threads) {
- add_work(GREP_SOURCE_FILE, buf.buf, filename);
+ add_work(opt, GREP_SOURCE_FILE, buf.buf, filename);
strbuf_release(&buf);
return 0;
} else
--
1.7.9.3.gc3fce1.dirty
next prev parent reply other threads:[~2012-02-02 8:24 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-17 9:14 git-grep while excluding files in a blacklist Dov Grobgeld
2012-01-17 9:19 ` Nguyen Thai Ngoc Duy
2012-01-17 20:09 ` Junio C Hamano
2012-01-18 1:24 ` Nguyen Thai Ngoc Duy
2012-01-23 9:37 ` [PATCH] Don't search files with an unset "grep" attribute conrad.irwin
2012-01-23 18:33 ` Junio C Hamano
2012-01-23 22:59 ` Conrad Irwin
2012-01-24 6:59 ` Junio C Hamano
2012-01-25 21:46 ` Jeff King
2012-01-26 13:51 ` Stephen Bash
2012-01-26 17:29 ` Jeff King
2012-01-26 16:45 ` Michael Haggerty
2012-01-27 6:35 ` Jeff King
2012-02-01 8:01 ` Junio C Hamano
2012-02-01 8:20 ` Jeff King
2012-02-01 9:10 ` Jeff King
2012-02-01 9:28 ` Conrad Irwin
2012-02-01 22:14 ` Jeff King
2012-02-01 23:20 ` Jeff King
2012-02-02 2:03 ` Junio C Hamano
2012-02-01 23:21 ` [PATCH 1/2] grep: let grep_buffer callers specify a binary flag Jeff King
2012-02-02 0:47 ` Junio C Hamano
2012-02-02 0:52 ` Jeff King
2012-02-02 8:17 ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02 8:18 ` [PATCH 1/9] grep: make locking flag global Jeff King
2012-02-02 8:18 ` [PATCH 2/9] grep: move sha1-reading mutex into low-level code Jeff King
2012-02-02 8:19 ` [PATCH 3/9] grep: refactor the concept of "grep source" into an object Jeff King
2012-02-02 8:19 ` [PATCH 4/9] convert git-grep to use grep_source interface Jeff King
2012-02-02 8:20 ` [PATCH 5/9] grep: drop grep_buffer's "name" parameter Jeff King
2012-02-02 8:20 ` [PATCH 6/9] grep: cache userdiff_driver in grep_source Jeff King
2012-02-02 18:34 ` Junio C Hamano
2012-02-02 19:37 ` Jeff King
2012-02-02 8:21 ` [PATCH 7/9] grep: respect diff attributes for binary-ness Jeff King
2012-02-02 8:21 ` [PATCH 8/9] grep: load file data after checking binary-ness Jeff King
2012-02-02 8:24 ` Jeff King [this message]
2012-02-02 8:30 ` [PATCH 0/9] respect binary attribute in grep Jeff King
2012-02-02 11:00 ` Thomas Rast
2012-02-02 11:07 ` Jeff King
2012-02-02 18:39 ` Junio C Hamano
2012-02-04 19:22 ` Pete Wyckoff
2012-02-04 23:18 ` Jeff King
2012-02-01 23:21 ` [PATCH 2/2] grep: respect diff attributes for binary-ness Jeff King
2012-02-01 16:28 ` [PATCH] Don't search files with an unset "grep" attribute Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120202082428.GI6786@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=conrad.irwin@gmail.com \
--cc=dov.grobgeld@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=pclouds@gmail.com \
--cc=trast@student.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).