From: Jeff King <peff@peff.net>
To: Thomas Rast <trast@student.ethz.ch>
Cc: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
"Eric Herman" <eric@freesa.org>,
git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2 0/3] grep multithreading and scaling
Date: Fri, 2 Dec 2011 12:34:00 -0500
Message-ID: <20111202173400.GC23447@sigill.intra.peff.net>
In-Reply-To: <cover.1322830368.git.trast@student.ethz.ch>
On Fri, Dec 02, 2011 at 02:07:45PM +0100, Thomas Rast wrote:
> where I put the --cached originally because that makes it independent
> of the worktree (which in the very first measurements I still had
> wiped, as I tend to do for this repo; I checked it out again after
> that). This in fact gives me (~/g/git-grep --cached
> INITRAMFS_ROOT_UID, leaving aside -W; best of 10):
>
> THREADS=8: 2.88user 0.21system 0:02.94elapsed
> THREADS=4: 2.89user 0.29system 0:02.99elapsed
> THREADS=2: 2.83user 0.36system 0:02.87elapsed
> NO_PTHREADS: 2.16user 0.08system 0:02.25elapsed
>
> Uhuh. Doesn't scale so well after all. But removing the --cached, as
> most people probably would:
>
> THREADS=8: 0.19user 0.32system 0:00.16elapsed
> THREADS=4: 0.16user 0.34system 0:00.17elapsed
> THREADS=2: 0.18user 0.32system 0:00.26elapsed
> NO_PTHREADS: 0.12user 0.17system 0:00.31elapsed
>
> So I conclude that during any grep that cannot use the worktree,
> having any threads hurts.

Wow, that's horrible. Leaving aside the parallelism, it's just terrible
that reading from the cache is 20 times slower than reading from the
worktree. I get similar results on my quad-core machine.

A quick perf run shows most of the time is spent inflating objects. The
diff code has a sneaky trick to re-use worktree files when we know they
are stat-clean (in diff's case it is to avoid writing a tempfile). I
wonder if we should use the same trick here.

It would hurt the cold cache case, though, as the compressed versions
require fewer disk accesses, of course.

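
To make the stat-clean idea concrete, here is a minimal sketch in Python rather than git's C. It assumes a hypothetical `CachedStat` record holding only the size and mtime remembered for a path; git's real check compares more stat fields and handles racy timestamps and content filters, none of which is modeled here. The helper names (`record_stat`, `is_stat_clean`, `load_content`) are illustrative and are not git functions.

```python
import os
import tempfile
import zlib
from typing import NamedTuple


class CachedStat(NamedTuple):
    """Hypothetical subset of the stat data remembered for a tracked path."""
    size: int
    mtime_ns: int


def record_stat(path: str) -> CachedStat:
    """Remember the stat data of a file, as if it had just been added."""
    st = os.stat(path)
    return CachedStat(st.st_size, st.st_mtime_ns)


def is_stat_clean(path: str, cached: CachedStat) -> bool:
    """True if the worktree file still matches the remembered stat data."""
    try:
        st = os.stat(path)
    except OSError:
        return False  # missing or unreadable file: not reusable
    return (st.st_size, st.st_mtime_ns) == cached


def load_content(path: str, cached: CachedStat, compressed: bytes) -> bytes:
    """Prefer the worktree copy when stat-clean, else inflate the stored blob."""
    if is_stat_clean(path, cached):
        with open(path, "rb") as f:
            return f.read()  # no inflation needed
    return zlib.decompress(compressed)  # fall back to the compressed copy


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        cached = record_stat(path)
        blob = zlib.compress(b"hello\n")
        print(is_stat_clean(path, cached), load_content(path, cached, blob))
```

The trade-off mentioned above shows up in `load_content`: the fast path skips inflation but reads the larger uncompressed file from disk.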
-Peff
PS I suspect your timings are somewhat affected by the simplicity of the
regex you are asking for. The time to inflate the blobs dominates,
because the search is just a memmem(). On my quad-core w/
hyperthreading (i.e., 8 apparent cores):

  [without --cached, simple regex; we get some parallelism, but the
   regex task is just not that intensive]
  $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null
  0.42user 0.45system 0:00.15elapsed 578%CPU

  [without --cached, harder regex; we get much higher CPU utilization]
  $ /usr/bin/time git grep 'a.*b' >/dev/null
  14.68user 0.50system 0:02.00elapsed 758%CPU

  [with --cached, simple regex; we get almost _no_ parallelism because
   all of our time is spent inflating under a lock, and the regex task
   takes very little time]
  $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null
  7.64user 0.41system 0:07.61elapsed 105%CPU

  [with --cached, harder regex; not as much parallelism as we hoped for,
   but still much more than before, because there is actually work to
   parallelize in the regex]
  $ /usr/bin/time git grep --cached 'a.*b' >/dev/null
  23.46user 0.47system 0:08.42elapsed 284%CPU

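
As a cross-check on the four runs, the %CPU figure printed by `time` is the CPU time (user plus system) divided by the elapsed wall-clock time. The short Python sketch below recomputes it from the numbers quoted above; the function name `cpu_utilization` is illustrative. The computed values are only approximately equal to the reported ones, presumably because the printed times are rounded to hundredths of a second.

```python
def cpu_utilization(user: float, system: float, elapsed: float) -> float:
    """Return CPU utilization in percent: CPU seconds per wall-clock second."""
    return 100.0 * (user + system) / elapsed


# (label, user, system, elapsed, reported %CPU), copied from the runs above.
RUNS = [
    ("worktree, simple regex", 0.42, 0.45, 0.15, 578),
    ("worktree, 'a.*b'", 14.68, 0.50, 2.00, 758),
    ("--cached, simple regex", 7.64, 0.41, 7.61, 105),
    ("--cached, 'a.*b'", 23.46, 0.47, 8.42, 284),
]

for label, user, system, elapsed, reported in RUNS:
    computed = cpu_utilization(user, system, elapsed)
    print(f"{label:24s} computed {computed:6.1f}%  reported {reported}%")
```

A value near 100% means roughly one core was busy for the whole run, which is what the `--cached` simple-regex case shows on a machine with 8 apparent cores.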
So I think there is value in parallelizing even --cached greps. But
we could do so much better if blob inflation could be done in
parallel.
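
The shape of the problem can be sketched as follows, in Python rather than git's C. This is a structural illustration only: `read_lock`, `grep_blob`, and `grep_all` are hypothetical names, the blobs are plain zlib-compressed byte strings rather than real git objects, and CPython's global interpreter lock means this sketch does not reproduce the timings above. It only shows which step is serialized (inflation, under one shared lock) and which step each worker can do on its own (matching).

```python
import re
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the single lock that guards object access.
read_lock = threading.Lock()


def grep_blob(name, compressed, pattern):
    """Inflate one blob under the shared lock, then match outside it."""
    # Serialized step: only one worker can inflate at a time.
    with read_lock:
        data = zlib.decompress(compressed)
    # Independent step: each worker matches its own buffer without the lock.
    return [
        (name, lineno, line)
        for lineno, line in enumerate(data.splitlines(), start=1)
        if pattern.search(line)
    ]


def grep_all(blobs, pattern, threads=8):
    """Search every blob with a pool of workers; hits keep blob order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_blob = pool.map(
            lambda item: grep_blob(item[0], item[1], pattern), blobs.items()
        )
        return [hit for hits in per_blob for hit in hits]


if __name__ == "__main__":
    blobs = {
        "a.c": zlib.compress(b"int a;\nint b;\n"),
        "b.c": zlib.compress(b"nothing\nxaxxb\n"),
    }
    for hit in grep_all(blobs, re.compile(rb"a.*b")):
        print(hit)
```

With a cheap pattern, almost all of the work sits inside the `with read_lock:` block, so extra workers mostly wait. With an expensive pattern, the unlocked matching step dominates and extra workers help, which matches the difference between the two `--cached` runs.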