From: Fredrik Kuivinen <frekui@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: peff@peff.net, gitster@pobox.com, miles@gnu.org,
	pclouds@gmail.com, Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Threaded grep (was: Re: [PATCH] grep: do not do external  grep on skip-worktree entries)
Date: Mon, 11 Jan 2010 11:42:55 +0100
Message-ID: <4c8ef71001110242m160a63a8wd0294c0f26373c2e@mail.gmail.com>
In-Reply-To: <alpine.LFD.2.00.1001080956270.7821@localhost.localdomain>

[I messed up the Cc list when I sent the first mail in this thread, so
it didn't reach git@vger. This time it's fixed for real. Sorry for the
extra copy.]

On Fri, Jan 8, 2010 at 19:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Fri, 8 Jan 2010, Fredrik Kuivinen wrote:
>>
>> I only have access to a couple of boxes with more than one core so
>> some more testing would be greatly appreciated. On the boxes I have
>> tested this on the added parallelism roughly cut the time to grep the
>> linux kernel in half (compared to the built-in grep). It also compares
>> favourably to the external GNU grep (these are best of three runs):
>
> On my box (in all cases best-of-five):
>
>  - "NO_THREADS=1 git grep --no-ext-grep qwerty":
>
>        real    0m0.945s
>        user    0m0.808s
>        sys     0m0.128s
>
>  - "git grep --no-ext-grep qwerty":
>
>        real    0m0.402s
>        user    0m1.116s
>        sys     0m0.216s
>
>  - "git grep qwerty":
>
>        real    0m0.408s
>        user    0m0.176s
>        sys     0m0.152s
>
> so it _just_ beat the external grep thanks to using 330% CPU time. An
> improvement, yes, but the CPU wastage is kind of sad. It really would be
> nice to see if we could get rid of the stupid per-line overhead some way.

I agree. The per-line thing seems to be fixed with Junio's recent patch.
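
As a rough check on the 330% figure, CPU utilisation can be estimated
as (user + sys) / real. Below is a small Python sketch using the timings
quoted above. The function name and the dictionary labels are made up
for illustration only.

```python
# CPU utilisation estimated as (user + sys) / real.
# The numbers are the best-of-five timings quoted above; the labels and
# the helper name are hypothetical, chosen only for this illustration.
timings = {
    "NO_THREADS=1 git grep --no-ext-grep": (0.945, 0.808, 0.128),
    "git grep --no-ext-grep (threaded)": (0.402, 1.116, 0.216),
    "git grep (external grep)": (0.408, 0.176, 0.152),
}


def cpu_utilisation(real, user, sys_time):
    """Return CPU seconds consumed per second of wall-clock time."""
    return (user + sys_time) / real


if __name__ == "__main__":
    for name, (real, user, sys_time) in timings.items():
        print(f"{name}: {cpu_utilisation(real, user, sys_time):.0%}")
```

For the threaded run this gives a little over 3.3 CPU seconds per second
of wall-clock time, which matches the roughly 330% mentioned above.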

> Btw, there does seem to be some unnecessary synchronization there, because
> if I pick a pattern that has no matches at all, my best parallel number
> goes down to 0.316. But the variation in times for the parallel one is so
> big that I don't know how relevant that all is.
>
> I suspect you need more threads than CPU's due to the waiting (so that
> other threads can pick up the slack when one thread ends up waiting to
> output). Or don't wait at all, and queue it up instead.

Yes, you are right, there is some unnecessary synchronization. I am
working on a new patch which queues the output instead.
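
To make the queueing idea concrete, here is a minimal sketch. It is in
Python rather than C and it is not the actual patch. Each worker collects
its matches in a per-file buffer instead of waiting for its turn to
print, and the main thread drains the buffers in file order. The names
grep_buffer and threaded_grep, and the in-memory (name, text) input, are
hypothetical. Python threads will not show the speedup because of the
interpreter lock, so this only illustrates the structure.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_buffer(regex, name, text):
    """Collect the matching lines of one file in a list; never print here."""
    return [f"{name}:{line}" for line in text.splitlines() if regex.search(line)]


def threaded_grep(pattern, files, num_threads=4):
    """Search (name, text) pairs in parallel; return output in input order.

    Hypothetical helper: workers only fill their own buffer, so no worker
    ever blocks waiting for another worker's output to be written.
    """
    regex = re.compile(pattern)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # One future per file, kept in submission order.
        futures = [pool.submit(grep_buffer, regex, name, text)
                   for name, text in files]
        output = []
        # Only this thread waits, and only when the next buffer in order
        # is not finished yet; the workers keep going in the meantime.
        for future in futures:
            output.extend(future.result())
    return output


if __name__ == "__main__":
    sample = [
        ("a.c", "foo\nqwerty\n"),
        ("b.c", "nothing here"),
        ("c.c", "qwerty 1\nqwerty 2"),
    ]
    for line in threaded_grep("qwerty", sample):
        print(line)
```

The point of the design is that the ordering constraint moves from the
workers to a single consumer, so the output stays deterministic without
the workers synchronizing with each other.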

- Fredrik
