git.vger.kernel.org archive mirror
From: Eric Herman <eric@freesa.org>
To: Pete Wyckoff <pw@padd.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Sverre Rabbelier" <srabbelier@gmail.com>,
	"Fernando Vezzosi" <buccia@repnz.net>
Subject: Re: [PATCH] grep: detect number of CPUs for thread spawning
Date: Sun, 06 Nov 2011 19:00:00 +0100
Message-ID: <4EB6CB20.5060309@freesa.org>
In-Reply-To: <20111106145050.GA4219@arf.padd.com>

Hello Pete,

Thank you for the feedback.

On 11/06/2011 03:50 PM, Pete Wyckoff wrote:

>> From: Eric Herman <eric@freesa.org>
>>
>> Change the number of threads that we spawn from a hardcoded value of
>> "8" to what online_cpus() returns.


> I agree with the need to exploit >8 CPUs, but I lose a lot of
> performance when limiting the threads to the number of physical
> CPUs.

Ah, yes. Being focused on big machines, I did not actually test on
machines with few CPUs, and certainly not with NFS mounts.

>
> Tests without your patch on master, just changing "#define
> THREADS" from 8 to 2.  On a 2-core Intel Core2 Duo.
>
> Producing lots of output:
>
>      8 threads:
>
> 	$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
> 	0m14.02s user 0m3.64s sys 0m11.93s elapsed 148.07 %CPU
> 	$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
> 	0m13.86s user 0m3.70s sys 0m11.82s elapsed 148.57 %CPU
>
>      2 threads:
>
> 	$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
> 	0m15.14s user 0m3.52s sys 0m24.22s elapsed 77.05 %CPU
> 	$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
> 	0m14.85s user 0m3.79s sys 0m24.20s elapsed 77.05 %CPU
>
> Producing no output:
>
>      8 threads:
>
> 	$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
> 	0m1.14s user 0m3.68s sys 0m5.17s elapsed 93.22 %CPU
> 	$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
> 	0m1.28s user 0m3.56s sys 0m5.15s elapsed 94.22 %CPU
>
>      2 threads:
>
> 	$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
> 	0m1.36s user 0m3.64s sys 0m16.82s elapsed 29.75 %CPU
> 	$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
> 	0m1.38s user 0m3.66s sys 0m16.81s elapsed 30.04 %CPU
>
> My workdir is on NFS, where even though the repository is fully
> cached, the open()s must go to the server.  Using more threads
> than CPUs makes it more likely that some thread isn't blocked.

This is good data.
It gives me ideas for how I can do some more testing.

>
> You could add a #threads knob,

Sure, adding a knob is not a bad idea.


> but then we'd have to get
> everybody on NFS to set that properly.

Indeed. I think we agree that it would be better if most people had no
need to fiddle with yet another knob.


>  Or take a look at
> preload_index() to see how it guesses at how many threads it
> needs.

Good tip.
A quick peek at preload_index suggests that it was a bit of guesswork:

/*
  * Mostly randomly chosen maximum thread counts: we
  * cap the parallelism to 20 threads, and we want
  * to have at least 500 lstat's per thread for it to
  * be worth starting a thread.
  */
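
As I read that comment, the heuristic amounts to something like the
sketch below. This is only my paraphrase of the comment, not a copy of
the preload code; the function name and the two macro names are
illustrative assumptions.

```c
/* Assumed names; the two limits are the ones stated in the comment above. */
#define MAX_PARALLEL 20   /* cap the parallelism at 20 threads */
#define THREAD_COST 500   /* want at least 500 lstat()s per thread */

/*
 * Hypothetical helper: given the number of index entries, return how
 * many threads to start, or 0 if threading is not worth it.
 */
static int guess_preload_threads(int nr_entries)
{
	int threads = nr_entries / THREAD_COST;

	if (threads < 2)
		return 0;	/* too little work to justify a second thread */
	if (threads > MAX_PARALLEL)
		threads = MAX_PARALLEL;
	return threads;
}
```

Note that this scales with the amount of work, not with the number of
CPUs, which is a different approach from what the patch does.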

However, your comments make me wonder if a rule-of-thumb like "3 + 
online_cpus()" would yield better results across both large and small 
numbers of cores with either blazing fast or very slow storage.
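
To make the idea concrete, a minimal sketch follows. It is untested
against real workloads; the constant 3 is just the guess from the
paragraph above, the function names are hypothetical, and sysconf()
stands in for git's online_cpus() so that the snippet compiles on its
own (_SC_NPROCESSORS_ONLN is a common extension, not strict POSIX).

```c
#include <unistd.h>

/* Assumed constant: extra threads to keep busy while others block on I/O. */
#define GREP_IO_SLACK 3

/* Stand-in for online_cpus(); falls back to 1 if the count is unknown. */
static int detect_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n < 1 ? 1 : (int)n;
}

/* Hypothetical rule of thumb: a fixed amount of slack on top of the CPU count. */
static int grep_thread_count(int ncpus)
{
	if (ncpus < 1)
		ncpus = 1;
	return ncpus + GREP_IO_SLACK;
}
```

With this, grep_thread_count(detect_cpus()) would give 5 threads on a
2-core machine like Pete's, and 67 on a 64-core one. Whether 5 is enough
to hide the NFS latency he measured is exactly what needs testing.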

I will create a setup similar to the one you describe and do some 
exploration.

Cheers,
  -Eric

-- 
http://www.freesa.org/ -- mobile: +31 620719662
aim: ericigps -- skype: eric_herman -- jabber: eric.herman@gmail.com
