* [PATCH] grep: detect number of CPUs for thread spawning
@ 2011-11-05 14:16 Ævar Arnfjörð Bjarmason
2011-11-06 14:50 ` Pete Wyckoff
0 siblings, 1 reply; 3+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-11-05 14:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Eric Herman, Sverre Rabbelier, Fernando Vezzosi,
Ævar Arnfjörð Bjarmason
From: Eric Herman <eric@freesa.org>
Change the number of threads that we spawn from a hardcoded value of
"8" to what online_cpus() returns.
Back in v1.7.0-rc1~19^2 when threaded grep was introduced the number
of threads was hardcoded at compile time to 8, but this value was only
used if online_cpus() returned greater than 1.
However just using 8 threads regardless of the actual number of CPUs
is inefficient if we have more than 8 CPUs, and potentially wasteful
if we have fewer than 8 CPUs.
The array holding the threads is now being allocated at runtime
instead of being allocated statically. We free the array further down
the line in the wait_all() function; this is OK since the allocation
and freeing is guarded by the global "use_threads" variable, which is
based on the return value of online_cpus().
Reviewed-by: Sverre Rabbelier <srabbelier@gmail.com>
Reviewed-by: Fernando Vezzosi <buccia@repnz.net>
Signed-off-by: Eric Herman <eric@freesa.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/grep.c | 17 +++++++++++------
1 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index 3d7329d..307a253 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -27,8 +27,8 @@ static char const * const grep_usage[] = {
static int use_threads = 1;
#ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static int nthreads;
+static pthread_t *threads;
static void *load_sha1(const unsigned char *sha1, unsigned long *size,
const char *name);
@@ -36,7 +36,7 @@ static void *load_file(const char *filename, size_t *sz);
enum work_type {WORK_SHA1, WORK_FILE};
-/* We use one producer thread and THREADS consumer
+/* We use one producer thread and online_cpus() consumer
* threads. The producer adds struct work_items to 'todo' and the
* consumers pick work items from the same array.
*/
@@ -254,6 +254,8 @@ static void start_threads(struct grep_opt *opt)
{
int i;
+ threads = xmalloc(nthreads * sizeof(pthread_t));
+
pthread_mutex_init(&grep_mutex, NULL);
pthread_mutex_init(&read_sha1_mutex, NULL);
pthread_cond_init(&cond_add, NULL);
@@ -264,7 +266,7 @@ static void start_threads(struct grep_opt *opt)
strbuf_init(&todo[i].out, 0);
}
- for (i = 0; i < ARRAY_SIZE(threads); i++) {
+ for (i = 0; i < nthreads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -295,7 +297,7 @@ static int wait_all(void)
pthread_cond_broadcast(&cond_add);
grep_unlock();
- for (i = 0; i < ARRAY_SIZE(threads); i++) {
+ for (i = 0; i < nthreads; i++) {
void *h;
pthread_join(threads[i], &h);
hit |= (int) (intptr_t) h;
@@ -307,6 +309,8 @@ static int wait_all(void)
pthread_cond_destroy(&cond_write);
pthread_cond_destroy(&cond_result);
+ free(threads);
+
return hit;
}
#else /* !NO_PTHREADS */
@@ -1001,7 +1005,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
opt.regflags |= REG_ICASE;
#ifndef NO_PTHREADS
- if (online_cpus() == 1 || !grep_threads_ok(&opt))
+ nthreads = online_cpus();
+ if (nthreads == 1 || !grep_threads_ok(&opt))
use_threads = 0;
if (use_threads) {
--
1.7.7
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] grep: detect number of CPUs for thread spawning
2011-11-05 14:16 [PATCH] grep: detect number of CPUs for thread spawning Ævar Arnfjörð Bjarmason
@ 2011-11-06 14:50 ` Pete Wyckoff
2011-11-06 18:00 ` Eric Herman
0 siblings, 1 reply; 3+ messages in thread
From: Pete Wyckoff @ 2011-11-06 14:50 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Eric Herman, Sverre Rabbelier,
Fernando Vezzosi
avarab@gmail.com wrote on Sat, 05 Nov 2011 14:16 +0000:
> From: Eric Herman <eric@freesa.org>
>
> Change the number of threads that we spawn from a hardcoded value of
> "8" to what online_cpus() returns.
>
> Back in v1.7.0-rc1~19^2 when threaded grep was introduced the number
> of threads was hardcoded at compile time to 8, but this value was only
> used if online_cpus() returned greater than 1.
>
> However just using 8 threads regardless of the actual number of CPUs
> is inefficient if we have more than 8 CPUs, and potentially wasteful
> if we have fewer than 8 CPUs.
I agree with the need to exploit >8 CPUs, but I lose a lot of
performance when limiting the threads to the number of physical
CPUs.
Tests without your patch on master, just changing "#define
THREADS" from 8 to 2. On a 2-core Intel Core2 Duo.
Producing lots of output:
8 threads:
$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
0m14.02s user 0m3.64s sys 0m11.93s elapsed 148.07 %CPU
$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
0m13.86s user 0m3.70s sys 0m11.82s elapsed 148.57 %CPU
2 threads:
$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
0m15.14s user 0m3.52s sys 0m24.22s elapsed 77.05 %CPU
$ time ~/u/src/git/bin-wrappers/git grep f > /dev/null
0m14.85s user 0m3.79s sys 0m24.20s elapsed 77.05 %CPU
Producing no output:
8 threads:
$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
0m1.14s user 0m3.68s sys 0m5.17s elapsed 93.22 %CPU
$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
0m1.28s user 0m3.56s sys 0m5.15s elapsed 94.22 %CPU
2 threads:
$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
0m1.36s user 0m3.64s sys 0m16.82s elapsed 29.75 %CPU
$ time ~/u/src/git/bin-wrappers/git grep unfindable-string
0m1.38s user 0m3.66s sys 0m16.81s elapsed 30.04 %CPU
My workdir is on NFS, where even though the repository is fully
cached, the open()s must go to the server. Using more threads
than CPUs makes it more likely that some thread isn't blocked.
You could add a #threads knob, but then we'd have to get
everybody on NFS to set that properly. Or take a look at
preload_index() to see how it guesses at how many threads it
needs.
-- Pete
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] grep: detect number of CPUs for thread spawning
2011-11-06 14:50 ` Pete Wyckoff
@ 2011-11-06 18:00 ` Eric Herman
0 siblings, 0 replies; 3+ messages in thread
From: Eric Herman @ 2011-11-06 18:00 UTC (permalink / raw)
To: Pete Wyckoff
Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
Sverre Rabbelier, Fernando Vezzosi
Hello Pete,
Thank you for the feedback.
On 11/06/2011 03:50 PM, Pete Wyckoff wrote:
>> From: Eric Herman<eric@freesa.org>
>>
>> Change the number of threads that we spawn from a hardcoded value of
>> "8" to what online_cpus() returns.
> I agree with the need to exploit>8 CPUs, but I lose a lot of
> performance when limiting the threads to the number of physical
> CPUs.
Ah, yes, Being focused on big machines, I did not actually test with low
CPU machines, certainly not with NFS mounts.
>
> Tests without your patch on master, just changing "#define
> THREADS" from 8 to 2. On a 2-core Intel Core2 Duo.
>
> Producing lots of output:
>
> 8 threads:
>
> $ time ~/u/src/git/bin-wrappers/git grep f> /dev/null
> 0m14.02s user 0m3.64s sys 0m11.93s elapsed 148.07 %CPU
> $ time ~/u/src/git/bin-wrappers/git grep f> /dev/null
> 0m13.86s user 0m3.70s sys 0m11.82s elapsed 148.57 %CPU
>
> 2 threads:
>
> $ time ~/u/src/git/bin-wrappers/git grep f> /dev/null
> 0m15.14s user 0m3.52s sys 0m24.22s elapsed 77.05 %CPU
> $ time ~/u/src/git/bin-wrappers/git grep f> /dev/null
> 0m14.85s user 0m3.79s sys 0m24.20s elapsed 77.05 %CPU
>
> Producing no output:
>
> 8 threads:
>
> $ time ~/u/src/git/bin-wrappers/git grep unfindable-string
> 0m1.14s user 0m3.68s sys 0m5.17s elapsed 93.22 %CPU
> $ time ~/u/src/git/bin-wrappers/git grep unfindable-string
> 0m1.28s user 0m3.56s sys 0m5.15s elapsed 94.22 %CPU
>
> 2 threads:
>
> $ time ~/u/src/git/bin-wrappers/git grep unfindable-string
> 0m1.36s user 0m3.64s sys 0m16.82s elapsed 29.75 %CPU
> $ time ~/u/src/git/bin-wrappers/git grep unfindable-string
> 0m1.38s user 0m3.66s sys 0m16.81s elapsed 30.04 %CPU
>
> My workdir is on NFS, where even though the repository is fully
> cached, the open()s must go to the server. Using more threads
> than CPUs makes it more likely that some thread isn't blocked.
This is good data.
It gives me ideas for how I can do some more testing.
>
> You could add a #threads knob,
Sure, adding a knob is not a bad idea.
> but then we'd have to get
> everybody on NFS to set that properly.
Indeed, I think you agree that it would be better if there was no need
for most people to fiddle with yet another knob.
> Or take a look at
> preload_index() to see how it guesses at how many threads it
> needs.
Good tip.
A quick peek at preload_index suggests that it was a bit of guesswork:
/*
* Mostly randomly chosen maximum thread counts: we
* cap the parallelism to 20 threads, and we want
* to have at least 500 lstat's per thread for it to
* be worth starting a thread.
*/
However, your comments make me wonder if a rule-of-thumb like "3 +
online_cpus()" would yield better results across both large and small
numbers of cores with either blazing fast or very slow storage.
I will create a setup similar to the one you describe and do some
exploration.
Cheers,
-Eric
--
http://www.freesa.org/ -- mobile: +31 620719662
aim: ericigps -- skype: eric_herman -- jabber: eric.herman@gmail.com
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2011-11-06 18:07 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-05 14:16 [PATCH] grep: detect number of CPUs for thread spawning Ævar Arnfjörð Bjarmason
2011-11-06 14:50 ` Pete Wyckoff
2011-11-06 18:00 ` Eric Herman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).