* Better value for chunk_size when threaded
@ 2007-12-06 23:58 Jon Smirl
2007-12-07 1:27 ` Nicolas Pitre
2007-12-07 8:57 ` Andreas Ericsson
0 siblings, 2 replies; 7+ messages in thread
From: Jon Smirl @ 2007-12-06 23:58 UTC (permalink / raw)
To: Git Mailing List, Nicolas Pitre
I tried out various ideas for chunk_size, and the best strategy I
found was simply to set it to a constant. How does 20,000 work on
other CPUs?
I'd turn on threaded support by default with this change. With threads=1
versus non-threaded there is no appreciable difference in the time.
Is there an API to ask how many CPUs are in the system? It would be
nice to default the number of threads to the number of CPUs and
only use pack.threads=X to override.
Making all of this work by default should help when outside people
decide to do a massive import.
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4f44658..4d73be8 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1645,7 +1645,7 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 	}
 	/* this should be auto-tuned somehow */
-	chunk_size = window * 1000;
+	chunk_size = 20000;
 	do {
 		unsigned sublist_size = chunk_size;
With chunk_size = 20000; everything was run on a Q6600 with 4GB of RAM.
threads = 5
time git repack -a -d -f --depth=250 --window=250
real 6m20.123s
user 20m25.841s
sys 0m5.520s
threads = 4
time git repack -a -d -f --depth=250 --window=250
real 6m15.525s
user 20m20.852s
sys 0m5.356s
threads = 4
time git repack -a -d -f
real 1m31.537s
user 3m2.063s
sys 0m3.064s
threads = 1
time git repack -a -d -f --depth=250 --window=250
real 18m46.005s
user 18m43.122s
sys 0m1.228s
threads = 1
time git repack -a -d -f
real 2m57.774s
user 2m54.211s
sys 0m1.228s
Non-threaded
time git repack -a -d -f --depth=250 --window=250
real 18m51.183s
user 18m46.538s
sys 0m1.604s
Non-threaded
time git repack -a -d -f
real 2m54.849s
user 2m51.267s
sys 0m1.412s
--
Jon Smirl
jonsmirl@gmail.com
* Re: Better value for chunk_size when threaded
2007-12-06 23:58 Better value for chunk_size when threaded Jon Smirl
@ 2007-12-07 1:27 ` Nicolas Pitre
2007-12-07 1:37 ` Jon Smirl
2007-12-10 10:26 ` Andreas Ericsson
2007-12-07 8:57 ` Andreas Ericsson
1 sibling, 2 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-12-07 1:27 UTC (permalink / raw)
To: Jon Smirl; +Cc: Git Mailing List
On Thu, 6 Dec 2007, Jon Smirl wrote:
> I tried some various ideas out for chunk_size and the best strategy I
> found was to simply set it to a constant. How does 20,000 work on
> other CPUs?
That depends on the object size. If you have a repo with big objects
but only 1000 of them for example, then the constant doesn't work.
Ideally I'd opt for a value that tends toward around 5 seconds' worth of
work per segment, or something like that. Maybe using the actual
object sizes could be another way.
> I'd turn on default threaded support with this change. With threads=1
> versus non-threaded there is no appreciable difference in the time.
That would need a way to determine pthreads availability from the Makefile.
> Is there an API to ask how many CPUs are in the system? It would be
> nice to default the number of threads equal to the number of CPUs and
> only use pack.threads=X to override.
If there is one besides futzing with /proc/cpuinfo I'd like to know
about it. Bonus points if it is portable.
Nicolas
* Re: Better value for chunk_size when threaded
2007-12-07 1:27 ` Nicolas Pitre
@ 2007-12-07 1:37 ` Jon Smirl
2007-12-07 3:25 ` Nicolas Pitre
2007-12-10 10:26 ` Andreas Ericsson
1 sibling, 1 reply; 7+ messages in thread
From: Jon Smirl @ 2007-12-07 1:37 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Git Mailing List
On 12/6/07, Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 6 Dec 2007, Jon Smirl wrote:
>
> > I tried some various ideas out for chunk_size and the best strategy I
> > found was to simply set it to a constant. How does 20,000 work on
> > other CPUs?
>
> That depends on the object size. If you have a repo with big objects
> but only 1000 of them for example, then the constant doesn't work.
How about defaulting it to 20,000 and allowing an override? It's not
fatal if we guess wrong; we just want the most common cases to work out
of the box. 20,000 is definitely better than the current window *
1000.
> Ideally I'd opt for a value that tend towards around 5 seconds worth of
> work per segment, or something like that. Maybe using the actual
> objects size could be another way.
>
> > I'd turn on default threaded support with this change. With threads=1
> > versus non-threaded there is no appreciable difference in the time.
>
> Would need a way to determine pthreads availability from Makefile.
configure knows if pthreads is there.
> > Is there an API to ask how many CPUs are in the system? It would be
> > nice to default the number of threads equal to the number of CPUs and
> > only use pack.threads=X to override.
>
> If there is one besides futzing with /proc/cpuinfo I'd like to know
> about it. Bonus points if it is portable.
>
>
> Nicolas
>
--
Jon Smirl
jonsmirl@gmail.com
* Re: Better value for chunk_size when threaded
2007-12-07 1:37 ` Jon Smirl
@ 2007-12-07 3:25 ` Nicolas Pitre
0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-12-07 3:25 UTC (permalink / raw)
To: Jon Smirl; +Cc: Git Mailing List
On Thu, 6 Dec 2007, Jon Smirl wrote:
> On 12/6/07, Nicolas Pitre <nico@cam.org> wrote:
> > On Thu, 6 Dec 2007, Jon Smirl wrote:
> >
> > > I tried some various ideas out for chunk_size and the best strategy I
> > > found was to simply set it to a constant. How does 20,000 work on
> > > other CPUs?
> >
> > That depends on the object size. If you have a repo with big objects
> > but only 1000 of them for example, then the constant doesn't work.
>
> How about defaulting it to 20,000 and allowing an override? It's not
> fatal if we guess wrong, we just want the most common cases to work out
> of the box. 20,000 is definitely better than the current window *
> 1000.
Sure.
... But I think this can be made much better than that with no guessing
at all.
Say you have 4 threads. Then let's divide the whole object list into 4
big segments and feed one to each thread.
One thread will always finish before the others. The idea is to find
the active thread with the largest number of remaining objects to
process at that point, steal half of them, and give them to the
thread that just finished. Repeat for each thread that completes its
segment until everything is done.
Nicolas
* Re: Better value for chunk_size when threaded
2007-12-06 23:58 Better value for chunk_size when threaded Jon Smirl
2007-12-07 1:27 ` Nicolas Pitre
@ 2007-12-07 8:57 ` Andreas Ericsson
1 sibling, 0 replies; 7+ messages in thread
From: Andreas Ericsson @ 2007-12-07 8:57 UTC (permalink / raw)
To: Jon Smirl; +Cc: Git Mailing List, Nicolas Pitre
Jon Smirl wrote:
> I tried some various ideas out for chunk_size and the best strategy I
> found was to simply set it to a constant. How does 20,000 work on
> other CPUs?
>
> I'd turn on default threaded support with this change. With threads=1
> versus non-threaded there is no appreciable difference in the time.
>
> Is there an API to ask how many CPUs are in the system? It would be
> nice to default the number of threads equal to the number of CPUs and
> only use pack.threads=X to override.
>
I posted a patch to implement that just yesterday. It might need some
polishing, but it hasn't received any comments so far.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
* Re: Better value for chunk_size when threaded
2007-12-07 1:27 ` Nicolas Pitre
2007-12-07 1:37 ` Jon Smirl
@ 2007-12-10 10:26 ` Andreas Ericsson
2007-12-10 13:46 ` Nicolas Pitre
1 sibling, 1 reply; 7+ messages in thread
From: Andreas Ericsson @ 2007-12-10 10:26 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jon Smirl, Git Mailing List, Junio C Hamano
Nicolas Pitre wrote:
> On Thu, 6 Dec 2007, Jon Smirl wrote:
>
>> I tried some various ideas out for chunk_size and the best strategy I
>> found was to simply set it to a constant. How does 20,000 work on
>> other CPUs?
>
> That depends on the object size. If you have a repo with big objects
> but only 1000 of them for example, then the constant doesn't work.
>
> Ideally I'd opt for a value that tend towards around 5 seconds worth of
> work per segment, or something like that. Maybe using the actual
> objects size could be another way.
>
>> I'd turn on default threaded support with this change. With threads=1
>> versus non-threaded there is no appreciable difference in the time.
>
> Would need a way to determine pthreads availability from Makefile.
>
>> Is there an API to ask how many CPUs are in the system? It would be
>> nice to default the number of threads equal to the number of CPUs and
>> only use pack.threads=X to override.
>
> If there is one besides futzing with /proc/cpuinfo I'd like to know
> about it. Bonus points if it is portable.
>
Here is one. I've sent it before, using git-send-email, but that
one doesn't seem to work too well for all list-members, probably because
my own laptop appears to be the original SMTP-server and its name can't
be looked up. Sorry for inlining it here instead of sending it as a mail
on its own, but I have absolutely no idea how to get git-send-email to
do ldap authentication and connect to our tls-enabled smtp-server
without using /usr/bin/sendmail and adding my laptop as originating
smtp-server.
This patch replaces the one I sent earlier and *should* work on
everything from Irix and AIX to Linux, Windows and every other
posixish system. It passes all tests, both with and without
THREADED_DELTA_SEARCH, and causes our weekly repack of our
mother-ship repos to run roughly 4 times as fast (4 cores, no
previous thread config).
Extract with
sed -n -e /^##SEDMEHERE##/,/##TOHERE##/p -e /^##/d
##SEDMEHERE##
From ddf08303bd7962be385abbd5e964455a90ed6055 Mon Sep 17 00:00:00 2001
From: Andreas Ericsson <ae@op5.se>
Date: Thu, 6 Dec 2007 22:09:27 +0100
Subject: [PATCH] pack-objects: Add runtime detection of number of CPUs
Packing objects can be done in parallel nowadays, but
it's only done if the config option pack.threads is set
to a value above 1. Because of that, the code-path used
is sometimes not the optimal one.
This patch adds a routine to detect the number of active
CPUs at runtime, which should provide a better default
and activate the (hopefully) better codepath more often.
The code is a rework of "numcpus.c", written by one
Philip Willoughby <pgw99@doc.ic.ac.uk>. numcpus.c is in
the public domain and can presently be downloaded from
http://csgsoft.doc.ic.ac.uk/numcpus/
Signed-off-by: Andreas Ericsson <ae@op5.se>
---
builtin-pack-objects.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 48 insertions(+), 1 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 250dc56..ccf5198 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -17,6 +17,13 @@
#ifdef THREADED_DELTA_SEARCH
#include <pthread.h>
+# ifdef _WIN32
+# define WIN32_LEAN_AND_MEAN
+# include <windows.h>
+# endif
+# if defined(hpux) || defined(__hpux) || defined(_hpux)
+# include <sys/pstat.h>
+# endif
#endif
static const char pack_usage[] = "\
@@ -70,7 +77,7 @@ static int progress = 1;
static int window = 10;
static uint32_t pack_size_limit;
static int depth = 50;
-static int delta_search_threads = 1;
+static long delta_search_threads;
static int pack_to_stdout;
static int num_preferred_base;
static struct progress *progress_state;
@@ -2004,6 +2011,44 @@ static int adjust_perm(const char *path, mode_t mode)
return adjust_shared_perm(path);
}
+/*
+ * this is a disgusting nest of #ifdef's. I just love
+ * non-portable interfaces. By doing it in two steps
+ * we can get the function to be fairly coherent anyways
+ */
+#ifndef _SC_NPROCESSORS_ONLN
+# ifdef _SC_NPROC_ONLN
+# define _SC_NPROCESSORS_ONLN _SC_NPROC_ONLN
+# elif defined _SC_CRAY_NCPU
+# define _SC_NPROCESSORS_ONLN _SC_CRAY_NCPU
+# endif
+#endif
+static long active_cpu_count(void)
+{
+#ifdef THREADED_DELTA_SEARCH
+# ifdef _SC_NPROCESSORS_ONLN
+ long ncpus;
+
+ if ((ncpus = (long)sysconf(_SC_NPROCESSORS_ONLN)) > 0)
+ return ncpus;
+# else
+# ifdef _WIN32
+ SYSTEM_INFO info;
+ GetSystemInfo(&info);
+
+ return (long)info.dwNumberOfProcessors;
+# endif /* _WIN32 */
+# if defined(hpux) || defined(__hpux) || defined(_hpux)
+ struct pst_dynamic psd;
+
+ if (!pstat_getdynamic(&psd, sizeof(psd), (size_t)1, 0))
+ return (long)psd.psd_proc_cnt;
+# endif /* hpux */
+# endif /* _SC_NPROCESSORS_ONLN */
+#endif /* THREADED_DELTA_SEARCH */
+ return 1;
+}
+
int cmd_pack_objects(int argc, const char **argv, const char *prefix)
{
int use_internal_rev_list = 0;
@@ -2019,6 +2064,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
rp_av[1] = "--objects"; /* --thin will make it --objects-edge */
rp_ac = 2;
+ delta_search_threads = active_cpu_count();
+
git_config(git_pack_config);
if (!pack_compression_seen && core_compression_seen)
pack_compression_level = core_compression_level;
--
1.5.3.6.2031.gf9bdc
##TOHERE##
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
* Re: Better value for chunk_size when threaded
2007-12-10 10:26 ` Andreas Ericsson
@ 2007-12-10 13:46 ` Nicolas Pitre
0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-12-10 13:46 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: Jon Smirl, Git Mailing List, Junio C Hamano
On Mon, 10 Dec 2007, Andreas Ericsson wrote:
> Nicolas Pitre wrote:
> > If there is one besides futzing with /proc/cpuinfo I'd like to know about
> > it. Bonus points if it is portable.
> >
>
> Here is such a one. I've sent it before, using git-send-email, but that
> one doesn't seem to work too well for all list-members, probably because
> my own laptop appears to be the original SMTP-server and its name can't
> be looked up. Sorry for inlining it here instead of sending it as a mail
> on its own, but I have absolutely no idea how to get git-send-email to
> do ldap authentication and connect to our tls-enabled smtp-server
> without using /usr/bin/sendmail and adding my laptop as originating
> smtp-server.
>
> This patch replaces the one I sent earlier and *should* work on
> everything from Irix and AIX to Linux, Windows and every other
> posixish system. It passes all tests, both with and without
> THREADED_DELTA_SEARCH, and causes our weekly repack of our
> mother-ship repos to run roughly 4 times as fast (4 cores, no
> previous thread config).
>
> Extract with
> sed -n -e /^##SEDMEHERE##/,/##TOHERE##/p -e /^##/d
>
> ##SEDMEHERE##
> From ddf08303bd7962be385abbd5e964455a90ed6055 Mon Sep 17 00:00:00 2001
> From: Andreas Ericsson <ae@op5.se>
> Date: Thu, 6 Dec 2007 22:09:27 +0100
> Subject: [PATCH] pack-objects: Add runtime detection of number of CPUs
>
> Packing objects can be done in parallel nowadays, but
> it's only done if the config option pack.threads is set
> to a value above 1. Because of that, the code-path used
> is sometimes not the optimal one.
>
> This patch adds a routine to detect the number of active
> CPUs at runtime, which should provide a better default
> and activate the (hopefully) better codepath more often.
Your patch is whitespace damaged.
Also please make it into a separate .c file. One day, maybe index-pack
will want to use it as well.
Nicolas