* Better value for chunk_size when threaded
@ 2007-12-06 23:58 Jon Smirl
  2007-12-07  1:27 ` Nicolas Pitre
  2007-12-07  8:57 ` Andreas Ericsson
  0 siblings, 2 replies; 7+ messages in thread
From: Jon Smirl @ 2007-12-06 23:58 UTC
To: Git Mailing List, Nicolas Pitre

I tried some various ideas out for chunk_size and the best strategy I
found was to simply set it to a constant. How does 20,000 work on
other CPUs?

I'd turn on default threaded support with this change. With threads=1
versus non-threaded there is no appreciable difference in the time.

Is there an API to ask how many CPUs are in the system? It would be
nice to default the number of threads equal to the number of CPUs and
only use pack.threads=X to override.

Making all of this work by default should help when outside people
decide to do a massive import.

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4f44658..4d73be8 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1645,7 +1645,7 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 	}

 	/* this should be auto-tuned somehow */
-	chunk_size = window * 1000;
+	chunk_size = 20000;

 	do {
 		unsigned sublist_size = chunk_size;

with chunk_size = 20000, everything is on a q6600 4GB

threads = 5
time git repack -a -d -f --depth=250 --window=250
real    6m20.123s
user    20m25.841s
sys     0m5.520s

threads = 4
time git repack -a -d -f --depth=250 --window=250
real    6m15.525s
user    20m20.852s
sys     0m5.356s

threads = 4
time git repack -a -d -f
real    1m31.537s
user    3m2.063s
sys     0m3.064s

threads = 1
time git repack -a -d -f --depth=250 --window=250
real    18m46.005s
user    18m43.122s
sys     0m1.228s

threads = 1
time git repack -a -d -f
real    2m57.774s
user    2m54.211s
sys     0m1.228s

Non-threaded
time git repack -a -d -f --depth=250 --window=250
real    18m51.183s
user    18m46.538s
sys     0m1.604s

Non-threaded
time git repack -a -d -f
real    2m54.849s
user    2m51.267s
sys     0m1.412s

-- 
Jon Smirl
jonsmirl@gmail.com

* Re: Better value for chunk_size when threaded
  2007-12-06 23:58 Better value for chunk_size when threaded Jon Smirl
@ 2007-12-07  1:27 ` Nicolas Pitre
  2007-12-07  1:37   ` Jon Smirl
  2007-12-10 10:26   ` Andreas Ericsson
  2007-12-07  8:57 ` Andreas Ericsson
  1 sibling, 2 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-12-07 1:27 UTC
To: Jon Smirl; +Cc: Git Mailing List

On Thu, 6 Dec 2007, Jon Smirl wrote:

> I tried some various ideas out for chunk_size and the best strategy I
> found was to simply set it to a constant. How does 20,000 work on
> other CPUs?

That depends on the object size. If you have a repo with big objects
but only 1000 of them for example, then the constant doesn't work.

Ideally I'd opt for a value that tends towards around 5 seconds worth of
work per segment, or something like that. Maybe using the actual
objects size could be another way.

> I'd turn on default threaded support with this change. With threads=1
> versus non-threaded there is no appreciable difference in the time.

Would need a way to determine pthreads availability from Makefile.

> Is there an API to ask how many CPUs are in the system? It would be
> nice to default the number of threads equal to the number of CPUs and
> only use pack.threads=X to override.

If there is one besides futzing with /proc/cpuinfo I'd like to know
about it. Bonus points if it is portable.


Nicolas

* Re: Better value for chunk_size when threaded
  2007-12-07  1:27 ` Nicolas Pitre
@ 2007-12-07  1:37   ` Jon Smirl
  2007-12-07  3:25     ` Nicolas Pitre
  2007-12-10 10:26   ` Andreas Ericsson
  1 sibling, 1 reply; 7+ messages in thread
From: Jon Smirl @ 2007-12-07 1:37 UTC
To: Nicolas Pitre; +Cc: Git Mailing List

On 12/6/07, Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 6 Dec 2007, Jon Smirl wrote:
>
> > I tried some various ideas out for chunk_size and the best strategy I
> > found was to simply set it to a constant. How does 20,000 work on
> > other CPUs?
>
> That depends on the object size. If you have a repo with big objects
> but only 1000 of them for example, then the constant doesn't work.

How about defaulting it to 20,000 and allowing an override? It's not
fatal if we guess wrong, we just want the most common cases to work out
of the box. 20,000 is definitely better than the current window * 1000.

> Ideally I'd opt for a value that tends towards around 5 seconds worth of
> work per segment, or something like that. Maybe using the actual
> objects size could be another way.
>
> > I'd turn on default threaded support with this change. With threads=1
> > versus non-threaded there is no appreciable difference in the time.
>
> Would need a way to determine pthreads availability from Makefile.

configure knows if pthreads is there.

> > Is there an API to ask how many CPUs are in the system? It would be
> > nice to default the number of threads equal to the number of CPUs and
> > only use pack.threads=X to override.
>
> If there is one besides futzing with /proc/cpuinfo I'd like to know
> about it. Bonus points if it is portable.
>
>
> Nicolas
>

-- 
Jon Smirl
jonsmirl@gmail.com

* Re: Better value for chunk_size when threaded
  2007-12-07  1:37   ` Jon Smirl
@ 2007-12-07  3:25     ` Nicolas Pitre
  0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-12-07 3:25 UTC
To: Jon Smirl; +Cc: Git Mailing List

On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Nicolas Pitre <nico@cam.org> wrote:
> > On Thu, 6 Dec 2007, Jon Smirl wrote:
> >
> > > I tried some various ideas out for chunk_size and the best strategy I
> > > found was to simply set it to a constant. How does 20,000 work on
> > > other CPUs?
> >
> > That depends on the object size. If you have a repo with big objects
> > but only 1000 of them for example, then the constant doesn't work.
>
> How about defaulting it to 20,000 and allowing an override? It's not
> fatal if we guess wrong, we just want the most common cases to work out
> of the box. 20,000 is definitely better than the current window *
> 1000.

Sure. ... But I think this can be made much better than that with no
guessing at all.

Say you have 4 threads. Then let's divide the whole object list into 4
big segments and feed those to each thread. One thread will always
finish before the others. The idea is to find the active thread with
the largest amount of remaining objects to process at that point, and
steal half of them and give that to the thread that just finished.
Repeat for each thread that completes its segment until everything is
done.


Nicolas

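The splitting scheme described in the message above can be sketched in a few lines of C. This is only an illustration of the bookkeeping, not code from git: `struct segment`, `split_evenly()` and `steal_half()` are hypothetical names, and a real threaded version would have to hold a lock around `steal_half()` so that the victim thread is not advancing through its slice while it is being shrunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread bookkeeping: one contiguous slice of the object list. */
struct segment {
	size_t start;     /* index of the next object this thread will process */
	size_t remaining; /* number of objects still left in the slice */
};

/* Divide list_size objects into nr contiguous slices of near-equal length. */
static void split_evenly(struct segment *seg, int nr, size_t list_size)
{
	size_t start = 0;

	assert(nr > 0);
	for (int i = 0; i < nr; i++) {
		/* the first (list_size % nr) slices get one extra object */
		size_t len = list_size / nr + ((size_t)i < list_size % nr ? 1 : 0);

		seg[i].start = start;
		seg[i].remaining = len;
		start += len;
	}
}

/*
 * Thread `idle` has finished its slice. Pick the thread with the most
 * objects left and give the second half of its slice to `idle`.
 * Returns the number of objects handed over; 0 means nothing is left
 * that is worth splitting.
 */
static size_t steal_half(struct segment *seg, int nr, int idle)
{
	int victim = -1;
	size_t take;

	for (int i = 0; i < nr; i++) {
		if (i == idle)
			continue;
		if (victim < 0 || seg[i].remaining > seg[victim].remaining)
			victim = i;
	}
	if (victim < 0 || seg[victim].remaining < 2)
		return 0;

	take = seg[victim].remaining / 2;
	seg[victim].remaining -= take;
	/* the stolen half starts right after what the victim keeps */
	seg[idle].start = seg[victim].start + seg[victim].remaining;
	seg[idle].remaining = take;
	return take;
}
```

With 4 threads and 100,000 objects each slice starts at 25,000 objects. If one thread finishes while the others have not started, it takes 12,500 objects from the first of the largest slices. No chunk_size constant is involved, which is the point of the proposal.
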
* Re: Better value for chunk_size when threaded
  2007-12-07  1:27 ` Nicolas Pitre
  2007-12-07  1:37   ` Jon Smirl
@ 2007-12-10 10:26   ` Andreas Ericsson
  2007-12-10 13:46     ` Nicolas Pitre
  1 sibling, 1 reply; 7+ messages in thread
From: Andreas Ericsson @ 2007-12-10 10:26 UTC
To: Nicolas Pitre; +Cc: Jon Smirl, Git Mailing List, Junio C Hamano

Nicolas Pitre wrote:
> On Thu, 6 Dec 2007, Jon Smirl wrote:
>
>> I tried some various ideas out for chunk_size and the best strategy I
>> found was to simply set it to a constant. How does 20,000 work on
>> other CPUs?
>
> That depends on the object size. If you have a repo with big objects
> but only 1000 of them for example, then the constant doesn't work.
>
> Ideally I'd opt for a value that tends towards around 5 seconds worth of
> work per segment, or something like that. Maybe using the actual
> objects size could be another way.
>
>> I'd turn on default threaded support with this change. With threads=1
>> versus non-threaded there is no appreciable difference in the time.
>
> Would need a way to determine pthreads availability from Makefile.
>
>> Is there an API to ask how many CPUs are in the system? It would be
>> nice to default the number of threads equal to the number of CPUs and
>> only use pack.threads=X to override.
>
> If there is one besides futzing with /proc/cpuinfo I'd like to know
> about it. Bonus points if it is portable.
>

Here is such a one. I've sent it before, using git-send-email, but that
one doesn't seem to work too well for all list-members, probably because
my own laptop appears to be the original SMTP-server and its name can't
be looked up. Sorry for inlining it here instead of sending it as a mail
on its own, but I have absolutely no idea how to get git-send-email to
do ldap authentication and connect to our tls-enabled smtp-server
without using /usr/bin/sendmail and adding my laptop as originating
smtp-server.

This patch replaces the one I sent earlier and *should* work on
everything from Irix and AIX to Linux, Windows and every other
posixish system. It passes all tests, both with and without
THREADED_DELTA_SEARCH, and causes our weekly repack of our
mother-ship repos to run roughly 4 times as fast (4 cores, no
previous thread config).

Extract with
sed -n -e /^##SEDMEHERE##/,/##TOHERE##/p -e /^##/d

##SEDMEHERE##

From ddf08303bd7962be385abbd5e964455a90ed6055 Mon Sep 17 00:00:00 2001
From: Andreas Ericsson <ae@op5.se>
Date: Thu, 6 Dec 2007 22:09:27 +0100
Subject: [PATCH] pack-objects: Add runtime detection of number of CPU's

Packing objects can be done in parallel nowadays, but
it's only done if the config option pack.threads is set
to a value above 1. Because of that, the code-path used
is sometimes not the most optimal one.

This patch adds a routine to detect the number of active
CPU's at runtime, which should provide a better default
and activate the (hopefully) better codepath more often.

The code is a rework of "numcpus.c", written by one
Philip Willoughby <pgw99@doc.ic.ac.uk>. numcpus.c is in
the public domain and can presently be downloaded from
http://csgsoft.doc.ic.ac.uk/numcpus/

Signed-off-by: Andreas Ericsson <ae@op5.se>
---
 builtin-pack-objects.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 48 insertions(+), 1 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 250dc56..ccf5198 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -17,6 +17,13 @@

 #ifdef THREADED_DELTA_SEARCH
 #include <pthread.h>
+# ifdef _WIN32
+# define WIN32_LEAN_AND_MEAN
+# include <windows.h>
+# endif
+# if defined(hpux) || defined(__hpux) || defined(_hpux)
+# include <sys/pstat.h>
+# endif
 #endif

 static const char pack_usage[] = "\
@@ -70,7 +77,7 @@ static int progress = 1;
 static int window = 10;
 static uint32_t pack_size_limit;
 static int depth = 50;
-static int delta_search_threads = 1;
+static long delta_search_threads;
 static int pack_to_stdout;
 static int num_preferred_base;
 static struct progress *progress_state;
@@ -2004,6 +2011,44 @@ static int adjust_perm(const char *path, mode_t mode)
 	return adjust_shared_perm(path);
 }

+/*
+ * this is a disgusting nest of #ifdef's. I just love
+ * non-portable interfaces. By doing it in two steps
+ * we can get the function to be fairly coherent anyways
+ */
+#ifndef _SC_NPROCESSORS_ONLN
+# ifdef _SC_NPROC_ONLN
+# define _SC_NPROCESSORS_ONLN _SC_NPROC_ONLN
+# elif defined _SC_CRAY_NCPU
+# define _SC_NPROCESSORS_ONLN _SC_CRAY_NCPU
+# endif
+#endif
+static long active_cpu_count(void)
+{
+#ifdef THREADED_DELTA_SEARCH
+# ifdef _SC_NPROCESSORS_ONLN
+	long ncpus;
+
+	if ((ncpus = (long)sysconf(_SC_NPROCESSORS_ONLN)) > 0)
+		return ncpus;
+# else
+# ifdef _WIN32
+	SYSTEM_INFO info;
+	GetSystemInfo(&info);
+
+	return (long)info.dwNumberOfProcessors;
+# endif /* _WIN32 */
+# if defined(hpux) || defined(__hpux) || defined(_hpux)
+	struct pst_dynamic psd;
+
+	if (!pstat_getdynamic(&psd, sizeof(psd), (size_t)1, 0))
+		return (long)psd.psd_proc_cnt;
+# endif /* hpux */
+# endif /* _SC_NPROCESSORS_ONLN */
+#endif /* THREADED_DELTA_SEARCH */
+	return 1;
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -2019,6 +2064,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	rp_av[1] = "--objects"; /* --thin will make it --objects-edge */
 	rp_ac = 2;

+	delta_search_threads = active_cpu_count();
+
 	git_config(git_pack_config);
 	if (!pack_compression_seen && core_compression_seen)
 		pack_compression_level = core_compression_level;
-- 
1.5.3.6.2031.gf9bdc

##TOHERE##

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

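For readers who only need the common case, the core of the patch above reduces to a single sysconf() call with a fallback of 1. The following is a minimal standalone sketch, not the patch itself: `online_cpus()` and `pick_thread_count()` are hypothetical names, and treating a configured value of 0 as "auto-detect" is an assumption made for this example (in the patch, the detected count is simply assigned before git_config() runs, so pack.threads overrides it). _SC_NPROCESSORS_ONLN is a widely available extension rather than a POSIX requirement, hence the #ifdef.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: number of CPUs currently online, or 1 when the
 * platform offers no way to ask or the query fails.
 */
static long online_cpus(void)
{
#ifdef _SC_NPROCESSORS_ONLN
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	if (n > 0)
		return n;
#endif
	return 1;
}

/*
 * Hypothetical policy: an explicit setting wins; 0 (an assumed
 * "not configured" marker) falls back to the detected CPU count.
 */
static long pick_thread_count(long configured)
{
	assert(configured >= 0);
	return configured > 0 ? configured : online_cpus();
}
```

The Windows and HP-UX branches of the patch would slot into online_cpus() behind their own #ifdefs; they are left out here to keep the sketch buildable on an ordinary Unix-like system.
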
* Re: Better value for chunk_size when threaded
  2007-12-10 10:26   ` Andreas Ericsson
@ 2007-12-10 13:46     ` Nicolas Pitre
  0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Pitre @ 2007-12-10 13:46 UTC
To: Andreas Ericsson; +Cc: Jon Smirl, Git Mailing List, Junio C Hamano

On Mon, 10 Dec 2007, Andreas Ericsson wrote:

> Nicolas Pitre wrote:
> > If there is one besides futzing with /proc/cpuinfo I'd like to know about
> > it. Bonus points if it is portable.
> >
>
> Here is such a one. I've sent it before, using git-send-email, but that
> one doesn't seem to work too well for all list-members, probably because
> my own laptop appears to be the original SMTP-server and its name can't
> be looked up. Sorry for inlining it here instead of sending it as a mail
> on its own, but I have absolutely no idea how to get git-send-email to
> do ldap authentication and connect to our tls-enabled smtp-server
> without using /usr/bin/sendmail and adding my laptop as originating
> smtp-server.
>
> This patch replaces the one I sent earlier and *should* work on
> everything from Irix and AIX to Linux, Windows and every other
> posixish system. It passes all tests, both with and without
> THREADED_DELTA_SEARCH, and causes our weekly repack of our
> mother-ship repos to run roughly 4 times as fast (4 cores, no
> previous thread config).
>
> Extract with
> sed -n -e /^##SEDMEHERE##/,/##TOHERE##/p -e /^##/d
>
> ##SEDMEHERE##
>
> From ddf08303bd7962be385abbd5e964455a90ed6055 Mon Sep 17 00:00:00 2001
> From: Andreas Ericsson <ae@op5.se>
> Date: Thu, 6 Dec 2007 22:09:27 +0100
> Subject: [PATCH] pack-objects: Add runtime detection of number of CPU's
>
> Packing objects can be done in parallel nowadays, but
> it's only done if the config option pack.threads is set
> to a value above 1. Because of that, the code-path used
> is sometimes not the most optimal one.
>
> This patch adds a routine to detect the number of active
> CPU's at runtime, which should provide a better default
> and activate the (hopefully) better codepath more often.

Your patch is whitespace damaged.

Also please make it into a separate .c file. One day, maybe index-pack
will want to use it as well.


Nicolas

* Re: Better value for chunk_size when threaded
  2007-12-06 23:58 Better value for chunk_size when threaded Jon Smirl
  2007-12-07  1:27 ` Nicolas Pitre
@ 2007-12-07  8:57 ` Andreas Ericsson
  1 sibling, 0 replies; 7+ messages in thread
From: Andreas Ericsson @ 2007-12-07 8:57 UTC
To: Jon Smirl; +Cc: Git Mailing List, Nicolas Pitre

Jon Smirl wrote:
> I tried some various ideas out for chunk_size and the best strategy I
> found was to simply set it to a constant. How does 20,000 work on
> other CPUs?
>
> I'd turn on default threaded support with this change. With threads=1
> versus non-threaded there is no appreciable difference in the time.
>
> Is there an API to ask how many CPUs are in the system? It would be
> nice to default the number of threads equal to the number of CPUs and
> only use pack.threads=X to override.
>

I posted a patch to implement that just yesterday. It might need
some polishing, but it hasn't received any comments so far.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231