git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Ericsson <ae@op5.se>
To: Nicolas Pitre <nico@cam.org>
Cc: Jon Smirl <jonsmirl@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: Better value for chunk_size when threaded
Date: Mon, 10 Dec 2007 11:26:59 +0100	[thread overview]
Message-ID: <475D1473.5090809@op5.se> (raw)
In-Reply-To: <alpine.LFD.0.99999.0712062014060.555@xanadu.home>

Nicolas Pitre wrote:
> On Thu, 6 Dec 2007, Jon Smirl wrote:
> 
>> I tried some various ideas out for chunk_size and the best strategy I
>> found was to simply set it to a constant. How does 20,000 work on
>> other CPUs?
> 
> That depends on the object size.  If you have a repo with big objects 
> but only 1000 of them for example, then the constant doesn't work.
> 
> Ideally I'd opt for a value that tend towards around 5 seconds worth of 
> work per segment, or something like that.  Maybe using the actual 
> objects size could be another way.
> 
>> I'd turn on default threaded support with this change. With threads=1
>> versus non-threaded there is no appreciable difference in the time.
> 
> Would need a way to determine pthreads availability from Makefile.
> 
>> Is there an API to ask how many CPUs are in the system? It would be
>> nice to default the number of threads equal to the number of CPUs and
>> only use pack.threads=X to override.
> 
> If there is one besides futzing with /proc/cpuinfo I'd like to know 
> about it.  Bonus points if it is portable.
> 

Here is such a one. I've sent it before, using git-send-email, but that
one doesn't seem to work too well for all list-members, probably because
my own laptop appears to be the original SMTP-server and its name can't
be looked up. Sorry for inlining it here instead of sending it as a mail
on its own, but I have absolutely no idea how to get git-send-email to
do ldap authentication and connect to our tls-enabled smtp-server
without using /usr/bin/sendmail and adding my laptop as originating
smtp-server.

This patch replaces the one I sent earlier and *should* work on
everything from Irix and AIX to Linux, Windows and every other
posixish system. It passes all tests, both with and without
THREADED_DELTA_SEARCH, and causes our weekly repack of our
mother-ship repos to run roughly 4 times as fast (4 cores, no
previous thread config).

Extract with
   sed -n -e /^##SEDMEHERE##/,/##TOHERE##/p -e /^##/d

##SEDMEHERE##
From ddf08303bd7962be385abbd5e964455a90ed6055 Mon Sep 17 00:00:00 2001
From: Andreas Ericsson <ae@op5.se>
Date: Thu, 6 Dec 2007 22:09:27 +0100
Subject: [PATCH] pack-objects: Add runtime detection of number of CPU's

Packing objects can be done in parallell nowadays, but
it's only done if the config option pack.threads is set
to a value above 1. Because of that, the code-path used
is sometimes not the most optimal one.

This patch adds a routine to detect the number of active
CPU's at runtime, which should provide a better default
and activate the (hopefully) better codepath more often.

The code is a rework of "numcpus.c", written by one
Philip Willoughby <pgw99@doc.ic.ac.uk>. numcpus.c is in
the public domain and can presently be downloaded from
http://csgsoft.doc.ic.ac.uk/numcpus/

Signed-off-by: Andreas Ericsson <ae@op5.se>
---
 builtin-pack-objects.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 48 insertions(+), 1 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 250dc56..ccf5198 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -17,6 +17,13 @@
 
 #ifdef THREADED_DELTA_SEARCH
 #include <pthread.h>
+# ifdef _WIN32
+#  define WIN32_LEAN_AND_MEAN
+# include <windows.h>
+# endif
+# if defined(hpux) || defined(__hpux) || defined(_hpux)
+#  include <sys/pstat.h>
+# endif
 #endif
 
 static const char pack_usage[] = "\
@@ -70,7 +77,7 @@ static int progress = 1;
 static int window = 10;
 static uint32_t pack_size_limit;
 static int depth = 50;
-static int delta_search_threads = 1;
+static long delta_search_threads;
 static int pack_to_stdout;
 static int num_preferred_base;
 static struct progress *progress_state;
@@ -2004,6 +2011,44 @@ static int adjust_perm(const char *path, mode_t mode)
 	return adjust_shared_perm(path);
 }
 
+/*
+ * this is a disgusting nest of #ifdef's. I just love
+ * non-portable interfaces. By doing it in two steps
+ * we can get the function to be fairly coherent anyways
+ */
+#ifndef _SC_NPROCESSORS_ONLN
+# ifdef _SC_NPROC_ONLN
+#  define _SC_NPROCESSORS_ONLN _SC_NPROC_ONLN
+# elif defined _SC_CRAY_NCPU
+#  define _SC_NPROCESSORS_ONLN _SC_CRAY_NCPU
+# endif
+#endif
+static long active_cpu_count(void)
+{
+#ifdef THREADED_DELTA_SEARCH
+# ifdef _SC_NPROCESSORS_ONLN
+	long ncpus;
+
+	if ((ncpus = (long)sysconf(_SC_NPROCESSORS_ONLN)) > 0)
+		return ncpus;
+# else
+#  ifdef _WIN32
+	SYSTEM_INFO info;
+	GetSystemInfo(&info);
+
+	return (long)info.dwNumberOfProcessors;
+#  endif /* _WIN32 */
+#  if defined(hpux) || defined(__hpux) || defined(_hpux)
+	struct pst_dynamic psd;
+
+	if (!pstat_getdynamic(&psd, sizeof(psd), (size_t)1, 0))
+		return (long)psd.psd_proc_cnt;
+#  endif /* hpux */
+# endif /* _SC_NPROCESSORS_ONLN */
+#endif /* THREADED_DELTA_SEARCH */
+	return 1;
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -2019,6 +2064,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	rp_av[1] = "--objects"; /* --thin will make it --objects-edge */
 	rp_ac = 2;
 
+	delta_search_threads = active_cpu_count();
+
 	git_config(git_pack_config);
 	if (!pack_compression_seen && core_compression_seen)
 		pack_compression_level = core_compression_level;
-- 
1.5.3.6.2031.gf9bdc
##TOHERE##

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

  parent reply	other threads:[~2007-12-10 10:27 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-06 23:58 Better value for chunk_size when threaded Jon Smirl
2007-12-07  1:27 ` Nicolas Pitre
2007-12-07  1:37   ` Jon Smirl
2007-12-07  3:25     ` Nicolas Pitre
2007-12-10 10:26   ` Andreas Ericsson [this message]
2007-12-10 13:46     ` Nicolas Pitre
2007-12-07  8:57 ` Andreas Ericsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=475D1473.5090809@op5.se \
    --to=ae@op5.se \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonsmirl@gmail.com \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).