All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andreas Ericsson <ae@op5.se>
To: Nicolas Pitre <nico@cam.org>
Cc: Jon Smirl <jonsmirl@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: Better value for chunk_size when threaded
Date: Mon, 10 Dec 2007 11:26:59 +0100	[thread overview]
Message-ID: <475D1473.5090809@op5.se> (raw)
In-Reply-To: <alpine.LFD.0.99999.0712062014060.555@xanadu.home>

Nicolas Pitre wrote:
> On Thu, 6 Dec 2007, Jon Smirl wrote:
> 
>> I tried some various ideas out for chunk_size and the best strategy I
>> found was to simply set it to a constant. How does 20,000 work on
>> other CPUs?
> 
> That depends on the object size.  If you have a repo with big objects 
> but only 1000 of them for example, then the constant doesn't work.
> 
> Ideally I'd opt for a value that tend towards around 5 seconds worth of 
> work per segment, or something like that.  Maybe using the actual 
> objects size could be another way.
> 
>> I'd turn on default threaded support with this change. With threads=1
>> versus non-threaded there is no appreciable difference in the time.
> 
> Would need a way to determine pthreads availability from Makefile.
> 
>> Is there an API to ask how many CPUs are in the system? It would be
>> nice to default the number of threads equal to the number of CPUs and
>> only use pack.threads=X to override.
> 
> If there is one besides futzing with /proc/cpuinfo I'd like to know 
> about it.  Bonus points if it is portable.
> 

Here is such a one. I've sent it before, using git-send-email, but that
one doesn't seem to work too well for all list-members, probably because
my own laptop appears to be the original SMTP-server and its name can't
be looked up. Sorry for inlining it here instead of sending it as a mail
on its own, but I have absolutely no idea how to get git-send-email to
do ldap authentication and connect to our tls-enabled smtp-server
without using /usr/bin/sendmail and adding my laptop as originating
smtp-server.

This patch replaces the one I sent earlier and *should* work on
everything from Irix and AIX to Linux, Windows and every other
posixish system. It passes all tests, both with and without
THREADED_DELTA_SEARCH, and causes our weekly repack of our
mother-ship repos to run roughly 4 times as fast (4 cores, no
previous thread config).

Extract with
   sed -n -e /^##SEDMEHERE##/,/##TOHERE##/p -e /^##/d

##SEDMEHERE##
From ddf08303bd7962be385abbd5e964455a90ed6055 Mon Sep 17 00:00:00 2001
From: Andreas Ericsson <ae@op5.se>
Date: Thu, 6 Dec 2007 22:09:27 +0100
Subject: [PATCH] pack-objects: Add runtime detection of number of CPU's

Packing objects can be done in parallell nowadays, but
it's only done if the config option pack.threads is set
to a value above 1. Because of that, the code-path used
is sometimes not the most optimal one.

This patch adds a routine to detect the number of active
CPU's at runtime, which should provide a better default
and activate the (hopefully) better codepath more often.

The code is a rework of "numcpus.c", written by one
Philip Willoughby <pgw99@doc.ic.ac.uk>. numcpus.c is in
the public domain and can presently be downloaded from
http://csgsoft.doc.ic.ac.uk/numcpus/

Signed-off-by: Andreas Ericsson <ae@op5.se>
---
 builtin-pack-objects.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 48 insertions(+), 1 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 250dc56..ccf5198 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -17,6 +17,13 @@
 
 #ifdef THREADED_DELTA_SEARCH
 #include <pthread.h>
+# ifdef _WIN32
+#  define WIN32_LEAN_AND_MEAN
+# include <windows.h>
+# endif
+# if defined(hpux) || defined(__hpux) || defined(_hpux)
+#  include <sys/pstat.h>
+# endif
 #endif
 
 static const char pack_usage[] = "\
@@ -70,7 +77,7 @@ static int progress = 1;
 static int window = 10;
 static uint32_t pack_size_limit;
 static int depth = 50;
-static int delta_search_threads = 1;
+static long delta_search_threads;
 static int pack_to_stdout;
 static int num_preferred_base;
 static struct progress *progress_state;
@@ -2004,6 +2011,44 @@ static int adjust_perm(const char *path, mode_t mode)
 	return adjust_shared_perm(path);
 }
 
+/*
+ * this is a disgusting nest of #ifdef's. I just love
+ * non-portable interfaces. By doing it in two steps
+ * we can get the function to be fairly coherent anyways
+ */
+#ifndef _SC_NPROCESSORS_ONLN
+# ifdef _SC_NPROC_ONLN
+#  define _SC_NPROCESSORS_ONLN _SC_NPROC_ONLN
+# elif defined _SC_CRAY_NCPU
+#  define _SC_NPROCESSORS_ONLN _SC_CRAY_NCPU
+# endif
+#endif
+static long active_cpu_count(void)
+{
+#ifdef THREADED_DELTA_SEARCH
+# ifdef _SC_NPROCESSORS_ONLN
+	long ncpus;
+
+	if ((ncpus = (long)sysconf(_SC_NPROCESSORS_ONLN)) > 0)
+		return ncpus;
+# else
+#  ifdef _WIN32
+	SYSTEM_INFO info;
+	GetSystemInfo(&info);
+
+	return (long)info.dwNumberOfProcessors;
+#  endif /* _WIN32 */
+#  if defined(hpux) || defined(__hpux) || defined(_hpux)
+	struct pst_dynamic psd;
+
+	if (!pstat_getdynamic(&psd, sizeof(psd), (size_t)1, 0))
+		return (long)psd.psd_proc_cnt;
+#  endif /* hpux */
+# endif /* _SC_NPROCESSORS_ONLN */
+#endif /* THREADED_DELTA_SEARCH */
+	return 1;
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -2019,6 +2064,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	rp_av[1] = "--objects"; /* --thin will make it --objects-edge */
 	rp_ac = 2;
 
+	delta_search_threads = active_cpu_count();
+
 	git_config(git_pack_config);
 	if (!pack_compression_seen && core_compression_seen)
 		pack_compression_level = core_compression_level;
-- 
1.5.3.6.2031.gf9bdc
##TOHERE##

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

  parent reply	other threads:[~2007-12-10 10:27 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-06 23:58 Better value for chunk_size when threaded Jon Smirl
2007-12-07  1:27 ` Nicolas Pitre
2007-12-07  1:37   ` Jon Smirl
2007-12-07  3:25     ` Nicolas Pitre
2007-12-10 10:26   ` Andreas Ericsson [this message]
2007-12-10 13:46     ` Nicolas Pitre
2007-12-07  8:57 ` Andreas Ericsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=475D1473.5090809@op5.se \
    --to=ae@op5.se \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonsmirl@gmail.com \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.