git.vger.kernel.org archive mirror
* [PATCH 1/2] index-pack: split second pass obj handling into own function
@ 2012-02-28  4:36 Nguyễn Thái Ngọc Duy
  2012-02-28  4:36 ` [PATCH 2/2] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
From: Nguyễn Thái Ngọc Duy @ 2012-02-28  4:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c |   31 ++++++++++++++++++-------------
 1 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dd1c5c9..918684f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -682,6 +682,23 @@ static int compare_delta_entry(const void *a, const void *b)
 				   objects[delta_b->obj_no].type);
 }
 
+/*
+ * Second pass:
+ * - for all non-delta objects, look if it is used as a base for
+ *   deltas;
+ * - if used as a base, uncompress the object and apply all deltas,
+ *   recursively checking if the resulting object is used as a base
+ *   for some more deltas.
+ */
+static void second_pass(struct object_entry *obj)
+{
+	struct base_data *base_obj = alloc_base_data();
+	base_obj->obj = obj;
+	base_obj->data = NULL;
+	find_unresolved_deltas(base_obj);
+	display_progress(progress, nr_resolved_deltas);
+}
+
 /* Parse all objects and return the pack content SHA1 hash */
 static void parse_pack_objects(unsigned char *sha1)
 {
@@ -736,26 +753,14 @@ static void parse_pack_objects(unsigned char *sha1)
 	qsort(deltas, nr_deltas, sizeof(struct delta_entry),
 	      compare_delta_entry);
 
-	/*
-	 * Second pass:
-	 * - for all non-delta objects, look if it is used as a base for
-	 *   deltas;
-	 * - if used as a base, uncompress the object and apply all deltas,
-	 *   recursively checking if the resulting object is used as a base
-	 *   for some more deltas.
-	 */
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
-		struct base_data *base_obj = alloc_base_data();
 
 		if (is_delta_type(obj->type))
 			continue;
-		base_obj->obj = obj;
-		base_obj->data = NULL;
-		find_unresolved_deltas(base_obj);
-		display_progress(progress, nr_resolved_deltas);
+		second_pass(obj);
 	}
 }
 
-- 
1.7.3.1.256.g2539c.dirty


* [PATCH 2/2] index-pack: support multithreaded delta resolving
  2012-02-28  4:36 [PATCH 1/2] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
@ 2012-02-28  4:36 ` Nguyễn Thái Ngọc Duy
  2012-03-02  6:09   ` Junio C Hamano
From: Nguyễn Thái Ngọc Duy @ 2012-02-28  4:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This moves delta resolution for each base object onto a separate
thread, with one base cache per thread.
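
A minimal, standalone sketch of the work distribution, assuming POSIX threads: each worker claims a batch of 16 object indexes under a mutex (the patch's work_mutex), skips delta objects, and processes the rest without holding the lock. The struct work_queue, worker() and run_queue() names are hypothetical, and the visits[] counter stands in for second_pass().

```c
#include <assert.h>
#include <pthread.h>

#define BATCH 16          /* objects claimed per lock acquisition, as in the patch */
#define MAX_THREADS 64

struct work_queue {
	pthread_mutex_t lock;  /* plays the role of work_mutex */
	const int *is_delta;   /* is_delta[i] != 0: object i is a delta, skip it */
	int *visits;           /* visits[i]: times object i was processed */
	int nr_objects;
	int nr_processed;      /* next unclaimed index; guarded by lock */
};

static void *worker(void *arg)
{
	struct work_queue *q = arg;

	for (;;) {
		int i, nr = BATCH;

		pthread_mutex_lock(&q->lock);
		/* skip delta objects so each batch starts at a base object */
		while (q->nr_processed < q->nr_objects &&
		       q->is_delta[q->nr_processed])
			q->nr_processed++;
		if (q->nr_processed >= q->nr_objects) {
			pthread_mutex_unlock(&q->lock);
			break;
		}
		i = q->nr_processed;   /* claim indexes [i, i + BATCH) */
		q->nr_processed += nr;
		pthread_mutex_unlock(&q->lock);

		/* work on the claimed range without holding the lock */
		for (; nr && i < q->nr_objects; i++, nr--) {
			if (q->is_delta[i])
				continue;
			q->visits[i]++; /* stands in for second_pass(&objects[i]) */
		}
	}
	return NULL;
}

/* Drains the queue with nr_threads workers; returns 0 on success. */
static int run_queue(struct work_queue *q, int nr_threads)
{
	pthread_t tids[MAX_THREADS];
	int t, created = 0, err = 0;

	assert(nr_threads > 0 && nr_threads <= MAX_THREADS);
	pthread_mutex_init(&q->lock, NULL);
	q->nr_processed = 0;
	for (t = 0; t < nr_threads; t++) {
		if (pthread_create(&tids[t], NULL, worker, q)) {
			err = -1;
			break;
		}
		created++;
	}
	for (t = 0; t < created; t++)
		pthread_join(tids[t], NULL);
	pthread_mutex_destroy(&q->lock);
	return err;
}
```

Because every claimed batch is a disjoint index range, no two workers touch the same object, so only the claim itself needs the lock.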

An experiment on a 24-core machine with git.git shows that performance
does not increase proportionally to the number of cores. So by default,
we use at most 3 cores.
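
The default thread count can be sketched as a pure function. This assumes the caller passes in the --threads/pack.threads value (0 meaning auto-detect) and the CPU count that online_cpus() would report. choose_nr_threads() is a hypothetical name and is not part of git.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the patch's logic: "requested" comes
 * from --threads or pack.threads (0 = auto-detect) and "online" is the
 * CPU count online_cpus() would return.
 */
static int choose_nr_threads(int requested, int online)
{
	int n = requested;

	assert(requested >= 0 && online >= 1);
	if (!n) {
		n = online;
		/* the experiment showed more threads did not mean faster */
		if (n > 3)
			n = 3;
	}
	/* one extra slot: thread_data[0] is reserved for the main thread */
	if (n > 1)
		n++;
	return n;
}
```

For example, auto-detection on the 24-core machine yields 4 slots: 3 worker threads plus thread_data[0] for the main thread.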

$ /usr/bin/time ~/t/git index-pack --threads=1 -v --stdin < XXX.pack
Receiving objects: 100% (146564/146564), 53.99 MiB | 17.47 MiB/s, done.
Resolving deltas: 100% (109205/109205), done.
pack    d5471e8365717a5812cbc81ec7277cb697a80f08
11.58user 0.37system 0:12.04elapsed 99%CPU (0avgtext+0avgdata 375088maxresident)k
0inputs+118592outputs (0major+56894minor)pagefaults 0swaps

$ ... --threads=2 ...
14.58user 0.47system 0:09.99elapsed 150%CPU (0avgtext+0avgdata 411536maxresident)k
0inputs+118592outputs (0major+79961minor)pagefaults 0swaps

$ ... --threads=3 ...
14.36user 0.64system 0:08.12elapsed 184%CPU (0avgtext+0avgdata 393312maxresident)k
0inputs+118592outputs (0major+50998minor)pagefaults 0swaps

$ ... --threads=4 ...
15.81user 0.71system 0:08.17elapsed 202%CPU (0avgtext+0avgdata 419152maxresident)k
0inputs+118592outputs (0major+54907minor)pagefaults 0swaps

$ ... --threads=5 ...
14.76user 0.72system 0:07.06elapsed 219%CPU (0avgtext+0avgdata 414112maxresident)k
0inputs+118592outputs (0major+59547minor)pagefaults 0swaps

$ ... --threads=8 ...
15.98user 0.81system 0:07.71elapsed 217%CPU (0avgtext+0avgdata 429904maxresident)k
0inputs+118592outputs (0major+66221minor)pagefaults 0swaps

$ ... --threads=12 ...
15.81user 0.74system 0:09.60elapsed 172%CPU (0avgtext+0avgdata 442336maxresident)k
0inputs+118592outputs (0major+61353minor)pagefaults 0swaps

$ ... --threads=16 ...
15.41user 0.57system 0:11.62elapsed 137%CPU (0avgtext+0avgdata 451728maxresident)k
0inputs+118592outputs (0major+63569minor)pagefaults 0swaps

$ ... --threads=24 ...
15.84user 0.63system 0:12.83elapsed 128%CPU (0avgtext+0avgdata 475728maxresident)k
0inputs+118592outputs (0major+58013minor)pagefaults 0swaps
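
To read these timings as scaling numbers, divide the single-thread elapsed time by the N-thread elapsed time to get the speedup, then divide by N to get the parallel efficiency. The following small helper uses hypothetical names:

```c
#include <assert.h>

/* Speedup of an N-thread run over the single-thread run (elapsed seconds). */
static double speedup(double t1_elapsed, double tn_elapsed)
{
	assert(tn_elapsed > 0.0);
	return t1_elapsed / tn_elapsed;
}

/* Parallel efficiency: speedup divided by the number of worker threads. */
static double efficiency(double t1_elapsed, double tn_elapsed, int nr_threads)
{
	assert(nr_threads > 0);
	return speedup(t1_elapsed, tn_elapsed) / nr_threads;
}
```

With the elapsed times above, --threads=3 gives a speedup of about 1.48 (efficiency about 0.49) and --threads=5 about 1.71. At --threads=24 the speedup is about 0.94, so that run is slower than a single thread.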

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 On linux-2.6.git with 2 cores, real time went down from 3m18s to around 2m.

 Documentation/git-index-pack.txt |   10 ++
 Makefile                         |    2 +-
 builtin/index-pack.c             |  188 ++++++++++++++++++++++++++++++++++----
 3 files changed, 180 insertions(+), 20 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..39e6d0d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,16 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--threads=<n>::
+	Specifies the number of threads to spawn when resolving
+	deltas. This requires that index-pack be compiled with
+	pthreads; otherwise this option is ignored with a warning.
+	This is meant to reduce indexing time on multiprocessor
+	machines. The memory used for the delta base cache is,
+	however, multiplied by the number of threads.
+	Specifying 0 will cause git to auto-detect the number of
+	CPUs and use at most 3 threads.
+
 
 Note
 ----
diff --git a/Makefile b/Makefile
index 1fb1705..5fae875 100644
--- a/Makefile
+++ b/Makefile
@@ -2159,7 +2159,7 @@ builtin/branch.o builtin/checkout.o builtin/clone.o builtin/reset.o branch.o tra
 builtin/bundle.o bundle.o transport.o: bundle.h
 builtin/bisect--helper.o builtin/rev-list.o bisect.o: bisect.h
 builtin/clone.o builtin/fetch-pack.o transport.o: fetch-pack.h
-builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
+builtin/index-pack.o builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
 builtin/send-pack.o transport.o: send-pack.h
 builtin/log.o builtin/shortlog.o: shortlog.h
 builtin/prune.o builtin/reflog.o reachable.o: reachable.h
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 918684f..e331f23 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -9,6 +9,7 @@
 #include "progress.h"
 #include "fsck.h"
 #include "exec_cmd.h"
+#include "thread-utils.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -38,6 +39,15 @@ struct base_data {
 	int ofs_first, ofs_last;
 };
 
+struct thread_local {
+#ifndef NO_PTHREADS
+	pthread_t thread;
+#endif
+	struct base_data *base_cache;
+	size_t base_cache_used;
+	int nr_resolved_deltas;
+};
+
 /*
  * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
  * to memcmp() only the first 20 bytes.
@@ -54,11 +64,12 @@ struct delta_entry {
 
 static struct object_entry *objects;
 static struct delta_entry *deltas;
-static struct base_data *base_cache;
-static size_t base_cache_used;
+static struct thread_local *thread_data;
 static int nr_objects;
+static int nr_processed;
 static int nr_deltas;
 static int nr_resolved_deltas;
+static int nr_threads;
 
 static int from_stdin;
 static int strict;
@@ -75,6 +86,42 @@ static git_SHA_CTX input_ctx;
 static uint32_t input_crc32;
 static int input_fd, output_fd, pack_fd;
 
+#ifndef NO_PTHREADS
+
+static pthread_mutex_t read_mutex;
+#define read_lock()		pthread_mutex_lock(&read_mutex)
+#define read_unlock()		pthread_mutex_unlock(&read_mutex)
+
+static pthread_mutex_t work_mutex;
+#define work_lock()		pthread_mutex_lock(&work_mutex)
+#define work_unlock()		pthread_mutex_unlock(&work_mutex)
+
+/*
+ * Mutex and conditional variable can't be statically-initialized on Windows.
+ */
+static void init_thread(void)
+{
+	init_recursive_mutex(&read_mutex);
+	pthread_mutex_init(&work_mutex, NULL);
+}
+
+static void cleanup_thread(void)
+{
+	pthread_mutex_destroy(&read_mutex);
+	pthread_mutex_destroy(&work_mutex);
+}
+
+#else
+
+#define read_lock()
+#define read_unlock()
+
+#define work_lock()
+#define work_unlock()
+
+#endif
+
+
 static int mark_link(struct object *obj, int type, void *data)
 {
 	if (!obj)
@@ -223,6 +270,18 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
 	die("pack has bad object at offset %lu: %s", offset, buf);
 }
 
+static struct thread_local *get_thread_data()
+{
+#ifndef NO_PTHREADS
+	int i;
+	pthread_t self = pthread_self();
+	for (i = 1; i < nr_threads; i++)
+		if (self == thread_data[i].thread)
+			return &thread_data[i];
+#endif
+	return &thread_data[0];
+}
+
 static struct base_data *alloc_base_data(void)
 {
 	struct base_data *base = xmalloc(sizeof(struct base_data));
@@ -237,15 +296,16 @@ static void free_base_data(struct base_data *c)
 	if (c->data) {
 		free(c->data);
 		c->data = NULL;
-		base_cache_used -= c->size;
+		get_thread_data()->base_cache_used -= c->size;
 	}
 }
 
 static void prune_base_data(struct base_data *retain)
 {
 	struct base_data *b;
-	for (b = base_cache;
-	     base_cache_used > delta_base_cache_limit && b;
+	struct thread_local *data = get_thread_data();
+	for (b = data->base_cache;
+	     data->base_cache_used > delta_base_cache_limit && b;
 	     b = b->child) {
 		if (b->data && b != retain)
 			free_base_data(b);
@@ -257,22 +317,23 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 	if (base)
 		base->child = c;
 	else
-		base_cache = c;
+		get_thread_data()->base_cache = c;
 
 	c->base = base;
 	c->child = NULL;
 	if (c->data)
-		base_cache_used += c->size;
+		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
 }
 
 static void unlink_base_data(struct base_data *c)
 {
-	struct base_data *base = c->base;
+	struct base_data *base;
+	base = c->base;
 	if (base)
 		base->child = NULL;
 	else
-		base_cache = NULL;
+		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
@@ -461,19 +522,24 @@ static void sha1_object(const void *data, unsigned long size,
 			enum object_type type, unsigned char *sha1)
 {
 	hash_sha1_file(data, size, typename(type), sha1);
+	read_lock();
 	if (has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
 		unsigned long has_size;
 		has_data = read_sha1_file(sha1, &has_type, &has_size);
+		read_unlock();
 		if (!has_data)
 			die("cannot read existing object %s", sha1_to_hex(sha1));
 		if (size != has_size || type != has_type ||
 		    memcmp(data, has_data, size) != 0)
 			die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
 		free(has_data);
-	}
+	} else
+		read_unlock();
+
 	if (strict) {
+		read_lock();
 		if (type == OBJ_BLOB) {
 			struct blob *blob = lookup_blob(sha1);
 			if (blob)
@@ -507,6 +573,7 @@ static void sha1_object(const void *data, unsigned long size,
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
+		read_unlock();
 	}
 }
 
@@ -552,7 +619,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -568,7 +635,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, "failed to apply delta");
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -596,7 +663,7 @@ static void resolve_delta(struct object_entry *delta_obj,
 		bad_object(delta_obj->idx.offset, "failed to apply delta");
 	sha1_object(result->data, result->size, delta_obj->real_type,
 		    delta_obj->idx.sha1);
-	nr_resolved_deltas++;
+	get_thread_data()->nr_resolved_deltas++;
 }
 
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
@@ -696,7 +763,35 @@ static void second_pass(struct object_entry *obj)
 	base_obj->obj = obj;
 	base_obj->data = NULL;
 	find_unresolved_deltas(base_obj);
-	display_progress(progress, nr_resolved_deltas);
+}
+
+static void *threaded_second_pass(void *arg)
+{
+	struct thread_local *data = get_thread_data();
+	for (;;) {
+		int i, nr = 16;
+		work_lock();
+		nr_resolved_deltas += data->nr_resolved_deltas;
+		display_progress(progress, nr_resolved_deltas);
+		data->nr_resolved_deltas = 0;
+		while (nr_processed < nr_objects &&
+		       is_delta_type(objects[nr_processed].type))
+			nr_processed++;
+		if (nr_processed >= nr_objects) {
+			work_unlock();
+			break;
+		}
+		i = nr_processed;
+		nr_processed += nr;
+		work_unlock();
+
+		for (; nr && i < nr_objects; i++, nr--) {
+			if (is_delta_type(objects[i].type))
+				continue;
+			second_pass(&objects[i]);
+		}
+	}
+	return NULL;
 }
 
 /* Parse all objects and return the pack content SHA1 hash */
@@ -755,13 +850,30 @@ static void parse_pack_objects(unsigned char *sha1)
 
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
-	for (i = 0; i < nr_objects; i++) {
-		struct object_entry *obj = &objects[i];
 
-		if (is_delta_type(obj->type))
-			continue;
-		second_pass(obj);
+	nr_processed = 0;
+#ifndef NO_PTHREADS
+	if (nr_threads > 1) {
+		init_thread();
+		for (i = 1; i < nr_threads; i++) {
+			int ret = pthread_create(&thread_data[i].thread, NULL,
+						 threaded_second_pass, NULL);
+			if (ret)
+				die("unable to create thread: %s", strerror(ret));
+		}
+		for (i = 1; i < nr_threads; i++) {
+			pthread_join(thread_data[i].thread, NULL);
+			thread_data[i].thread = 0;
+		}
+		cleanup_thread();
+
+		/* stop get_thread_data() from looking up beyond the
+		   first item, when fix_unresolved_deltas() runs */
+		nr_threads = 1;
+		return;
 	}
+#endif
+	threaded_second_pass(thread_data);
 }
 
 static int write_compressed(struct sha1file *f, void *in, unsigned int size)
@@ -967,6 +1079,17 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
 			die("bad pack.indexversion=%"PRIu32, opts->version);
 		return 0;
 	}
+	if (!strcmp(k, "pack.threads")) {
+		nr_threads = git_config_int(k, v);
+		if (nr_threads < 0)
+			die("invalid number of threads specified (%d)",
+			    nr_threads);
+#ifdef NO_PTHREADS
+		if (nr_threads != 1)
+			warning("no threads support, ignoring %s", k);
+#endif
+		return 0;
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -1125,6 +1248,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				keep_msg = "";
 			} else if (!prefixcmp(arg, "--keep=")) {
 				keep_msg = arg + 7;
+			} else if (!prefixcmp(arg, "--threads=")) {
+				char *end;
+				nr_threads = strtoul(arg+10, &end, 0);
+				if (!arg[10] || *end || nr_threads < 0)
+					usage(index_pack_usage);
+#ifdef NO_PTHREADS
+				if (nr_threads != 1)
+					warning("no threads support, "
+						"ignoring %s", arg);
+#endif
 			} else if (!prefixcmp(arg, "--pack_header=")) {
 				struct pack_header *hdr;
 				char *c;
@@ -1196,6 +1329,23 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (strict)
 		opts.flags |= WRITE_IDX_STRICT;
 
+#ifndef NO_PTHREADS
+	if (!nr_threads) {
+		nr_threads = online_cpus();
+		/* An experiment showed that more threads does not mean faster */
+		if (nr_threads > 3)
+			nr_threads = 3;
+	}
+	/* reserve thread_data[0] for the main thread */
+	if (nr_threads > 1)
+		nr_threads++;
+#else
+	if (nr_threads != 1)
+		warning("no threads support, ignoring --threads");
+	nr_threads = 1;
+#endif
+	thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+
 	curr_pack = open_pack_file(pack_name);
 	parse_pack_header();
 	objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
-- 
1.7.3.1.256.g2539c.dirty


* Re: [PATCH 2/2] index-pack: support multithreaded delta resolving
  2012-02-28  4:36 ` [PATCH 2/2] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
@ 2012-03-02  6:09   ` Junio C Hamano
  2012-03-02 13:42     ` [PATCH v2 " Nguyễn Thái Ngọc Duy
From: Junio C Hamano @ 2012-03-02  6:09 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

When applied to 25a7850 and then merged to 'pu', the result fails to
correctly produce pack .idx file.

I spent an hour or so this afternoon, scratching my head, staring at the
output from tests added to t5510 by today's tr/maint-bundle-boundary topic
to see where it breaks.  Its last test creates a bundle that has three
objects, extracts a pack from it, and runs "index-pack --fix-thin" on it.

This topic makes it fail with "fatal: pack has 1 unresolved deltas".


* [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-02  6:09   ` Junio C Hamano
@ 2012-03-02 13:42     ` Nguyễn Thái Ngọc Duy
  2012-03-02 18:53       ` Junio C Hamano
From: Nguyễn Thái Ngọc Duy @ 2012-03-02 13:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyễn Thái Ngọc Duy

This moves delta resolution for each base object onto a separate
thread, with one base cache per thread.

An experiment on a 24-core machine with git.git shows that performance
does not increase proportionally to the number of cores. So by default,
we use at most 3 cores.

$ /usr/bin/time ~/t/git index-pack --threads=1 -v --stdin < XXX.pack
Receiving objects: 100% (146564/146564), 53.99 MiB | 17.47 MiB/s, done.
Resolving deltas: 100% (109205/109205), done.
pack    d5471e8365717a5812cbc81ec7277cb697a80f08
11.58user 0.37system 0:12.04elapsed 99%CPU (0avgtext+0avgdata 375088maxresident)k
0inputs+118592outputs (0major+56894minor)pagefaults 0swaps

$ ... --threads=2 ...
14.58user 0.47system 0:09.99elapsed 150%CPU (0avgtext+0avgdata 411536maxresident)k
0inputs+118592outputs (0major+79961minor)pagefaults 0swaps

$ ... --threads=3 ...
14.36user 0.64system 0:08.12elapsed 184%CPU (0avgtext+0avgdata 393312maxresident)k
0inputs+118592outputs (0major+50998minor)pagefaults 0swaps

$ ... --threads=4 ...
15.81user 0.71system 0:08.17elapsed 202%CPU (0avgtext+0avgdata 419152maxresident)k
0inputs+118592outputs (0major+54907minor)pagefaults 0swaps

$ ... --threads=5 ...
14.76user 0.72system 0:07.06elapsed 219%CPU (0avgtext+0avgdata 414112maxresident)k
0inputs+118592outputs (0major+59547minor)pagefaults 0swaps

$ ... --threads=8 ...
15.98user 0.81system 0:07.71elapsed 217%CPU (0avgtext+0avgdata 429904maxresident)k
0inputs+118592outputs (0major+66221minor)pagefaults 0swaps

$ ... --threads=12 ...
15.81user 0.74system 0:09.60elapsed 172%CPU (0avgtext+0avgdata 442336maxresident)k
0inputs+118592outputs (0major+61353minor)pagefaults 0swaps

$ ... --threads=16 ...
15.41user 0.57system 0:11.62elapsed 137%CPU (0avgtext+0avgdata 451728maxresident)k
0inputs+118592outputs (0major+63569minor)pagefaults 0swaps

$ ... --threads=24 ...
15.84user 0.63system 0:12.83elapsed 128%CPU (0avgtext+0avgdata 475728maxresident)k
0inputs+118592outputs (0major+58013minor)pagefaults 0swaps

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 2012/3/2 Junio C Hamano <gitster@pobox.com>:
 > When applied to 25a7850 and then merged to 'pu', the result fails to
 > correctly produce pack .idx file.
 >
 > I spent an hour or so this afternoon, scratching my head, staring at the
 > output from tests added to t5510 by today's tr/maint-bundle-boundary topic
 > to see where it breaks.  Its last test creates a bundle that has three
 > objects, extracts a pack from it, and runs "index-pack --fix-thin" on it.
 >
 > This topic makes it fail with "fatal: pack has 1 unresolved deltas".

 And I thought it was good enough to CC you. Apparently parallel
 programming is hard. I made two mistakes:

 1. I made each thread save its resolved-delta counter in
    thread_data[].nr_resolved_deltas, then accumulated all of them into
    the global nr_resolved_deltas later. This plan does not work with
    fix_unresolved_deltas() because it runs in single-thread mode. It
    stores the counter in thread_data[0], but that counter is never
    added back to the global nr_resolved_deltas. This makes t5510 fail.

 2. The reason I put nr_resolved_deltas in the thread-local struct was
    to avoid locking. But I was wrong. There are still two places where
    thread_data[].nr_resolved_deltas can be changed: the increment in
    resolve_delta() and the reset in threaded_second_pass().
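
The first mistake can be reproduced with a single-threaded model. The model uses one worker slot whose counter is folded into the global total, as v1's threaded_second_pass() did. It also uses slot 0, which fix_unresolved_deltas() updates but which is never folded back. All names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of v1's per-slot resolved-delta counters. */
struct slot { int nr_resolved_deltas; };

static int global_resolved;

/* What v1's threaded_second_pass() did for its own slot. */
static void fold(struct slot *s)
{
	global_resolved += s->nr_resolved_deltas;
	s->nr_resolved_deltas = 0;
}

/* Returns how many deltas the final check would report as unresolved. */
static int simulate_v1(int nr_deltas, int from_threads, int from_fixup)
{
	struct slot slots[2] = { {0}, {0} };

	global_resolved = 0;
	slots[1].nr_resolved_deltas += from_threads; /* worker thread */
	fold(&slots[1]);
	slots[0].nr_resolved_deltas += from_fixup;   /* fix_unresolved_deltas() */
	/* the bug: nothing folds slots[0] back after this point */
	return nr_deltas - global_resolved;
}
```

simulate_v1(1, 0, 1) returns 1. This is consistent with the "pack has 1 unresolved deltas" failure for a thin pack whose only delta is resolved by fix_unresolved_deltas().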

 Since locking is required for changing nr_resolved_deltas anyway, I
 have removed thread_data[].nr_resolved_deltas and now lock the global
 nr_resolved_deltas properly. "pu" seems to be happy with the updated
 series.
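
The v2 approach, a single global counter protected by a mutex, can be sketched with POSIX threads. The names nr_resolved_deltas and counter_mutex mirror the patch, but resolver() and count_in_parallel() are hypothetical. Unlike v2's resolve_one_delta(), this sketch always takes the lock instead of first checking whether it runs on a worker thread. The mutex is initialized at run time, following the patch's note that mutexes cannot be statically initialized on Windows.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

/* Shared counter and its lock, mirroring nr_resolved_deltas/counter_mutex. */
static int nr_resolved_deltas;
static pthread_mutex_t counter_mutex;

static void resolve_one(void)
{
	pthread_mutex_lock(&counter_mutex);
	nr_resolved_deltas++;
	pthread_mutex_unlock(&counter_mutex);
}

/* Each resolver "resolves" *arg deltas. */
static void *resolver(void *arg)
{
	int i, n = *(const int *)arg;

	for (i = 0; i < n; i++)
		resolve_one();
	return NULL;
}

/* Runs nr_threads resolvers and returns the final global count. */
static int count_in_parallel(int nr_threads, int per_thread)
{
	pthread_t tids[MAX_THREADS];
	int t, created = 0;

	assert(nr_threads > 0 && nr_threads <= MAX_THREADS);
	nr_resolved_deltas = 0;
	pthread_mutex_init(&counter_mutex, NULL);
	for (t = 0; t < nr_threads; t++) {
		if (pthread_create(&tids[t], NULL, resolver, &per_thread))
			break;
		created++;
	}
	for (t = 0; t < created; t++)
		pthread_join(tids[t], NULL);
	pthread_mutex_destroy(&counter_mutex);
	return nr_resolved_deltas;
}
```

Without the lock, concurrent increments can be lost. With it, the total always equals the number of threads times the work per thread.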

 One other thing. I did not originally consider running
 fix_unresolved_deltas() in parallel because I didn't think it could
 be done. It can, but I'm not sure it's worth the effort. We can do
 that later if it turns out to be worth it.

 Documentation/git-index-pack.txt |   10 ++
 Makefile                         |    2 +-
 builtin/index-pack.c             |  214 ++++++++++++++++++++++++++++++++++----
 3 files changed, 206 insertions(+), 20 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..39e6d0d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,16 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--threads=<n>::
+	Specifies the number of threads to spawn when resolving
+	deltas. This requires that index-pack be compiled with
+	pthreads; otherwise this option is ignored with a warning.
+	This is meant to reduce indexing time on multiprocessor
+	machines. The memory used for the delta base cache is,
+	however, multiplied by the number of threads.
+	Specifying 0 will cause git to auto-detect the number of
+	CPUs and use at most 3 threads.
+
 
 Note
 ----
diff --git a/Makefile b/Makefile
index 1fb1705..5fae875 100644
--- a/Makefile
+++ b/Makefile
@@ -2159,7 +2159,7 @@ builtin/branch.o builtin/checkout.o builtin/clone.o builtin/reset.o branch.o tra
 builtin/bundle.o bundle.o transport.o: bundle.h
 builtin/bisect--helper.o builtin/rev-list.o bisect.o: bisect.h
 builtin/clone.o builtin/fetch-pack.o transport.o: fetch-pack.h
-builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
+builtin/index-pack.o builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
 builtin/send-pack.o transport.o: send-pack.h
 builtin/log.o builtin/shortlog.o: shortlog.h
 builtin/prune.o builtin/reflog.o reachable.o: reachable.h
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 918684f..edd7cbd 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -9,6 +9,7 @@
 #include "progress.h"
 #include "fsck.h"
 #include "exec_cmd.h"
+#include "thread-utils.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -39,6 +40,19 @@ struct base_data {
 };
 
 /*
+ * Thread-local data for find_unresolved_deltas(). The main process
+ * also takes thread_data[0] to run find_unresolved_deltas() as part
+ * of fix_unresolved_deltas().
+ */
+struct thread_local {
+#ifndef NO_PTHREADS
+	pthread_t thread;
+#endif
+	struct base_data *base_cache;
+	size_t base_cache_used;
+};
+
+/*
  * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
  * to memcmp() only the first 20 bytes.
  */
@@ -54,11 +68,12 @@ struct delta_entry {
 
 static struct object_entry *objects;
 static struct delta_entry *deltas;
-static struct base_data *base_cache;
-static size_t base_cache_used;
+static struct thread_local *thread_data;
 static int nr_objects;
+static int nr_processed;
 static int nr_deltas;
 static int nr_resolved_deltas;
+static int nr_threads;
 
 static int from_stdin;
 static int strict;
@@ -75,6 +90,49 @@ static git_SHA_CTX input_ctx;
 static uint32_t input_crc32;
 static int input_fd, output_fd, pack_fd;
 
+#ifndef NO_PTHREADS
+
+static pthread_mutex_t read_mutex;
+#define read_lock()		pthread_mutex_lock(&read_mutex)
+#define read_unlock()		pthread_mutex_unlock(&read_mutex)
+
+static pthread_mutex_t counter_mutex;
+#define counter_lock()		pthread_mutex_lock(&counter_mutex)
+#define counter_unlock()		pthread_mutex_unlock(&counter_mutex)
+
+static pthread_mutex_t work_mutex;
+#define work_lock()		pthread_mutex_lock(&work_mutex)
+#define work_unlock()		pthread_mutex_unlock(&work_mutex)
+
+/* Mutexes can't be statically initialized on Windows. */
+static void init_thread(void)
+{
+	init_recursive_mutex(&read_mutex);
+	pthread_mutex_init(&counter_mutex, NULL);
+	pthread_mutex_init(&work_mutex, NULL);
+}
+
+static void cleanup_thread(void)
+{
+	pthread_mutex_destroy(&read_mutex);
+	pthread_mutex_destroy(&counter_mutex);
+	pthread_mutex_destroy(&work_mutex);
+}
+
+#else
+
+#define read_lock()
+#define read_unlock()
+
+#define counter_lock()
+#define counter_unlock()
+
+#define work_lock()
+#define work_unlock()
+
+#endif
+
+
 static int mark_link(struct object *obj, int type, void *data)
 {
 	if (!obj)
@@ -223,6 +281,36 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
 	die("pack has bad object at offset %lu: %s", offset, buf);
 }
 
+static struct thread_local *get_thread_data(void)
+{
+#ifndef NO_PTHREADS
+	int i;
+	pthread_t self = pthread_self();
+	for (i = 1; i < nr_threads; i++)
+		if (self == thread_data[i].thread)
+			return &thread_data[i];
+#endif
+	return &thread_data[0];
+}
+
+static void resolve_one_delta(void)
+{
+#ifndef NO_PTHREADS
+	int i;
+	pthread_t self = pthread_self();
+	for (i = 1; i < nr_threads; i++)
+		if (self == thread_data[i].thread) {
+			counter_lock();
+			nr_resolved_deltas++;
+			counter_unlock();
+			return;
+		}
+#endif
+	assert(nr_threads == 1 &&
+	       "This should only be reached when all threads are gone");
+	nr_resolved_deltas++;
+}
+
 static struct base_data *alloc_base_data(void)
 {
 	struct base_data *base = xmalloc(sizeof(struct base_data));
@@ -237,15 +325,16 @@ static void free_base_data(struct base_data *c)
 	if (c->data) {
 		free(c->data);
 		c->data = NULL;
-		base_cache_used -= c->size;
+		get_thread_data()->base_cache_used -= c->size;
 	}
 }
 
 static void prune_base_data(struct base_data *retain)
 {
 	struct base_data *b;
-	for (b = base_cache;
-	     base_cache_used > delta_base_cache_limit && b;
+	struct thread_local *data = get_thread_data();
+	for (b = data->base_cache;
+	     data->base_cache_used > delta_base_cache_limit && b;
 	     b = b->child) {
 		if (b->data && b != retain)
 			free_base_data(b);
@@ -257,22 +346,23 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 	if (base)
 		base->child = c;
 	else
-		base_cache = c;
+		get_thread_data()->base_cache = c;
 
 	c->base = base;
 	c->child = NULL;
 	if (c->data)
-		base_cache_used += c->size;
+		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
 }
 
 static void unlink_base_data(struct base_data *c)
 {
-	struct base_data *base = c->base;
+	struct base_data *base;
+	base = c->base;
 	if (base)
 		base->child = NULL;
 	else
-		base_cache = NULL;
+		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
@@ -461,19 +551,24 @@ static void sha1_object(const void *data, unsigned long size,
 			enum object_type type, unsigned char *sha1)
 {
 	hash_sha1_file(data, size, typename(type), sha1);
+	read_lock();
 	if (has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
 		unsigned long has_size;
 		has_data = read_sha1_file(sha1, &has_type, &has_size);
+		read_unlock();
 		if (!has_data)
 			die("cannot read existing object %s", sha1_to_hex(sha1));
 		if (size != has_size || type != has_type ||
 		    memcmp(data, has_data, size) != 0)
 			die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
 		free(has_data);
-	}
+	} else
+		read_unlock();
+
 	if (strict) {
+		read_lock();
 		if (type == OBJ_BLOB) {
 			struct blob *blob = lookup_blob(sha1);
 			if (blob)
@@ -507,6 +602,7 @@ static void sha1_object(const void *data, unsigned long size,
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
+		read_unlock();
 	}
 }
 
@@ -552,7 +648,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -568,7 +664,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, "failed to apply delta");
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -596,7 +692,7 @@ static void resolve_delta(struct object_entry *delta_obj,
 		bad_object(delta_obj->idx.offset, "failed to apply delta");
 	sha1_object(result->data, result->size, delta_obj->real_type,
 		    delta_obj->idx.sha1);
-	nr_resolved_deltas++;
+	resolve_one_delta();
 }
 
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
@@ -696,7 +792,32 @@ static void second_pass(struct object_entry *obj)
 	base_obj->obj = obj;
 	base_obj->data = NULL;
 	find_unresolved_deltas(base_obj);
-	display_progress(progress, nr_resolved_deltas);
+}
+
+static void *threaded_second_pass(void *arg)
+{
+	for (;;) {
+		int i, nr = 16;
+		work_lock();
+		display_progress(progress, nr_resolved_deltas);
+		while (nr_processed < nr_objects &&
+		       is_delta_type(objects[nr_processed].type))
+			nr_processed++;
+		if (nr_processed >= nr_objects) {
+			work_unlock();
+			break;
+		}
+		i = nr_processed;
+		nr_processed += nr;
+		work_unlock();
+
+		for (; nr && i < nr_objects; i++, nr--) {
+			if (is_delta_type(objects[i].type))
+				continue;
+			second_pass(&objects[i]);
+		}
+	}
+	return NULL;
 }
 
 /* Parse all objects and return the pack content SHA1 hash */
@@ -755,13 +876,30 @@ static void parse_pack_objects(unsigned char *sha1)
 
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
-	for (i = 0; i < nr_objects; i++) {
-		struct object_entry *obj = &objects[i];
 
-		if (is_delta_type(obj->type))
-			continue;
-		second_pass(obj);
+	nr_processed = 0;
+#ifndef NO_PTHREADS
+	if (nr_threads > 1) {
+		init_thread();
+		for (i = 1; i < nr_threads; i++) {
+			int ret = pthread_create(&thread_data[i].thread, NULL,
+						 threaded_second_pass, NULL);
+			if (ret)
+				die("unable to create thread: %s", strerror(ret));
+		}
+		for (i = 1; i < nr_threads; i++) {
+			pthread_join(thread_data[i].thread, NULL);
+			thread_data[i].thread = 0;
+		}
+		cleanup_thread();
+
+		/* stop get_thread_data() from looking up beyond the
+		   first item, when fix_unresolved_deltas() runs */
+		nr_threads = 1;
+		return;
 	}
+#endif
+	threaded_second_pass(thread_data);
 }
 
 static int write_compressed(struct sha1file *f, void *in, unsigned int size)
@@ -967,6 +1105,17 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
 			die("bad pack.indexversion=%"PRIu32, opts->version);
 		return 0;
 	}
+	if (!strcmp(k, "pack.threads")) {
+		nr_threads = git_config_int(k, v);
+		if (nr_threads < 0)
+			die("invalid number of threads specified (%d)",
+			    nr_threads);
+#ifdef NO_PTHREADS
+		if (nr_threads != 1)
+			warning("no threads support, ignoring %s", k);
+#endif
+		return 0;
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -1125,6 +1274,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				keep_msg = "";
 			} else if (!prefixcmp(arg, "--keep=")) {
 				keep_msg = arg + 7;
+			} else if (!prefixcmp(arg, "--threads=")) {
+				char *end;
+				nr_threads = strtoul(arg+10, &end, 0);
+				if (!arg[10] || *end || nr_threads < 0)
+					usage(index_pack_usage);
+#ifdef NO_PTHREADS
+				if (nr_threads != 1)
+					warning("no threads support, "
+						"ignoring %s", arg);
+#endif
 			} else if (!prefixcmp(arg, "--pack_header=")) {
 				struct pack_header *hdr;
 				char *c;
@@ -1196,6 +1355,23 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (strict)
 		opts.flags |= WRITE_IDX_STRICT;
 
+#ifndef NO_PTHREADS
+	if (!nr_threads) {
+		nr_threads = online_cpus();
+		/* An experiment showed that more threads does not mean faster */
+		if (nr_threads > 3)
+			nr_threads = 3;
+	}
+	/* reserve thread_data[0] for the main thread */
+	if (nr_threads > 1)
+		nr_threads++;
+#else
+	if (nr_threads != 1)
+		warning("no threads support, ignoring --threads");
+	nr_threads = 1;
+#endif
+	thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+
 	curr_pack = open_pack_file(pack_name);
 	parse_pack_header();
 	objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
-- 
1.7.8.36.g69ee2

* Re: [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-02 13:42     ` [PATCH v2 " Nguyễn Thái Ngọc Duy
@ 2012-03-02 18:53       ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2012-03-02 18:53 UTC
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

>  One other thing. I did not consider to run fix_unresolved_deltas() in
>  parallel originally because I didn't think it could be done. It can.
>  But I'm not sure it's worth the effort. Anyway we can do that later
>  if it turns out worth it.

My hunch agrees with your "not sure it's worth", as these are objects at
the boundary of --thin transfer, which should only be proportional to the
size of a single snapshot, not the depth of history, and I think your "we
can do that later" is a sound judgement.

Thanks.

* [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-12  2:32 [PATCH v2 0/2] Multithread index-pack Nguyễn Thái Ngọc Duy
@ 2012-03-12  2:32 ` Nguyễn Thái Ngọc Duy
  2012-03-12 10:57   ` Thomas Rast
  2012-03-13  0:32   ` Ramsay Jones
  0 siblings, 2 replies; 12+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-12  2:32 UTC
  To: git; +Cc: Junio C Hamano, Ramsay Jones,
	Nguyễn Thái Ngọc Duy

This resolves the deltas of each base object on a separate thread, with
one base cache per thread. Per-thread data is grouped in struct
thread_local. When running with nr_threads == 1, no pthreads calls are
made, so index-pack essentially runs in non-threaded mode.
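A minimal sketch of the per-thread bookkeeping described above, assuming POSIX threads: each worker registers its own struct with pthread_setspecific() and later fetches it with pthread_getspecific(), so its cache counter can be updated without a lock. The names thread_local_sketch, get_local, worker and run_workers are hypothetical stand-ins, not the patch's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's struct thread_local. */
struct thread_local_sketch {
	size_t base_cache_used;	/* bytes charged to this thread's cache */
};

static pthread_key_t cache_key;
static struct thread_local_sketch fallback;	/* single-thread case */

/* Return the calling thread's private data, or the shared fallback. */
static struct thread_local_sketch *get_local(void)
{
	struct thread_local_sketch *p = pthread_getspecific(cache_key);
	return p ? p : &fallback;
}

/* Each worker binds its own slot, then updates it without locking. */
static void *worker(void *slot)
{
	pthread_setspecific(cache_key, slot);
	for (int i = 0; i < 1000; i++)
		get_local()->base_cache_used += 10;
	return NULL;
}

/* Start n workers (1..8), one slot each; returns 0 on success. */
static int run_workers(struct thread_local_sketch *slots, int n)
{
	pthread_t tid[8];
	int created = 0;

	assert(n >= 1 && n <= 8);
	if (pthread_key_create(&cache_key, NULL))
		return -1;
	while (created < n &&
	       !pthread_create(&tid[created], NULL, worker, &slots[created]))
		created++;
	for (int i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	pthread_key_delete(cache_key);
	return created == n ? 0 : -1;
}
```

With four slots, each ends up at 10000 and the fallback is never touched, because no two threads share a slot.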

An experiment on a 24-core Xeon machine with linux-2.6.git shows that
performance does not increase proportionally to the number of cores. So
by default, we use at most 3 cores. Some numbers with --threads from 1
to 16:

threads   1          2          3          4
real      1m16.310s  0m48.183s  0m37.866s  0m32.834s
user      1m13.773s  1m15.537s  1m15.781s  1m16.233s
sys       0m2.480s   0m3.936s   0m4.448s   0m4.852s

threads   5          6          7          8
real      0m33.170s  0m30.369s  0m28.406s  0m26.968s
user      1m31.474s  1m30.322s  1m29.562s  1m28.694s
sys       0m6.096s   0m6.268s   0m6.684s   0m7.172s

threads   9          10         11         12
real      0m26.288s  0m26.207s  0m26.239s  0m24.945s
user      1m29.530s  1m36.146s  1m43.134s  1m34.182s
sys       0m8.129s   0m8.437s   0m9.697s   0m10.201s

threads   13         14         15         16
real      0m25.110s  0m25.043s  0m23.955s  0m25.746s
user      1m39.262s  1m43.598s  1m38.350s  1m59.775s
sys       0m10.997s  0m11.553s  0m11.949s  0m13.689s
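Converting the "real" row to seconds makes the sub-linear scaling explicit. The sketch below hard-codes the wall-clock times from the table and computes the speedup relative to one thread; by that arithmetic 3 threads give about 2.0x, 4 threads about 2.3x, and the best run (15 threads) only about 3.2x, roughly 21% parallel efficiency. The helper names are illustrative only.

```c
#include <assert.h>

/* Wall-clock ("real") seconds from the table above; index = --threads. */
static const double real_s[17] = {
	0.0,	/* index 0 unused */
	76.310, 48.183, 37.866, 32.834,	/*  1..4  */
	33.170, 30.369, 28.406, 26.968,	/*  5..8  */
	26.288, 26.207, 26.239, 24.945,	/*  9..12 */
	25.110, 25.043, 23.955, 25.746,	/* 13..16 */
};

/* Speedup relative to a single thread. */
static double speedup(int n)
{
	assert(n >= 1 && n <= 16);
	return real_s[1] / real_s[n];
}

/* Parallel efficiency: 1.0 would mean perfectly proportional scaling. */
static double efficiency(int n)
{
	return speedup(n) / n;
}
```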

Thanks to Ramsay Jones for troubleshooting on the MinGW platform.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 I changed Ramsay's mutex patch a little bit and incorporated it here.
 Ramsay, it'd be great if you could try it again on MinGW.

 Documentation/git-index-pack.txt |   10 ++
 Makefile                         |    2 +-
 builtin/index-pack.c             |  198 ++++++++++++++++++++++++++++++++++----
 3 files changed, 192 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..39e6d0d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,16 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--threads=<n>::
+	Specifies the number of threads to spawn when resolving
+	deltas. This requires that index-pack be compiled with
+	pthreads, otherwise this option is ignored with a warning.
+	This is meant to reduce indexing time on multiprocessor
+	machines. The memory required for the delta base cache
+	is, however, multiplied by the number of threads.
+	Specifying 0 will cause git to auto-detect the number of CPUs
+	and use at most 3 threads.
+
 
 Note
 ----
diff --git a/Makefile b/Makefile
index 1fb1705..5fae875 100644
--- a/Makefile
+++ b/Makefile
@@ -2159,7 +2159,7 @@ builtin/branch.o builtin/checkout.o builtin/clone.o builtin/reset.o branch.o tra
 builtin/bundle.o bundle.o transport.o: bundle.h
 builtin/bisect--helper.o builtin/rev-list.o bisect.o: bisect.h
 builtin/clone.o builtin/fetch-pack.o transport.o: fetch-pack.h
-builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
+builtin/index-pack.o builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
 builtin/send-pack.o transport.o: send-pack.h
 builtin/log.o builtin/shortlog.o: shortlog.h
 builtin/prune.o builtin/reflog.o reachable.o: reachable.h
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 918684f..c6712cb 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -9,6 +9,7 @@
 #include "progress.h"
 #include "fsck.h"
 #include "exec_cmd.h"
+#include "thread-utils.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -38,6 +39,14 @@ struct base_data {
 	int ofs_first, ofs_last;
 };
 
+struct thread_local {
+#ifndef NO_PTHREADS
+	pthread_t thread;
+#endif
+	struct base_data *base_cache;
+	size_t base_cache_used;
+};
+
 /*
  * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
  * to memcmp() only the first 20 bytes.
@@ -54,11 +63,12 @@ struct delta_entry {
 
 static struct object_entry *objects;
 static struct delta_entry *deltas;
-static struct base_data *base_cache;
-static size_t base_cache_used;
+static struct thread_local nothread_data;
 static int nr_objects;
+static int nr_processed;
 static int nr_deltas;
 static int nr_resolved_deltas;
+static int nr_threads;
 
 static int from_stdin;
 static int strict;
@@ -75,6 +85,72 @@ static git_SHA_CTX input_ctx;
 static uint32_t input_crc32;
 static int input_fd, output_fd, pack_fd;
 
+#ifndef NO_PTHREADS
+
+static struct thread_local *thread_data;
+
+static pthread_mutex_t read_mutex;
+#define read_lock()		lock_mutex(&read_mutex)
+#define read_unlock()		unlock_mutex(&read_mutex)
+
+static pthread_mutex_t counter_mutex;
+#define counter_lock()		lock_mutex(&counter_mutex)
+#define counter_unlock()	unlock_mutex(&counter_mutex)
+
+static pthread_mutex_t work_mutex;
+#define work_lock()		lock_mutex(&work_mutex)
+#define work_unlock()		unlock_mutex(&work_mutex)
+
+static pthread_key_t key;
+
+static inline void lock_mutex(pthread_mutex_t *mutex)
+{
+	if (nr_threads > 1)
+		pthread_mutex_lock(mutex);
+}
+
+static inline void unlock_mutex(pthread_mutex_t *mutex)
+{
+	if (nr_threads > 1)
+		pthread_mutex_unlock(mutex);
+}
+
+/*
+ * Mutex and conditional variable can't be statically-initialized on Windows.
+ */
+static void init_thread(void)
+{
+	init_recursive_mutex(&read_mutex);
+	pthread_mutex_init(&counter_mutex, NULL);
+	pthread_mutex_init(&work_mutex, NULL);
+	pthread_key_create(&key, NULL);
+	thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+}
+
+static void cleanup_thread(void)
+{
+	pthread_mutex_destroy(&read_mutex);
+	pthread_mutex_destroy(&counter_mutex);
+	pthread_mutex_destroy(&work_mutex);
+	pthread_key_delete(key);
+	nr_threads = 1;
+	free(thread_data);
+}
+
+#else
+
+#define read_lock()
+#define read_unlock()
+
+#define counter_lock()
+#define counter_unlock()
+
+#define work_lock()
+#define work_unlock()
+
+#endif
+
+
 static int mark_link(struct object *obj, int type, void *data)
 {
 	if (!obj)
@@ -223,6 +299,17 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
 	die("pack has bad object at offset %lu: %s", offset, buf);
 }
 
+static struct thread_local *get_thread_data()
+{
+#ifndef NO_PTHREADS
+	if (nr_threads > 1)
+		return pthread_getspecific(key);
+#endif
+	assert(nr_threads == 1 &&
+	       "This should only be reached when all threads are gone");
+	return &nothread_data;
+}
+
 static struct base_data *alloc_base_data(void)
 {
 	struct base_data *base = xmalloc(sizeof(struct base_data));
@@ -237,15 +324,16 @@ static void free_base_data(struct base_data *c)
 	if (c->data) {
 		free(c->data);
 		c->data = NULL;
-		base_cache_used -= c->size;
+		get_thread_data()->base_cache_used -= c->size;
 	}
 }
 
 static void prune_base_data(struct base_data *retain)
 {
 	struct base_data *b;
-	for (b = base_cache;
-	     base_cache_used > delta_base_cache_limit && b;
+	struct thread_local *data = get_thread_data();
+	for (b = data->base_cache;
+	     data->base_cache_used > delta_base_cache_limit && b;
 	     b = b->child) {
 		if (b->data && b != retain)
 			free_base_data(b);
@@ -257,12 +345,12 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 	if (base)
 		base->child = c;
 	else
-		base_cache = c;
+		get_thread_data()->base_cache = c;
 
 	c->base = base;
 	c->child = NULL;
 	if (c->data)
-		base_cache_used += c->size;
+		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
 }
 
@@ -272,7 +360,7 @@ static void unlink_base_data(struct base_data *c)
 	if (base)
 		base->child = NULL;
 	else
-		base_cache = NULL;
+		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
@@ -461,19 +549,24 @@ static void sha1_object(const void *data, unsigned long size,
 			enum object_type type, unsigned char *sha1)
 {
 	hash_sha1_file(data, size, typename(type), sha1);
+	read_lock();
 	if (has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
 		unsigned long has_size;
 		has_data = read_sha1_file(sha1, &has_type, &has_size);
+		read_unlock();
 		if (!has_data)
 			die("cannot read existing object %s", sha1_to_hex(sha1));
 		if (size != has_size || type != has_type ||
 		    memcmp(data, has_data, size) != 0)
 			die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
 		free(has_data);
-	}
+	} else
+		read_unlock();
+
 	if (strict) {
+		read_lock();
 		if (type == OBJ_BLOB) {
 			struct blob *blob = lookup_blob(sha1);
 			if (blob)
@@ -507,6 +600,7 @@ static void sha1_object(const void *data, unsigned long size,
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
+		read_unlock();
 	}
 }
 
@@ -552,7 +646,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -568,7 +662,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, "failed to apply delta");
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -596,7 +690,9 @@ static void resolve_delta(struct object_entry *delta_obj,
 		bad_object(delta_obj->idx.offset, "failed to apply delta");
 	sha1_object(result->data, result->size, delta_obj->real_type,
 		    delta_obj->idx.sha1);
+	counter_lock();
 	nr_resolved_deltas++;
+	counter_unlock();
 }
 
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
@@ -696,7 +792,31 @@ static void second_pass(struct object_entry *obj)
 	base_obj->obj = obj;
 	base_obj->data = NULL;
 	find_unresolved_deltas(base_obj);
-	display_progress(progress, nr_resolved_deltas);
+}
+
+static void *threaded_second_pass(void *arg)
+{
+#ifndef NO_PTHREADS
+	if (nr_threads > 1)
+		pthread_setspecific(key, arg);
+#endif
+	for (;;) {
+		int i;
+		work_lock();
+		display_progress(progress, nr_resolved_deltas);
+		while (nr_processed < nr_objects &&
+		       is_delta_type(objects[nr_processed].type))
+			nr_processed++;
+		if (nr_processed >= nr_objects) {
+			work_unlock();
+			break;
+		}
+		i = nr_processed++;
+		work_unlock();
+
+		second_pass(&objects[i]);
+	}
+	return NULL;
 }
 
 /* Parse all objects and return the pack content SHA1 hash */
@@ -755,13 +875,25 @@ static void parse_pack_objects(unsigned char *sha1)
 
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
-	for (i = 0; i < nr_objects; i++) {
-		struct object_entry *obj = &objects[i];
 
-		if (is_delta_type(obj->type))
-			continue;
-		second_pass(obj);
+	nr_processed = 0;
+#ifndef NO_PTHREADS
+	if (nr_threads > 1) {
+		init_thread();
+		for (i = 0; i < nr_threads; i++) {
+			int ret = pthread_create(&thread_data[i].thread, NULL,
+						 threaded_second_pass, thread_data + i);
+			if (ret)
+				die("unable to create thread: %s", strerror(ret));
+		}
+		for (i = 0; i < nr_threads; i++)
+			pthread_join(thread_data[i].thread, NULL);
+
+		cleanup_thread();
+		return;
 	}
+#endif
+	threaded_second_pass(NULL);
 }
 
 static int write_compressed(struct sha1file *f, void *in, unsigned int size)
@@ -967,6 +1099,18 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
 			die("bad pack.indexversion=%"PRIu32, opts->version);
 		return 0;
 	}
+	if (!strcmp(k, "pack.threads")) {
+		nr_threads = git_config_int(k, v);
+		if (nr_threads < 0)
+			die("invalid number of threads specified (%d)",
+			    nr_threads);
+#ifdef NO_PTHREADS
+		if (nr_threads != 1)
+			warning("no threads support, ignoring %s", k);
+		nr_threads = 1;
+#endif
+		return 0;
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -1125,6 +1269,17 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				keep_msg = "";
 			} else if (!prefixcmp(arg, "--keep=")) {
 				keep_msg = arg + 7;
+			} else if (!prefixcmp(arg, "--threads=")) {
+				char *end;
+				nr_threads = strtoul(arg+10, &end, 0);
+				if (!arg[10] || *end || nr_threads < 0)
+					usage(index_pack_usage);
+#ifdef NO_PTHREADS
+				if (nr_threads != 1)
+					warning("no threads support, "
+						"ignoring %s", arg);
+				nr_threads = 1;
+#endif
 			} else if (!prefixcmp(arg, "--pack_header=")) {
 				struct pack_header *hdr;
 				char *c;
@@ -1196,6 +1351,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (strict)
 		opts.flags |= WRITE_IDX_STRICT;
 
+#ifndef NO_PTHREADS
+	if (!nr_threads) {
+		nr_threads = online_cpus();
+		/* An experiment showed that more threads does not mean faster */
+		if (nr_threads > 3)
+			nr_threads = 3;
+	}
+#endif
+
 	curr_pack = open_pack_file(pack_name);
 	parse_pack_header();
 	objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
-- 
1.7.3.1.256.g2539c.dirty

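The work distribution in threaded_second_pass() above reduces to a shared cursor protected by a mutex: each thread skips delta entries, claims one base object, drops the lock, and does the expensive work unlocked. A minimal standalone sketch of that pattern follows, with hypothetical names (is_delta, done, claim_loop, run_claimers) standing in for the patch's objects[] and second_pass(). It uses PTHREAD_MUTEX_INITIALIZER for brevity, which POSIX allows but which the patch's init_thread() comment says is not possible on Windows.

```c
#include <assert.h>
#include <pthread.h>

#define NR_ITEMS 1000

static int is_delta[NR_ITEMS];	/* stand-in for is_delta_type(objects[i].type) */
static int done[NR_ITEMS];	/* times each item was processed */
static int next_item;		/* shared cursor, like nr_processed */
static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *claim_loop(void *unused)
{
	(void)unused;
	for (;;) {
		int i;

		pthread_mutex_lock(&work_mutex);
		/* deltas are not claimed; they are reached from their bases */
		while (next_item < NR_ITEMS && is_delta[next_item])
			next_item++;
		if (next_item >= NR_ITEMS) {
			pthread_mutex_unlock(&work_mutex);
			break;
		}
		i = next_item++;	/* claim exactly one base object */
		pthread_mutex_unlock(&work_mutex);

		done[i]++;	/* second_pass() would run here, unlocked */
	}
	return NULL;
}

/* Run nthreads claimers (1..16) over all items; 0 on success. */
static int run_claimers(int nthreads)
{
	pthread_t tid[16];
	int created = 0;

	assert(nthreads >= 1 && nthreads <= 16);
	next_item = 0;
	while (created < nthreads &&
	       !pthread_create(&tid[created], NULL, claim_loop, NULL))
		created++;
	for (int i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	return created == nthreads ? 0 : -1;
}
```

Because each index is handed out exactly once under the lock, done[i]++ needs no further synchronisation.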

* Re: [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-12  2:32 ` [PATCH v2 2/2] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
@ 2012-03-12 10:57   ` Thomas Rast
  2012-03-12 11:42     ` Nguyen Thai Ngoc Duy
  2012-03-13  0:32   ` Ramsay Jones
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Rast @ 2012-03-12 10:57 UTC
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano, Ramsay Jones

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> This puts delta resolving on each base on a separate thread, one base
> cache per thread. Per-thread data is grouped in struct thread_local.
> When running with nr_threads == 1, no pthreads calls are made. The
> system essentially runs in non-thread mode.

As discussed when we took the git-grep measurements, it may be
interesting to have a way to run 1 thread.  Can you put in such an
option?

> An experiment on a Xeon 24 core machine with linux-2.6.git shows that
> performance does not increase proportional to the number of cores. So
> by default, we use maximum 3 cores. Some numbers with --threads from 1
> to 16:
>
> 1..4
> real    1m16.310s  0m48.183s  0m37.866s  0m32.834s
> user    1m13.773s  1m15.537s  1m15.781s  1m16.233s
> sys     0m2.480s   0m3.936s   0m4.448s   0m4.852s
>
> 5..8
> real    0m33.170s  0m30.369s  0m28.406s  0m26.968s
> user    1m31.474s  1m30.322s  1m29.562s  1m28.694s
> sys     0m6.096s   0m6.268s   0m6.684s   0m7.172s

Interesting.  Is this a real 24-core machine or 12*2 hyperthreaded?
Does it use Turbo Boost and how far (how fast and on how many cores
simultaneously) does that go?

I'm asking because if Turbo Boost starts to wear off around 4 cores,
like these measurements suggest, then it may not be beneficial to spawn
threads on 2*2HT CPUs (found in many laptops) where Turbo Boost only
really works if you only use a single core.

Oh, and could you write a perf test for this? :-)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-12 10:57   ` Thomas Rast
@ 2012-03-12 11:42     ` Nguyen Thai Ngoc Duy
  2012-03-12 11:47       ` Thomas Rast
  0 siblings, 1 reply; 12+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-12 11:42 UTC
  To: Thomas Rast; +Cc: git, Junio C Hamano, Ramsay Jones

2012/3/12 Thomas Rast <trast@inf.ethz.ch>:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> This puts delta resolving on each base on a separate thread, one base
>> cache per thread. Per-thread data is grouped in struct thread_local.
>> When running with nr_threads == 1, no pthreads calls are made. The
>> system essentially runs in non-thread mode.
>
> As discussed when we took the git-grep measurements, it may be
> interesting to have a way to run 1 thread.  Can you put in such an
> option?

Sorry, I wasn't clear: nr_threads == 1 is equivalent to --threads=1, so
yes, it supports running in non-threaded mode.

>> An experiment on a Xeon 24 core machine with linux-2.6.git shows that
>> performance does not increase proportional to the number of cores. So
>> by default, we use maximum 3 cores. Some numbers with --threads from 1
>> to 16:
>>
>> 1..4
>> real    1m16.310s  0m48.183s  0m37.866s  0m32.834s
>> user    1m13.773s  1m15.537s  1m15.781s  1m16.233s
>> sys     0m2.480s   0m3.936s   0m4.448s   0m4.852s
>>
>> 5..8
>> real    0m33.170s  0m30.369s  0m28.406s  0m26.968s
>> user    1m31.474s  1m30.322s  1m29.562s  1m28.694s
>> sys     0m6.096s   0m6.268s   0m6.684s   0m7.172s
>
> Interesting.  Is this a real 24-core machine or 12*2 hyperthreaded?
> Does it use Turbo Boost and how far (how fast and on how many cores
> simultaneously) does that go?

I'll check on that later.

> I'm asking because if Turbo Boost starts to wear off around 4 cores,
> like these measurements suggest, then it may not be beneficial to spawn
> threads on 2*2HT CPUs (found in many laptops) where Turbo Boost only
> really works if you only use a single core.

That might explain why it performs poorly on my laptop with two
(probably HT) cores beyond 4 threads. I was worried there was some
contention in the code (and failed to find any) that made it perform
worse as more threads were spawned. Any pointers for identifying CPU
features on Linux?
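One possible answer for x86 Linux, offered as a heuristic rather than a definitive check: /proc/cpuinfo reports, per logical CPU, a "siblings" count (logical CPUs in the package) and a "cpu cores" count (physical cores in the package); siblings greater than cpu cores usually means hyperthreading is on. Other architectures may not report these fields, and nothing here reveals Turbo Boost behaviour. The "Thread(s) per core" line printed by lscpu carries the same information. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Heuristic: 1 if "siblings" > "cpu cores" (SMT/HT likely on),
 * 0 if not, -1 if either field is missing. Looks at the first
 * occurrence of each field only.
 */
static int smt_from_cpuinfo(const char *text)
{
	const char *s, *c;
	int siblings, cores;

	assert(text != NULL);
	s = strstr(text, "siblings");
	c = strstr(text, "cpu cores");
	if (!s || !c)
		return -1;
	/* a space in the format matches any run of whitespace, e.g. "\t" */
	if (sscanf(s, "siblings : %d", &siblings) != 1 ||
	    sscanf(c, "cpu cores : %d", &cores) != 1)
		return -1;
	return siblings > cores;
}

/* Read the start of /proc/cpuinfo; -1 if unavailable (non-Linux, etc.). */
static int smt_enabled(void)
{
	char buf[16384];
	size_t n;
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return smt_from_cpuinfo(buf);
}
```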

> Oh, and could you write a perf test for this? :-)

Yeah, about that: index-pack is mostly used as part of git-fetch or
git-clone. Maybe we need to add --threads to those commands too; then
we can see how clone/fetch performs. I'll need such tests anyway if
I'm going to push for a cheaper connectivity check in git-fetch in
another thread.

I guess one test with --threads=1, one with --threads=2 and one without
--threads. Any ideas? We could try testing it on half the available
cores, all cores, and double the available cores, but that would
require exporting online_cpus(), perhaps via a test command. I didn't
see a perf test for grep --threads either (I wanted to use it as a
template).
-- 
Duy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-12 11:42     ` Nguyen Thai Ngoc Duy
@ 2012-03-12 11:47       ` Thomas Rast
  2012-03-12 12:18         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Rast @ 2012-03-12 11:47 UTC
  To: Nguyen Thai Ngoc Duy; +Cc: git, Junio C Hamano, Ramsay Jones

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> 2012/3/12 Thomas Rast <trast@inf.ethz.ch>:
>> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>>
>>> This puts delta resolving on each base on a separate thread, one base
>>> cache per thread. Per-thread data is grouped in struct thread_local.
>>> When running with nr_threads == 1, no pthreads calls are made. The
>>> system essentially runs in non-thread mode.
>>
>> As discussed when we took the git-grep measurements, it may be
>> interesting to have a way to run 1 thread.  Can you put in such an
>> option?
>
> Sorry I wasn't clear, nr_threads == 1 is equivalent to --threads=1. So
> yes it supports running in non-thread mode.

Well, in that case I wasn't clear: I meant that there should be a way to
run with the whole threading machinery enabled, but still only have one
thread (doing the work, possibly having another that fills the queue).

That allows us to see how big the overhead is.

>> Oh, and could you write a perf test for this? :-)
>
> Yeah, about that, index-pack is mostly used as part of git-fetch or
> git-clone. Maybe we need to add --threads to those commands too, then
> we can see how clone/fetch performs. I'll need such tests anyway if
> I'm going to push for cheaper connectivity check in git-fetch in
> another thread.
>
> I guess one test with --threads=1, one with threads=2 and one without
> --threads. Any ideas? We can try testing it on half available cores,
> all cores, double available cores, but that would require exporting
> online_cpus(), perhaps via test command. I didn't see grep --threads
> perf test either (wanted to use it as template..)

A simple one is in t/perf/p7810-grep.sh (in master already).  It doesn't
test threads though.  For index-pack you'll also have to find a good way
to choose a pack, perhaps simply the biggest one in the repo.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-12 11:47       ` Thomas Rast
@ 2012-03-12 12:18         ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 12+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-12 12:18 UTC
  To: Thomas Rast; +Cc: git, Junio C Hamano, Ramsay Jones

On Mon, Mar 12, 2012 at 6:47 PM, Thomas Rast <trast@inf.ethz.ch> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>> 2012/3/12 Thomas Rast <trast@inf.ethz.ch>:
>>> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>>>
>>>> This puts delta resolving on each base on a separate thread, one base
>>>> cache per thread. Per-thread data is grouped in struct thread_local.
>>>> When running with nr_threads == 1, no pthreads calls are made. The
>>>> system essentially runs in non-thread mode.
>>>
>>> As discussed when we took the git-grep measurements, it may be
>>> interesting to have a way to run 1 thread.  Can you put in such an
>>> option?
>>
>> Sorry I wasn't clear, nr_threads == 1 is equivalent to --threads=1. So
>> yes it supports running in non-thread mode.
>
> Well, in that case I wasn't clear: I meant that there should be a way to
> run with the whole threading machinery enabled, but still only have one
> thread (doing the work, possibly having another that fills the queue).
>
> That allows us to see how big the overhead is.

I really don't want to add overhead, no matter how small, to
--threads=1. How about a GIT_USE_THREADS variable for testing purposes?
Threaded grep could share the same variable if you also want to avoid
the threading machinery in git grep --threads=1.
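A sketch of how the proposed variable might gate the threaded code path, assuming GIT_USE_THREADS=1 means "go through pthread_create() and the mutexes even with a single worker". The variable is only the name floated in this message, not an existing git setting, and use_thread_machinery() is a hypothetical helper. Keeping this decision in its own flag, separate from the requested thread count, is what lets --threads=1 stay free of any locking overhead by default.

```c
#define _POSIX_C_SOURCE 200809L	/* exposes setenv()/unsetenv() for callers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return nonzero if the threaded code path should be used.
 * GIT_USE_THREADS is hypothetical (proposed in this thread, not a real git variable).
 */
static int use_thread_machinery(int nr_threads)
{
	const char *env = getenv("GIT_USE_THREADS");

	assert(nr_threads >= 1);	/* 0 ("auto") is resolved earlier */
	if (nr_threads > 1)
		return 1;
	/* --threads=1: stay unthreaded unless forced, so overhead can be measured */
	return env != NULL && !strcmp(env, "1");
}
```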
-- 
Duy

* Re: [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-12  2:32 ` [PATCH v2 2/2] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
  2012-03-12 10:57   ` Thomas Rast
@ 2012-03-13  0:32   ` Ramsay Jones
  2012-03-14 10:29     ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 12+ messages in thread
From: Ramsay Jones @ 2012-03-13  0:32 UTC
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano

Nguyễn Thái Ngọc Duy wrote:
[snipped]
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  I changed Ramsay's mutex patch a little bit and incorporate it here.
>  Ramsay, it'd be great if you could try it again on MinGW

Hmm, well do you want the good news, or the bad news ... :-D

First, I should say that I feel like I'm doing a very bad job of
communicating, so let me apologize for that and hope that this time
I do a better job of it!

This patch breaks the build on MinGW, since the emulation code has not
(thus far) included an implementation of pthread_key_delete(). I simply
commented out the call to that function, in cleanup_thread(), so that I
could test the remainder of the patch.

Although this patch is an improvement on previous patches, it still fails
in *exactly* the same way as earlier attempts.

I probably didn't make clear before that 'nr_threads' has been given too
many duties, which is the main reason for me introducing a new variable
'threads_active'. For example, ...

>  builtin/index-pack.c             |  198 ++++++++++++++++++++++++++++++++++----
>  3 files changed, 192 insertions(+), 18 deletions(-)
> 

[snipped]

> +static inline void lock_mutex(pthread_mutex_t *mutex)
> +{
> +	if (nr_threads > 1)
> +		pthread_mutex_lock(mutex);
> +}

What is this condition testing (ie. what does it mean)? Does it mean:

    1. there are some threads currently running ?
    2. the mutex variables are in a usable state ?

Does this expression always express the same invariant?

The answer, of course, is *no*.

Let us consider the call to parse_pack_objects() at line 1367. Let us
suppose that we have been asked to use threads (from the config file,
the command-line, or simply !NO_PTHREADS), so that when we call the
parse_pack_objects() function nr_threads > 1.

Note that, at this point, no threads are active and the mutex variables
have not been initialised.

Now, at the beginning of parse_pack_objects(), we find some 'first pass'
processing [for (i = 0; i < nr_objects; i++) ... lines 839-851], which
includes a call to sha1_object() at line 848. sha1_object() in turn has
an invocation of the read_lock() macro (line 552), which in turn calls
lock_mutex() with a pointer to the read_mutex.

Note that, at this point, no threads are active and the mutex variables
have not been initialised.

Also note that "nr_threads > 1" is true. At this point, nr_threads is still
playing the "this is how many threads I have been requested to create" role.
But again, no threads have been created yet, the mutex variables haven't been
initialised, and ... well, *boom*.

So, in order to get it to work on MinGW (and this time I only tested on MinGW),
I had to apply the patch below (look familiar?).

[I ran the same four tests as before, five times in a row. On *one* occasion
t5300.22 (verify-pack catches a corrupted type/size of the 1st packed object data)
failed because the 'dd' command crashed! So, maybe there is a problem lurking.]

ATB,
Ramsay Jones

-- >8 --
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7e3b287..6679734 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -69,6 +69,7 @@ static int nr_processed;
 static int nr_deltas;
 static int nr_resolved_deltas;
 static int nr_threads;
+static int threads_active;
 
 static int from_stdin;
 static int strict;
@@ -105,13 +106,13 @@ static pthread_key_t key;
 
 static inline void lock_mutex(pthread_mutex_t *mutex)
 {
-	if (nr_threads > 1)
+	if (threads_active)
 		pthread_mutex_lock(mutex);
 }
 
 static inline void unlock_mutex(pthread_mutex_t *mutex)
 {
-	if (nr_threads > 1)
+	if (threads_active)
 		pthread_mutex_unlock(mutex);
 }
 
@@ -125,14 +126,16 @@ static void init_thread(void)
 	pthread_mutex_init(&work_mutex, NULL);
 	pthread_key_create(&key, NULL);
 	thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+	threads_active = 1;
 }
 
 static void cleanup_thread(void)
 {
+	threads_active = 0;
 	pthread_mutex_destroy(&read_mutex);
 	pthread_mutex_destroy(&counter_mutex);
 	pthread_mutex_destroy(&work_mutex);
-	pthread_key_delete(key);
+	/*pthread_key_delete(key);*/
 	nr_threads = 1;
 	free(thread_data);
 }
-- 8< --

* Re: [PATCH v2 2/2] index-pack: support multithreaded delta resolving
  2012-03-13  0:32   ` Ramsay Jones
@ 2012-03-14 10:29     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 12+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-14 10:29 UTC
  To: Ramsay Jones; +Cc: git, Junio C Hamano

2012/3/13 Ramsay Jones <ramsay@ramsay1.demon.co.uk>:
> Nguyễn Thái Ngọc Duy wrote:
> [snipped]
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>>  I changed Ramsay's mutex patch a little bit and incorporate it here.
>>  Ramsay, it'd be great if you could try it again on MinGW
>
> Hmm, well do you want the good news, or the bad news ... :-D
>
> First, I should say that I feel like I'm doing a very bad job of
> communicating, so let me apologize for that and hope that this time
> I make a better job of it!
>
> This patch breaks the build on MinGW, since the emulation code has not
> (thus far) included an implementation of pthread_key_delete(). I simply
> commented out the call to that function, in cleanup_thread(), so that I
> could test the remainder of the patch.
>
> Although this patch is an improvement on previous patches, it still fails
> in *exactly* the same way as earlier attempts.
>
> I probably didn't make clear before that 'nr_threads' has been given too
> many duties, which is the main reason for me introducing a new variable
> 'threads_active'. For example, ...

You are right. I will incorporate your changes in the next reroll. Thank you.

> [I ran the same four tests as before, five times in a row. On *one* occasion
> t5300.22 (verify-pack catches a corrupted type/size of the 1st packed object data)
> failed because the 'dd' command crashed! So, maybe there is a problem lurking.]

I fail to see how verify-pack can make dd crash, especially when
verify-pack is called after dd in t5300.22. Anyway it's good to keep
an eye on this case.
-- 
Duy

Thread overview: 12+ messages
2012-02-28  4:36 [PATCH 1/2] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
2012-02-28  4:36 ` [PATCH 2/2] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2012-03-02  6:09   ` Junio C Hamano
2012-03-02 13:42     ` [PATCH v2 " Nguyễn Thái Ngọc Duy
2012-03-02 18:53       ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2012-03-12  2:32 [PATCH v2 0/2] Multithread index-pack Nguyễn Thái Ngọc Duy
2012-03-12  2:32 ` [PATCH v2 2/2] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2012-03-12 10:57   ` Thomas Rast
2012-03-12 11:42     ` Nguyen Thai Ngoc Duy
2012-03-12 11:47       ` Thomas Rast
2012-03-12 12:18         ` Nguyen Thai Ngoc Duy
2012-03-13  0:32   ` Ramsay Jones
2012-03-14 10:29     ` Nguyen Thai Ngoc Duy