Re: [PATCH 07/11] object: try naive cuckoo hashing

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH 07/11] object: try naive cuckoo hashing
@ 2011-08-13 10:22 George Spelvin
  2011-08-13 19:12 ` George Spelvin
  0 siblings, 1 reply; 3+ messages in thread
From: George Spelvin @ 2011-08-13 10:22 UTC (permalink / raw)
  To: gitster; +Cc: git, linux

I had vague memories of hearing about cuckoo hashing in the pasr, but
your posting inspired me to read up on it.

Your implementation doesn't quite match the standard one.  Did you get
it from somewhere, or is it your own creation?

In the classical design, the H1 and H2 hashes use two separate hash
tables (T1 and T2), so there is never any question about where insertion
should happen.

It's
- New items are inserted into T1 (at H1).
- If the slot is full, the new item is still stored there, but
  it is bumped to T2.
- If that T2 slot is full, it is still overwritten, but what was there
  is bumped back to T1.

etc., until a NULL pointer is found or 

You use a single hash table.  Is that variant analyzed somewhere, or is
it something you've found is better?

It seems that what your insert_obj_hash does is "if the H1 slot is open,
store there.  Otherwise, store in the H2 slot and bump the item already
there (if any)."  What this means is that as soon as you hit an object
already in its H2 slot, the insert will fall into an infinite loop and
eventually fail.

The original scheme could bump items out of H2 slots back to H1 slots.

Another advantage of the 2-table system is that every object hash two
possible homes, even if H1 == H2.  With one table, the hash functions
are twice as big, so the chance of that happening is cut in half, but
such objects have only one possible home and really gum up the works.

(You could define H2 = H1 + obj->sha1[1] % (obj_hash_size-1) to
solve this, using the standard shift-and-add optimizations for
computing modulo a power of two less one, but I'm still not sure
if it's worth it.)

Another technique for using a hash table at a high load factor is
multiple-entry buckets.  This is discussed in "A cool and practical
alternative to traditional hash tables"
http://www.ru.is/faculty/ulfar/CuckooHash.pdf

Because a bucket is a single cache line, accessing it adds no more
overhead than a single-entry bicket, *as long as you can validate the
lookup without following a pointer*.

The best way to do that is to store some additional validation data
(fortunately, SHA-1 provides lots; even if you're using all 5 words,
their sum is available) in the hash table itself.  This does make the
table larger, but speeds up lookups.

One way to speed up pointer-bumping in the 2-hash case would be to
store H1+H2 (the full 32-bit sum) as a validation value.  In addition to
allowing you to avoid following the pointer on misses the vast majority
of the time, this also lets you bump pointers from H1 to H2 without
actually following them.  You know one of the hashes (because you found
the pointer in that table slot), and a subtraction produces the other.

This produces insert code like the following:

struct obj_hash_entry {
	struct object *obj;
	uint32_t hash_sum;
} *obj_hash;

static struct object *insert_obj_hash(struct object *obj)
{
	uint32_t hash_sum = hash_val(obj->sha1);
	unsigned ix = hash_sum & (obj_hash_size - 1);
	unsigned n = 0, lim = 1;
	unsigned loop_check;	/* Ignore GCC warning */

	hash_sum += hash_val(obj->sha1 + 4)

	do {
		struct object *tmp_obj = obj_hash[ix].obj;
		uint32_t tmp_hash_sum = obj_hash[ix].hash_sum;

		obj_hash[ix].obj = obj;
		obj_hash[ix].hash_sum = hash_sum;

		/* Brent's cycle-finding algorithm */
		if (++n == lim) {	/* Less registers: if (n & (n-1)) == 0 */
			loop_check = ix;
			lim *= 2;
		}

		obj = tmp_obj;
		hash_sum = tmp_hash_sum;
		ix = (tmp_hash_sum - ix) & (obj_hash_size - 1);

	} while (obj && ix != loop_check);

	return obj;
}

I presume the optimization to lookup_object is obvious.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 07/11] object: try naive cuckoo hashing
  2011-08-13 10:22 [PATCH 07/11] object: try naive cuckoo hashing George Spelvin
@ 2011-08-13 19:12 ` George Spelvin
  0 siblings, 0 replies; 3+ messages in thread
From: George Spelvin @ 2011-08-13 19:12 UTC (permalink / raw)
  To: gitster, linux; +Cc: git

I've been doing a lot of reading on Cuckoo hashing.

Yes, the single-table variant is described and used.  However, the
insertion procedure is not the way you do it.

Also, d-ary Cuckoo hashing (also called d-Cuckoo hashing) where you use
more than 2 hash functions is also used.

The insertion algorithm, however, is not really agreed on.

One algorithm (mostly proposed for hardware) uses d separate tables.
Every entry displaced from table i is displaced to table i+1 (mod d).
http://infoscience.epfl.ch/record/164147/files/cuckoo_dir_hpca2011_camera_ready.pdf

In the single-table case, and in general, however, a displaced entry
has more than one possible new location.  This leads to the question of how
to choose.

One proposal is to do a breadth-first search looking for a path to
a free slot.  It's provable that this will succeed with high probability
before the exponential growth of the breadth of the search tree
gets too bad.
See "Space Efficient Hash Tables with Worst Case Constant Access Time"
http://www.itu.dk/people/pagh/papers/d-cuckoo-jour.pdf

Another suggested tehcnique is to just pick an alternative at random
and proceed.  This is recommended in e.g.
"Efficient Hash Probes on Modern Processors"
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.1189&rep=rep1&type=pdf

Both of these lead to rather complex implementations.  The random number
generator is probably simpler than the breadth-first search, but either way
there's a bunch of auxiliary code.

Sticking with two hash functions, but using multi-entry buckets is
definitely an attractive possibility.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 00/11] Micro-optimizing lookup_object()
@ 2011-08-11 17:53 Junio C Hamano
  2011-08-11 17:53 ` [PATCH 07/11] object: try naive cuckoo hashing Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2011-08-11 17:53 UTC (permalink / raw)
  To: git

I noticed that a typical "git repack -a -d -f" spends about 50% of the
time during the "Counting objects" phase inside lookup_object(). The
look-up is implemented as a hashtable with linear probing that limits the
maximum load-factor to 50%, and during repacking the Linux kernel
repository, we count 2,139,209 objects, and the worst case we probe 50
hash entries to find a single object in the hash table due to slot
collisions. The lookup_object() function is called 88,603,392 times.

After cleaning up the existing code a bit, this series adds "--count-only"
option to "pack-objects" to tell it to stop after the "Counting objects"
phase for benchmarking purposes. To emulate the "Counting objects" phase
of a full repack, we can run this (perhaps under "/usr/bin/time"):

  git pack-objects --count-only --keep-true-parents --honor-pack-keep \
      --non-empty --all --reflog --no-reuse-delta \
      --delta-base-offset --stdout </dev/null >/dev/null

On my development box, the slightly cleaned-up existing linear probing
code gives (best of three runs with hot cache):

  31.89user 2.16system 0:34.16elapsed 99%CPU (0avgtext+0avgdata 2965264maxresident)k
  0inputs+0outputs (0major+225336minor)pagefaults 0swaps

Then the series replaces lookup_object(), insert_obj_hash() and
grow_object_hash() with a naive implementation of cuckoo hashing that uses
two hash functions. The performance is not very impressive, and this wastes
too much memory, due to its rather poor strategy to re-grow the table size:

  32.04user 3.43system 0:35.58elapsed 99%CPU (0avgtext+0avgdata 8178672maxresident)k
  0inputs+0outputs (0major+1206546minor)pagefaults 0swaps

As we have 20-byte object names as the hash key material, we could easily
extend this to use 5 hash functions instead of 2. This reduces the memory
usage by improving the load factor, but we spend extra cycles in lookup:

  31.66user 2.14system 0:33.91elapsed 99%CPU (0avgtext+0avgdata 2874176maxresident)k
  0inputs+0outputs (0major+225342minor)pagefaults 0swaps

At this step with 5-way cuckoo, we are slightly better than the original
linear probing. By using smaller number of hash functions, we can reduce
the cycles we need in lookup, while we lose on the load factor.

Here is what we get from the code with 4 hashes:

  30.88user 2.32system 0:33.31elapsed 99%CPU (0avgtext+0avgdata 3135984maxresident)k
  0inputs+0outputs (0major+290857minor)pagefaults 0swaps

And here is with 3 hashes:

  30.68user 2.26system 0:33.05elapsed 99%CPU (0avgtext+0avgdata 3660832maxresident)k
  0inputs+0outputs (0major+421963minor)pagefaults 0swaps

The best balance is somewhere between 3-hash and 4-hash, it seems. We are
getting ~4% runtime performance improvements (31.89 vs 30.68).

Just as a sanity check, the final patch in the series reduces the number
of hashes back to 2, which yields a similar performance characteristics
from the original "naive" cuckoo implementation:

  31.06user 3.22system 0:34.39elapsed 99%CPU (0avgtext+0avgdata 8176512maxresident)k
  0inputs+0outputs (0major+1206542minor)pagefaults 0swaps

The real optimization opportunity _may_ be to reduce the calls we make to
the function---we are calling lookup() 40+ times on one object. But that
is outside the scope of this series.

This series is not for application but primarily is to serve as the
supplimental data for the above numbers. A re-rolled series that consists
of the earlier clean-ups and then a patch to replace the linear probing
with 4-way cuckoo will be queued instead in 'pu'.

Junio C Hamano (11):
  object.c: code movement for readability
  object.c: remove duplicated code for object hashing
  pack-objects --count-only
  object: next_size() helper for readability
  object hash: we know the table size is a power of two
  object: growing the hash-table more aggressively does not help much
  object: try naive cuckoo hashing
  object: try 5-way cuckoo -- use all 20-bytes of SHA-1
  object: try 4-way cuckoo
  object: try 3-way cuckoo
  object: try 2-way cuckoo again

 builtin/pack-objects.c |    7 +++
 object.c               |  138 +++++++++++++++++++++++++++++-------------------
 2 files changed, 91 insertions(+), 54 deletions(-)

-- 
1.7.6.433.g1421f

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 07/11] object: try naive cuckoo hashing
  2011-08-11 17:53 [PATCH 00/11] Micro-optimizing lookup_object() Junio C Hamano
@ 2011-08-11 17:53 ` Junio C Hamano
  0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2011-08-11 17:53 UTC (permalink / raw)
  To: git

Signed-off-by: Junio C Hamano <gitster@pobox.com>

---
../+v/65c71a61547aecd8a6830391fda31ed5d18e0529/git-pack-objects
Counting objects: 2139209, done.
32.04user 3.43system 0:35.58elapsed 99%CPU (0avgtext+0avgdata 8178672maxresident)k
0inputs+0outputs (0major+1206546minor)pagefaults 0swaps
Counting objects: 2139209, done.
33.41user 3.15system 0:36.67elapsed 99%CPU (0avgtext+0avgdata 8178688maxresident)k
0inputs+0outputs (0major+1206547minor)pagefaults 0swaps
Counting objects: 2139209, done.
32.24user 3.17system 0:35.51elapsed 99%CPU (0avgtext+0avgdata 8179968maxresident)k
0inputs+0outputs (0major+1206598minor)pagefaults 0swaps
---
 object.c |   99 ++++++++++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 67 insertions(+), 32 deletions(-)

diff --git a/object.c b/object.c
index c2c0a7d..7624c48 100644
--- a/object.c
+++ b/object.c
@@ -50,33 +50,52 @@ static unsigned int hash_val(const unsigned char *sha1)
 	return hash;
 }
 
-static void insert_obj_hash(struct object *obj, struct object **hash, unsigned int size)
-{
-	unsigned int j = hash_val(obj->sha1) & (size-1);
 
-	while (hash[j]) {
-		j++;
-		if (j >= size)
-			j = 0;
-	}
-	hash[j] = obj;
-}
+#define H1(sha1) (hash_val(sha1) % obj_hash_size)
+#define H2(sha1) (hash_val((sha1) + sizeof(unsigned int)) % obj_hash_size)
 
 struct object *lookup_object(const unsigned char *sha1)
 {
-	unsigned int i;
 	struct object *obj;
 
 	if (!obj_hash)
 		return NULL;
 
-	i = hash_val(sha1) & (obj_hash_size-1);
-	while ((obj = obj_hash[i]) != NULL) {
-		if (!hashcmp(sha1, obj->sha1))
-			break;
-		i++;
-		if (i == obj_hash_size)
-			i = 0;
+	if ((obj = obj_hash[H1(sha1)]) && !hashcmp(sha1, obj->sha1))
+		return obj;
+	if ((obj = obj_hash[H2(sha1)]) && !hashcmp(sha1, obj->sha1))
+		return obj;
+	return NULL;
+}
+
+static void grow_object_hash(void); /* forward */
+
+/*
+ * A naive single-table cuckoo hashing implementation.
+ * Return NULL when "obj" is successfully inserted. Otherwise
+ * return a pointer to the object to be inserted (which may
+ * be different from the original obj). The caller is expected
+ * to grow the hash table and re-insert the returned object.
+ */
+static struct object *insert_obj_hash(struct object *obj)
+{
+	int loop;
+
+	for (loop = obj_hash_size / 2; 0 <= loop; loop--) {
+		struct object *tmp_obj;
+		unsigned int ix;
+
+		ix = H1(obj->sha1);
+		if (!obj_hash[ix]) {
+			obj_hash[ix] = obj;
+			return NULL;
+		}
+		ix = H2(obj->sha1);
+		tmp_obj = obj_hash[ix];
+		obj_hash[ix] = obj;
+		if (!tmp_obj)
+			return NULL;
+		obj = tmp_obj;
 	}
 	return obj;
 }
@@ -89,25 +108,36 @@ static int next_size(int sz)
 
 static void grow_object_hash(void)
 {
-	int i;
-	int new_hash_size = next_size(obj_hash_size);
-	struct object **new_hash;
-
-	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
-		if (!obj)
+	struct object **current_hash;
+	int current_size;
+
+	current_hash = obj_hash;
+	current_size = obj_hash_size;
+	while (1) {
+		int i;
+		obj_hash_size = next_size(obj_hash_size);
+		obj_hash = xcalloc(obj_hash_size, sizeof(struct object *));
+
+		for (i = 0; i < current_size; i++) {
+			if (!current_hash[i])
+				continue;
+			if (insert_obj_hash(current_hash[i]))
+				break;
+		}
+		if (i < current_size) {
+			/* too small - grow and retry */
+			free(obj_hash);
 			continue;
-		insert_obj_hash(obj, new_hash, new_hash_size);
+		}
+		free(current_hash);
+		return;
 	}
-	free(obj_hash);
-	obj_hash = new_hash;
-	obj_hash_size = new_hash_size;
 }
 
 void *create_object(const unsigned char *sha1, int type, void *o)
 {
 	struct object *obj = o;
+	struct object *to_insert;
 
 	obj->parsed = 0;
 	obj->used = 0;
@@ -117,8 +147,13 @@ void *create_object(const unsigned char *sha1, int type, void *o)
 
 	if (obj_hash_size - 1 <= nr_objs * 2)
 		grow_object_hash();
-
-	insert_obj_hash(obj, obj_hash, obj_hash_size);
+	to_insert = obj;
+	while (1) {
+		to_insert = insert_obj_hash(to_insert);
+		if (!to_insert)
+			break;
+		grow_object_hash();
+	}
 	nr_objs++;
 	return obj;
 }
-- 
1.7.6.433.g1421f

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-08-13 19:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-13 10:22 [PATCH 07/11] object: try naive cuckoo hashing George Spelvin
2011-08-13 19:12 ` George Spelvin
  -- strict thread matches above, loose matches on Subject: below --
2011-08-11 17:53 [PATCH 00/11] Micro-optimizing lookup_object() Junio C Hamano
2011-08-11 17:53 ` [PATCH 07/11] object: try naive cuckoo hashing Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).