git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC] fast-import: insert new object entries at start of hash bucket
@ 2010-11-23  7:53 Jonathan Nieder
  2010-11-23 12:51 ` Sverre Rabbelier
  2010-11-23 22:33 ` Shawn Pearce
  0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Nieder @ 2010-11-23  7:53 UTC (permalink / raw)
  To: git
  Cc: Shawn O. Pearce, David Barr, Nicolas Pitre, Raja R Harinath,
	Sverre Rabbelier

From: David Barr <david.barr@cordelta.com>

More often than not, find_object is called for recently inserted objects.
Optimise for this case by inserting new entries at the start of the chain.
This doesn't affect the cost of new inserts but reduces the cost of find
and insert for existing object entries.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Hi,

After importing the first 500000 revs or so of the ASF repo with
svn-fe, it seems that fast-import gets a bit sluggish because its
object table fills up.  David noticed that one can get a bit of a
speed-up by noticing that objects are likely to be accessed soon after
they are inserted in this workflow.  (The basis for most svn deltas
comes from the revision just dumped.)

I would guess other workflows would display the same tendency if any;
at least, the same blob is likely to come up repeatedly in a few revs
and then not be mentioned for a while.  Though maybe there is some
reason to make the opposite assumption that I have not thought of.
Shawn?

A more dramatic improvement comes from increasing the size of the hash
table; in particular, David noticed it allows the CPU usage to
increase from ~60% to 100% of one core.  So presumably we should make
the size of the hash table dynamic.

Other aspects to investigate: choice of hash function; is it worth
moving accessed entries to the front of hash chains when they already
exist?

Timings in interesting workflows would also be nice.

Regardless, this seems like a good change on its own, especially
because it simplifies the code.

Thoughts?
Jonathan

 fast-import.c |    9 ++-------
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 77549eb..e2f4874 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -539,22 +539,17 @@ static struct object_entry *insert_object(unsigned char *sha1)
 {
 	unsigned int h = sha1[0] << 8 | sha1[1];
 	struct object_entry *e = object_table[h];
-	struct object_entry *p = NULL;
 
 	while (e) {
 		if (!hashcmp(sha1, e->idx.sha1))
 			return e;
-		p = e;
 		e = e->next;
 	}
 
 	e = new_object(sha1);
-	e->next = NULL;
+	e->next = object_table[h];
 	e->idx.offset = 0;
-	if (p)
-		p->next = e;
-	else
-		object_table[h] = e;
+	object_table[h] = e;
 	return e;
 }
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-11-23 23:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-23  7:53 [PATCH/RFC] fast-import: insert new object entries at start of hash bucket Jonathan Nieder
2010-11-23 12:51 ` Sverre Rabbelier
2010-11-23 18:19   ` Jonathan Nieder
2010-11-23 22:33 ` Shawn Pearce
2010-11-23 23:17   ` Jonathan Nieder
2010-11-23 23:29     ` Shawn Pearce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).