From: Jonathan Nieder <jrnieder@gmail.com>
To: git@vger.kernel.org
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
David Barr <david.barr@cordelta.com>,
Nicolas Pitre <nico@fluxnic.net>,
Raja R Harinath <harinath@hurrynot.org>,
Sverre Rabbelier <srabbelier@gmail.com>
Subject: [PATCH/RFC] fast-import: insert new object entries at start of hash bucket
Date: Tue, 23 Nov 2010 01:53:48 -0600 [thread overview]
Message-ID: <20101123075348.GA10367@burratino> (raw)
From: David Barr <david.barr@cordelta.com>
More often than not, find_object is called for recently inserted objects.
Optimise for this case by inserting new entries at the start of the chain.
This doesn't affect the cost of new inserts but reduces the cost of find
and insert for existing object entries.
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Hi,
After importing the first 500000 revs or so of the ASF repo with
svn-fe, it seems that fast-import gets a bit sluggish because its
object table fills up. David noticed that one can get a bit of a
speed-up by noticing that objects are likely to be accessed soon after
they are inserted in this workflow. (The basis for most svn deltas
comes from the revision just dumped.)
I would guess other workflows would display the same tendency if any;
at least, the same blob is likely to come up repeatedly in a few revs
and then not be mentioned for a while. Though maybe there is some
reason to make the opposite assumption that I have not thought of.
Shawn?
A more dramatic improvement comes from increasing the size of the hash
table; in particular, David noticed it allows the CPU usage to
increase from ~60% to 100% of one core. So presumably we should make
the size of the hash table dynamic.
Other aspects to investigate: choice of hash function; is it worth
moving accessed entries to the front of hash chains when they already
exist?
Timings in interesting workflows would also be nice.
Regardless, this seems like a good change on its own, especially
because it simplifies the code.
Thoughts?
Jonathan
fast-import.c | 9 ++-------
1 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index 77549eb..e2f4874 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -539,22 +539,17 @@ static struct object_entry *insert_object(unsigned char *sha1)
{
unsigned int h = sha1[0] << 8 | sha1[1];
struct object_entry *e = object_table[h];
- struct object_entry *p = NULL;
while (e) {
if (!hashcmp(sha1, e->idx.sha1))
return e;
- p = e;
e = e->next;
}
e = new_object(sha1);
- e->next = NULL;
+ e->next = object_table[h];
e->idx.offset = 0;
- if (p)
- p->next = e;
- else
- object_table[h] = e;
+ object_table[h] = e;
return e;
}
--
1.7.2.3
next reply other threads:[~2010-11-23 7:54 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-23 7:53 Jonathan Nieder [this message]
2010-11-23 12:51 ` [PATCH/RFC] fast-import: insert new object entries at start of hash bucket Sverre Rabbelier
2010-11-23 18:19 ` Jonathan Nieder
2010-11-23 22:33 ` Shawn Pearce
2010-11-23 23:17 ` Jonathan Nieder
2010-11-23 23:29 ` Shawn Pearce
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101123075348.GA10367@burratino \
--to=jrnieder@gmail.com \
--cc=david.barr@cordelta.com \
--cc=git@vger.kernel.org \
--cc=harinath@hurrynot.org \
--cc=nico@fluxnic.net \
--cc=spearce@spearce.org \
--cc=srabbelier@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).