From: "Shawn O. Pearce" <spearce@spearce.org>
To: git@vger.kernel.org
Subject: JGit server performance
Date: Sat, 7 Mar 2009 19:22:14 -0800
Message-ID: <20090308032214.GU16213@spearce.org>
Gerrit Code Review provides Gitosis-like functionality, but
implements it on top of JGit's pure Java code, so the
performance of JGit's UploadPack matters to me.
JGit takes about 70% longer than git-core on my hardware when cloning
the linux-2.6 repository (jgit 2m21s, git-core 1m23s).
The bottlenecks are:
~41.2% in ObjectLoader.getCachedBytes()
This is the tree objects being parsed out of the pack file.
The problem here (I believe) is we have horrible locality
when reading. The delta base for a tree isn't in memory most
of the time, because it's been evicted by other trees accessed
since the last time that tree was touched.
Conceptually this makes some sense, as ObjectWalk does a depth
first traversal through the tree of each commit, in most-recent
to least-recent commit order. On a larger project like the
kernel we'll touch a lot more objects between two root trees,
and there isn't even any guarantee that two root trees that
appear near each other in the commit sequence have a delta
base relationship.
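The eviction pattern described above can be reproduced with a toy cache. This is only a sketch under assumptions: ToyBaseCache, its load() method and the strict LRU policy are made up for illustration and are not JGit's actual cache. It shows that once more trees are touched between two uses of a base than the cache can hold, the base has to be inflated again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU cache standing in for a delta base cache; not JGit's real cache. */
final class ToyBaseCache extends LinkedHashMap<String, byte[]> {
    private final int capacity;
    int hits;
    int misses;

    ToyBaseCache(int capacity) {
        super(16, 0.75f, true); // access order, so the eldest entry is the LRU one
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > capacity; // evict once we exceed the capacity
    }

    /** Look up an object; on a miss, pretend to inflate it from the pack. */
    byte[] load(String id) {
        byte[] data = get(id);
        if (data != null) {
            hits++;
            return data;
        }
        misses++;
        data = new byte[0]; // placeholder for the inflated object
        put(id, data);
        return data;
    }
}
```

With a capacity of 2, loading root-A, sub-1, sub-2 and then root-A again misses all four times; with a capacity of 3 the last load is a hit.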
~20.5% in AbstractTreeIterator.getEntryObjectId()
The bulk of the time here is really down in NB.decodeInt32().
We spend a lot of time converting an object id in a tree data
stream into an AnyObjectId (really a reused MutableObjectId)
so that we can probe the ObjectIdSubclassMap to see if we have
seen this object before.
The sad fact is, we need all 20 bytes to be converted into the 5
words, because the majority of the time, we have actually seen
the object before, and it exists in our hash table. The only
way to know is to convert and compare all 5 words. Any attempt
to lazily convert the 5 words would just make it slower.
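For reference, the conversion being paid for looks roughly like the following. This is a minimal sketch, not JGit's code: the IdWords class and its methods are hypothetical names, and the big-endian decode is an assumption about how a helper like NB.decodeInt32() behaves.

```java
import java.util.Arrays;

/** Illustrative only: not JGit's real classes. */
final class IdWords {
    /** Decode the 4 bytes starting at off as a big-endian 32-bit int. */
    static int decodeInt32(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24)
             | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8)
             | (buf[off + 3] & 0xff);
    }

    /** Convert the 20 raw id bytes starting at off into 5 words. */
    static int[] fromRaw(byte[] buf, int off) {
        int[] w = new int[5];
        for (int i = 0; i < 5; i++)
            w[i] = decodeInt32(buf, off + 4 * i);
        return w;
    }

    /** A hash table hit is only confirmed once all 5 words compare equal. */
    static boolean sameId(int[] a, int[] b) {
        return Arrays.equals(a, b);
    }
}
```

Every tree entry costs five of these decodes before the table can be probed, which is why the time shows up under the decode helper rather than in the lookup itself.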
... and it falls off from there.
I'm at a loss on how to improve the performance. I don't think that
we can do anything about the 20% in getEntryObjectId() due to the
way our data structures are organized around the 5-word ObjectId,
and not a byte[]. That 20% is a penalty git-core doesn't have to
pay, and is most certainly one reason why JGit is so much slower.
The only thing that may work is to modify ObjectWalk to try and
deduce some delta-chain locality from the pack. Buffer up objects
that it needs to parse in a queue, rank them by the delta base
they would need to use, and then try to unfold the base first,
and then the children.
That is, do something like what IndexPack does, where we try to
unpack each object exactly once, and recursively process the delta
chain children after unpacking the parent.
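A rough sketch of that queue-and-rank idea, under stated assumptions: DeltaOrder and Pending are hypothetical names, object ids are plain strings, and each queued object already knows its delta base (null for a whole object). Nothing here is taken from ObjectWalk or IndexPack; it only shows an ordering in which every base comes out before the objects that delta against it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: orders a batch so each delta base precedes its children. */
final class DeltaOrder {
    /** A queued object; base is null for a non-delta (whole) object. */
    static final class Pending {
        final String id;
        final String base;
        Pending(String id, String base) { this.id = id; this.base = base; }
    }

    static List<String> order(List<Pending> batch) {
        Map<String, Pending> inBatch = new HashMap<>();
        for (Pending p : batch)
            inBatch.put(p.id, p);

        Map<String, List<Pending>> childrenOf = new HashMap<>();
        Deque<Pending> work = new ArrayDeque<>();
        for (Pending p : batch) {
            // Objects whose base is absent from the batch start a chain here;
            // this sketch does not group them by that external base.
            if (p.base == null || !inBatch.containsKey(p.base))
                work.addLast(p);
            else
                childrenOf.computeIfAbsent(p.base, k -> new ArrayList<>()).add(p);
        }

        List<String> out = new ArrayList<>();
        while (!work.isEmpty()) {
            Pending p = work.removeFirst();
            out.add(p.id); // "unpack" p exactly once
            List<Pending> kids = childrenOf.remove(p.id);
            if (kids != null)
                for (int i = kids.size() - 1; i >= 0; i--)
                    work.addFirst(kids.get(i)); // children run next, while p is hot
        }
        return out;
    }
}
```

The children are pushed to the front of the work queue so that a whole delta chain is unfolded while its base is still in memory.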
We _might_ get better locality if we queue up all root trees,
process all of them, then process the first-level children, and
so on; that is, go breadth-first.
But that seems like a lot of code, and it probably wrecks the simple
recency ordering produced natively by ObjectWalk. :-\
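The breadth-first variant itself is small when written against a toy model. The sketch below is an assumption-laden illustration, not ObjectWalk: BreadthFirstTrees is a made-up name, tree ids are strings, and the subtrees map stands in for parsing each tree out of the pack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Illustrative only: level-by-level walk over a toy tree graph. */
final class BreadthFirstTrees {
    static List<String> walk(List<String> rootTrees,
                             Map<String, List<String>> subtrees) {
        Set<String> seen = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        for (String root : rootTrees)
            if (seen.add(root))
                queue.add(root); // all root trees go in first

        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            String tree = queue.remove();
            order.add(tree);
            List<String> kids = subtrees.getOrDefault(tree, Collections.<String>emptyList());
            for (String child : kids)
                if (seen.add(child))
                    queue.add(child); // visited only after the current level
        }
        return order;
    }
}
```

Note that the output interleaves the subtrees of different commits level by level, which is exactly how the recency ordering gets lost.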
--
Shawn.