git.vger.kernel.org archive mirror
* [JGIT PATCH 00/18] Misc. performance tweaks
@ 2008-12-25  2:10 Shawn O. Pearce
  2008-12-25  2:10 ` [JGIT PATCH 01/23] Improve hit performance on the UnpackedObjectCache Shawn O. Pearce
From: Shawn O. Pearce @ 2008-12-25  2:10 UTC
  To: Robin Rosenberg; +Cc: git

Cloning linux-2.6.git through JGit was painful at best.  I found
and fixed some small bottlenecks after a day of profiling and
experimentation, but we're still slower than C git.

With this series I managed to drop the time for "git clone --bare"
over git://, using a "jgit daemon" server and a "C git" client.
The client is the same in every run, so any difference between jgit
and "C git" is on the server side.

  before:  7m42.488s
  after :  2m33.882s
  C git :  1m26.158s     ("git daemon" server)


So I'm still seeing a major bottleneck that I can't quite fix.

Object enumeration (aka "Counting ...") takes too long, because we
spend a huge amount of time unpacking delta chains for trees so we
can enumerate their referenced items.

Our UnpackedObjectCache gets a <4% hit ratio when doing the trees
for linux-2.6.git.  Increasing the cache size doesn't noticeably
improve performance.

I tried rewriting UnpackedObjectCache to permit multiple objects
per hash bucket.  Even with that (and the maximum chain length
per bucket not exceeding 4 items) our hit ratio was still <5%,
so I tossed that implementation out.
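
For readers unfamiliar with the idea, here is a minimal sketch of a
cache that permits several entries per hash bucket and tracks its own
hit ratio. It is not the implementation that was tried and discarded,
and it is not JGit's UnpackedObjectCache: the class name
BucketedCacheSketch, the long key (standing in for a pack position),
and the move-to-front eviction policy are all assumptions made for
illustration. Only the chain limit of 4 comes from the text above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical sketch only; not JGit's UnpackedObjectCache. */
public class BucketedCacheSketch {
    /** Maximum entries per bucket, matching the limit mentioned above. */
    private static final int MAX_CHAIN = 4;

    private static final class Entry {
        final long key;     // assumed key: stands in for an object's pack position
        final byte[] data;  // the unpacked object content

        Entry(long key, byte[] data) {
            this.key = key;
            this.data = data;
        }
    }

    private final Deque<Entry>[] table;
    private long hits;
    private long misses;

    @SuppressWarnings("unchecked")
    public BucketedCacheSketch(int buckets) {
        table = new Deque[buckets];
        for (int i = 0; i < buckets; i++)
            table[i] = new ArrayDeque<>();
    }

    private int slot(long key) {
        return Math.floorMod(Long.hashCode(key), table.length);
    }

    /** Returns the cached bytes, or null on a miss. */
    public byte[] get(long key) {
        Deque<Entry> chain = table[slot(key)];
        for (Iterator<Entry> it = chain.iterator(); it.hasNext();) {
            Entry e = it.next();
            if (e.key == key) {
                // Move the entry to the front so the chain stays in
                // most-recently-used order.
                it.remove();
                chain.addFirst(e);
                hits++;
                return e.data;
            }
        }
        misses++;
        return null;
    }

    /** Stores an entry, evicting the least recently used one if the chain is full. */
    public void put(long key, byte[] data) {
        Deque<Entry> chain = table[slot(key)];
        chain.removeIf(e -> e.key == key);
        if (chain.size() == MAX_CHAIN)
            chain.removeLast();
        chain.addFirst(new Entry(key, data));
    }

    /** Fraction of get() calls that found their entry. */
    public double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

If the hit ratio barely moves when chaining is added, as it did in the
experiment above, bucket collisions are probably not the main reason
for the misses.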

"jgit rev-list --objects" is about 1m slower than "git rev-list
--objects".  That's most of the time difference I noted above
between jgit and C git on the server side.
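
The arithmetic behind the timings quoted earlier can be checked with
a small sketch. The class and method names (CloneTimings, seconds)
are made up for this example; only the three timing values come from
this message. By this calculation the series gives roughly a 3x
speedup, and the jgit server remains about 68 seconds behind C git,
which is in line with the "about 1m" enumeration cost.

```java
/** Hypothetical helper for checking the timings quoted in this message. */
public class CloneTimings {
    /** Parses a time(1)-style value such as "7m42.488s" into seconds. */
    public static double seconds(String t) {
        int m = t.indexOf('m');
        double minutes = Double.parseDouble(t.substring(0, m));
        // Drop the trailing 's' before parsing the seconds part.
        double secs = Double.parseDouble(t.substring(m + 1, t.length() - 1));
        return minutes * 60 + secs;
    }

    public static void main(String[] args) {
        double before = seconds("7m42.488s"); // jgit daemon, before the series
        double after = seconds("2m33.882s");  // jgit daemon, after the series
        double cgit = seconds("1m26.158s");   // "git daemon" server

        System.out.printf("speedup from series: %.2fx%n", before / after);
        System.out.printf("remaining gap vs C git: %.1fs (%.2fx slower)%n",
                after - cgit, after / cgit);
    }
}
```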

So with this series, we're better.  It's actually almost tolerable
to clone linux-2.6 through a jgit-backed server.


Shawn O. Pearce (23):
  Improve hit performance on the UnpackedObjectCache
  Add MutableObjectId.clear() to set the id to zeroId
  Allow TreeWalk callers to pass a MutableObjectId to get the current
    id
  Switch ObjectWalk to use the new MutableObjectId form in TreeWalk
  Change walker based fetch to use TreeWalk's MutableObjectId accessor
  Reduce garbage allocation when using TreeWalk
  Switch ObjectWalk to use a naked CanonicalTreeParser because its
    faster
  Remove the unused PackFile.get(ObjectId) form
  Remove getId from ObjectLoader API as its unnecessary overhead
  Make mmap mode more reliable by forcing GC at the correct spot
  Rewrite WindowCache to use a hash table
  Change ByteArrayWindow to read content outside of WindowCache's lock
  Dispose of RevCommit buffers when they aren't used in PackWriter
  Don't unpack delta chains while writing a pack from a pack v1 index
  Don't unpack delta chains while converting delta to whole object
  Defer parsing of the ObjectId while walking a PackIndex Iterator
  Only do one getCachedBytes per whole object written
  Correctly use a long for the offsets within a generated pack
  Allow more direct access to determine isWritten
  Move "wantWrite" field of ObjectToPack into the flags field
  Use an ArrayList for the reuseLoader collection in PackWriter
  Don't cut off existing delta chains if we are reusing deltas
  Correctly honor the thin parameter to PackWriter.writePack

 .../jgit/pgm/opt/AbstractTreeIteratorHandler.java  |    6 +-
 .../tst/org/spearce/jgit/lib/PackIndexTest.java    |    4 +-
 .../tst/org/spearce/jgit/lib/PackWriterTest.java   |   14 +-
 .../tst/org/spearce/jgit/lib/T0004_PackReader.java |    4 +-
 .../jgit/errors/CorruptObjectException.java        |   12 +
 .../src/org/spearce/jgit/lib/ByteArrayWindow.java  |   31 ++
 .../src/org/spearce/jgit/lib/ByteBufferWindow.java |   17 +
 .../src/org/spearce/jgit/lib/ByteWindow.java       |   20 ++-
 .../src/org/spearce/jgit/lib/Constants.java        |    2 +-
 .../spearce/jgit/lib/DeltaPackedObjectLoader.java  |    3 +-
 .../src/org/spearce/jgit/lib/MutableObjectId.java  |    9 +
 .../src/org/spearce/jgit/lib/ObjectLoader.java     |   38 ---
 .../src/org/spearce/jgit/lib/PackFile.java         |   49 +---
 .../src/org/spearce/jgit/lib/PackIndex.java        |   48 ++--
 .../src/org/spearce/jgit/lib/PackIndexV1.java      |   20 +-
 .../src/org/spearce/jgit/lib/PackIndexV2.java      |   27 +-
 .../src/org/spearce/jgit/lib/PackWriter.java       |   63 +++--
 .../src/org/spearce/jgit/lib/Repository.java       |   29 +--
 .../org/spearce/jgit/lib/UnpackedObjectCache.java  |   21 +-
 .../org/spearce/jgit/lib/UnpackedObjectLoader.java |   12 +-
 .../spearce/jgit/lib/WholePackedObjectLoader.java  |    3 +-
 .../src/org/spearce/jgit/lib/WindowCache.java      |  323 ++++++++++++--------
 .../src/org/spearce/jgit/lib/WindowCursor.java     |   16 +
 .../src/org/spearce/jgit/lib/WindowedFile.java     |   61 +++--
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |   51 ++--
 .../src/org/spearce/jgit/revwalk/RevWalk.java      |    8 +-
 .../spearce/jgit/transport/PackedObjectInfo.java   |    2 +-
 .../src/org/spearce/jgit/transport/UploadPack.java |    1 +
 .../jgit/transport/WalkFetchConnection.java        |   48 ++-
 .../jgit/treewalk/AbstractTreeIterator.java        |   48 +++
 .../spearce/jgit/treewalk/CanonicalTreeParser.java |   85 +++++-
 .../src/org/spearce/jgit/treewalk/TreeWalk.java    |   88 +++++-
 .../spearce/jgit/util/CountingOutputStream.java    |    5 +-
 33 files changed, 752 insertions(+), 416 deletions(-)


end of thread, other threads:[~2008-12-27 17:34 UTC | newest]

Thread overview: 26+ messages
2008-12-25  2:10 [JGIT PATCH 00/18] Misc. performance tweaks Shawn O. Pearce
2008-12-25  2:10 ` [JGIT PATCH 01/23] Improve hit performance on the UnpackedObjectCache Shawn O. Pearce
2008-12-25  2:10   ` [JGIT PATCH 02/23] Add MutableObjectId.clear() to set the id to zeroId Shawn O. Pearce
2008-12-25  2:10     ` [JGIT PATCH 03/23] Allow TreeWalk callers to pass a MutableObjectId to get the current id Shawn O. Pearce
2008-12-25  2:11       ` [JGIT PATCH 04/23] Switch ObjectWalk to use the new MutableObjectId form in TreeWalk Shawn O. Pearce
2008-12-25  2:11         ` [JGIT PATCH 05/23] Change walker based fetch to use TreeWalk's MutableObjectId accessor Shawn O. Pearce
2008-12-25  2:11           ` [JGIT PATCH 06/23] Reduce garbage allocation when using TreeWalk Shawn O. Pearce
2008-12-25  2:11             ` [JGIT PATCH 07/23] Switch ObjectWalk to use a naked CanonicalTreeParser because its faster Shawn O. Pearce
2008-12-25  2:11               ` [JGIT PATCH 08/23] Remove the unused PackFile.get(ObjectId) form Shawn O. Pearce
2008-12-25  2:11                 ` [JGIT PATCH 09/23] Remove getId from ObjectLoader API as its unnecessary overhead Shawn O. Pearce
2008-12-25  2:11                   ` [JGIT PATCH 10/23] Make mmap mode more reliable by forcing GC at the correct spot Shawn O. Pearce
2008-12-25  2:11                     ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Shawn O. Pearce
2008-12-25  2:11                       ` [JGIT PATCH 12/23] Change ByteArrayWindow to read content outside of WindowCache's lock Shawn O. Pearce
2008-12-25  2:11                         ` [JGIT PATCH 13/23] Dispose of RevCommit buffers when they aren't used in PackWriter Shawn O. Pearce
2008-12-25  2:11                           ` [JGIT PATCH 14/23] Don't unpack delta chains while writing a pack from a pack v1 index Shawn O. Pearce
2008-12-25  2:11                             ` [JGIT PATCH 15/23] Don't unpack delta chains while converting delta to whole object Shawn O. Pearce
2008-12-25  2:11                               ` [JGIT PATCH 16/23] Defer parsing of the ObjectId while walking a PackIndex Iterator Shawn O. Pearce
2008-12-25  2:11                                 ` [JGIT PATCH 17/23] Only do one getCachedBytes per whole object written Shawn O. Pearce
2008-12-25  2:11                                   ` [JGIT PATCH 18/23] Correctly use a long for the offsets within a generated pack Shawn O. Pearce
2008-12-25  2:11                                     ` [JGIT PATCH 19/23] Allow more direct access to determine isWritten Shawn O. Pearce
2008-12-25  2:11                                       ` [JGIT PATCH 20/23] Move "wantWrite" field of ObjectToPack into the flags field Shawn O. Pearce
2008-12-25  2:11                                         ` [JGIT PATCH 21/23] Use an ArrayList for the reuseLoader collection in PackWriter Shawn O. Pearce
2008-12-25  2:11                                           ` [JGIT PATCH 22/23] Don't cut off existing delta chains if we are reusing deltas Shawn O. Pearce
2008-12-25  2:11                                             ` [JGIT PATCH 23/23] Correctly honor the thin parameter to PackWriter.writePack Shawn O. Pearce
2008-12-27 13:30                       ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Robin Rosenberg
2008-12-27 17:32                         ` [JGIT PATCH 11/23 v2] " Shawn O. Pearce
