Re: Why Git is so fast - Shawn O. Pearce

From: "Shawn O. Pearce" <spearce@spearce.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Michael Witten <mfwitten@gmail.com>,
	Martin Langhoff <martin.langhoff@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Why Git is so fast
Date: Thu, 30 Apr 2009 11:52:44 -0700
Message-ID: <20090430185244.GR23604@spearce.org>
In-Reply-To: <200904301728.06989.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> wrote:
> Let's rephrase the question a bit then: what low-level operations were
> needed for good performance in JGit?

Aside from the message I just posted:

- Avoid String, it's too expensive most of the time.  Stick with
  byte[], and better, stick with data that is a triplet of (byte[],
  int start, int end) to define a region of data.  Yes, it's annoying,
  as it's 3 values you need to pass around instead of just 1, but
  it makes a big difference in running time.

- Avoid allocating byte[] for SHA-1s; instead we convert to 5 ints,
  which can be inlined into an object allocation.

- Subclass instead of containing references.  We extend ObjectId to
  attach application data, rather than contain a reference to an
  ObjectId.  Classical Java programming techniques would say this
  is a violation of encapsulation.  But it gets us the same memory
  impact that C Git gets by saying:

    struct appdata {
      unsigned char sha1[20];
      ....
    };

- We're hurting dearly for not having more efficient access to the
  pack-*.pack file data.  mmap in Java is crap.  We implement our
  own page buffer, reading in blocks of 8192 bytes at a time and
  holding them in our own cache.

  Really, we should write our own mmap library as an optional JNI
  thing, and tie it into libz so we can efficiently run inflate()
  off the pack data directly.

- We're hurting dearly for not having more efficient access to the
  pack-*.idx files.  Again, with no mmap we read the entire bloody
  index into memory.  Since you won't touch most of it, we keep
  it in a large byte[].  But since you are searching with an ObjectId
  (5 ints), we pay a conversion price on every search step, where
  we have to copy from the large byte[] to 5 local variable ints
  and then compare to the ObjectId.  It's an overhead C Git doesn't
  have to deal with.
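
The 5-int id, the subclassing trick, and the per-step conversion
during an index search can be sketched roughly as follows.  This is a
minimal illustration, not JGit's actual code: the names Sha1Id,
FlaggedId, and IdTable are hypothetical, and the table is assumed to
be nothing more than sorted 20-byte ids packed back to back (the real
pack index format has more structure, for example a fan-out table).

```java
// Hypothetical sketch; class and method names are not JGit's.
class Sha1Id {
    // Five primitive ints stored in the object itself, no separate byte[20].
    final int w1, w2, w3, w4, w5;

    // Build an id from the 20 raw bytes at buf[start..start+20).
    Sha1Id(byte[] buf, int start) {
        w1 = readInt(buf, start);
        w2 = readInt(buf, start + 4);
        w3 = readInt(buf, start + 8);
        w4 = readInt(buf, start + 12);
        w5 = readInt(buf, start + 16);
    }

    // Decode 4 big-endian bytes starting at buf[pos].
    static int readInt(byte[] buf, int pos) {
        return (buf[pos] & 0xff) << 24
             | (buf[pos + 1] & 0xff) << 16
             | (buf[pos + 2] & 0xff) << 8
             | (buf[pos + 3] & 0xff);
    }

    // Compare this id with the 20 raw bytes at buf[start..start+20).
    // This is the conversion cost paid on every search step.
    int compareTo(byte[] buf, int start) {
        int c;
        if ((c = Integer.compareUnsigned(w1, readInt(buf, start))) != 0) return c;
        if ((c = Integer.compareUnsigned(w2, readInt(buf, start + 4))) != 0) return c;
        if ((c = Integer.compareUnsigned(w3, readInt(buf, start + 8))) != 0) return c;
        if ((c = Integer.compareUnsigned(w4, readInt(buf, start + 12))) != 0) return c;
        return Integer.compareUnsigned(w5, readInt(buf, start + 16));
    }
}

// Application data attached by subclassing: one object, not two.
class FlaggedId extends Sha1Id {
    int flags; // hypothetical per-object application field

    FlaggedId(byte[] buf, int start) {
        super(buf, start);
    }
}

final class IdTable {
    static final int ID_LEN = 20;

    // Binary search over n sorted ids packed in table; returns the
    // position of id, or -1 if it is not present.
    static int find(byte[] table, int n, Sha1Id id) {
        int lo = 0, hi = n;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = id.compareTo(table, mid * ID_LEN);
            if (c < 0) hi = mid;
            else if (c > 0) lo = mid + 1;
            else return mid;
        }
        return -1;
    }
}
```

Note that compareTo takes a (byte[], int start) pair rather than a
copied 20-byte array, so a search step allocates nothing.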

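The page buffer for pack data could look something like the sketch
below, assuming a simple LRU policy.  BlockCache and its methods are
hypothetical names, and the eviction strategy is an assumption; only
the 8192-byte block size comes from the description above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a block cache; not JGit's actual code.
final class BlockCache {
    static final int BLOCK_SIZE = 8192;

    private final RandomAccessFile file;
    // Access-ordered LinkedHashMap gives simple LRU eviction (an assumption).
    private final Map<Long, byte[]> blocks;

    BlockCache(RandomAccessFile file, int maxBlocks) {
        this.file = file;
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    // Return the block containing file offset pos, loading it on a miss.
    private byte[] block(long pos) throws IOException {
        long index = pos / BLOCK_SIZE;
        byte[] b = blocks.get(index);
        if (b == null) {
            long start = index * BLOCK_SIZE;
            // The last block of the file may be shorter than BLOCK_SIZE.
            int len = (int) Math.min(BLOCK_SIZE, file.length() - start);
            b = new byte[len];
            file.seek(start);
            file.readFully(b);
            blocks.put(index, b);
        }
        return b;
    }

    // Copy bytes from file offset pos into dst[start..end).  Returns the
    // number of bytes copied, which is short only at end of file.
    int read(long pos, byte[] dst, int start, int end) throws IOException {
        int ptr = start;
        long size = file.length();
        while (ptr < end && pos < size) {
            byte[] b = block(pos);
            int off = (int) (pos % BLOCK_SIZE);
            int n = Math.min(end - ptr, b.length - off);
            System.arraycopy(b, off, dst, ptr, n);
            ptr += n;
            pos += n;
        }
        return ptr - start;
    }
}
```

The read method fills a caller-supplied (byte[], int start, int end)
region, so a read that hits the cache allocates nothing.
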
Anyway.

I'm still just amazed at how well JGit runs given these limitations.
I guess that's Moore's Law for you.  10 years ago, JGit wouldn't
have been practical.

-- 
Shawn.
