git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Edward Ned Harvey <git@nedharvey.com>
Cc: git@vger.kernel.org
Subject: Re: git performance
Date: Thu, 23 Oct 2008 12:39:12 -0400
Message-ID: <20081023163912.GA11489@coredump.intra.peff.net>
In-Reply-To: <000901c93490$e0c40ed0$a24c2c70$@com>

On Wed, Oct 22, 2008 at 05:55:14PM -0400, Edward Ned Harvey wrote:

> I'm talking about 40-50,000 files, on multi-user production linux,
> which means the cache is never warm, except when I'm benchmarking.

Well, if you have a cold cache it's going to take longer. :) You should
probably benchmark if you want to know exactly how long.

> Specifically RHEL 4 with the files on NFS mount.  Cold cache "svn st"
> takes ~10 mins.  Warm cache 20-30 sec.  Surprisingly to me,

Wow, that is awful. For comparison, "git status" from a cold cache on
the kernel repo takes me 17 seconds. From a warm cache, less than half a
second.

Yes, the cold cache case would probably be better with inotify, but
compared to svn, that's screaming fast. I haven't used perforce. If your
bottleneck really is stat'ing the tree, then yes, something that avoided
that might perform better (but weigh that particular optimization
against other things which might be slower).
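
To check whether stat'ing the tree really is the bottleneck, one rough
approach is to time a bare walk that lstat()s every file and compare it
with the time of the status command itself. The sketch below is only an
illustration: the function name stat_tree is made up here, it is not how
git scans the working tree, and skipping ".git" is an assumption about
the layout.

```python
import os
import time


def stat_tree(root):
    """lstat() every file under root; return (file_count, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: repository metadata lives in ".git" and is not
        # part of the working tree we want to measure.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start
```

Calling stat_tree(".") twice in a row gives a crude cold-versus-warm
comparison, since the second run is mostly served from the cache.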

> Out of curiosity, what are they talking about, when they say "git is
> fast?"

Well, there are the numbers above. When comparing to SVN or (god forbid)
CVS, there are order-of-magnitude speedups for most common operations.

>  Just the fact that it's all local disk, or is there more to it
> than that?  I could see - git would probably outperform perforce for

The things that generally make git fast are:

  - using a compact on-disk structure (including zlib and aggressive
    delta-finding) to keep your cache warm (and when it's not warm, to
    get data off the disk as quickly as possible)

  - the content-addressable nature of objects means we can just look at
    the data we need to solve a problem. For example,
    getting the history between point A and point B is "O(the number of
    commits between A and B)", _not_ "O(the size of the repo)".
    Viewing a log without generating diffs is "O(the number of
    commits)", not "O(some combination of the number of commits and the
    number of files in each commit)". Diffing two points in history is
    "O(the size of the differences between the two points)" and is
    totally independent of the number of commits between the two points.

  - most operations are streamable. "git log >/dev/null" on the kernel
    repo (about 90,000 commits) takes 8.5 seconds on my box. But it
    starts generating output immediately, so it _feels_ instant, and the
    rest of the data is generated while I read the first commit in my
    pager.
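
The last two points can be sketched with a toy object store. This is a
simplified model, not git's code: the helper names (object_id,
add_commit, log_between), the in-memory dict, and the commit payload
format are all invented for illustration, and history is assumed to be
linear. The only part taken from git is the naming scheme, the SHA-1 of
a "<type> <size>\0" header followed by the content.

```python
import hashlib


def object_id(kind, payload):
    """Name an object by the SHA-1 of its header plus its content."""
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def add_commit(store, parent, message):
    """Store a toy commit (hypothetical format) and return its id."""
    payload = f"parent {parent}\n\n{message}\n".encode()
    oid = object_id("commit", payload)
    store[oid] = (parent, message)
    return oid


def log_between(store, newer, older):
    """Lazily yield (id, message) from `newer` back to, excluding, `older`.

    Only the commits between the two points are looked up, and each one
    is yielded as soon as it is found, so a consumer can start reading
    before the walk has finished.
    """
    current = newer
    while current is not None and current != older:
        parent, message = store[current]
        yield current, message
        current = parent
```

Because log_between is a generator, next(log_between(store, tip, None))
returns the newest commit after a single lookup, however long the
history behind it is.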

-Peff
