git.vger.kernel.org archive mirror
From: Tomas Carnecky <tom@dbservice.com>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: Joshua Redstone <joshua.redstone@fb.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Git performance results on a large repository
Date: Sun, 05 Feb 2012 16:01:22 +0100
Message-ID: <4F2E99C2.7090609@dbservice.com>
In-Reply-To: <CACsJy8DkLCK0ZUKNz_PJazsxjsRbWVVZwjAU5n2EAjJfCYtpoQ@mail.gmail.com>

On 2/4/12 7:53 AM, Nguyen Thai Ngoc Duy wrote:
> On Fri, Feb 3, 2012 at 9:20 PM, Joshua Redstone <joshua.redstone@fb.com> wrote:
>> I timed a few common operations with both a warm OS file cache and a cold
>> cache, i.e., I did an 'echo 3 | tee /proc/sys/vm/drop_caches' and then did
>> the operation in question a few times (first timing is the cold timing,
>> the next few are the warm timings).  The following results are on a server
>> with an average hard drive (i.e., not flash) and > 10GB of RAM.
>>
>> 'git status' :   39 minutes cold, and 24 seconds warm.
>>
>> 'git blame':   44 minutes cold, 11 minutes warm.
>>
>> 'git add' (appending a few chars to the end of a file and adding it):   7
>> seconds cold and 5 seconds warm.
>>
>> 'git commit -m "foo bar3" --no-verify --untracked-files=no --quiet
>> --no-status':  41 minutes cold, 20 seconds warm.  I also hacked a version
>> of git to remove the three or four places where 'git commit' stats every
>> file in the repo, and this dropped the times to 30 minutes cold and 8
>> seconds warm.
> Have you tried "git update-index --assume-unchanged"? That should
> reduce mass lstat() and hopefully improve the above numbers. The
> interface is not exactly easy-to-use, but if it has significant gain,
> then we can try to improve UI.
>
> On the index size issue, ideally we should make minimal writes to the
> index instead of rewriting the whole 191 MB index. An improvement we
> could make now is to compress it, reducing the disk footprint and thus
> disk I/O. If you compress the index with gzip, how big is it?
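One rough way to answer the gzip question above without touching the
repository is to read the index file and compress it in memory. This is
only a sketch: the helper names gzip_sizes and report are made up for
illustration, and the default path assumes you run it from the top of a
non-bare working tree.

```python
import gzip
from pathlib import Path


def gzip_sizes(data, level=6):
    """Return (raw_size, gzipped_size) in bytes for the given data."""
    compressed = gzip.compress(data, compresslevel=level)
    return len(data), len(compressed)


def report(index_path=".git/index"):
    """Print raw and gzipped size of an index file (path is an assumption)."""
    raw, packed = gzip_sizes(Path(index_path).read_bytes())
    print(f"{index_path}: {raw} bytes raw, {packed} bytes gzipped")
```

Calling report() in the repository root prints both sizes. It says
nothing about how long git would need to decompress the index on every
command, which is the other half of the trade-off.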
If you're not afraid to add filesystem-specific code to git, you could
leverage the btrfs find-new command ("btrfs subvolume find-new"), or use
the underlying ioctl directly, to quickly find the files that changed
since a certain point in time. Other CoW filesystems may have similar
mechanisms. You could, for example, store the last generation id in an
index extension; that's what those extensions are for, right?
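
To make the idea concrete, here is a minimal sketch of how a tool might
collect changed paths from "btrfs subvolume find-new". It is not how git
does anything today. The line formats matched by the two regular
expressions are an assumption based on typical output of that command
("inode ... gen N flags F path" lines followed by "transid marker was N")
and may differ between btrfs-progs versions. The names parse_find_new and
changed_since are hypothetical.

```python
import re
import subprocess

# Assumed output format of `btrfs subvolume find-new`; verify against
# your btrfs-progs version before relying on it.
_EXTENT_RE = re.compile(
    r"^inode \d+ file offset \d+ len \d+ disk start \d+ "
    r"offset \d+ gen (\d+) flags (\S+) (.+)$"
)
_MARKER_RE = re.compile(r"^transid marker was (\d+)$")


def parse_find_new(output):
    """Return (changed_paths, marker) parsed from find-new output.

    changed_paths is a set because a file with several changed extents
    appears on several lines. marker is the generation to remember for
    the next call, or None if no marker line was found.
    """
    paths = set()
    marker = None
    for line in output.splitlines():
        m = _EXTENT_RE.match(line)
        if m:
            paths.add(m.group(3))
            continue
        m = _MARKER_RE.match(line)
        if m:
            marker = int(m.group(1))
    return paths, marker


def changed_since(subvolume, last_gen):
    """Run find-new on a subvolume (usually needs root) and parse it."""
    result = subprocess.run(
        ["btrfs", "subvolume", "find-new", subvolume, str(last_gen)],
        check=True, capture_output=True, text=True,
    )
    return parse_find_new(result.stdout)
```

The returned marker is the value one would store (in an index extension,
as suggested above) and pass as last_gen on the next run, so that only
the reported paths need an lstat().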

tom


Thread overview: 34+ messages
2012-02-03 14:20 Git performance results on a large repository Joshua Redstone
2012-02-03 14:56 ` Ævar Arnfjörð Bjarmason
2012-02-03 17:00   ` Joshua Redstone
2012-02-03 22:40     ` Sam Vilain
2012-02-03 22:57       ` Sam Vilain
2012-02-07  1:19       ` Nguyen Thai Ngoc Duy
2012-02-03 23:05     ` Matt Graham
2012-02-04  1:25   ` Evgeny Sazhin
2012-02-03 23:35 ` Chris Lee
2012-02-04  0:01 ` Zeki Mokhtarzada
2012-02-04  5:07 ` Joey Hess
2012-02-04  6:53 ` Nguyen Thai Ngoc Duy
2012-02-04 18:05   ` Joshua Redstone
2012-02-05  3:47     ` Nguyen Thai Ngoc Duy
2012-02-06 15:40       ` Joey Hess
2012-02-07 13:43         ` Nguyen Thai Ngoc Duy
2012-02-09 21:06           ` Joshua Redstone
2012-02-10  7:12             ` Nguyen Thai Ngoc Duy
2012-02-10  9:39               ` Christian Couder
2012-02-10 12:24                 ` Nguyen Thai Ngoc Duy
2012-02-06  7:10     ` David Mohs
2012-02-06 16:23     ` Matt Graham
2012-02-06 20:50       ` Joshua Redstone
2012-02-06 21:07         ` Greg Troxel
2012-02-07  1:28         ` david
2012-02-06 21:17     ` Sam Vilain
2012-02-04 20:05   ` Joshua Redstone
2012-02-05 15:01   ` Tomas Carnecky [this message]
2012-02-05 15:17     ` Nguyen Thai Ngoc Duy
2012-02-04  8:57 ` slinky
2012-02-04 21:42 ` Greg Troxel
2012-02-05  4:30 ` david
2012-02-05 11:24   ` David Barr
2012-02-07  8:58 ` Emanuele Zattin
