From: Joshua Redstone <joshua.redstone@fb.com>
To: "Carlos Martín Nieto" <cmn@elego.de>,
	"Tomas Carnecky" <tom@dbservice.com>,
	"Junio C Hamano" <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Debugging git-commit slowness on a large repo
Date: Wed, 7 Dec 2011 01:48:46 +0000
Message-ID: <CB04005C.2C669%joshua.redstone@fb.com>
In-Reply-To: <20111203002347.GB2950@centaur.lab.cmartin.tk>

Hi Carlos and Tomas and Junio,

@Tomas, I tried adding the '--no-status' flag to 'git commit' and it sped
things up by maybe 15%, but commits still take a second.

@Carlos, by "same size", I mean roughly the same number of files and
number of bytes modified in each file.  In all experiments, it's less than
5 files modified per commit with changes totaling fewer than 10 KB, often
more like 1 KB.  I actually wrote a test script to generate commits,
customized for the stats on the repo I'm using.  It repeatedly generates
some changes, does 'git add [ list of files changed ]' followed by 'git
commit --no-status -m [ msg ]'.   It generates changes by picking fewer
than 5 files at random, modifying two 100-byte regions in each file, and
occasionally creates a new file of about 1 KB.  If it helps, I can
probably post the test script I've been using.
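
For reference, here is a minimal sketch of what a script along those lines
might look like.  It is not the actual script: the function names
(mutate_file, make_commit, run), the 10% chance of creating a new file, the
new-file naming, and the printable-ASCII filler are all assumptions made for
illustration.  It times only the 'git commit' step.

```python
import os
import random
import subprocess
import time


def mutate_file(path, rng, regions=2, region_size=100):
    """Overwrite `regions` random spans of `region_size` bytes in place."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(regions):
            # Keep the write inside the file whenever the file is big enough.
            offset = rng.randrange(max(size - region_size, 0) + 1)
            f.seek(offset)
            f.write(bytes(rng.randrange(32, 127) for _ in range(region_size)))


def make_commit(repo, tracked, rng, new_file_chance=0.1):
    """Apply one synthetic change set; return seconds spent in 'git commit'."""
    # Fewer than 5 files per commit, picked at random.
    changed = rng.sample(tracked, rng.randint(1, min(4, len(tracked))))
    for rel in changed:
        mutate_file(os.path.join(repo, rel), rng)
    # Occasionally create a new file of about 1 KB (hypothetical naming).
    if rng.random() < new_file_chance:
        rel = "new_%d.txt" % rng.randrange(10**9)
        with open(os.path.join(repo, rel), "wb") as f:
            f.write(os.urandom(1024))
        changed.append(rel)
    subprocess.check_call(["git", "add", "--"] + changed, cwd=repo)
    start = time.perf_counter()
    subprocess.check_call(
        ["git", "commit", "--no-status", "-q", "-m", "synthetic change"],
        cwd=repo)
    return time.perf_counter() - start


def run(repo, iterations=20, seed=0):
    """Generate `iterations` commits in `repo` and print each commit time."""
    rng = random.Random(seed)
    out = subprocess.check_output(["git", "ls-files", "-z"], cwd=repo)
    tracked = [p for p in os.fsdecode(out).split("\0") if p]
    for i in range(iterations):
        print("commit %d: %.3f s" % (i, make_commit(repo, tracked, rng)))
```

Calling run() on a scratch clone (it rewrites tracked files) prints one
timing per generated commit.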

I tried doing a 'git read-tree HEAD' before each 'git add ; git commit'
iteration, and the time for git-commit jumped from about 1 second to about
8 seconds.  That is a pretty dramatic slowdown.  Any idea why?  I wonder
if that's related to the overall commit slowness.
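
To see where the extra time goes, each step of an iteration can be timed
separately.  The sketch below is one way to do that, assuming the listed
paths already contain uncommitted modifications; the helper names (timed,
time_iteration) and the commit message are made up for illustration.

```python
import subprocess
import time


def timed(cmd, cwd="."):
    """Run `cmd` to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.check_call(cmd, cwd=cwd)
    return time.perf_counter() - start


def time_iteration(repo, paths, reset_index):
    """Time each step of one 'git add ; git commit' iteration.

    With reset_index=True, 'git read-tree HEAD' runs first, which resets
    the index to the last commit before the changes are staged.
    """
    timings = {}
    if reset_index:
        timings["read-tree"] = timed(["git", "read-tree", "HEAD"], cwd=repo)
    timings["add"] = timed(["git", "add", "--"] + list(paths), cwd=repo)
    timings["commit"] = timed(
        ["git", "commit", "--no-status", "-q", "-m", "timing probe"],
        cwd=repo)
    return timings
```

Comparing the dictionaries returned with reset_index=True and
reset_index=False shows whether the slowdown lands in 'git add' or in
'git commit'.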

@Carlos and/or @Junio, can you point me at any docs/code to understand
what a tree-cache is and how it differs from the index?  I did a google
search for [git tree-cache index], but nothing popped out.

Cheers,
Josh


On 12/2/11 4:23 PM, "Carlos Martín Nieto" <cmn@elego.de> wrote:

>On Fri, Dec 02, 2011 at 11:17:10PM +0000, Joshua Redstone wrote:
>> Hi,
>> I have a git repo with about 300k commits,  150k files totaling maybe
>> 7GB.  Locally committing a small change - say touching fewer than 300
>> bytes across 4 files - consistently takes over one second, which seems
>> kinda slow.  This is using git 1.7.7.4 on a linux 2.6 box.  The time
>> does not improve after doing a git-gc (my .git dir has maybe 250 files
>> after a git gc).  The same size commit on a brand new repo takes
>> < 10ms.  Any thoughts on why committing a small change seems to take a
>> long time on larger repos?
>
>By "same size commit" do you mean the same amount of changes, or the
>same amount of files? Committing doesn't depend on the size of the
>repo (by itself), but on the size of the index, which depends on the
>amount of files to be committed (as git is snapshot-based). At one
>point, commit forgot how to write the tree cache to the index (a
>performance optimisation). Do the times improve if you run 'git
>read-tree HEAD' between one commit and another? Note that this will
>reset the index to the last commit, though for the tests I imagine you
>use some variation of 'git commit -a'.
>
>Thomas Rast wrote a patch to re-teach commit to store the tree cache,
>but there were some issues and it never got applied.
>
>> 
>> Fwiw, I also tried doing the same test using libgit2 (via the pygit2
>> wrapper), and it was even slower (about 6 seconds to commit the same
>> small change).
>
>I don't know about the python bindings, but on the (somewhat
>unscientific) tests for libgit2's write-tree (the slow part of
>creating a commit), it performs slightly faster than git's (though I
>think git's write-tree does update the tree cache, which libgit2
>doesn't currently). The speed could just be a side-effect of the small
>test repo. From your domain, I assume the data is not for public
>consumption, but it'd be great if you could post your code to pygit2's
>issue tracker so we can see how much of the slowdown comes from the
>bindings or the library.
>
>   cmn
>
