git.vger.kernel.org archive mirror
From: Joshua Redstone <joshua.redstone@fb.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Nguyen Thai Ngoc Duy" <pclouds@gmail.com>,
	"Carlos Martín Nieto" <cmn@elego.de>,
	"Tomas Carnecky" <tom@dbservice.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Debugging git-commit slowness on a large repo
Date: Tue, 20 Dec 2011 01:40:47 +0000
Message-ID: <CB152498.2D6DB%joshua.redstone@fb.com>
In-Reply-To: <7vehw0kphc.fsf@alter.siamese.dyndns.org>

You're right, more than optimizations, they are modifications that reduce
safety checks and make assumptions about the way one is using git (e.g.,
you always remember to add each file you want to commit).  I focused on
them because:

  1. In our installation, we don't use commit hooks that change what's
being committed, so it's good to know that in principle, there's a big
perf benefit to be had by leveraging that fact.

  2. At an abstract level, it seems like the cost of doing a commit should
be proportional to the amount of the repository touched by the commit, not
by the size of the repository.  These experiments are demonstrations of
one direction that a set of optimizations would need to go to get commit
performance more along those lines.

  3. We're also exploring storage systems that support more efficient ways
to query what's changed than stat'ing every file.
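To make points 2 and 3 concrete, here is a minimal sketch (not git's actual implementation) contrasting a scan that lstat's every file with one that only checks an explicit list of touched paths. All function names are hypothetical, and the (mtime, size) pair is a simplified stand-in for the stat data git caches in its index.

```python
import os
import tempfile


def stat_signature(path):
    """Return the (mtime_ns, size) pair used here as a cheap change signal."""
    st = os.lstat(path)
    return (st.st_mtime_ns, st.st_size)


def scan_all(root):
    """lstat every file under root: cost grows with the size of the tree."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            seen[os.path.relpath(full, root)] = stat_signature(full)
    return seen


def scan_touched(root, touched):
    """lstat only the listed paths: cost grows with the size of the change."""
    return {rel: stat_signature(os.path.join(root, rel)) for rel in touched}


def changed_paths(cached, fresh):
    """Paths in fresh whose signature differs from the cached snapshot."""
    return sorted(p for p, sig in fresh.items() if cached.get(p) != sig)


def make_tree(root, n_files):
    """Create n_files small files spread over four subdirectories."""
    for i in range(n_files):
        sub = os.path.join(root, "d%d" % (i % 4))
        os.makedirs(sub, exist_ok=True)
        with open(os.path.join(sub, "f%d.txt" % i), "w") as fh:
            fh.write("x")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, 1000)
        cached = scan_all(root)          # snapshot, analogous to cached stat data
        target = os.path.join("d1", "f1.txt")
        with open(os.path.join(root, target), "a") as fh:
            fh.write("more")             # size changes, so the signature differs
        full = scan_all(root)
        narrow = scan_touched(root, [target])
        print("full scan:   %d lstat calls, changed=%s"
              % (len(full), changed_paths(cached, full)))
        print("narrow scan: %d lstat calls, changed=%s"
              % (len(narrow), changed_paths(cached, narrow)))
```

Both scans report the same changed path, but the narrow one trusts the caller's list of touched files, which is exactly the safety trade-off under discussion.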

I forgot to mention, the times I quoted were with --no-verify and
--no-status.  Adding '-q' didn't improve performance at all.
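
For anyone wanting to reproduce the measurement, a rough sketch of timing such a commit follows. The helper names are hypothetical, and supplying the message with -m is an assumption of this sketch; the timings above do not depend on any particular way of passing the message.

```python
import subprocess
import time


def commit_argv(message):
    """Build the argv for committing the current index with the flags above."""
    return [
        "git", "commit",
        "--no-verify",   # bypass the pre-commit and commit-msg hooks
        "--no-status",   # omit git-status output from the message template
        "-q",            # suppress the commit summary message
        "-m", message,   # assumption: message passed on the command line
    ]


def timed_commit(repo_dir, message):
    """Run the commit in repo_dir and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(commit_argv(message), cwd=repo_dir, check=True)
    return time.perf_counter() - start
```

Calling timed_commit("/path/to/repo", "test") after staging changes with git-add returns the elapsed time; the repository path is a placeholder.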


As a bonus, I've also profiled git-add on the 1-million file repo, and it
looks like, as you might expect, the time is dominated by reading and
writing the index.  The time for git-add is a couple of seconds.

Josh


On 12/19/11 5:21 PM, "Junio C Hamano" <gitster@pobox.com> wrote:

>Joshua Redstone <joshua.redstone@fb.com> writes:
>
>> I've managed to speed up git-commit on large repos by 4x by removing some
>> safeguards that caused git to stat every file in the repo on commits that
>> touch a small number of files.  The diff, for illustrative purposes only,
>> is at:
>>
>>     https://gist.github.com/1499621
>>
>>
>> With a repo with 1 million files (but few commits), the diff drops the
>> commit time down from 7.3 seconds to 1.8 seconds, a 75% decrease. The
>> optimizations are:
>
>I do not know if these kind of changes are called "optimizations" or
>merely making the command record a random tree object that may have some
>resemblance to what you wanted to commit but is subtly incorrect. I didn't
>fetch your safety removal, though.
>
>Wouldn't you get a similar speed-up without being unsafe if you simply ran
>"git commit" without any parameter (i.e. write out the current index as a
>tree and make a commit), combined with "--no-status" and perhaps "-q" to
>avoid running the comparison between the resulting commit and the working
>tree state after the commit?
