Re: Bad git status performance

From: Michael J Gruber <git@drmicha.warpmail.net>
To: Jean-Luc Herren <jlh@gmx.ch>
Cc: Glenn Griffin <ggriffin.kernel@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: Bad git status performance
Date: Fri, 21 Nov 2008 16:19:50 +0100
Message-ID: <4926D196.3000301@drmicha.warpmail.net>
In-Reply-To: <4926ADB8.5000307@gmx.ch>

Jean-Luc Herren venit, vidit, dixit 21.11.2008 13:46:
> Glenn Griffin wrote:
>> On Thu, Nov 20, 2008 at 4:28 PM, Jean-Luc Herren <jlh@gmx.ch> wrote:
>>> The first 'git status' shows the same difference as the second,
>>> just the second time it's staged instead of unstaged.  Why does it
>>> take 16 seconds the second time when it's instant the first time?
>> I believe the two runs of git status need to do very different things.
>>  When run the first time, git knows the files in your working
>> directory are not in the index so it can easily say those files are
>> 'Changed but not updated' just from their existence.
> 
> I might be mistaken about how the index works, but those paths
> *are* in the index at that time.  They just have the old content,
> i.e. the same content as in HEAD.  When HEAD == index, then
> nothing is staged.
> 
> But the presence of those files alone doesn't tell you that they
> have changed.  You have to look at the content and compare it to
> the index (== HEAD in this situation) to see whether they have
> changed or not and for some reason git can do this very quickly.
> 
>> The second run
>> those files do exist in both the index and the working directory, so
>> git status first shows the files that are 'Changes to be committed'
>> and that should be fast, but additionally git status will check to see
>> if those files in your working directory have changed since you added
>> them to the index.
> 
> Which is basically the same comparison as above, just it turns
> out that they have not changed.  But even then, we're talking
> about comparing a 1 byte file in the index to a 1 byte file in the
> work tree.  That doesn't take 16 seconds, even for 100 files.
> 
> So this makes me believe it's the first step (comparing HEAD to
> the index to show staged changes) that is slow.  And when you
> compare a 1MB file to a 1 byte file, you don't need to read all of
> the big file, you can tell they're not the same right after the
> first byte.  (Even doing a stat() is enough, since the size is
> not the same.)
> 
> Another thing that came to my mind is maybe rename detection kicks
> in, even though no path vanished and none is new.  I believe
> rename detection doesn't happen for unstaged changes, which might
> explain the difference in speed.
> 
> btw, I forgot to mention that I get this with branches maint,
> master, next and pu.

Interestingly, all of

git diff --stat
git diff --stat --cached
git diff --stat HEAD

are "fast" (0.2s or so); they diff index-wtree, HEAD-index and
HEAD-wtree, respectively. Linus' threaded stat doesn't help for status
either, btw (20s).

Experimenting further: using 10 files of 10MB each (rather than 100
files of 1MB each) brings the time down by roughly a factor of 10, and
so does using 100 files of 100k each. Huh? The latter may be expected
(10MB total), but the former (100MB total)?

Now it's getting funny: Changing your "echo >" to "echo >>" (in your
100 files 1MB case) makes things "almost fast" again (1.3s).

OK, it's "use the source, Luke" time... Actually the part you don't see
takes the most time:
wt_status_print_updated()

And in fact I can confirm your suspicion: wt_status_print_updated()
enforces rename detection (ignoring any config). Forcing it off
(rev.diffopt.detect_rename = 0;) cuts down the 20s to 0.75s.
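
A rough cost model that would fit the factor-10 observations above,
sketched in Python. It rests on an assumption that is not taken from
the git source: every changed path gets scored against every other
changed path, and scoring one pair costs time proportional to the
file size. `pairwise_cost` is a made-up name, and sizes are rounded to
decimal units.

```python
def pairwise_cost(n_files, bytes_per_file):
    """Toy model (an assumption, not git's algorithm): n_files * n_files
    pair comparisons, each costing about bytes_per_file units of work."""
    return n_files * n_files * bytes_per_file


MB = 1_000_000

baseline = pairwise_cost(100, 1 * MB)       # 100 files of 1MB
fewer_bigger = pairwise_cost(10, 10 * MB)   # 10 files of 10MB
more_smaller = pairwise_cost(100, 100_000)  # 100 files of 100k

print(baseline / fewer_bigger)   # 10.0
print(baseline / more_smaller)   # 10.0
```

Both ratios come out at 10, in line with the measurements, even though
the total amount of data is 100MB in one variant and 10MB in the
other. The model says nothing about the "echo >>" case.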

How about a config option status.renames (or something like -M) for status?

Michael
