From: Michael J Gruber <git@drmicha.warpmail.net>
To: Jean-Luc Herren <jlh@gmx.ch>
Cc: Glenn Griffin <ggriffin.kernel@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: Bad git status performance
Date: Fri, 21 Nov 2008 16:19:50 +0100
Message-ID: <4926D196.3000301@drmicha.warpmail.net>
In-Reply-To: <4926ADB8.5000307@gmx.ch>
Jean-Luc Herren venit, vidit, dixit 21.11.2008 13:46:
> Glenn Griffin wrote:
>> On Thu, Nov 20, 2008 at 4:28 PM, Jean-Luc Herren <jlh@gmx.ch> wrote:
>>> The first 'git status' shows the same difference as the second,
>>> just the second time it's staged instead of unstaged. Why does it
>>> take 16 seconds the second time when it's instant the first time?
>> I believe the two runs of git status need to do very different things.
>> When run the first time, git knows the files in your working
>> directory are not in the index so it can easily say those files are
>> 'Changed but not updated' just from their existence.
>
> I might be mistaken about how the index works, but those paths
> *are* in the index at that time. They just have the old content,
> i.e. the same content as in HEAD. When HEAD == index, then
> nothing is staged.
>
> But the presence of those files alone doesn't tell you that they
> have changed. You have to look at the content and compare it to
> the index (== HEAD in this situation) to see whether they have
> changed or not and for some reason git can do this very quickly.
>
>> The second run
>> those files do exist in both the index and the working directory, so
>> git status first shows the files that are 'Changes to be committed'
>> and that should be fast, but additionally git status will check to see
>> if those files in your working directory have changed since you added
>> them to the index.
>
> Which is basically the same comparison as above, just it turns
> out that they have not changed. But even then, we're talking
> about comparing a 1 byte file in the index to a 1 byte file in the
> work tree. That doesn't take 16 seconds, even for 100 files.
>
> So this makes me believe it's the first step (comparing HEAD to
> the index to show staged changes) that is slow. And when you
> compare a 1MB file to a 1 byte file, you don't need to read all of
> the big file, you can tell they're not the same right after the
> first byte. (Even doing a stat() is enough, since the size is
> not the same.)
>
> Another thing that came to my mind is maybe rename detection kicks
> in, even though no path vanished and none is new. I believe
> rename detection doesn't happen for unstaged changes, which might
> explain the difference in speed.
>
> btw, I forgot to mention that I get this with branches maint,
> master, next and pu.
Interestingly, all of

	git diff --stat
	git diff --stat --cached
	git diff --stat HEAD

are "fast" (0.2s or so), i.e. diffing index-wtree, HEAD-index, and
HEAD-wtree, respectively. Linus' threaded stat doesn't help status
either, btw (20s).
Experimenting further: Using 10 files of 10MB each (rather than 100
files of 1MB) brings the time down by roughly a factor of 10, and so
does using 100 files of 100k each. Huh? The latter may be expected
(10MB total), but the former (100MB total)?
Now it's getting funny: Changing your "echo >" to "echo >>" (in your
100 files of 1MB case) makes things "almost fast" again (1.3s).
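
A minimal sketch of the test case under discussion, for anyone who wants
to reproduce the numbers. The original script is not quoted in this
message, so this is a reconstruction: the file names, the use of
/dev/urandom for the content, and the NFILES/SIZE variables are all
assumptions. The defaults are scaled down so that it runs quickly.

```shell
#!/bin/sh
# Reconstructed test case (assumption: not the original script).
# Defaults are small; use NFILES=100 SIZE=1048576 for the 100 x 1MB case.
NFILES=${NFILES:-10}
SIZE=${SIZE:-65536}

repo=$(mktemp -d) || exit 1
cd "$repo" || exit 1
git init -q .

# Commit NFILES files of SIZE random bytes each.
i=1
while [ "$i" -le "$NFILES" ]; do
    head -c "$SIZE" /dev/urandom > "file$i"
    i=$((i + 1))
done
git add .
git -c user.name=tester -c user.email=tester@example.com commit -q -m initial

# Overwrite every file with a single newline, as "echo >" does.
for f in file*; do
    echo > "$f"
done

# Run a command, discard its output, and print the elapsed whole seconds.
elapsed() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# First run: the changes are unstaged.
unstaged_secs=$(elapsed git status)
unstaged_count=$(git diff --name-only | wc -l | tr -d ' ')

# Second run: the same changes, now staged.
git add -u
staged_secs=$(elapsed git status)
staged_count=$(git diff --cached --name-only | wc -l | tr -d ' ')

echo "unstaged: $unstaged_count files, git status took ${unstaged_secs}s"
echo "staged:   $staged_count files, git status took ${staged_secs}s"
```

Running it as "NFILES=100 SIZE=1048576 sh repro.sh" (repro.sh being a
hypothetical file name) should approximate the 100 files of 1MB case.
The timing is in whole seconds only, which is coarse but enough to tell
"instant" from "16 seconds"; replacing "echo >" with "echo >>" in the
loop gives the append variant mentioned above.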
OK, it's "use the source, Luke" time... Actually the part you don't see
takes the most time:

	wt_status_print_updated()

And in fact I can confirm your suspicion: wt_status_print_updated()
enforces rename detection (ignoring any config). Forcing it off
(rev.diffopt.detect_rename = 0;) cuts down the 20s to 0.75s.
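
The cost of rename detection can also be probed without patching the
source, as a rough sketch: time "git diff --cached --stat" with and
without the rename-related flags. Whether -M alone or -B -M comes
closest to what status does internally is an assumption to check
against wt-status.c, not something established here. The repository
setup, the file names and the NFILES/SIZE variables are illustrative.

```shell
#!/bin/sh
# Sketch: time "git diff --cached --stat" with different rename settings.
# Counts, sizes and file names are illustrative, not taken from the thread.
NFILES=${NFILES:-10}
SIZE=${SIZE:-65536}

repo=$(mktemp -d) || exit 1
cd "$repo" || exit 1
git init -q .
i=1
while [ "$i" -le "$NFILES" ]; do
    head -c "$SIZE" /dev/urandom > "file$i"
    i=$((i + 1))
done
git add .
git -c user.name=tester -c user.email=tester@example.com commit -q -m initial

# Stage a one-newline replacement for every file.
for f in file*; do
    echo > "$f"
done
git add -u

# Print a label and the whole seconds a command took.
timed() {
    label=$1
    shift
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo "$label: $((end - start))s"
}

# --no-renames makes the baseline explicit on Git versions where
# diff detects renames by default.
timed "no rename detection"   git diff --cached --stat --no-renames
timed "-M (rename detection)" git diff --cached --stat -M
timed "-B -M (break + rename)" git diff --cached --stat -B -M

# No path was added or removed, so -M should report the same set of files.
plain_count=$(git diff --cached --name-only --no-renames | wc -l | tr -d ' ')
rename_count=$(git diff --cached --name-only -M | wc -l | tr -d ' ')
echo "files reported: plain=$plain_count, with -M=$rename_count"
```

With the small defaults all three variants will likely finish within a
second; the difference, if any, should only become visible with
NFILES=100 SIZE=1048576.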
How about a config option status.renames (or something like -M) for status?
Michael