From: Michael J Gruber <git@drmicha.warpmail.net>
To: Jean-Luc Herren <jlh@gmx.ch>
Cc: Glenn Griffin <ggriffin.kernel@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: Bad git status performance
Date: Fri, 21 Nov 2008 16:19:50 +0100
Message-ID: <4926D196.3000301@drmicha.warpmail.net>
In-Reply-To: <4926ADB8.5000307@gmx.ch>

Jean-Luc Herren venit, vidit, dixit 21.11.2008 13:46:
> Glenn Griffin wrote:
>> On Thu, Nov 20, 2008 at 4:28 PM, Jean-Luc Herren <jlh@gmx.ch> wrote:
>>> The first 'git status' shows the same difference as the second,
>>> just the second time it's staged instead of unstaged.  Why does it
>>> take 16 seconds the second time when it's instant the first time?
>> I believe the two runs of git status need to do very different things.
>>  When run the first time, git knows the files in your working
>> directory are not in the index so it can easily say those files are
>> 'Changed but not updated' just from their existence.
> 
> I might be mistaken about how the index works, but those paths
> *are* in the index at that time.  They just have the old content,
> i.e. the same content as in HEAD.  When HEAD == index, then
> nothing is staged.
> 
> But the presence of those files alone doesn't tell you that they
> have changed.  You have to look at the content and compare it to
> the index (== HEAD in this situation) to see whether they have
> changed or not and for some reason git can do this very quickly.
> 
>> The second run
>> those files do exist in both the index and the working directory, so
>> git status first shows the files that are 'Changes to be committed'
>> and that should be fast, but additionally git status will check to see
>> if those files in your working directory have changed since you added
>> them to the index.
> 
> Which is basically the same comparison as above, just it turns
> out that they have not changed.  But even then, we're talking
> about comparing a 1 byte file in the index to a 1 byte file in the
> work tree.  That doesn't take 16 seconds, even for 100 files.
> 
> So this makes me believe it's the first step (comparing HEAD to
> the index to show staged changes) that is slow.  And when you
> compare a 1MB file to a 1 byte file, you don't need to read all of
> the big file, you can tell they're not the same right after the
> first byte.  (Even just doing a stat() is enough, since the size is
> not the same.)
> 
> Another thing that came to my mind is maybe rename detection kicks
> in, even though no path vanished and none is new.  I believe
> rename detection doesn't happen for unstaged changes, which might
> explain the difference in speed.
> 
> btw, I forgot to mention that I get this with branches maint,
> master, next and pu.

Interestingly, all of

git diff --stat
git diff --stat --cached
git diff --stat HEAD

are "fast" (0.2s or so), i.e. diffing index vs. wtree, HEAD vs. index,
and HEAD vs. wtree, respectively. Linus' threaded stat doesn't help for
status either, btw (20s).

Experimenting further: Using 10 files of 10MB each (rather than 100
files of 1MB each) brings the time down by roughly a factor of 10, and
so does using 100 files of 100k each. Huh? The latter may be expected
(10MB total), but the former (100MB total)?

Now it's getting funny: Changing your "echo >" to "echo >>" (in your
100 files of 1MB case) makes things "almost fast" again (1.3s).
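
For anyone who wants to repeat these experiments, here is a rough sketch
of the setup as it is described in this thread (N files of a given size,
committed, then each rewritten with "echo >" or "echo >>"). It is not the
original test script: the function names make_repo and rewrite_files, the
file names, the use of random file contents and the throwaway committer
identity are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: helper names, file names and random contents are assumed.

# make_repo DIR COUNT BYTES
# Create a repository in DIR and commit COUNT files of BYTES random bytes.
make_repo() (
    mkdir -p "$1" && cd "$1" && git init -q . || exit 1
    i=1
    while [ "$i" -le "$2" ]; do
        head -c "$3" /dev/urandom > "file$i" || exit 1
        i=$((i + 1))
    done
    git add . &&
    git -c user.name=test -c user.email=test@example.com \
        commit -q -m "initial"
)

# rewrite_files DIR MODE
# MODE ">"  replaces each file with a single newline (the slow case),
# MODE ">>" appends a newline instead (the "almost fast" case).
rewrite_files() (
    cd "$1" || exit 1
    for f in file*; do
        if [ "$2" = ">>" ]; then
            echo >> "$f"
        else
            echo > "$f"
        fi
    done
)

# Example session (creates about 100MB of data, so it is left commented out):
#   make_repo /tmp/status-test 100 1048576
#   rewrite_files /tmp/status-test ">"
#   cd /tmp/status-test
#   time git status     # changes unstaged
#   git add -u
#   time git status     # same changes, now staged
```

Varying COUNT and BYTES gives the 10 x 10MB and 100 x 100k variants.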

OK, it's "use the source, Luke" time... Actually the part you don't see
takes the most time:
wt_status_print_updated()

And in fact I can confirm your suspicion: wt_status_print_updated()
enforces rename detection (ignoring any config). Forcing it off
(rev.diffopt.detect_rename = 0;) cuts down the 20s to 0.75s.
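
A way to probe the same thing from the command line, without patching
the source, might be to time the HEAD-vs-index diff with rename
detection explicitly off and explicitly on. This is only a sketch: the
function name compare_cached_diff is made up, the timing has one-second
resolution, and whether -M alone reproduces the full slowdown is not
established here, since status may set further diff options internally.

```shell
#!/bin/sh
# Sketch only: compare_cached_diff is a hypothetical helper name.

# compare_cached_diff DIR
# Run "git diff --cached --stat" in DIR once with rename detection
# disabled and once with it enabled, and print the elapsed whole seconds.
compare_cached_diff() (
    cd "$1" || exit 1
    for opt in --no-renames -M; do
        start=$(date +%s)
        git diff --cached --stat "$opt" > /dev/null || exit 1
        end=$(date +%s)
        printf '%s: %ds\n' "$opt" "$((end - start))"
    done
)

# Example (run inside a repository that has staged changes):
#   compare_cached_diff .
```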

How about a config option status.renames (or something like -M) for status?

Michael

