From: Jeff King <peff@peff.net>
To: David Turner <dturner@twopensource.com>
Cc: git mailing list <git@vger.kernel.org>
Subject: Re: git reset for index restoration?
Date: Thu, 22 May 2014 14:23:03 -0400
Message-ID: <20140522182303.GA1167@sigill.intra.peff.net>
In-Reply-To: <1400782096.18134.1.camel@stross>

On Thu, May 22, 2014 at 02:08:16PM -0400, David Turner wrote:

> On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote:
> > On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote:
> >
> > > If I have a git repository with a clean working tree, and I delete the
> > > index, then I can use git reset (with no arguments) to recreate it.
> > > However, when I do recreate it, it doesn't come back the same.  I have
> > > not analyzed this in detail, but the effect is that commands like git
> > > status take much longer because they must read objects out of a pack
> > > file.  In other words, the index seems to not realize that the index (or
> > > at least most of it) represents the same state as HEAD.  If I do git
> > > reset --hard, the index is restored to the original state (it's
> > > byte-for-byte identical), and the pack file is no longer read.
> >
> > Are you sure it's reading a packfile?
>
> Well, it's calling inflate(), and strace says it is reading
> e.g. .git/objects/pack/pack-....{idx,pack}.
>
> So, I would say so.

It seems odd that we would be spending extra time there. We do
inflate() the trees in order to diff the index against HEAD, but we
shouldn't need to inflate any blobs.

Here it is for me (on linux.git):

  [before, warm cache]
  $ time perf record -q git status >/dev/null
  real    0m0.192s
  user    0m0.080s
  sys     0m0.108s

  $ perf report | grep -v '#' | head -5
     7.46%      git  [kernel.kallsyms]   [k] __d_lookup_rcu
     4.55%      git  libz.so.1.2.8       [.] inflate
     3.53%      git  libc-2.18.so        [.] __memcmp_sse4_1
     3.46%      git  [kernel.kallsyms]   [k] security_inode_getattr
     3.29%      git  git                 [.] memihash

  $ time git reset
  real    0m0.080s
  user    0m0.036s
  sys     0m0.040s

So status is pretty quick, and the time is going to lstat in the kernel,
and some tree inflation. Reset is fast, because it has nothing much to
do. Now let's kill off the index's stat cache:

  $ rm .git/index
  $ time perf record -q git reset
  real    0m0.967s
  user    0m0.780s
  sys     0m0.180s

That took a while. What was it doing?

  $ perf report | grep -v '#' | head -5
     3.23%      git  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
     1.74%      git  libcrypto.so.1.0.0  [.] 0x000000000007e010
     1.60%      git  [kernel.kallsyms]   [k] __d_lookup_rcu
     1.51%      git  [kernel.kallsyms]   [k] page_fault
     1.44%      git  libc-2.18.so        [.] __memcmp_sse4_1

Reading files and computing sha1s. We hash the working-tree files here
(reset doesn't technically need to refresh the index from the working
tree to copy entries from HEAD into the index, but it does so in order
to do fancy things like telling you which files are now out-of-date).
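
To make the hashing cost concrete, here is a rough sketch of the work done
per file when there is no stat data to trust. The helper name `blob_sha1` is
made up for illustration, and the sketch assumes a SHA-1 repository, no
content filters (such as line-ending conversion), and a `sha1sum` binary on
the PATH:

```shell
# Hypothetical helper, not part of git: compute the blob ID that git
# would assign to the file named by $1. A blob ID is the SHA-1 of the
# header "blob <size>\0" followed by the raw file contents.
blob_sha1() {
    size=$(wc -c <"$1" | tr -d ' ')    # byte count; strip padding some wc versions add
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

Under those assumptions the result should agree with `git hash-object <file>`.
Every byte of the file has to be read and hashed, which is what the
libcrypto and copy_user entries in the profile above reflect.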

Now how does status fare after this?

  $ time perf record -q git status >/dev/null
  real    0m0.189s
  user    0m0.088s
  sys     0m0.096s

Looks about the same as before to me.

Note that if you use "read-tree" instead of "reset", it _just_ loads the
index, and doesn't touch the working tree. If you then run "git status",
then _that_ command has to refresh the index, and it will pay the
hashing cost. Like:

  $ rm .git/index
  $ time git read-tree HEAD
  real    0m0.084s
  user    0m0.064s
  sys     0m0.016s
  $ time git status >/dev/null
  real    0m0.833s
  user    0m0.712s
  sys     0m0.112s
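
The two steps can also be run explicitly, which makes it easier to see where
the time goes. This is only a sketch: the function name `rebuild_index` is
invented here, and it assumes a repository with a clean working tree whose
index lives at `.git/index` (no linked worktrees or `GIT_INDEX_FILE`
override):

```shell
# Hypothetical helper, not part of git: rebuild a deleted index in two
# explicit steps. $1 is the path to the repository's top-level directory.
rebuild_index() {
    (
        cd "$1" || exit 1
        rm -f .git/index
        git read-tree HEAD &&            # copy entries from HEAD; no stat data yet
        git update-index -q --refresh    # lstat and hash working-tree files, record stat data
    )
}
```

After this, `git ls-files --debug` should show populated stat fields for each
entry, and a following "git status" should not need to hash the files again.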

All of this is behaving as I would expect. Can you show us a set of
commands that deviate from this?

-Peff
