From: Piotr Krukowiecki <piotr.krukowiecki@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Thomas Rast <trast@inf.ethz.ch>,
Git Mailing List <git@vger.kernel.org>,
Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Subject: Re: git status: small difference between stating whole repository and small subdirectory
Date: Fri, 17 Feb 2012 18:19:06 +0100 [thread overview]
Message-ID: <CAA01Csq6vSekW=Fa236bB0H3LVtN43Gb2aLMVE+A1wVyUqYJ7A@mail.gmail.com> (raw)
In-Reply-To: <20120216192001.GB4348@sigill.intra.peff.net>
On Thu, Feb 16, 2012 at 8:20 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Feb 16, 2012 at 02:37:47PM +0100, Piotr Krukowiecki wrote:
>
>> >> $ time git status -- .
>> >> real 0m2.503s
>> >> user 0m0.160s
>> >> sys 0m0.096s
>> >>
>> >> $ time git status
>> >> real 0m9.663s
>> >> user 0m0.232s
>> >> sys 0m0.556s
>> >
>> > Did you drop caches here, too?
>>
>> Yes I did - with cache the status takes something like 0.1-0.3s on whole repo.
>
> OK, then that makes sense. It's pretty much just I/O on the filesystem
> and on the object db.
>
> You can break status down a little more to see which is which. Try "git
> update-index --refresh" to see just how expensive the lstat and index
> handling is.
"git update-index --refresh" with dropped cache took
real 0m3.726s
user 0m0.024s
sys 0m0.404s
while "git status" with dropped cache takes
real 0m13.578s
user 0m0.240s
sys 0m0.600s
I'm not sure why it takes more than the 9s reported before - IIRC I
did the previous test in single mode under bare shell and this time
I'm testing under gnome. This or it's the effect of running
update-index :/
Now status on subdir takes 9.5s. But still the
not-much-faster-status-on-subdir rule is true.
> And then try "git diff-index HEAD" for an idea of how expensive it is to
> just read the objects and compare to the index.
The diff-index after dropping cache takes
real 0m14.095s
user 0m0.268s
sys 0m0.564s
>> > Not really. You're showing an I/O problem, and repacking is git's way of
>> > reducing I/O.
>>
>> So if I understand correctly, the reason is because git must compare
>> workspace files with packed objects - and the problem is
>> reading/seeking/searching in the packs?
>
> Mostly reading (we keep a sorted index and access the packfiles via
> mmap, so we only touch the pages we need). But you're also paying to
> lstat() the directory tree, too. And you're paying to load (probably)
> the whole index into memory, although it's relatively compact compared
> to the actual file data.
If the index is the objects/pack/*.idx files than it's 21MB
>> Is there a way to make packs better? I think most operations are on
>> workdir files - so maybe it'd be possible to tell gc/repack/whatever
>> to optimize access to files which I currently have in workdir?
>
> It already does optimize for that case. If you can make it even better,
> I'm sure people would be happy to see the numbers.
If I understand correctly, you only need to compute sha1 on the
workdir files and compare it with sha1 files recorded in index/gitdir.
It seems that to get the sha1 from index/gitdir I need to read the
packfiles? Maybe it'd be possible to cache/index it somehow, for
example in separate and smaller file?
> Mostly I think it is just the case that disk I/O is slow, and the
> operation you're asking for has to do a certain amount of it. What kind
> of disk/filesystem are you pulling off of?
>
> It's not a fuse filesystem by any chance, is it? I have a repo on an
> encfs-mounted filesystem, and the lstat times are absolutely horrific.
No, it's ext4 and the disk Seagate Barracuda 7200.12 500GB, as it
reads on the cover :)
But IMO faster disk won't help with this - times will be smaller, but
you'll still have to read the same data, so the subdir times will be
just 2x faster than whole repo, won't it? So maybe in my case it will
go down to e.g. 2s on subdir, but for someone with larger repository
it will still be 10s...
--
Piotr Krukowiecki
next prev parent reply other threads:[~2012-02-17 17:19 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 9:42 git status: small difference between stating whole repository and small subdirectory Piotr Krukowiecki
2012-02-10 12:33 ` Nguyen Thai Ngoc Duy
2012-02-10 13:46 ` Piotr Krukowiecki
2012-02-10 14:37 ` Nguyen Thai Ngoc Duy
2012-02-13 16:54 ` Piotr Krukowiecki
2012-02-10 16:18 ` Piotr Krukowiecki
2012-02-14 11:34 ` Thomas Rast
2012-02-15 8:57 ` Piotr Krukowiecki
2012-02-15 11:01 ` Nguyen Thai Ngoc Duy
2012-02-15 15:14 ` Piotr Krukowiecki
2012-02-16 13:22 ` Piotr Krukowiecki
2012-02-15 19:03 ` Jeff King
2012-02-16 13:37 ` Piotr Krukowiecki
2012-02-16 14:05 ` Thomas Rast
2012-02-16 20:15 ` Junio C Hamano
2012-02-17 16:55 ` Piotr Krukowiecki
2012-02-16 19:20 ` Jeff King
2012-02-17 17:19 ` Piotr Krukowiecki [this message]
2012-02-17 20:37 ` Jeff King
2012-02-17 22:25 ` Junio C Hamano
2012-02-17 22:29 ` Jeff King
2012-02-20 8:25 ` Piotr Krukowiecki
2012-02-20 14:06 ` Jeff King
2012-02-20 14:09 ` Thomas Rast
2012-02-20 14:36 ` Nguyen Thai Ngoc Duy
2012-02-20 14:39 ` Jeff King
2012-02-20 15:11 ` Jeff King
2012-02-20 18:45 ` Thomas Rast
2012-02-20 20:35 ` Jeff King
2012-02-20 22:04 ` Junio C Hamano
2012-02-20 22:41 ` Jeff King
2012-02-20 23:31 ` Junio C Hamano
2012-02-21 7:21 ` Piotr Krukowiecki
2012-02-20 20:08 ` Junio C Hamano
2012-02-20 20:17 ` Jeff King
2012-02-21 14:45 ` Nguyen Thai Ngoc Duy
2012-02-21 19:16 ` Junio C Hamano
2012-02-22 2:12 ` Nguyen Thai Ngoc Duy
2012-02-22 2:55 ` Junio C Hamano
2012-02-22 12:54 ` Nguyen Thai Ngoc Duy
2012-02-22 13:17 ` Thomas Rast
2012-02-22 10:34 ` Nguyen Thai Ngoc Duy
2012-02-22 3:32 ` Junio C Hamano
2012-04-10 15:16 ` Piotr Krukowiecki
2012-04-10 16:23 ` Junio C Hamano
2012-04-10 18:00 ` Jeff King
2012-02-20 19:57 ` Junio C Hamano
2012-02-20 19:59 ` Thomas Rast
2012-02-20 14:16 ` Nguyen Thai Ngoc Duy
2012-02-20 14:22 ` Jeff King
2012-02-20 19:56 ` Junio C Hamano
2012-02-20 20:09 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAA01Csq6vSekW=Fa236bB0H3LVtN43Gb2aLMVE+A1wVyUqYJ7A@mail.gmail.com' \
--to=piotr.krukowiecki@gmail.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
--cc=trast@inf.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).