From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Stephan Hennig <mailing_list@arcor.de>, Nicolas Pitre <nico@cam.org>
Cc: Andreas Ericsson <ae@op5.se>, git@vger.kernel.org
Subject: Re: git pull is slow
Date: Sat, 12 Jul 2008 18:05:14 +0100 (BST)
Message-ID: <alpine.DEB.1.00.0807121546590.8950@racer>
In-Reply-To: <4878A442.6020405@arcor.de>
Hi,
On Sat, 12 Jul 2008, Stephan Hennig wrote:
> Johannes Schindelin schrieb:
> > On Fri, 11 Jul 2008, Andreas Ericsson wrote:
> >
> >> Seems like you're being bitten by a bug we had some months back,
> >> where the client requested full history for new tag objects.
> >
> > I do not think so. I think it is a problem with the pack. The
> > slowness is already there in the clone, in the resolving phase.
>
> Thanks for having a look at this! What does "problem with the pack"
> mean? Do you think it is a Git problem (client or server side?) or just
> a misconfiguration?
I thought that the blobs in the pack were just too similar. That makes for
good compression, since you get many relatively small deltas. But it
also makes for a lot of work to reconstruct the blobs.
I suspected that you ran out of space in the cache that holds some
reconstructed blobs (to avoid reconstructing all of them from scratch).
To see what I mean, just look at
$ git -p verify-pack -v \
.git/objects/pack/pack-563c2d83940c7e2d8c20a35206a390e2e567282f.pack
(or whatever pack you have there). It has this:
-- snip --
chain length = 40: 7 objects
chain length = 41: 8 objects
chain length = 42: 4 objects
chain length = 43: 8 objects
chain length = 44: 6 objects
chain length = 45: 2 objects
chain length = 46: 6 objects
chain length = 47: 2 objects
chain length = 48: 2 objects
chain length = 49: 2 objects
chain length = 50: 2 objects
-- snap --
... but that could not be the reason, as my current git.git's pack shows
this:
-- snip --
chain length = 40: 122 objects
chain length = 41: 99 objects
chain length = 42: 77 objects
chain length = 43: 76 objects
chain length = 44: 69 objects
chain length = 45: 72 objects
chain length = 46: 66 objects
chain length = 47: 103 objects
chain length = 48: 77 objects
chain length = 49: 111 objects
chain length = 50: 86 objects
chain length > 50: 60 objects
-- snap --
... which is much worse.
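As a rough way to compare the two excerpts, the "chain length" lines can be
totalled with a small script. This is only a sketch: the helper name
summarize_chains is made up for illustration, and it assumes the line format
shown above ("chain length = N: M objects", plus an optional
"chain length > N" overflow line).

```python
import re

# Matches e.g. "chain length = 40: 7 objects" or "chain length > 50: 60 objects".
CHAIN_RE = re.compile(r"chain length (=|>) (\d+): (\d+) objects?")

def summarize_chains(text, min_length=40):
    """Hypothetical helper (not part of git).

    Returns (objects, deepest): the number of objects whose chain length is
    at least min_length, and the deepest chain length seen.
    """
    objects = 0
    deepest = 0
    for line in text.splitlines():
        m = CHAIN_RE.search(line)
        if m is None:
            continue
        op, length, count = m.group(1), int(m.group(2)), int(m.group(3))
        if op == ">":
            length += 1  # overflow bucket: treat as "at least N + 1"
        if length >= min_length:
            objects += count
            deepest = max(deepest, length)
    return objects, deepest

# The two excerpts quoted above, copied verbatim.
WORTLISTE = """\
chain length = 40: 7 objects
chain length = 41: 8 objects
chain length = 42: 4 objects
chain length = 43: 8 objects
chain length = 44: 6 objects
chain length = 45: 2 objects
chain length = 46: 6 objects
chain length = 47: 2 objects
chain length = 48: 2 objects
chain length = 49: 2 objects
chain length = 50: 2 objects
"""

GIT_GIT = """\
chain length = 40: 122 objects
chain length = 41: 99 objects
chain length = 42: 77 objects
chain length = 43: 76 objects
chain length = 44: 69 objects
chain length = 45: 72 objects
chain length = 46: 66 objects
chain length = 47: 103 objects
chain length = 48: 77 objects
chain length = 49: 111 objects
chain length = 50: 86 objects
chain length > 50: 60 objects
"""

print("wortliste:", summarize_chains(WORTLISTE))
print("git.git:  ", summarize_chains(GIT_GIT))
```

On these excerpts the script counts 49 objects at depth 40 or more in the
wortliste pack against 1018 in git.git, which fits the observation that
chain depth alone does not explain the slowness.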
So I tried this:
-- snip --
wortliste$ /usr/bin/time git index-pack -o /dev/null \
.git/objects/pack/pack-563c2d83940c7e2d8c20a35206a390e2e567282f.pack
fatal: unable to create /dev/null: File exists
Command exited with non-zero status 128
27.12user 11.21system 2:51.02elapsed 22%CPU (0avgtext+0avgdata 0maxresident)k
81848inputs+0outputs (1134major+2042348minor)pagefaults 0swaps
-- snap --
Compare that to git.git:
-- snip --
git$ /usr/bin/time git index-pack -o /dev/null \
.git/objects/pack/pack-355b54f45778b56c00099bf45369f8a4f2704a51.pack
fatal: unable to create /dev/null: File exists
Command exited with non-zero status 128
16.13user 0.38system 0:17.80elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
81288inputs+0outputs (38major+51917minor)pagefaults 0swaps
-- snap --
So it seems that the major faults (requiring I/O) occur substantially more
often with your repository.
BTW the RAM numbers here (0avgdata, 0maxresident) are obviously bogus; the
program thrashed the disk like there was no tomorrow.
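To put a number on "substantially more often", the page-fault fields of the
two /usr/bin/time reports can be pulled out and compared. This is a small
sketch; the helper name page_faults is made up, and it assumes the GNU time
default output format seen in the transcripts above.

```python
import re

# GNU time prints e.g. "(1134major+2042348minor)pagefaults".
FAULTS_RE = re.compile(r"\((\d+)major\+(\d+)minor\)pagefaults")

def page_faults(time_output):
    """Hypothetical helper: return (major, minor) page-fault counts."""
    m = FAULTS_RE.search(time_output)
    if m is None:
        raise ValueError("no pagefaults field found")
    return int(m.group(1)), int(m.group(2))

# The relevant lines from the two runs above, copied verbatim.
wortliste = "81848inputs+0outputs (1134major+2042348minor)pagefaults 0swaps"
git_git = "81288inputs+0outputs (38major+51917minor)pagefaults 0swaps"

w_major, w_minor = page_faults(wortliste)
g_major, g_minor = page_faults(git_git)

major_ratio = w_major / g_major
minor_ratio = w_minor / g_minor
print("major faults: %.1fx" % major_ratio)
print("minor faults: %.1fx" % minor_ratio)
```

With the figures above, the wortliste run takes roughly 30 times as many
major faults and roughly 39 times as many minor faults as the git.git run.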
Okay, "valgrind --tool=massif" to the rescue:
-- snip --
[massif heap profile, ASCII graph not reproduced: heap size in MB on the
vertical axis, peaking at 555.9 MB; instructions executed on the horizontal
axis, from 0 to 32.83 Gi; the peak is reached near the end of the run]
-- snap --
Whoa. As you can see, your puny little 3.3 megabyte pack is blown up to a
full 555 megabytes in RAM.
That is bad.
Okay, so what is the reason?
You have a pretty large file there, "wortliste", weighing in at 13
megabytes. This file is part of at least one of those 50-strong delta
chains.
To reconstruct the blobs, we have to store all intermediate versions in
RAM (since index-pack is called with "--stdin" from fetch-pack, which is
called by clone). Now, the file was big from the beginning, so you end up
with ~13*50 megabytes (actually, about 100 megabytes less) while indexing
one single delta chain.
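The back-of-the-envelope estimate can be written out as follows. It is a
sketch under the assumption that every version in a chain is held fully
unpacked at the same time; the function name chain_memory_mb is made up, and
the observed figure is the roughly 555 megabyte peak reported by massif.

```python
def chain_memory_mb(blob_mb, chain_length):
    """Upper bound in MB when every version in a chain is held at once."""
    return blob_mb * chain_length

upper_bound = chain_memory_mb(13, 50)   # 13 MB blob, 50-deep delta chain
observed_peak = 555.9                   # massif peak, in MB

print("upper bound: %d MB" % upper_bound)
print("gap to observed peak: about %d MB" % round(upper_bound - observed_peak))
```

The bound comes out at 650 MB, and the gap to the measured peak is a little
under 100 MB, in line with the estimate above.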
My tests were performed on a puny little laptop (512MB RAM, to be precise,
as I am a strong believer that developers with too powerful machines just
lose touch with reality and write programs that are only useful to
themselves, but useless for everyone else), where this hurt big time.
Now, I do not know the internals of index-pack well enough to know if there
is a way to cut the memory usage (by throwing out earlier reconstructed
blobs, for example, and reconstructing them _again_ if need be), so I
have Cc:ed Nico and am handing the problem off to him.
I expect this to touch the resolve_delta() function of index-pack.c in a
major way, though.
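The trade-off of throwing out reconstructed blobs and rebuilding them on
demand can be illustrated with a toy model. This is emphatically not how
index-pack.c works: the names Node, BoundedResolver and build_chain are
invented, a "delta" here just appends bytes to its parent, and eviction is
plain first-in-first-out. It only shows that a byte limit caps memory at the
cost of re-applying deltas.

```python
class Node:
    """One object in a toy delta tree (hypothetical, for illustration)."""
    def __init__(self, name, parent=None, suffix=b"", base=b""):
        self.name = name
        self.parent = parent
        self.suffix = suffix   # toy "delta": bytes appended to the parent
        self.base = base       # full content, used only by the root
        self.children = []
        if parent is not None:
            parent.children.append(self)

class BoundedResolver:
    """Reconstructs objects while keeping at most `limit` bytes cached."""
    def __init__(self, limit):
        self.limit = limit
        self.cache = {}    # Node -> bytes, in insertion order
        self.used = 0
        self.peak = 0      # largest cache size seen after eviction
        self.applied = 0   # number of delta applications performed

    def content(self, node):
        if node in self.cache:
            return self.cache[node]
        if node.parent is None:
            data = node.base   # the base can always be re-read
        else:
            # Rebuilds evicted ancestors recursively if necessary.
            data = self.content(node.parent) + node.suffix
            self.applied += 1
        self._store(node, data)
        return data

    def _store(self, node, data):
        self.cache[node] = data
        self.used += len(data)
        # Evict the oldest entries, but never the one just stored.
        while self.used > self.limit and len(self.cache) > 1:
            oldest = next(iter(self.cache))
            self.used -= len(self.cache.pop(oldest))
        self.peak = max(self.peak, self.used)

    def resolve_all(self, root):
        """Depth-first walk; returns {name: reconstructed size}."""
        sizes = {}
        stack = [root]
        while stack:
            node = stack.pop()
            sizes[node.name] = len(self.content(node))
            stack.extend(node.children)
        return sizes

def build_chain(depth, blob_size=1000):
    """A chain v0..v<depth>, with one extra child hanging off each link."""
    root = tip = Node("v0", base=b"x" * blob_size)
    for i in range(1, depth + 1):
        Node("side%d" % i, parent=tip, suffix=b"s")  # forces a return to tip
        tip = Node("v%d" % i, parent=tip, suffix=b"y" * 10)
    return root

root = build_chain(20)
big = BoundedResolver(limit=10**9)   # effectively unlimited
small = BoundedResolver(limit=3000)  # room for about two blobs
big_sizes = big.resolve_all(root)
small_sizes = small.resolve_all(root)

print("same results:", big_sizes == small_sizes)
print("peak cache bytes:", big.peak, "vs", small.peak)
print("delta applications:", big.applied, "vs", small.applied)
```

Both resolvers reconstruct identical objects; the bounded one stays within
its byte limit but applies more deltas, because ancestors it threw away have
to be rebuilt when a later sibling needs them.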
Ciao,
Dscho
P.S.: It seems that "git verify-pack -v" only shows the sizes of the
deltas. Might be interesting to some to show the unpacked _full_ size,
too.