From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nicolas Pitre <nico@cam.org>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
Geert Bosch <bosch@adacore.com>, Andi Kleen <andi@firstfloor.org>,
Ken Pratt <ken@kenpratt.net>,
git@vger.kernel.org
Subject: Re: pack operation is thrashing my server
Date: Thu, 14 Aug 2008 16:14:26 -0700 (PDT)
Message-ID: <alpine.LFD.1.10.0808141544150.3324@nehalem.linux-foundation.org>
In-Reply-To: <alpine.LFD.1.10.0808141633080.4352@xanadu.home>
On Thu, 14 Aug 2008, Nicolas Pitre wrote:
>
> Possible. However, the fact that both the "Compressing objects" and the
> "Writing objects" phases during a repack (without -f) together are
> _faster_ than the "Counting objects" phase is a sign that something is
> more significant than cache misses here, especially when tree
> information is a small portion of the total pack data size.
Hmm. I think I may have a clue.
The size of the delta cache seems to be a sensitive parameter for this
thing. Not so much for the git archive, but working on the kernel tree,
raising it to 1024 seems to give a 20% performance improvement. That, in
turn, implies that we may be unpacking things over and over again because
of bad locality wrt delta generation.
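To make the cache-size discussion concrete, here is a minimal sketch of a fixed-size, direct-mapped cache of inflated delta bases keyed by pack offset. It is a hypothetical illustration under that assumption, not git's actual code: `CACHE_SLOTS`, `cache_lookup()` and `cache_store()` are made-up names, and 1024 simply mirrors the size mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not git's implementation: a direct-mapped cache
 * of inflated delta bases, indexed by the base's offset in the pack. */
#define CACHE_SLOTS 1024

struct base_entry {
	unsigned long offset;  /* pack offset of the cached base */
	char *data;            /* inflated contents; NULL means the slot is empty */
	size_t len;
};

static struct base_entry cache[CACHE_SLOTS];

static unsigned slot_for(unsigned long offset)
{
	return (unsigned)(offset % CACHE_SLOTS);
}

/* Return the cached contents for `offset`, or NULL on a miss. */
static const char *cache_lookup(unsigned long offset, size_t *len)
{
	struct base_entry *e = &cache[slot_for(offset)];

	if (e->data && e->offset == offset) {
		*len = e->len;
		return e->data;
	}
	return NULL;
}

/* Store a copy of `data`, evicting whatever occupied the same slot. */
static void cache_store(unsigned long offset, const char *data, size_t len)
{
	struct base_entry *e = &cache[slot_for(offset)];

	free(e->data);
	e->data = malloc(len);
	if (!e->data)
		return;  /* allocation failed: leave the slot empty */
	memcpy(e->data, data, len);
	e->offset = offset;
	e->len = len;
}
```

Two offsets that differ by a multiple of `CACHE_SLOTS` land in the same slot and evict each other, so a larger table only helps if the access pattern revisits a base before something else has claimed its slot.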
I'm not sure how easy something like that is to fix, though. We generate
the object list in "recency" order for a reason, but that also happens to
be the worst possible order for re-using the delta cache - by the time we
get back to the next version of some tree entry, we'll have cycled through
all the other trees, and blown all the caches, so we'll end up likely
re-doing the whole delta chain.
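The locality argument can be checked with a toy simulation. It assumes, purely for illustration, that every visit to tree `t` needs the same delta base (key `t`), and it uses a small LRU policy for simplicity; git's real cache and object walk are more involved. `hits_recency_order()` and `hits_grouped_order()` are hypothetical helpers.

```c
/* Toy model (an assumption, not git code): key t stands for the delta
 * base that tree t needs each time it is visited. */
#define MAX_CAP 64

struct lru {
	int keys[MAX_CAP];
	unsigned long stamp[MAX_CAP];  /* last-use time of each slot */
	int used, cap;
	unsigned long clock;
};

static void lru_init(struct lru *c, int cap)
{
	c->used = 0;
	c->cap = cap < 1 ? 1 : (cap > MAX_CAP ? MAX_CAP : cap);
	c->clock = 0;
}

/* Returns 1 on a hit; on a miss, inserts `key` (evicting the least
 * recently used entry when full) and returns 0. */
static int lru_access(struct lru *c, int key)
{
	int i, victim = 0;

	c->clock++;
	for (i = 0; i < c->used; i++) {
		if (c->keys[i] == key) {
			c->stamp[i] = c->clock;
			return 1;
		}
	}
	if (c->used < c->cap) {
		victim = c->used++;
	} else {
		for (i = 1; i < c->used; i++)
			if (c->stamp[i] < c->stamp[victim])
				victim = i;
	}
	c->keys[victim] = key;
	c->stamp[victim] = c->clock;
	return 0;
}

/* "Recency" order: each commit visits every tree before the walk comes
 * back to the next version of any one tree. */
static int hits_recency_order(int trees, int commits, int cap)
{
	struct lru c;
	int hits = 0;

	lru_init(&c, cap);
	for (int v = 0; v < commits; v++)
		for (int t = 0; t < trees; t++)
			hits += lru_access(&c, t);
	return hits;
}

/* Grouped order: all versions of one tree are visited back to back. */
static int hits_grouped_order(int trees, int commits, int cap)
{
	struct lru c;
	int hits = 0;

	lru_init(&c, cap);
	for (int t = 0; t < trees; t++)
		for (int v = 0; v < commits; v++)
			hits += lru_access(&c, t);
	return hits;
}
```

With 100 trees, 10 commits and 16 slots, the recency walk gets no hits at all, because every key is evicted before it comes around again, while the grouped walk hits on all but the first visit of each tree. Once the working set fits (say 10 trees), the recency walk hits too.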
So it's quite possible that what ends up happening is that some directory
with a deep delta chain will basically end up unpacking the whole chain -
which obviously includes inflating each delta - over and over again.
That's what the delta cache was supposed to avoid.
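The cost difference can be sketched with a simple model of one delta chain. The assumptions are illustrative only: the object at depth `d` is a delta against depth `d-1`, depth 0 is stored whole, each step costs one inflate, and the cache remembers just the last reconstructed object. `resolve()` and `walk_history()` are hypothetical names.

```c
/* Hypothetical model: `*cached` is the depth of the one reconstructed
 * object we remember on this chain, or -1 if nothing is cached. */
static unsigned long resolve(int depth, int *cached)
{
	unsigned long inflates;

	if (*cached >= 0 && *cached <= depth)
		inflates = (unsigned long)(depth - *cached);  /* only the remaining deltas */
	else
		inflates = (unsigned long)depth + 1;          /* the base plus every delta */
	*cached = depth;
	return inflates;
}

/* Read depths 0..max_depth in order. With keep_cache == 0 the cache is
 * treated as blown between reads (other trees were visited in between). */
static unsigned long walk_history(int max_depth, int keep_cache)
{
	int cached = -1;
	unsigned long total = 0;

	for (int d = 0; d <= max_depth; d++) {
		if (!keep_cache)
			cached = -1;
		total += resolve(d, &cached);
	}
	return total;
}
```

Walking a chain of depth 10 step by step costs 11 inflates when the previous result survives, but 66 when every read has to start again from the base.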
Looking at some call graphs, for the kernel I get:
- process_tree() called 10 million times
- causing parse_tree() called 479,466 times (whew, so 19 out of 20 trees
have already been seen and can be discarded)
- which in turn calls read_sha1_file() (total: 588,110 times, but there are
a hundred thousand+ commits)
but that actually causes
- 588,110 calls to cache_or_unpack_entry
out of which 5,850 calls hit in the cache, and 582,260 do *not*.
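The numbers in the call graph above can be checked with a little arithmetic. The constants are copied from the figures quoted (the 10 million is rounded); `ratio()` is just a helper for this check.

```c
/* Call counts quoted above for the kernel repository. */
#define PROCESS_TREE_CALLS       10000000UL  /* "10 million", rounded */
#define PARSE_TREE_CALLS           479466UL
#define CACHE_OR_UNPACK_CALLS      588110UL
#define CACHE_HITS                   5850UL
#define CACHE_MISSES  (CACHE_OR_UNPACK_CALLS - CACHE_HITS)

static double ratio(unsigned long part, unsigned long whole)
{
	return (double)part / (double)whole;
}
```

The misses come out to exactly 582,260, the hit rate is under 1%, and parse_tree() runs for under 5% of process_tree() calls, which matches "19 out of 20 trees have already been seen".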
IOW, the delta cache effectively never triggers because the working set is
_way_ bigger than the cache, and the patterns aren't good. So since most
trees are deltas, and the max delta depth is 10, the average depth is
something like 5, and we actually get an ugly
- 1,637,999 calls to unpack_compressed_entry
which all results in a zlib inflate call.
So we actually have three times as many calls to inflate as we even have
objects parsed, due to the delta chains on the trees (the commits almost
never delta-chain at all, much less any deeper than a couple of entries).
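The "three times as many" figure follows from the counts above. The constants are copied from the quoted call graph; `per_call()` is a helper for this check only.

```c
/* Counts quoted above for the kernel repository. */
#define READ_SHA1_FILE_CALLS       588110UL
#define CACHE_MISSES               582260UL
#define UNPACK_COMPRESSED_CALLS   1637999UL  /* each one a zlib inflate */

/* Average number of inflates per call of the given kind. */
static double per_call(unsigned long inflates, unsigned long calls)
{
	return (double)inflates / (double)calls;
}
```

That works out to roughly 2.8 inflates per read_sha1_file() call, and slightly more per cache miss, since a miss pays for the rest of the chain below it.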
So yeah, trees are the problem here, and yes, avoiding inflating them
would help - but mainly because we do it something like four times per
object on average!
Ouch. But we really can't just make the cache bigger, and the bad access
patterns really are on purpose here. The delta cache was not meant for
this, it was really meant for the "dig deeper into the history of a single
file" kind of situation that gets very different patterns indeed.
I'll see if I can think of anything simple to avoid all this unnecessary
work. But it doesn't look too good.
Linus