From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <junkio@cox.net>, Nicolas Pitre <nico@cam.org>,
Git Mailing List <git@vger.kernel.org>
Subject: git-index-pack really does suck..
Date: Tue, 3 Apr 2007 08:15:12 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0704030754020.6730@woody.linux-foundation.org>
Junio, Nico,
I think we need to do something about it.
CLee was complaining about git-index-pack on #irc with the partial KDE
repo, and while I don't have the KDE repo, I decided to investigate a bit.
Even with just the kernel repo (with a single 170MB pack-file), I can do
git index-pack --stdin --fix-thin new.pack < .git/objects/pack/pack-*.pack
and it uses 52s of CPU-time, and on my 4GB machine it actually started
doing IO and swapping, because git-index-pack grew to 4.8GB in size. So
while I initially thought I'd want a bigger test-case to see the problem,
I sure as heck don't.
The 52s of CPU time exploded into almost three minutes of actual
real-time:
47.33user 5.79system 2:41.65elapsed 32%CPU
2117major+1245763minor
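The 32%CPU figure is just (user + system) / elapsed. That means roughly two thirds of the run was spent off the CPU, which is consistent with the swapping. Here is a small C sketch of that arithmetic, using the numbers from the output above. The helper names are illustrative and are not part of git or time(1):

```c
#include <assert.h>

/* Convert the "m:ss.cc" elapsed field, already split into minutes
 * and seconds, into plain seconds. */
static double elapsed_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0 && seconds < 60.0);
    return minutes * 60.0 + seconds;
}

/* CPU utilisation in percent: time actually spent on the CPU
 * (user + system) divided by wall-clock time. */
static double cpu_percent(double user, double sys, double elapsed)
{
    return 100.0 * (user + sys) / elapsed;
}

/* Wall-clock seconds the process spent not running on the CPU
 * (waiting on I/O, page faults, or the scheduler). */
static double off_cpu_seconds(double user, double sys, double elapsed)
{
    return elapsed - (user + sys);
}
```

The run used 47.33 + 5.79 = 53.12 CPU seconds over 161.65 s of wall-clock time. From those numbers, cpu_percent() gives about 32.9% and off_cpu_seconds() gives about 108.5 s.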
And that's on a good system with a powerful CPU, "enough memory" for any
reasonable development, and good disks! Very much ungood-plus-plus.
I haven't looked into exactly why yet, but I bet it's just that we keep
every single object expanded in memory. We do need to keep the objects
around, so that we can resolve deltas, but we can certainly do it other
ways.
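The bases have to stay reachable for a simple reason. A pack delta is nothing but "copy this range of the base" and "insert these literal bytes" instructions, so it can only be resolved against the fully expanded base object. Below is a minimal sketch of applying one. It is written from git's pack delta encoding (two size varints, then copy/insert opcodes) rather than taken from patch-delta.c, and `apply_delta()` and `read_varint()` are illustrative names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Decode one size varint from the delta header: 7 bits per byte,
 * least significant group first, high bit set on all but the last
 * byte. Returns (size_t)-1 if the header is truncated or too long. */
static size_t read_varint(const unsigned char **p, const unsigned char *end)
{
    size_t value = 0;
    unsigned shift = 0;
    unsigned char byte;

    do {
        if (*p >= end || shift >= sizeof(size_t) * 8)
            return (size_t)-1;
        byte = *(*p)++;
        value |= (size_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return value;
}

/* Apply a delta to its fully expanded base. Returns a malloc'ed buffer
 * of *out_len bytes, or NULL if the delta is malformed. */
static unsigned char *apply_delta(const unsigned char *base, size_t base_len,
                                  const unsigned char *delta, size_t delta_len,
                                  size_t *out_len)
{
    const unsigned char *p = delta, *end = delta + delta_len;
    size_t src_size = read_varint(&p, end);
    size_t dst_size = read_varint(&p, end);
    unsigned char *out, *q;

    if (src_size != base_len || dst_size == (size_t)-1)
        return NULL;
    out = malloc(dst_size ? dst_size : 1);
    if (!out)
        return NULL;
    q = out;

    while (p < end) {
        unsigned char cmd = *p++;

        if (cmd & 0x80) {
            /* Copy from base: bits 0-3 flag offset bytes, bits 4-6 size bytes. */
            size_t off = 0, size = 0;
            int i;
            for (i = 0; i < 4; i++) {
                if (cmd & (1u << i)) {
                    if (p >= end)
                        goto fail;
                    off |= (size_t)*p++ << (8 * i);
                }
            }
            for (i = 0; i < 3; i++) {
                if (cmd & (0x10u << i)) {
                    if (p >= end)
                        goto fail;
                    size |= (size_t)*p++ << (8 * i);
                }
            }
            if (size == 0)
                size = 0x10000;   /* an encoded size of 0 means 64 KiB */
            if (off > base_len || size > base_len - off ||
                size > dst_size - (size_t)(q - out))
                goto fail;
            memcpy(q, base + off, size);
            q += size;
        } else if (cmd) {
            /* Insert: the next cmd bytes are literal data. */
            if ((size_t)(end - p) < cmd || cmd > dst_size - (size_t)(q - out))
                goto fail;
            memcpy(q, p, cmd);
            p += cmd;
            q += cmd;
        } else {
            goto fail;   /* opcode 0 is reserved */
        }
    }
    assert(q <= out + dst_size);
    if ((size_t)(q - out) != dst_size)
        goto fail;
    *out_len = dst_size;
    return out;
fail:
    free(out);
    return NULL;
}
```

For example, the delta bytes 0b 0b 90 06 05 't' 'h' 'e' 'r' 'e' turn the base "hello world" into "hello there". They copy 6 bytes from offset 0 and then insert 5 literal bytes. Without the base in hand, there is nothing to copy from.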
Two suggestions for other ways:
- simple one: don't keep exploded objects around, just keep the deltas,
and spend tons of CPU-time just re-expanding them if required.
We *should* be able to do it with just keeping the original 170MB
pack-file in memory, not expanding it to 3.8GB!
Still, even this will be painful once you have a big pack-file, and the
CPU waste is nasty (although a delta-base cache like we do in
sha1_file.c would probably fix it 99% - at that point it's getting
less simple, and the "best" solution below looks more palatable).
- best one: when writing out the pack-file, we incrementally keep a
"struct packed_git" around, and update the index for it dynamically,
and totally get rid of all objects that we've written out, because we
can re-create them.
This means that we should have _zero_ memory footprint except for the
one object that we're working on right then and there, and any
unresolved deltas where we've not seen the base at all (and the latter
generally shouldn't happen any more with most pack-files).
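Here is a rough sketch of the "simple one" combined with a small delta-base cache. Only the raw pack entries stay in memory, and an object is re-expanded on demand by walking its delta chain. Recently expanded objects are kept in a tiny direct-mapped cache keyed by pack offset. Everything here is hypothetical. `struct entry`, `expand()` and the toy "delta" (append a suffix to the base) stand in for real zlib-compressed pack data and git's delta format. The cache layout is only in the spirit of the one in sha1_file.c and is not a copy of it:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 64

/* Toy pack entry: either a full object, or a "delta" that appends
 * `data` to the expanded object at index `base`. */
struct entry {
    size_t offset;      /* position in the pack, used as the cache key */
    int base;           /* -1 for a full object, else index of its base */
    const char *data;
};

struct cache_slot {
    int valid;
    size_t offset;
    char *buf;          /* expanded object, owned by the cache */
};

static struct cache_slot cache[CACHE_SLOTS];
static unsigned long expansions;   /* counts the (expensive) re-expansions */

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);   /* sketch: treat OOM as fatal */
    return memcpy(copy, s, n);
}

/* Return a private copy of the expanded object at `idx`, checking the
 * cache before re-walking the delta chain. Caller frees the result. */
static char *expand(const struct entry *pack, int idx)
{
    const struct entry *e = &pack[idx];
    struct cache_slot *slot = &cache[e->offset % CACHE_SLOTS];
    char *result;

    if (slot->valid && slot->offset == e->offset)
        return dup_str(slot->buf);

    expansions++;
    if (e->base < 0) {
        result = dup_str(e->data);
    } else {
        /* The base may itself be a delta, so this recurses down the chain. */
        char *base = expand(pack, e->base);
        size_t blen = strlen(base), dlen = strlen(e->data);
        result = malloc(blen + dlen + 1);
        assert(result != NULL);
        memcpy(result, base, blen);
        memcpy(result + blen, e->data, dlen + 1);
        free(base);
    }

    /* Evict whatever occupied this slot and remember the new expansion. */
    free(slot->buf);
    slot->buf = dup_str(result);
    slot->offset = e->offset;
    slot->valid = 1;
    return result;
}
```

Resolving a three-deep chain expands each link once. Asking for the same object again is answered from the cache, so memory stays bounded by the pack data plus at most CACHE_SLOTS expanded objects.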
The "best one" wouldn't seem to be *that* painful, but as mentioned, I
haven't even started looking at the code yet, I thought I'd try to rope
Nico into looking at this first ;)
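Here is a minimal sketch of the shape of the "best one". Each object is appended to the pack as soon as it is resolved, and its offset is recorded in an index that grows as we go. The in-memory copy can then be dropped, and the object is read back from the file only if a later delta names it as a base. This is an illustration under stated assumptions and not git's code. `struct pack_index`, `write_object()` and `read_object()` are hypothetical; a plain name stands in for the SHA-1; a tmpfile() stands in for the real pack; there is no compression and no .idx output; and the lookup is a linear scan where real code would sort or hash:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One index entry per object already written to the pack. */
struct idx_entry {
    char name[41];      /* stands in for the object's SHA-1 */
    long offset;        /* where its data starts in the pack file */
    size_t len;
};

struct pack_index {
    FILE *pack;         /* the pack being written (a tmpfile here) */
    struct idx_entry *entries;
    size_t nr, alloc;
};

/* Append an object to the pack and record where it went. The caller
 * can free its own copy right afterwards. Returns 0 on success. */
static int write_object(struct pack_index *ix, const char *name,
                        const void *data, size_t len)
{
    struct idx_entry *e;

    assert(ix->pack != NULL);
    /* Reposition before writing: required after any read on the stream. */
    if (fseek(ix->pack, 0, SEEK_END) != 0)
        return -1;
    if (ix->nr == ix->alloc) {
        size_t n = ix->alloc ? 2 * ix->alloc : 16;
        struct idx_entry *grown = realloc(ix->entries, n * sizeof(*grown));
        if (!grown)
            return -1;
        ix->entries = grown;
        ix->alloc = n;
    }
    e = &ix->entries[ix->nr];
    e->offset = ftell(ix->pack);
    if (e->offset < 0 || fwrite(data, 1, len, ix->pack) != len)
        return -1;
    strncpy(e->name, name, sizeof(e->name) - 1);
    e->name[sizeof(e->name) - 1] = '\0';
    e->len = len;
    ix->nr++;
    return 0;
}

/* Re-create an object we already wrote out, e.g. when a later delta
 * uses it as its base. Returns a malloc'ed copy, or NULL if unknown. */
static void *read_object(struct pack_index *ix, const char *name, size_t *len)
{
    size_t i;

    for (i = 0; i < ix->nr; i++) {
        struct idx_entry *e = &ix->entries[i];
        void *buf;

        if (strcmp(e->name, name) != 0)
            continue;
        buf = malloc(e->len ? e->len : 1);
        /* Reposition before reading: required after any write. */
        if (!buf || fseek(ix->pack, e->offset, SEEK_SET) != 0 ||
            fread(buf, 1, e->len, ix->pack) != e->len) {
            free(buf);
            return NULL;
        }
        *len = e->len;
        return buf;
    }
    return NULL;
}
```

With this layout, the steady-state memory cost is the object currently being worked on plus one small index entry per object written. That matches the "_zero_ memory footprint" goal above far better than holding every expanded object.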
Linus