git.vger.kernel.org archive mirror
* git-index-pack really does suck..
@ 2007-04-03 15:15 Linus Torvalds
From: Linus Torvalds @ 2007-04-03 15:15 UTC (permalink / raw)
  To: Junio C Hamano, Nicolas Pitre, Git Mailing List


Junio, Nico,
 I think we need to do something about it.

CLee was complaining about git-index-pack on #irc with the partial KDE 
repo, and while I don't have the KDE repo, I decided to investigate a bit.

Even with just the kernel repo (with a single 170MB pack-file), I can do

	git index-pack --stdin --fix-thin new.pack < .git/objects/pack/pack-*.pack

and it uses 52s of CPU-time, and on my 4GB machine it actually started 
doing IO and swapping, because git-index-pack grew to 4.8GB in size. So 
while I initially thought I'd want a bigger test-case to see the problem, 
I sure as heck don't.

The 52s of CPU time exploded into almost three minutes of actual 
real-time:

	47.33user 5.79system 2:41.65elapsed 32%CPU
	2117major+1245763minor

And that's on a good system with a powerful CPU, "enough memory" for any 
reasonable development, and good disks! Very much ungood-plus-plus.
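
As a quick check on those numbers, here is the arithmetic behind the time(1) line, as a small Python sketch. The variable names are only illustrative; the figures are the ones quoted above. The sum of user and system time comes out near 53s, close to the roughly 52s mentioned earlier, and the CPU only accounts for about a third of the wall-clock time, which fits the swapping described above.

```python
# Figures copied from the time(1) output quoted above.
user_s = 47.33
system_s = 5.79
elapsed_s = 2 * 60 + 41.65  # "2:41.65elapsed" expressed in seconds

cpu_s = user_s + system_s        # total CPU time actually consumed
cpu_share = cpu_s / elapsed_s    # fraction of wall-clock time spent on the CPU

print(f"CPU time:   {cpu_s:.2f}s")
print(f"wall clock: {elapsed_s:.2f}s")
print(f"CPU share:  {cpu_share:.1%}")
```

The remaining two thirds of the elapsed time is time the process spent waiting rather than computing.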

I haven't looked into exactly why yet, but I bet it's just that we keep 
every single object expanded in memory. We do need to keep the objects 
around, so that we can resolve deltas, but we can certainly do it other 
ways. 

Two suggestions for other ways:

 - simple one: don't keep expanded objects around, just keep the deltas, 
   and spend tons of CPU-time just re-expanding them if required.

   We *should* be able to do it with just keeping the original 170MB 
   pack-file in memory, not expanding it to 4.8GB! 

   Still, even this will be painful once you have a big pack-file, and the 
   CPU waste is nasty (although a delta-base cache like we do in 
   sha1_file.c would probably fix it 99% - at that point it's getting 
   less simple, and the "best" solution below looks more palatable)

 - best one: when writing out the pack-file, we incrementally keep a 
   "struct packed_git" around, and update the index for it dynamically, 
   and totally get rid of all objects that we've written out, because we 
   can re-create them.

   This means that we should have _zero_ memory footprint except for the 
   one object that we're working on right then and there, and any 
   unresolved deltas where we've not seen the base at all (and the latter 
   generally shouldn't happen any more with most pack-files)
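
To make the "simple one" concrete, here is a toy Python sketch of keeping only deltas and re-expanding them on demand, with a small delta-base cache in front. Everything in it is an assumption for illustration: the `PACK` dictionary, the `DeltaBaseCache` class and the "append some bytes" delta format are made up, and none of it is git's real delta encoding or the cache in sha1_file.c.

```python
from collections import OrderedDict

# Hypothetical toy pack: object id -> ("base", data) or ("delta", base_id, suffix).
# A "delta" here just appends bytes to its base; git's real delta format
# (copy/insert instructions) is more involved.
PACK = {
    "a": ("base", b"hello"),
    "b": ("delta", "a", b" world"),
    "c": ("delta", "b", b"!"),
}


class DeltaBaseCache:
    """Small LRU cache of already-expanded objects."""

    def __init__(self, max_entries=2):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.expansions = 0  # counts how often a delta is actually applied

    def expand(self, obj_id):
        if obj_id in self.entries:
            self.entries.move_to_end(obj_id)  # mark as most recently used
            return self.entries[obj_id]
        entry = PACK[obj_id]
        if entry[0] == "base":
            data = entry[1]
        else:
            _, base_id, suffix = entry
            data = self.expand(base_id) + suffix  # re-expand the chain on demand
            self.expansions += 1
        self.entries[obj_id] = data
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used
        return data


cache = DeltaBaseCache(max_entries=2)
print(cache.expand("c"))   # walks the chain c -> b -> a
print(cache.expand("c"))   # served from the cache, no delta applied
print(cache.expansions)
```

Only `max_entries` expanded objects are held at any time; the price is re-applying deltas whenever a needed base has been evicted, which is the CPU cost mentioned above.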

The "best one" wouldn't seem to be *that* painful, but as mentioned, I 
haven't even started looking at the code yet, I thought I'd try to rope 
Nico into looking at this first ;)
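
A rough Python sketch of the "best one" idea follows: write each object out immediately, keep only an index entry for it, and read it back when it is needed as a base. The `IncrementalPack` class and its methods are hypothetical stand-ins, not git's "struct packed_git"; a real pack stores compressed objects with headers, and an in-memory `io.BytesIO` plays the role of the pack-file on disk here.

```python
import io


class IncrementalPack:
    """Toy stand-in for a pack being written plus its in-memory index."""

    def __init__(self):
        self.file = io.BytesIO()  # stands in for the pack-file on disk
        self.index = {}           # object id -> (offset, length)

    def write_object(self, obj_id, data):
        self.file.seek(0, io.SEEK_END)
        offset = self.file.tell()
        self.file.write(data)
        # The index is updated as we go; the caller can drop `data` now.
        self.index[obj_id] = (offset, len(data))

    def read_object(self, obj_id):
        # Re-create the object from what was already written out.
        offset, length = self.index[obj_id]
        self.file.seek(offset)
        return self.file.read(length)


pack = IncrementalPack()
pack.write_object("a", b"hello")
pack.write_object("b", b"hello world")
print(pack.read_object("a"))
print(sorted(pack.index.items()))
```

The memory held per written object is just one small index entry, which is the point of the suggestion: the object contents live in the file, not in the process.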

		Linus


end of thread, other threads:[~2007-04-06 22:59 UTC | newest]

Thread overview: 58+ messages
2007-04-03 15:15 git-index-pack really does suck Linus Torvalds
2007-04-03 16:21 ` Linus Torvalds
2007-04-03 16:40   ` Nicolas Pitre
2007-04-03 16:33 ` Nicolas Pitre
2007-04-03 19:27 ` Chris Lee
2007-04-03 19:49   ` Nicolas Pitre
2007-04-03 19:54     ` Chris Lee
2007-04-03 20:18   ` Linus Torvalds
2007-04-03 20:32     ` Nicolas Pitre
2007-04-03 20:40       ` Junio C Hamano
2007-04-03 21:00         ` Linus Torvalds
2007-04-03 21:28           ` Nicolas Pitre
2007-04-03 22:49           ` Chris Lee
2007-04-03 23:12             ` Linus Torvalds
2007-04-03 20:56       ` Linus Torvalds
2007-04-03 21:03         ` Shawn O. Pearce
2007-04-03 21:13           ` Linus Torvalds
2007-04-03 21:17             ` Shawn O. Pearce
2007-04-03 21:26               ` Linus Torvalds
2007-04-03 21:28                 ` Linus Torvalds
2007-04-03 22:31                   ` Junio C Hamano
2007-04-03 22:38                     ` Shawn O. Pearce
2007-04-03 22:41                       ` Junio C Hamano
2007-04-05 10:22                   ` [PATCH 1/2] git-fetch--tool pick-rref Junio C Hamano
2007-04-05 10:22                   ` [PATCH 2/2] git-fetch: use fetch--tool pick-rref to avoid local fetch from alternate Junio C Hamano
2007-04-05 16:15                     ` Shawn O. Pearce
2007-04-05 21:37                       ` Junio C Hamano
2007-04-03 21:34               ` git-index-pack really does suck Nicolas Pitre
2007-04-03 21:37                 ` Shawn O. Pearce
2007-04-03 21:44                   ` Junio C Hamano
2007-04-03 21:53                     ` Shawn O. Pearce
2007-04-03 22:10                       ` Jeff King
2007-04-03 22:40                 ` Dana How
2007-04-03 22:52                   ` Linus Torvalds
2007-04-03 22:31                     ` David Lang
2007-04-03 23:00                   ` Nicolas Pitre
2007-04-03 21:21         ` Nicolas Pitre
2007-04-03 20:33     ` Linus Torvalds
2007-04-03 21:05       ` Nicolas Pitre
2007-04-03 21:11         ` Shawn O. Pearce
2007-04-03 21:24         ` Linus Torvalds
2007-04-03 21:42           ` Nicolas Pitre
2007-04-03 22:07             ` Junio C Hamano
2007-04-03 22:11               ` Shawn O. Pearce
2007-04-03 22:34               ` Nicolas Pitre
2007-04-03 22:14             ` Linus Torvalds
2007-04-03 22:55               ` Nicolas Pitre
2007-04-03 22:36                 ` David Lang
2007-04-04  9:51                   ` Alex Riesen
2007-04-06 21:56                     ` David Lang
2007-04-06 22:47                       ` Junio C Hamano
2007-04-06 22:49                         ` Junio C Hamano
2007-04-06 22:22                           ` David Lang
2007-04-06 22:55                             ` Junio C Hamano
2007-04-06 22:28                               ` David Lang
2007-04-03 23:29                 ` Linus Torvalds
2007-04-03 20:34     ` Junio C Hamano
2007-04-03 20:53       ` Nicolas Pitre
