From: Nicolas Pitre <nico@cam.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Junio C Hamano <junkio@cox.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: git-index-pack really does suck..
Date: Tue, 03 Apr 2007 12:33:46 -0400 (EDT)
Message-ID: <alpine.LFD.0.98.0704031220470.28181@xanadu.home>
In-Reply-To: <Pine.LNX.4.64.0704030754020.6730@woody.linux-foundation.org>

On Tue, 3 Apr 2007, Linus Torvalds wrote:

> 
> Junio, Nico,
>  I think we need to do something about it.

Sure.  Mea culpa.

> CLee was complaining about git-index-pack on #irc with the partial KDE 
> repo, and while I don't have the KDE repo, I decided to investigate a bit.
> 
> Even with just the kernel repo (with a single 170MB pack-file), I can do
> 
> 	git index-pack --stdin --fix-thin new.pack < .git/objects/pack/pack-*.pack
> 
> and it uses 52s of CPU-time, and on my 4GB machine it actually started 
> doing IO and swapping, because git-index-pack grew to 4.8GB in size.

Right.

> Two suggestions for other ways:
> 
>  - simple one: don't keep unexploded objects around, just keep the deltas, 
>    and spend tons of CPU-time just re-expanding them if required.
> 
>    We *should* be able to do it with just keeping the original 170MB 
>    pack-file in memory, not expanding it to 3.8GB! 
> 
>    Still, even this will be painful once you have a big pack-file, and the 
>    CPU waste is nasty (although a delta-base cache like we do in 
>    sha1_file.c would probably fix it 99% - at that point it's getting 
>    less simple, and the "best" solution below looks more palatable)
> 
>  - best one: when writing out the pack-file, we incrementally keep a 
>    "struct packed_git" around, and update the index for it dynamically, 
>    and totally get rid of all objects that we've written out, because we 
>    can re-create them.
> 
>    This means that we should have _zero_ memory footprint except for the 
>    one object that we're working on right then and there, and any 
>    unresolved deltas where we've not seen the base at all (and the latter 
>    generally shouldn't happen any more with most pack-files)

Even better:

  - Fix my own stupid mistake with a _single_ line of code:

diff --git a/index-pack.c b/index-pack.c
index 6284fe3..3c768fb 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -358,6 +358,7 @@ static void sha1_object(const void *data, unsigned long size,
 		if (size != has_size || type != has_type ||
 		    memcmp(data, has_data, size) != 0)
 			die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
+		free(has_data);
 	}
 }

The thing is, that code path is executed _only_ when index-pack 
encounters an object that already exists in the repository; the 
comparison is there to protect against possible SHA1 collision attacks.  
See the log of commit 8685da42561d for the full story.

This should not happen in normal usage scenarios because the objects 
you fetch are those that you don't already have.  But if you manually 
run index-pack inside an existing repository then you'll already have 
_all_ those objects, which explains the high CPU usage.

But this is no excuse for not freeing the data.
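
For illustration only, the pattern can be sketched outside of git as 
follows.  This is a minimal standalone sketch, not the index-pack code: 
read_existing() and check_incoming() are made-up stand-ins (the first 
one assumes the lookup hands back a malloc'd buffer that the caller 
owns), and the sketch returns -1 on a mismatch where the real code 
calls die().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the object store: it holds a single object. */
static const char stored_object[] = "blob contents already in the repo";

/*
 * Hypothetical lookup (assumption, not a git function): returns a
 * freshly malloc'd copy of the stored object, owned by the caller,
 * or NULL when the object is not present.
 */
static void *read_existing(int present, unsigned long *size)
{
	void *copy;

	if (!present)
		return NULL;
	*size = sizeof(stored_object);
	copy = malloc(*size);
	if (!copy)
		abort();
	memcpy(copy, stored_object, *size);
	return copy;
}

/*
 * Returns 0 if the incoming object is new or identical to the stored
 * copy, and -1 if the contents differ.
 */
static int check_incoming(const void *data, unsigned long size, int present)
{
	unsigned long has_size = 0;
	void *has_data = read_existing(present, &has_size);
	int mismatch;

	assert(data != NULL);
	if (!has_data)
		return 0;	/* usual fetch case: nothing to compare against */

	mismatch = size != has_size || memcmp(data, has_data, size) != 0;
	free(has_data);		/* the copy is ours, so release it on every path */
	return mismatch ? -1 : 0;
}
```

Without the free(), every object that is already present leaves one 
full copy of itself behind, which adds up quickly when the whole pack 
is already in the repository.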


Nicolas
