git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Chris Lee <clee@kde.org>, Junio C Hamano <junkio@cox.net>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: git-index-pack really does suck..
Date: Tue, 3 Apr 2007 13:18:33 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0704031304420.6730@woody.linux-foundation.org> (raw)
In-Reply-To: <db69205d0704031227q1009eabfhdd82aa3636f25bb6@mail.gmail.com>



On Tue, 3 Apr 2007, Chris Lee wrote:
> 
> There's another issue here.
> 
> I'm running git-index-pack as part of a workflow like so:
> 
> $ git-verify-pack -v .git/objects/pack/*.idx > /tmp/all-objects
> $ grep 'blob' /tmp/all-objects > /tmp/blob-objects
> $ cat /tmp/blob-objects | awk '{print $1;}' | git-pack-objects
> --delta-base-offset --all-progress --stdout > blob.pack
> $ git-index-pack -v blob.pack
> 
> Now, when I run 'git-index-pack' on blob.pack in the current
> directory, memory usage is pretty horrific (even with the applied
> patch to not leak all everything). Shawn tells me that index-pack
> should only be decompressing the object twice - once from the repo and
> once from blob.pack - iff I call git-index-pack with --stdin, which I
> am not.
> 
> If I move the blob.pack into /tmp, and run git-index-pack on it there,
> it completes much faster and the memory usage never exceeds 200MB.
> (Inside the repo, it takes up over 3G of RES according to top.)

Yeah. What happens is that inside the repo, because we do all the 
duplicate object checks (verifying that there are no evil hash collisions) 
even after fixing the memory leak, we end up keeping *track* of all those 
objects.

And with a large repository, it's quite the expensive operation.

That whole "verify no SHA1 hash collision" code is really pretty damn 
paranoid. Maybe we shouldn't have it enabled by default.

So how about this updated patch? We could certainly make "git pull" imply 
"--paranoid" if we want to, but even that is likely pretty unnecessary. 
It's not like anybody has ever shown a SHA1 collision, and if the *local* 
repository is corrupt (and has an object with the wrong SHA1 - that's what 
the testsuite checks for), then it's probably good to get the valid object 
from the remote..

This includes the previous one-liner, but also adds the "--paranoid" flag 
and fixes up the Documentation and tests to match.

Junio, your choice, but regardless which one you choose:

	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Thanks,

		Linus

---
 Documentation/git-index-pack.txt |    3 +--
 index-pack.c                     |    6 +++++-
 t/t5300-pack-object.sh           |    4 ++--
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 2229ee8..7d8d33b 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -8,8 +8,7 @@ git-index-pack - Build pack index file for an existing packed archive
 
 SYNOPSIS
 --------
-'git-index-pack' [-v] [-o <index-file>] <pack-file>
-'git-index-pack' --stdin [--fix-thin] [--keep] [-v] [-o <index-file>] [<pack-file>]
+'git-index-pack' [--stdin [--fix-thin] [--keep]] [-v] [--paranoid] [-o <index-file>] <pack-file>
 
 
 DESCRIPTION
diff --git a/index-pack.c b/index-pack.c
index 6284fe3..8a4c27a 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -45,6 +45,7 @@ static int nr_resolved_deltas;
 
 static int from_stdin;
 static int verbose;
+static int paranoid;
 
 static volatile sig_atomic_t progress_update;
 
@@ -348,7 +349,7 @@ static void sha1_object(const void *data, unsigned long size,
 			enum object_type type, unsigned char *sha1)
 {
 	hash_sha1_file(data, size, typename(type), sha1);
-	if (has_sha1_file(sha1)) {
+	if (paranoid && has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
 		unsigned long has_size;
@@ -358,6 +359,7 @@ static void sha1_object(const void *data, unsigned long size,
 		if (size != has_size || type != has_type ||
 		    memcmp(data, has_data, size) != 0)
 			die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
+		free(has_data);
 	}
 }
 
@@ -839,6 +841,8 @@ int main(int argc, char **argv)
 		if (*arg == '-') {
 			if (!strcmp(arg, "--stdin")) {
 				from_stdin = 1;
+			} else if (!strcmp(arg, "--paranoid")) {
+				paranoid = 1;
 			} else if (!strcmp(arg, "--fix-thin")) {
 				fix_thin_pack = 1;
 			} else if (!strcmp(arg, "--keep")) {
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 35e036a..407c71e 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -262,7 +262,7 @@ test_expect_success \
 		.git/objects/c8/2de19312b6c3695c0c18f70709a6c535682a67'
 
 test_expect_failure \
-    'make sure index-pack detects the SHA1 collision' \
-    'git-index-pack -o bad.idx test-3.pack'
+    'make sure index-pack detects the SHA1 collision when paranoid' \
+    'git-index-pack --paranoid -o bad.idx test-3.pack'
 
 test_done

  parent reply	other threads:[~2007-04-03 20:20 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-03 15:15 git-index-pack really does suck Linus Torvalds
     [not found] ` <db 69205d0704031227q1009eabfhdd82aa3636f25bb6@mail.gmail.com>
     [not found]   ` <Pine.LNX.4.64.07 04031304420.6730@woody.linux-foundation.org>
     [not found]     ` <Pine.LNX.4.64.0704031322490.67 30@woody.linux-foundation.org>
     [not found] ` <Pi ne.LNX.4.64.0704031413200.6730@woody.linux-foundation.org>
     [not found]   ` <alpine.LFD.0.98. 0704031836350.28181@xanadu.home>
2007-04-03 16:21 ` Linus Torvalds
2007-04-03 16:40   ` Nicolas Pitre
2007-04-03 16:33 ` Nicolas Pitre
2007-04-03 19:27 ` Chris Lee
2007-04-03 19:49   ` Nicolas Pitre
2007-04-03 19:54     ` Chris Lee
2007-04-03 20:18   ` Linus Torvalds [this message]
2007-04-03 20:32     ` Nicolas Pitre
2007-04-03 20:40       ` Junio C Hamano
2007-04-03 21:00         ` Linus Torvalds
2007-04-03 21:28           ` Nicolas Pitre
2007-04-03 22:49           ` Chris Lee
2007-04-03 23:12             ` Linus Torvalds
2007-04-03 20:56       ` Linus Torvalds
2007-04-03 21:03         ` Shawn O. Pearce
2007-04-03 21:13           ` Linus Torvalds
2007-04-03 21:17             ` Shawn O. Pearce
2007-04-03 21:26               ` Linus Torvalds
2007-04-03 21:28                 ` Linus Torvalds
2007-04-03 22:31                   ` Junio C Hamano
2007-04-03 22:38                     ` Shawn O. Pearce
2007-04-03 22:41                       ` Junio C Hamano
2007-04-05 10:22                   ` [PATCH 1/2] git-fetch--tool pick-rref Junio C Hamano
2007-04-05 10:22                   ` [PATCH 2/2] git-fetch: use fetch--tool pick-rref to avoid local fetch from alternate Junio C Hamano
2007-04-05 16:15                     ` Shawn O. Pearce
2007-04-05 21:37                       ` Junio C Hamano
2007-04-03 21:34               ` git-index-pack really does suck Nicolas Pitre
2007-04-03 21:37                 ` Shawn O. Pearce
2007-04-03 21:44                   ` Junio C Hamano
2007-04-03 21:53                     ` Shawn O. Pearce
2007-04-03 22:10                       ` Jeff King
2007-04-03 22:40                 ` Dana How
2007-04-03 22:52                   ` Linus Torvalds
2007-04-03 22:31                     ` David Lang
2007-04-03 23:00                   ` Nicolas Pitre
2007-04-03 21:21         ` Nicolas Pitre
2007-04-03 20:33     ` Linus Torvalds
2007-04-03 21:05       ` Nicolas Pitre
2007-04-03 21:11         ` Shawn O. Pearce
2007-04-03 21:24         ` Linus Torvalds
     [not found]           ` <alpine.LF D.0.98.0704031735470.28181@xanadu.home>
2007-04-03 21:42           ` Nicolas Pitre
2007-04-03 22:07             ` Junio C Hamano
2007-04-03 22:11               ` Shawn O. Pearce
2007-04-03 22:34               ` Nicolas Pitre
2007-04-03 22:14             ` Linus Torvalds
2007-04-03 22:55               ` Nicolas Pitre
2007-04-03 22:36                 ` David Lang
2007-04-04  9:51                   ` Alex Riesen
     [not found]                     ` <P ine.LNX.4.63.0704061455380.24050@qynat.qvtvafvgr.pbz>
2007-04-06 21:56                     ` David Lang
2007-04-06 22:47                       ` Junio C Hamano
2007-04-06 22:49                         ` Junio C Hamano
2007-04-06 22:22                           ` David Lang
2007-04-06 22:55                             ` Junio C Hamano
2007-04-06 22:28                               ` David Lang
2007-04-03 23:29                 ` Linus Torvalds
2007-04-03 20:34     ` Junio C Hamano
2007-04-03 20:53       ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0704031304420.6730@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=clee@kde.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).