git.vger.kernel.org archive mirror
* [PATCH 1/2] Skip SHA-1 collision test on "index-pack --verify"
@ 2012-02-24 12:23 Nguyễn Thái Ngọc Duy
  2012-02-24 12:23 ` [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 11+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-02-24 12:23 UTC (permalink / raw)
  To: git; +Cc: Ian Kumlien, Nguyễn Thái Ngọc Duy

index-pack --verify (or verify-pack) is about verifying the pack
itself. The SHA-1 collision test is about keeping outside (possibly
malicious) objects that share a SHA-1 with existing objects out of the
current repo.

The SHA-1 collision test is currently done unconditionally, which means
that when you verify an in-repo pack, every object in the pack is
checked against the in-repo objects, i.e. against itself.

Skip this test for --verify, unless --strict is also specified.
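
To illustrate what the test guards against, here is a standalone toy
sketch (the lookup helper is a stand-in, not git's object-database
API). When verifying an in-repo pack, every object lands in the
"same hash, same bytes" case, so the test is pure overhead there:

#include <stdio.h>
#include <string.h>

struct toy_object {
	const char *hash;	/* stand-in for a 20-byte binary SHA-1 */
	const char *data;
	size_t size;
};

/* One pre-existing in-repo object. */
static const struct toy_object in_repo = { "abc123", "hello\n", 6 };

static const struct toy_object *lookup(const char *hash)
{
	return strcmp(hash, in_repo.hash) ? NULL : &in_repo;
}

/*
 * Returns 1 if admitting (hash, data, size) would be a collision:
 * an object with the same hash already exists but its bytes differ.
 */
static int is_collision(const char *hash, const char *data, size_t size)
{
	const struct toy_object *old = lookup(hash);

	if (!old)
		return 0;	/* nothing with this hash yet */
	return old->size != size || memcmp(old->data, data, size) != 0;
}

int main(void)
{
	/* Same hash, same bytes: what --verify on an in-repo pack sees. */
	printf("self:   %d\n", is_collision("abc123", "hello\n", 6));
	/* Same hash, different bytes: the attack the test is for. */
	printf("forged: %d\n", is_collision("abc123", "HELLO!", 6));
	return 0;
}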

linux-2.6 $ ls -sh .git/objects/pack/pack-e7732c98a8d54840add294c3c562840f78764196.pack
401M .git/objects/pack/pack-e7732c98a8d54840add294c3c562840f78764196.pack

Without the patch (and with another patch to cut out the second pass
in index-pack):

linux-2.6 $ time ~/w/git/old index-pack -v --verify .git/objects/pack/pack-e7732c98a8d54840add294c3c562840f78764196.pack
Indexing objects: 100% (1944656/1944656), done.
fatal: pack has 1617280 unresolved deltas

real    1m1.223s
user    0m55.028s
sys     0m0.828s

With the patch:

linux-2.6 $ time ~/w/git/git index-pack -v --verify .git/objects/pack/pack-e7732c98a8d54840add294c3c562840f78764196.pack
Indexing objects: 100% (1944656/1944656), done.
fatal: pack has 1617280 unresolved deltas

real    0m41.714s
user    0m40.994s
sys     0m0.550s

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dd1c5c9..cee83b9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -62,6 +62,7 @@ static int nr_resolved_deltas;
 
 static int from_stdin;
 static int strict;
+static int verify;
 static int verbose;
 
 static struct progress *progress;
@@ -461,7 +462,7 @@ static void sha1_object(const void *data, unsigned long size,
 			enum object_type type, unsigned char *sha1)
 {
 	hash_sha1_file(data, size, typename(type), sha1);
-	if (has_sha1_file(sha1)) {
+	if ((strict || !verify) && has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
 		unsigned long has_size;
@@ -1078,7 +1079,7 @@ static void show_pack_info(int stat_only)
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
-	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, stat = 0;
+	int i, fix_thin_pack = 0, stat_only = 0, stat = 0;
 	const char *curr_pack, *curr_index;
 	const char *index_name = NULL, *pack_name = NULL;
 	const char *keep_name = NULL, *keep_msg = NULL;
-- 
1.7.8.36.g69ee2


* [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-24 12:23 [PATCH 1/2] Skip SHA-1 collision test on "index-pack --verify" Nguyễn Thái Ngọc Duy
@ 2012-02-24 12:23 ` Nguyễn Thái Ngọc Duy
  2012-02-24 14:30   ` Ian Kumlien
                     ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-02-24 12:23 UTC (permalink / raw)
  To: git; +Cc: Ian Kumlien, Nguyễn Thái Ngọc Duy

This command unpacks every non-delta object in order to:

1. calculate its SHA-1
2. do a byte-by-byte SHA-1 collision test if we happen to have an
   object with the same SHA-1
3. validate the object content in strict mode

All of this requires the entire object to stay in memory, which is bad
news for giant blobs. This patch lowers memory consumption by not
keeping the object in memory whenever possible and calculating the
SHA-1 while unpacking the object instead.

This patch assumes that the collision test is rarely needed. When it
is, it is done later in a second pass, which brings the entire object
back into memory (we could even do the collision test without holding
the entire object in memory, by comparing as we unpack it).

In strict mode, non-blob objects are always kept in memory for
validation (blobs do not need data validation). "--strict --verify"
keeps blobs in memory as well.
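
For reference, a minimal standalone sketch of hashing a git object
incrementally, assuming OpenSSL's SHA-1 (build with -lcrypto); git's
git_SHA1_* wrappers follow the same Init/Update/Final pattern, and the
chunked loop below stands in for feeding whatever inflate() hands back:

#include <stdio.h>
#include <openssl/sha.h>

static void hash_streamed_object(const char *type, const unsigned char *data,
				 unsigned long size, unsigned char *sha1)
{
	char hdr[32];
	/* Object hash covers "<type> <size>\0" followed by the content. */
	int hdrlen = snprintf(hdr, sizeof(hdr), "%s %lu", type, size) + 1;
	SHA_CTX c;
	unsigned long off = 0;

	SHA1_Init(&c);
	SHA1_Update(&c, hdr, hdrlen);
	while (off < size) {
		unsigned long n = size - off < 8192 ? size - off : 8192;
		SHA1_Update(&c, data + off, n);	/* one inflated chunk */
		off += n;
	}
	SHA1_Final(sha1, &c);
}

int main(void)
{
	unsigned char sha1[SHA_DIGEST_LENGTH];
	int i;

	hash_streamed_object("blob", (const unsigned char *)"hello\n", 6, sha1);
	for (i = 0; i < SHA_DIGEST_LENGTH; i++)
		printf("%02x", sha1[i]);
	/* Should print the same id as `echo hello | git hash-object --stdin`. */
	printf("\n");
	return 0;
}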

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Does anybody do "git index-pack --stdin < .git/objects/pack/something"?

 builtin/index-pack.c |   74 +++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cee83b9..0c1f915 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -277,30 +277,60 @@ static void unlink_base_data(struct base_data *c)
 	free_base_data(c);
 }
 
-static void *unpack_entry_data(unsigned long offset, unsigned long size)
+static void *unpack_entry_data(unsigned long offset, unsigned long size,
+			       enum object_type type, unsigned char *sha1)
 {
+	static char fixed_buf[8192];
 	int status;
 	git_zstream stream;
-	void *buf = xmalloc(size);
+	void *buf;
+	git_SHA_CTX c;
+
+	if (sha1) {		/* do hash_sha1_file internally */
+		char hdr[32];
+		int hdrlen = sprintf(hdr, "%s %lu", typename(type), size)+1;
+		git_SHA1_Init(&c);
+		git_SHA1_Update(&c, hdr, hdrlen);
+
+		buf = fixed_buf;
+	} else {
+		buf = xmalloc(size);
+	}
 
 	memset(&stream, 0, sizeof(stream));
 	git_inflate_init(&stream);
 	stream.next_out = buf;
-	stream.avail_out = size;
+	stream.avail_out = buf == fixed_buf ? sizeof(fixed_buf) : size;
 
 	do {
 		stream.next_in = fill(1);
 		stream.avail_in = input_len;
 		status = git_inflate(&stream, 0);
 		use(input_len - stream.avail_in);
+		if (sha1) {
+			git_SHA1_Update(&c, buf, stream.next_out - (unsigned char *)buf);
+			stream.next_out = buf;
+			stream.avail_out = sizeof(fixed_buf);
+		}
 	} while (status == Z_OK);
 	if (stream.total_out != size || status != Z_STREAM_END)
 		bad_object(offset, "inflate returned %d", status);
 	git_inflate_end(&stream);
+	if (sha1) {
+		git_SHA1_Final(sha1, &c);
+		buf = NULL;
+	}
 	return buf;
 }
 
-static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_base)
+static int is_delta_type(enum object_type type)
+{
+	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
+}
+
+static void *unpack_raw_entry(struct object_entry *obj,
+			      union delta_base *delta_base,
+			      unsigned char *sha1)
 {
 	unsigned char *p;
 	unsigned long size, c;
@@ -360,7 +390,17 @@ static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_
 	}
 	obj->hdr_size = consumed_bytes - obj->idx.offset;
 
-	data = unpack_entry_data(obj->idx.offset, obj->size);
+	/*
+	 * --verify --strict: sha1_object() does all collision test
+	 *          --strict: sha1_object() does all except blobs,
+	 *                    blobs tested in second pass
+	 * --verify         : no collision test
+	 *                  : all in second pass
+	 */
+	if (is_delta_type(obj->type) ||
+	    (strict && (verify || obj->type != OBJ_BLOB)))
+		sha1 = NULL;	/* save unpacked object */
+	data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, sha1);
 	obj->idx.crc32 = input_crc32;
 	return data;
 }
@@ -461,8 +501,9 @@ static void find_delta_children(const union delta_base *base,
 static void sha1_object(const void *data, unsigned long size,
 			enum object_type type, unsigned char *sha1)
 {
-	hash_sha1_file(data, size, typename(type), sha1);
-	if ((strict || !verify) && has_sha1_file(sha1)) {
+	if (data)
+		hash_sha1_file(data, size, typename(type), sha1);
+	if (data && has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
 		unsigned long has_size;
@@ -511,11 +552,6 @@ static void sha1_object(const void *data, unsigned long size,
 	}
 }
 
-static int is_delta_type(enum object_type type)
-{
-	return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
-}
-
 /*
  * This function is part of find_unresolved_deltas(). There are two
  * walkers going in the opposite ways.
@@ -702,7 +738,7 @@ static void parse_pack_objects(unsigned char *sha1)
 				nr_objects);
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
-		void *data = unpack_raw_entry(obj, &delta->base);
+		void *data = unpack_raw_entry(obj, &delta->base, obj->idx.sha1);
 		obj->real_type = obj->type;
 		if (is_delta_type(obj->type)) {
 			nr_deltas++;
@@ -744,6 +780,9 @@ static void parse_pack_objects(unsigned char *sha1)
 	 * - if used as a base, uncompress the object and apply all deltas,
 	 *   recursively checking if the resulting object is used as a base
 	 *   for some more deltas.
+	 * - if the same object exists in repository and we're not in strict
+	 *   mode, we skipped the sha-1 collision test in the first pass.
+	 *   Do it now.
 	 */
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
@@ -753,6 +792,15 @@ static void parse_pack_objects(unsigned char *sha1)
 
 		if (is_delta_type(obj->type))
 			continue;
+
+		if (((!strict && !verify) ||
+		     (strict && !verify && obj->type == OBJ_BLOB)) &&
+		    has_sha1_file(obj->idx.sha1)) {
+			void *data = get_data_from_pack(obj);
+			sha1_object(data, obj->size, obj->type, obj->idx.sha1);
+			free(data);
+		}
+
 		base_obj->obj = obj;
 		base_obj->data = NULL;
 		find_unresolved_deltas(base_obj);
-- 
1.7.8.36.g69ee2


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-24 12:23 ` [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
@ 2012-02-24 14:30   ` Ian Kumlien
  2012-02-24 14:40   ` Ian Kumlien
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Ian Kumlien @ 2012-02-24 14:30 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Fri, Feb 24, 2012 at 07:23:21PM +0700, Nguyễn Thái Ngọc Duy wrote:
> This command unpacks every non-delta objects in order to:
> 
> 1. calculate sha-1
> 2. do byte-to-byte sha-1 collision test if we happen to have objects
>    with the same sha-1
> 3. validate object content in strict mode
> 
> All this requires the entire object to stay in memory, a bad news for
> giant blobs. This patch lowers memory consumption by not saving the
> object in memory whenever possible, calculating SHA-1 while unpacking
> the object.
> 
> This patch assumes that the collision test is rarely needed. The
> collision test will be done later in second pass if necessary, which
> puts the entire object back to memory again (We could even do the
> collision test without putting the entire object back in memory, by
> comparing as we unpack it).
> 
> In strict mode, it always keeps non-blob objects in memory for
> validation (blobs do not need data validation). "--strict --verify"
> also keeps blobs in memory.

I applied both patches to git master, with some manual tinkering, so I
might have missed some change that caused this to break.

But I get a segmentation fault, and I thought I'd send you a small
trace before I even start trying to look into this:
0xb7eb5b43 in SHA1_Update () from /lib/i686/cmov/libcrypto.so.0.9.8
(gdb) bt
#0  0xb7eb5b43 in SHA1_Update () from /lib/i686/cmov/libcrypto.so.0.9.8
#1  0x08116a2d in write_sha1_file_prepare
#2  0x08116a83 in hash_sha1_file
#3  0x0807c2a6 in sha1_object 
#4  0x0807d74a in parse_pack_objects
#5  0x0807de6f in cmd_index_pack 
#6  0x0804be97 in run_builtin 
#7  handle_internal_command 
#8  0x0804c0ad in run_argv 
#9  main

Sorry about the censorship, but I don't know how sensitive this data
is...


sha1_file.c:2343
---
static void write_sha1_file_prepare(const void *buf, unsigned long len,
                                    const char *type, unsigned char *sha1,
                                    char *hdr, int *hdrlen)
{
        git_SHA_CTX c;

        /* Generate the header */
        *hdrlen = sprintf(hdr, "%s %lu", type, len)+1;

        /* Sha1.. */
        git_SHA1_Init(&c);
        git_SHA1_Update(&c, hdr, *hdrlen);
        git_SHA1_Update(&c, buf, len); <== this line fails.
        git_SHA1_Final(sha1, &c);
}
---

Just keep sending patches, I have at least one git to test it on. ;)


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-24 12:23 ` [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
  2012-02-24 14:30   ` Ian Kumlien
@ 2012-02-24 14:40   ` Ian Kumlien
  2012-02-24 15:37   ` Ian Kumlien
  2012-02-24 16:16   ` Ian Kumlien
  3 siblings, 0 replies; 11+ messages in thread
From: Ian Kumlien @ 2012-02-24 14:40 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Fri, Feb 24, 2012 at 07:23:21PM +0700, Nguyễn Thái Ngọc Duy wrote:
> This command unpacks every non-delta objects in order to:
> 
> 1. calculate sha-1
> 2. do byte-to-byte sha-1 collision test if we happen to have objects
>    with the same sha-1
> 3. validate object content in strict mode
> 
> All this requires the entire object to stay in memory, a bad news for
> giant blobs. This patch lowers memory consumption by not saving the
> object in memory whenever possible, calculating SHA-1 while unpacking
> the object.
> 
> This patch assumes that the collision test is rarely needed. The
> collision test will be done later in second pass if necessary, which
> puts the entire object back to memory again (We could even do the
> collision test without putting the entire object back in memory, by
> comparing as we unpack it).
> 
> In strict mode, it always keeps non-blob objects in memory for
> validation (blobs do not need data validation). "--strict --verify"
> also keeps blobs in memory.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Actually, never mind my last report - I had missed a merge :(

And now that I've merged that part, it seems like it doesn't do much...
(no real output for 2+ minutes)

I think I should reapply the patches and verify that everything is
correct before reporting any additional progress.

This might not be before Monday, unfortunately... But *thanks* for
posting the patches!


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-24 12:23 ` [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
  2012-02-24 14:30   ` Ian Kumlien
  2012-02-24 14:40   ` Ian Kumlien
@ 2012-02-24 15:37   ` Ian Kumlien
  2012-02-24 16:16   ` Ian Kumlien
  3 siblings, 0 replies; 11+ messages in thread
From: Ian Kumlien @ 2012-02-24 15:37 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Fri, Feb 24, 2012 at 07:23:21PM +0700, Nguyễn Thái Ngọc Duy wrote:
> This command unpacks every non-delta objects in order to:
> 
> 1. calculate sha-1
> 2. do byte-to-byte sha-1 collision test if we happen to have objects
>    with the same sha-1
> 3. validate object content in strict mode
> 
> All this requires the entire object to stay in memory, a bad news for
> giant blobs. This patch lowers memory consumption by not saving the
> object in memory whenever possible, calculating SHA-1 while unpacking
> the object.
> 
> This patch assumes that the collision test is rarely needed. The
> collision test will be done later in second pass if necessary, which
> puts the entire object back to memory again (We could even do the
> collision test without putting the entire object back in memory, by
> comparing as we unpack it).
> 
> In strict mode, it always keeps non-blob objects in memory for
> validation (blobs do not need data validation). "--strict --verify"
> also keeps blobs in memory.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Finally, reapplied the patches and so on:
remote: Counting objects: 1425, done.
remote: Compressing objects: 100% (617/617), done.
remote: Total 1425 (delta 790), reused 1425 (delta 790)
Receiving objects: 100% (1425/1425), 56.06 MiB | 3.97 MiB/s, done.
Resolving deltas: 100% (790/790), done.

real	1m57.742s
user	0m29.950s
sys	0m6.308s

*YAY*

I wonder how the hell I could have missed several parts of the patch =(

But there seems to be some issue with gerrit 2.1.8; I will have to check
against a newer gerrit to verify whether it's still a problem.

FYI - it seems to hang doing nothing.

As for your patches:
Tested-by: Ian Kumlien <pomac@vapor.com>

;)


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-24 12:23 ` [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
                     ` (2 preceding siblings ...)
  2012-02-24 15:37   ` Ian Kumlien
@ 2012-02-24 16:16   ` Ian Kumlien
  2012-02-25  1:49     ` Nguyen Thai Ngoc Duy
  3 siblings, 1 reply; 11+ messages in thread
From: Ian Kumlien @ 2012-02-24 16:16 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Fri, Feb 24, 2012 at 07:23:21PM +0700, Nguyễn Thái Ngọc Duy wrote:
> This command unpacks every non-delta objects in order to:
> 
> 1. calculate sha-1
> 2. do byte-to-byte sha-1 collision test if we happen to have objects
>    with the same sha-1
> 3. validate object content in strict mode
> 
> All this requires the entire object to stay in memory, a bad news for
> giant blobs. This patch lowers memory consumption by not saving the
> object in memory whenever possible, calculating SHA-1 while unpacking
> the object.
> 
> This patch assumes that the collision test is rarely needed. The
> collision test will be done later in second pass if necessary, which
> puts the entire object back to memory again (We could even do the
> collision test without putting the entire object back in memory, by
> comparing as we unpack it).
> 
> In strict mode, it always keeps non-blob objects in memory for
> validation (blobs do not need data validation). "--strict --verify"
> also keeps blobs in memory.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Writing objects: 100% (1425/1425), 56.06 MiB | 4.62 MiB/s, done.
Total 1425 (delta 790), reused 1425 (delta 790)
fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
To ../test_data/
 ! [remote rejected] master -> master (missing necessary objects)
 ! [remote rejected] origin/HEAD -> origin/HEAD (missing necessary objects)
 ! [remote rejected] origin/master -> origin/master (missing necessary objects)
error: failed to push some refs to '../test_data/'

So there are additional code paths to look at... =( 


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-24 16:16   ` Ian Kumlien
@ 2012-02-25  1:49     ` Nguyen Thai Ngoc Duy
  2012-02-25 13:17       ` Ian Kumlien
  2012-02-25 22:45       ` Ian Kumlien
  0 siblings, 2 replies; 11+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-02-25  1:49 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: git

2012/2/24 Ian Kumlien <pomac@vapor.com>:
> Writing objects: 100% (1425/1425), 56.06 MiB | 4.62 MiB/s, done.
> Total 1425 (delta 790), reused 1425 (delta 790)
> fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> To ../test_data/
>  ! [remote rejected] master -> master (missing necessary objects)
>  ! [remote rejected] origin/HEAD -> origin/HEAD (missing necessary objects)
>  ! [remote rejected] origin/master -> origin/master (missing necessary objects)
> error: failed to push some refs to '../test_data/'
>
> So there are additional code paths to look at... =(

I can't say where that came from. Does this help? (Space damaged, may
need manual application)

-- 8< --
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 264e3ae..6dc46eb 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -183,7 +183,8 @@ static void show_object(struct object *obj,
        struct rev_list_info *info = cb_data;

        finish_object(obj, path, component, cb_data);
-       if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
+       if (info->revs->verify_objects && !obj->parsed &&
+           obj->type != OBJ_COMMIT && obj->type != OBJ_BLOB)
                parse_object(obj->sha1);
        show_object_with_name(stdout, obj, path, component);
 }
-- 8< --

If not, you might need to apply this to generate a coredump, then look
and see where that failed malloc comes from:

-- 8< --
diff --git a/wrapper.c b/wrapper.c
index 85f09df..03f423e 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -40,9 +40,11 @@ void *xmalloc(size_t size)
                ret = malloc(size);
                if (!ret && !size)
                        ret = malloc(1);
-               if (!ret)
+               if (!ret) {
+                       *(char*)0 = 1;
                        die("Out of memory, malloc failed (tried to
allocate %lu bytes)",
                            (unsigned long)size);
+               }
        }
 #ifdef XMALLOC_POISON
        memset(ret, 0xA5, size);
-- 8< --

-- 
Duy


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-25  1:49     ` Nguyen Thai Ngoc Duy
@ 2012-02-25 13:17       ` Ian Kumlien
  2012-02-25 22:45       ` Ian Kumlien
  1 sibling, 0 replies; 11+ messages in thread
From: Ian Kumlien @ 2012-02-25 13:17 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On Sat, Feb 25, 2012 at 08:49:55AM +0700, Nguyen Thai Ngoc Duy wrote:
> 2012/2/24 Ian Kumlien <pomac@vapor.com>:
> > Writing objects: 100% (1425/1425), 56.06 MiB | 4.62 MiB/s, done.
> > Total 1425 (delta 790), reused 1425 (delta 790)
> > fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> > fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> > fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> > fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> > To ../test_data/
> >  ! [remote rejected] master -> master (missing necessary objects)
> >  ! [remote rejected] origin/HEAD -> origin/HEAD (missing necessary objects)
> >  ! [remote rejected] origin/master -> origin/master (missing necessary objects)
> > error: failed to push some refs to '../test_data/'
> >
> > So there are additional code paths to look at... =(
> 
> I can't say where that came from. Does this help? (Space damaged, may
> need manual application)

Everything has needed manual application so far, since I'm using
mainline to get the gzip fixes in ;)

Anyway, with:
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 264e3ae..533081d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -183,7 +183,8 @@ static void show_object(struct object *obj,
        struct rev_list_info *info = cb_data;

        finish_object(obj, path, component, cb_data);
-       if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
+       if (info->revs->verify_objects && !obj->parsed
+                       && obj->type != OBJ_COMMIT && obj->type != OBJ_BLOB)
                parse_object(obj->sha1);
        show_object_with_name(stdout, obj, path, component);
 }
---

I get:
../git/git push --mirror ../test_data/
Counting objects: 1425, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (617/617), done.
Writing objects: 100% (1425/1425), 56.06 MiB | 4.22 MiB/s, done.
Total 1425 (delta 790), reused 1425 (delta 790)
error: index-pack died of signal 11
error: unpack failed: index-pack abnormal exit
To ../test_data/
 ! [remote rejected] master -> master (n/a (unpacker error))
 ! [remote rejected] origin/HEAD -> origin/HEAD (n/a (unpacker error))
 ! [remote rejected] origin/master -> origin/master (n/a (unpacker error))
error: failed to push some refs to '../test_data/'

Which, to me, means that the installed git is now the problem - it can't verify 
the pack and say that it's all ok ;)

I'll have to look some more at this on Monday, or during the weekend if I get too curious =)

For now, thank $deity that the $company I work for allows VPN from Linux machines! It looks
really good; I wonder if there are further tests I should do - any clues?


> -- 
> Duy


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-25  1:49     ` Nguyen Thai Ngoc Duy
  2012-02-25 13:17       ` Ian Kumlien
@ 2012-02-25 22:45       ` Ian Kumlien
  2012-02-26  4:10         ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 11+ messages in thread
From: Ian Kumlien @ 2012-02-25 22:45 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On Sat, Feb 25, 2012 at 08:49:55AM +0700, Nguyen Thai Ngoc Duy wrote:
> 2012/2/24 Ian Kumlien <pomac@vapor.com>:
> > Writing objects: 100% (1425/1425), 56.06 MiB | 4.62 MiB/s, done.
> > Total 1425 (delta 790), reused 1425 (delta 790)
> > fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> > fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> > fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> > fatal: Out of memory, malloc failed (tried to allocate 3310214315 bytes)
> > To ../test_data/
> >  ! [remote rejected] master -> master (missing necessary objects)
> >  ! [remote rejected] origin/HEAD -> origin/HEAD (missing necessary objects)
> >  ! [remote rejected] origin/master -> origin/master (missing necessary objects)
> > error: failed to push some refs to '../test_data/'
> >
> > So there are additional code paths to look at... =(
> 
> I can't say where that came from. Does this help? (Space damaged, may
> need manual application)

> If not, you might need to apply this to generate coredump, then look
> and see where that failed malloc comes from

Actually, I added a backtrace and used addr2line to confirm my
suspicion... which is:
builtin/index-pack.c:414

i.e. get_data_from_pack...

It looks to me like, if we are to support this kind of thing, we need a
slightly different approach - instead of passing the data around, it
feels like passing a function pointer around would be beneficial.

Looking at the code, I see a lot of places where this would be an
issue; get_data_from_pack alone is used in several functions that do
some small operation on the data and then just free it.

I understand and recognize that my "problem" is not what git was
designed for; it was designed for small files, which is very evident in
how it approaches the data... And I'd most definitely have to look a
lot closer at this code... =)
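
To make the idea concrete, a rough standalone sketch of that callback
style, with entirely hypothetical names (not index-pack's API): instead
of a get_data_from_pack() that returns one malloc'd buffer, a reader
would push fixed-size chunks to whatever consumer (hashing, collision
compare, fsck) is interested:

#include <stdio.h>
#include <stddef.h>

typedef int (*chunk_fn)(const void *chunk, size_t len, void *state);

/* Stand-in for "inflate the pack entry and hand out chunks". */
static int for_each_chunk(const void *entry, size_t size,
			  chunk_fn cb, void *state)
{
	const unsigned char *p = entry;
	size_t off = 0;

	while (off < size) {
		size_t n = size - off < 8192 ? size - off : 8192;
		int ret = cb(p + off, n, state);

		if (ret)
			return ret;	/* let the consumer abort early */
		off += n;
	}
	return 0;
}

/* Example consumer: count bytes without ever holding the whole object. */
static int count_bytes(const void *chunk, size_t len, void *state)
{
	(void)chunk;
	*(size_t *)state += len;
	return 0;
}

int main(void)
{
	static const char big_blob[100000] = "pretend this is a giant blob";
	size_t seen = 0;

	for_each_chunk(big_blob, sizeof(big_blob), count_bytes, &seen);
	printf("streamed %zu bytes in 8K chunks\n", seen);
	return 0;
}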

> -- 
> Duy


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-25 22:45       ` Ian Kumlien
@ 2012-02-26  4:10         ` Nguyen Thai Ngoc Duy
  2012-02-26 13:28           ` Ian Kumlien
  0 siblings, 1 reply; 11+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-02-26  4:10 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: git

On Sun, Feb 26, 2012 at 5:45 AM, Ian Kumlien <pomac@vapor.com> wrote:
> Actually, i added a backtrace and used addr2line to confirm my
> suspicion... which is:
> builtin/index-pack.c:414
>
> ie get_data_from_pack...

That function should only be called when objects are deltified, which
should _not_ happen for large blobs. What is its caller?

>
> It looks to me like, if we are to support this kind of things, we need a
> slightly different approach - instead of passing the data around, it
> feels like passing a function pointer around would be beneficial.
>
> Looking at the code i see alot of places where this would be a issue,
> just the fact that get_data_from_pack is used in several functions that
> might do some small operation and then just free it.
>
> I understand and recognize that my "problem" is not what git was
> designed for; it was designed for small files, which is very evident in
> how it approaches the data... And I'd most definetly have to look alot
> closer to this code... =)
>
>> --
>> Duy



-- 
Duy


* Re: [PATCH 2/2] index-pack: reduce memory usage when the pack has large blobs
  2012-02-26  4:10         ` Nguyen Thai Ngoc Duy
@ 2012-02-26 13:28           ` Ian Kumlien
  0 siblings, 0 replies; 11+ messages in thread
From: Ian Kumlien @ 2012-02-26 13:28 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On Sun, Feb 26, 2012 at 11:10:14AM +0700, Nguyen Thai Ngoc Duy wrote:
> On Sun, Feb 26, 2012 at 5:45 AM, Ian Kumlien <pomac@vapor.com> wrote:
> > Actually, i added a backtrace and used addr2line to confirm my
> > suspicion... which is:
> > builtin/index-pack.c:414
> >
> > ie get_data_from_pack...
> 
> That function should only be called when objects are deltified, which
> should _not_ happen for large blobs. What is its caller?

Full backtrace:

for x in 0x536031 0x451b0e 0x452212 0x4523f5 0x452711 0x452799 0x452bbb 0x454344 0x4170d1 0x41726c ; do addr2line $x -e ../git/git ; done
git/wrapper.c:41
git/builtin/index-pack.c:414
git/builtin/index-pack.c:588
git/builtin/index-pack.c:625
git/builtin/index-pack.c:679
git/builtin/index-pack.c:694
git/builtin/index-pack.c:805
git/builtin/index-pack.c:1246
git/git.c:308
git/git.c:467

Which means:
xmalloc
get_data_from_pack
get_base_data -- line just after: if (!delta_nr) {
resolve_delta
find_unresolved_deltas_1
find_unresolved_deltas
parse_pack_objects
cmd_index_pack
[skipping the git.c part]

Btw, I'm running these tests on a 64-bit laptop - since I'm not at work
;) (I had to manually limit xmalloc, but it triggers at the same point.)

