git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sergey Vlasov <vsu@altlinux.ru>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: GPF in index-pack
Date: Sun, 6 Aug 2006 17:46:48 +0400	[thread overview]
Message-ID: <20060806174648.c8cc3b44.vsu@altlinux.ru> (raw)
In-Reply-To: <20060806040848.GF20565@spearce.org>

[-- Attachment #1: Type: text/plain, Size: 4855 bytes --]

On Sun, 6 Aug 2006 00:08:48 -0400 Shawn Pearce wrote:

> Jon Smirl <jonsmirl@gmail.com> wrote:
> > On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> > >Jon Smirl <jonsmirl@gmail.com> wrote:
[...]
> > >> Why does resolve_delta in index-pack.c need to be recursive? Is there
> > >> a better way to code that routine? If it mmaps the file that uses 1GB
> > >> address space, why does it need another 1.5GB to build an index?
> > >
> > >Probably the easiest way to code the routine.  Delta depth is
> > >bounded; in the fast-import.c that I sent out last night I hardcoded
> > >it to 10, which is (I believe) the default for GIT.  So long as that
> > >routine is recursive only along a single delta chain the recursion
> > >depth won't be very high and shouldn't be the problem.
> > 
> > When I put index-pack in gdb at the seg fault, resolve_delta had
> > recursed more than 20,000 times. I stopped looking after that.
> 
> Ouch.  I'm not familiar with this code, but looking at right now its
> also not entirely obviously what its recursing for.  Plus dinner
> is trying to be burned on the grill, so my attention is on that
> more than on GIT.  :-)

git-pack-objects never creates a pack file with duplicate objects,
therefore git-index-pack was never tested on such pack files - no wonder
that it breaks.

The case of patch revert (A -> B -> A again) is probably the problem -
your dumb pack generator will probably write this:

	A
	B (delta based on A)
	A (delta based on B)

git-index-pack will first unpack the first copy of A, then notice that A
is used as a delta base for B and apply the delta, then it will find the
second copy of A and apply that delta, and then it will find B again...

Please try the patch at the end of this message - it should help to
avoid the infinite recursion in git-index-pack.  However, I'm not sure
that other git parts won't do something bad when they encounter an index
with duplicate sha1 entries (and git-index-pack cannot remove these
duplicates, because the number of index entries must match the pack
header).

> > >> I had a prior 400MB pack file built with fast-import that I was able
> > >> to index ok.
> > >
> > >Dumb luck?  Maybe that had no duplicates while this one does?
> > 
> > Is there a git command to list the sha1's in a pack that doesn't have
> > an index? I could sort it, sort it unqiue, and then diff the outputs.

git-index-pack :)

(git-unpack-objects will also have all sha1 values at the end, but the
side effect of unpacking all objects to separate files is probably not
what you want to get.)

> Not that I know of.  Packs themselves don't have the SHA1 values and
> getting them from a pack without an index is a painful exercise as
> you don't know where the base of an object resides within the pack
> when you need it to generate the object's raw content to determine
> its ID.

Yes, this is a problem.  git-unpack-objects can unpack all objects in a
single pass, but only because it temporarily saves all deltas which
cannot be resolved yet, and can read back objects which it has written
before.  git-index-pack needs to work without modifying the object
database, so it works in two passes:

	/*
	 * First pass:
	 * - find locations of all objects;
	 * - calculate SHA1 of all non-delta objects;
	 * - remember base SHA1 for all deltas.
	 */

	/*
	 * Second pass:
	 * - for all non-delta objects, look if it is used as a base for
	 *   deltas;
	 * - if used as a base, uncompress the object and apply all deltas,
	 *   recursively checking if the resulting object is used as a base
	 *   for some more deltas.
	 */

-----------------------------------------------------------------------

From: Sergey Vlasov <vsu@altlinux.ru>
Date: Sun, 6 Aug 2006 17:28:29 +0400
Subject: [PATCH] git-index-pack: Avoid infinite recursion if the pack has duplicate objects

Although git-pack-objects never creates packs which contain the same
object more than once, other tools may be not so careful, so add a check
to prevents infinite recursion of resolve_delta() for such packs.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
---
 index-pack.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/index-pack.c b/index-pack.c
index b39953d..a8e3b1f 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -257,6 +257,13 @@ static void resolve_delta(struct delta_e
 	unsigned long next_obj_offset;
 	int j, first, last;
 
+	/*
+	 * Do nothing if this delta was resolved earlier (this can happen if
+	 * the pack contains duplicate objects for some reason).
+	 */
+	if (obj->real_type != OBJ_DELTA)
+		return;
+
 	obj->real_type = type;
 	delta_data = unpack_raw_entry(obj->offset, &delta_type,
 				      &delta_size, base_sha1,
-- 
1.4.2.rc3.g23aa

[-- Attachment #2: Type: application/pgp-signature, Size: 190 bytes --]

      parent reply	other threads:[~2006-08-06 13:47 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <9e4733910608051805j1192d910hf55393f1dbe1e472@mail.gmail.com>
2006-08-06  2:44 ` GPF in index-pack Jon Smirl
2006-08-06  2:48   ` Shawn Pearce
2006-08-06  2:58     ` Jon Smirl
2006-08-06  3:32       ` Shawn Pearce
2006-08-06  4:00         ` Jon Smirl
2006-08-06  4:08           ` Shawn Pearce
2006-08-06  4:16             ` Jon Smirl
2006-08-06  4:22               ` Jon Smirl
2006-08-06 13:46             ` Sergey Vlasov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060806174648.c8cc3b44.vsu@altlinux.ru \
    --to=vsu@altlinux.ru \
    --cc=git@vger.kernel.org \
    --cc=jonsmirl@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).