From: Sergey Vlasov <vsu@altlinux.ru>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: GPF in index-pack
Date: Sun, 6 Aug 2006 17:46:48 +0400 [thread overview]
Message-ID: <20060806174648.c8cc3b44.vsu@altlinux.ru> (raw)
In-Reply-To: <20060806040848.GF20565@spearce.org>
[-- Attachment #1: Type: text/plain, Size: 4855 bytes --]
On Sun, 6 Aug 2006 00:08:48 -0400 Shawn Pearce wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> > >Jon Smirl <jonsmirl@gmail.com> wrote:
[...]
> > >> Why does resolve_delta in index-pack.c need to be recursive? Is there
> > >> a better way to code that routine? If it mmaps the file that uses 1GB
> > >> address space, why does it need another 1.5GB to build an index?
> > >
> > >Probably the easiest way to code the routine. Delta depth is
> > >bounded; in the fast-import.c that I sent out last night I hardcoded
> > >it to 10, which is (I believe) the default for GIT. So long as that
> > >routine is recursive only along a single delta chain the recursion
> > >depth won't be very high and shouldn't be the problem.
> >
> > When I put index-pack in gdb at the seg fault, resolve_delta had
> > recursed more than 20,000 times. I stopped looking after that.
>
> Ouch. I'm not familiar with this code, but looking at right now its
> also not entirely obviously what its recursing for. Plus dinner
> is trying to be burned on the grill, so my attention is on that
> more than on GIT. :-)
git-pack-objects never creates a pack file with duplicate objects,
therefore git-index-pack was never tested on such pack files - no wonder
that it breaks.
The case of patch revert (A -> B -> A again) is probably the problem -
your dumb pack generator will probably write this:
A
B (delta based on A)
A (delta based on B)
git-index-pack will first unpack the first copy of A, then notice that A
is used as a delta base for B and apply the delta, then it will find the
second copy of A and apply that delta, and then it will find B again...
Please try the patch at the end of this message - it should help to
avoid the infinite recursion in git-index-pack. However, I'm not sure
that other git parts won't do something bad when they encounter an index
with duplicate sha1 entries (and git-index-pack cannot remove these
duplicates, because the number of index entries must match the pack
header).
> > >> I had a prior 400MB pack file built with fast-import that I was able
> > >> to index ok.
> > >
> > >Dumb luck? Maybe that had no duplicates while this one does?
> >
> > Is there a git command to list the sha1's in a pack that doesn't have
> > an index? I could sort it, sort it unqiue, and then diff the outputs.
git-index-pack :)
(git-unpack-objects will also have all sha1 values at the end, but the
side effect of unpacking all objects to separate files is probably not
what you want to get.)
> Not that I know of. Packs themselves don't have the SHA1 values and
> getting them from a pack without an index is a painful exercise as
> you don't know where the base of an object resides within the pack
> when you need it to generate the object's raw content to determine
> its ID.
Yes, this is a problem. git-unpack-objects can unpack all objects in a
single pass, but only because it temporarily saves all deltas which
cannot be resolved yet, and can read back objects which it has written
before. git-index-pack needs to work without modifying the object
database, so it works in two passes:
/*
* First pass:
* - find locations of all objects;
* - calculate SHA1 of all non-delta objects;
* - remember base SHA1 for all deltas.
*/
/*
* Second pass:
* - for all non-delta objects, look if it is used as a base for
* deltas;
* - if used as a base, uncompress the object and apply all deltas,
* recursively checking if the resulting object is used as a base
* for some more deltas.
*/
-----------------------------------------------------------------------
From: Sergey Vlasov <vsu@altlinux.ru>
Date: Sun, 6 Aug 2006 17:28:29 +0400
Subject: [PATCH] git-index-pack: Avoid infinite recursion if the pack has duplicate objects
Although git-pack-objects never creates packs which contain the same
object more than once, other tools may be not so careful, so add a check
to prevents infinite recursion of resolve_delta() for such packs.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
---
index-pack.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/index-pack.c b/index-pack.c
index b39953d..a8e3b1f 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -257,6 +257,13 @@ static void resolve_delta(struct delta_e
unsigned long next_obj_offset;
int j, first, last;
+ /*
+ * Do nothing if this delta was resolved earlier (this can happen if
+ * the pack contains duplicate objects for some reason).
+ */
+ if (obj->real_type != OBJ_DELTA)
+ return;
+
obj->real_type = type;
delta_data = unpack_raw_entry(obj->offset, &delta_type,
&delta_size, base_sha1,
--
1.4.2.rc3.g23aa
[-- Attachment #2: Type: application/pgp-signature, Size: 190 bytes --]
prev parent reply other threads:[~2006-08-06 13:47 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <9e4733910608051805j1192d910hf55393f1dbe1e472@mail.gmail.com>
2006-08-06 2:44 ` GPF in index-pack Jon Smirl
2006-08-06 2:48 ` Shawn Pearce
2006-08-06 2:58 ` Jon Smirl
2006-08-06 3:32 ` Shawn Pearce
2006-08-06 4:00 ` Jon Smirl
2006-08-06 4:08 ` Shawn Pearce
2006-08-06 4:16 ` Jon Smirl
2006-08-06 4:22 ` Jon Smirl
2006-08-06 13:46 ` Sergey Vlasov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060806174648.c8cc3b44.vsu@altlinux.ru \
--to=vsu@altlinux.ru \
--cc=git@vger.kernel.org \
--cc=jonsmirl@gmail.com \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).