* Re: GPF in index-pack
[not found] <9e4733910608051805j1192d910hf55393f1dbe1e472@mail.gmail.com>
@ 2006-08-06 2:44 ` Jon Smirl
2006-08-06 2:48 ` Shawn Pearce
0 siblings, 1 reply; 9+ messages in thread
From: Jon Smirl @ 2006-08-06 2:44 UTC (permalink / raw)
To: git
Process size is 2.6GB when the seg fault happen. That's a lot of
memory to build a pack index over 1M objects.
I'm running a 3:1 process address space split. I wonder why it didn't
grow all the way to 3GB. I still have RAM and swap available.
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GPF in index-pack
2006-08-06 2:44 ` GPF in index-pack Jon Smirl
@ 2006-08-06 2:48 ` Shawn Pearce
2006-08-06 2:58 ` Jon Smirl
0 siblings, 1 reply; 9+ messages in thread
From: Shawn Pearce @ 2006-08-06 2:48 UTC (permalink / raw)
To: Jon Smirl; +Cc: git
Jon Smirl <jonsmirl@gmail.com> wrote:
> Process size is 2.6GB when the seg fault happen. That's a lot of
> memory to build a pack index over 1M objects.
>
> I'm running a 3:1 process address space split. I wonder why it didn't
> grow all the way to 3GB. I still have RAM and swap available.
Was the pack you are trying to index built with that fast-import.c
I sent last night? Its possible its doing something weird that
pack-index can't handle, such as insert a duplicate object into
the same pack...
How big is the pack file? I'd expect pack-index to be using
something around 24 MB of memory (24 bytes/entry) but maybe its
hanging onto a lot of data (memory leak?) as it decompresses the
entries to compute the checksums.
--
Shawn.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GPF in index-pack
2006-08-06 2:48 ` Shawn Pearce
@ 2006-08-06 2:58 ` Jon Smirl
2006-08-06 3:32 ` Shawn Pearce
0 siblings, 1 reply; 9+ messages in thread
From: Jon Smirl @ 2006-08-06 2:58 UTC (permalink / raw)
To: Shawn Pearce; +Cc: git
On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > Process size is 2.6GB when the seg fault happen. That's a lot of
> > memory to build a pack index over 1M objects.
> >
> > I'm running a 3:1 process address space split. I wonder why it didn't
> > grow all the way to 3GB. I still have RAM and swap available.
>
> Was the pack you are trying to index built with that fast-import.c
> I sent last night? Its possible its doing something weird that
> pack-index can't handle, such as insert a duplicate object into
> the same pack...
built with fast-import.
> How big is the pack file? I'd expect pack-index to be using
> something around 24 MB of memory (24 bytes/entry) but maybe its
> hanging onto a lot of data (memory leak?) as it decompresses the
> entries to compute the checksums.
It is 934MB in size with 985,000 entries.
Why does resolve_delta in index-pack.c need to be recursive? Is there
a better way to code that routine? If it mmaps the file that uses 1GB
address space, why does it need another 1.5GB to build an index?
I had a prior 400MB pack file built with fast-import that I was able
to index ok.
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GPF in index-pack
2006-08-06 2:58 ` Jon Smirl
@ 2006-08-06 3:32 ` Shawn Pearce
2006-08-06 4:00 ` Jon Smirl
0 siblings, 1 reply; 9+ messages in thread
From: Shawn Pearce @ 2006-08-06 3:32 UTC (permalink / raw)
To: Jon Smirl; +Cc: git
Jon Smirl <jonsmirl@gmail.com> wrote:
> On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> >Jon Smirl <jonsmirl@gmail.com> wrote:
> >> Process size is 2.6GB when the seg fault happen. That's a lot of
> >> memory to build a pack index over 1M objects.
> >>
> >> I'm running a 3:1 process address space split. I wonder why it didn't
> >> grow all the way to 3GB. I still have RAM and swap available.
> >
> >Was the pack you are trying to index built with that fast-import.c
> >I sent last night? Its possible its doing something weird that
> >pack-index can't handle, such as insert a duplicate object into
> >the same pack...
>
> built with fast-import.
>
> >How big is the pack file? I'd expect pack-index to be using
> >something around 24 MB of memory (24 bytes/entry) but maybe its
> >hanging onto a lot of data (memory leak?) as it decompresses the
> >entries to compute the checksums.
>
> It is 934MB in size with 985,000 entries.
>
> Why does resolve_delta in index-pack.c need to be recursive? Is there
> a better way to code that routine? If it mmaps the file that uses 1GB
> address space, why does it need another 1.5GB to build an index?
Probably the easiest way to code the routine. Delta depth is
bounded; in the fast-import.c that I sent out last night I hardcoded
it to 10, which is (I believe) the default for GIT. So long as that
routine is recursive only along a single delta chain the recursion
depth won't be very high and shouldn't be the problem.
> I had a prior 400MB pack file built with fast-import that I was able
> to index ok.
Dumb luck? Maybe that had no duplicates while this one does?
--
Shawn.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GPF in index-pack
2006-08-06 3:32 ` Shawn Pearce
@ 2006-08-06 4:00 ` Jon Smirl
2006-08-06 4:08 ` Shawn Pearce
0 siblings, 1 reply; 9+ messages in thread
From: Jon Smirl @ 2006-08-06 4:00 UTC (permalink / raw)
To: Shawn Pearce; +Cc: git
On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> > >Jon Smirl <jonsmirl@gmail.com> wrote:
> > >> Process size is 2.6GB when the seg fault happen. That's a lot of
> > >> memory to build a pack index over 1M objects.
> > >>
> > >> I'm running a 3:1 process address space split. I wonder why it didn't
> > >> grow all the way to 3GB. I still have RAM and swap available.
> > >
> > >Was the pack you are trying to index built with that fast-import.c
> > >I sent last night? Its possible its doing something weird that
> > >pack-index can't handle, such as insert a duplicate object into
> > >the same pack...
> >
> > built with fast-import.
> >
> > >How big is the pack file? I'd expect pack-index to be using
> > >something around 24 MB of memory (24 bytes/entry) but maybe its
> > >hanging onto a lot of data (memory leak?) as it decompresses the
> > >entries to compute the checksums.
> >
> > It is 934MB in size with 985,000 entries.
> >
> > Why does resolve_delta in index-pack.c need to be recursive? Is there
> > a better way to code that routine? If it mmaps the file that uses 1GB
> > address space, why does it need another 1.5GB to build an index?
>
> Probably the easiest way to code the routine. Delta depth is
> bounded; in the fast-import.c that I sent out last night I hardcoded
> it to 10, which is (I believe) the default for GIT. So long as that
> routine is recursive only along a single delta chain the recursion
> depth won't be very high and shouldn't be the problem.
When I put index-pack in gdb at the seg fault, resolve_delta had
recursed more than 20,000 times. I stopped looking after that.
> > I had a prior 400MB pack file built with fast-import that I was able
> > to index ok.
>
> Dumb luck? Maybe that had no duplicates while this one does?
Is there a git command to list the sha1's in a pack that doesn't have
an index? I could sort it, sort it unqiue, and then diff the outputs.
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GPF in index-pack
2006-08-06 4:00 ` Jon Smirl
@ 2006-08-06 4:08 ` Shawn Pearce
2006-08-06 4:16 ` Jon Smirl
2006-08-06 13:46 ` Sergey Vlasov
0 siblings, 2 replies; 9+ messages in thread
From: Shawn Pearce @ 2006-08-06 4:08 UTC (permalink / raw)
To: Jon Smirl; +Cc: git
Jon Smirl <jonsmirl@gmail.com> wrote:
> On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> >Jon Smirl <jonsmirl@gmail.com> wrote:
> >> On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> >> >Jon Smirl <jonsmirl@gmail.com> wrote:
> >> >> Process size is 2.6GB when the seg fault happen. That's a lot of
> >> >> memory to build a pack index over 1M objects.
> >> >>
> >> >> I'm running a 3:1 process address space split. I wonder why it didn't
> >> >> grow all the way to 3GB. I still have RAM and swap available.
> >> >
> >> >Was the pack you are trying to index built with that fast-import.c
> >> >I sent last night? Its possible its doing something weird that
> >> >pack-index can't handle, such as insert a duplicate object into
> >> >the same pack...
> >>
> >> built with fast-import.
> >>
> >> >How big is the pack file? I'd expect pack-index to be using
> >> >something around 24 MB of memory (24 bytes/entry) but maybe its
> >> >hanging onto a lot of data (memory leak?) as it decompresses the
> >> >entries to compute the checksums.
> >>
> >> It is 934MB in size with 985,000 entries.
> >>
> >> Why does resolve_delta in index-pack.c need to be recursive? Is there
> >> a better way to code that routine? If it mmaps the file that uses 1GB
> >> address space, why does it need another 1.5GB to build an index?
> >
> >Probably the easiest way to code the routine. Delta depth is
> >bounded; in the fast-import.c that I sent out last night I hardcoded
> >it to 10, which is (I believe) the default for GIT. So long as that
> >routine is recursive only along a single delta chain the recursion
> >depth won't be very high and shouldn't be the problem.
>
> When I put index-pack in gdb at the seg fault, resolve_delta had
> recursed more than 20,000 times. I stopped looking after that.
Ouch. I'm not familiar with this code, but looking at right now its
also not entirely obviously what its recursing for. Plus dinner
is trying to be burned on the grill, so my attention is on that
more than on GIT. :-)
> >> I had a prior 400MB pack file built with fast-import that I was able
> >> to index ok.
> >
> >Dumb luck? Maybe that had no duplicates while this one does?
>
> Is there a git command to list the sha1's in a pack that doesn't have
> an index? I could sort it, sort it unqiue, and then diff the outputs.
Not that I know of. Packs themselves don't have the SHA1 values and
getting them from a pack without an index is a painful exercise as
you don't know where the base of an object resides within the pack
when you need it to generate the object's raw content to determine
its ID.
Yea, I'm sure the pack format has some great properties, but storing
the base as a 20 byte object ID rather the offset of the base within
the pack has its limitations...
--
Shawn.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GPF in index-pack
2006-08-06 4:08 ` Shawn Pearce
@ 2006-08-06 4:16 ` Jon Smirl
2006-08-06 4:22 ` Jon Smirl
2006-08-06 13:46 ` Sergey Vlasov
1 sibling, 1 reply; 9+ messages in thread
From: Jon Smirl @ 2006-08-06 4:16 UTC (permalink / raw)
To: Shawn Pearce; +Cc: git
On 8/6/06, Shawn Pearce <spearce@spearce.org> wrote:
> > When I put index-pack in gdb at the seg fault, resolve_delta had
> > recursed more than 20,000 times. I stopped looking after that.
>
> Ouch. I'm not familiar with this code, but looking at right now its
> also not entirely obviously what its recursing for. Plus dinner
> is trying to be burned on the grill, so my attention is on that
> more than on GIT. :-)
It has recursed 131,000 times before running out of memory. Something
must be wrong in the resolve_delta() logic.
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GPF in index-pack
2006-08-06 4:16 ` Jon Smirl
@ 2006-08-06 4:22 ` Jon Smirl
0 siblings, 0 replies; 9+ messages in thread
From: Jon Smirl @ 2006-08-06 4:22 UTC (permalink / raw)
To: Shawn Pearce; +Cc: git
I started another run and added code to record the object_names. I can
use the sort utilities then to figure out how many duplicates there
are. It takes two hours for the run to finish so I won't have results
until tomorrow.
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GPF in index-pack
2006-08-06 4:08 ` Shawn Pearce
2006-08-06 4:16 ` Jon Smirl
@ 2006-08-06 13:46 ` Sergey Vlasov
1 sibling, 0 replies; 9+ messages in thread
From: Sergey Vlasov @ 2006-08-06 13:46 UTC (permalink / raw)
To: Jon Smirl; +Cc: Shawn Pearce, git
[-- Attachment #1: Type: text/plain, Size: 4855 bytes --]
On Sun, 6 Aug 2006 00:08:48 -0400 Shawn Pearce wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > On 8/5/06, Shawn Pearce <spearce@spearce.org> wrote:
> > >Jon Smirl <jonsmirl@gmail.com> wrote:
[...]
> > >> Why does resolve_delta in index-pack.c need to be recursive? Is there
> > >> a better way to code that routine? If it mmaps the file that uses 1GB
> > >> address space, why does it need another 1.5GB to build an index?
> > >
> > >Probably the easiest way to code the routine. Delta depth is
> > >bounded; in the fast-import.c that I sent out last night I hardcoded
> > >it to 10, which is (I believe) the default for GIT. So long as that
> > >routine is recursive only along a single delta chain the recursion
> > >depth won't be very high and shouldn't be the problem.
> >
> > When I put index-pack in gdb at the seg fault, resolve_delta had
> > recursed more than 20,000 times. I stopped looking after that.
>
> Ouch. I'm not familiar with this code, but looking at right now its
> also not entirely obviously what its recursing for. Plus dinner
> is trying to be burned on the grill, so my attention is on that
> more than on GIT. :-)
git-pack-objects never creates a pack file with duplicate objects,
therefore git-index-pack was never tested on such pack files - no wonder
that it breaks.
The case of patch revert (A -> B -> A again) is probably the problem -
your dumb pack generator will probably write this:
A
B (delta based on A)
A (delta based on B)
git-index-pack will first unpack the first copy of A, then notice that A
is used as a delta base for B and apply the delta, then it will find the
second copy of A and apply that delta, and then it will find B again...
Please try the patch at the end of this message - it should help to
avoid the infinite recursion in git-index-pack. However, I'm not sure
that other git parts won't do something bad when they encounter an index
with duplicate sha1 entries (and git-index-pack cannot remove these
duplicates, because the number of index entries must match the pack
header).
> > >> I had a prior 400MB pack file built with fast-import that I was able
> > >> to index ok.
> > >
> > >Dumb luck? Maybe that had no duplicates while this one does?
> >
> > Is there a git command to list the sha1's in a pack that doesn't have
> > an index? I could sort it, sort it unqiue, and then diff the outputs.
git-index-pack :)
(git-unpack-objects will also have all sha1 values at the end, but the
side effect of unpacking all objects to separate files is probably not
what you want to get.)
> Not that I know of. Packs themselves don't have the SHA1 values and
> getting them from a pack without an index is a painful exercise as
> you don't know where the base of an object resides within the pack
> when you need it to generate the object's raw content to determine
> its ID.
Yes, this is a problem. git-unpack-objects can unpack all objects in a
single pass, but only because it temporarily saves all deltas which
cannot be resolved yet, and can read back objects which it has written
before. git-index-pack needs to work without modifying the object
database, so it works in two passes:
/*
* First pass:
* - find locations of all objects;
* - calculate SHA1 of all non-delta objects;
* - remember base SHA1 for all deltas.
*/
/*
* Second pass:
* - for all non-delta objects, look if it is used as a base for
* deltas;
* - if used as a base, uncompress the object and apply all deltas,
* recursively checking if the resulting object is used as a base
* for some more deltas.
*/
-----------------------------------------------------------------------
From: Sergey Vlasov <vsu@altlinux.ru>
Date: Sun, 6 Aug 2006 17:28:29 +0400
Subject: [PATCH] git-index-pack: Avoid infinite recursion if the pack has duplicate objects
Although git-pack-objects never creates packs which contain the same
object more than once, other tools may be not so careful, so add a check
to prevents infinite recursion of resolve_delta() for such packs.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
---
index-pack.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/index-pack.c b/index-pack.c
index b39953d..a8e3b1f 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -257,6 +257,13 @@ static void resolve_delta(struct delta_e
unsigned long next_obj_offset;
int j, first, last;
+ /*
+ * Do nothing if this delta was resolved earlier (this can happen if
+ * the pack contains duplicate objects for some reason).
+ */
+ if (obj->real_type != OBJ_DELTA)
+ return;
+
obj->real_type = type;
delta_data = unpack_raw_entry(obj->offset, &delta_type,
&delta_size, base_sha1,
--
1.4.2.rc3.g23aa
[-- Attachment #2: Type: application/pgp-signature, Size: 190 bytes --]
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2006-08-06 13:47 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <9e4733910608051805j1192d910hf55393f1dbe1e472@mail.gmail.com>
2006-08-06 2:44 ` GPF in index-pack Jon Smirl
2006-08-06 2:48 ` Shawn Pearce
2006-08-06 2:58 ` Jon Smirl
2006-08-06 3:32 ` Shawn Pearce
2006-08-06 4:00 ` Jon Smirl
2006-08-06 4:08 ` Shawn Pearce
2006-08-06 4:16 ` Jon Smirl
2006-08-06 4:22 ` Jon Smirl
2006-08-06 13:46 ` Sergey Vlasov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).