* Problem with pack
@ 2006-08-25 8:45 Sergio Callegari
2006-08-25 9:12 ` Johannes Schindelin
0 siblings, 1 reply; 17+ messages in thread
From: Sergio Callegari @ 2006-08-25 8:45 UTC (permalink / raw)
To: git
Hi,
I am encountering a problem after a
git repack -a -d
on an archive.
The pack that has been generated appears to be corrupted:
git verify-pack -v pack-ebcdfbbda07e5a3e4136aa1f499990b35685bab4.idx
fatal: failed to read delta-pack base object
2849bd2bd8a76bbca37df2a4c8e8b990811d01a7
and apparently I might have lost all the history of my project.
I have a few questions:
1) I am working on both a PC and a notebook, syncing the two every time I
move from one to the other.
On the PC I have git 1.4.2, on the notebook git 1.4.0. I am using
"unison" as a syncing tool.
Might the data loss have something to do with...
- the version of git I am using or the mixing of two versions?
- the syncing? I have noticed that after a sync, git is not immediately
in a happy state... for instance, if I run
git diff
git lists a diff for every file, even if none of them have changed.
However after a
git status
everything seems fine again.
2) git unpack-objects seems to be able to extract some objects from the
pack, but at a certain point it dies.
- does it die on the first error or does it try to extract everything
that is possible to extract after the error?
- if it's the first, is there a way to trigger the second behaviour to
try to save as much as possible from the pack?
Thanks,
Sergio Callegari
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Problem with pack
2006-08-25 8:45 Sergio Callegari
@ 2006-08-25 9:12 ` Johannes Schindelin
0 siblings, 0 replies; 17+ messages in thread
From: Johannes Schindelin @ 2006-08-25 9:12 UTC (permalink / raw)
To: Sergio Callegari; +Cc: git
Hi,
On Fri, 25 Aug 2006, Sergio Callegari wrote:
> git verify-pack -v pack-ebcdfbbda07e5a3e4136aa1f499990b35685bab4.idx
> fatal: failed to read delta-pack base object 2849bd2bd8a76bbca37df2a4c8e8b990811d01a7
Eeeh! Not good.
> 1) I am working on both a PC and a notebook, syncing the two every time I move
> from one to the other.
So, you still have one "good" version? Please make a backup immediately.
(If only to reproduce the problem.)
> On the PC I have git 1.4.2, on the notebook git 1.4.0. I am using "unison" as
> a syncing tool.
> Might the data loss have something to do with...
> - the version of git I am using or the mixing of two versions?
We tried very hard to maintain backward compatibility. So, it should be
fine.
> - the syncing? I have noticed that after a sync, git is not immediately in a
> happy state...
That is okay. The "index" (remember, git has a staging area called
"index") stores ctimes and mtimes, and these are not synced.
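With a modern git, the stale stat cache can also be refreshed explicitly rather than relying on `git status` doing it as a side effect. A minimal sketch in a throwaway repository (all paths, file names, and the identity below are invented for illustration):

```shell
# Simulate the stale stat cache a file-level sync leaves behind, then refresh it.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # throwaway identity so the commit works anywhere
git config user.name demo
echo hello > manual.tex
git add manual.tex
git commit -q -m 'initial'

touch manual.tex             # content unchanged, but the mtime now differs, as after a sync
git update-index --refresh   # re-record ctime/mtime for unchanged files in .git/index
git diff                     # prints nothing: the content really is unchanged
```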
> 2) git unpack-objects seems to be able to extract some objects from the pack,
> but at a certain point it dies.
> - does it die on the first error or does it try to extract everything that is
> possible to extract after the error?
Since unpack-objects does not use the index, it cannot extract anything
after the first error. We _could_ enhance unpack-objects to be nice and
optionally take a pack-index to try to reconstruct as many objects as
possible.
BTW I'd recommend not syncing with unison, but with the git transports: If
your PC and Laptop are connected, you could do something like
git pull laptop:my_project/.git
(Of course, you have to adjust it for your setup.) This would sync your
repository on the PC with the one on the laptop.
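As a sketch of what the native transport buys you, here is the same round trip with two throwaway local repositories standing in for the laptop and the PC (all names are invented; with a modern git, every pull verifies the transferred objects on the receiving side):

```shell
work=$(mktemp -d)

# "laptop" repository with one commit
git init -q "$work/laptop"
cd "$work/laptop"
git config user.email demo@example.com   # throwaway identity so the commits work anywhere
git config user.name demo
echo v1 > manual.tex
git add manual.tex
git commit -q -m 'v1'

# "PC" clones once; afterwards a plain pull keeps it in sync
git clone -q "$work/laptop" "$work/pc"
echo v2 > manual.tex
git commit -q -am 'v2'                   # new work on the "laptop"

cd "$work/pc"
git pull -q                              # fetches, verifies, and fast-forwards
```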
Hth,
Dscho
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Problem with pack
@ 2006-08-25 10:07 Sergio Callegari
2006-08-25 10:20 ` Andreas Ericsson
2006-08-26 10:09 ` Junio C Hamano
0 siblings, 2 replies; 17+ messages in thread
From: Sergio Callegari @ 2006-08-25 10:07 UTC (permalink / raw)
To: git
>
> > git verify-pack -v pack-ebcdfbbda07e5a3e4136aa1f499990b35685bab4.idx
> > fatal: failed to read delta-pack base object 2849bd2bd8a76bbca37df2a4c8e8b990811d01a7
>
> Eeeh! Not good.
>
> > 1) I am working on both a PC and a notebook, syncing the two every time I move
> > from one to the other.
>
> So, you still have one "good" version? Please make a backup immediately.
> (If only to reproduce the problem.)
>
I have a good working tree, but unfortunately I realized that there was
a problem with the pack only _after_ the sync:
I was not expecting this kind of problem, so, silly me, I did a repack as
the last thing, went home, attached the laptop to the net, ran unison,
started to work, and only realized that there was a problem when I
attempted a new repack, which failed complaining about the corrupted pack...
So actually, I do not even know where the corruption came from (a hard
disk error, the sync tool, ...)
I only have the corrupted pack and its index and a good last working tree.
BTW, it would be nice to have some "security measure" in git reset...
e.g. an option to trigger the following behavior:
- saving all current changes in a temporary commit
- checking that the current HEAD can be re-checked out before the reset
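With a modern git, much of this proposed behaviour can be approximated by hand (`git stash` did not exist in the 1.4.x era). A sketch in a throwaway repository, with invented names:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # throwaway identity so the commit works anywhere
git config user.name demo
echo one > file
git add file
git commit -q -m 'one'
echo two > file                               # uncommitted work we do not want to lose

git stash push -q -m 'pre-reset safety net'   # step 1: park the changes as commits
git cat-file -e 'HEAD^{commit}'               # step 2: HEAD's commit object is readable
git reset -q --hard                           # the destructive step; the parked work
git stash list                                # ...is still recoverable from the stash
```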
> Since unpack-objects does not use the index, it cannot extract anything
> after the first error. We _could_ enhance unpack-objects to be nice and
> optionally take a pack-index to try to reconstruct as many objects as
> possible.
>
That would be very useful...
Btw, even without that, if I understand correctly, git packs are
collections of compressed objects, each of which has its own header
stating how long the compressed object is. In my case, the error
is in inflating one object (git unpack-objects says inflate returns
-3)... so shouldn't there be a way to try to skip to the next object
even in this case?
> BTW I'd recommend not syncing with unison, but with the git transports: If
> your PC and Laptop are connected, you could do something like
>
> git pull laptop:my_project/.git
>
Actually, the project, including the git archive, gets synchronized as
part of a synchronization process covering all of my Documents directory
(the project is in fact a LaTeX manual with somewhat complex LaTeX
packages and classes). Synchronizing in this way has actually worked very
well so far, because in one go I got all my working trees
and all my repos in sync...
Sergio
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Problem with pack
2006-08-25 10:07 Sergio Callegari
@ 2006-08-25 10:20 ` Andreas Ericsson
2006-08-25 10:41 ` Jakub Narebski
2006-08-25 10:58 ` Junio C Hamano
2006-08-26 10:09 ` Junio C Hamano
1 sibling, 2 replies; 17+ messages in thread
From: Andreas Ericsson @ 2006-08-25 10:20 UTC (permalink / raw)
To: Sergio Callegari; +Cc: git
Sergio Callegari wrote:
>>
>> > git verify-pack -v pack-ebcdfbbda07e5a3e4136aa1f499990b35685bab4.idx
>> > fatal: failed to read delta-pack base object
>> 2849bd2bd8a76bbca37df2a4c8e8b990811d01a7
>>
>> Eeeh! Not good.
>>
>> > 1) I am working on both a pc and a notebook, syncing the two
>> every time I move
>> > from one to the other.
>>
>> So, you still have one "good" version? Please make a backup
>> immediately. (If only to reproduce the problem.)
>>
> I have a good working tree, but unfortunately I realized that there was
> a problem with the pack only _after_ the sync:
> I was not expecting this kind of problem, so, silly me, I did a repack as the
> last thing, went home, attached the laptop to the net, ran unison,
> started to work, and only realized that there was a problem when I
> attempted a new repack, which failed complaining about the corrupted pack...
>
> So actually, I do not even know where the corruption came from (a hard
> disk error, the sync tool, ...)
>
> I only have the corrupted pack and its index and a good last working tree.
>
> BTW, it would be nice to have some "security measure" in git reset...
> e.g. an option to trigger the following behavior:
>
> - saving all current changes in a temporary commit
> - checking that the current HEAD can be re-checked out before the reset
>
The recommended way is to commit your temporary work to a throw-away
branch (or to 'master', so long as you remember to use reset). The
current HEAD can always, barring object database errors, be
checked out if 'git status' reports no changes in the working tree.
Unfortunately, the object database is often enormous, so doing a full
fsck-objects before each change to the branch you're on (which is
basically what a reset is), would take far too long to be viable.
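A sketch of the throw-away-branch idiom described above, in a scratch repository with invented names (shown with a modern git):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # throwaway identity so the commits work anywhere
git config user.name demo
echo base > file
git add file
git commit -q -m 'base'
echo wip > file                   # half-finished work

git checkout -q -b scratch        # throw-away branch
git commit -q -am 'wip snapshot'  # the work is now safe in the object database
git checkout -q -                 # back to the original branch; file says "base" again
# later: git branch -D scratch, or merge/cherry-pick it if the snapshot proved useful
```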
>> Since unpack-objects does not use the index, it cannot extract
>> anything after the first error. We _could_ enhance unpack-objects to
>> be nice and optionally take a pack-index to try to reconstruct as many
>> objects as possible.
>>
> That would be very useful...
> Btw, even without that, if I understand correctly, git packs are
> collections of compressed objects, each of which has its own header
> stating how long the compressed object is. In my case, the error
> is in inflating one object (git unpack-objects says inflate returns
> -3)... so shouldn't there be a way to try to skip to the next object
> even in this case?
It should be possible, assuming the pack index is still intact. The pack
index is where the headers are stored, afaik.
>> BTW I'd recommend not syncing with unison, but with the git
>> transports: If your PC and Laptop are connected, you could do
>> something like
>>
>> git pull laptop:my_project/.git
>>
> Actually, the project, including the git archive, gets synchronized as
> part of a synchronization process covering all of my Documents directory
> (the project is in fact a LaTeX manual with somewhat complex LaTeX
> packages and classes). Synchronizing in this way has actually worked very
> well so far, because in one go I got all my working trees
> and all my repos in sync...
>
The largest benefit of using git's synchronization methods is that you
immediately get a pack-file verification, and also that you never risk
overwriting anything in either repo if you've forgotten to sync between
the two (say you've made changes on your laptop, forgot to send them to
your workstation, then made changes on your workstation and then you try
to sync them). It's possible to recover from such a situation using the
lost-found tool, but it can be cumbersome, and uncommitted changes, as
well as changes to the working tree, are lost forever.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
* Re: Problem with pack
2006-08-25 10:20 ` Andreas Ericsson
@ 2006-08-25 10:41 ` Jakub Narebski
2006-08-25 10:58 ` Junio C Hamano
1 sibling, 0 replies; 17+ messages in thread
From: Jakub Narebski @ 2006-08-25 10:41 UTC (permalink / raw)
To: git
Andreas Ericsson wrote:
>>> BTW I'd recommend not syncing with unison, but with the git
>>> transports: If your PC and Laptop are connected, you could do
>>> something like
>>>
>>> git pull laptop:my_project/.git
>>>
>> Actually, the project, including the git archive, gets synchronized as
>> part of a synchronization process covering all of my Documents directory
>> (the project is in fact a LaTeX manual with somewhat complex LaTeX
>> packages and classes). Synchronizing in this way has actually worked very
>> well so far, because in one go I got all my working trees
>> and all my repos in sync...
>>
>
> The largest benefit of using git's synchronization methods is that you
> immediately get a pack-file verification, and also that you never risk
> overwriting anything in either repo if you've forgotten to sync between
> the two (say you've made changes on your laptop, forgot to send them to
> your workstation, then made changes on your workstation and then you try
> to sync them). It's possible to recover from such a situation using the
> lost-found tool, but it can be cumbersome, and uncommitted changes, as
> well as changes to the working tree, are lost forever.
Unison (which, if I remember correctly, uses rsync, or rsync over ssh) detects
such cases and asks the user what to do if both sides changed a file (copy from
one side, copy from the other side, view a diff, merge, ...).
But you can always tell unison to ignore the git object database
ignore = Name .git
and perhaps also ignore working directories under git control
ignore = Path path/to/working/dir/
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Problem with pack
2006-08-25 10:20 ` Andreas Ericsson
2006-08-25 10:41 ` Jakub Narebski
@ 2006-08-25 10:58 ` Junio C Hamano
1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-08-25 10:58 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: git, Sergio Callegari
Andreas Ericsson <ae@op5.se> writes:
>> Btw, even without that, if I understand correctly, git packs are
>> collections of compressed objects, each of which has its own header
>> stating how long the compressed object is. In my case, the
>> error is in inflating one object (git unpack-objects says inflate
>> returns -3)... so shouldn't there be a way to try to skip to the
>> next object even in this case?
>
> It should be possible, assuming the pack index is still intact. The
> pack index is where the headers are stored, afaik.
The problem Sergio seems to be having is that somehow he does
not have the base object that another object in the pack
depends on, because the latter object is stored in deltified
form.
This should never happen unless .pack itself is corrupted
(git-pack-objects, unless explicitly told to do so with --thin
flag to git-rev-list upstream, would not make a delta against
objects not in the same pack).
When a delta is written to the pack file, unless its base object
has already been written out, git-pack-objects writes out the base
object immediately after that deltified object. So one
possibility is that the pack was truncated soon after the delta
that is having trouble finding its base object. In such a
case, the proposed recovery measure of skipping the corruption
and continuing would not buy you that much. On the other hand,
if the corruption is in the middle (e.g. a single disk block was
wiped out), having the .idx file might help you resync.
Does the pack pass git-verify-pack test, I wonder?
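The resync idea relies on the byte offsets recorded in the .idx. A sketch of inspecting them with a modern git in a throwaway repository (names invented; in a real recovery you would point git show-index at the corrupt pack's .idx instead):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com   # throwaway identity so the commit works anywhere
git config user.name demo
echo data > file
git add file
git commit -q -m 'c'
git repack -q -a -d                      # pack the commit, tree, and blob

# One line per object: <byte offset in the .pack> <sha1> (plus a CRC with v2 indexes).
# Sorting by offset shows where each entry should begin, which is what would let a
# recovery tool skip past one corrupt entry to the next.
git show-index < .git/objects/pack/pack-*.idx | sort -n
```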
* Re: Problem with pack
@ 2006-08-25 12:31 Sergio Callegari
2006-08-26 18:20 ` Linus Torvalds
0 siblings, 1 reply; 17+ messages in thread
From: Sergio Callegari @ 2006-08-25 12:31 UTC (permalink / raw)
To: git
>
> Andreas Ericsson <ae@op5.se> writes:
>
> >> Btw, even without that, if I understand correctly, git packs are
> >> collections of compressed objects, each of which has its own header
> >> stating how long the compressed object is. In my case, the
> >> error is in inflating one object (git unpack-objects says inflate
> >> returns -3)... so shouldn't there be a way to try to skip to the
> >> next object even in this case?
> >
> > It should be possible, assuming the pack index is still intact. The
> > pack index is where the headers are stored, afaik.
>
> The problem Sergio seems to be having is because somehow he does
> not have a base object that another object that is in the pack
> depends on, because the latter object is stored in deltified
> form.
>
> This should never happen unless .pack itself is corrupted
> (git-pack-objects, unless explicitly told to do so with --thin
> flag to git-rev-list upstream, would not make a delta against
> objects not in the same pack).
>
> When a delta is written to the pack file, unless its base object
> has already been written out, git-pack-objects writes out the base
> object immediately after that deltified object. So one
> possibility is that the pack was truncated soon after the delta
> that is having trouble finding its base object. In such a
> case, the proposed recovery measure of skipping the corruption
> and continuing would not buy you that much. On the other hand,
> if the corruption is in the middle (e.g. a single disk block was
> wiped out), having the .idx file might help you resync.
>
> Does the pack pass git-verify-pack test, I wonder?
If I try to verify the pack I get:
git verify-pack -v pack-ebcdfbbda07e5a3e4136aa1f499990b35685bab4.idx
fatal: failed to read delta-pack base object
2849bd2bd8a76bbca37df2a4c8e8b990811d01a7
the pack length seems reasonable, however... (no evident sign of
truncation, but I haven't looked inside the index to check the exact
positions of objects)...
and git unpack-objects dies with error code -3 in inflate...
If I am not wrong (but I might easily be so), this should not be due
to truncation...
Sergio
* Re: Problem with pack
2006-08-25 10:07 Sergio Callegari
2006-08-25 10:20 ` Andreas Ericsson
@ 2006-08-26 10:09 ` Junio C Hamano
2006-08-26 10:31 ` Junio C Hamano
1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2006-08-26 10:09 UTC (permalink / raw)
To: git; +Cc: Sergio Callegari, Nicolas Pitre, Linus Torvalds
Sergio Callegari <scallegari@arces.unibo.it> writes:
> I was not expecting this kind of problem, so, silly me, I did a repack as
> the last thing, went home, attached the laptop to the net, ran
> unison, started to work, and only realized that there was a problem when
> I attempted a new repack, which failed complaining about the corrupted
> pack...
Sorry about the mixed "intended audience" of this message,
asking Sergio for a bit more info as the end user who had
problems with git, and at the same time describing a code-level
analysis of the possible cause to ask for help from git
developers. Nico is CC'ed because he seems to be the person who
knows this area, including zlib, best.
- - -
Earlier you said that the mothership has 1.4.2, and the note has
1.4.0. The sequence of events as I understand it is:
- repack -a -d on the mothership with 1.4.2; no problems
observed.
- transfer the results to note; this was done behind git's back, so
no problems observed.
- tried to repack on note with 1.4.0; got "failed to
read delta-pack base object" error.
Can you make the pack/idx available to the public for
postmortem?
Also I wonder if the pack can be read by 1.4.2.
Earlier you said "unpack-objects <$that-pack.pack" fails with
"error code -3 in inflate..." What exact error do you get?
I am guessing that it is get_data() which says:
"inflate returned %d\n"
(side note: we should not say \n there).
/* from unpack-objects.c; fill(), use() and len are file-scope helpers */
static void *get_data(unsigned long size)
{
	z_stream stream;
	void *buf = xmalloc(size);

	memset(&stream, 0, sizeof(stream));
	stream.next_out = buf;
	stream.avail_out = size;
	stream.next_in = fill(1);
	stream.avail_in = len;
	inflateInit(&stream);

	for (;;) {
		int ret = inflate(&stream, 0);
		use(len - stream.avail_in);
		if (stream.total_out == size && ret == Z_STREAM_END)
			break;
		if (ret != Z_OK)
			die("inflate returned %d\n", ret);
		stream.next_in = fill(1);
		stream.avail_in = len;
	}
	inflateEnd(&stream);
	return buf;
}
This pattern appears practically everywhere. When inflate()
returns, we expect its return value to be either Z_OK or
Z_STREAM_END and everything else is treated as an error. Also
when we receive Z_STREAM_END the resulting length had better be
the size we expect. I do not have any problem with the latter,
but have always felt uneasy about the former; being no zlib
expert, I had been using the code as is. zlib.h says Z_BUF_ERROR
is not fatal -- it just means this round of call with the given
input and output buffers did not make any progress. I've been
wondering if it is possible for inflate to eat some input that
was not enough to produce one byte of output, and what it
would return in such a case...
About the error message you got while attempting to repack, the
exact same error message appears in two places, but the one that
is emitted is the one in sha1_file.c::unpack_delta_entry(). The
other one is in unpack-objects.c and is not run by repack.
This function is called after:
- we find that we need to access an object;
- we decided to use the .pack/.idx pair; when mmap'ing
the .idx file, we validate that the pair is not
corrupt by calling check_packed_git_idx(). This does
the sha-1 checksum of both files;
- we find the location of the deltified object in the
.pack file by looking at the corresponding .idx;
- we read from that location a handful of bytes, find that
it is deltified and learn its base object name.
So it is not likely that the .pack/.idx pair was corrupted after
it was written (i.e. not a bit rot).
Now, in unpack_delta_entry():
- we find the location of its base object in the .pack
file by looking at the .idx again; if this fails, we
would die with a different error message "failed to
find delta-pack base object";
- we call unpack_entry_gently(); we would see the error
message you saw only when this function returns NULL.
So unpack_entry_gently() while reading the base object returned
NULL. Let's see how it can:
- we read the data for the base object in the pack we
identified earlier. First we learn its type and size.
- the base object could be also a deltified object, in
which case it would recursively call unpack_delta_entry();
however, that function would never return NULL. it
either succeeds or die()s. So the base must not have
been OBJ_DELTA type.
- the object type recorded there for the base object
could have been something bogus, in which case we
would return NULL. But that would mean pack-objects in
1.4.2 generated a bogus pack on the mothership.
- if the object type is not bogus, unpack_non_delta_entry()
is called to extract the data for the base object.
this decompresses the data stream, and if it does
not inflate well it would return NULL.
So there are only a few ways you can get that error message.
- pack-objects in 1.4.2 produced an invalid pack by
recording:
- bogus object type for the base object, or
- incorrectly recording the offset of the base object in
the pack file in .idx, or
- incorrectly recording the size of the base object in
the pack file;
and checksummed the resulting bogus pack/idx pair as if
nothing was wrong.
- pack-objects in 1.4.2 produced a deflated stream that
made unpack_non_delta_entry() unhappy.
Other changes between 1.4.0 and 1.4.2 that I do not think are
related are:
- we slightly changed the way data is deflated with
commit 12f6c30 to favour speed over compression, but
this only affects loose objects and not packs.
- we introduced a new file format for loose objects with
commit 93821bd, but this needs to be explicitly
enabled by a .git/config option. Even if the mothership
1.4.2 had recorded loose objects in the new format,
the process to create the pack first expands and
then recompresses with the old-and-proven code, so
this should not affect the resulting pack. Even if
such a loose object were copied to the note with 1.4.0
together with the pack, object reading code always
favours what's in the pack, so it should not even be
touched. Actually, the codepath that emits the error
message does not read the base object from anywhere
other than from the same pack.
- we slightly changed the way data for the object to be
deltified and the base object is read in pack-objects
with commit 560b25a; I did not see anything obviously
wrong with that change.
- we slightly changed the way we pick the base object
when making a delta with commits 8dbbd14 and 51d1e83;
these should not change what happens after a pair is
decided to be used as delta and its base.
* Re: Problem with pack
2006-08-26 10:09 ` Junio C Hamano
@ 2006-08-26 10:31 ` Junio C Hamano
0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-08-26 10:31 UTC (permalink / raw)
To: git; +Cc: Sergio Callegari, Nicolas Pitre, Linus Torvalds
Junio C Hamano <junkio@cox.net> writes:
> Earlier you said "unpack-objects <$that-pack.pack" fails with
> "error code -3 in inflate..." What exact error do you get?
> I am guessing that it is get_data() which says:
>
> "inflate returned %d\n"
>
> (side note: we should not say \n there).
> ...
> This pattern appears practically everywhere...
> ... I've been
> wondering if it is possible for inflate to eat some input but
> that was not enough to produce one byte of output, and what [it]
> would return in such a case...
I do not think this fear applies to this particular case;
return value -3 is Z_DATA_ERROR, so the deflated stream is
corrupt.
> So there are only a few ways you can get that error message.
> ...
I just realized there is another not so implausible explanation.
When the problematic pack was made on the mothership,
csum-file.c::sha1write_compressed() gave the data for the base
object to zlib, an alpha particle hit a memory cell that
contained zlib output buffer (resulting in a corrupt deflated
stream in variable "out"), and sha1write() wrote it out while
computing the right checksum.
Is the memory on your mothership reliable (I do not want to make
this message sound like one on the kernel list, but memtest86
might be in order)?
* Re: Problem with pack
2006-08-25 12:31 Sergio Callegari
@ 2006-08-26 18:20 ` Linus Torvalds
0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2006-08-26 18:20 UTC (permalink / raw)
To: Sergio Callegari; +Cc: git
On Fri, 25 Aug 2006, Sergio Callegari wrote:
>
> If I try to verify the pack I get:
>
> git verify-pack -v pack-ebcdfbbda07e5a3e4136aa1f499990b35685bab4.idx
> fatal: failed to read delta-pack base object 2849bd2bd8a76bbca37df2a4c8e8b990811d01a7
>
> the pack length seems reasonable, however... (no evident sign of
> truncation, but I haven't looked inside the index to check the exact positions
> of objects)...
Can you make the corrupt pack-file and index available publicly (or
perhaps at least to a few git people)?
The fact that verify-pack is happy with the SHA1 checksum is interesting,
because it means that the pack-file at least didn't get corrupted on-disk
(or through the sync operation). Iow, it must have gotten corrupted at
write-out itself somehow, and it would be interesting to see what the
pack-file looks like.
> and git unpack-objects dies with error code -3 in inflate...
That's Z_DATA_ERROR, which is what you get if the input to inflate is bad.
Linus
* Re: Problem with pack
@ 2006-08-26 18:49 Sergio Callegari
0 siblings, 0 replies; 17+ messages in thread
From: Sergio Callegari @ 2006-08-26 18:49 UTC (permalink / raw)
To: git
>
> Junio C Hamano <junkio@cox.net> writes:
>
> > Earlier you said "unpack-objects <$that-pack.pack" fails with
> > "error code -3 in inflate..." What exact error do you get?
> > I am guessing that it is get_data() which says:
> >
> > "inflate returned %d\n"
> >
> > (side note: we should not say \n there).
> > ...
> > This pattern appears practically everywhere...
> > ... I've been
> > wondering if it is possible for inflate to eat some input but
> > that was not enough to produce one byte of output, and what [it]
> > would return in such a case...
>
> I do not think this fear applies to this particular case;
> return value -3 is Z_DATA_ERROR, so the deflated stream is
> corrupt.
>
> > So there are only a few ways you can get that error message.
> > ...
>
> I just realized there is another not so implausible explanation.
>
> When the problematic pack was made on the mothership,
> csum-file.c::sha1write_compressed() gave the data for the base
> object to zlib, an alpha particle hit a memory cell that
> contained zlib output buffer (resulting in a corrupt deflated
> stream in variable "out"), and sha1write() wrote it out while
> computing the right checksum.
>
> Is the memory on your mothership reliable (I do not want to make
> this message sound like one on the kernel list, but memtest86
> might be in order)?
>
You can never say for sure, but so far I have never had any problem with
that machine, nor have I had any since this incident...
Sergio
* Re: Problem with pack
@ 2006-08-26 18:53 Sergio Callegari
2006-08-26 19:24 ` Linus Torvalds
0 siblings, 1 reply; 17+ messages in thread
From: Sergio Callegari @ 2006-08-26 18:53 UTC (permalink / raw)
To: git
> Earlier you said that the mothership has 1.4.2, and the note has
> 1.4.0. The sequence of events as I understand are:
>
> - repack -a -d on the mothership with 1.4.2; no problems
> observed.
>
> - transfer the results to note; this was done behind git so
> no problems observed.
>
> - tried to repack on note with 1.4.0; got "failed to
> read delta-pack base object" error.
>
Yes... only before that I had a few more PC<->notebook iterations, always
syncing with unison.
> Can you make the pack/idx available to the public for
> postmortem?
>
Yes... I can make them available... the pack/idx actually do not contain
anything extremely confidential (just a bunch of LaTeX files).
Only, since there is conference data and material by people who
professionally organize conferences, I'd prefer to make it available
directly to a few specific people whom I can trust not to redistribute it,
rather than releasing it to the general public.
> Also I wonder if the pack can be read by 1.4.2.
>
No it cannot.
> Earlier you said "unpack-objects <$that-pack.pack" fails with
> "error code -3 in inflate..." What exact error do you get?
> I am guessing that it is get_data() which says:
>
> "inflate returned %d\n"
>
This is right...
>
Just a question...
Might the problem have come out of a scenario like the following...
1) I use unison to sync my documents (rather than using the git tools...
silly me!)
2) I get things wrong in controlling unison (without realizing that I
do) and the result is that I lose some blobs.
3) I repack an unclean tree (missing some objects)
Can this be the case?
Thanks
Sergio
* Re: Problem with pack
2006-08-26 18:53 Sergio Callegari
@ 2006-08-26 19:24 ` Linus Torvalds
0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2006-08-26 19:24 UTC (permalink / raw)
To: Sergio Callegari; +Cc: git
On Sat, 26 Aug 2006, Sergio Callegari wrote:
>
> Might the problem have come out of a scenario like the following...
>
> 1) I use unison to sync my documents (rather than using the git tools...
> silly me!)
> 2) I get things wrong in controlling unison (without realizing that I
> do) and the result is that I lose some blobs.
> 3) I repack an unclean tree (missing some objects)
>
> Can this be the case?
I do think that your synchronization using unison is _somehow_ part of the
reason why bad things happened, but I really can't see why it would cause
problems, and perhaps more importantly, git should have noticed them
earlier (and, in particular, failed the repack). So a git bug and/or
misfeature is involved somehow.
One thing that may have happened is that the use of unison somehow
corrupted an older pack (or you had a disk corruption), and that it was
missed because the corruption was in a delta of the old pack that was
silently re-used for the new one.
That would explain how the SHA1 of the pack-file matches - the repack
would have re-computed the SHA1 properly, but since the source delta
itself was corrupt, the resulting new pack is corrupt.
If you had used git itself to synchronize the two repositories, that
corruption of one repo would have been noticed when it transfers the data
over to the other side, which is one reason why the native git syncing
tools are so superior to doing a filesystem-level synchronization.
With a filesystem-level sync (unison or anything else - rsync, cp -r,
etc), a problem introduced in one repository will be copied to another one
without any sanity checking.
(Which is not to say that the native protocol might not miss something
too, but it should be _much_ harder to trigger: for anything but the
initial clone, the native protocol will unpack all objects and recompute
their SHA1 hashes from first principles on the receiving side, rather than
trust the sender implicitly, so it's fundamentally safer. But maybe we
could be even _more_ anal somewhere).
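To make "recompute their SHA1 hashes from first principles" concrete: a git object name is the SHA-1 of a small type/size header followed by the object's content, so the receiver can always recompute it from the bytes it just inflated. A minimal sketch (Python here purely for illustration; git itself does this in C):

```python
import hashlib

def git_object_sha1(obj_type: str, content: bytes) -> str:
    # git hashes "<type> <size>\0" + content, not the raw content alone
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# matches `echo "hello world" | git hash-object --stdin`
print(git_object_sha1("blob", b"hello world\n"))
# → 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

A filesystem-level sync never performs this recomputation, which is why a corrupt object can travel from one repository to the other unnoticed.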
So as a suggestion if you want to be careful:
- only use "git fetch/pull" to synchronize two git repos, because that's
inherently safer than any non-native synchronization.
- if you repack reasonably often, do "git fsck-objects" (which is very
cheap when there aren't a lot of unpacked objects) to verify the
archive before "git repack -a -d" to repack it.
- the "fsck-objects" thing won't catch a corrupt pack (unless you ask for
it with "--full", which is expensive for bigger projects), but at least
with "git fetch/pull", such corruption should not be able to replicate
to another repository.
but in the meantime, when you find a place to put the corrupt pack/index
file, please include me and Junio at a minimum into the group of people
who you tell where to find it (and/or passwords to access it). I'll
happily keep your data private (I've done it before for others).
Linus
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Problem with pack
@ 2006-08-27 17:45 Sergio Callegari
2006-08-27 18:27 ` Linus Torvalds
0 siblings, 1 reply; 17+ messages in thread
From: Sergio Callegari @ 2006-08-27 17:45 UTC (permalink / raw)
To: git
>
> I do think that your synchronization using unison is _somehow_ part of the
> reason why bad things happened, but I really can't see why it would cause
> problems, and perhaps more importantly, git should have noticed them
> earlier (and, in particular, failed the repack). So a git bug and/or
> misfeature is involved somehow.
>
Glad if my broken pack can help find out!
> One thing that may have happened is that the use of unison somehow
> corrupted an older pack (or you had a disk corruption), and that it was
> missed because the corruption was in a delta of the old pack that was
> silently re-used for the new one.
>
> That would explain how the SHA1 of the pack-file matches - the repack
> would have re-computed the SHA1 properly, but since the source delta
> itself was corrupt, the resulting new pack is corrupt.
>
There is something that I still do not understand... (sorry if I ask
stupid questions)...
Since packs have an sha signature too, if there was a data corruption
(disk or transfer), shouldn't that have been detected at the repack?
I.e. doesn't repack -d verify the available data before cancelling anything?
> If you had used git itself to synchronize the two repositories, that
> corruption of one repo would have been noticed when it transfers the data
> over to the other side, which is one reason why the native git syncing
> tools are so superior to doing a filesystem-level synchronization.
>
I think I learnt the lesson!
> With a filesystem-level sync (unison or anything else - rsync, cp -r,
> etc), a problem introduced in one repository will be copied to another one
> without any sanity checking.
>
Idem!
> but in the meantime, when you find a place to put the corrupt pack/index
> file, please include me and Junio at a minimum into the group of people
> who you tell where to find it (and/or passwords to access it). I'll
> happily keep your data private (I've done it before for others).
>
>
Sure... I have already sent an email to Junio to arrange this.
Thanks,
Sergio
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Problem with pack
2006-08-27 17:45 Sergio Callegari
@ 2006-08-27 18:27 ` Linus Torvalds
2006-08-27 19:26 ` Nicolas Pitre
2006-08-27 21:55 ` Junio C Hamano
0 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2006-08-27 18:27 UTC (permalink / raw)
To: Sergio Callegari, Junio C Hamano; +Cc: Git Mailing List
On Sun, 27 Aug 2006, Sergio Callegari wrote:
>
> There is something that I still do not understand... (sorry if I ask stupid
> questions)...
> Since packs have an sha signature too, if there was a data corruption (disk or
> transfer), shouldn't that have been detected at the repack? I.e. doesn't
> repack -d verify the available data before cancelling anything?
The packs do have a SHA1 signature, but verifying it is too expensive for
normal operations. It's only verified when you explicitly ask for it, ie
by git-fsck-objects and git-verify-pack.
Now, for small projects we could easily verify the SHA1 csum when we load
the pack, but imagine doing the same thing when the pack is half a
gigabyte in size, and I think you see the problem. Especially as most
normal operations wouldn't otherwise touch more than a small
fraction of the pack contents, so verifying the SHA1 would be relatively
very expensive indeed.
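For reference, the pack signature being discussed is a 20-byte SHA-1 trailer computed over everything that precedes it, so verifying it means hashing the entire file; a sketch of the check (Python for illustration) makes it clear why the cost scales with pack size:

```python
import hashlib

def pack_checksum_ok(pack_bytes: bytes) -> bool:
    # a pack ends with a 20-byte SHA-1 of everything before it;
    # verifying it requires reading (and hashing) the whole file
    if len(pack_bytes) < 32:        # 12-byte header + 20-byte trailer
        return False
    body, trailer = pack_bytes[:-20], pack_bytes[-20:]
    return hashlib.sha1(body).digest() == trailer

# a minimal synthetic "pack": header only (version 2, zero objects)
body = b"PACK" + (2).to_bytes(4, "big") + (0).to_bytes(4, "big")
pack = body + hashlib.sha1(body).digest()
print(pack_checksum_ok(pack))                                # True
print(pack_checksum_ok(pack[:-1] + bytes([pack[-1] ^ 1])))   # False
```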
Now, "git repack -a" is obviously special in that the "-a" will mean that
we will generally touch all of the old pack _anyway_, and thus verifying
the signature is no longer at all as unreasonable as it is under other
circumstances. And very arguably, if you _also_ do "-d", then since that
is a fairly dangerous operation with the potential for real data loss, you
could well argue that we should do it.
However, since the data was _already_ corrupt in that situation, and since
a "git-fsck-objects --full" _will_ pick up the corruption in that case
both before and after, equally arguably it's also true that there's really
not a huge advantage to checking it in "git repack -a -d".
In other words, in your case, the reason you ended up with the corruption
spreading was _not_ that "git repack -a -d" might have silently not
noticed it, but really the fact that unison meant that the corruption
would spread from one archive to another in the first place.
NOTE! This is all assuming my theory that a packed entry was broken in the
first place was correct. We obviously still don't _know_ what the problem
was. So far it's just a theory.
Final note: a "git repack -a -d" normally actually _does_ do almost as
much checking as a "git-fsck-objects". It's literally just the "copy the
already packed object from an old pack to a new one" that is an
optimization that short-circuits all the normal git sanity checks. All the
other cases will effectively do a lot of integrity checking just by virtue
of unpacking the data in the object that is packed, before re-packing it.
So it might well be the case that we should simply add an extra integrity
check to the raw data copy in builtin-pack-objects.c: write_object().
Now, these days there are actually two cases of that: the pack-to-pack copy
(which has existed for a long while) and the new "!legacy_loose_object()"
case. They should perhaps both verify the integrity of what they copy.
Junio, comments?
Linus
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Problem with pack
2006-08-27 18:27 ` Linus Torvalds
@ 2006-08-27 19:26 ` Nicolas Pitre
2006-08-27 21:55 ` Junio C Hamano
1 sibling, 0 replies; 17+ messages in thread
From: Nicolas Pitre @ 2006-08-27 19:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Sergio Callegari, Junio C Hamano, Git Mailing List
On Sun, 27 Aug 2006, Linus Torvalds wrote:
> Now, "git repack -a" is obviously special in that the "-a" will mean that
> we will generally touch all of the old pack _anyway_, and thus verifying
> the signature is no longer at all as unreasonable as it is under other
> circumstances. And very arguably, if you _also_ do "-d", then since that
> is a fairly dangerous operation with the potential for real data loss, you
> could well argue that we should do it.
We definitely should do it with -d.
> However, since the data was _already_ corrupt in that situation, and since
> a "git-fsck-objects --full" _will_ pick up the corruption in that case
> both before and after, equally arguably it's also true that there's really
> not a huge advantage to checking it in "git repack -a -d".
There really is an advantage. Given that the absence of -d leaves old
objects/packs around, there is a greater chance of still finding an
early copy of the bad object.
> In other words, in your case, the reason you ended up with the corruption
> spreading was _not_ that "git repack -a -d" might have silently not
> noticed it, but really the fact that unison meant that the corruption
> would spread from one archive to another in the first place.
But -d would definitely delete old packs that could have had a non
corrupted copy of the desired object.
> Final note: a "git repack -a -d" normally actually _does_ do almost as
> much checking as a "git-fsck-objects". It's literally just the "copy the
> already packed object from an old pack to a new one" that it an
> optimization that short-circuits all the normal git sanity checks. All the
> other cases will effectively do a lot of integrity checking just by virtue
> of unpacking the data in the object that is packed, before re-packing it.
>
> So it might well be the case that we should simply add an extra integrity
> check to the raw data copy in builtin-pack-objects.c: write_object().
>
> Now, these days there is actually two cases of that: the pack-to-pack copy
> (which has existed for a long while) and the new "!legacy_loose_object()"
> case. They should perhaps both verify the integrity of what they copy.
I think that git-pack-object should grow another flag: --verify-src or
something. That flag would force the verification of the pack checksum
for any pack used for the repack operation. Then it should be used
anytime -d is provided to git-repack. When -d is not provided then we
can skip that --verify-src security measure, since nothing gets deleted
and therefore there is no risk of losing a good object to a corrupted
one.
Nicolas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Problem with pack
2006-08-27 18:27 ` Linus Torvalds
2006-08-27 19:26 ` Nicolas Pitre
@ 2006-08-27 21:55 ` Junio C Hamano
1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2006-08-27 21:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> NOTE! This is all assuming my theory that a packed entry was broken in the
> first place was correct. We obviously still don't _know_ what the problem
> was. So far it's just a theory.
But it is a good theory. More plausible than an alpha particle
hitting the output buffer of zlib at the right moment, although
the effects are the same ;-).
> So it might well be the case that we should simply add an extra integrity
> check to the raw data copy in builtin-pack-objects.c: write_object().
I would agree that it is sensible to insert checks
at the places shown in the attached. The revalidate_pack_piece()
would:
- decode object header to make sure it decodes to sensible
enum object_type value from the start of the buffer given as
its first argument;
- if it is of type OBJ_DELTA, skip 20-byte base object name;
- the second argument is the length of the piece -- make sure
the above steps did not require more than the length;
- make sure the remainder is sane by running inflate() into
void. When fed the remainder in full, inflate() should
return Z_OK.
For the first step we need to refactor unpack_object_header() in
sha1_file.c a tiny bit to reuse it.
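The four steps above can be sketched as follows (Python rather than C, purely for illustration; the numeric type codes are assumptions matching the pack format of the era, with OBJ_DELTA carrying a 20-byte base object name). The diff below then shows where such a check would be called:

```python
import zlib

# assumed type codes for this sketch
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG, OBJ_DELTA = 1, 2, 3, 4, 7

def revalidate_pack_piece(buf: bytes, length: int) -> bool:
    # 1) decode the object header: type in bits 4-6 of the first byte,
    #    size in a little-endian base-128 continuation
    i, c = 1, buf[0]
    obj_type = (c >> 4) & 7
    size, shift = c & 15, 4
    while c & 0x80:
        c = buf[i]; i += 1
        size |= (c & 0x7F) << shift
        shift += 7
    if obj_type not in (OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG, OBJ_DELTA):
        return False
    # 2) a delta carries the 20-byte name of its base object
    if obj_type == OBJ_DELTA:
        i += 20
    # 3) the header (and base name) must fit within the piece
    if i > length:
        return False
    # 4) the remainder must inflate cleanly to the advertised size
    try:
        data = zlib.decompress(buf[i:length])
    except zlib.error:
        return False
    return len(data) == size

# a well-formed blob entry: header byte (type 3, size 5) + deflated data
entry = bytes([(OBJ_BLOB << 4) | 5]) + zlib.compress(b"hello")
print(revalidate_pack_piece(entry, len(entry)))   # True
```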
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 46f524d..0521cad 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -276,6 +282,7 @@ static unsigned long write_object(struct
map = map_sha1_file(entry->sha1, &mapsize);
if (map && !legacy_loose_object(map)) {
/* We can copy straight into the pack file */
+ revalidate_pack_piece(map, mapsize);
sha1write(f, map, mapsize);
munmap(map, mapsize);
written++;
@@ -319,6 +326,7 @@ static unsigned long write_object(struct
datalen = find_packed_object_size(p, entry->in_pack_offset);
buf = (char *) p->pack_base + entry->in_pack_offset;
+ revalidate_pack_piece(buf, datalen);
sha1write(f, buf, datalen);
unuse_packed_git(p);
hdrlen = 0; /* not really */
^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2006-08-27 21:55 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-26 18:49 Problem with pack Sergio Callegari
-- strict thread matches above, loose matches on Subject: below --
2006-08-27 17:45 Sergio Callegari
2006-08-27 18:27 ` Linus Torvalds
2006-08-27 19:26 ` Nicolas Pitre
2006-08-27 21:55 ` Junio C Hamano
2006-08-26 18:53 Sergio Callegari
2006-08-26 19:24 ` Linus Torvalds
2006-08-25 12:31 Sergio Callegari
2006-08-26 18:20 ` Linus Torvalds
2006-08-25 10:07 Sergio Callegari
2006-08-25 10:20 ` Andreas Ericsson
2006-08-25 10:41 ` Jakub Narebski
2006-08-25 10:58 ` Junio C Hamano
2006-08-26 10:09 ` Junio C Hamano
2006-08-26 10:31 ` Junio C Hamano
2006-08-25 8:45 Sergio Callegari
2006-08-25 9:12 ` Johannes Schindelin