git.vger.kernel.org archive mirror
* Re: Problematic git pack
       [not found]             ` <7v7j0qihwl.fsf@assigned-by-dhcp.cox.net>
@ 2006-08-30 18:11               ` Linus Torvalds
  0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2006-08-30 18:11 UTC
  To: Junio C Hamano; +Cc: Sergio Callegari, Git Mailing List


[ git list cc'd, just because maybe somebody was interested in seeing what 
  looks very much like the resolution of this issue..

  For people who haven't followed (the private) email exchanges with Sergio 
  about his git corruption, what was going on is that we initially were 
  able to re-generate all but one object (and the objects that were 
  dependent on it through deltas - there were three deltas against it, but 
  two of the deltas happened to delete the corruption and thus only one 
  of the dependent objects actually ended up having the wrong data, in 
  addition to the primary corrupted object, of course)..

  Junio pinpointed what the filesystem name of that primary object was, 
  and Sergio actually had the original object in his working tree, so 
  Junio could then generate a new pack with the one corrupted object 
  fixed, which obviously meant that all the deltas now worked too.

  This is my (probably final) analysis of the resulting differences.. ]

On Wed, 30 Aug 2006, Junio C Hamano wrote:
> 
> Ok, I was going to attach the resurrected pack that should
> contain everything your corrupt pack had, but it is a bit too
> large, so I'll place it here [*1*].  Drop me a note when you
> retrieved it, so that I can remove it.

VERY interesting.

The pack you generated looks very different from the original corrupt 
pack, but when I compare it to one that I generated _from_ that corrupted 
original by forcing a full repack, they are actually very very similar.

[ Side note: the explanation for the difference between my repacked 
  version and the original corrupt pack in turn seems to be fairly 
  straightforward: the older objects in the original pack were created 
  with an older version of git that defaulted to maximum zlib compression, 
  while the newer parts of the pack were done with the current default of 
  "default compression".

  So when Junio re-created the packfile from scratch with a modern git 
  version, it would now re-compress everything with the modern compression 
  value, and thus the byte-stream of Junio's pack would look very 
  different from Sergio's original one for the older objects. So what I 
  did was to repack the _corrupt_ data with a modern git, to be able to 
  compare the two sanely. ]

In fact, doing a hexdump with 'od' and diffing the results, here are the 
differences:

	--- od.unfixed  2006-08-30 10:11:36.000000000 -0700
	+++ od.fixed    2006-08-30 09:33:02.000000000 -0700
	@@ -3358,7 +3358,7 @@
	 0150720 2c0c bc8b 2f07 3733 767d 35bb 6bb6 4b28
	 0150740 0e3c 0ddc 955b eb2d 57e0 754b b1ec b9f7
	 0150760 8ac8 87bf 9d44 fd11 a041 c4ea cd1d 26de
	-0151000 94ea 9cf3 7dcd b596 13a3 61bb 48db e69e
	+0151000 96ea 9cf3 7dcd b596 13a3 61bb 48db e69e
	 0151020 21e9 1288 6294 8dd4 8b3c 6cc6 e4e7 518f
	 0151040 6166 3e2d d635 b631 9ec6 0613 aee3 caab
	 0151060 d4fd 3ab3 6e18 c43a 8dee 15b9 bc6d 0748
	@@ -8455,7 +8455,7 @@
	 0410140 9f30 2514 f447 0c0d 477e 27c7 a0e1 c9ec
	 0410160 c0b1 f352 76dd 4ff6 24b1 03c2 0ed2 363a
	 0410200 034f 637c 8e11 0c86 e2db 2625 75c4 508b
	-0410220 6fdb bf01 af2d b23b fbb7 0128 49bd 2bd8
	+0410220 6fdb bf01 54b6 b23d fbb7 0128 49bd 2bd8
	 0410240 a76b bca3 7df2 a4c8 e8b9 9081 1d01 a778
	 0410260 9cad 564d 6f1b 4518 7605 e563 c501 0995
	 0410300 231a 0151 5255 f6ce ccee ec07 8a22 25eb
	@@ -394626,7 +394626,7 @@
	 30054020 4951 6272 665e 7a6e 6272 517e b1ad d1e4
	 30054040 a55c 9a40 fd62 a2a9 c925 99f9 79b5 d5aa
	 30054060 5c0a 4010 03b4 d6a1 a064 b323 f74a c6cd
	-30054100 df78 a219 27c7 f205 0200 8970 3307 ae17
	-30054120 5e11 8f8c e015 00f8 5d51 9152 0d07 79b2
	-30054140 45dd
	+30054100 df78 a219 27c7 f205 0200 8970 3307 4599
	+30054120 3854 bacb 704e 7312 11e1 60e3 c14b 0a90
	+30054140 e22b
	 30054142

You can ignore the last hunk, that's just the SHA1 of the pack itself, and 
that will obviously differ in all 20 bytes, so that difference is not 
interesting. 
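
A minimal sketch of why that trailing hunk says nothing about the health of the contents, assuming Python 3. The byte string below is a toy stand-in, not a real pack, and `pack_trailer_ok` is a hypothetical helper name. It relies only on the pack ending with a 20-byte SHA1 of everything before it.

```python
import hashlib

def pack_trailer_ok(pack_bytes):
    """True if the last 20 bytes are the SHA-1 of everything before them."""
    body, trailer = pack_bytes[:-20], pack_bytes[-20:]
    return hashlib.sha1(body).digest() == trailer

# Toy stand-in for a pack: arbitrary bytes plus a freshly computed trailer.
body = b"PACK" + bytes(range(64))
toy_pack = body + hashlib.sha1(body).digest()

# Flip one bit in the body but keep the old trailer: the check fails.
damaged = bytearray(toy_pack)
damaged[10] ^= 0x02
stale = bytes(damaged)

# Recompute the trailer over the damaged body: the check passes again,
# although the body still differs from the original.
resealed = stale[:-20] + hashlib.sha1(stale[:-20]).digest()

print(pack_trailer_ok(toy_pack), pack_trailer_ok(stale), pack_trailer_ok(resealed))
```

So a matching trailer only proves that the trailer was computed over whatever bytes are there now.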

So the _real_ difference is literally just the one byte at offset 0151000 
(decimal 53760) which in the fixed pack is 0x96, and in the corrupt pack 
it is 0x94. That's a single-bit difference (bit #1 has been cleared).
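
The arithmetic behind that can be checked with a few lines, here sketched in Python 3 (the variable names are just for illustration):

```python
offset = int("0151000", 8)      # od prints offsets in octal
corrupt, fixed = 0x94, 0x96     # the byte as seen in each pack

diff = corrupt ^ fixed          # XOR leaves only the bits that differ
flipped_bits = [bit for bit in range(8) if (diff >> bit) & 1]

print(offset, hex(diff), flipped_bits)   # prints: 53760 0x2 [1]
```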

The other difference is at 0410224 (decimal 135316), where the sequence 
"af2d b23b" should have been "54b6 b23d", and that's just final zlib CRC 
of that one object (the corrupted object is fairly large: it's 127905 
bytes uncompressed, and I think it was ~100kB compressed too, because most 
of it is the data for the image inside of it).

The way I know that: doing "git-show-index" (and sorting by object offset) 
on the index gives you:

	...
	13830 2849bd2bd8a76bbca37df2a4c8e8b990811d01a7
	135320 42abdeecbf1b49c8354ca9639bd19c378be6d7d4
	...

which means that the one-bit error was in the middle of that (known 
corrupt) object 2849bd2bd8, and the four-byte difference is exactly the 
four last bytes of that same object - which is obviously where you'd 
expect to find the crc32 for the compressed data.
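
The same reasoning as a small sketch in Python 3, assuming (as above) that an object's data ends exactly where the next object in the sorted index begins. The helper names are made up for the illustration.

```python
# (offset, object id) pairs as printed by git-show-index, sorted by offset.
repacked = [
    (13830, "2849bd2bd8a76bbca37df2a4c8e8b990811d01a7"),
    (135320, "42abdeecbf1b49c8354ca9639bd19c378be6d7d4"),
]

def contains(entries, offset):
    """True if offset falls inside the first entry's byte range."""
    start, end = entries[0][0], entries[1][0]
    return start <= offset < end

def trailer_offset(entries):
    """Offset of the last four bytes of the first entry."""
    return entries[1][0] - 4

bit_error = int("0151000", 8)       # 53760, the single-bit difference
checksum_diff = int("0410224", 8)   # 135316, the four-byte difference

print(contains(repacked, bit_error), trailer_offset(repacked) == checksum_diff)
```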

[ NOTE! The only reason the crc32 has changed is exactly the fact that I 
  forced a full repack of the corrupt pack. In the _original_ pack the 
  object 2849bd2bd8a76bbca37df2a4c8e8b990811d01a7 was at:

	...
	13369 2849bd2bd8a76bbca37df2a4c8e8b990811d01a7
	134859 42abdeecbf1b49c8354ca9639bd19c378be6d7d4
	...

  and thus the crc32 in the original is the four bytes at 134855, and 
  indeed in the original corrupt pack we see:

	...
	0407260 d236 3a03 4f63 7c8e 110c 86e2 db26 2575
	0407300 c450 8b6f dbbf 0154 b6b2 3dfb b701 2849
				 ^^ ^^^^ ^^
	0407320 bd2b d8a7 6bbc a37d f2a4 c8e8 b990 811d
	...

  so note how that contains the _original_ CRC, because in the original 
  corrupt pack, the CRC was the one that had been computed with the 
  original (uncorrupted) zlib data, but obviously didn't actually _match_ 
  the actual corrupted data itself - which is why we got a zlib error on 
  unpacking it ]
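
A minimal sketch of that failure mode, assuming Python 3 and its standard zlib module. The payload is a hypothetical stand-in for the image data. Compression level 0 is used only so that a flipped bit in the stream maps directly onto one flipped bit of data; git itself does not pack that way. Note that the four trailing bytes of a zlib stream are, per RFC 1950, an Adler-32 of the uncompressed data; that is the field called the CRC here.

```python
import zlib

data = b"image bytes stand-in " * 40     # hypothetical payload
stream = zlib.compress(data, 0)          # level 0: data is stored, not deflated

# The trailer is the checksum of the *uncompressed* data, big-endian.
trailer_matches = stream[-4:] == zlib.adler32(data).to_bytes(4, "big")

# Flip bit #1 of a byte in the middle, leaving the old trailer in place.
damaged = bytearray(stream)
damaged[len(damaged) // 2] ^= 0x02

try:
    zlib.decompress(bytes(damaged))
    detected = False
except zlib.error:
    detected = True                      # stale checksum no longer matches

print(trailer_matches, detected)
```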

So I think this is pretty damn conclusive. Sergio had a single-bit error 
in a pack-file, and that error got propagated because "git repack" didn't 
notice, and because he used unison to synchronize between two different 
machines, and that obviously happily transferred the corruption.

Now, that makes me feel happy on one level, because it's almost certainly 
a hardware problem - subtle memory corruption, or disk corruption that 
happened when either reading or writing the image. Sergio may not be that 
happy about it, of course.

It _could_ be something else (hey, it could be the kernel or git itself 
that has a wild pointer and corrupted a single bit), but I'd say that the 
hardware is the primary suspect.

Now, what to take away from this:

 - git _did_ find the error, but it would have been easier for everybody 
   if it had noticed it a bit earlier. Ergo:

 - we should make the repacking verify the object even when it just 
   blindly copies it, so that we do _not_ end up in the situation that a 
   pack-file has a "valid" SHA1, even though the contents are actually 
   corrupt.

   This should be easy enough, since zlib already has the crc (actually, 
   we should probably do a full unpack of that object, even if it's 
   expensive: if it's a delta, we'd need that to verify that the SHA1 of 
   the base object is valid).

 - git pack-files are extremely dense (we knew that already, and mostly 
   consider it to be a really good thing), and a single-bit error can be 
   absolutely devastating. For important data, always keep a copy on 
   another machine (that's obviously true regardless of whether you use 
   git or not ;), and _always_ create the copy with git itself, or at 
   least verify it with "git-fsck-objects --full" before you overwrite the 
   previous version.

   The point being, if you have even a single-bit error (and for all we 
   know, it could have been introduced by the transfer itself, and then 
   been re-introduced in the original place when transferring the data 
   back), you absolutely do _not_ want to transfer that to your backup 
   location too.
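
The "verify even when blindly copying" idea can be sketched roughly like this, assuming Python 3. This is not how git-pack-objects does or would do it: for simplicity it uses the loose-object layout (zlib over "<type> <size>\0<data>", named by the SHA1 of that uncompressed form), and both function names are hypothetical.

```python
import hashlib
import zlib

def encode_object(obj_type, payload):
    """Return (object id, compressed bytes) in the loose-object layout."""
    raw = obj_type + b" " + str(len(payload)).encode("ascii") + b"\0" + payload
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)

def verify_before_copy(expected_id, compressed):
    """Fully unpack and re-hash instead of trusting the compressed bytes."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False
    return hashlib.sha1(raw).hexdigest() == expected_id

obj_id, good = encode_object(b"blob", b"precious LaTeX source\n")

damaged = bytearray(good)
damaged[len(damaged) // 2] ^= 0x02       # simulate a single-bit error

print(verify_before_copy(obj_id, good))
print(verify_before_copy(obj_id, bytes(damaged)))
```

The full unpack is the expensive part, but it is also the only step that ties the bytes back to the object name.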

Finally, this also points out that the corrupted packs _can_ be fixed, but 
I think Sergio was a bit lucky (to offset all the bad luck). Sergio still 
had access to the original file that had had its object corrupted. And it 
took a fair amount of work, and some git hacking by somebody who really 
understood git (Junio).

Maybe we'll end up having some of that effort being useful and checked in, 
and we'll eventually have more infrastructure for fixing these things, but 
I suspect that in most cases, even a _single_ bit of corruption will 
generally result in so much havoc that nobody should depend on that. It's 
a lot better to have backups.

			Linus


* Re: Problematic git pack
@ 2006-08-31  8:45 Sergio Callegari
  2006-08-31 11:15 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Sergio Callegari @ 2006-08-31  8:45 UTC
  To: git

What can I say... I have never before seen such rapid action in 
response to a report of a potential problem.
Thanks to Linus and Junio and everybody who might have contributed.
>   Junio could then generate a new pack with the one corrupted object 
>   fixed, which obviously meant that all the deltas now worked too.
>   
Excellent news...
>   This is my (probably final) analysis of the resulting differences.. ]
>
> On Wed, 30 Aug 2006, Junio C Hamano wrote:
> > 
> > Ok, I was going to attach the resurrected pack that should
> > contain everything your corrupt pack had, but it is a bit too
> > large, so I'll place it here [*1*].  Drop me a note when you
> > retrieved it, so that I can remove it.
>   
Junio, can you please send me privately details about [*1*] so I can 
retrieve the pack also?

I also have another question... (maybe it was answered in some previous 
thread on this list, in this case a pointer would be enough).
Now I am going to have the fixed archive and also a new archive, which I 
restarted from the latest working copy I had of my project.
Is there any way to automatically do real "surgery" to attach one to the 
other and get a single archive with all the history?
Obviously, if I try to change a commit object to modify its parents, its 
signature changes, so I need to modify its children and so on, is this 
correct?
Alternatively, I believe that grafts should be a way to go... I have never 
used them before; do all git tools support them? In particular, do they 
get pushed and pulled correctly?
> So the _real_ difference is literally just the one byte at offset 0151000 
> (decimal 53760) which in the fixed pack is 0x96, and in the corrupt pack 
> it is 0x94. That's a single-bit difference (bit #1 has been cleared).
>
>   
So, possibly, the alpha particle theory could be the plausible one in 
the end...
> Now, that makes me feel happy on one level, because it's almost certainly 
> a hardware problem - subtle memory corruption, or disk corruption that 
> happened when either reading or writing the image. Sergio may not be that 
> happy about it, of course.
>   
The bad thing is that I don't know which of my two machines (the laptop 
or the desktop) caused the issue!

> Finally, this also points out that the corrupted packs _can_ be fixed, but 
> I think Sergio was a bit lucky (to offset all the bad luck). Sergio still 
> had access to the original file that had had its object corrupted. 
Actually, this might not be such a rare case... In my tree I had 
the development of some LaTeX documents and packages (code-like, the 
really "precious" files) and a few binary objects (images and OpenOffice 
files mainly, far less precious).
Since the binary objects were so much larger than the text ones, 
assuming a single error, the probability of having it in a non-code 
object was much larger than that of having it in a precious code 
object. Also, commit and tree objects should be much smaller than data 
objects.
This assumption is what initially pushed me to ask for help in 
unpacking at least all the correct objects. One of my first 
questions was: does git unpack-objects die on the first error, or is 
there a way to convince it to simply skip the wrong object (or the delta 
against a wrong object)?
If git unpack-objects can gain an option like --continue-on-errors and 
if checkout/reset can also get an option to do the same (i.e. in a tree 
with missing objects, check out all that can be found), I believe that 
one would already be in a good position...
Finally, having a command to create an object out of a single file 
(the opposite of git cat-file) could help re-create the missing objects...
> And it 
> took a fair amount of work, and some git hacking by somebody who really 
> understood git (Junio).
>
> Maybe we'll end up having some of that effort being useful and checked in, 
> and we'll eventually have more infrastructure for fixing these things, but 
> I suspect that in most cases, even a _single_ bit of corruption will 
> generally result in so much havoc that nobody should depend on that. It's 
> a lot better to have backups.
>
> 			Linus


* Re: Problematic git pack
  2006-08-31  8:45 Problematic git pack Sergio Callegari
@ 2006-08-31 11:15 ` Johannes Schindelin
  2006-08-31 16:23 ` Nicolas Pitre
  2006-08-31 21:33 ` Linus Torvalds
  2 siblings, 0 replies; 5+ messages in thread
From: Johannes Schindelin @ 2006-08-31 11:15 UTC
  To: Sergio Callegari; +Cc: git

Hi,

On Thu, 31 Aug 2006, Sergio Callegari wrote:

> Now I am going to have the fixed archive and also a new archive, which I
> restarted from the latest working copy I had of my project.
> Is there any way to automatically do real "surgery" to attach one to the other
> and get a single archive with all the history?

You can "graft" the new onto the old branch:

If <40-hex-chars-old> is the commit id of the youngest commit of the 
reconstructed branch, and <40-hex-chars-new> is the commit id of the 
initial commit of the newly started branch, you can put this line into 
.git/info/grafts:

<40-hex-chars-new> <40-hex-chars-old>

This will make git believe that the initial commit is no initial commit, 
but has the old head as single parent. And yes, AFAICT all git tools 
support this. I used this technique many times to be able to merge 
unrelated developments.

NOTE! This is the quickest way if you want to have the history _locally_.

If you want to be able to distribute it (or synchronize it between your 
laptop and PC _with git!_), you can rewrite the history by either 
git-rebase, or by using cg-admin-rewritehist if you are using cogito.

Unfortunately, I use neither cogito nor git-rebase, so if you want to walk 
that path, others will have to help. (And most likely, we'd put the result into 
Documentation/howto/.)

Ciao,
Dscho

P.S.: Of course, if you do not insist on a super clean history, you can 
fake a merge. Just put <40-hex-chars-old> into .git/MERGE_HEAD and commit. 
This will pretend that your new head and your old head were merged, and 
the result is the new head. This _should_ even work with git-bisect, but 
it is slightly ugly.


* Re: Problematic git pack
  2006-08-31  8:45 Problematic git pack Sergio Callegari
  2006-08-31 11:15 ` Johannes Schindelin
@ 2006-08-31 16:23 ` Nicolas Pitre
  2006-08-31 21:33 ` Linus Torvalds
  2 siblings, 0 replies; 5+ messages in thread
From: Nicolas Pitre @ 2006-08-31 16:23 UTC
  To: Sergio Callegari; +Cc: git

On Thu, 31 Aug 2006, Sergio Callegari wrote:

> The bad thing is that I don't know which of my two machines (the laptop or the
> desktop) caused the issue!

memtest86 is your friend: http://www.memtest.org


Nicolas


* Re: Problematic git pack
  2006-08-31  8:45 Problematic git pack Sergio Callegari
  2006-08-31 11:15 ` Johannes Schindelin
  2006-08-31 16:23 ` Nicolas Pitre
@ 2006-08-31 21:33 ` Linus Torvalds
  2 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2006-08-31 21:33 UTC
  To: Sergio Callegari; +Cc: git



On Thu, 31 Aug 2006, Sergio Callegari wrote:
>
> Junio, can you please send me privately details about [*1*] so I can retrieve
> the pack also?

He already did, search for "members.cox.net" in your email archive (it's 
Message-ID: <7v7j0qihwl.fsf@assigned-by-dhcp.cox.net> to be precise).

> I also have another question... (maybe it was answered in some previous thread
> on this list, in this case a pointer would be enough).
> Now I am going to have the fixed archive and also a new archive, which I
> restarted from the latest working copy I had of my project.
> Is there any way to automatically do real "surgery" to attach one to the other
> and get a single archive with all the history?

Yes. This is just what a "grafts" file is for.

Put the old pack/idx files into the .git/objects/pack directory, and then 
you can create "fake parenthood" information in a ".git/info/grafts" file 
by just adding text-lines of the format "<sha1> <fakeparentsha1>" (with 
each SHA being the regular 40-character hex representation).
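
A small sketch of producing such a line, assuming Python 3. The helper name and the two ids are placeholders, not real commits; the result would be appended to .git/info/grafts by hand.

```python
import re

HEX40 = re.compile(r"[0-9a-f]{40}")

def graft_line(commit, fake_parent):
    """Format one grafts entry: '<sha1> <fakeparentsha1>' plus newline."""
    for sha in (commit, fake_parent):
        if not HEX40.fullmatch(sha):
            raise ValueError("not a 40-character hex id: %r" % sha)
    return commit + " " + fake_parent + "\n"

# Placeholder ids: first commit of the new history, last commit of the old.
new_root = "1" * 40
old_tip = "2" * 40

print(graft_line(new_root, old_tip), end="")
```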

> Obviously, if I try to change a commit object to modify its parents, its
> signature changes, so I need to modify its children and so on, is this correct?
> Alternatively, I believe that grafts should be a way to go... I have never used
> them before; do all git tools support them? In particular, do they get pushed
> and pulled correctly?

Nope, they won't get pushed and pulled correctly, you need to put the 
grafts files in all repositories. Alternatively, you can re-create the 
whole history, I think cogito had some history re-writing tool.

> > So the _real_ difference is literally just the one byte at offset 0151000
> > (decimal 53760) which in the fixed pack is 0x96, and in the corrupt pack it
> > is 0x94. That's a single-bit difference (bit #1 has been cleared).
> 
> So, possibly, the alpha particle theory could be the plausible one in the
> end...

Yes. It's just that Junio's original theory required it to not just hit a 
memory cell, it also had to hit it at _just_ the right time in between 
being written and the SHA1 of the buffer being computed. So the original 
theory was very unlikely indeed.

My theory of the corruption just causing a re-computed SHA1 when repacking 
(and silently copying the corruption without realizing it) meant that 
there was no such small and unlikely window, but that any regular memory 
(or disk) corruption could easily have caused it at any time, and then a 
subsequent re-pack "fixed" the SHA1 to match the corruption..

> The bad thing is that I don't know which of my two machines (the laptop or the
> desktop) caused the issue!

I'd suggest running memtest86 for a few days on both (not necessarily at 
the same time - keep one working machine to do your job on ;)

> > Finally, this also points out that the corrupted packs _can_ be fixed, but I
> > think Sergio was a bit lucky (to offset all the bad luck). Sergio still had
> > access to the original file that had had its object corrupted. 
>
> Actually, this might not be such a rare case... In my tree I had the
> development of some LaTeX documents and packages (code-like, the really
> "precious" files) and a few binary objects (images and OpenOffice files
> mainly, far less precious).

Sure. In your case you had checked in generated files too, and yes, they 
were the larger ones. That's not true in general - in many other projects, 
the _directory_ structure (ie the git "tree" objects) will be a large 
portion of the project, and probably more likely to be corrupt. Now, to 
some degree the tree objects are likely the ones easiest to "repair" 
(because you can try to look at the history and figure things out by 
hand), but at the same time, people also tend to have deeper delta-chains 
and it would just be _very_ painful.

So I do think you were somewhat lucky.
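
To put a rough number on it, here is a back-of-the-envelope sketch in Python 3. It uses only figures from the earlier analysis in this thread (the final od offset of the repacked pack, and the git-show-index offsets around the corrupt object), and it assumes a single flipped bit is equally likely to land anywhere in the pack file, which is a simplification.

```python
pack_size = int("30054142", 8)     # final od offset = size of the pack in bytes
object_span = 135320 - 13830       # bytes occupied by the corrupt object

# Chance that a uniformly placed single-bit error hits this one object.
fraction = object_span / pack_size

print(pack_size, object_span, round(fraction, 4))
```

So under that assumption the one large image object alone accounted for roughly 2% of the places the bit could have landed.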

> Finally, having a command to create an object out of a single file (the
> opposite of git cat-file) could help re-create the missing objects...

Hmm. Like "git-hash-object"?
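
For reference, the name that git-hash-object computes for a file's contents can be reproduced in a few lines. This is a sketch in Python 3 and `blob_id` is a hypothetical helper; it only computes the id and does not write anything into the object database.

```python
import hashlib

def blob_id(data):
    """SHA-1 over 'blob <size>\\0' followed by the file contents."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

print(blob_id(b""))   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If the id computed from a file in the working tree matches the name of the missing or corrupt object, that file is the original content.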

			Linus
