From: Linus Torvalds <torvalds@osdl.org>
To: Alex Riesen <raa.lkml@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: bug: git-repack -a -d produces broken pack on NFS
Date: Thu, 27 Apr 2006 16:54:34 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0604271630030.3701@g5.osdl.org>
In-Reply-To: <20060427213207.GA6709@steel.home>


Ok, trying to think some more about this..

On Thu, 27 Apr 2006, Alex Riesen wrote:
> 
> $SRC/linux.git$ git repack -a -d
> Generating pack...
> Done counting 235947 objects.
> Deltifying 235947 objects.
>  100% (235947/235947) done
> Writing 235947 objects.
>  100% (235947/235947) done
> Total 235947, written 235947 (delta 182131), reused 235466 (delta 181650)
> Pack pack-6dcda5a7782864d57ec44bd30ebec13b07df2c87 created.
> $SRC/linux.git$ git fsck-objects --full
> git-fsck-objects: error: Packfile .git/objects/pack/pack-6dcda5a7782864d57ec44bd30ebec13b07df2c87.pack SHA1 mismatch with idx

This is interesting on so many levels.

First off, the index file or the pack-file is clearly somehow corrupt, 
because when you then try to do the "git clone" off the result later on 
(which won't actually check the SHA1's), it gets

> git-index-pack: fatal: packfile '/mnt/large/tmp/raa/tmp/.git/objects/pack/tmp-wcRvk5': bad object at offset 102601801: inflate returned -3

which means that either the offset was wrong, or the data at that offset 
was wrong.
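
For reference, -3 is zlib's Z_DATA_ERROR. The sketch below is not git code: the payload and the helper name `try_inflate` are made up for illustration. It only shows that inflating from the wrong offset and inflating damaged data both end in the same error, which is why the message alone cannot tell the two cases apart.

```python
import zlib

def try_inflate(buf):
    """Return None on success, or the zlib error text on failure."""
    try:
        zlib.decompress(buf)
        return None
    except zlib.error as exc:
        return str(exc)

# Hypothetical stand-in for one deflated object inside a pack.
payload = b"hypothetical object contents\n" * 64
good = zlib.compress(payload)

# Right data, wrong offset: start one byte into the stream.
wrong_offset = try_inflate(good[1:])

# Right offset, wrong data: clobber the first header byte.
wrong_data = try_inflate(b"\x00" + good[1:])

print(try_inflate(good))
print(wrong_offset)
print(wrong_data)
```

Both failing calls report zlib error -3; only the intact stream inflates.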

That made me suspect the object re-use code - it might have been broken in 
the original pack, and then on re-use the broken data would have been just 
copied over.

HOWEVER - that doesn't actually fly as an explanation, because even if the 
data itself was broken, the repack would have re-generated the SHA1, so if 
the problem had been about copying an already broken pack over, you'd have 
gotten the "git clone" error, but you would _not_ have gotten the "pack 
SHA1 does not match index" error.

So in order for the SHA1 to not match, we literally must have corrupted 
things when we created the pack-file.
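
A rough sketch of that argument, assuming the usual layout where a pack ends with the 20-byte SHA1 of everything before it and the idx records that same pack SHA1 just before its own trailing checksum. The byte strings and helper names (`make_pack`, `make_idx`, `checksums_agree`) are invented for illustration, and this does not reproduce git's actual checks.

```python
import hashlib

SHA1_LEN = 20

def make_pack(body):
    """Append the SHA1 of the body, as the pack writer does at the end."""
    return body + hashlib.sha1(body).digest()

def make_idx(entries, pack):
    """Fake idx: entries, then the pack's SHA1, then the idx's own SHA1."""
    head = entries + pack[-SHA1_LEN:]
    return head + hashlib.sha1(head).digest()

def checksums_agree(pack, idx):
    """Recomputed pack SHA1, pack trailer, and idx-recorded SHA1 all match."""
    recomputed = hashlib.sha1(pack[:-SHA1_LEN]).digest()
    trailer = pack[-SHA1_LEN:]
    recorded = idx[-2 * SHA1_LEN:-SHA1_LEN]
    return recomputed == trailer == recorded

# Case 1: broken object data copied from an old pack. The checksum is
# regenerated over whatever bytes were written, so everything agrees.
reused = make_pack(b"PACK" + b"already-broken object bytes")
idx = make_idx(b"fake index entries", reused)
print(checksums_agree(reused, idx))

# Case 2: bytes damaged after the checksum was computed.
damaged = bytearray(reused)
damaged[10] ^= 0xFF
print(checksums_agree(bytes(damaged), idx))
```

Only the second case produces a checksum mismatch, which is the point: re-used garbage gets a valid checksum, so a mismatch needs corruption at or after write time.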

However, I've stared and stared at the sha1file writing code, and I don't 
see how you _could_ corrupt it. We use it with interruptible file 
descriptors all the time (sockets - the exact same code is used to 
transfer packs over the network), and that "intr" shouldn't matter one 
whit. We're doing very safe things, as far as I can tell.

The thing is, even if a wild pointer corrupts the write buffer for the 
sha1file writing code somehow, we actually always do the "calculate the 
SHA1" and "flush the buffer to the file" together. So even if somebody 
corrupted the buffer, we'd still generate the "right" SHA1 (of the 
corrupted buffer).
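
To make that concrete, here is a minimal sketch of a writer that hashes and flushes the same bytes in one step. It is a Python stand-in, not the real sha1file code; the class name `HashingWriter` and its methods are hypothetical.

```python
import hashlib
import io

class HashingWriter:
    """Buffer writes; on flush, hash and write the very same bytes."""

    def __init__(self, out):
        self.out = out
        self.ctx = hashlib.sha1()
        self.buf = bytearray()

    def write(self, data):
        self.buf += data

    def flush(self):
        chunk = bytes(self.buf)
        self.ctx.update(chunk)   # checksum ...
        self.out.write(chunk)    # ... and output see identical bytes
        self.buf.clear()

    def close(self):
        self.flush()
        digest = self.ctx.digest()
        self.out.write(digest)   # trailing SHA1
        return digest

out = io.BytesIO()
w = HashingWriter(out)
w.write(b"object data that is about to be trashed")
# Simulate a wild pointer scribbling over the pending buffer.
w.buf[0:6] = b"\xde\xad\xbe\xef\x00\x00"
w.close()

written = out.getvalue()
consistent = hashlib.sha1(written[:-20]).digest() == written[-20:]
print(consistent)
```

The file content is wrong, but the trailer still matches it, because the scribble happened before the hash-and-write step.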

So the only thing that I can see that can generate bad SHA1 checksums is
 - actual problem in the SHA1 buffers themselves (ie a wild pointer 
   corrupting the "SHA1_CTX" thing itself)
 - real filesystem corruption. With NFS, the UDP checksums aren't all that 
   strong, but the ethernet CRC should catch things (there have been 
   reports of network cards that don't check the CRC well, but quite 
   frankly, I haven't seen one in a _loong_ time)
 - RAM corruption and/or kernel NFS bugs.
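
As a small illustration of the checksum-strength point: the 16-bit ones'-complement sum that UDP uses does not care about the order of 16-bit words, while a CRC-32 of the kind ethernet uses does. The sketch sums only a made-up payload (real UDP also covers a pseudo-header), and the sample bytes are arbitrary.

```python
import zlib

def internet_checksum(data):
    """16-bit ones'-complement sum over the data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = b"\x12\x34\x56\x78\x9a\xbc\xde\xf0"
# Swap the first two 16-bit words, as a misordered copy might.
swapped = original[2:4] + original[0:2] + original[4:]

print(internet_checksum(original) == internet_checksum(swapped))
print(zlib.crc32(original) == zlib.crc32(swapped))
```

The ones'-complement sum is identical for both buffers, while the CRC-32 values differ.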

I'll continue to stare at the code, but I can't see anything even remotely 
suspicious in git itself so far.

		Linus
