Git development
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "R. Tyler Ballance" <tyler@slide.com>
Cc: "Nicolas Pitre" <nico@cam.org>, "Jan Krüger" <jk@jk.gs>,
	"Git ML" <git@vger.kernel.org>
Subject: Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
Date: Tue, 6 Jan 2009 20:54:06 -0800 (PST)
Message-ID: <alpine.LFD.2.00.0901062026500.3057@localhost.localdomain>
In-Reply-To: <1231292360.8870.61.camel@starfruit>



On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> 
> I'll back the patch out and redeploy, it's worth mentioning that a
> coworker of mine just got the issue as well (on 1.6.1). He was able to
> `git pull` and the error went away, but I doubt that it "magically fixed
> itself"

Quite frankly, that behaviour sounds like a disk _cache_ corruption issue. 
The fact that some corruption "comes and goes" and sometimes magically 
heals itself sounds very much like some disk cache problem, and then that 
particular part of the cache gets replaced and then when re-populated it 
is magically correct.

We had that in one case with a Linux NFS client, where a rename across 
directories caused problems.

This was a networked filesystem on OS X, right? File caching is much more 
"interesting" in networked filesystems than it is in normal private 
on-disk ones.

> I've tarred one of the repositories that had it in a reproducible state
> so I can create a build and extract the tar and run against that to
> verify any patches anybody might have, but unfortunately at 7GB of
> company code and assets, I can't exactly share ;)

The thing to do is

 - untar it on some trusted machine with a local disk and a known-good 
   filesystem.

   IOW, not that networked samba share.

 - verify that it really does happen on that machine, with that untarred 
   image. Because maybe it doesn't. 

   The hope is that you caught the corruption in the cache, and it 
   actually got written out to the tar-file. But if it _is_ a disk cache 
   (well, network cache) issue, maybe the IO required to tar everything up 
   was enough to flush it, and the tar-file actually _works_ because it 
   got repopulated correctly.

   So that's why you should double-check that it really ends up being 
   corrupt after being untarred again.

 - go back and test the original git repo on the network share, preferably 
   on another client. See if the error has gone away.

 - If so, try to compare that known-corrupt filesystem with the original 
   one, and preferably do this on another machine over the network mount. 

   See if they differ. They obviously should *not* differ, since it's a 
   tar/untar of the same files, but ...
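
The comparison in the last step could be scripted roughly as follows. This
is only a sketch: the helper names (hash_file, hash_tree, compare_trees) and
both directory paths are made up for illustration, and SHA-1 is used simply
as a content fingerprint, not because git itself is involved.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of one file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each regular file's path (relative to root) to its digest."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents matter here.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            hashes[os.path.relpath(full, root)] = hash_file(full)
    return hashes


def compare_trees(original_root, untarred_root):
    """Return (only_in_original, only_in_untarred, differing) path lists."""
    original = hash_tree(original_root)
    untarred = hash_tree(untarred_root)
    only_original = sorted(set(original) - set(untarred))
    only_untarred = sorted(set(untarred) - set(original))
    differing = sorted(
        path
        for path in set(original) & set(untarred)
        if original[path] != untarred[path]
    )
    return only_original, only_untarred, differing


if __name__ == "__main__":
    # Hypothetical paths: the network mount and the local untarred copy.
    missing, extra, changed = compare_trees("/mnt/share/repo", "/tmp/untarred/repo")
    for label, paths in (("missing", missing), ("extra", extra), ("changed", changed)):
        for path in paths:
            print(label, path)
```

Any path reported as changed is a file whose bytes differ between the two
copies, which is exactly what should never happen after a plain tar/untar.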

The fact that you seem to get a _lot_ of these errors really does make it 
sound like something in your environment. It's actually really hard to get 
git to corrupt anything. Especially objects that got packed. They've been 
quiescent for a long time, they got repacked in a very simple way, they 
are totally read-only.

But it is _not_ hard to corrupt network filesystems. It's downright 
trivial with some of them, especially with some hardware (e.g. there's no 
end-to-end checksumming except for the _extremely_ weak 16-bit IP csum, 
and even that has been known to be disabled, or screwed up by ethernet 
cards that do IP packet offloading and thus compute the csum not on the 
data that the user actually wrote, but on the data that the card received, 
which is not necessarily at all the same thing).

And while ethernet uses a stronger CRC, that one is not end-to-end, so 
corruption on the card or in a switch in between easily defeats that too. 
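
To illustrate how weak a 16-bit ones'-complement checksum is, here is a
small sketch. It assumes the RFC 1071 style Internet checksum; the function
names and the sample payload are invented for the example. Because the sum
is order-independent, swapping two 16-bit words goes unnoticed, while a
CRC-32 over the same bytes changes.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def swap_first_two_words(data: bytes) -> bytes:
    """Exchange the first two 16-bit words, a simple reordering corruption."""
    return data[2:4] + data[0:2] + data[4:]


if __name__ == "__main__":
    payload = b"git pack data!!!"  # made-up sample bytes
    corrupted = swap_first_two_words(payload)
    print("payload changed:      ", payload != corrupted)
    print("16-bit csum unchanged:",
          internet_checksum(payload) == internet_checksum(corrupted))
    print("CRC-32 changed:       ",
          zlib.crc32(payload) != zlib.crc32(corrupted))
```

The CRC only helps on the links where it is actually checked, though, which
is the point about it not being end-to-end.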

Just google for something like

	"OS X" SMB "file corruption"

and you'll find quite a few hits. Not all that unusual.

				Linus
