git.vger.kernel.org archive mirror
* fsck option to remove corrupt objects - why/why not?
       [not found] <20141015234637.9B4FC781EFB@mail110.syd.optusnet.com.au>
@ 2014-10-16  0:13 ` Ben Aveling
  2014-10-16  9:04   ` Johan Herland
  2014-10-16 11:59   ` Matthieu Moy
  0 siblings, 2 replies; 6+ messages in thread
From: Ben Aveling @ 2014-10-16  0:13 UTC
  To: git

On 14/10/2014 19:21, Jeff King wrote:
> On Mon, Oct 13, 2014 at 09:37:27AM +1100, Ben Aveling wrote:
>> A question about fsck - is there a reason it doesn't have an option to
>> delete bad objects?
> If the objects are reachable, then deleting them would create other big
> problems (i.e., we would be breaking the object graph!).

The man page for fsck advises:

    "Any corrupt objects you will have to find in backups or other
    archives (i.e., you can just remove them and do an /rsync/ with some
    other site in the hopes that somebody else has the object you have
    corrupted)."


And that seems sensible to me - the object is corrupt, it is unusable, 
the object graph is already broken, we already have big problems, 
removing the corrupt object(s) doesn't create any new problems, and it 
allows the possibility that the damaged objects can be restored.

I ask because I have a corrupt repository, and every time I run fsck, it 
reports one corrupt object, then stops. I could write a script to 
repeatedly call fsck and then remove the next corrupt object, but it 
raises the question for me; could it make sense to extend fsck with the 
option to do the removes? Or even better, do the removes and then do 
the necessary [r]sync, assuming the user has another repository that has 
a good copy of the bad objects, which in this case I do.

Regards, Ben


* Re: fsck option to remove corrupt objects - why/why not?
  2014-10-16  0:13 ` fsck option to remove corrupt objects - why/why not? Ben Aveling
@ 2014-10-16  9:04   ` Johan Herland
  2014-10-16 12:25     ` Jeff King
  2014-10-16 16:36     ` Junio C Hamano
  2014-10-16 11:59   ` Matthieu Moy
  1 sibling, 2 replies; 6+ messages in thread
From: Johan Herland @ 2014-10-16  9:04 UTC
  To: Ben Aveling; +Cc: Git mailing list

On Thu, Oct 16, 2014 at 2:13 AM, Ben Aveling <bena.001@optusnet.com.au> wrote:
> On 14/10/2014 19:21, Jeff King wrote:
>> On Mon, Oct 13, 2014 at 09:37:27AM +1100, Ben Aveling wrote:
>>> A question about fsck - is there a reason it doesn't have an option to
>>> delete bad objects?
>>
>> If the objects are reachable, then deleting them would create other big
>> problems (i.e., we would be breaking the object graph!).
>
>
> The man page for fsck advises:
>
>    "Any corrupt objects you will have to find in backups or other
>    archives (i.e., you can just remove them and do an /rsync/ with some
>    other site in the hopes that somebody else has the object you have
>    corrupted)."
>
>
> And that seems sensible to me - the object is corrupt, it is unusable, the
> object graph is already broken, we already have big problems, removing the
> corrupt object(s) doesn't create any new problems, and it allows the
> possibility that the damaged objects can be restored.
>
> I ask because I have a corrupt repository, and every time I run fsck, it
> reports one corrupt object, then stops. I could write a script to repeatedly
> call fsck and then remove the next corrupt object, but it raises the
> question for me; could it make sense to extend fsck with the option to do
> the removes?

I am in favor of this idea. Yesterday a colleague of mine came to me
with a repo containing a single corrupt object (in a 1.2GB packfile).
We were lucky, since we had a copy of the repo with a good copy of the
same object. However, we were lucky in a couple of other respects, as
well:

I simply copied the packfile containing the good copy into the
corrupted repo, and then ran a "git gc", which "happened" to use the
good copy of the corrupted object and complete successfully (instead
of barfing on the bad copy). The GC then removed the old
(now-obsolete) packfiles, and thus the corruption was gone.

However, exactly _why_ git happened to prefer the good copy in my
copied packfile instead of the bad copy in the existing packfile, I do
not know. I suspect some amount of pure luck was involved. Indeed, I
feared I would have to explode the corrupt pack, then manually replace
the (now-loose) bad copy with a good copy (from a similarly exploded
pristine pack), and then finally repack everything again. That said,
I'm not at all sure that Git would be able to successfully explode a
pack containing corrupt objects...
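
For what it is worth, a pack can be exploded on a best-effort basis. The
following is only a sketch: "git unpack-objects -r" asks Git to keep going
past corrupt entries, and because unpack-objects skips objects that already
exist in the repository, the pack has to be moved out of objects/pack
first. The function names and the holding directory are hypothetical.

```shell
# Move a pack and its index out of objects/pack so that unpack-objects
# does not skip its objects as "already present".
# Prints the new location of the .pack file.
set_aside_pack() {
    pack=$1      # path to a pack-*.pack file inside objects/pack
    holding=$2   # hypothetical holding directory outside the object store
    mkdir -p "$holding" || return 1
    mv "$pack" "$holding/" || return 1
    if [ -f "${pack%.pack}.idx" ]; then
        mv "${pack%.pack}.idx" "$holding/" || return 1
    fi
    printf '%s/%s\n' "$holding" "$(basename "$pack")"
}

# Explode the set-aside pack into loose objects. With -r, unpack-objects
# makes a best effort to continue past corrupt entries instead of dying
# at the first one.
explode_pack() {
    gitdir=$1    # git directory of the damaged repo
    pack=$2      # path printed by set_aside_pack
    git --git-dir="$gitdir" unpack-objects -r < "$pack"
}
```

Something like explode_pack .git "$(set_aside_pack .git/objects/pack/pack-XXXX.pack /tmp/held)"
would then leave whatever could be recovered as loose objects; the pack
name and /tmp/held are placeholders.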

I think a better solution would be to tell fsck to remove the corrupt
object(s), as you suggest above, and then copy in the good pack. In
that case, there would be no question that the good copy would be used
in the subsequent GC.

> Or even better, do the removes and then do the necessary
> [r]sync, assuming the user has another repository that has a good copy of
> the bad objects, which in this case I do.

Hmm. I am not sure we want to automate the syncing step. First, git
cannot know _which_ remote is likely to have a good copy of the bad
object. Second, we do not necessarily know what caused the corruption
in the first place, and whether syncing with a remote (which will
create a certain amount of write activity on a possibly dying disk
drive) is a good idea at all. Finally, this syncing step would have to
bypass Git's usual reachability analysis (which easily skips fetching
a corrupt blob from otherwise-reachable history), so it is more involved
than simply calling out to "git fetch"...


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: fsck option to remove corrupt objects - why/why not?
  2014-10-16  0:13 ` fsck option to remove corrupt objects - why/why not? Ben Aveling
  2014-10-16  9:04   ` Johan Herland
@ 2014-10-16 11:59   ` Matthieu Moy
  1 sibling, 0 replies; 6+ messages in thread
From: Matthieu Moy @ 2014-10-16 11:59 UTC
  To: Ben Aveling; +Cc: git

Ben Aveling <bena.001@optusnet.com.au> writes:

> And that seems sensible to me - the object is corrupt, it is unusable,
> the object graph is already broken, we already have big problems,
> removing the corrupt object(s) doesn't create any new problems, and it
> allows the possibility that the damaged objects can be restored.

Removing them completely may remove any chance to restore the corrupt object
(rather unlikely, but I can imagine fine binary file surgery to un-break
a broken object file).

But we could move them out of Git's object directory (a bit like
.git/lost-found, we could have .git/corrupt). For unpacked objects, it's
trivial (just mv them into that directory). For packed objects, I don't
know what happens in case they are corrupt. That would solve essentially
any problem that you can solve by removing the file, but it makes the
operation reversible.
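
For the loose-object case, a minimal sketch of what I mean could look like
this. It assumes the usual loose object layout (objects/<first two hex
digits>/<remaining 38>) and a 40-hex object name; ".git/corrupt" and both
function names are made up for illustration, Git knows nothing about them.

```shell
# Print the path of the loose object file for a 40-hex object name,
# inside the given git directory (default: .git).
loose_object_path() {
    oid=$1
    gitdir=${2:-.git}
    dir=$(printf '%s' "$oid" | cut -c1-2)
    file=$(printf '%s' "$oid" | cut -c3-)
    printf '%s/objects/%s/%s\n' "$gitdir" "$dir" "$file"
}

# Move a loose object out of the object store instead of deleting it,
# so the operation can be undone by moving the file back.
quarantine_loose_object() {
    oid=$1
    gitdir=${2:-.git}
    src=$(loose_object_path "$oid" "$gitdir")
    if [ ! -f "$src" ]; then
        echo "no loose object file for $oid (packed or missing?)" >&2
        return 1
    fi
    # "corrupt" is a hypothetical directory name, by analogy with lost-found.
    mkdir -p "$gitdir/corrupt" &&
    mv "$src" "$gitdir/corrupt/$oid"
}
```

Undoing it is a matter of moving the file from .git/corrupt back to the
path that loose_object_path prints.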

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: fsck option to remove corrupt objects - why/why not?
  2014-10-16  9:04   ` Johan Herland
@ 2014-10-16 12:25     ` Jeff King
  2014-10-16 12:48       ` Johan Herland
  2014-10-16 16:36     ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff King @ 2014-10-16 12:25 UTC
  To: Johan Herland; +Cc: Ben Aveling, Git mailing list

On Thu, Oct 16, 2014 at 11:04:04AM +0200, Johan Herland wrote:

> I simply copied the packfile containing the good copy into the
> corrupted repo, and then ran a "git gc", which "happened" to use the
> good copy of the corrupted object and complete successfully (instead
> of barfing on the bad copy). The GC then removed the old
> (now-obsolete) packfiles, and thus the corruption was gone.
> 
> However, exactly _why_ git happened to prefer the good copy in my
> copied packfile instead of the bad copy in the existing packfile, I do
> not know. I suspect some amount of pure luck was involved.

I'm not sure that it is luck, but more like 8eca0b4 (implement some
resilience against pack corruptions, 2008-06-23) working as intended[1].
Generally, git should be able to warn about corrupted objects and look
in other packs for them (both for regular operations, and for
repacking).

-Peff

[1] That's just one of the many commits dealing with this. Try running
    "git log --author=Nicolas.Pitre --grep=corrupt" for more. :)


* Re: fsck option to remove corrupt objects - why/why not?
  2014-10-16 12:25     ` Jeff King
@ 2014-10-16 12:48       ` Johan Herland
  0 siblings, 0 replies; 6+ messages in thread
From: Johan Herland @ 2014-10-16 12:48 UTC
  To: Jeff King; +Cc: Ben Aveling, Git mailing list

On Thu, Oct 16, 2014 at 2:25 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Oct 16, 2014 at 11:04:04AM +0200, Johan Herland wrote:
>> I simply copied the packfile containing the good copy into the
>> corrupted repo, and then ran a "git gc", which "happened" to use the
>> good copy of the corrupted object and complete successfully (instead
>> of barfing on the bad copy). The GC then removed the old
>> (now-obsolete) packfiles, and thus the corruption was gone.
>>
>> However, exactly _why_ git happened to prefer the good copy in my
>> copied packfile instead of the bad copy in the existing packfile, I do
>> not know. I suspect some amount of pure luck was involved.
>
> I'm not sure that it is luck, but more like 8eca0b4 (implement some
> resilience against pack corruptions, 2008-06-23) working as intended[1].
> Generally, git should be able to warn about corrupted objects and look
> in other packs for them (both for regular operations, and for
> repacking).
>
> -Peff
>
> [1] That's just one of the many commits dealing with this. Try running
>     "git log --author=Nicolas.Pitre --grep=corrupt" for more. :)

Indeed, from reading the logs, it seems that what I assumed was a lucky
strike was actually carefully designed behavior. With that in mind,
I'm no longer so sure that fsck actually needs an option to remove
corrupt objects. Instead, it's probably better to leave the corrupt
object in place until a good copy can be located and copied into the
repo, at which point Nicolas' brilliant work will make sure a simple
repack takes care of fixing the corruption.

That said, we should consider documenting this strategy for fixing corruptions:
 - Locate a good copy of the affected objects in another repo
 - Copy relevant pack file or loose object into this repo
 - Run "git gc"
 - Profit!
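
As a rough sketch of the steps above for the packfile case (the function
names are hypothetical, and this assumes the good pack sits next to its
.idx file in the other repo):

```shell
# Copy one known-good pack and its index into the damaged repository.
import_good_pack() {
    good_pack=$1    # path to a pack-*.pack file in the good repo
    bad_gitdir=$2   # git directory of the damaged repo
    dest=$bad_gitdir/objects/pack
    name=$(basename "$good_pack")
    if [ -e "$dest/$name" ]; then
        echo "$dest/$name already exists, not overwriting" >&2
        return 1
    fi
    # Copy the pack before its index, so the index never refers to a
    # pack file that is not there yet.
    mkdir -p "$dest" &&
    cp "$good_pack" "$dest/" &&
    cp "${good_pack%.pack}.idx" "$dest/"
}

# Repack so that the good copies end up in the new pack, then verify.
repack_and_verify() {
    bad_gitdir=$1
    git --git-dir="$bad_gitdir" gc &&
    git --git-dir="$bad_gitdir" fsck --full
}
```

Running import_good_pack followed by repack_and_verify on the damaged repo
is the whole procedure; if fsck still complains afterwards, the copied pack
did not contain every affected object.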

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: fsck option to remove corrupt objects - why/why not?
  2014-10-16  9:04   ` Johan Herland
  2014-10-16 12:25     ` Jeff King
@ 2014-10-16 16:36     ` Junio C Hamano
  1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2014-10-16 16:36 UTC
  To: Johan Herland; +Cc: Ben Aveling, Git mailing list

Johan Herland <johan@herland.net> writes:

> I simply copied the packfile containing the good copy into the
> corrupted repo, and then ran a "git gc", which "happened" to use the
> good copy of the corrupted object and complete successfully (instead
> of barfing on the bad copy). The GC then removed the old
> (now-obsolete) packfiles, and thus the corruption was gone.
>
> However, exactly _why_ git happened to prefer the good copy in my
> copied packfile instead of the bad copy in the existing packfile, I do
> not know.

By design ;-)


end of thread, other threads:[~2014-10-16 16:36 UTC | newest]

Thread overview: 6+ messages
     [not found] <20141015234637.9B4FC781EFB@mail110.syd.optusnet.com.au>
2014-10-16  0:13 ` fsck option to remove corrupt objects - why/why not? Ben Aveling
2014-10-16  9:04   ` Johan Herland
2014-10-16 12:25     ` Jeff King
2014-10-16 12:48       ` Johan Herland
2014-10-16 16:36     ` Junio C Hamano
2014-10-16 11:59   ` Matthieu Moy
