From: Nicolas Pitre <nico@cam.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, git@vger.kernel.org
Subject: [PATCH] add a howto document about corrupted blob recovery
Date: Fri, 09 Nov 2007 12:28:19 -0500 (EST)
Message-ID: <alpine.LFD.0.9999.0711091221210.21255@xanadu.home>
In-Reply-To: <alpine.LFD.0.999.0711090758560.15101@woody.linux-foundation.org>

Extracted from a post by Linus on the mailing list.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---

On Fri, 9 Nov 2007, Linus Torvalds wrote:

> But since you don't seem to have backups right now, the good news is that 
> especially with a single blob being corrupt, these things *are* somewhat 
> debuggable.

I was in the process of writing a similar message, but Linus was quicker 
and his version is actually much nicer.  Certainly good howto material.

diff --git a/Documentation/howto/recover-corrupted-blob-object.txt b/Documentation/howto/recover-corrupted-blob-object.txt
new file mode 100644
index 0000000..9b6853c
--- /dev/null
+++ b/Documentation/howto/recover-corrupted-blob-object.txt
@@ -0,0 +1,134 @@
+Date: Fri, 9 Nov 2007 08:28:38 -0800 (PST)
+From: Linus Torvalds <torvalds@linux-foundation.org>
+Subject: corrupt object on git-gc
+Abstract: Some tricks to reconstruct blob objects in order to fix
+ a corrupted repository.
+
+On Fri, 9 Nov 2007, Yossi Leybovich wrote:
+> 
+> Did not help still the repository look for this object?
+> Any one know how can I track this object and understand which file is it
+
+So exactly *because* the SHA1 hash is cryptographically secure, the hash
+itself doesn't actually tell you anything. In order to fix a corrupt
+object you basically have to find the "original source" for it.
+
+The easiest way to do that is almost always to have backups, and find the 
+same object somewhere else. Backups really are a good idea, and git makes 
+it pretty easy (if nothing else, just clone the repository somewhere else, 
+and make sure that you do *not* use a hard-linked clone, and preferably 
+not the same disk/machine).
+
+But since you don't seem to have backups right now, the good news is that 
+especially with a single blob being corrupt, these things *are* somewhat 
+debuggable.
+
+First off, move the corrupt object away, and *save* it. The most common 
+cause of corruption so far has been memory corruption, but even so, there 
+are people who would be interested in seeing the corruption - but it's 
+basically impossible to judge the corruption until we can also see the 
+original object, so right now the corrupt object is useless, but it's very 
+interesting for the future, in the hope that you can re-create a 
+non-corrupt version.
+
+So:
+
+> ib]$ mv .git/objects/4b/9458b3786228369c63936db65827de3cc06200 ../
+
+This is the right thing to do, although it's usually best to save it under
+its full SHA1 name (you just dropped the "4b" from the result ;).
+
+Let's see what that tells us:
+
+> ib]$ git-fsck --full
+> broken link from    tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
+>              to    blob 4b9458b3786228369c63936db65827de3cc06200
+> missing blob 4b9458b3786228369c63936db65827de3cc06200
+
+Ok, I removed the "dangling commit" messages, because they are just 
+messages about the fact that you probably have rebased etc, so they're not 
+at all interesting. But what remains is still very useful. In particular, 
+we now know which tree points to it!
+
+Now you can do
+
+	git ls-tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
+
+which will show something like
+
+	100644 blob 8d14531846b95bfa3564b58ccfb7913a034323b8    .gitignore
+	100644 blob ebf9bf84da0aab5ed944264a5db2a65fe3a3e883    .mailmap
+	100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c    COPYING
+	100644 blob ee909f2cc49e54f0799a4739d24c4cb9151ae453    CREDITS
+	040000 tree 0f5f709c17ad89e72bdbbef6ea221c69807009f6    Documentation
+	100644 blob 1570d248ad9237e4fa6e4d079336b9da62d9ba32    Kbuild
+	100644 blob 1c7c229a092665b11cd46a25dbd40feeb31661d9    MAINTAINERS
+	...
+
+and you should now have a line that looks like
+
+	100644 blob 4b9458b3786228369c63936db65827de3cc06200	my-magic-file
+
+in the output. This already tells you a *lot*: it tells you what file the
+corrupt blob came from!
+
+Now, it doesn't tell you quite enough, though: it doesn't tell you what
+*version* of the file didn't get correctly written! You might be really
+lucky, and it may be the version that you already have checked out in your
+working tree, in which case fixing this problem is really simple. Just do
+
+	git hash-object -w my-magic-file
+
+again, and if it outputs the missing SHA1 (4b945..) you're now all done!
+
+But that's the really lucky case, so let's assume that it was some older 
+version that was broken. How do you tell which version it was?
+
+The easiest way to do it is to do
+
+	git log --raw --all --full-history -- somedirectory/my-magic-file
+
+and that will show you the whole log for that file (please realize that 
+the tree you had may not be the top-level tree, so you need to figure out 
+which subdirectory it was in on your own), and because you're asking for 
+raw output, you'll now get something like
+
+	commit abc
+	Author:
+	Date:
+
+	  ..
+	:100644 100644 4b9458b... newsha... M	somedirectory/my-magic-file
+
+	commit xyz
+	Author:
+	Date:
+
+	  ..
+	:100644 100644 oldsha... 4b9458b... M	somedirectory/my-magic-file
+
+and this actually tells you what the *previous* and *subsequent* versions 
+of that file were! So now you can look at those ("oldsha" and "newsha" 
+respectively), and hopefully you have done commits often, and can 
+re-create the missing my-magic-file version by looking at those older and 
+newer versions!
+
+If you can do that, you can now recreate the missing object with
+
+	git hash-object -w <recreated-file>
+
+and your repository is good again!
+
+(Btw, you could have ignored the fsck, and started with doing a 
+
+	git log --raw --all
+
+and just looked for the sha of the missing object (4b9458b..) in that
+whole thing. It's up to you - git does *have* a lot of information, it is
+just missing one particular blob version.)
+
+Trying to recreate trees and especially commits is *much* harder. So you 
+were lucky that it's a blob. It's quite possible that you can recreate the 
+thing.
+
+			Linus
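
For anyone who wants to rehearse the procedure above before touching a
damaged repository, here is a rough sketch on a throwaway repository. It is
not part of the patch. The scratch directory, the file contents, and the
identity settings are all made up for the demonstration, and the
--no-abbrev flag is my addition so that grep can match full object names.
It only covers the lucky case where the lost blob is the version still
sitting in the working tree.

```shell
#!/bin/sh
# Rehearsal on a scratch repository; all names and contents are hypothetical.
set -e

demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Two versions of one file, so the history has something to show.
printf 'version 1\n' > my-magic-file
git add my-magic-file
git commit -q -m "first version"
printf 'version 2\n' > my-magic-file
git commit -q -a -m "second version"

# Simulate the damage: move the loose object of the current version away,
# saving it under its full SHA1 name.
sha=$(git rev-parse HEAD:my-magic-file)
prefix=$(printf '%s' "$sha" | cut -c1-2)
suffix=$(printf '%s' "$sha" | cut -c3-)
mv ".git/objects/$prefix/$suffix" "$demo/corrupt-$sha"

# fsck exits non-zero while the blob is missing, hence the "|| true".
git fsck --full || true

# The tree still records which path the blob belonged to.
git ls-tree HEAD | grep "$sha"

# So does the history; --no-abbrev prints full object names for grep.
git log --raw --all --full-history --no-abbrev -- my-magic-file | grep "$sha"

# Lucky case: the working tree still holds that exact version.
restored=$(git hash-object -w my-magic-file)
[ "$restored" = "$sha" ] && echo "restored $sha"

# With the object back in place, fsck should pass again.
git fsck --full
```

If the lost blob were the older version instead, the last step would have
to hash a hand-reconstructed file rather than the checked-out one, which is
where the "oldsha" and "newsha" neighbours from the raw log come in.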
