From: "Shawn O. Pearce" <spearce@spearce.org>
To: "Jan Krüger" <jk@jk.gs>
Cc: Git ML <git@vger.kernel.org>, tyler@slide.com
Subject: Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
Date: Tue, 9 Dec 2008 08:24:02 -0800
Message-ID: <20081209162402.GP31551@spearce.org>
In-Reply-To: <20081209093627.77039a1f@perceptron>
Jan Krüger <jk@jk.gs> wrote:
> For fixing a corrupted repository by using backup copies of individual
> files, allow write_sha1_file() to write loose files even if the object
> already exists in a pack file, but only if the existing entry is marked
> as corrupted.
Huh. So I'm digging around sha1_file.c and I'm not yet sure why
your patch makes a difference.
has_sha1_file() calls find_pack_entry() to determine which pack has
the object, and at what offset (if found). It doesn't care about
the offset, but it does care about the successful match.
find_pack_entry() already considers the bad_object_sha1 for each
pack, before it even tries the binary search within the index.
So if the entry was known to be bad, has_sha1_file() will return 0,
unless the object is loose.
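
To make that lookup order concrete, here is a rough sketch. It is not
the code in sha1_file.c: every toy_* name, the fixed-size bad table and
the has_loose flag are made up for illustration, and the real
structures differ. It only shows the shape: check the per-pack bad
list first, binary search the index second, fall back to loose last.

```c
#include <assert.h>
#include <string.h>

#define TOY_HASH_LEN 20
#define TOY_MAX_BAD 8

/* Hypothetical, simplified stand-in for a pack; not Git's real struct. */
struct toy_pack {
	const unsigned char (*index)[TOY_HASH_LEN]; /* sorted object ids */
	int nr;                                     /* entries in index */
	unsigned char bad[TOY_MAX_BAD][TOY_HASH_LEN]; /* transient bad list */
	int nr_bad;
};

/* Remember an id as bad for this pack (assumed to follow a failed unpack). */
void toy_mark_bad(struct toy_pack *p, const unsigned char *id)
{
	assert(p->nr_bad < TOY_MAX_BAD);
	memcpy(p->bad[p->nr_bad++], id, TOY_HASH_LEN);
}

int toy_is_known_bad(const struct toy_pack *p, const unsigned char *id)
{
	int i;

	for (i = 0; i < p->nr_bad; i++)
		if (!memcmp(p->bad[i], id, TOY_HASH_LEN))
			return 1;
	return 0;
}

/* Return 1 if some pack lists the id and has not flagged it as bad. */
int toy_find_pack_entry(const struct toy_pack *packs, int nr_packs,
			const unsigned char *id)
{
	int i;

	for (i = 0; i < nr_packs; i++) {
		const struct toy_pack *p = &packs[i];
		int lo = 0, hi = p->nr;

		if (toy_is_known_bad(p, id))
			continue; /* skipped before the binary search */
		while (lo < hi) {
			int mid = lo + (hi - lo) / 2;
			int cmp = memcmp(p->index[mid], id, TOY_HASH_LEN);

			if (!cmp)
				return 1;
			if (cmp < 0)
				lo = mid + 1;
			else
				hi = mid;
		}
	}
	return 0;
}

/* has_loose stands in for the loose-object check. */
int toy_has_object(const struct toy_pack *packs, int nr_packs,
		   int has_loose, const unsigned char *id)
{
	return toy_find_pack_entry(packs, nr_packs, id) || has_loose;
}
```

With a single pack and no loose copy, toy_has_object() reports the id
as present until toy_mark_bad() has been called for it, and as absent
afterwards.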
Where this breaks down is when the object is being created: it's
very likely we didn't attempt to read it in this process.
The bad_object_sha1 table is transient and populated only when
unpacking an object entry fails. So for example during a merge
if a tree was stored in a pack and is corrupt and the merge
result produces that same tree object we won't write it out with
write_sha1_file() because it exists in a pack, but since we never
read it we also don't know the pack entry is corrupt.
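
That failure mode can be modeled in a few lines. Again this is only a
sketch under assumptions: the struct and the toy_* functions below are
hypothetical and track a single object with four flags, nothing like
the real object database. The point is just that a write is skipped
unless a failed read in the same process has already flagged the
packed copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-object model; none of these names exist in Git. */
struct toy_object_state {
	bool in_pack;           /* a pack index lists the object */
	bool pack_data_corrupt; /* its packed data would fail to unpack */
	bool known_bad;         /* transient flag, set only by a failed read */
	bool loose;             /* a loose copy exists */
};

/* Reading the packed copy is the only step that discovers corruption. */
bool toy_read_packed(struct toy_object_state *o)
{
	assert(o);
	if (!o->in_pack || o->known_bad)
		return false;
	if (o->pack_data_corrupt) {
		o->known_bad = true; /* remembered for this process only */
		return false;
	}
	return true;
}

/* Existence check: a packed copy counts unless it is already flagged. */
bool toy_has_object(const struct toy_object_state *o)
{
	assert(o);
	return (o->in_pack && !o->known_bad) || o->loose;
}

/* Returns true if a loose copy was written, false if the write was skipped. */
bool toy_write_object(struct toy_object_state *o)
{
	if (toy_has_object(o))
		return false; /* "already have it", so nothing is written */
	o->loose = true;
	return true;
}
```

For a corrupt packed tree that was never read, toy_write_object()
skips the write. Only after toy_read_packed() has failed once does a
later toy_write_object() produce the loose copy.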
It's horribly inefficient to read every object before we write it
back out. The best thing to do when faced with corruption is to
stop and repack, overlaying the object database from a known good
copy of the repository so pack-objects can use the good copy when
a corrupt object is identified.
So I agree with you that changing this in write_sha1_file() is a
bad idea for the normal good cases, but I also don't see how this
patch changes anything at all... the code path you introduced is
already implemented.
> diff --git a/sha1_file.c b/sha1_file.c
> index 6c0e251..17085cc 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2373,14 +2373,17 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
>  	char hdr[32];
>  	int hdrlen;
>
> -	/* Normally if we have it in the pack then we do not bother writing
> -	 * it out into .git/objects/??/?{38} file.
> -	 */
>  	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
>  	if (returnsha1)
>  		hashcpy(returnsha1, sha1);
> -	if (has_sha1_file(sha1))
> -		return 0;
> +	/* Normally if we have it in the pack then we do not bother writing
> +	 * it out into .git/objects/??/?{38} file. We do, though, if there
> +	 * is no chance that we have an uncorrupted version of the object.
> +	 */
> +	if (has_sha1_file(sha1)) {
> +		if (has_loose_object(sha1) || !has_packed_and_bad(sha1))
> +			return 0;
> +	}
>  	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
>  }
--
Shawn.