git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Dmitry Ivankov <divanorama@gmail.com>
Cc: git@vger.kernel.org, "Shawn O. Pearce" <spearce@spearce.org>,
	David Barr <davidbarr@google.com>
Subject: Re: [PATCH/WIP 6/7] fast-import: workaround data corruption
Date: Thu, 28 Jul 2011 08:31:34 +0200	[thread overview]
Message-ID: <20110728063134.GA4179@elie> (raw)
In-Reply-To: <1311828370-30477-7-git-send-email-divanorama@gmail.com>

Hi!

Dmitry Ivankov wrote:

> fast-import keeps track of some delta-base for tree objects. When it is
> time to compute the delta, base object is constructed from in-memory
> tree representation using children's delta bases sha1. But these can be
> unrelated due to several bugs, and it leads to object with wrong sha1
> being delta-written to the packfile.

This seems like as good a starting point as any.  Small language
nitpicks:

 - "some delta-base" is somewhat vague

 - missing article (probably "the") before "base object" and
   "children's"

 - does "children's delta bases sha1" mean "the pre-computed object
   names and modes in versions[0].{sha1,mode} for each tree entry" or
   something else?

 - when you say "unrelated": unrelated to what?

> We have the base sha1 and what we think it's data is. Verify sha1 and if
> it doesn't match, report it to stderr and don't use delta for this tree.

At any rate, if I understand correctly, the idea is that when store_tree()
is being called, root->versions[0].sha1 does not match the tree object
implied by root->tree->entries[*]->{name,versions[0].{mode,sha1}}.  This
patch adds a quick check to notice when that happens.

Makes a lot of sense, especially as a demonstration of the problem.
store_object() returns early in many cases so this probably does make
the bug easier to find (good).

I wonder if it would make more sense to perform this sanity check in
store_object() so blobs could benefit, too.  How much does the check slow
fast-import down (if at all)?

> We could also die() here when bugs are fixed.

If the check is fast, sure, why not? :)  The only reason I can think
of is that computing a SHA-1 is an O(size of object) cost which it
would be nice to avoid if profiling shows it is noticeable.

> Or we can see if the data
> we've got is from our pack file and so still try to use it as a base.

Can you elaborate?  When would versions[0].sha1 not match the data
but the data still be in the pack file?

> --- a/fast-import.c
> +++ b/fast-import.c
[...]
> @@ -1486,10 +1487,21 @@ static void store_tree(struct tree_entry *root)
>  
>  	le = find_object(root->versions[0].sha1);
>  	if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
> +		unsigned char sh[20];

What does "sh" stand for?

>  		mktree(t, 0, &old_tree);
>  		lo.data = old_tree;
>  		lo.offset = le->idx.offset;
>  		lo.depth = t->delta_depth;
> +
> +		prepare_object_hash(OBJ_TREE, &old_tree, NULL, NULL, sh);
> +		if (hashcmp(sh, root->versions[0].sha1)) {
> +			fprintf(stderr, "internal sha1 delta base mismatch,"
> +					" won't use delta for that tree\n");
> +			lo.data = empty;

To avoid a memory leak and introducing an unnecessary variable:

			strbuf_reset(&lo.data);

Thanks, that was interesting.

  reply	other threads:[~2011-07-28  6:31 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
2011-07-28  4:46 ` [PATCH/WIP 1/7] fast-import: extract object preparation function Dmitry Ivankov
2011-07-28  4:46 ` [PATCH/WIP 2/7] fast-import: be saner with temporary trees Dmitry Ivankov
2011-07-28  7:27   ` Jonathan Nieder
2011-07-28  4:46 ` [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls Dmitry Ivankov
2011-07-28  7:34   ` Jonathan Nieder
2011-07-28  4:46 ` [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree Dmitry Ivankov
2011-07-28  7:42   ` Jonathan Nieder
2011-07-28  8:11     ` Dmitry Ivankov
2011-07-28  4:46 ` [PATCH/WIP 5/7] fast-import: extract tree_content reading function Dmitry Ivankov
2011-07-28  4:46 ` [PATCH/WIP 6/7] fast-import: workaround data corruption Dmitry Ivankov
2011-07-28  6:31   ` Jonathan Nieder [this message]
2011-07-28  4:46 ` [PATCH/WIP 7/7] fast-import: fix data corruption in load_tree Dmitry Ivankov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110728063134.GA4179@elie \
    --to=jrnieder@gmail.com \
    --cc=davidbarr@google.com \
    --cc=divanorama@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).