From: Jonathan Nieder <jrnieder@gmail.com>
To: Dmitry Ivankov <divanorama@gmail.com>
Cc: git@vger.kernel.org, "Shawn O. Pearce" <spearce@spearce.org>,
David Barr <davidbarr@google.com>
Subject: Re: [PATCH/WIP 6/7] fast-import: workaround data corruption
Date: Thu, 28 Jul 2011 08:31:34 +0200 [thread overview]
Message-ID: <20110728063134.GA4179@elie> (raw)
In-Reply-To: <1311828370-30477-7-git-send-email-divanorama@gmail.com>
Hi!
Dmitry Ivankov wrote:
> fast-import keeps track of some delta-base for tree objects. When it is
> time to compute the delta, base object is constructed from in-memory
> tree representation using children's delta bases sha1. But these can be
> unrelated due to several bugs, and it leads to object with wrong sha1
> being delta-written to the packfile.
This seems like as good a starting point as any. Small language
nitpicks:
- "some delta-base" is somewhat vague
- missing article (probably "the") before "base object" and
"children's"
- does "children's delta bases sha1" mean "the pre-computed object
names and modes in versions[0].{sha1,mode} for each tree entry" or
something else?
- when you say "unrelated": unrelated to what?
> We have the base sha1 and what we think it's data is. Verify sha1 and if
> it doesn't match, report it to stderr and don't use delta for this tree.
At any rate, if I understand correctly, the idea is that when store_tree()
is being called, root->versions[0].sha1 does not match the tree object
implied by root->tree->entries[*]->{name,versions[0].{mode,sha1}}. This
patch adds a quick check to notice when that happens.
Makes a lot of sense, especially as a demonstration of the problem.
store_object() returns early in many cases so this probably does make
the bug easier to find (good).
I wonder if it would make more sense to perform this sanity check in
store_object() so blobs could benefit, too. How much does the check slow
fast-import down (if at all)?
> We could also die() here when bugs are fixed.
If the check is fast, sure, why not? :) The only reason I can think
of is that computing a SHA-1 is an O(size of object) cost which it
would be nice to avoid if profiling shows it is noticeable.
> Or we can see if the data
> we've got is from our pack file and so still try to use it as a base.
Can you elaborate? When would versions[0].sha1 not match the data
but the data still be in the pack file?
> --- a/fast-import.c
> +++ b/fast-import.c
[...]
> @@ -1486,10 +1487,21 @@ static void store_tree(struct tree_entry *root)
>
> le = find_object(root->versions[0].sha1);
> if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
> + unsigned char sh[20];
What does "sh" stand for?
> mktree(t, 0, &old_tree);
> lo.data = old_tree;
> lo.offset = le->idx.offset;
> lo.depth = t->delta_depth;
> +
> + prepare_object_hash(OBJ_TREE, &old_tree, NULL, NULL, sh);
> + if (hashcmp(sh, root->versions[0].sha1)) {
> + fprintf(stderr, "internal sha1 delta base mismatch,"
> + " won't use delta for that tree\n");
> + lo.data = empty;
To avoid a memory leak and introducing an unnecessary variable:
strbuf_reset(&lo.data);
Thanks, that was interesting.
next prev parent reply other threads:[~2011-07-28 6:31 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-28 4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
2011-07-28 4:46 ` [PATCH/WIP 1/7] fast-import: extract object preparation function Dmitry Ivankov
2011-07-28 4:46 ` [PATCH/WIP 2/7] fast-import: be saner with temporary trees Dmitry Ivankov
2011-07-28 7:27 ` Jonathan Nieder
2011-07-28 4:46 ` [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls Dmitry Ivankov
2011-07-28 7:34 ` Jonathan Nieder
2011-07-28 4:46 ` [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree Dmitry Ivankov
2011-07-28 7:42 ` Jonathan Nieder
2011-07-28 8:11 ` Dmitry Ivankov
2011-07-28 4:46 ` [PATCH/WIP 5/7] fast-import: extract tree_content reading function Dmitry Ivankov
2011-07-28 4:46 ` [PATCH/WIP 6/7] fast-import: workaround data corruption Dmitry Ivankov
2011-07-28 6:31 ` Jonathan Nieder [this message]
2011-07-28 4:46 ` [PATCH/WIP 7/7] fast-import: fix data corruption in load_tree Dmitry Ivankov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110728063134.GA4179@elie \
--to=jrnieder@gmail.com \
--cc=davidbarr@google.com \
--cc=divanorama@gmail.com \
--cc=git@vger.kernel.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.