From: Jonathan Nieder <jrnieder@gmail.com>
To: Dmitry Ivankov <divanorama@gmail.com>
Cc: git@vger.kernel.org, "Shawn O. Pearce" <spearce@spearce.org>,
David Barr <davidbarr@google.com>
Subject: Re: [PATCH/WIP 6/7] fast-import: workaround data corruption
Date: Thu, 28 Jul 2011 08:31:34 +0200 [thread overview]
Message-ID: <20110728063134.GA4179@elie> (raw)
In-Reply-To: <1311828370-30477-7-git-send-email-divanorama@gmail.com>
Hi!
Dmitry Ivankov wrote:
> fast-import keeps track of some delta-base for tree objects. When it is
> time to compute the delta, base object is constructed from in-memory
> tree representation using children's delta bases sha1. But these can be
> unrelated due to several bugs, and it leads to object with wrong sha1
> being delta-written to the packfile.
This seems like as good a starting point as any. Small language
nitpicks:
- "some delta-base" is somewhat vague
- missing article (probably "the") before "base object" and
"children's"
- does "children's delta bases sha1" mean "the pre-computed object
names and modes in versions[0].{sha1,mode} for each tree entry" or
something else?
- when you say "unrelated": unrelated to what?
> We have the base sha1 and what we think it's data is. Verify sha1 and if
> it doesn't match, report it to stderr and don't use delta for this tree.
At any rate, if I understand correctly, the idea is that when store_tree()
is being called, root->versions[0].sha1 does not match the tree object
implied by root->tree->entries[*]->{name,versions[0].{mode,sha1}}. This
patch adds a quick check to notice when that happens.
Makes a lot of sense, especially as a demonstration of the problem.
store_object() returns early in many cases so this probably does make
the bug easier to find (good).
I wonder if it would make more sense to perform this sanity check in
store_object() so blobs could benefit, too. How much does the check slow
fast-import down (if at all)?
> We could also die() here when bugs are fixed.
If the check is fast, sure, why not? :) The only reason I can think
of is that computing a SHA-1 is an O(size of object) cost which it
would be nice to avoid if profiling shows it is noticeable.
> Or we can see if the data
> we've got is from our pack file and so still try to use it as a base.
Can you elaborate? When would versions[0].sha1 not match the data
but the data still be in the pack file?
> --- a/fast-import.c
> +++ b/fast-import.c
[...]
> @@ -1486,10 +1487,21 @@ static void store_tree(struct tree_entry *root)
>
> le = find_object(root->versions[0].sha1);
> if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
> + unsigned char sh[20];
What does "sh" stand for?
> mktree(t, 0, &old_tree);
> lo.data = old_tree;
> lo.offset = le->idx.offset;
> lo.depth = t->delta_depth;
> +
> + prepare_object_hash(OBJ_TREE, &old_tree, NULL, NULL, sh);
> + if (hashcmp(sh, root->versions[0].sha1)) {
> + fprintf(stderr, "internal sha1 delta base mismatch,"
> + " won't use delta for that tree\n");
> + lo.data = empty;
To avoid a memory leak and introducing an unnecessary variable:
strbuf_reset(&lo.data);
Thanks, that was interesting.
next prev parent reply other threads:[~2011-07-28 6:31 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-28 4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
2011-07-28 4:46 ` [PATCH/WIP 1/7] fast-import: extract object preparation function Dmitry Ivankov
2011-07-28 4:46 ` [PATCH/WIP 2/7] fast-import: be saner with temporary trees Dmitry Ivankov
2011-07-28 7:27 ` Jonathan Nieder
2011-07-28 4:46 ` [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls Dmitry Ivankov
2011-07-28 7:34 ` Jonathan Nieder
2011-07-28 4:46 ` [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree Dmitry Ivankov
2011-07-28 7:42 ` Jonathan Nieder
2011-07-28 8:11 ` Dmitry Ivankov
2011-07-28 4:46 ` [PATCH/WIP 5/7] fast-import: extract tree_content reading function Dmitry Ivankov
2011-07-28 4:46 ` [PATCH/WIP 6/7] fast-import: workaround data corruption Dmitry Ivankov
2011-07-28 6:31 ` Jonathan Nieder [this message]
2011-07-28 4:46 ` [PATCH/WIP 7/7] fast-import: fix data corruption in load_tree Dmitry Ivankov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110728063134.GA4179@elie \
--to=jrnieder@gmail.com \
--cc=davidbarr@google.com \
--cc=divanorama@gmail.com \
--cc=git@vger.kernel.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).