From: Jonathan Nieder <jrnieder@gmail.com>
To: Dmitry Ivankov <divanorama@gmail.com>
Cc: git@vger.kernel.org, "Shawn O. Pearce" <spearce@spearce.org>,
David Barr <davidbarr@google.com>
Subject: Re: [PATCH 2/3] fast-import: add a check for tree delta base sha1
Date: Sat, 13 Aug 2011 16:02:21 -0500 [thread overview]
Message-ID: <20110813210221.GA16194@elie.gateway.2wire.net> (raw)
In-Reply-To: <1313145170-24471-3-git-send-email-divanorama@gmail.com>
Hi,
Dmitry Ivankov wrote:
> fast-import is able to write imported tree objects in delta format.
> It holds a tree structure in memory where each tree entry may have
> a delta base sha1 assigned. When delta base data is needed it is
> reconstructed from this in-memory structure. Though sometimes the
> delta base data doesn't match the delta base sha1 so wrong or even
> corrupt pack is produced.
I'm having trouble parsing this; not sure why. Some guesses:
- dropping the word "imported" could help, since it is the
content of trees that comes from the user, not the tree objects
- it's not clear to me what the second sentence is saying. Do you
mean that git looks at the versions[0].sha1 fields of item in
t->entries to construct an in-memory tree object to delta against,
instead of finding the object named by versions[0].sha1, inflating
it, and using it directly?
- the third sentence seems to be describing a problem, but I'm not
sure what the relationship is to this patch: will this patch fix
that problem, or does it just add a test illustrating it?
> To create a small easily reproducible test, add an excessive check
> for delta base sha1. It's not likely that computing sha1 for each
> tree delta base costs us much.
Since the first word in fast-import is "fast", I would be much
happier with some measurements with a typical import (i.e., one that
doesn't use --cat-blob-fd) than the statement "It's not likely". :)
If this tweak will continue to be useful after the fix, perhaps it
could be made optional. I haven't thought carefully about this,
though.
On to the patch.
[...]
> +++ b/fast-import.c
> @@ -1455,12 +1455,22 @@ static void store_tree(struct tree_entry *root)
> store_tree(t->entries[i]);
> }
>
> + if (!is_null_sha1(root->versions[0].sha1)
> + && S_ISDIR(root->versions[0].mode)) {
> + unsigned char old_tree_sha1[20];
> + mktree(t, 0, &old_tree);
> + prepare_object_hash(OBJ_TREE, &old_tree,
> + NULL, NULL, old_tree_sha1);
> +
> + if (hashcmp(old_tree_sha1, root->versions[0].sha1))
> + die("internal tree delta base sha1 mismatch");
This is the heart of the patch; it involves several changes.
1. construct the base object tree whether the base object is in the
current pack or not
2. calculate its hash and compare to ->versions[0].sha1 as a sanity
check.
For large trees, I fear it could be an important slowdown.
> +
> - le = find_object(root->versions[0].sha1);
> - if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
> - mktree(t, 0, &old_tree);
> + le = find_object(root->versions[0].sha1);
> + if (le && le->pack_id == pack_id) {
> lo.data = old_tree;
> lo.offset = le->idx.offset;
> lo.depth = t->delta_depth;
> }
> + }
[...]
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -734,6 +734,44 @@ test_expect_success \
[...]
> +cat >input2 <<INPUT_END
> +commit refs/heads/L2
> +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> +data <<COMMIT
> +update L2
> +COMMIT
> +from refs/heads/L2^0
> +M 040000 @A g
> +M 040000 @E g/b
> +M 040000 @E g/b/h
> +INPUT_END
> +
> +test_expect_failure \
> + 'L: verify internal tree delta base' \
> + 'git fast-import <input &&
> + A=$(git ls-tree L2 a | tr " " "\t" | cut -f 3) &&
> + E=$(git ls-tree L2 a/e | tr " " "\t" | cut -f 3) &&
> + cat input2 | sed -e "s/@A/$A/" -e "s/@E/$E/" >input &&
> + git fast-import <input'
The description ("L: verify internal tree delta base" here) should
describe something we want to work --- a facility or a statement ---
and should leave out words like "verify" unless it is a test of
verification facilities.
In this example, I guess it is testing something like "delta base is
not corrupted when replacing one directory by another"? (That's a
random, wild guess and not meant as an example to be used verbatim.)
I suppose I would be happier if we can find a way to reproduce this
without modifying the behavior in such an invasive way. Which should
be easier while thinking about the fix, so I'll move on to that.
next prev parent reply other threads:[~2011-08-13 21:02 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
2011-08-12 10:32 ` [PATCH 1/3] fast-import: extract object preparation function Dmitry Ivankov
2011-08-12 10:32 ` [PATCH 2/3] fast-import: add a check for tree delta base sha1 Dmitry Ivankov
2011-08-13 21:02 ` Jonathan Nieder [this message]
2011-08-12 10:32 ` [PATCH 3/3] fast-import: prevent producing bad delta Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 0/2] fix data corruption in fast-import Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 1/2] fast-import: add a test for tree delta base corruption Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 2/2] fast-import: prevent producing bad delta Dmitry Ivankov
2011-08-20 1:09 ` [PATCH v3] fast-import: do not write bad delta for replaced subtrees Jonathan Nieder
2011-08-20 9:08 ` Andreas Schwab
2011-08-20 15:43 ` Jonathan Nieder
2011-08-20 17:22 ` [PATCH v4] " Dmitry Ivankov
2011-08-20 17:48 ` Jonathan Nieder
2011-08-20 18:28 ` Dmitry Ivankov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110813210221.GA16194@elie.gateway.2wire.net \
--to=jrnieder@gmail.com \
--cc=davidbarr@google.com \
--cc=divanorama@gmail.com \
--cc=git@vger.kernel.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).