From: Dmitry Ivankov <divanorama@gmail.com>
To: git@vger.kernel.org
Cc: Jonathan Nieder <jrnieder@gmail.com>,
Andreas Schwab <schwab@linux-m68k.org>,
"Shawn O. Pearce" <spearce@spearce.org>,
David Barr <davidbarr@google.com>,
Dmitry Ivankov <divanorama@gmail.com>
Subject: [PATCH v4] fast-import: do not write bad delta for replaced subtrees
Date: Sat, 20 Aug 2011 23:22:26 +0600 [thread overview]
Message-ID: <1313860946-1596-1-git-send-email-divanorama@gmail.com> (raw)
In-Reply-To: <20110820154356.GB15864@elie.gateway.2wire.net>
How about adding a new bit field "no_delta" instead? The patch is
smaller this way. Also could 04000 theoretically be S_IFDIR on some
platform?
Changes:
- switch to a separate no_delta bit in tree_entry, don't assume that it
is used only for trees. Could be useful for blobs too once their delta
base logic changes (now it's just delta against blob that was stored
just before the current one)
- when setting no_delta = 1 don't check for S_ISDIR(versions[0].mode),
this is a redundant check and logic duplication. Who knows, maybe some
day we'll want to delta a tree against blob. :)
- removed the asserts on mode, as now "mode" meaning isn't changed
- removed the duplicated signed-of-by :)
-- >8 --
Subject: fast-import: do not write bad delta for replaced subtrees
To produce deltas for tree objects, fast-import tracks two versions of
each tree entry - a base and the current version. The base version on
a tree stands both for a delta base of this tree, and for a entry
inside the delta base of the parent tree. So care needs to be taken to
keep them consistent.
Unfortunately this all gets forgotten when replacing one subtree by
another using tree_content_set. When writing an entry representing
the new subtree, it keeps the old base sha1, since it is needed by the
parent tree. But the new tree doesn't have the implied base version
entries, and when it is time to write it to pack, git writes an
invalid delta that is declared to have one base (the old tree name)
but actually has another one (the new tree for an "M" command, or the
tree's old base for an "R" or "C" command).
How to fix it? Modifying the new subtree's entries to match the
declared base would be expensive, since it requires reading the tree
corresponding to the declared base from the object db and recursively
rewriting children's base versions to match. Invalidating the parent
trees' bases would involve recursively walking up the tree and
disables deltas for each tree it touches, meaning a larger pack.
Let's just mark the new tree as do-not-delta-me instead. Add a new bit
for tree_entry named no_delta. It is set to 1 when subtree is replaced
and reset back to 0 when we set a new legal delta base, that is when
e->versions[0] is changed.
tree_content_replace is in a similar predicament to tree_content_set,
except that because it is only used to replace the root, just
invalidating the base sha1 there (instead of setting the no-delta bit)
is fine.
Initial hack by Jonathan, test and description by Dmitry.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
fast-import.c | 27 ++++++++++++++++++++++++++-
t/t9300-fast-import.sh | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 66 insertions(+), 1 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index 7cc2262..3bae498 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -221,6 +221,7 @@ struct tree_entry {
uint16_t mode;
unsigned char sha1[20];
} versions[2];
+ unsigned no_delta : 1;
};
struct tree_content {
@@ -1368,6 +1369,7 @@ static void load_tree(struct tree_entry *root)
if (!c)
die("Corrupt mode in %s", sha1_to_hex(sha1));
e->versions[0].mode = e->versions[1].mode;
+ e->no_delta = 0;
e->name = to_atom(c, strlen(c));
c += e->name->str_len + 1;
hashcpy(e->versions[0].sha1, (unsigned char *)c);
@@ -1437,7 +1439,10 @@ static void store_tree(struct tree_entry *root)
store_tree(t->entries[i]);
}
- le = find_object(root->versions[0].sha1);
+ if (root->no_delta)
+ le = NULL;
+ else
+ le = find_object(root->versions[0].sha1);
if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
mktree(t, 0, &old_tree);
lo.data = old_tree;
@@ -1453,6 +1458,7 @@ static void store_tree(struct tree_entry *root)
struct tree_entry *e = t->entries[i];
if (e->versions[1].mode) {
e->versions[0].mode = e->versions[1].mode;
+ e->no_delta = 0;
hashcpy(e->versions[0].sha1, e->versions[1].sha1);
t->entries[j++] = e;
} else {
@@ -1471,6 +1477,7 @@ static void tree_content_replace(
{
if (!S_ISDIR(mode))
die("Root cannot be a non-directory");
+ hashclr(root->versions[0].sha1);
hashcpy(root->versions[1].sha1, sha1);
if (root->tree)
release_tree_content_recursive(root->tree);
@@ -1515,11 +1522,28 @@ static int tree_content_set(
if (e->tree)
release_tree_content_recursive(e->tree);
e->tree = subtree;
+
+ /*
+ * We need to leave e->versions[0].sha1 alone
+ * to avoid modifying the preimage tree used
+ * when writing out the parent directory.
+ * But after replacing the subdir with a
+ * completely different one, e->versions[0]
+ * is not a good delta base any more, and
+ * besides, we've thrown away the tree
+ * entries needed to make a delta against it.
+ *
+ * Let's just disable deltas when the time
+ * comes to write this subtree to pack.
+ */
+ e->no_delta = 1;
+
hashclr(root->versions[1].sha1);
return 1;
}
if (!S_ISDIR(e->versions[1].mode)) {
e->tree = new_tree_content(8);
+ e->no_delta = 1;
e->versions[1].mode = S_IFDIR;
}
if (!e->tree)
@@ -1537,6 +1561,7 @@ static int tree_content_set(
e = new_tree_entry();
e->name = to_atom(p, n);
e->versions[0].mode = 0;
+ e->no_delta = 0;
hashclr(e->versions[0].sha1);
t->entries[t->entry_count++] = e;
if (slash1) {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index f256475..04fa70d 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -734,6 +734,46 @@ test_expect_success \
git diff-tree --abbrev --raw L^ L >output &&
test_cmp expect output'
+cat >input <<INPUT_END
+blob
+mark :1
+data <<EOF
+the data
+EOF
+
+commit refs/heads/L2
+committer C O Mitter <committer@example.com> 1112912473 -0700
+data <<COMMIT
+init L2
+COMMIT
+M 644 :1 a/b/c
+M 644 :1 a/b/d
+M 644 :1 a/e/f
+
+commit refs/heads/L2
+committer C O Mitter <committer@example.com> 1112912473 -0700
+data <<COMMIT
+update L2
+COMMIT
+C a g
+C a/e g/b
+M 644 :1 g/b/h
+INPUT_END
+
+cat <<EOF >expect
+g/b/f
+g/b/h
+EOF
+
+test_expect_success \
+ 'L: modifying a copied tree does not produce a corrupt pack' \
+ 'test_when_finished "git update-ref -d refs/heads/L2" &&
+ git fast-import <input &&
+ git ls-tree L2 g/b/ >tmp &&
+ cut -f 2 <tmp >actual &&
+ test_cmp expect actual &&
+ git fsck L2'
+
###
### series M
###
--
1.7.3.4
next prev parent reply other threads:[~2011-08-20 17:22 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
2011-08-12 10:32 ` [PATCH 1/3] fast-import: extract object preparation function Dmitry Ivankov
2011-08-12 10:32 ` [PATCH 2/3] fast-import: add a check for tree delta base sha1 Dmitry Ivankov
2011-08-13 21:02 ` Jonathan Nieder
2011-08-12 10:32 ` [PATCH 3/3] fast-import: prevent producing bad delta Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 0/2] fix data corruption in fast-import Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 1/2] fast-import: add a test for tree delta base corruption Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 2/2] fast-import: prevent producing bad delta Dmitry Ivankov
2011-08-20 1:09 ` [PATCH v3] fast-import: do not write bad delta for replaced subtrees Jonathan Nieder
2011-08-20 9:08 ` Andreas Schwab
2011-08-20 15:43 ` Jonathan Nieder
2011-08-20 17:22 ` Dmitry Ivankov [this message]
2011-08-20 17:48 ` [PATCH v4] " Jonathan Nieder
2011-08-20 18:28 ` Dmitry Ivankov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1313860946-1596-1-git-send-email-divanorama@gmail.com \
--to=divanorama@gmail.com \
--cc=davidbarr@google.com \
--cc=git@vger.kernel.org \
--cc=jrnieder@gmail.com \
--cc=schwab@linux-m68k.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).