git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: [PATCH 2/2] diffcore-rename: avoid processing duplicate destinations
Date: Thu, 26 Feb 2015 20:42:27 -0500	[thread overview]
Message-ID: <20150227014227.GB3210@peff.net> (raw)
In-Reply-To: <20150227013847.GA2983@peff.net>

The rename code cannot handle an input where we have
duplicate destinations (i.e., more than one diff_filepair in
the queue with the same string in its pair->two->path). We
end up allocating only one slot in the rename_dst mapping.
If we fill in the diff_filepair for that slot, when we
re-queue the results, we may queue that filepair multiple
times. When the diff is finally flushed, the filepair is
processed and free()d multiple times, leading to heap
corruption.

This situation should only happen when a tree diff sees
duplicates in one of the trees (see the added test for a
detailed example). Rather than handle it, the sanest thing
is just to turn off rename detection altogether for the
diff.

Signed-off-by: Jeff King <peff@peff.net>
---
Like I mentioned before, I'm OK with not checking the actual diff output
in the test. It's not like it was planned, and is just what diff_tree()
happens to produce. It does make sense, though. We descend into the
first "outer/" of the "a/" side along with the sole "outer/" of the
"b/" side. We see that the entries from "b/" are all added. Then we come
back out, and see that "a/" has another "outer/", but that "b/" does
not. So all of those entries look like they were deleted.

 diffcore-rename.c          |  8 +++--
 t/t4058-diff-duplicates.sh | 79 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100755 t/t4058-diff-duplicates.sh

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 4250cc0..af1fe08 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -466,8 +466,12 @@ void diffcore_rename(struct diff_options *options)
 			else if (!DIFF_OPT_TST(options, RENAME_EMPTY) &&
 				 is_empty_blob_sha1(p->two->sha1))
 				continue;
-			else
-				add_rename_dst(p->two);
+			else if (add_rename_dst(p->two) < 0) {
+				warning("skipping rename detection, detected"
+					" duplicate destination '%s'",
+					p->two->path);
+				goto cleanup;
+			}
 		}
 		else if (!DIFF_OPT_TST(options, RENAME_EMPTY) &&
 			 is_empty_blob_sha1(p->one->sha1))
diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh
new file mode 100755
index 0000000..0a23242
--- /dev/null
+++ b/t/t4058-diff-duplicates.sh
@@ -0,0 +1,79 @@
+#!/bin/sh
+
+test_description='test tree diff when trees have duplicate entries'
+. ./test-lib.sh
+
+# make_tree_entry <mode> <mode> <sha1>
+#
+# We have to rely on perl here because not all printfs understand
+# hex escapes (only octal), and xxd is not portable.
+make_tree_entry () {
+	printf '%s %s\0' "$1" "$2" &&
+	perl -e 'print chr(hex($_)) for ($ARGV[0] =~ /../g)' "$3"
+}
+
+# Like git-mktree, but without all of the pesky sanity checking.
+# Arguments come in groups of three, each group specifying a single
+# tree entry (see make_tree_entry above).
+make_tree () {
+	while test $# -gt 2; do
+		make_tree_entry "$1" "$2" "$3"
+		shift; shift; shift
+	done |
+	git hash-object -w -t tree --stdin
+}
+
+# this is kind of a convoluted setup, but matches
+# a real-world case. Each tree contains four entries
+# for the given path, one with one sha1, and three with
+# the other. The first tree has them split across
+# two subtrees (which are themselves duplicate entries in
+# the root tree), and the second has them all in a single subtree.
+test_expect_success 'create trees with duplicate entries' '
+	blob_one=$(echo one | git hash-object -w --stdin) &&
+	blob_two=$(echo two | git hash-object -w --stdin) &&
+	inner_one_a=$(make_tree \
+		100644 inner $blob_one
+	) &&
+	inner_one_b=$(make_tree \
+		100644 inner $blob_two \
+		100644 inner $blob_two \
+		100644 inner $blob_two
+	) &&
+	outer_one=$(make_tree \
+		040000 outer $inner_one_a \
+		040000 outer $inner_one_b
+	) &&
+	inner_two=$(make_tree \
+		100644 inner $blob_one \
+		100644 inner $blob_two \
+		100644 inner $blob_two \
+		100644 inner $blob_two
+	) &&
+	outer_two=$(make_tree \
+		040000 outer $inner_two
+	) &&
+	git tag one $outer_one &&
+	git tag two $outer_two
+'
+
+test_expect_success 'diff-tree between trees' '
+	{
+		printf ":000000 100644 $_z40 $blob_two A\touter/inner\n" &&
+		printf ":000000 100644 $_z40 $blob_two A\touter/inner\n" &&
+		printf ":000000 100644 $_z40 $blob_two A\touter/inner\n" &&
+		printf ":100644 000000 $blob_two $_z40 D\touter/inner\n" &&
+		printf ":100644 000000 $blob_two $_z40 D\touter/inner\n" &&
+		printf ":100644 000000 $blob_two $_z40 D\touter/inner\n"
+	} >expect &&
+	git diff-tree -r --no-abbrev one two >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff-tree with renames' '
+	# same expectation as above, since we disable rename detection
+	git diff-tree -M -r --no-abbrev one two >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.3.0.449.g1690e78

  parent reply	other threads:[~2015-02-27  1:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-24 21:43 [BUG] diffcore-rename with duplicate tree entries can segfault Jeff King
2015-02-24 22:42 ` Junio C Hamano
2015-02-24 22:49   ` Jeff King
2015-02-24 23:11     ` Junio C Hamano
2015-02-24 23:47       ` Jeff King
2015-02-25  5:00         ` Junio C Hamano
2015-02-25 21:40           ` Jeff King
2015-02-25 21:50             ` Junio C Hamano
2015-02-27  1:38               ` [PATCH 0/2] " Jeff King
2015-02-27  1:39                 ` [PATCH 1/2] diffcore-rename: split locate_rename_dst into two functions Jeff King
2015-02-27  1:42                 ` Jeff King [this message]
2015-02-27 21:48                   ` [PATCH 2/2] diffcore-rename: avoid processing duplicate destinations Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150227014227.GB3210@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).