From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [BUG] diffcore-rename with duplicate tree entries can segfault
Date: Wed, 25 Feb 2015 16:40:32 -0500 [thread overview]
Message-ID: <20150225214032.GA32295@peff.net> (raw)
In-Reply-To: <xmqq8ufmpouz.fsf@gitster.dls.corp.google.com>
On Tue, Feb 24, 2015 at 09:00:20PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > 3. The sort order check is wrong. :-/ It needs to take into account
> > git's magic "if it's a tree, pretend it has '/' after it" rule.
> > That's not too hard for a single tree (fsck.c:verify_ordered does
> > it). But for filepairs, I'm not sure what to do. Most cases
> > have a single mode/name pair. But what about a D/F typechange? If
> > "foo" becomes "foo/", which do I use to sort?
>
> I think diff-index populates the diff queue in a wrong order and
> then calls diffcore_fix_diff_index() to fix it up.
>
> I am a bit worried about the effect this stricter input check might
> have to "diff --no-index" codepath, though.
Interesting, I didn't even know about diffcore_fix_diff_index.
Let's forget about sorting for a moment. Unsorted trees will produce odd
results, certainly, but the order of your diff is perhaps the least of
the problems. The only reason we care about it here is to do a cheap
uniq() operation.
But we can also do that with a hash table, or an auxiliary sorted array.
And sure enough, that's exactly what the rename_dst array is. When we
are inserting into it, we can easily detect duplicates there, and just
abort the rename operation. Like:
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 4e132f1..e26dd62 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -27,7 +27,7 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
struct diff_rename_dst *dst = &(rename_dst[next]);
int cmp = strcmp(two->path, dst->two->path);
if (!cmp)
- return dst;
+ return insert_ok ? NULL : dst;
if (cmp < 0) {
last = next;
continue;
@@ -450,8 +450,10 @@ void diffcore_rename(struct diff_options *options)
else if (!DIFF_OPT_TST(options, RENAME_EMPTY) &&
is_empty_blob_sha1(p->two->sha1))
continue;
- else
- locate_rename_dst(p->two, 1);
+ else {
+ if (!locate_rename_dst(p->two, 1))
+ goto cleanup;
+ }
}
else if (!DIFF_OPT_TST(options, RENAME_EMPTY) &&
is_empty_blob_sha1(p->one->sha1))
which I think is a pretty simple and sane fix. I do notice that
rename_dst and rename_src are essentially hash tables. They predate
hashmap.c by quite a bit, but it might be worth moving them. It would
simplify the code, and I suspect perform better (they progressively
insert into a sorted list; in the worst case that means O(n^2)
memmove's, though if the input is sorted we get best-case behavior).
So to go forward, I'm happy to prepare a patch, but I'd like to know:
1. Does something like the above look reasonable to you (I'd probably
refactor it to avoid the bizarre return value semantics from
locate_rename_dst, though)?
2. If so, do you want something minimal like what's above, or do you
mind if I build it on top of a hashmap conversion? I suspect the
logic may also end up more clear with the hashmap (since inserting
versus lookup will be more distinct in the callers).
-Peff
next prev parent reply other threads:[~2015-02-25 21:40 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-24 21:43 [BUG] diffcore-rename with duplicate tree entries can segfault Jeff King
2015-02-24 22:42 ` Junio C Hamano
2015-02-24 22:49 ` Jeff King
2015-02-24 23:11 ` Junio C Hamano
2015-02-24 23:47 ` Jeff King
2015-02-25 5:00 ` Junio C Hamano
2015-02-25 21:40 ` Jeff King [this message]
2015-02-25 21:50 ` Junio C Hamano
2015-02-27 1:38 ` [PATCH 0/2] " Jeff King
2015-02-27 1:39 ` [PATCH 1/2] diffcore-rename: split locate_rename_dst into two functions Jeff King
2015-02-27 1:42 ` [PATCH 2/2] diffcore-rename: avoid processing duplicate destinations Jeff King
2015-02-27 21:48 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150225214032.GA32295@peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).