From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Jonathan Nieder" <jrnieder@gmail.com>,
"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
"Randal L. Schwartz" <merlyn@stonehenge.com>,
"Ralf Nyren" <ralf.nyren@ericsson.com>,
git@vger.kernel.org
Subject: [PATCH 1/2] teach diffcore-rename to optionally ignore empty content
Date: Thu, 22 Mar 2012 18:52:13 -0400
Message-ID: <20120322225213.GA14902@sigill.intra.peff.net>
In-Reply-To: <20120322224651.GA14874@sigill.intra.peff.net>

Our rename detection is a heuristic, matching pairs of
removed and added files with similar or identical content.
It's unlikely to be wrong when there is actual content to
compare, and we already take care not to do inexact rename
detection when there is not enough content to produce good
results.

However, we always do exact rename detection, even when the
blob is tiny or empty. It's easy to get false positives with
an empty blob, simply because empty content is an obvious
choice of boilerplate (e.g., when telling git that an empty
directory is worth tracking via an empty .gitignore).

This patch lets callers specify whether or not they are
interested in using empty files as rename sources and
destinations. The default is "yes", keeping the original
behavior. It works by comparing each rename source and
destination against the empty-blob sha1 and skipping those
that match when the option is turned off.
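
To illustrate the check, here is a minimal standalone sketch, not
the code from this patch. The names HASH_RAWSZ, empty_blob_id,
id_is_empty_blob and is_rename_candidate are hypothetical; the only
fact taken from git is the well-known SHA-1 of a zero-length blob.

```c
#include <string.h>

#define HASH_RAWSZ 20 /* a binary SHA-1 is 20 bytes */

/* SHA-1 of a zero-length blob: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 */
static const unsigned char empty_blob_id[HASH_RAWSZ] = {
	0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,
	0x29, 0xae, 0x77, 0x5a, 0xd8, 0xc2, 0xe4, 0x8c, 0x53, 0x91
};

/* Hypothetical stand-in for an "is this the empty blob?" check. */
static int id_is_empty_blob(const unsigned char *id)
{
	return !memcmp(id, empty_blob_id, HASH_RAWSZ);
}

/*
 * Hypothetical helper: should this object take part in rename
 * detection? With rename_empty set (the default), everything does;
 * with it cleared, empty blobs are skipped.
 */
static int is_rename_candidate(const unsigned char *id, int rename_empty)
{
	return rename_empty || !id_is_empty_blob(id);
}
```

Because the test is a fixed-size comparison of object ids, it never
needs to load the blob's content.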

One more flexible alternative would be to allow the caller
to specify a minimum size for a blob to be "interesting" for
rename detection. But that would catch only small
boilerplate files, not large ones (e.g., if you had the GPL
COPYING file in many directories).

A better alternative would be a "-rename" gitattribute that
lets boilerplate files be marked as such. I'll leave the
complexity of that solution until such time as somebody
actually wants it. The complaints we've seen so far revolve
around empty files, so let's start with the simple thing.

Signed-off-by: Jeff King <peff@peff.net>
---
From the previous discussion, we know we could get away with just
dropping empty files from the rename_src list. However, doing it for
both the src and dst lists is a little more obvious and robust. And
since some of the rename detection is O(src*dst), keeping the lists
as small as possible is a good thing.
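
As a rough illustration of the O(src*dst) point, the sketch below
filters empty entries out of both lists and counts the worst-case
number of pairwise comparisons. It is a toy model, not diffcore
code: struct entry, filter_entries and pairwise_comparisons are
hypothetical names, and a size of zero stands in for the empty-blob
check.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a rename source or destination. */
struct entry {
	const char *path;
	size_t size;
};

/*
 * Compact the list in place, dropping empty entries unless
 * rename_empty is set. Returns the number of entries kept.
 */
static size_t filter_entries(struct entry *list, size_t n, int rename_empty)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		if (!rename_empty && list[i].size == 0)
			continue; /* not interested in empty files */
		list[kept++] = list[i];
	}
	return kept;
}

/* Worst case: every source is compared against every destination. */
static size_t pairwise_comparisons(size_t nsrc, size_t ndst)
{
	return nsrc * ndst;
}
```

With three sources (one empty) and four destinations (two empty),
filtering both lists cuts the worst case from 12 comparisons to 4;
filtering only the sources would leave 8.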

I added command-line triggers mostly for testing and debugging, and
didn't bother to advertise them in the documentation. Obviously if we
decide that diff should just have this behavior, this patch can be
even smaller.
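
For readers unfamiliar with the flag handling, here is a small
sketch of the pattern the triggers follow: a bit that defaults to
"on" and a pair of options that set or clear it. It only mirrors
the shape of the change; struct opts, OPT_RENAME_EMPTY, opts_init,
opts_parse and opts_rename_empty are hypothetical names, not git's
diff_options API.

```c
#include <string.h>

/* Hypothetical flag bit, using the same slot as the patch (1 << 8). */
#define OPT_RENAME_EMPTY (1u << 8)

struct opts {
	unsigned flags;
};

/* Default is "yes", which keeps the original behavior. */
static void opts_init(struct opts *o)
{
	o->flags = OPT_RENAME_EMPTY;
}

/* Returns 1 if arg was recognized and applied, 0 otherwise. */
static int opts_parse(struct opts *o, const char *arg)
{
	if (!strcmp(arg, "--rename-empty"))
		o->flags |= OPT_RENAME_EMPTY;
	else if (!strcmp(arg, "--no-rename-empty"))
		o->flags &= ~OPT_RENAME_EMPTY;
	else
		return 0;
	return 1;
}

/* Nonzero when empty files may act as rename sources/destinations. */
static int opts_rename_empty(const struct opts *o)
{
	return !!(o->flags & OPT_RENAME_EMPTY);
}
```

Since the bit is set during setup, a caller that never mentions the
option sees no change in behavior.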

 diff.c            |    5 +++++
 diff.h            |    2 +-
 diffcore-rename.c |    6 ++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 377ec1e..0b70aad 100644
--- a/diff.c
+++ b/diff.c
@@ -3136,6 +3136,7 @@ void diff_setup(struct diff_options *options)
 	options->rename_limit = -1;
 	options->dirstat_permille = diff_dirstat_permille_default;
 	options->context = 3;
+	DIFF_OPT_SET(options, RENAME_EMPTY);
 
 	options->change = diff_change;
 	options->add_remove = diff_addremove;
@@ -3506,6 +3507,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 	}
 	else if (!strcmp(arg, "--no-renames"))
 		options->detect_rename = 0;
+	else if (!strcmp(arg, "--rename-empty"))
+		DIFF_OPT_SET(options, RENAME_EMPTY);
+	else if (!strcmp(arg, "--no-rename-empty"))
+		DIFF_OPT_CLR(options, RENAME_EMPTY);
 	else if (!strcmp(arg, "--relative"))
 		DIFF_OPT_SET(options, RELATIVE_NAME);
 	else if (!prefixcmp(arg, "--relative=")) {
diff --git a/diff.h b/diff.h
index cb68743..dd48eca 100644
--- a/diff.h
+++ b/diff.h
@@ -60,7 +60,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_SILENT_ON_REMOVE (1 << 5)
 #define DIFF_OPT_FIND_COPIES_HARDER (1 << 6)
 #define DIFF_OPT_FOLLOW_RENAMES (1 << 7)
-/* (1 << 8) unused */
+#define DIFF_OPT_RENAME_EMPTY (1 << 8)
 /* (1 << 9) unused */
 #define DIFF_OPT_HAS_CHANGES (1 << 10)
 #define DIFF_OPT_QUICK (1 << 11)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index f639601..216a7a4 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -512,9 +512,15 @@ void diffcore_rename(struct diff_options *options)
 			else if (options->single_follow &&
 				 strcmp(options->single_follow, p->two->path))
 				continue; /* not interested */
+			else if (!DIFF_OPT_TST(options, RENAME_EMPTY) &&
+				 is_empty_blob_sha1(p->two->sha1))
+				continue;
 			else
 				locate_rename_dst(p->two, 1);
 		}
+		else if (!DIFF_OPT_TST(options, RENAME_EMPTY) &&
+			 is_empty_blob_sha1(p->one->sha1))
+			continue;
 		else if (!DIFF_PAIR_UNMERGED(p) && !DIFF_FILE_VALID(p->two)) {
 			/*
 			 * If the source is a broken "delete", and
--
1.7.10.rc0.9.gdcbe9