* [PATCH v7 1/3] Introduce bulk-move detection in diffcore.
2010-10-23 21:06 [PATCH v7 0/3] Detection of directory renames Yann Dirson
@ 2010-10-23 21:07 ` Yann Dirson
2010-10-25 8:08 ` Junio C Hamano
2010-10-23 21:07 ` [PATCH v7 2/3] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Yann Dirson @ 2010-10-23 21:07 UTC (permalink / raw)
To: git; +Cc: Yann Dirson, Yann Dirson
This feature tries to group together files moving from and to
identical directories - the most common case being directory renames.
It is activated by the new --detect-bulk-moves diffcore
flag.
This is only the first step, adding the basic functionnality and
adding support to raw diff output (and it is not supported in
unified-diff output yet)
It is implemented as a new pass in diffcore-rename, occuring after the
file renames get detected, grouping those renames looking like a move
of a full directory into some other place.
The output of raw diff is displayed as "Rnnn a/* b/". Those cannot be
confused with renames of files named "whatever/*" with a literal star
character, from the full-zero SHA1's.
Possible optimisations to this code include:
* avoid use of i_am_not_single by using a separate list
* use a more informative prefixcmp to avoid strcmp calls
eg. in discard_if_outside()
* optimize for bulk insertions (avoid useless successive memmove's)
Other future developements to be made on top of this include:
* extension of unified-diff format to express this
* detect inexact bulk-moves (where some files were not moved, or were
moved to a different place) - problem of computing a similarity score
* display as such the special case of directory move/rename
* application of such new diffs: issue a conflict, or just a warning ?
* teach git-svn (and others ?) to make use of that flag
* handle new conflict type "bulk-move/add"
* detect "directory splits" as well
* use inexact dir renames to bump score of below-threshold renames
from/to same locations
* support other types of bluk-grouping, like prefixes (see eg. kernel
5d1e859c), and maybe config-specified patterns
* add yours here
This patch has been improved by the following contributions:
- Jonathan Nieder: better implementation of copy_dirname()
- Jonathan Nieder: portable implementation of memrchr() in another patch
- Junio C Hamano: split individual renames hiding under control of another flag
- Junio C Hamano: coding style issues
- Ævar Arnfjörð Bjarmason: Don't use C99 comments.
- Jonathan Nieder: just too many other helpful suggestions to list them all
Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Thanks-to: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
Documentation/diff-options.txt | 4 +
Documentation/gitdiffcore.txt | 12 ++
diff-lib.c | 6 +-
diff.c | 14 ++-
diff.h | 3 +
diffcore-rename.c | 354 ++++++++++++++++++++++++++++++++++++++--
diffcore.h | 1 +
tree-diff.c | 4 +-
8 files changed, 383 insertions(+), 15 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e954af0..2d99511 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -243,6 +243,10 @@ endif::git-log[]
delete/add pair to be a rename if more than 90% of the file
hasn't changed.
+--detect-bulk-moves::
+ Detect bulk move of all files of a directory into a
+ different one.
+
-C[<n>]::
Detect copies as well as renames. See also `--find-copies-harder`.
If `n` is specified, it has the same meaning as for `-M<n>`.
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 6af29a4..edeb159 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -175,6 +175,18 @@ the expense of making it slower. Without `\--find-copies-harder`,
'git diff-{asterisk}' commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.
+Bulk move of all files of a directory into a different one can get
+detected using the `\--detect-bulk-moves` option. This adds an
+additional pass on top of the results of per-file rename detection.
+They are reported with NULL SHA1 id, in addition to the file renames:
+
+------------------------------------------------
+:040000 040000 0000000... 0000000... R100 a/* b/
+:100644 100644 0123456... 1234567... R90 a/file0 b/file3
+:100644 100644 0123456... 1234567... R100 a/file1 b/file1
+:100644 100644 0123456... 1234567... R100 a/file2 b/file2
+------------------------------------------------
+
diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
--------------------------------------------------------------------
diff --git a/diff-lib.c b/diff-lib.c
index 392ce2b..5ec3ddc 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
ce_option, &dirty_submodule);
if (!changed && !dirty_submodule) {
ce_mark_uptodate(ce);
- if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+ if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+ !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
continue;
}
oldmode = ce->ce_mode;
@@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs,
oldmode = old->ce_mode;
if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule &&
- !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+ !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+ !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
return 0;
diff_change(&revs->diffopt, oldmode, mode,
diff --git a/diff.c b/diff.c
index 71efa8e..1d88281 100644
--- a/diff.c
+++ b/diff.c
@@ -3188,6 +3188,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
DIFF_OPT_SET(options, REVERSE_DIFF);
else if (!strcmp(arg, "--find-copies-harder"))
DIFF_OPT_SET(options, FIND_COPIES_HARDER);
+ else if (!strcmp(arg, "--detect-bulk-moves")) {
+ DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+ if (!options->detect_rename)
+ options->detect_rename = DIFF_DETECT_RENAME;
+ }
else if (!strcmp(arg, "--follow"))
DIFF_OPT_SET(options, FOLLOW_RENAMES);
else if (!strcmp(arg, "--color"))
@@ -3466,7 +3471,14 @@ static void diff_flush_raw(struct diff_filepair *p, struct diff_options *opt)
if (p->status == DIFF_STATUS_COPIED ||
p->status == DIFF_STATUS_RENAMED) {
const char *name_a, *name_b;
- name_a = p->one->path;
+ if (p->is_bulkmove) {
+ /* append "*" to the first dirname */
+ char buf[PATH_MAX];
+ char *next = memccpy(buf, p->one->path, '\0', PATH_MAX);
+ next[-1] = '*'; *next = '\0';
+ name_a = buf;
+ } else
+ name_a = p->one->path;
name_b = p->two->path;
strip_prefix(opt->prefix_length, &name_a, &name_b);
write_name_quoted(name_a, opt->file, inter_name_termination);
diff --git a/diff.h b/diff.h
index 1fd44f5..b0d6fa6 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
#define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
#define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
#define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DETECT_BULK_MOVES (1 << 28)
#define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag)
#define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag)
@@ -265,6 +266,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
" -C detect copies.\n" \
" --find-copies-harder\n" \
" try unchanged files as candidate for copy detection.\n" \
+" --detect-bulk-moves\n" \
+" detect wholesale directory renames.\n" \
" -l<n> limit rename attempts up to <n> paths.\n" \
" -O<file> reorder diffs according to the <file>.\n" \
" -S<string> find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..fa2ba7c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -11,9 +11,13 @@
static struct diff_rename_dst {
struct diff_filespec *two;
struct diff_filepair *pair;
+ unsigned i_am_not_single:1; /* does not look for a match, only here to be looked at */
} *rename_dst;
static int rename_dst_nr, rename_dst_alloc;
+/*
+ * Do a binary search of "two" in "rename_dst", inserting it if not found.
+ */
static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
int insert_ok)
{
@@ -49,9 +53,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
rename_dst[first].two = alloc_filespec(two->path);
fill_filespec(rename_dst[first].two, two->sha1, two->mode);
rename_dst[first].pair = NULL;
+ rename_dst[first].i_am_not_single = 0;
return &(rename_dst[first]);
}
+/*
+ * Do a binary search in "rename_dst" of any entry under "dirname".
+ */
+static struct diff_rename_dst *locate_rename_dst_dir(const char *dirname)
+{
+ int first, last;
+ int prefixlength = strlen(dirname);
+
+ first = 0;
+ last = rename_dst_nr;
+ while (last > first) {
+ int next = (last + first) >> 1;
+ struct diff_rename_dst *dst = &(rename_dst[next]);
+ int cmp = strncmp(dirname, dst->two->path, prefixlength);
+ if (!cmp)
+ return dst;
+ if (cmp < 0) {
+ last = next;
+ continue;
+ }
+ first = next+1;
+ }
+ /* not found */
+ return NULL;
+}
+
/* Table of rename/copy src files */
static struct diff_rename_src {
struct diff_filespec *one;
@@ -386,8 +417,11 @@ static int find_exact_renames(void)
for (i = 0; i < rename_src_nr; i++)
insert_file_table(&file_table, -1, i, rename_src[i].one);
- for (i = 0; i < rename_dst_nr; i++)
+ for (i = 0; i < rename_dst_nr; i++) {
+ if (rename_dst[i].i_am_not_single)
+ continue;
insert_file_table(&file_table, 1, i, rename_dst[i].two);
+ }
/* Find the renames */
i = for_each_hash(&file_table, find_same_files);
@@ -414,6 +448,270 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
m[worst] = *o;
}
+static struct diff_bulk_rename {
+ struct diff_filespec *one;
+ struct diff_filespec *two;
+ int discarded;
+} *bulkmove_candidates;
+static int bulkmove_candidates_nr, bulkmove_candidates_alloc;
+
+/*
+ * Do a binary search of "one" in "bulkmove_candidate", inserting it if not
+ * found.
+ * A good part was copy-pasted from locate_rename_dst().
+ */
+static struct diff_bulk_rename *locate_bulkmove_candidate(const char *one_path,
+ const char *two_path)
+{
+ int first, last;
+
+ first = 0;
+ last = bulkmove_candidates_nr;
+ while (last > first) {
+ int next = (last + first) >> 1;
+ struct diff_bulk_rename *candidate = &(bulkmove_candidates[next]);
+ /* primary sort key on one_path, secondary on two_path */
+ int cmp = strcmp(one_path, candidate->one->path);
+ if (!cmp)
+ cmp = strcmp(two_path, candidate->two->path);
+ if (!cmp)
+ return candidate;
+ if (cmp < 0) {
+ last = next;
+ continue;
+ }
+ first = next+1;
+ }
+ /* not found */
+ /* insert to make it at "first" */
+ if (bulkmove_candidates_alloc <= bulkmove_candidates_nr) {
+ bulkmove_candidates_alloc = alloc_nr(bulkmove_candidates_alloc);
+ bulkmove_candidates = xrealloc(bulkmove_candidates,
+ bulkmove_candidates_alloc * sizeof(*bulkmove_candidates));
+ }
+ bulkmove_candidates_nr++;
+ if (first < bulkmove_candidates_nr)
+ memmove(bulkmove_candidates + first + 1, bulkmove_candidates + first,
+ (bulkmove_candidates_nr - first - 1) * sizeof(*bulkmove_candidates));
+
+ bulkmove_candidates[first].one = alloc_filespec(one_path);
+ fill_filespec(bulkmove_candidates[first].one, null_sha1, S_IFDIR);
+ bulkmove_candidates[first].two = alloc_filespec(two_path);
+ fill_filespec(bulkmove_candidates[first].two, null_sha1, S_IFDIR);
+ bulkmove_candidates[first].discarded = 0;
+ return &(bulkmove_candidates[first]);
+}
+
+/*
+ * Copy dirname of src into dst, suitable to append a filename without
+ * an additional "/".
+ * Only handles relative paths since there is no absolute path in a git repo.
+ * Writes "" when there is no "/" in src.
+ * May overwrite more chars than really needed, if src ends with a "/".
+ * Supports in-place modification of src by passing dst == src.
+ */
+static const char *copy_dirname(char *dst, const char *src)
+{
+ size_t len = strlen(src);
+ const char *slash;
+ char *end;
+
+ if (len > 0 && src[len - 1] == '/')
+ /* Trailing slash. Ignore it. */
+ len--;
+
+ slash = memrchr(src, '/', len);
+ if (!slash) {
+ *dst = '\0';
+ return dst;
+ }
+
+ end = mempcpy(dst, src, slash - src + 1);
+ *end = '\0';
+ return dst;
+}
+
+// FIXME: leaks like hell.
+/* See if the fact that one_leftover exists under one_parent_path in
+ * dst tree should disqualify one_parent_path from bulkmove eligibility.
+ * Return 1 if it disqualifies, 0 if it is OK.
+ */
+static int dispatched_to_different_dirs(const char *one_parent_path)
+{
+ /* this might be a dir split, or files added
+ * after the bulk move, or just an isolated
+ * rename */
+ int two_idx, j, onep_len, maybe_dir_rename;
+ struct diff_rename_dst *one_leftover =
+ one_leftover = locate_rename_dst_dir(one_parent_path);
+
+ if (!one_leftover)
+ return 0;
+
+ /* try to see if it is a file added after the bulk move */
+ two_idx = one_leftover - rename_dst;
+ onep_len = strlen(one_parent_path);
+ maybe_dir_rename = 1;
+
+ /* check no leftover file was already here before */
+ for (j = two_idx; j < rename_dst_nr; j++) {
+ if (strncmp(rename_dst[j].two->path,
+ one_parent_path, onep_len))
+ break; /* exhausted directory in this direction */
+ fprintf(stderr, "[DBG] leftover file %s in %s\n",
+ rename_dst[j].two->path, one_parent_path);
+ if (rename_dst[j].i_am_not_single || /* those were already here */
+ (rename_dst[j].pair &&
+ !strncmp(rename_dst[j].pair->one->path,
+ one_parent_path, onep_len) && /* renamed from here */
+ strncmp(rename_dst[j].two->path,
+ one_parent_path, onep_len))) { /* not to a subdir */
+ maybe_dir_rename = 0;
+ fprintf(stderr, "[DBG] ... tells not a bulk move\n");
+ break;
+ }
+ fprintf(stderr, "[DBG] ... not believed to prevent bulk move\n");
+ }
+ if (!maybe_dir_rename)
+ return 1;
+ /* try the other direction (dup code) */
+ for (j = two_idx-1; j >= 0; j--) {
+ if (strncmp(rename_dst[j].two->path,
+ one_parent_path, onep_len))
+ break; /* exhausted directory in this direction */
+ fprintf(stderr, "[DBG] leftover file %s in '%s'\n",
+ rename_dst[j].two->path, one_parent_path);
+ if (rename_dst[j].i_am_not_single || /* those were already here */
+ (rename_dst[j].pair &&
+ !strncmp(rename_dst[j].pair->one->path,
+ one_parent_path, onep_len) && /* renamed from here */
+ strncmp(rename_dst[j].two->path,
+ one_parent_path, onep_len))) { /* not to a subdir */
+ maybe_dir_rename = 0;
+ fprintf(stderr, "[DBG] ... tells not a bulk move\n");
+ break;
+ }
+ fprintf(stderr, "[DBG] ... not believed to prevent bulk move\n");
+ }
+ if (!maybe_dir_rename)
+ return 1;
+
+ /* Here we are in the case where a directory
+ * content was completely moved, but files
+ * were added to it afterwards. Proceed as
+ * for a simple bulk move. */
+ return 0;
+}
+
+/*
+ * Assumes candidate->one is a subdir of seen->one, mark 'seen' as
+ * discarded if candidate->two is outside seen->two. Also mark
+ * 'candidate' itself as discarded if the conflict implies so.
+ *
+ * Return 1 if 'seen' was discarded
+ */
+static int discard_if_outside(struct diff_bulk_rename *candidate,
+ struct diff_bulk_rename *seen) {
+ if (!prefixcmp(candidate->two->path, seen->two->path)) {
+ fprintf(stderr, "[DBG] 'dstpair' conforts 'seen'\n");
+ return 0;
+ } else {
+ fprintf(stderr, "[DBG] discarding %s -> %s from bulk moves (split into %s and %s)\n",
+ seen->one->path, seen->two->path,
+ candidate->two->path, seen->two->path);
+ seen->discarded = 1;
+ /* Need to discard dstpair as well, unless moving from
+ * a strict subdir of seen->one or to a strict subdir
+ * of seen->two */
+ if (!strcmp(seen->one->path, candidate->one->path) &&
+ prefixcmp(seen->two->path, candidate->two->path)) {
+ fprintf(stderr, "[DBG] ... and not adding self\n");
+ candidate->discarded = 1;
+ }
+ return 1;
+ }
+}
+
+/*
+ * Check if the rename specified by "dstpair" could cause a
+ * bulk move to be detected, record it in bulkmove_candidates if yes.
+ */
+static void check_one_bulk_move(struct diff_filepair *dstpair)
+{
+ char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX];
+
+ /* genuine new files (or believed to be so) */
+ if (!dstpair)
+ return;
+ /* dummy renames used by copy detection */
+ if (!strcmp(dstpair->one->path, dstpair->two->path))
+ return;
+
+ copy_dirname(one_parent_path, dstpair->one->path);
+ copy_dirname(two_parent_path, dstpair->two->path);
+
+ /* simple rename with no directory change */
+ if (!strcmp(one_parent_path, two_parent_path))
+ return;
+
+ fprintf(stderr, "[] %s -> %s ?\n", dstpair->one->path, dstpair->two->path);
+
+ /* loop up one_parent_path over successive parents */
+ // FIXME: also loop over two_parent_path prefixes
+ do {
+ struct diff_bulk_rename *seen;
+ int old_nr = bulkmove_candidates_nr;
+ struct diff_bulk_rename *candidate =
+ locate_bulkmove_candidate(one_parent_path, two_parent_path);
+ fprintf(stderr, "[[]] %s ...\n", one_parent_path);
+ if (old_nr == bulkmove_candidates_nr) {
+ fprintf(stderr, "[DBG] already seen\n");
+ return;
+ }
+
+ /* After this commit, are there any files still under one_parent_path ?
+ * Any file left would disqualifies this dir for bulk move.
+ */
+ if (dispatched_to_different_dirs(one_parent_path)) {
+ // FIXME: check overlap with discard_if_outside()
+ candidate->discarded = 1;
+ return;
+ }
+
+ /* walk up for one_parent_path prefixes */
+ for (seen = candidate-1; (seen >= bulkmove_candidates) &&
+ !prefixcmp(one_parent_path, seen->one->path); seen--) {
+ fprintf(stderr, "[DBG] ? %s -> %s\n", seen->one->path, seen->two->path);
+ /* subdir of "seen" dest dir ? */
+ if (discard_if_outside(candidate, seen))
+ continue;
+ }
+ /* look down for other moves from one_parent_path */
+ seen = candidate + 1;
+ if (seen != bulkmove_candidates + bulkmove_candidates_nr &&
+ !strcmp(one_parent_path, seen->one->path)) {
+ fprintf(stderr, "[DBG] ? %s -> %s (2)\n", seen->one->path, seen->two->path);
+ /* subdir of "seen" dest dir ? */
+ if (discard_if_outside(candidate, seen))
+ continue;
+ }
+
+ /* next parent if any */
+ copy_dirname(one_parent_path, one_parent_path);
+ } while (*one_parent_path);
+}
+
+/*
+ * Take all file renames recorded so far and check if they could cause
+ * a bulk move to be detected.
+ */
+static void diffcore_bulk_moves(void)
+{
+ int i;
+ for (i = 0; i < rename_dst_nr; i++)
+ check_one_bulk_move(rename_dst[i].pair);
+}
+
void diffcore_rename(struct diff_options *options)
{
int detect_rename = options->detect_rename;
@@ -424,6 +722,7 @@ void diffcore_rename(struct diff_options *options)
struct diff_score *mx;
int i, j, rename_count;
int num_create, num_src, dst_cnt;
+ struct diff_bulk_rename *candidate;
if (!minimum_score)
minimum_score = DEFAULT_RENAME_SCORE;
@@ -438,8 +737,7 @@ void diffcore_rename(struct diff_options *options)
continue; /* not interested */
else
locate_rename_dst(p->two, 1);
- }
- else if (!DIFF_FILE_VALID(p->two)) {
+ } else if (!DIFF_FILE_VALID(p->two)) {
/*
* If the source is a broken "delete", and
* they did not really want to get broken,
@@ -450,14 +748,23 @@ void diffcore_rename(struct diff_options *options)
if (p->broken_pair && !p->score)
p->one->rename_used++;
register_rename_src(p->one, p->score);
- }
- else if (detect_rename == DIFF_DETECT_COPY) {
- /*
- * Increment the "rename_used" score by
- * one, to indicate ourselves as a user.
- */
- p->one->rename_used++;
- register_rename_src(p->one, p->score);
+ } else {
+ if (detect_rename == DIFF_DETECT_COPY) {
+ /*
+ * Increment the "rename_used" score by
+ * one, to indicate ourselves as a user.
+ */
+ p->one->rename_used++;
+ register_rename_src(p->one, p->score);
+ }
+ if (DIFF_OPT_TST(options, DETECT_BULK_MOVES)) {
+ /* similarly, bulk move detection needs to
+ * see all files from second tree, but we don't
+ * want them to be matched against single sources.
+ */
+ // FIXME: check interaction with --find-copies-harder
+ locate_rename_dst(p->two, 1)->i_am_not_single = 1;
+ }
}
}
if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -509,6 +816,8 @@ void diffcore_rename(struct diff_options *options)
if (rename_dst[i].pair)
continue; /* dealt with exact match already. */
+ if (rename_dst[i].i_am_not_single)
+ continue; /* not looking for a match. */
m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
@@ -569,7 +878,30 @@ void diffcore_rename(struct diff_options *options)
/* At this point, we have found some renames and copies and they
* are recorded in rename_dst. The original list is still in *q.
*/
+
+ /* Now possibly factorize those renames and copies. */
+ if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+ diffcore_bulk_moves();
+
DIFF_QUEUE_CLEAR(&outq);
+
+ /* Now turn non-discarded bulkmove_candidates into real renames */
+ for (candidate = bulkmove_candidates;
+ candidate < bulkmove_candidates + bulkmove_candidates_nr; candidate++) {
+ struct diff_filepair* pair;
+ if (candidate->discarded)
+ continue;
+ /* visualize toplevel dir if needed */
+ if (!*candidate->one->path)
+ candidate->one->path = "./";
+ if (!*candidate->two->path)
+ candidate->two->path = "./";
+ pair = diff_queue(&outq, candidate->one, candidate->two);
+ pair->score = MAX_SCORE;
+ pair->renamed_pair = 1;
+ pair->is_bulkmove = 1;
+ }
+
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct diff_filepair *pair_to_free = NULL;
diff --git a/diffcore.h b/diffcore.h
index b8f1fde..6dab95b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
unsigned broken_pair : 1;
unsigned renamed_pair : 1;
unsigned is_unmerged : 1;
+ unsigned is_bulkmove : 1;
};
#define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
diff --git a/tree-diff.c b/tree-diff.c
index cd659c6..5d9f123 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
show_entry(opt, "+", t2, base, baselen);
return 1;
}
- if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
+ if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) &&
+ !DIFF_OPT_TST(opt, DETECT_BULK_MOVES) &&
+ !hashcmp(sha1, sha2) && mode1 == mode2)
return 0;
/*
--
1.7.2.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v7 3/3] [WIP] Allow hiding renames of individual files involved in a directory rename.
2010-10-23 21:06 [PATCH v7 0/3] Detection of directory renames Yann Dirson
2010-10-23 21:07 ` [PATCH v7 1/3] Introduce bulk-move detection in diffcore Yann Dirson
2010-10-23 21:07 ` [PATCH v7 2/3] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
@ 2010-10-23 21:07 ` Yann Dirson
2010-10-24 21:10 ` [PATCH] [RFC] Add --detect-bulk-moves output to unidiff format Yann Dirson
3 siblings, 0 replies; 11+ messages in thread
From: Yann Dirson @ 2010-10-23 21:07 UTC (permalink / raw)
To: git; +Cc: Yann Dirson, Yann Dirson
Once has identified groups of bulk-moved files, and then
the --hide-bulk-move-details flag hides those of the individual renames
which carry no other information (further name change, or content changes).
This makes it much easier to a human reader to spot content changes
in a commit that also moves a whole subtree.
Important note: unified diff output is not currently useful, since the "bulk move"
headers are not yet added by --detect-bulk-moves, but the redundant renames are
really removed.
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
diff.c | 7 +++++
diff.h | 3 ++
diffcore-rename.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++--
diffcore.h | 1 +
4 files changed, 76 insertions(+), 3 deletions(-)
diff --git a/diff.c b/diff.c
index 1d88281..3cc7a55 100644
--- a/diff.c
+++ b/diff.c
@@ -3193,6 +3193,13 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
if (!options->detect_rename)
options->detect_rename = DIFF_DETECT_RENAME;
}
+ else if (!strcmp(arg, "--hide-bulk-move-details")) {
+ DIFF_OPT_SET(options, HIDE_DIR_RENAME_DETAILS);
+ if (!DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+ DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+ if (!options->detect_rename)
+ options->detect_rename = DIFF_DETECT_RENAME;
+ }
else if (!strcmp(arg, "--follow"))
DIFF_OPT_SET(options, FOLLOW_RENAMES);
else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index b0d6fa6..7f132d0 100644
--- a/diff.h
+++ b/diff.h
@@ -79,6 +79,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
#define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
#define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
#define DIFF_OPT_DETECT_BULK_MOVES (1 << 28)
+#define DIFF_OPT_HIDE_DIR_RENAME_DETAILS (1 << 29)
#define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag)
#define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag)
@@ -268,6 +269,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
" try unchanged files as candidate for copy detection.\n" \
" --detect-bulk-moves\n" \
" detect wholesale directory renames.\n" \
+" --hide-bulk-move-details\n" \
+" hide renames of individual files in a directory rename.\n" \
" -l<n> limit rename attempts up to <n> paths.\n" \
" -O<file> reorder diffs according to the <file>.\n" \
" -S<string> find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index fa2ba7c..3a68807 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -503,6 +503,34 @@ static struct diff_bulk_rename *locate_bulkmove_candidate(const char *one_path,
}
/*
+ * Marks as such file_rename if it is made uninteresting by dir_rename.
+ * Returns -1 if the file_rename is outside of the range in which given
+ * renames concerned by dir_rename are to be found (ie. end of loop),
+ * 0 otherwise.
+ */
+static int maybe_mark_uninteresting(struct diff_rename_dst *file_rename,
+ struct diff_bulk_rename *dir_rename,
+ int one_len, int two_len)
+{
+ if (!file_rename->pair) /* file add */
+ return 0;
+ if (strncmp(file_rename->two->path,
+ dir_rename->two->path, two_len))
+ return -1;
+ if (strncmp(file_rename->pair->one->path,
+ dir_rename->one->path, one_len) ||
+ !basename_same(file_rename->pair->one, file_rename->two) ||
+ file_rename->pair->score != MAX_SCORE)
+ return 0;
+
+ file_rename->pair->uninteresting_rename = 1;
+ fprintf(stderr, "[DBG] %s* -> %s* makes %s -> %s uninteresting\n",
+ dir_rename->one->path, dir_rename->two->path,
+ file_rename->pair->one->path, file_rename->two->path);
+ return 0;
+}
+
+/*
* Copy dirname of src into dst, suitable to append a filename without
* an additional "/".
* Only handles relative paths since there is no absolute path in a git repo.
@@ -705,11 +733,44 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
* Take all file renames recorded so far and check if they could cause
* a bulk move to be detected.
*/
-static void diffcore_bulk_moves(void)
+static void diffcore_bulk_moves(int opt_hide_renames)
{
int i;
for (i = 0; i < rename_dst_nr; i++)
check_one_bulk_move(rename_dst[i].pair);
+
+ if (opt_hide_renames) {
+ /* flag as "uninteresting" those candidates hidden by dir move */
+ struct diff_bulk_rename *candidate;
+ for (candidate = bulkmove_candidates;
+ candidate < bulkmove_candidates + bulkmove_candidates_nr;
+ candidate++) {
+ int two_idx, i, one_len, two_len;
+ struct diff_rename_dst *two_sample;
+ if (candidate->discarded)
+ continue;
+
+ /* bisect to any entry within candidate dst dir */
+ two_sample = locate_rename_dst_dir(candidate->two->path);
+ if (!two_sample) {
+ die("PANIC: %s candidate of rename not in target tree (from %s)\n",
+ candidate->two->path, candidate->one->path);
+ }
+ two_idx = two_sample - rename_dst;
+
+ /* now remove extraneous 100% files inside. */
+ one_len = strlen(candidate->one->path);
+ two_len = strlen(candidate->two->path);
+ for (i = two_idx; i < rename_dst_nr; i++)
+ if (maybe_mark_uninteresting(rename_dst+i, candidate,
+ one_len, two_len) < 0)
+ break;
+ for (i = two_idx-1; i >= 0; i--)
+ if (maybe_mark_uninteresting(rename_dst+i, candidate,
+ one_len, two_len) < 0)
+ break;
+ }
+ }
}
void diffcore_rename(struct diff_options *options)
@@ -881,7 +942,7 @@ void diffcore_rename(struct diff_options *options)
/* Now possibly factorize those renames and copies. */
if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
- diffcore_bulk_moves();
+ diffcore_bulk_moves(DIFF_OPT_TST(options, HIDE_DIR_RENAME_DETAILS));
DIFF_QUEUE_CLEAR(&outq);
@@ -916,7 +977,8 @@ void diffcore_rename(struct diff_options *options)
struct diff_rename_dst *dst =
locate_rename_dst(p->two, 0);
if (dst && dst->pair) {
- diff_q(&outq, dst->pair);
+ if (!dst->pair->uninteresting_rename)
+ diff_q(&outq, dst->pair);
pair_to_free = p;
}
else
diff --git a/diffcore.h b/diffcore.h
index 6dab95b..a4eb8e1 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
unsigned broken_pair : 1;
unsigned renamed_pair : 1;
unsigned is_unmerged : 1;
+ unsigned uninteresting_rename : 1;
unsigned is_bulkmove : 1;
};
#define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
--
1.7.2.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH] [RFC] Add --detect-bulk-moves output to unidiff format.
2010-10-23 21:06 [PATCH v7 0/3] Detection of directory renames Yann Dirson
` (2 preceding siblings ...)
2010-10-23 21:07 ` [PATCH v7 3/3] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
@ 2010-10-24 21:10 ` Yann Dirson
3 siblings, 0 replies; 11+ messages in thread
From: Yann Dirson @ 2010-10-24 21:10 UTC (permalink / raw)
To: git; +Cc: Yann Dirson
Signed-off-by: Yann Dirson <ydirson@altern.org>
---
This patch is a complement to the v7 "Detection of directory renames"
series. It produces output like the sample below (taken from kernel
tree). Does that feel right ? Is it worth moving the output
production to a separate func filling an strbuf ?
|commit 86c6d4d0acc64543a485c11e197d0bd2c5ae0bb2
|Author: Ralf Baechle <ralf@linux-mips.org>
|Date: Thu Aug 5 13:25:55 2010 +0100
|
| MIPS: PNX833x: Move code one directory level up.
|
| It was sharing the nxp directory but no code with pnx8550 and will fit
| better into the new platform makefile scheme, if moved.
|
| Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|diff --git --detect-bulk-moves arch/mips/nxp/pnx833x/common/ arch/mips/pnx833x/common/
|similarity index 100%
|bulk move from arch/mips/nxp/pnx833x/common/*
|bulk move to arch/mips/pnx833x/common/
|diff --git --detect-bulk-moves arch/mips/nxp/pnx833x/stb22x/ arch/mips/pnx833x/stb22x/
|similarity index 100%
|bulk move from arch/mips/nxp/pnx833x/stb22x/*
|bulk move to arch/mips/pnx833x/stb22x/
|diff --git a/arch/mips/Makefile b/arch/mips/Makefile
|index 2ba20d9..0556bc4 100644
[...]
diff.c | 48 +++++++++++++++++++++++++++++++++++++-----------
1 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/diff.c b/diff.c
index 1d88281..8f2f67f 100644
--- a/diff.c
+++ b/diff.c
@@ -2659,6 +2659,7 @@ static void run_diff_cmd(const char *pgm,
const char *xfrm_msg = NULL;
int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score;
int must_show_header = 0;
+ int use_color;
if (!DIFF_OPT_TST(o, ALLOW_EXTERNAL))
pgm = NULL;
@@ -2668,14 +2669,36 @@ static void run_diff_cmd(const char *pgm,
pgm = drv->external;
}
+ /*
+ * don't use colors when the header is intended for an
+ * external diff driver
+ */
+ use_color = DIFF_OPT_TST(o, COLOR_DIFF) && !pgm;
+
+ if (p->is_bulkmove) {
+ const char *set = diff_get_color(use_color, DIFF_METAINFO);
+ const char *reset = diff_get_color(use_color, DIFF_RESET);
+ struct strbuf *msgbuf;
+ char *line_prefix = "";
+
+ if (o->output_prefix) {
+ msgbuf = o->output_prefix(o, o->output_prefix_data);
+ line_prefix = msgbuf->buf;
+ }
+ fprintf(o->file, "%s%sdiff --git --detect-bulk-moves %s %s%s\n",
+ line_prefix, set, one->path, two->path, reset);
+ fprintf(o->file, "%s%ssimilarity index %d%%%s\n",
+ line_prefix, set, similarity_index(p), reset);
+ fprintf(o->file, "%s%sbulk move from %s*%s\n",
+ line_prefix, set, one->path, reset);
+ fprintf(o->file, "%s%sbulk move to %s%s\n",
+ line_prefix, set, two->path, reset);
+ return;
+ }
+
if (msg) {
- /*
- * don't use colors when the header is intended for an
- * external diff driver
- */
fill_metainfo(msg, name, other, one, two, o, p,
- &must_show_header,
- DIFF_OPT_TST(o, COLOR_DIFF) && !pgm);
+ &must_show_header, use_color);
xfrm_msg = msg->len ? msg->buf : NULL;
}
@@ -2748,8 +2771,10 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
return;
}
- diff_fill_sha1_info(one);
- diff_fill_sha1_info(two);
+ if (!p->is_bulkmove) {
+ diff_fill_sha1_info(one);
+ diff_fill_sha1_info(two);
+ }
if (!pgm &&
DIFF_FILE_VALID(one) && DIFF_FILE_VALID(two) &&
@@ -3529,9 +3554,10 @@ static void diff_flush_patch(struct diff_filepair *p, struct diff_options *o)
if (diff_unmodified_pair(p))
return;
- if ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
- (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode)))
- return; /* no tree diffs in patch format */
+ if (!p->is_bulkmove &&
+ ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
+ (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode))))
+ return; /* no tree diffs in patch format, except for bulk moves */
run_diff(p, o);
}
--
1.7.2.3
^ permalink raw reply related [flat|nested] 11+ messages in thread