* [PATCH v9] Detection of directory renames
@ 2010-12-09 21:38 Yann Dirson
2010-12-09 21:38 ` [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename Yann Dirson
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
To: git
This series requires:
- v6 of "generalizing sorted-array handling" series
- Jonathan Nieder's "compat: add memrchr()" patch,
aka. mid:<20101015051750.GA21830@burratino>
Changes since v8:
* use of the new sorted-array.h, which gives the main patch a diet
* split of debug func into its own patch
Next iteration will be merging the bulk-rm detection into this series.
Interested eyes can get a preview at t/* branches on
http://repo.or.cz/w/git/ydirson.git
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename.
2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
2010-12-09 21:38 ` [PATCH 2/6] Introduce bulk-move detection in diffcore Yann Dirson
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
To: git; +Cc: Yann Dirson
This is an optional debug useful while developping new builk-* features.
Signed-off-by: Yann Dirson <ydirson@altern.org>
---
diffcore-rename.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index f7afdeb..e16fdeb 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -7,6 +7,22 @@
#include "hash.h"
#include "sorted-array.h"
+#define DEBUG_BULKMOVE 0
+
+#if DEBUG_BULKMOVE
+#define debug_bulkmove(args) __debug_bulkmove args
+void __debug_bulkmove(const char *fmt, ...)
+{
+ va_list ap;
+ va_start(ap, fmt);
+ fprintf(stderr, "[DBG] ");
+ vfprintf(stderr, fmt, ap);
+ va_end(ap);
+}
+#else
+#define debug_bulkmove(args) do { /*nothing */ } while (0)
+#endif
+
/* Table of rename/copy destinations */
struct diff_rename_dst {
--
1.7.2.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/6] Introduce bulk-move detection in diffcore.
2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
2010-12-09 21:38 ` [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
2010-12-09 21:38 ` [PATCH 3/6] Raw diff output format for bulk moves Yann Dirson
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
To: git; +Cc: Yann Dirson, Yann Dirson
This feature tries to group together files moving from and to
identical directories - a common case being directory renames.
This only adds the detection logic. The output of raw diff is displayed
as "Rnnn a/ b/", and unified diff does not display them at all. Output
formats will be refined later in the series.
It is implemented as a new pass in diffcore-rename, occuring after the
file renames get detected, grouping those renames looking like a move
of a full directory into some other place. It is activated by the new
--detect-bulk-moves diffcore flag.
Possible optimisations to this code include:
* avoid use of i_am_not_single by using a separate list
* use a more informative prefixcmp to avoid strcmp calls
eg. in discard_if_outside()
* optimize for bulk insertions (avoid useless successive memmove's)
Other future developements to be made on top of this include:
* detect bulk removals (well, that one is rather a subset than a layer above),
and possibly bulk additions
* detect bulk copies
* detect inexact bulk-moves/copies (where some files were not moved, or were
moved to a different place) - problem of computing a similarity score
* display as such the special case of directory move/rename
* application of such new diffs: issue a conflict, or just a warning ?
* teach git-svn (and others ?) to make use of that flag
* handle new conflict type "bulk-move/add"
* detect "directory splits" as well
* use inexact dir renames to bump score of below-threshold renames
from/to same locations
* support other types of bluk-grouping, like prefixes (see eg. kernel
5d1e859c), and maybe config-specified patterns
* add yours here
This patch has been improved by the following contributions:
- Jonathan Nieder: better implementation of copy_dirname()
- Jonathan Nieder: portable implementation of memrchr() in another patch
- Junio C Hamano: split individual renames hiding under control of another flag
- Junio C Hamano: coding style issues
- Junio C Hamano: better examples
- Ævar Arnfjörð Bjarmason: Don't use C99 comments.
- Jonathan Nieder: just too many other helpful suggestions to list them all
Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Thanks-to: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
Documentation/diff-options.txt | 4 +
Documentation/gitdiffcore.txt | 12 ++
diff-lib.c | 6 +-
diff.c | 5 +
diff.h | 3 +
diffcore-rename.c | 322 ++++++++++++++++++++++++++++++++++++++--
diffcore.h | 1 +
tree-diff.c | 4 +-
8 files changed, 340 insertions(+), 17 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f3e9538..186cd6f 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -245,6 +245,10 @@ endif::git-log[]
delete/add pair to be a rename if more than 90% of the file
hasn't changed.
+--detect-bulk-moves::
+ Detect bulk move of all files of a directory into a
+ different one.
+
-C[<n>]::
--detect-copies[=<n>]::
Detect copies as well as renames. See also `--find-copies-harder`.
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 6af29a4..93111ac 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -175,6 +175,18 @@ the expense of making it slower. Without `\--find-copies-harder`,
'git diff-{asterisk}' commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.
+Bulk move of all files of a directory into a different one can get
+detected using the `\--detect-bulk-moves` option. This adds an
+additional pass on top of the results of per-file rename detection.
+They are reported with NULL SHA1 id, in addition to the file renames:
+
+------------------------------------------------
+:040000 040000 0000000... 0000000... R100 foo/ bar/
+:100644 100644 0123456... 1234567... R090 foo/file0 bar/file3
+:100644 100644 2345678... 2345678... R100 foo/file1 bar/file1
+:100644 100644 3456789... 3456789... R100 foo/file2 bar/file2
+------------------------------------------------
+
diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
--------------------------------------------------------------------
diff --git a/diff-lib.c b/diff-lib.c
index 392ce2b..5ec3ddc 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
ce_option, &dirty_submodule);
if (!changed && !dirty_submodule) {
ce_mark_uptodate(ce);
- if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+ if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+ !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
continue;
}
oldmode = ce->ce_mode;
@@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs,
oldmode = old->ce_mode;
if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule &&
- !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+ !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+ !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
return 0;
diff_change(&revs->diffopt, oldmode, mode,
diff --git a/diff.c b/diff.c
index db5e844..d64ae44 100644
--- a/diff.c
+++ b/diff.c
@@ -3200,6 +3200,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
DIFF_OPT_SET(options, REVERSE_DIFF);
else if (!strcmp(arg, "--find-copies-harder"))
DIFF_OPT_SET(options, FIND_COPIES_HARDER);
+ else if (!strcmp(arg, "--detect-bulk-moves")) {
+ DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+ if (!options->detect_rename)
+ options->detect_rename = DIFF_DETECT_RENAME;
+ }
else if (!strcmp(arg, "--follow"))
DIFF_OPT_SET(options, FOLLOW_RENAMES);
else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index 0083d92..1e2506c 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
#define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
#define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
#define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DETECT_BULK_MOVES (1 << 28)
#define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag)
#define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag)
@@ -265,6 +266,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
" -C detect copies.\n" \
" --find-copies-harder\n" \
" try unchanged files as candidate for copy detection.\n" \
+" --detect-bulk-moves\n" \
+" detect moves of all files of a single directory.\n" \
" -l<n> limit rename attempts up to <n> paths.\n" \
" -O<file> reorder diffs according to the <file>.\n" \
" -S<string> find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index e16fdeb..44df490 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -28,6 +28,7 @@ void __debug_bulkmove(const char *fmt, ...)
struct diff_rename_dst {
struct diff_filespec *two;
struct diff_filepair *pair;
+ unsigned i_am_not_single:1; /* does not look for a match, only here to be looked at */
};
static int rename_dst_cmp(struct diff_filespec *ref_spec, struct diff_rename_dst *elem)
@@ -39,14 +40,23 @@ static void rename_dst_init(struct diff_rename_dst *elem, struct diff_filespec *
elem->two = alloc_filespec(ref_spec->path);
fill_filespec(elem->two, ref_spec->sha1, ref_spec->mode);
elem->pair = NULL;
+ elem->i_am_not_single = 0;
}
declare_sorted_array(static, struct diff_rename_dst, rename_dst);
declare_sorted_array_search_elem(static, struct diff_rename_dst, locate_rename_dst,
struct diff_filespec *,
rename_dst, rename_dst_cmp);
-declare_sorted_array_insert_checkbool(static, struct diff_rename_dst, register_rename_dst,
- struct diff_filespec *, _gen_locate_rename_dst,
- rename_dst, rename_dst_cmp, rename_dst_init);
+declare_sorted_array_insert_elem(static, struct diff_rename_dst, register_rename_dst,
+ struct diff_filespec *, _gen_locate_rename_dst,
+ rename_dst, rename_dst_cmp, rename_dst_init);
+
+static int rename_dst_dircmp(const char *ref_dirname, struct diff_rename_dst *elem)
+{
+ // FIXME: calls strlen many times - but maybe the compiler optimizes ?
+ return strncmp(ref_dirname, elem->two->path, strlen(ref_dirname));
+}
+declare_sorted_array_search_elem(static, struct diff_rename_dst, locate_rename_dst_dir,
+ const char *, rename_dst, rename_dst_dircmp);
/* Table of rename/copy src files */
@@ -361,8 +371,11 @@ static int find_exact_renames(void)
for (i = 0; i < rename_src_nr; i++)
insert_file_table(&file_table, -1, i, rename_src[i].one);
- for (i = 0; i < rename_dst_nr; i++)
+ for (i = 0; i < rename_dst_nr; i++) {
+ if (rename_dst[i].i_am_not_single)
+ continue;
insert_file_table(&file_table, 1, i, rename_dst[i].two);
+ }
/* Find the renames */
i = for_each_hash(&file_table, find_same_files);
@@ -389,6 +402,253 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
m[worst] = *o;
}
+struct diff_bulk_rename {
+ struct diff_filespec *one;
+ struct diff_filespec *two;
+ int discarded;
+};
+
+struct diff_bulk_rename_candidate {
+ char one_path[PATH_MAX];
+ char two_path[PATH_MAX];
+};
+static int bulkmove_candidates_cmp(struct diff_bulk_rename_candidate *ref_candidate,
+ struct diff_bulk_rename *elem)
+{
+ int cmp = strcmp(ref_candidate->one_path, elem->one->path);
+ if (!cmp)
+ cmp = strcmp(ref_candidate->two_path, elem->two->path);
+ return cmp;
+}
+static void bulkmove_candidates_init(struct diff_bulk_rename *elem,
+ struct diff_bulk_rename_candidate *ref_candidate)
+{
+ elem->one = alloc_filespec(ref_candidate->one_path);
+ fill_filespec(elem->one, null_sha1, S_IFDIR);
+ elem->two = alloc_filespec(ref_candidate->two_path);
+ fill_filespec(elem->two, null_sha1, S_IFDIR);
+ elem->discarded = 0;
+}
+declare_sorted_array(static, struct diff_bulk_rename, bulkmove_candidates);
+declare_sorted_array_insertonly_elem(static, struct diff_bulk_rename, register_bulkmove_candidate,
+ struct diff_bulk_rename_candidate *, bulkmove_candidates,
+ bulkmove_candidates_cmp, bulkmove_candidates_init);
+
+/*
+ * Copy dirname of src into dst, suitable to append a filename without
+ * an additional "/".
+ * Only handles relative paths since there is no absolute path in a git repo.
+ * Writes "" when there is no "/" in src.
+ * May overwrite more chars than really needed, if src ends with a "/".
+ * Supports in-place modification of src by passing dst == src.
+ */
+static const char *copy_dirname(char *dst, const char *src)
+{
+ size_t len = strlen(src);
+ const char *slash;
+ char *end;
+
+ if (len > 0 && src[len - 1] == '/')
+ /* Trailing slash. Ignore it. */
+ len--;
+
+ slash = memrchr(src, '/', len);
+ if (!slash) {
+ *dst = '\0';
+ return dst;
+ }
+
+ if (dst == src)
+ dst[slash - src + 1] = '\0';
+ else {
+ end = mempcpy(dst, src, slash - src + 1);
+ *end = '\0';
+ }
+ return dst;
+}
+
+// FIXME: leaks like hell.
+/* See if the fact that one_leftover exists under one_parent_path in
+ * dst tree should disqualify one_parent_path from bulkmove eligibility.
+ * Return 1 if it disqualifies, 0 if it is OK.
+ */
+static int dispatched_to_different_dirs(const char *one_parent_path)
+{
+ /* this might be a dir split, or files added
+ * after the bulk move, or just an isolated
+ * rename */
+ int two_idx, j, onep_len, maybe_dir_rename;
+ struct diff_rename_dst *one_leftover =
+ one_leftover = locate_rename_dst_dir(one_parent_path);
+
+ if (!one_leftover)
+ return 0;
+
+ /* try to see if it is a file added after the bulk move */
+ two_idx = one_leftover - rename_dst;
+ onep_len = strlen(one_parent_path);
+ maybe_dir_rename = 1;
+
+ /* check no leftover file was already here before */
+ for (j = two_idx; j < rename_dst_nr; j++) {
+ if (strncmp(rename_dst[j].two->path,
+ one_parent_path, onep_len))
+ break; /* exhausted directory in this direction */
+ debug_bulkmove(("leftover file %s in %s\n",
+ rename_dst[j].two->path, one_parent_path));
+ if (rename_dst[j].i_am_not_single || /* those were already here */
+ (rename_dst[j].pair &&
+ !strncmp(rename_dst[j].pair->one->path,
+ one_parent_path, onep_len) && /* renamed from here */
+ strncmp(rename_dst[j].two->path,
+ one_parent_path, onep_len))) { /* not to a subdir */
+ maybe_dir_rename = 0;
+ debug_bulkmove(("... tells not a bulk move\n"));
+ break;
+ }
+ debug_bulkmove(("... not believed to prevent bulk move\n"));
+ }
+ if (!maybe_dir_rename)
+ return 1;
+ /* try the other direction (dup code) */
+ for (j = two_idx-1; j >= 0; j--) {
+ if (strncmp(rename_dst[j].two->path,
+ one_parent_path, onep_len))
+ break; /* exhausted directory in this direction */
+ debug_bulkmove(("leftover file %s in '%s'\n",
+ rename_dst[j].two->path, one_parent_path));
+ if (rename_dst[j].i_am_not_single || /* those were already here */
+ (rename_dst[j].pair &&
+ !strncmp(rename_dst[j].pair->one->path,
+ one_parent_path, onep_len) && /* renamed from here */
+ strncmp(rename_dst[j].two->path,
+ one_parent_path, onep_len))) { /* not to a subdir */
+ maybe_dir_rename = 0;
+ debug_bulkmove(("... tells not a bulk move\n"));
+ break;
+ }
+ debug_bulkmove(("... not believed to prevent bulk move\n"));
+ }
+ if (!maybe_dir_rename)
+ return 1;
+
+ /* Here we are in the case where a directory
+ * content was completely moved, but files
+ * were added to it afterwards. Proceed as
+ * for a simple bulk move. */
+ return 0;
+}
+
+/*
+ * Assumes candidate->one is a subdir of seen->one, mark 'seen' as
+ * discarded if candidate->two is outside seen->two. Also mark
+ * 'candidate' itself as discarded if the conflict implies so.
+ *
+ * Return 1 if 'seen' was discarded
+ */
+static int discard_if_outside(struct diff_bulk_rename *candidate,
+ struct diff_bulk_rename *seen)
+{
+ if (!prefixcmp(candidate->two->path, seen->two->path)) {
+ debug_bulkmove((" 'dstpair' conforts 'seen'\n"));
+ return 0;
+ }
+
+ debug_bulkmove(("discarding %s -> %s from bulk moves (split into %s and %s)\n",
+ seen->one->path, seen->two->path,
+ candidate->two->path, seen->two->path));
+ seen->discarded = 1;
+ /* Need to discard dstpair as well, unless moving from
+ * a strict subdir of seen->one or to a strict subdir
+ * of seen->two */
+ if (!strcmp(seen->one->path, candidate->one->path) &&
+ prefixcmp(seen->two->path, candidate->two->path)) {
+ debug_bulkmove(("... and not adding self\n"));
+ candidate->discarded = 1;
+ }
+ return 1;
+}
+
+/*
+ * Check if the rename specified by "dstpair" could cause a
+ * bulk move to be detected, record it in bulkmove_candidates if yes.
+ */
+static void check_one_bulk_move(struct diff_filepair *dstpair)
+{
+ struct diff_bulk_rename_candidate rawcandidate;
+
+ /* genuine new files (or believed to be so) */
+ if (!dstpair)
+ return;
+ /* dummy renames used by copy detection */
+ if (!strcmp(dstpair->one->path, dstpair->two->path))
+ return;
+
+ copy_dirname(rawcandidate.one_path, dstpair->one->path);
+ copy_dirname(rawcandidate.two_path, dstpair->two->path);
+
+ /* simple rename with no directory change */
+ if (!strcmp(rawcandidate.one_path, rawcandidate.two_path))
+ return;
+
+ debug_bulkmove(("[] %s -> %s ?\n", dstpair->one->path, dstpair->two->path));
+
+ /* loop up rawcandidate.one_path over successive parents */
+ // FIXME: also loop over rawcandidate.two_path prefixes
+ do {
+ struct diff_bulk_rename *seen;
+ int old_nr = bulkmove_candidates_nr;
+ struct diff_bulk_rename *candidate =
+ register_bulkmove_candidate(&rawcandidate);
+ debug_bulkmove(("[[]] %s ...\n", rawcandidate.one_path));
+ if (old_nr == bulkmove_candidates_nr) {
+ debug_bulkmove((" already seen\n"));
+ return;
+ }
+
+ /* After this commit, are there any files still under rawcandidate.one_path ?
+ * Any file left would disqualifies this dir for bulk move.
+ */
+ if (dispatched_to_different_dirs(rawcandidate.one_path)) {
+ // FIXME: check overlap with discard_if_outside()
+ candidate->discarded = 1;
+ return;
+ }
+
+ /* walk up for rawcandidate.one_path prefixes */
+ for (seen = candidate-1; (seen >= bulkmove_candidates) &&
+ !prefixcmp(rawcandidate.one_path, seen->one->path); seen--) {
+ debug_bulkmove((" ? %s -> %s\n", seen->one->path, seen->two->path));
+ /* subdir of "seen" dest dir ? */
+ if (discard_if_outside(candidate, seen))
+ continue;
+ }
+ /* look down for other moves from rawcandidate.one_path */
+ seen = candidate + 1;
+ if (seen != bulkmove_candidates + bulkmove_candidates_nr &&
+ !strcmp(rawcandidate.one_path, seen->one->path)) {
+ debug_bulkmove((" ? %s -> %s (2)\n", seen->one->path, seen->two->path));
+ /* subdir of "seen" dest dir ? */
+ if (discard_if_outside(candidate, seen))
+ continue;
+ }
+
+ /* next parent if any */
+ copy_dirname(rawcandidate.one_path, rawcandidate.one_path);
+ } while (*rawcandidate.one_path);
+}
+
+/*
+ * Take all file renames recorded so far and check if they could cause
+ * a bulk move to be detected.
+ */
+static void diffcore_bulk_moves(void)
+{
+ int i;
+ for (i = 0; i < rename_dst_nr; i++)
+ check_one_bulk_move(rename_dst[i].pair);
+}
+
void diffcore_rename(struct diff_options *options)
{
int detect_rename = options->detect_rename;
@@ -399,6 +659,7 @@ void diffcore_rename(struct diff_options *options)
struct diff_score *mx;
int i, j, rename_count;
int num_create, num_src, dst_cnt;
+ struct diff_bulk_rename *candidate;
if (!minimum_score)
minimum_score = DEFAULT_RENAME_SCORE;
@@ -413,8 +674,7 @@ void diffcore_rename(struct diff_options *options)
continue; /* not interested */
else
register_rename_dst(p->two);
- }
- else if (!DIFF_FILE_VALID(p->two)) {
+ } else if (!DIFF_FILE_VALID(p->two)) {
/*
* If the source is a broken "delete", and
* they did not really want to get broken,
@@ -425,14 +685,23 @@ void diffcore_rename(struct diff_options *options)
if (p->broken_pair && !p->score)
p->one->rename_used++;
register_rename_src(p);
- }
- else if (detect_rename == DIFF_DETECT_COPY) {
- /*
- * Increment the "rename_used" score by
- * one, to indicate ourselves as a user.
- */
- p->one->rename_used++;
- register_rename_src(p);
+ } else {
+ if (detect_rename == DIFF_DETECT_COPY) {
+ /*
+ * Increment the "rename_used" score by
+ * one, to indicate ourselves as a user.
+ */
+ p->one->rename_used++;
+ register_rename_src(p);
+ }
+ if (DIFF_OPT_TST(options, DETECT_BULK_MOVES)) {
+ /* similarly, bulk move detection needs to
+ * see all files from second tree, but we don't
+ * want them to be matched against single sources.
+ */
+ // FIXME: check interaction with --find-copies-harder
+ register_rename_dst(p->two)->i_am_not_single = 1;
+ }
}
}
if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -484,6 +753,8 @@ void diffcore_rename(struct diff_options *options)
if (rename_dst[i].pair)
continue; /* dealt with exact match already. */
+ if (rename_dst[i].i_am_not_single)
+ continue; /* not looking for a match. */
m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
@@ -544,7 +815,30 @@ void diffcore_rename(struct diff_options *options)
/* At this point, we have found some renames and copies and they
* are recorded in rename_dst. The original list is still in *q.
*/
+
+ /* Now possibly factorize those renames and copies. */
+ if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+ diffcore_bulk_moves();
+
DIFF_QUEUE_CLEAR(&outq);
+
+ /* Now turn non-discarded bulkmove_candidates into real renames */
+ for (candidate = bulkmove_candidates;
+ candidate < bulkmove_candidates + bulkmove_candidates_nr; candidate++) {
+ struct diff_filepair* pair;
+ if (candidate->discarded)
+ continue;
+ /* visualize toplevel dir if needed */
+ if (!*candidate->one->path)
+ candidate->one->path = "./";
+ if (!*candidate->two->path)
+ candidate->two->path = "./";
+ pair = diff_queue(&outq, candidate->one, candidate->two);
+ pair->score = MAX_SCORE;
+ pair->renamed_pair = 1;
+ pair->is_bulkmove = 1;
+ }
+
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct diff_filepair *pair_to_free = NULL;
diff --git a/diffcore.h b/diffcore.h
index b8f1fde..6dab95b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
unsigned broken_pair : 1;
unsigned renamed_pair : 1;
unsigned is_unmerged : 1;
+ unsigned is_bulkmove : 1;
};
#define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
diff --git a/tree-diff.c b/tree-diff.c
index 12c9a88..89cedd4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
show_entry(opt, "+", t2, base, baselen);
return 1;
}
- if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
+ if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) &&
+ !DIFF_OPT_TST(opt, DETECT_BULK_MOVES) &&
+ !hashcmp(sha1, sha2) && mode1 == mode2)
return 0;
/*
--
1.7.2.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/6] Raw diff output format for bulk moves.
2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
2010-12-09 21:38 ` [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename Yann Dirson
2010-12-09 21:38 ` [PATCH 2/6] Introduce bulk-move detection in diffcore Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
2010-12-09 21:38 ` [PATCH 4/6] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
To: git; +Cc: Yann Dirson, Yann Dirson
To distinguish the general bulk-move case (where the destination
directory was pre-existing) from the directory-rename case (where it
was not), the output of raw diff is displayed as "Rnnn a/* b/". Those
cannot be confused with renames of files named "whatever/*" with a
literal star character, from the full-zero SHA1's.
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
Documentation/gitdiffcore.txt | 2 +-
diff.c | 9 ++++++++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 93111ac..2538dc0 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -181,7 +181,7 @@ additional pass on top of the results of per-file rename detection.
They are reported with NULL SHA1 id, in addition to the file renames:
------------------------------------------------
-:040000 040000 0000000... 0000000... R100 foo/ bar/
+:040000 040000 0000000... 0000000... R100 foo/* bar/
:100644 100644 0123456... 1234567... R090 foo/file0 bar/file3
:100644 100644 2345678... 2345678... R100 foo/file1 bar/file1
:100644 100644 3456789... 3456789... R100 foo/file2 bar/file2
diff --git a/diff.c b/diff.c
index d64ae44..0694d7f 100644
--- a/diff.c
+++ b/diff.c
@@ -3499,7 +3499,14 @@ static void diff_flush_raw(struct diff_filepair *p, struct diff_options *opt)
if (p->status == DIFF_STATUS_COPIED ||
p->status == DIFF_STATUS_RENAMED) {
const char *name_a, *name_b;
- name_a = p->one->path;
+ if (p->is_bulkmove) {
+ /* append "*" to the first dirname */
+ char buf[PATH_MAX];
+ char *next = memccpy(buf, p->one->path, '\0', PATH_MAX);
+ next[-1] = '*'; *next = '\0';
+ name_a = buf;
+ } else
+ name_a = p->one->path;
name_b = p->two->path;
strip_prefix(opt->prefix_length, &name_a, &name_b);
write_name_quoted(name_a, opt->file, inter_name_termination);
--
1.7.2.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 4/6] Add testcases for the --detect-bulk-moves diffcore flag.
2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
` (2 preceding siblings ...)
2010-12-09 21:38 ` [PATCH 3/6] Raw diff output format for bulk moves Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
2010-12-09 21:38 ` [PATCH 5/6] Unified diff output format for bulk moves Yann Dirson
2010-12-09 21:38 ` [PATCH 6/6] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
To: git; +Cc: Yann Dirson, Yann Dirson
This notably includes a couple of tests for cases known not to be
working correctly yet.
This patch has been improved by the following contributions:
- Jonathan Nieder: reworked style of test script
- Jonathan Nieder: use "git commit" in test instead of only plumbing,
and use test_tick
- Sverre Rabbelier: anonymize hashes
Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
t/t4046-diff-bulk-move.sh | 296 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 296 insertions(+), 0 deletions(-)
create mode 100755 t/t4046-diff-bulk-move.sh
diff --git a/t/t4046-diff-bulk-move.sh b/t/t4046-diff-bulk-move.sh
new file mode 100755
index 0000000..4b1c78e
--- /dev/null
+++ b/t/t4046-diff-bulk-move.sh
@@ -0,0 +1,296 @@
+#!/bin/sh
+#
+# Copyright (c) 2008,2010 Yann Dirson
+# Copyright (c) 2005 Junio C Hamano
+#
+
+# TODO for dir renames:
+# * two dirs or more moving all their files to a single dir
+# * simultaneous bulkmove and rename
+
+test_description='Test rename factorization in diff engine.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+test_expect_success 'setup' '
+ git commit --allow-empty -m "original empty commit"
+
+ mkdir a &&
+ printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 &&
+ sed <a/path0 >a/path1 s/Line/Record/ &&
+ sed <a/path0 >a/path2 s/Line/Stuff/ &&
+ sed <a/path0 >a/path3 s/Line/Blurb/ &&
+
+ git update-index --add a/path* &&
+ test_tick &&
+ git commit -m "original set of files" &&
+
+ : rename the directory &&
+ git mv a b
+'
+test_expect_success 'diff-index --detect-bulk-moves after directory move.' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# a/* b/
+ :100644 100644 X X R# a/path0 b/path0
+ :100644 100644 X X R# a/path1 b/path1
+ :100644 100644 X X R# a/path2 b/path2
+ :100644 100644 X X R# a/path3 b/path3
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup non-100% rename' '
+ echo "Line 16" >>b/path0 &&
+ git mv b/path2 b/2path &&
+ git rm -f b/path3 &&
+ echo anything >b/path100 &&
+ git add b/path100
+'
+test_expect_success 'diff-index --detect-bulk-moves after content changes.' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# a/* b/
+ :100644 000000 X X D# a/path3
+ :100644 100644 X X R# a/path2 b/2path
+ :100644 100644 X X R# a/path0 b/path0
+ :100644 100644 X X R# a/path1 b/path1
+ :000000 100644 X X A# b/path100
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move that is not directory move' '
+ git reset -q --hard &&
+
+ mkdir c &&
+ (
+ for i in 0 1 2; do
+ cp a/path$i c/apath$i || exit
+ done
+ ) &&
+ git update-index --add c/apath* &&
+ test_tick &&
+ git commit -m "first set of changes" &&
+
+ git mv c/* a/
+'
+test_expect_success 'diff-index --detect-bulk-moves without full-dir rename.' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# c/* a/
+ :100644 100644 X X R# c/apath0 a/apath0
+ :100644 100644 X X R# c/apath1 a/apath1
+ :100644 100644 X X R# c/apath2 a/apath2
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move with new file in source dir' '
+ echo > c/anotherpath "How much wood?" &&
+ git update-index --add c/another*
+'
+test_expect_success 'diff-index --detect-bulk-moves with new file in source dir.' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# c/* a/
+ :100644 100644 X X R# c/apath0 a/apath0
+ :100644 100644 X X R# c/apath1 a/apath1
+ :100644 100644 X X R# c/apath2 a/apath2
+ :000000 100644 X X A# c/anotherpath
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move with interfering copy' '
+ rm c/anotherpath &&
+ git update-index --remove c/anotherpath &&
+ mkdir b &&
+ cp a/apath0 b/apath9 &&
+ echo >> a/apath0 "more" &&
+ git update-index --add a/apath0 b/apath9
+'
+# scores select the "wrong" one as "moved" (only a suboptimal detection)
+test_expect_failure 'diff-index --detect-bulk-moves with interfering copy.' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# c/* a/
+ :100644 100644 X X R# c/apath0 a/apath0
+ :100644 100644 X X R# c/apath1 a/apath1
+ :100644 100644 X X R# c/apath2 a/apath2
+ :100644 100644 X X C# c/apath0 b/apath9
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move to toplevel' '
+ git reset -q --hard &&
+ git mv c/* .
+'
+test_expect_success 'diff-index --detect-bulk-moves bulk move to toplevel.' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# c/* ./
+ :100644 100644 X X R# c/apath0 apath0
+ :100644 100644 X X R# c/apath1 apath1
+ :100644 100644 X X R# c/apath2 apath2
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup move including a subdir, with some content changes' '
+ git reset -q --hard &&
+ mv c a/ &&
+ git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2 &&
+ test_tick &&
+ git commit -m "move as subdir" &&
+
+ git mv a b &&
+ echo foo >>b/c/apath0 &&
+ git update-index --add b/c/apath*
+'
+test_expect_success 'diff-index --detect-bulk-moves on a move including a subdir.' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# a/* b/
+ :040000 040000 X X R# a/c/* b/c/
+ :100644 100644 X X R# a/c/apath0 b/c/apath0
+ :100644 100644 X X R# a/c/apath1 b/c/apath1
+ :100644 100644 X X R# a/c/apath2 b/c/apath2
+ :100644 100644 X X R# a/path0 b/path0
+ :100644 100644 X X R# a/path1 b/path1
+ :100644 100644 X X R# a/path2 b/path2
+ :100644 100644 X X R# a/path3 b/path3
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of only a subdir' '
+ git reset -q --hard &&
+ : rename a subdirectory of a/. &&
+ git mv a/c a/d
+'
+test_expect_success 'moving a subdir only' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# a/c/* a/d/
+ :100644 100644 X X R# a/c/apath0 a/d/apath0
+ :100644 100644 X X R# a/c/apath1 a/d/apath1
+ :100644 100644 X X R# a/c/apath2 a/d/apath2
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup move without a subdir' '
+ git reset -q --hard &&
+ mkdir b &&
+ : rename files in the directory but not subdir. &&
+ git mv a/path* b/
+'
+test_expect_success 'moving files but not subdirs is not mistaken for dir move' '
+ cat >expected <<-EOF &&
+ :100644 100644 X X R# a/path0 b/path0
+ :100644 100644 X X R# a/path1 b/path1
+ :100644 100644 X X R# a/path2 b/path2
+ :100644 100644 X X R# a/path3 b/path3
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of files and subdirs to different places' '
+ git reset -q --hard &&
+ git mv a/c b &&
+ git mv a d
+'
+test_expect_success 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# a/c/* b/
+ :100644 100644 X X R# a/c/apath0 b/apath0
+ :100644 100644 X X R# a/c/apath1 b/apath1
+ :100644 100644 X X R# a/c/apath2 b/apath2
+ :100644 100644 X X R# a/path0 d/path0
+ :100644 100644 X X R# a/path1 d/path1
+ :100644 100644 X X R# a/path2 d/path2
+ :100644 100644 X X R# a/path3 d/path3
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+# the same with different ordering
+test_expect_success 'setup move of files and subdirs to different places' '
+ git mv d 0
+'
+test_expect_success 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# a/c/* b/
+ :100644 100644 X X R# a/path0 0/path0
+ :100644 100644 X X R# a/path1 0/path1
+ :100644 100644 X X R# a/path2 0/path2
+ :100644 100644 X X R# a/path3 0/path3
+ :100644 100644 X X R# a/c/apath0 b/apath0
+ :100644 100644 X X R# a/c/apath1 b/apath1
+ :100644 100644 X X R# a/c/apath2 b/apath2
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of dir with only subdirs' '
+ git reset -q --hard &&
+ mkdir a/b &&
+ mv a/path* a/b/ &&
+ git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/b/path* &&
+ test_tick &&
+ git commit -m "move all toplevel files down one level" &&
+
+ git mv a z
+'
+# TODO: only a suboptimal non-detection
+test_expect_failure 'moving a dir with no direct children files' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# a/* z/
+ :040000 040000 X X R# a/b/* z/b/
+ :040000 040000 X X R# a/c/* z/c/
+ :100644 100644 X X R# a/b/path0 z/b/path0
+ :100644 100644 X X R# a/b/path1 z/b/path1
+ :100644 100644 X X R# a/b/path2 z/b/path2
+ :100644 100644 X X R# a/b/path3 z/b/path3
+ :100644 100644 X X R# a/c/apath0 z/c/apath0
+ :100644 100644 X X R# a/c/apath1 z/c/apath1
+ :100644 100644 X X R# a/c/apath2 z/c/apath2
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+# now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax)
+# Note: this is a special case of move of a dir into one of its own subdirs, which in
+# turn is a variant of new files/dirs being added into a dir after all its contents
+# are moved away
+
+test_expect_success 'setup move from toplevel to subdir' '
+ git reset -q --hard HEAD~3 &&
+ mv a/* . &&
+ git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path* &&
+ test_tick &&
+ git commit -m "move all files to toplevel" &&
+
+ mkdir z &&
+ git mv path* z/
+'
+test_expect_success '--detect-bulk-moves everything from toplevel.' '
+ cat >expected <<-EOF &&
+ :040000 040000 X X R# ./* z/
+ :100644 100644 X X R# path0 z/path0
+ :100644 100644 X X R# path1 z/path1
+ :100644 100644 X X R# path2 z/path2
+ :100644 100644 X X R# path3 z/path3
+ EOF
+ git diff-index --detect-bulk-moves HEAD >current &&
+ compare_diff_raw expected current
+'
+
+test_done
--
1.7.2.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 5/6] Unified diff output format for bulk moves.
2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
` (3 preceding siblings ...)
2010-12-09 21:38 ` [PATCH 4/6] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
2010-12-09 21:38 ` [PATCH 6/6] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
To: git; +Cc: Yann Dirson, Yann Dirson
The output produced is as shown below.
diff --git-detect-bulk-moves mips/nxp/pnx833x/common/ mips/pnx833x/common/
bulk move with similarity index 100%
bulk move from mips/nxp/pnx833x/common/
bulk move to mips/pnx833x/common/
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
Documentation/diff-generate-patch.txt | 19 +++++++++++++
diff.c | 48 +++++++++++++++++++++++++-------
2 files changed, 56 insertions(+), 11 deletions(-)
diff --git a/Documentation/diff-generate-patch.txt b/Documentation/diff-generate-patch.txt
index 3ac2bea..367e859 100644
--- a/Documentation/diff-generate-patch.txt
+++ b/Documentation/diff-generate-patch.txt
@@ -71,6 +71,25 @@ separate lines indicate the old and the new mode.
rename to a
+bulk move entries
+-----------------
+
+When a bulk move is detected, a special block is output in addition to
+the renames that constitute the bulk move, looking like this:
+
+ diff --git-detect-bulk-moves a/dir1/ b/dir2/
+
+It is essentially similar to the standard diff header, with a special
+syntax showing we are describing a difference between two directories,
+and not a change to be applied as-is to a file.
+
+It is followed by three specific extended header lines:
+
+ bulk move with similarity index <number>
+ bulk move from <dir1>/
+ bulk move to <dir2>/
+
+
combined diff format
--------------------
diff --git a/diff.c b/diff.c
index 0694d7f..551fab7 100644
--- a/diff.c
+++ b/diff.c
@@ -2668,6 +2668,7 @@ static void run_diff_cmd(const char *pgm,
const char *xfrm_msg = NULL;
int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score;
int must_show_header = 0;
+ int use_color;
if (!DIFF_OPT_TST(o, ALLOW_EXTERNAL))
pgm = NULL;
@@ -2677,14 +2678,36 @@ static void run_diff_cmd(const char *pgm,
pgm = drv->external;
}
+ /*
+ * don't use colors when the header is intended for an
+ * external diff driver
+ */
+ use_color = DIFF_OPT_TST(o, COLOR_DIFF) && !pgm;
+
+ if (p->is_bulkmove) {
+ const char *set = diff_get_color(use_color, DIFF_METAINFO);
+ const char *reset = diff_get_color(use_color, DIFF_RESET);
+ struct strbuf *msgbuf;
+ char *line_prefix = "";
+
+ if (o->output_prefix) {
+ msgbuf = o->output_prefix(o, o->output_prefix_data);
+ line_prefix = msgbuf->buf;
+ }
+ fprintf(o->file, "%s%sdiff --git-detect-bulk-moves %s %s%s\n",
+ line_prefix, set, one->path, two->path, reset);
+ fprintf(o->file, "%s%sbulk move with similarity index %d%%%s\n",
+ line_prefix, set, similarity_index(p), reset);
+ fprintf(o->file, "%s%sbulk move from %s%s\n",
+ line_prefix, set, one->path, reset);
+ fprintf(o->file, "%s%sbulk move to %s%s\n",
+ line_prefix, set, two->path, reset);
+ return;
+ }
+
if (msg) {
- /*
- * don't use colors when the header is intended for an
- * external diff driver
- */
fill_metainfo(msg, name, other, one, two, o, p,
- &must_show_header,
- DIFF_OPT_TST(o, COLOR_DIFF) && !pgm);
+ &must_show_header, use_color);
xfrm_msg = msg->len ? msg->buf : NULL;
}
@@ -2757,8 +2780,10 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
return;
}
- diff_fill_sha1_info(one);
- diff_fill_sha1_info(two);
+ if (!p->is_bulkmove) {
+ diff_fill_sha1_info(one);
+ diff_fill_sha1_info(two);
+ }
if (!pgm &&
DIFF_FILE_VALID(one) && DIFF_FILE_VALID(two) &&
@@ -3557,9 +3582,10 @@ static void diff_flush_patch(struct diff_filepair *p, struct diff_options *o)
if (diff_unmodified_pair(p))
return;
- if ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
- (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode)))
- return; /* no tree diffs in patch format */
+ if (!p->is_bulkmove &&
+ ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
+ (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode))))
+ return; /* no tree diffs in patch format, except for bulk moves */
run_diff(p, o);
}
--
1.7.2.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 6/6] [WIP] Allow hiding renames of individual files involved in a directory rename.
2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
` (4 preceding siblings ...)
2010-12-09 21:38 ` [PATCH 5/6] Unified diff output format for bulk moves Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
To: git; +Cc: Yann Dirson, Yann Dirson
Once has identified groups of bulk-moved files, and then
the --hide-bulk-move-details flag hides those of the individual renames
which carry no other information (further name change, or content changes).
This makes it much easier to a human reader to spot content changes
in a commit that also moves a whole subtree.
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
Documentation/diff-options.txt | 5 +++
diff.c | 7 ++++
diff.h | 3 ++
diffcore-rename.c | 68 ++++++++++++++++++++++++++++++++++++++--
diffcore.h | 1 +
5 files changed, 81 insertions(+), 3 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 186cd6f..cc4fe9c 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -249,6 +249,11 @@ endif::git-log[]
Detect bulk move of all files of a directory into a
different one.
+--hide-bulk-move-details::
+ Hide the individual files moves that make up a bulk move,
+ without hinding other changes to the involved files (contents
+ change, name change relative to the bulk move's destination).
+
-C[<n>]::
--detect-copies[=<n>]::
Detect copies as well as renames. See also `--find-copies-harder`.
diff --git a/diff.c b/diff.c
index 551fab7..7e21e30 100644
--- a/diff.c
+++ b/diff.c
@@ -3230,6 +3230,13 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
if (!options->detect_rename)
options->detect_rename = DIFF_DETECT_RENAME;
}
+ else if (!strcmp(arg, "--hide-bulk-move-details")) {
+ DIFF_OPT_SET(options, HIDE_DIR_RENAME_DETAILS);
+ if (!DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+ DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+ if (!options->detect_rename)
+ options->detect_rename = DIFF_DETECT_RENAME;
+ }
else if (!strcmp(arg, "--follow"))
DIFF_OPT_SET(options, FOLLOW_RENAMES);
else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index 1e2506c..f28fed5 100644
--- a/diff.h
+++ b/diff.h
@@ -79,6 +79,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
#define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
#define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
#define DIFF_OPT_DETECT_BULK_MOVES (1 << 28)
+#define DIFF_OPT_HIDE_DIR_RENAME_DETAILS (1 << 29)
#define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag)
#define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag)
@@ -268,6 +269,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
" try unchanged files as candidate for copy detection.\n" \
" --detect-bulk-moves\n" \
" detect moves of all files of a single directory.\n" \
+" --hide-bulk-move-details\n" \
+" hide individual files moves in a bulk move.\n" \
" -l<n> limit rename attempts up to <n> paths.\n" \
" -O<file> reorder diffs according to the <file>.\n" \
" -S<string> find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 44df490..3d4b863 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -435,6 +435,34 @@ declare_sorted_array_insertonly_elem(static, struct diff_bulk_rename, register_b
bulkmove_candidates_cmp, bulkmove_candidates_init);
/*
+ * Marks as such file_rename if it is made uninteresting by dir_rename.
+ * Returns -1 if the file_rename is outside of the range in which given
+ * renames concerned by dir_rename are to be found (ie. end of loop),
+ * 0 otherwise.
+ */
+static int maybe_mark_uninteresting(struct diff_rename_dst *file_rename,
+ struct diff_bulk_rename *dir_rename,
+ int one_len, int two_len)
+{
+ if (!file_rename->pair) /* file add */
+ return 0;
+ if (strncmp(file_rename->two->path,
+ dir_rename->two->path, two_len))
+ return -1;
+ if (strncmp(file_rename->pair->one->path,
+ dir_rename->one->path, one_len) ||
+ !basename_same(file_rename->pair->one, file_rename->two) ||
+ file_rename->pair->score != MAX_SCORE)
+ return 0;
+
+ file_rename->pair->uninteresting_rename = 1;
+ debug_bulkmove(("%s* -> %s* makes %s -> %s uninteresting\n",
+ dir_rename->one->path, dir_rename->two->path,
+ file_rename->pair->one->path, file_rename->two->path));
+ return 0;
+}
+
+/*
* Copy dirname of src into dst, suitable to append a filename without
* an additional "/".
* Only handles relative paths since there is no absolute path in a git repo.
@@ -642,11 +670,44 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
* Take all file renames recorded so far and check if they could cause
* a bulk move to be detected.
*/
-static void diffcore_bulk_moves(void)
+static void diffcore_bulk_moves(int opt_hide_renames)
{
int i;
for (i = 0; i < rename_dst_nr; i++)
check_one_bulk_move(rename_dst[i].pair);
+
+ if (opt_hide_renames) {
+ /* flag as "uninteresting" those candidates hidden by dir move */
+ struct diff_bulk_rename *candidate;
+ for (candidate = bulkmove_candidates;
+ candidate < bulkmove_candidates + bulkmove_candidates_nr;
+ candidate++) {
+ int two_idx, i, one_len, two_len;
+ struct diff_rename_dst *two_sample;
+ if (candidate->discarded)
+ continue;
+
+ /* bisect to any entry within candidate dst dir */
+ two_sample = locate_rename_dst_dir(candidate->two->path);
+ if (!two_sample) {
+ die("PANIC: %s candidate of rename not in target tree (from %s)\n",
+ candidate->two->path, candidate->one->path);
+ }
+ two_idx = two_sample - rename_dst;
+
+ /* now remove extraneous 100% files inside. */
+ one_len = strlen(candidate->one->path);
+ two_len = strlen(candidate->two->path);
+ for (i = two_idx; i < rename_dst_nr; i++)
+ if (maybe_mark_uninteresting(rename_dst+i, candidate,
+ one_len, two_len) < 0)
+ break;
+ for (i = two_idx-1; i >= 0; i--)
+ if (maybe_mark_uninteresting(rename_dst+i, candidate,
+ one_len, two_len) < 0)
+ break;
+ }
+ }
}
void diffcore_rename(struct diff_options *options)
@@ -818,7 +879,7 @@ void diffcore_rename(struct diff_options *options)
/* Now possibly factorize those renames and copies. */
if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
- diffcore_bulk_moves();
+ diffcore_bulk_moves(DIFF_OPT_TST(options, HIDE_DIR_RENAME_DETAILS));
DIFF_QUEUE_CLEAR(&outq);
@@ -853,7 +914,8 @@ void diffcore_rename(struct diff_options *options)
struct diff_rename_dst *dst =
locate_rename_dst(p->two);
if (dst && dst->pair) {
- diff_q(&outq, dst->pair);
+ if (!dst->pair->uninteresting_rename)
+ diff_q(&outq, dst->pair);
pair_to_free = p;
}
else
diff --git a/diffcore.h b/diffcore.h
index 6dab95b..a4eb8e1 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
unsigned broken_pair : 1;
unsigned renamed_pair : 1;
unsigned is_unmerged : 1;
+ unsigned uninteresting_rename : 1;
unsigned is_bulkmove : 1;
};
#define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
--
1.7.2.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-12-09 21:38 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
2010-12-09 21:38 ` [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename Yann Dirson
2010-12-09 21:38 ` [PATCH 2/6] Introduce bulk-move detection in diffcore Yann Dirson
2010-12-09 21:38 ` [PATCH 3/6] Raw diff output format for bulk moves Yann Dirson
2010-12-09 21:38 ` [PATCH 4/6] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
2010-12-09 21:38 ` [PATCH 5/6] Unified diff output format for bulk moves Yann Dirson
2010-12-09 21:38 ` [PATCH 6/6] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).