git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9] Detection of directory renames
@ 2010-12-09 21:38 Yann Dirson
  2010-12-09 21:38 ` [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename Yann Dirson
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
  To: git

This series requires:
- v6 of "generalizing sorted-array handling" series
- Jonathan Nieder's "compat: add memrchr()" patch,
  aka. mid:<20101015051750.GA21830@burratino>

Changes since v8:

* use of the new sorted-array.h, which gives the main patch a diet
* split of debug func into its own patch

Next iteration will be merging the bulk-rm detection into this series.
Interested eyes can get a preview at t/* branches on
http://repo.or.cz/w/git/ydirson.git

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename.
  2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
  2010-12-09 21:38 ` [PATCH 2/6] Introduce bulk-move detection in diffcore Yann Dirson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

This is an optional debug useful while developping new builk-* features.

Signed-off-by: Yann Dirson <ydirson@altern.org>
---
 diffcore-rename.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index f7afdeb..e16fdeb 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -7,6 +7,22 @@
 #include "hash.h"
 #include "sorted-array.h"
 
+#define DEBUG_BULKMOVE 0
+
+#if DEBUG_BULKMOVE
+#define debug_bulkmove(args) __debug_bulkmove args
+void __debug_bulkmove(const char *fmt, ...)
+{
+	va_list ap;
+	va_start(ap, fmt);
+	fprintf(stderr, "[DBG] ");
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+}
+#else
+#define debug_bulkmove(args) do { /*nothing */ } while (0)
+#endif
+
 /* Table of rename/copy destinations */
 
 struct diff_rename_dst {
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/6] Introduce bulk-move detection in diffcore.
  2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
  2010-12-09 21:38 ` [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
  2010-12-09 21:38 ` [PATCH 3/6] Raw diff output format for bulk moves Yann Dirson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

This feature tries to group together files moving from and to
identical directories - a common case being directory renames.

This only adds the detection logic.  The output of raw diff is displayed
as "Rnnn a/ b/", and unified diff does not display them at all.  Output
formats will be refined later in the series.

It is implemented as a new pass in diffcore-rename, occuring after the
file renames get detected, grouping those renames looking like a move
of a full directory into some other place. It is activated by the new
--detect-bulk-moves diffcore flag.

Possible optimisations to this code include:
* avoid use of i_am_not_single by using a separate list
* use a more informative prefixcmp to avoid strcmp calls
  eg. in discard_if_outside()
* optimize for bulk insertions (avoid useless successive memmove's)

Other future developements to be made on top of this include:
* detect bulk removals (well, that one is rather a subset than a layer above),
  and possibly bulk additions
* detect bulk copies
* detect inexact bulk-moves/copies (where some files were not moved, or were
  moved to a different place) - problem of computing a similarity score
* display as such the special case of directory move/rename
* application of such new diffs: issue a conflict, or just a warning ?
* teach git-svn (and others ?) to make use of that flag
* handle new conflict type "bulk-move/add"
* detect "directory splits" as well
* use inexact dir renames to bump score of below-threshold renames
  from/to same locations
* support other types of bluk-grouping, like prefixes (see eg. kernel
  5d1e859c), and maybe config-specified patterns
* add yours here

This patch has been improved by the following contributions:
- Jonathan Nieder: better implementation of copy_dirname()
- Jonathan Nieder: portable implementation of memrchr() in another patch
- Junio C Hamano: split individual renames hiding under control of another flag
- Junio C Hamano: coding style issues
- Junio C Hamano: better examples
- Ævar Arnfjörð Bjarmason: Don't use C99 comments.
- Jonathan Nieder: just too many other helpful suggestions to list them all

Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Thanks-to: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 Documentation/diff-options.txt |    4 +
 Documentation/gitdiffcore.txt  |   12 ++
 diff-lib.c                     |    6 +-
 diff.c                         |    5 +
 diff.h                         |    3 +
 diffcore-rename.c              |  322 ++++++++++++++++++++++++++++++++++++++--
 diffcore.h                     |    1 +
 tree-diff.c                    |    4 +-
 8 files changed, 340 insertions(+), 17 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f3e9538..186cd6f 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -245,6 +245,10 @@ endif::git-log[]
 	delete/add pair to be a rename if more than 90% of the file
 	hasn't changed.
 
+--detect-bulk-moves::
+	Detect bulk move of all files of a directory into a
+	different one.
+
 -C[<n>]::
 --detect-copies[=<n>]::
 	Detect copies as well as renames.  See also `--find-copies-harder`.
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 6af29a4..93111ac 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -175,6 +175,18 @@ the expense of making it slower.  Without `\--find-copies-harder`,
 'git diff-{asterisk}' commands can detect copies only if the file that was
 copied happened to have been modified in the same changeset.
 
+Bulk move of all files of a directory into a different one can get
+detected using the `\--detect-bulk-moves` option.  This adds an
+additional pass on top of the results of per-file rename detection.
+They are reported with NULL SHA1 id, in addition to the file renames:
+
+------------------------------------------------
+:040000 040000 0000000... 0000000... R100 foo/ bar/
+:100644 100644 0123456... 1234567... R090 foo/file0 bar/file3
+:100644 100644 2345678... 2345678... R100 foo/file1 bar/file1
+:100644 100644 3456789... 3456789... R100 foo/file2 bar/file2
+------------------------------------------------
+
 
 diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
 --------------------------------------------------------------------
diff --git a/diff-lib.c b/diff-lib.c
index 392ce2b..5ec3ddc 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 						    ce_option, &dirty_submodule);
 		if (!changed && !dirty_submodule) {
 			ce_mark_uptodate(ce);
-			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+			    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 				continue;
 		}
 		oldmode = ce->ce_mode;
@@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs,
 
 	oldmode = old->ce_mode;
 	if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule &&
-	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 		return 0;
 
 	diff_change(&revs->diffopt, oldmode, mode,
diff --git a/diff.c b/diff.c
index db5e844..d64ae44 100644
--- a/diff.c
+++ b/diff.c
@@ -3200,6 +3200,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_SET(options, REVERSE_DIFF);
 	else if (!strcmp(arg, "--find-copies-harder"))
 		DIFF_OPT_SET(options, FIND_COPIES_HARDER);
+	else if (!strcmp(arg, "--detect-bulk-moves")) {
+		DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index 0083d92..1e2506c 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -265,6 +266,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "  -C            detect copies.\n" \
 "  --find-copies-harder\n" \
 "                try unchanged files as candidate for copy detection.\n" \
+"  --detect-bulk-moves\n" \
+"                detect moves of all files of a single directory.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index e16fdeb..44df490 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -28,6 +28,7 @@ void __debug_bulkmove(const char *fmt, ...)
 struct diff_rename_dst {
 	struct diff_filespec *two;
 	struct diff_filepair *pair;
+	unsigned i_am_not_single:1; /* does not look for a match, only here to be looked at */
 };
 
 static int rename_dst_cmp(struct diff_filespec *ref_spec, struct diff_rename_dst *elem)
@@ -39,14 +40,23 @@ static void rename_dst_init(struct diff_rename_dst *elem, struct diff_filespec *
 	elem->two = alloc_filespec(ref_spec->path);
 	fill_filespec(elem->two, ref_spec->sha1, ref_spec->mode);
 	elem->pair = NULL;
+	elem->i_am_not_single = 0;
 }
 declare_sorted_array(static, struct diff_rename_dst, rename_dst);
 declare_sorted_array_search_elem(static, struct diff_rename_dst, locate_rename_dst,
 				 struct diff_filespec *,
 				 rename_dst, rename_dst_cmp);
-declare_sorted_array_insert_checkbool(static, struct diff_rename_dst, register_rename_dst,
-				      struct diff_filespec *, _gen_locate_rename_dst,
-				      rename_dst, rename_dst_cmp, rename_dst_init);
+declare_sorted_array_insert_elem(static, struct diff_rename_dst, register_rename_dst,
+				 struct diff_filespec *, _gen_locate_rename_dst,
+				 rename_dst, rename_dst_cmp, rename_dst_init);
+
+static int rename_dst_dircmp(const char *ref_dirname, struct diff_rename_dst *elem)
+{
+	// FIXME: calls strlen many times - but maybe the compiler optimizes ?
+	return strncmp(ref_dirname, elem->two->path, strlen(ref_dirname));
+}
+declare_sorted_array_search_elem(static, struct diff_rename_dst, locate_rename_dst_dir,
+				 const char *, rename_dst, rename_dst_dircmp);
 
 /* Table of rename/copy src files */
 
@@ -361,8 +371,11 @@ static int find_exact_renames(void)
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
 
-	for (i = 0; i < rename_dst_nr; i++)
+	for (i = 0; i < rename_dst_nr; i++) {
+		if (rename_dst[i].i_am_not_single)
+			continue;
 		insert_file_table(&file_table, 1, i, rename_dst[i].two);
+	}
 
 	/* Find the renames */
 	i = for_each_hash(&file_table, find_same_files);
@@ -389,6 +402,253 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
 		m[worst] = *o;
 }
 
+struct diff_bulk_rename {
+	struct diff_filespec *one;
+	struct diff_filespec *two;
+	int discarded;
+};
+
+struct diff_bulk_rename_candidate {
+	char one_path[PATH_MAX];
+	char two_path[PATH_MAX];
+};
+static int bulkmove_candidates_cmp(struct diff_bulk_rename_candidate *ref_candidate,
+				   struct diff_bulk_rename *elem)
+{
+	int cmp = strcmp(ref_candidate->one_path, elem->one->path);
+	if (!cmp)
+		cmp = strcmp(ref_candidate->two_path, elem->two->path);
+	return cmp;
+}
+static void bulkmove_candidates_init(struct diff_bulk_rename *elem,
+				     struct diff_bulk_rename_candidate *ref_candidate)
+{
+	elem->one = alloc_filespec(ref_candidate->one_path);
+	fill_filespec(elem->one, null_sha1, S_IFDIR);
+	elem->two = alloc_filespec(ref_candidate->two_path);
+	fill_filespec(elem->two, null_sha1, S_IFDIR);
+	elem->discarded = 0;
+}
+declare_sorted_array(static, struct diff_bulk_rename, bulkmove_candidates);
+declare_sorted_array_insertonly_elem(static, struct diff_bulk_rename, register_bulkmove_candidate,
+				     struct diff_bulk_rename_candidate *, bulkmove_candidates,
+				     bulkmove_candidates_cmp, bulkmove_candidates_init);
+
+/*
+ * Copy dirname of src into dst, suitable to append a filename without
+ * an additional "/".
+ * Only handles relative paths since there is no absolute path in a git repo.
+ * Writes "" when there is no "/" in src.
+ * May overwrite more chars than really needed, if src ends with a "/".
+ * Supports in-place modification of src by passing dst == src.
+ */
+static const char *copy_dirname(char *dst, const char *src)
+{
+	size_t len = strlen(src);
+	const char *slash;
+	char *end;
+
+	if (len > 0 && src[len - 1] == '/')
+		/* Trailing slash.  Ignore it. */
+		len--;
+
+	slash = memrchr(src, '/', len);
+	if (!slash) {
+		*dst = '\0';
+		return dst;
+	}
+
+	if (dst == src)
+		dst[slash - src + 1] = '\0';
+	else {
+		end = mempcpy(dst, src, slash - src + 1);
+		*end = '\0';
+	}
+	return dst;
+}
+
+// FIXME: leaks like hell.
+/* See if the fact that one_leftover exists under one_parent_path in
+ * dst tree should disqualify one_parent_path from bulkmove eligibility.
+ * Return 1 if it disqualifies, 0 if it is OK.
+ */
+static int dispatched_to_different_dirs(const char *one_parent_path)
+{
+	/* this might be a dir split, or files added
+	 * after the bulk move, or just an isolated
+	 * rename */
+	int two_idx, j, onep_len, maybe_dir_rename;
+	struct diff_rename_dst *one_leftover =
+		one_leftover = locate_rename_dst_dir(one_parent_path);
+
+	if (!one_leftover)
+		return 0;
+
+	/* try to see if it is a file added after the bulk move */
+	two_idx = one_leftover - rename_dst;
+	onep_len = strlen(one_parent_path);
+	maybe_dir_rename = 1;
+
+	/* check no leftover file was already here before */
+	for (j = two_idx; j < rename_dst_nr; j++) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		debug_bulkmove(("leftover file %s in %s\n",
+				rename_dst[j].two->path, one_parent_path));
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			debug_bulkmove(("... tells not a bulk move\n"));
+			break;
+		}
+		debug_bulkmove(("... not believed to prevent bulk move\n"));
+	}
+	if (!maybe_dir_rename)
+		return 1;
+	/* try the other direction (dup code) */
+	for (j = two_idx-1; j >= 0; j--) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		debug_bulkmove(("leftover file %s in '%s'\n",
+				rename_dst[j].two->path, one_parent_path));
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			debug_bulkmove(("... tells not a bulk move\n"));
+			break;
+		}
+		debug_bulkmove(("... not believed to prevent bulk move\n"));
+	}
+	if (!maybe_dir_rename)
+		return 1;
+
+	/* Here we are in the case where a directory
+	 * content was completely moved, but files
+	 * were added to it afterwards.  Proceed as
+	 * for a simple bulk move. */
+	return 0;
+}
+
+/*
+ * Assumes candidate->one is a subdir of seen->one, mark 'seen' as
+ * discarded if candidate->two is outside seen->two.  Also mark
+ * 'candidate' itself as discarded if the conflict implies so.
+ *
+ * Return 1 if 'seen' was discarded
+ */
+static int discard_if_outside(struct diff_bulk_rename *candidate,
+			      struct diff_bulk_rename *seen)
+{
+	if (!prefixcmp(candidate->two->path, seen->two->path)) {
+		debug_bulkmove((" 'dstpair' conforts 'seen'\n"));
+		return 0;
+	}
+
+	debug_bulkmove(("discarding %s -> %s from bulk moves (split into %s and %s)\n",
+			seen->one->path, seen->two->path,
+			candidate->two->path, seen->two->path));
+	seen->discarded = 1;
+	/* Need to discard dstpair as well, unless moving from
+	 * a strict subdir of seen->one or to a strict subdir
+	 * of seen->two */
+	if (!strcmp(seen->one->path, candidate->one->path) &&
+	    prefixcmp(seen->two->path, candidate->two->path)) {
+		debug_bulkmove(("... and not adding self\n"));
+		candidate->discarded = 1;
+	}
+	return 1;
+}
+
+/*
+ * Check if the rename specified by "dstpair" could cause a
+ * bulk move to be detected, record it in bulkmove_candidates if yes.
+ */
+static void check_one_bulk_move(struct diff_filepair *dstpair)
+{
+	struct diff_bulk_rename_candidate rawcandidate;
+
+	/* genuine new files (or believed to be so) */
+	if (!dstpair)
+		return;
+	/* dummy renames used by copy detection */
+	if (!strcmp(dstpair->one->path, dstpair->two->path))
+		return;
+
+	copy_dirname(rawcandidate.one_path, dstpair->one->path);
+	copy_dirname(rawcandidate.two_path, dstpair->two->path);
+
+	/* simple rename with no directory change */
+	if (!strcmp(rawcandidate.one_path, rawcandidate.two_path))
+		return;
+
+	debug_bulkmove(("[] %s -> %s ?\n", dstpair->one->path, dstpair->two->path));
+
+	/* loop up rawcandidate.one_path over successive parents */
+	// FIXME: also loop over rawcandidate.two_path prefixes
+	do {
+		struct diff_bulk_rename *seen;
+		int old_nr = bulkmove_candidates_nr;
+		struct diff_bulk_rename *candidate =
+			register_bulkmove_candidate(&rawcandidate);
+		debug_bulkmove(("[[]] %s ...\n", rawcandidate.one_path));
+		if (old_nr == bulkmove_candidates_nr) {
+			debug_bulkmove((" already seen\n"));
+			return;
+		}
+
+		/* After this commit, are there any files still under rawcandidate.one_path ?
+		 * Any file left would disqualifies this dir for bulk move.
+		 */
+		if (dispatched_to_different_dirs(rawcandidate.one_path)) {
+			// FIXME: check overlap with discard_if_outside()
+			candidate->discarded = 1;
+			return;
+		}
+
+		/* walk up for rawcandidate.one_path prefixes */
+		for (seen = candidate-1; (seen >= bulkmove_candidates) &&
+			     !prefixcmp(rawcandidate.one_path, seen->one->path); seen--) {
+			debug_bulkmove((" ? %s -> %s\n", seen->one->path, seen->two->path));
+			/* subdir of "seen" dest dir ? */
+			if (discard_if_outside(candidate, seen))
+				continue;
+		}
+		/* look down for other moves from rawcandidate.one_path */
+		seen = candidate + 1;
+		if (seen != bulkmove_candidates + bulkmove_candidates_nr &&
+		    !strcmp(rawcandidate.one_path, seen->one->path)) {
+			debug_bulkmove((" ? %s -> %s (2)\n", seen->one->path, seen->two->path));
+			/* subdir of "seen" dest dir ? */
+			if (discard_if_outside(candidate, seen))
+				continue;
+		}
+
+		/* next parent if any */
+		copy_dirname(rawcandidate.one_path, rawcandidate.one_path);
+	} while (*rawcandidate.one_path);
+}
+
+/*
+ * Take all file renames recorded so far and check if they could cause
+ * a bulk move to be detected.
+ */
+static void diffcore_bulk_moves(void)
+{
+	int i;
+	for (i = 0; i < rename_dst_nr; i++)
+		check_one_bulk_move(rename_dst[i].pair);
+}
+
 void diffcore_rename(struct diff_options *options)
 {
 	int detect_rename = options->detect_rename;
@@ -399,6 +659,7 @@ void diffcore_rename(struct diff_options *options)
 	struct diff_score *mx;
 	int i, j, rename_count;
 	int num_create, num_src, dst_cnt;
+	struct diff_bulk_rename *candidate;
 
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -413,8 +674,7 @@ void diffcore_rename(struct diff_options *options)
 				continue; /* not interested */
 			else
 				register_rename_dst(p->two);
-		}
-		else if (!DIFF_FILE_VALID(p->two)) {
+		} else if (!DIFF_FILE_VALID(p->two)) {
 			/*
 			 * If the source is a broken "delete", and
 			 * they did not really want to get broken,
@@ -425,14 +685,23 @@ void diffcore_rename(struct diff_options *options)
 			if (p->broken_pair && !p->score)
 				p->one->rename_used++;
 			register_rename_src(p);
-		}
-		else if (detect_rename == DIFF_DETECT_COPY) {
-			/*
-			 * Increment the "rename_used" score by
-			 * one, to indicate ourselves as a user.
-			 */
-			p->one->rename_used++;
-			register_rename_src(p);
+		} else {
+			if (detect_rename == DIFF_DETECT_COPY) {
+				/*
+				 * Increment the "rename_used" score by
+				 * one, to indicate ourselves as a user.
+				 */
+				p->one->rename_used++;
+				register_rename_src(p);
+			}
+			if (DIFF_OPT_TST(options, DETECT_BULK_MOVES)) {
+				/* similarly, bulk move detection needs to
+				 * see all files from second tree, but we don't
+				 * want them to be matched against single sources.
+				 */
+				// FIXME: check interaction with --find-copies-harder
+				register_rename_dst(p->two)->i_am_not_single = 1;
+			}
 		}
 	}
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -484,6 +753,8 @@ void diffcore_rename(struct diff_options *options)
 
 		if (rename_dst[i].pair)
 			continue; /* dealt with exact match already. */
+		if (rename_dst[i].i_am_not_single)
+			continue; /* not looking for a match. */
 
 		m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
 		for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
@@ -544,7 +815,30 @@ void diffcore_rename(struct diff_options *options)
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+
+	/* Now possibly factorize those renames and copies. */
+	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+		diffcore_bulk_moves();
+
 	DIFF_QUEUE_CLEAR(&outq);
+
+	/* Now turn non-discarded bulkmove_candidates into real renames */
+	for (candidate = bulkmove_candidates;
+	     candidate < bulkmove_candidates + bulkmove_candidates_nr; candidate++) {
+		struct diff_filepair* pair;
+		if (candidate->discarded)
+			continue;
+		/* visualize toplevel dir if needed */
+		if (!*candidate->one->path)
+			candidate->one->path = "./";
+		if (!*candidate->two->path)
+			candidate->two->path = "./";
+		pair = diff_queue(&outq, candidate->one, candidate->two);
+		pair->score = MAX_SCORE;
+		pair->renamed_pair = 1;
+		pair->is_bulkmove = 1;
+	}
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		struct diff_filepair *pair_to_free = NULL;
diff --git a/diffcore.h b/diffcore.h
index b8f1fde..6dab95b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned is_bulkmove : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
 
diff --git a/tree-diff.c b/tree-diff.c
index 12c9a88..89cedd4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 		show_entry(opt, "+", t2, base, baselen);
 		return 1;
 	}
-	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
+	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(opt, DETECT_BULK_MOVES) &&
+	    !hashcmp(sha1, sha2) && mode1 == mode2)
 		return 0;
 
 	/*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/6] Raw diff output format for bulk moves.
  2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
  2010-12-09 21:38 ` [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename Yann Dirson
  2010-12-09 21:38 ` [PATCH 2/6] Introduce bulk-move detection in diffcore Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
  2010-12-09 21:38 ` [PATCH 4/6] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

To distinguish the general bulk-move case (where the destination
directory was pre-existing) from the directory-rename case (where it
was not), the output of raw diff is displayed as "Rnnn a/* b/".  Those
cannot be confused with renames of files named "whatever/*" with a
literal star character, from the full-zero SHA1's.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 Documentation/gitdiffcore.txt |    2 +-
 diff.c                        |    9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 93111ac..2538dc0 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -181,7 +181,7 @@ additional pass on top of the results of per-file rename detection.
 They are reported with NULL SHA1 id, in addition to the file renames:
 
 ------------------------------------------------
-:040000 040000 0000000... 0000000... R100 foo/ bar/
+:040000 040000 0000000... 0000000... R100 foo/* bar/
 :100644 100644 0123456... 1234567... R090 foo/file0 bar/file3
 :100644 100644 2345678... 2345678... R100 foo/file1 bar/file1
 :100644 100644 3456789... 3456789... R100 foo/file2 bar/file2
diff --git a/diff.c b/diff.c
index d64ae44..0694d7f 100644
--- a/diff.c
+++ b/diff.c
@@ -3499,7 +3499,14 @@ static void diff_flush_raw(struct diff_filepair *p, struct diff_options *opt)
 	if (p->status == DIFF_STATUS_COPIED ||
 	    p->status == DIFF_STATUS_RENAMED) {
 		const char *name_a, *name_b;
-		name_a = p->one->path;
+		if (p->is_bulkmove) {
+			/* append "*" to the first dirname */
+			char buf[PATH_MAX];
+			char *next = memccpy(buf, p->one->path, '\0', PATH_MAX);
+			next[-1] = '*'; *next = '\0';
+			name_a = buf;
+		} else
+			name_a = p->one->path;
 		name_b = p->two->path;
 		strip_prefix(opt->prefix_length, &name_a, &name_b);
 		write_name_quoted(name_a, opt->file, inter_name_termination);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/6] Add testcases for the --detect-bulk-moves diffcore flag.
  2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
                   ` (2 preceding siblings ...)
  2010-12-09 21:38 ` [PATCH 3/6] Raw diff output format for bulk moves Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
  2010-12-09 21:38 ` [PATCH 5/6] Unified diff output format for bulk moves Yann Dirson
  2010-12-09 21:38 ` [PATCH 6/6] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
  5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

This notably includes a couple of tests for cases known not to be
working correctly yet.

This patch has been improved by the following contributions:
- Jonathan Nieder: reworked style of test script
- Jonathan Nieder: use "git commit" in test instead of only plumbing,
  and use test_tick
- Sverre Rabbelier: anonymize hashes

Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 t/t4046-diff-bulk-move.sh |  296 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 296 insertions(+), 0 deletions(-)
 create mode 100755 t/t4046-diff-bulk-move.sh

diff --git a/t/t4046-diff-bulk-move.sh b/t/t4046-diff-bulk-move.sh
new file mode 100755
index 0000000..4b1c78e
--- /dev/null
+++ b/t/t4046-diff-bulk-move.sh
@@ -0,0 +1,296 @@
+#!/bin/sh
+#
+# Copyright (c) 2008,2010 Yann Dirson
+# Copyright (c) 2005 Junio C Hamano
+#
+
+# TODO for dir renames:
+# * two dirs or more moving all their files to a single dir
+# * simultaneous bulkmove and rename
+
+test_description='Test rename factorization in diff engine.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+test_expect_success 'setup' '
+	git commit --allow-empty -m "original empty commit"
+
+	mkdir a &&
+	printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 &&
+	sed <a/path0 >a/path1 s/Line/Record/ &&
+	sed <a/path0 >a/path2 s/Line/Stuff/ &&
+	sed <a/path0 >a/path3 s/Line/Blurb/ &&
+
+	git update-index --add a/path* &&
+	test_tick &&
+	git commit -m "original set of files" &&
+
+	: rename the directory &&
+	git mv a b
+'
+test_expect_success 'diff-index --detect-bulk-moves after directory move.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup non-100% rename' '
+	echo "Line 16" >>b/path0 &&
+	git mv b/path2 b/2path &&
+	git rm -f b/path3 &&
+	echo anything >b/path100 &&
+	git add b/path100
+'
+test_expect_success 'diff-index --detect-bulk-moves after content changes.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:100644 000000 X X D#	a/path3
+	:100644 100644 X X R#	a/path2	b/2path
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:000000 100644 X X A#	b/path100
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move that is not directory move' '
+	git reset -q --hard &&
+
+	mkdir c &&
+	(
+		for i in 0 1 2; do
+			cp a/path$i c/apath$i || exit
+		done
+	) &&
+	git update-index --add c/apath* &&
+	test_tick &&
+	git commit -m "first set of changes" &&
+
+	git mv c/* a/
+'
+test_expect_success 'diff-index --detect-bulk-moves without full-dir rename.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move with new file in source dir' '
+	echo > c/anotherpath "How much wood?" &&
+	git update-index --add c/another*
+'
+test_expect_success 'diff-index --detect-bulk-moves with new file in source dir.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	:000000 100644 X X A#	c/anotherpath
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move with interfering copy' '
+	rm c/anotherpath &&
+	git update-index --remove c/anotherpath &&
+	mkdir b &&
+	cp a/apath0 b/apath9 &&
+	echo >> a/apath0 "more" &&
+	git update-index --add a/apath0 b/apath9
+'
+# scores select the "wrong" one as "moved" (only a suboptimal detection)
+test_expect_failure 'diff-index --detect-bulk-moves with interfering copy.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	:100644 100644 X X C#	c/apath0	b/apath9
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move to toplevel' '
+	git reset -q --hard &&
+	git mv c/* .
+'
+test_expect_success 'diff-index --detect-bulk-moves bulk move to toplevel.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	./
+	:100644 100644 X X R#	c/apath0	apath0
+	:100644 100644 X X R#	c/apath1	apath1
+	:100644 100644 X X R#	c/apath2	apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move including a subdir, with some content changes' '
+	git reset -q --hard &&
+	mv c a/ &&
+	git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2 &&
+	test_tick &&
+	git commit -m "move as subdir" &&
+
+	git mv a b &&
+	echo foo >>b/c/apath0 &&
+	git update-index --add b/c/apath*
+'
+test_expect_success 'diff-index --detect-bulk-moves on a move including a subdir.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:040000 040000 X X R#	a/c/*	b/c/
+	:100644 100644 X X R#	a/c/apath0	b/c/apath0
+	:100644 100644 X X R#	a/c/apath1	b/c/apath1
+	:100644 100644 X X R#	a/c/apath2	b/c/apath2
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of only a subdir' '
+	git reset -q --hard &&
+	: rename a subdirectory of a/. &&
+	git mv a/c a/d
+'
+test_expect_success 'moving a subdir only' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/*	a/d/
+	:100644 100644 X X R#	a/c/apath0	a/d/apath0
+	:100644 100644 X X R#	a/c/apath1	a/d/apath1
+	:100644 100644 X X R#	a/c/apath2	a/d/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move without a subdir' '
+	git reset -q --hard &&
+	mkdir b &&
+	: rename files in the directory but not subdir. &&
+	git mv a/path* b/
+'
+test_expect_success 'moving files but not subdirs is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of files and subdirs to different places' '
+	git reset -q --hard &&
+	git mv a/c b &&
+	git mv a d
+'
+test_expect_success 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/*	b/
+	:100644 100644 X X R#	a/c/apath0	b/apath0
+	:100644 100644 X X R#	a/c/apath1	b/apath1
+	:100644 100644 X X R#	a/c/apath2	b/apath2
+	:100644 100644 X X R#	a/path0	d/path0
+	:100644 100644 X X R#	a/path1	d/path1
+	:100644 100644 X X R#	a/path2	d/path2
+	:100644 100644 X X R#	a/path3	d/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+# the same with different ordering
+test_expect_success 'setup move of files and subdirs to different places' '
+	git mv d 0
+'
+test_expect_success 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/*	b/
+	:100644 100644 X X R#	a/path0	0/path0
+	:100644 100644 X X R#	a/path1	0/path1
+	:100644 100644 X X R#	a/path2	0/path2
+	:100644 100644 X X R#	a/path3	0/path3
+	:100644 100644 X X R#	a/c/apath0	b/apath0
+	:100644 100644 X X R#	a/c/apath1	b/apath1
+	:100644 100644 X X R#	a/c/apath2	b/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of dir with only subdirs' '
+	git reset -q --hard &&
+	mkdir a/b &&
+	mv a/path* a/b/ &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/b/path* &&
+	test_tick &&
+	git commit -m "move all toplevel files down one level" &&
+
+	git mv a z
+'
+# TODO: only a suboptimal non-detection
+test_expect_failure 'moving a dir with no direct children files' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	z/
+	:040000 040000 X X R#	a/b/*	z/b/
+	:040000 040000 X X R#	a/c/*	z/c/
+	:100644 100644 X X R#	a/b/path0	z/b/path0
+	:100644 100644 X X R#	a/b/path1	z/b/path1
+	:100644 100644 X X R#	a/b/path2	z/b/path2
+	:100644 100644 X X R#	a/b/path3	z/b/path3
+	:100644 100644 X X R#	a/c/apath0	z/c/apath0
+	:100644 100644 X X R#	a/c/apath1	z/c/apath1
+	:100644 100644 X X R#	a/c/apath2	z/c/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+# now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax)
+# Note: this is a special case of move of a dir into one of its own subdirs, which in
+# turn is a variant of new files/dirs being added into a dir after all its contents
+# are moved away
+
+test_expect_success 'setup move from toplevel to subdir' '
+	git reset -q --hard HEAD~3 &&
+	mv a/* . &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path* &&
+	test_tick &&
+	git commit -m "move all files to toplevel" &&
+
+	mkdir z &&
+	git mv path* z/
+'
+test_expect_success '--detect-bulk-moves everything from toplevel.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	./*	z/
+	:100644 100644 X X R#	path0	z/path0
+	:100644 100644 X X R#	path1	z/path1
+	:100644 100644 X X R#	path2	z/path2
+	:100644 100644 X X R#	path3	z/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_done
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 5/6] Unified diff output format for bulk moves.
  2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
                   ` (3 preceding siblings ...)
  2010-12-09 21:38 ` [PATCH 4/6] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
  2010-12-09 21:38 ` [PATCH 6/6] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
  5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

The output produced is as shown below.

diff --git-detect-bulk-moves mips/nxp/pnx833x/common/ mips/pnx833x/common/
bulk move with similarity index 100%
bulk move from mips/nxp/pnx833x/common/
bulk move to mips/pnx833x/common/

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 Documentation/diff-generate-patch.txt |   19 +++++++++++++
 diff.c                                |   48 +++++++++++++++++++++++++-------
 2 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/Documentation/diff-generate-patch.txt b/Documentation/diff-generate-patch.txt
index 3ac2bea..367e859 100644
--- a/Documentation/diff-generate-patch.txt
+++ b/Documentation/diff-generate-patch.txt
@@ -71,6 +71,25 @@ separate lines indicate the old and the new mode.
       rename to a
 
 
+bulk move entries
+-----------------
+
+When a bulk move is detected, a special block is output in addition to
+the renames that constitute the bulk move, looking like this:
+
+       diff --git-detect-bulk-moves a/dir1/ b/dir2/
+
+It is essentially similar to the standard diff header, with a special
+syntax showing we are describing a difference between two directories,
+and not a change to be applied as-is to a file.
+
+It is followed by three specific extended header lines:
+
+       bulk move with similarity index <number>
+       bulk move from <dir1>/
+       bulk move to <dir2>/
+
+
 combined diff format
 --------------------
 
diff --git a/diff.c b/diff.c
index 0694d7f..551fab7 100644
--- a/diff.c
+++ b/diff.c
@@ -2668,6 +2668,7 @@ static void run_diff_cmd(const char *pgm,
 	const char *xfrm_msg = NULL;
 	int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score;
 	int must_show_header = 0;
+	int use_color;
 
 	if (!DIFF_OPT_TST(o, ALLOW_EXTERNAL))
 		pgm = NULL;
@@ -2677,14 +2678,36 @@ static void run_diff_cmd(const char *pgm,
 			pgm = drv->external;
 	}
 
+	/*
+	 * don't use colors when the header is intended for an
+	 * external diff driver
+	 */
+	use_color = DIFF_OPT_TST(o, COLOR_DIFF) && !pgm;
+
+	if (p->is_bulkmove) {
+		const char *set = diff_get_color(use_color, DIFF_METAINFO);
+		const char *reset = diff_get_color(use_color, DIFF_RESET);
+		struct strbuf *msgbuf;
+		char *line_prefix = "";
+
+		if (o->output_prefix) {
+			msgbuf = o->output_prefix(o, o->output_prefix_data);
+			line_prefix = msgbuf->buf;
+		}
+		fprintf(o->file, "%s%sdiff --git-detect-bulk-moves %s %s%s\n",
+			line_prefix, set, one->path, two->path, reset);
+		fprintf(o->file, "%s%sbulk move with similarity index %d%%%s\n",
+			line_prefix, set, similarity_index(p), reset);
+		fprintf(o->file, "%s%sbulk move from %s%s\n",
+			line_prefix, set, one->path, reset);
+		fprintf(o->file, "%s%sbulk move to %s%s\n",
+			line_prefix, set, two->path, reset);
+		return;
+	}
+
 	if (msg) {
-		/*
-		 * don't use colors when the header is intended for an
-		 * external diff driver
-		 */
 		fill_metainfo(msg, name, other, one, two, o, p,
-			      &must_show_header,
-			      DIFF_OPT_TST(o, COLOR_DIFF) && !pgm);
+			      &must_show_header, use_color);
 		xfrm_msg = msg->len ? msg->buf : NULL;
 	}
 
@@ -2757,8 +2780,10 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
 		return;
 	}
 
-	diff_fill_sha1_info(one);
-	diff_fill_sha1_info(two);
+	if (!p->is_bulkmove) {
+		diff_fill_sha1_info(one);
+		diff_fill_sha1_info(two);
+	}
 
 	if (!pgm &&
 	    DIFF_FILE_VALID(one) && DIFF_FILE_VALID(two) &&
@@ -3557,9 +3582,10 @@ static void diff_flush_patch(struct diff_filepair *p, struct diff_options *o)
 	if (diff_unmodified_pair(p))
 		return;
 
-	if ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
-	    (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode)))
-		return; /* no tree diffs in patch format */
+	if (!p->is_bulkmove &&
+	    ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
+	     (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode))))
+		return; /* no tree diffs in patch format, except for bulk moves */
 
 	run_diff(p, o);
 }
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 6/6] [WIP] Allow hiding renames of individual files involved in a directory rename.
  2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
                   ` (4 preceding siblings ...)
  2010-12-09 21:38 ` [PATCH 5/6] Unified diff output format for bulk moves Yann Dirson
@ 2010-12-09 21:38 ` Yann Dirson
  5 siblings, 0 replies; 7+ messages in thread
From: Yann Dirson @ 2010-12-09 21:38 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

Once has identified groups of bulk-moved files, and then
the --hide-bulk-move-details flag hides those of the individual renames
which carry no other information (further name change, or content changes).

This makes it much easier to a human reader to spot content changes
in a commit that also moves a whole subtree.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 Documentation/diff-options.txt |    5 +++
 diff.c                         |    7 ++++
 diff.h                         |    3 ++
 diffcore-rename.c              |   68 ++++++++++++++++++++++++++++++++++++++--
 diffcore.h                     |    1 +
 5 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 186cd6f..cc4fe9c 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -249,6 +249,11 @@ endif::git-log[]
 	Detect bulk move of all files of a directory into a
 	different one.
 
+--hide-bulk-move-details::
+	Hide the individual files moves that make up a bulk move,
+	without hinding other changes to the involved files (contents
+	change, name change relative to the bulk move's destination).
+
 -C[<n>]::
 --detect-copies[=<n>]::
 	Detect copies as well as renames.  See also `--find-copies-harder`.
diff --git a/diff.c b/diff.c
index 551fab7..7e21e30 100644
--- a/diff.c
+++ b/diff.c
@@ -3230,6 +3230,13 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		if (!options->detect_rename)
 			options->detect_rename = DIFF_DETECT_RENAME;
 	}
+	else if (!strcmp(arg, "--hide-bulk-move-details")) {
+		DIFF_OPT_SET(options, HIDE_DIR_RENAME_DETAILS);
+		if (!DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+			DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index 1e2506c..f28fed5 100644
--- a/diff.h
+++ b/diff.h
@@ -79,6 +79,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
 #define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
+#define DIFF_OPT_HIDE_DIR_RENAME_DETAILS (1 << 29)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -268,6 +269,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "                try unchanged files as candidate for copy detection.\n" \
 "  --detect-bulk-moves\n" \
 "                detect moves of all files of a single directory.\n" \
+"  --hide-bulk-move-details\n" \
+"                hide individual files moves in a bulk move.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 44df490..3d4b863 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -435,6 +435,34 @@ declare_sorted_array_insertonly_elem(static, struct diff_bulk_rename, register_b
 				     bulkmove_candidates_cmp, bulkmove_candidates_init);
 
 /*
+ * Marks as such file_rename if it is made uninteresting by dir_rename.
+ * Returns -1 if the file_rename is outside of the range in which given
+ * renames concerned by dir_rename are to be found (ie. end of loop),
+ * 0 otherwise.
+ */
+static int maybe_mark_uninteresting(struct diff_rename_dst *file_rename,
+				    struct diff_bulk_rename *dir_rename,
+				    int one_len, int two_len)
+{
+	if (!file_rename->pair) /* file add */
+		return 0;
+	if (strncmp(file_rename->two->path,
+		    dir_rename->two->path, two_len))
+		return -1;
+	if (strncmp(file_rename->pair->one->path,
+		    dir_rename->one->path, one_len) ||
+	    !basename_same(file_rename->pair->one, file_rename->two) ||
+	    file_rename->pair->score != MAX_SCORE)
+		return 0;
+
+	file_rename->pair->uninteresting_rename = 1;
+	debug_bulkmove(("%s* -> %s* makes %s -> %s uninteresting\n",
+			dir_rename->one->path, dir_rename->two->path,
+			file_rename->pair->one->path, file_rename->two->path));
+	return 0;
+}
+
+/*
  * Copy dirname of src into dst, suitable to append a filename without
  * an additional "/".
  * Only handles relative paths since there is no absolute path in a git repo.
@@ -642,11 +670,44 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
  * Take all file renames recorded so far and check if they could cause
  * a bulk move to be detected.
  */
-static void diffcore_bulk_moves(void)
+static void diffcore_bulk_moves(int opt_hide_renames)
 {
 	int i;
 	for (i = 0; i < rename_dst_nr; i++)
 		check_one_bulk_move(rename_dst[i].pair);
+
+	if (opt_hide_renames) {
+		/* flag as "uninteresting" those candidates hidden by dir move */
+		struct diff_bulk_rename *candidate;
+		for (candidate = bulkmove_candidates;
+		     candidate < bulkmove_candidates + bulkmove_candidates_nr;
+		     candidate++) {
+			int two_idx, i, one_len, two_len;
+			struct diff_rename_dst *two_sample;
+			if (candidate->discarded)
+				continue;
+
+			/* bisect to any entry within candidate dst dir */
+			two_sample = locate_rename_dst_dir(candidate->two->path);
+			if (!two_sample) {
+				die("PANIC: %s candidate of rename not in target tree (from %s)\n",
+				    candidate->two->path, candidate->one->path);
+			}
+			two_idx = two_sample - rename_dst;
+
+			/* now remove extraneous 100% files inside. */
+			one_len = strlen(candidate->one->path);
+			two_len = strlen(candidate->two->path);
+			for (i = two_idx; i < rename_dst_nr; i++)
+				if (maybe_mark_uninteresting(rename_dst+i, candidate,
+							     one_len, two_len) < 0)
+					break;
+			for (i = two_idx-1; i >= 0; i--)
+				if (maybe_mark_uninteresting(rename_dst+i, candidate,
+							     one_len, two_len) < 0)
+					break;
+		}
+	}
 }
 
 void diffcore_rename(struct diff_options *options)
@@ -818,7 +879,7 @@ void diffcore_rename(struct diff_options *options)
 
 	/* Now possibly factorize those renames and copies. */
 	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
-		diffcore_bulk_moves();
+		diffcore_bulk_moves(DIFF_OPT_TST(options, HIDE_DIR_RENAME_DETAILS));
 
 	DIFF_QUEUE_CLEAR(&outq);
 
@@ -853,7 +914,8 @@ void diffcore_rename(struct diff_options *options)
 			struct diff_rename_dst *dst =
 				locate_rename_dst(p->two);
 			if (dst && dst->pair) {
-				diff_q(&outq, dst->pair);
+				if (!dst->pair->uninteresting_rename)
+					diff_q(&outq, dst->pair);
 				pair_to_free = p;
 			}
 			else
diff --git a/diffcore.h b/diffcore.h
index 6dab95b..a4eb8e1 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned uninteresting_rename : 1;
 	unsigned is_bulkmove : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-12-09 21:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-09 21:38 [PATCH v9] Detection of directory renames Yann Dirson
2010-12-09 21:38 ` [PATCH 1/6] Introduce debug_bulkmove() in diffcore-rename Yann Dirson
2010-12-09 21:38 ` [PATCH 2/6] Introduce bulk-move detection in diffcore Yann Dirson
2010-12-09 21:38 ` [PATCH 3/6] Raw diff output format for bulk moves Yann Dirson
2010-12-09 21:38 ` [PATCH 4/6] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
2010-12-09 21:38 ` [PATCH 5/6] Unified diff output format for bulk moves Yann Dirson
2010-12-09 21:38 ` [PATCH 6/6] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).