git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Detection of directory renames
@ 2008-11-01 22:03 Yann Dirson
  2008-11-01 22:03 ` [PATCH 1/3] Fix error in diff_filepair::status documentation Yann Dirson
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Yann Dirson @ 2008-11-01 22:03 UTC (permalink / raw)
  To: git

This new version fixes handling of moves to and from tree toplevel,
adds supports for detecting bulk moves of files into a subdirectory of
their original dir, and adds a couple of other testcases showing what
still has to be done to properly handle moves of directories with
subdirs.

-- 
Yann Dirson    <ydirson@altern.org> |
Debian-related: <dirson@debian.org> |   Support Debian GNU/Linux:
                                    |  Freedom, Power, Stability, Gratis
     http://ydirson.free.fr/        | Check <http://www.debian.org/>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] Fix error in diff_filepair::status documentation.
  2008-11-01 22:03 [PATCH v3 0/3] Detection of directory renames Yann Dirson
@ 2008-11-01 22:03 ` Yann Dirson
  2008-11-02  6:31   ` Junio C Hamano
  2008-11-01 22:03 ` [PATCH 2/3] Introduce rename factorization in diffcore Yann Dirson
  2008-11-01 22:03 ` [PATCH 3/3] Add testcases for the --factorize-renames diffcore flag Yann Dirson
  2 siblings, 1 reply; 15+ messages in thread
From: Yann Dirson @ 2008-11-01 22:03 UTC (permalink / raw)
  To: git

Signed-off-by: Yann Dirson <ydirson@altern.org>
---

 diffcore.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diffcore.h b/diffcore.h
index 713cca7..05d0898 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -62,7 +62,7 @@ struct diff_filepair {
 	struct diff_filespec *one;
 	struct diff_filespec *two;
 	unsigned short int score;
-	char status; /* M C R N D U (see Documentation/diff-format.txt) */
+	char status; /* M C R A D U etc. (see DIFF_STATUS_* in diff.h) */
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-01 22:03 [PATCH v3 0/3] Detection of directory renames Yann Dirson
  2008-11-01 22:03 ` [PATCH 1/3] Fix error in diff_filepair::status documentation Yann Dirson
@ 2008-11-01 22:03 ` Yann Dirson
  2008-11-07  1:10   ` Junio C Hamano
  2008-11-01 22:03 ` [PATCH 3/3] Add testcases for the --factorize-renames diffcore flag Yann Dirson
  2 siblings, 1 reply; 15+ messages in thread
From: Yann Dirson @ 2008-11-01 22:03 UTC (permalink / raw)
  To: git

Rename factorization tries to group together files moving from and to
identical directories - the most common case being directory renames.
We do that by first identifying groups of bulk-moved files, and then
hiding those of the individual renames which carry no other
information (further name change, or content changes).
This feature is activated by the new --factorize-renames diffcore
flag.

This is only the first step, adding the basic functionnality and
adding support to raw diff output (and it breaks unified-diff output
which does not know how to handle that stuff yet).

Even the output format may not be kept as is.  For now both the result
of "mv a b" and "mv a/* b/" are displayed as "Rnnn a/ b/", which is
probably not what we want.  "Rnnn a/* b/" could be a good choice for
the latter if we want them to be distinguished, and even if we want
them to look the same.

Other future developements to be made on top of this include:
* extension of unified-diff format to express this
* application of such new diffs
* teach git-svn (and others ?) to make use of that flag
* merge correctly in case of addition into a moved dir
* detect "directory splits" so merge can flag a conflict on file adds
* use inexact dir renames to bump score of below-threshold renames
  from/to same locations
* add yours here
---

 diff-lib.c        |    6 +
 diff.c            |    5 +
 diff.h            |    3 +
 diffcore-rename.c |  301 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 diffcore.h        |    1 
 tree-diff.c       |    4 +
 6 files changed, 307 insertions(+), 13 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index ae96c64..dcc4c2c 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -179,7 +179,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		changed = ce_match_stat(ce, &st, ce_option);
 		if (!changed) {
 			ce_mark_uptodate(ce);
-			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+			    !DIFF_OPT_TST(&revs->diffopt, FACTORIZE_RENAMES))
 				continue;
 		}
 		oldmode = ce->ce_mode;
@@ -310,7 +311,8 @@ static int show_modified(struct oneway_unpack_data *cbdata,
 
 	oldmode = old->ce_mode;
 	if (mode == oldmode && !hashcmp(sha1, old->sha1) &&
-	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(&revs->diffopt, FACTORIZE_RENAMES))
 		return 0;
 
 	diff_change(&revs->diffopt, oldmode, mode,
diff --git a/diff.c b/diff.c
index e368fef..f91fcf6 100644
--- a/diff.c
+++ b/diff.c
@@ -2437,6 +2437,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_SET(options, REVERSE_DIFF);
 	else if (!strcmp(arg, "--find-copies-harder"))
 		DIFF_OPT_SET(options, FIND_COPIES_HARDER);
+	else if (!strcmp(arg, "--factorize-renames")) {
+		DIFF_OPT_SET(options, FACTORIZE_RENAMES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index a49d865..db1658b 100644
--- a/diff.h
+++ b/diff.h
@@ -65,6 +65,7 @@ typedef void (*diff_format_fn_t)(struct diff_queue_struct *q,
 #define DIFF_OPT_IGNORE_SUBMODULES   (1 << 18)
 #define DIFF_OPT_DIRSTAT_CUMULATIVE  (1 << 19)
 #define DIFF_OPT_DIRSTAT_BY_FILE     (1 << 20)
+#define DIFF_OPT_FACTORIZE_RENAMES   (1 << 21)
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
 #define DIFF_OPT_CLR(opts, flag)    ((opts)->flags &= ~DIFF_OPT_##flag)
@@ -220,6 +221,8 @@ extern void diffcore_std(struct diff_options *);
 "  -C            detect copies.\n" \
 "  --find-copies-harder\n" \
 "                try unchanged files as candidate for copy detection.\n" \
+"  --factorize-renames\n" \
+"                factorize renames of all files of a directory.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 168a95b..ab6149e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -11,6 +11,7 @@
 static struct diff_rename_dst {
 	struct diff_filespec *two;
 	struct diff_filepair *pair;
+	int i_am_not_single:1; // does not look for a match, only here to be looked at
 } *rename_dst;
 static int rename_dst_nr, rename_dst_alloc;
 
@@ -49,9 +50,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 	rename_dst[first].two = alloc_filespec(two->path);
 	fill_filespec(rename_dst[first].two, two->sha1, two->mode);
 	rename_dst[first].pair = NULL;
+	rename_dst[first].i_am_not_single = 0;
 	return &(rename_dst[first]);
 }
 
+static struct diff_rename_dst *locate_rename_dst_dir(struct diff_filespec *dir)
+{
+	/* code mostly duplicated from locate_rename_dst - not sure we
+	 * could merge them efficiently,though
+	 */
+	int first, last;
+	int prefixlength = strlen(dir->path);
+
+	first = 0;
+	last = rename_dst_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_rename_dst *dst = &(rename_dst[next]);
+		int cmp = strncmp(dir->path, dst->two->path, prefixlength);
+		if (!cmp)
+			return dst;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+	/* not found */
+	return NULL;
+}
+
 /* Table of rename/copy src files */
 static struct diff_rename_src {
 	struct diff_filespec *one;
@@ -262,7 +290,7 @@ static int find_identical_files(struct file_similarity *src,
 			int score;
 			struct diff_filespec *source = p->filespec;
 
-			/* False hash collission? */
+			/* False hash collision? */
 			if (hashcmp(source->sha1, target->sha1))
 				continue;
 			/* Non-regular files? If so, the modes must match! */
@@ -381,8 +409,11 @@ static int find_exact_renames(void)
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
 
-	for (i = 0; i < rename_dst_nr; i++)
+	for (i = 0; i < rename_dst_nr; i++) {
+		if (rename_dst[i].i_am_not_single)
+			continue;
 		insert_file_table(&file_table, 1, i, rename_dst[i].two);
+	}
 
 	/* Find the renames */
 	i = for_each_hash(&file_table, find_same_files);
@@ -409,6 +440,223 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
 		m[worst] = *o;
 }
 
+struct diff_dir_rename {
+	struct diff_filespec *one;
+	struct diff_filespec *two;
+	int discarded;
+	struct diff_dir_rename* next;
+};
+
+/*
+ * Marks as such file_rename if it is made uninteresting by dir_rename.
+ * Returns -1 if the file_rename is outside of the range in which given
+ * renames concerned by dir_rename are to be found (ie. end of loop),
+ * 0 otherwise.
+ */
+static int maybe_mark_uninteresting(struct diff_rename_dst* file_rename,
+				    struct diff_dir_rename* dir_rename,
+				    int one_len, int two_len)
+{
+	if (!file_rename->pair) /* file add */
+		return 0;
+	if (strncmp(file_rename->two->path,
+		    dir_rename->two->path, two_len))
+		return -1;
+	if (strncmp(file_rename->pair->one->path,
+		    dir_rename->one->path, one_len) ||
+	    !basename_same(file_rename->pair->one, file_rename->two) ||
+	    file_rename->pair->score != MAX_SCORE)
+		return 0;
+
+	file_rename->pair->uninteresting_rename = 1;
+	fprintf (stderr, "[DBG] %s* -> %s* makes %s -> %s uninteresting\n",
+		dir_rename->one->path, dir_rename->two->path,
+		file_rename->pair->one->path, file_rename->two->path);
+	return 0;
+}
+
+// FIXME: prevent possible overflow
+/*
+ * Copy dirname of src into dst, suitable to append a filename without
+ * an additional "/".
+ * Only handles relative paths since there is no relative path in a git repo.
+ * Writes "" when there is no "/" in src.
+ * May overwrite more chars than really needed, if src ends with a "/".
+ */
+static const char* copy_dirname(char* dst, const char* src)
+{
+	char* lastslash = strrchr(src, '/');
+	if (!lastslash) {
+		*dst = '\0';
+		return dst;
+	}
+	strncpy(dst, src, lastslash - src + 1);
+	dst[lastslash - src + 1] = '\0';
+
+	// if src ends with a "/" strip the last component
+	if (lastslash[1] == '\0') {
+		lastslash = strrchr(dst, '/');
+		if (!lastslash)
+			return strcpy(dst, ".");
+		lastslash[1] = '\0';
+	}
+
+	return dst;
+}
+
+/*
+ * FIXME: we could optimize the 100%-rename case by preventing
+ * recursion to unfold what we know we would refold here.
+ * FIXME: do we want to replace linked list with sorted array ?
+ * FIXME: this prototype only handles renaming of dirs without
+ * a subdir.
+ * FIXME: leaks like hell.
+ * FIXME: ideas to evaluate a similarity score, anyone ?
+ *  10% * tree similarity + 90% * moved files similarity ?
+ */
+static struct diff_dir_rename* factorization_candidates = NULL;
+static void diffcore_factorize_renames(void)
+{
+	char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX];
+	int i;
+
+	for (i = 0; i < rename_dst_nr; i++) {
+		struct diff_dir_rename* seen;
+
+		// FIXME: what are those ?
+		if (!rename_dst[i].pair)
+			continue;
+		// dummy renames used by copy detection
+		if (!strcmp(rename_dst[i].pair->one->path, rename_dst[i].pair->two->path))
+			continue;
+
+		copy_dirname(one_parent_path, rename_dst[i].pair->one->path);
+		copy_dirname(two_parent_path, rename_dst[i].pair->two->path);
+
+		struct diff_filespec* one_parent = alloc_filespec(one_parent_path);
+		fill_filespec(one_parent, null_sha1 /*FIXME*/, S_IFDIR);
+
+		struct diff_rename_dst* one_leftover =
+			locate_rename_dst_dir(one_parent);
+		if (one_leftover) { // FIXME: should only be run if !seen
+			/* this might be a dir split, or files added
+			 * after the bulk rename, or just an isolated
+			 * rename */
+			int two_idx, j, onep_len, maybe_dir_rename;
+
+			// try to see if it is a file added after the bulk rename
+			two_idx = one_leftover - rename_dst;
+			onep_len = strlen(one_parent_path);
+			maybe_dir_rename = 1;
+
+			// check no leftover file was already here before
+			for (j = two_idx; j < rename_dst_nr; j++) {
+				if (strncmp(rename_dst[j].two->path,
+					    one_parent_path, onep_len))
+					break; // exhausted directory in this direction
+				fprintf (stderr, "[DBG] leftover file %s in %s\n",
+					 rename_dst[j].two->path, one_parent_path);
+				if (rename_dst[j].i_am_not_single || // those were already here
+				    (rename_dst[j].pair &&
+				     !strncmp(rename_dst[j].pair->one->path,
+					      one_parent_path, onep_len) && // renamed from here
+				     strncmp(rename_dst[j].two->path,
+					     one_parent_path, onep_len))) { // not to a subdir
+					maybe_dir_rename = 0;
+					fprintf (stderr, "[DBG] ... tells not a bulk rename\n");
+					break;
+				}
+				fprintf (stderr, "[DBG] ... not believed to prevent bulk rename\n");
+			}
+			if (!maybe_dir_rename) continue;
+			for (j = two_idx-1; j >= 0; j--) {
+				if (strncmp(rename_dst[j].two->path,
+					    one_parent_path, onep_len))
+					break; // exhausted directory in this direction
+				fprintf (stderr, "[DBG] leftover file %s in %s\n",
+					 rename_dst[j].two->path, one_parent_path);
+				if (rename_dst[j].i_am_not_single || // those were already here
+				    (rename_dst[j].pair &&
+				     !strncmp(rename_dst[j].pair->one->path,
+					      one_parent_path, onep_len) && // renamed from here
+				     strncmp(rename_dst[j].two->path,
+					     one_parent_path, onep_len))) { // not to a subdir
+					maybe_dir_rename = 0;
+					fprintf (stderr, "[DBG] ... tells not a bulk rename\n");
+					break;
+				}
+				fprintf (stderr, "[DBG] ... not believed to prevent bulk rename\n");
+			}
+			if (!maybe_dir_rename) continue;
+
+			// here we are in the case where a directory
+			// content was completely moved, but files
+			// were added to it afterwards.  Proceed as
+			// for a simple bulk move.
+		}
+
+		// already considered ?
+		for (seen=factorization_candidates; seen; seen = seen->next)
+			if (!strcmp(seen->one->path, one_parent_path)) break;
+		if (!seen) {
+			// record potential dir rename
+			seen = xmalloc(sizeof(*seen));
+			seen->one = one_parent;
+			seen->two = alloc_filespec(two_parent_path);
+			fill_filespec(seen->two, null_sha1 /*FIXME*/, S_IFDIR);
+			seen->discarded = 0;
+			seen->next = factorization_candidates;
+			factorization_candidates = seen;
+			fprintf (stderr, "[DBG] %s -> %s suggests possible rename from %s to %s\n",
+			       rename_dst[i].pair->one->path,
+			       rename_dst[i].pair->two->path,
+			       one_parent_path, two_parent_path);
+			continue;
+		}
+		if (seen->discarded)
+			continue;
+		// check that seen entry matches this rename
+		if (strcmp(two_parent_path, seen->two->path)) {
+			fprintf (stderr, "[DBG] discarding dir split of %s from renames (into %s and %s)\n",
+			       one_parent_path, two_parent_path, seen->two->path);
+			seen->discarded = 1;
+		}
+
+		/* all checks ok, we keep that entry */
+	}
+
+	// turn candidates into real renames
+	struct diff_dir_rename* candidate;
+	for (candidate=factorization_candidates; candidate; candidate = candidate->next) {
+		int two_idx, i, one_len, two_len;
+		if (candidate->discarded)
+			continue;
+
+		// bisect to an entry within candidate dst dir
+		struct diff_rename_dst* two_sample =
+			locate_rename_dst_dir(candidate->two);
+		if (!two_sample) {
+			die ("PANIC: %s candidate of rename not in target tree (from %s)\n",
+			     candidate->two->path, candidate->one->path);
+		}
+		two_idx = two_sample - rename_dst;
+
+		// now remove extraneous 100% files inside.
+		one_len = strlen(candidate->one->path);
+		two_len = strlen(candidate->two->path);
+		for (i = two_idx; i < rename_dst_nr; i++)
+			if (maybe_mark_uninteresting (rename_dst+i, candidate,
+						      one_len, two_len) < 0)
+				break;
+		for (i = two_idx-1; i >= 0; i--)
+			if (maybe_mark_uninteresting (rename_dst+i, candidate,
+						      one_len, two_len) < 0)
+				break;
+	}
+
+	return;
+}
+
 void diffcore_rename(struct diff_options *options)
 {
 	int detect_rename = options->detect_rename;
@@ -446,13 +694,22 @@ void diffcore_rename(struct diff_options *options)
 				p->one->rename_used++;
 			register_rename_src(p->one, p->score);
 		}
-		else if (detect_rename == DIFF_DETECT_COPY) {
-			/*
-			 * Increment the "rename_used" score by
-			 * one, to indicate ourselves as a user.
-			 */
-			p->one->rename_used++;
-			register_rename_src(p->one, p->score);
+		else {
+			if (detect_rename == DIFF_DETECT_COPY) {
+				/*
+				 * Increment the "rename_used" score by
+				 * one, to indicate ourselves as a user.
+				 */
+				p->one->rename_used++;
+				register_rename_src(p->one, p->score);
+			}
+			if (DIFF_OPT_TST(options, FACTORIZE_RENAMES)) {
+				/* similarly, rename factorization needs to
+				 * see all files from second tree, but we don't
+				 * want them to be matched against single sources.
+				 */
+				locate_rename_dst(p->two, 1)->i_am_not_single = 1;
+			}
 		}
 	}
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -504,6 +761,8 @@ void diffcore_rename(struct diff_options *options)
 
 		if (rename_dst[i].pair)
 			continue; /* dealt with exact match already. */
+		if (rename_dst[i].i_am_not_single)
+			continue; /* not looking for a match. */
 
 		m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
 		for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
@@ -561,8 +820,29 @@ void diffcore_rename(struct diff_options *options)
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+
+	/* Now possibly factorize those renames and copies. */
+	if (DIFF_OPT_TST(options, FACTORIZE_RENAMES))
+		diffcore_factorize_renames();
+
 	outq.queue = NULL;
 	outq.nr = outq.alloc = 0;
+
+	// Now turn non-discarded factorization_candidates into real renames
+	struct diff_dir_rename* candidate;
+	for (candidate=factorization_candidates; candidate; candidate = candidate->next) {
+		struct diff_filepair* pair;
+		if (candidate->discarded) continue;
+		// visualize toplevel dir if needed - FIXME: wrong place for this ?
+		if (!*candidate->one->path)
+			candidate->one->path = "./";
+		if (!*candidate->two->path)
+			candidate->two->path = "./";
+		pair = diff_queue(&outq, candidate->one, candidate->two);
+		pair->score = MAX_SCORE;
+		pair->renamed_pair = 1;
+	}
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		struct diff_filepair *pair_to_free = NULL;
@@ -577,7 +857,8 @@ void diffcore_rename(struct diff_options *options)
 			struct diff_rename_dst *dst =
 				locate_rename_dst(p->two, 0);
 			if (dst && dst->pair) {
-				diff_q(&outq, dst->pair);
+				if (!dst->pair->uninteresting_rename)
+					diff_q(&outq, dst->pair);
 				pair_to_free = p;
 			}
 			else
diff --git a/diffcore.h b/diffcore.h
index 05d0898..ce781ae 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -66,6 +66,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned uninteresting_rename : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
 
diff --git a/tree-diff.c b/tree-diff.c
index 9f67af6..872f757 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 		show_entry(opt, "+", t2, base, baselen);
 		return 1;
 	}
-	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
+	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(opt, FACTORIZE_RENAMES) &&
+	    !hashcmp(sha1, sha2) && mode1 == mode2)
 		return 0;
 
 	/*

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] Add testcases for the --factorize-renames diffcore flag.
  2008-11-01 22:03 [PATCH v3 0/3] Detection of directory renames Yann Dirson
  2008-11-01 22:03 ` [PATCH 1/3] Fix error in diff_filepair::status documentation Yann Dirson
  2008-11-01 22:03 ` [PATCH 2/3] Introduce rename factorization in diffcore Yann Dirson
@ 2008-11-01 22:03 ` Yann Dirson
  2 siblings, 0 replies; 15+ messages in thread
From: Yann Dirson @ 2008-11-01 22:03 UTC (permalink / raw)
  To: git

This notably includes a couple of tests for cases known not to be
working correctly yet.
---

 t/t4030-diff-rename-factorize.sh |  259 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 259 insertions(+), 0 deletions(-)
 create mode 100755 t/t4030-diff-rename-factorize.sh

diff --git a/t/t4030-diff-rename-factorize.sh b/t/t4030-diff-rename-factorize.sh
new file mode 100755
index 0000000..302a7ab
--- /dev/null
+++ b/t/t4030-diff-rename-factorize.sh
@@ -0,0 +1,259 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Yann Dirson
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='Test rename factorization in diff engine.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+test_expect_success \
+    'commit the index.'  \
+    'git update-ref HEAD $(echo "original empty commit" | git commit-tree $(git write-tree))'
+
+mkdir a
+echo >a/path0 'Line 1
+Line 2
+Line 3
+Line 4
+Line 5
+Line 6
+Line 7
+Line 8
+Line 9
+Line 10
+line 11
+Line 12
+Line 13
+Line 14
+Line 15
+'
+sed <a/path0 >a/path1 s/Line/Record/
+sed <a/path0 >a/path2 s/Line/Stuff/
+sed <a/path0 >a/path3 s/Line/Blurb/
+
+test_expect_success \
+    'update-index --add file inside a directory.' \
+    'git update-index --add a/path*'
+
+test_expect_success \
+    'commit the index.'  \
+    'git update-ref HEAD $(echo "original set of files" | git commit-tree $(git write-tree))'
+
+mv a b
+test_expect_success \
+    'renamed the directory.' \
+    'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*'
+
+test_expect_success \
+    'git diff-index --factorize-renames after directory move.' \
+    'git diff-index --factorize-renames HEAD >current'
+grep -v "^\[DBG\] " <current >current.filtered
+cat >expected <<\EOF
+:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100	a/	b/
+EOF
+
+test_expect_success \
+    'validate the output for directory move.' \
+    'compare_diff_patch current.filtered expected'
+
+# now test non-100% renames
+
+echo 'Line 16' >> b/path0
+mv b/path2 b/2path
+rm b/path3
+echo anything > b/path100
+test_expect_success \
+    'edited dir contents.' \
+    'git update-index --add --remove b/* b/path2 b/path3'
+
+test_expect_success \
+    'git diff-index --factorize-renames after directory move and content changes.' \
+    'git diff-index --factorize-renames HEAD >current'
+grep -v "^\[DBG\] " <current >current.filtered
+cat >expected <<\EOF
+:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100	a/	b/
+:100644 000000 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a 0000000000000000000000000000000000000000 D	a/path3
+:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100	a/path2	b/2path
+:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 4db595d12886f90e36765fc1732c17bccb836663 R093	a/path0	b/path0
+:000000 100644 0000000000000000000000000000000000000000 1ba4650885513e62386fd3e23aeb45beeb67d3bb A	b/path100
+EOF
+
+test_expect_success \
+    'validate the output for directory move and content changes.' \
+    'compare_diff_patch current.filtered expected'
+
+git reset --hard
+
+# now test bulk moves that are not directory moves (get consensus before going further ?)
+
+mkdir c
+for i in 0 1 2; do cp a/path$i c/apath$i; done
+test_expect_success \
+    'add files into a new directory.' \
+    'git update-index --add c/apath*'
+
+test_expect_success \
+    'commit all this.'  \
+    'git commit -m "first set of changes"'
+
+mv c/* a/
+test_expect_success \
+    'move all of the new dir contents into a preexisting dir.' \
+    'git update-index --add --remove a/* c/apath0 c/apath1 c/apath2'
+
+test_expect_success \
+    'git diff-index --factorize-renames without full-dir rename.' \
+    'git diff-index --factorize-renames HEAD >current'
+grep -v "^\[DBG\] " <current >current.filtered
+cat >expected <<\EOF
+:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100	c/*	a/
+EOF
+
+test_expect_failure \
+    'validate the output for bulk rename without full-dir rename.' \
+    'compare_diff_patch current.filtered expected'
+
+git reset --hard
+
+# now test moves to toplevel
+
+mv c/* .
+test_expect_success \
+    'move all of the new dir contents into toplevel.' \
+    'git update-index --add --remove apath* c/apath0 c/apath1 c/apath2'
+
+test_expect_success \
+    'git diff-index --factorize-renames files bulk-moved to toplevel.' \
+    'git diff-index --factorize-renames HEAD >current'
+grep -v "^\[DBG\] " <current >current.filtered
+cat >expected <<\EOF
+:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100	c/*	./
+EOF
+
+test_expect_failure \
+    'validate the output for files bulk-moved to toplevel.' \
+    'compare_diff_patch current.filtered expected'
+
+git reset --hard
+
+# now test renaming with subdirs (does not take subdirs into account)
+
+mv c a/
+test_expect_success \
+    'move the new dir as subdir of another.' \
+    'git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2'
+
+test_expect_success \
+    'commit all this.'  \
+    'git commit -m "move as subdir"'
+
+mv a b
+echo foo >> b/c/apath0
+test_expect_success \
+    'rename the directory with some changes.' \
+    'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/c/apath0 a/c/apath1 a/c/apath2 b/path* b/c/apath*'
+
+test_expect_success \
+    'git diff-index --factorize-renames on a move including a subdir.' \
+    'git diff-index --factorize-renames HEAD >current'
+grep -v "^\[DBG\] " <current >current.filtered
+cat >expected <<\EOF
+:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100	a/	b/
+:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 00084e5ea68b5ae339b7c4b429e4a70fe25d069b R096	a/c/apath0	b/c/apath0
+EOF
+
+test_expect_failure \
+    'validate the output for a move including a subdir.' \
+    'compare_diff_patch current.filtered expected'
+
+git reset --hard
+
+# now check that moving all files but not subdirs is not mistaken for dir move
+
+mkdir b
+mv a/path* b/
+test_expect_success \
+    'rename files in the directory but not subdir.' \
+    'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*'
+
+test_expect_success \
+    'git diff-index --factorize-renames on a move without a subdir.' \
+    'git diff-index --factorize-renames HEAD >current'
+grep -v "^\[DBG\] " <current >current.filtered
+cat >expected <<\EOF
+:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100	a/path0	b/path0
+:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100	a/path1	b/path1
+:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100	a/path2	b/path2
+:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100	a/path3	b/path3
+EOF
+
+test_expect_success \
+    'validate the output for a move without a subdir.' \
+    'compare_diff_patch current.filtered expected'
+
+git reset --hard
+
+# now check that moving subdirs into one dir and files into another is not mistaken for dir move
+# (well, clearly it is ...)
+
+mv a/c b
+mv a d
+test_expect_success \
+    'rename subdir and files into different places.' \
+    'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/c/apath0 a/c/apath1 a/c/apath2 d/path* b/apath*'
+
+test_expect_success \
+    'git diff-index --factorize-renames on a split of subdir and files into different places.' \
+    'git diff-index --factorize-renames HEAD >current'
+grep -v "^\[DBG\] " <current >current.filtered
+cat >expected <<\EOF
+:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100	a/path0	d/path0
+:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100	a/path1	d/path1
+:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100	a/path2	d/path2
+:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100	a/path3	d/path3
+:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100	a/c/	b/
+EOF
+
+test_expect_failure \
+    'validate the output for a split of subdir and files into different places.' \
+    'compare_diff_patch current.filtered expected'
+
+# now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax)
+# Note: this is as special case of move of a dir into one of its own subdirs, which in
+# turn is a special case of new files/dirs being added into a dir after all its contents
+# are moved away
+
+git reset --hard HEAD~2
+
+mv a/* .
+test_expect_success \
+    'rename the directory with some changes.' \
+    'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path*'
+
+test_expect_success \
+    'commit all this.'  \
+    'git commit -m "move all files to toplevel"'
+
+mkdir z
+mv path* z/
+test_expect_success \
+    'rename the directory with some changes.' \
+    'git update-index --add --remove path0 path1 path2 path3 z/path*'
+
+test_expect_success \
+    'git diff-index --factorize-renames everything from toplevel.' \
+    'git diff-index --factorize-renames HEAD >current'
+grep -v "^\[DBG\] " <current >current.filtered
+cat >expected <<\EOF
+:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100	./*	z/
+EOF
+
+test_expect_failure \
+    'validate the output for a move of everything from toplevel.' \
+    'compare_diff_patch current.filtered expected'
+
+test_done

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] Fix error in diff_filepair::status documentation.
  2008-11-01 22:03 ` [PATCH 1/3] Fix error in diff_filepair::status documentation Yann Dirson
@ 2008-11-02  6:31   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2008-11-02  6:31 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson <ydirson@altern.org> writes:

> Signed-off-by: Yann Dirson <ydirson@altern.org>
> ---
>
>  diffcore.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/diffcore.h b/diffcore.h
> index 713cca7..05d0898 100644
> --- a/diffcore.h
> +++ b/diffcore.h
> @@ -62,7 +62,7 @@ struct diff_filepair {
>  	struct diff_filespec *one;
>  	struct diff_filespec *two;
>  	unsigned short int score;
> -	char status; /* M C R N D U (see Documentation/diff-format.txt) */
> +	char status; /* M C R A D U etc. (see DIFF_STATUS_* in diff.h) */

I think you spotted the right bug and fixed it in a wrong way.  The status
letters are user visible, and "git grep status Documentation/*diff*"
mention it in handful places without listing its repertoire, which _is_
the bug.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-01 22:03 ` [PATCH 2/3] Introduce rename factorization in diffcore Yann Dirson
@ 2008-11-07  1:10   ` Junio C Hamano
  2008-11-07 19:39     ` Yann Dirson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2008-11-07  1:10 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson <ydirson@altern.org> writes:

> Rename factorization tries to group together files moving from and to
> identical directories - the most common case being directory renames.
> We do that by first identifying groups of bulk-moved files, and then
> hiding those of the individual renames which carry no other
> information (further name change, or content changes).
> This feature is activated by the new --factorize-renames diffcore
> flag.

I have a mixed feeling about this one, primarily because I cannot
visualize how a useful output should look like.  Unless you rename one
directory to another without any content changes, you would have to say
"this directory changed to that, and among the paths underneath them, this
file have this content change in addition".

A related feature that would benefit from something like your change
without any downside/complication of output format issues is to boost
rename similarity score of a path when its neighbouring paths are moved to
the same location.  E.g. when you see:

 - three files a/{1,2,3} deleted;
 - three files b/{1,2,3} created;
 - (a/1 => b/1) and (a/2 => b/2) are similar enough;
 - (a/3 => b/3) are not similar enough.

we currently detect only two renames and leave deletion of a/3 and
creation of b/3 unpaired.  You should be able to help them paired up by
noticing that the entire a/* goes away (for that, reading the full
postimage like you do in your patch helps) and boost the similarity score
between these two.

Although I do not offhand think a good format to show the information you
are trying to capture in the textual diff output, one thing that would be
helped by the grouping of renames like you do would be process_renames()
in merge_recursive.c.  This is especially so when you have added a new
path in a directory that has been moved by the other branch you are
merging.  For this usage, there is no "textual output format" issues.  It
does not even have to be expressed by replacing individual entries from
diffq with entries that represent a whole subtree --- you could for
example keep what diffq.queue records intact, and add a separate list of
directory renames as a hint for users like process_renames() to use.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-07  1:10   ` Junio C Hamano
@ 2008-11-07 19:39     ` Yann Dirson
  2008-11-07 20:19       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Yann Dirson @ 2008-11-07 19:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Nov 06, 2008 at 05:10:09PM -0800, Junio C Hamano wrote:
> Yann Dirson <ydirson@altern.org> writes:
> 
> > Rename factorization tries to group together files moving from and to
> > identical directories - the most common case being directory renames.
> > We do that by first identifying groups of bulk-moved files, and then
> > hiding those of the individual renames which carry no other
> > information (further name change, or content changes).
> > This feature is activated by the new --factorize-renames diffcore
> > flag.
> 
> I have a mixed feeling about this one, primarily because I cannot
> visualize how a useful output should look like.  Unless you rename one
> directory to another without any content changes, you would have to say
> "this directory changed to that, and among the paths underneath them, this
> file have this content change in addition".

Right.  This is something I have already thought about, without taking
the time to finalize.  So let's have a try.

The direct transposition of the raw output of a dir rename with a
change would yield something like:

|diff --git a/ppc/ b/foo
|similarity index 100%
|rename from ppc/
|rename to foo/
|diff --git a/ppc/sha1.h b/foo/sha1.h
|--- ppc/sha1.h
|+++ foo/sha1.h
|...

This would have the usefulness I'm looking for, in that the content
modifications would not be "hidden" among a whole lot af individual
file rename hunks.

But if we want to make this more generally useful (eg. allow other
tools to apply such patches), we need to be careful.  A typical patch
file can usually be seen as a sequence of changes which can be applied
independently (at least in order).  It is obvious that this is not
true for the above output.  So we could also consider making this
patch output sequential with something like:

|diff --git a/ppc/ b/foo
|similarity index 100%
|rename from ppc/
|rename to foo/
|diff --git a/foo/sha1.h b/foo/sha1.h
|--- foo/sha1.h
|+++ foo/sha1.h
|...

However, such output would possibly be confusing, eg. when looking at
it from gitk and looking for commits which modify ppc/sha1.h.

A solution to this could be to add an annotation such as:

|diff --git a/ppc/ b/foo
|similarity index 100%
|rename from ppc/
|rename to foo/
|diff --git a/foo/sha1.h b/foo/sha1.h
|--- foo/sha1.h (previously ppc/sha1.h)
|+++ foo/sha1.h
|...

That would make the "git diff" format diverge a bit more from the
standard, but since a full-fledged git patch already cannot be used by
standard patch tools, it should not be a big issue - we should just be
careful about choosing a suitable format if we go that way.

But as we already change the patch format, we could also simply say
that we don't care about the patch being "litterally splittable", add
a "this is a non-splittable whole" note at the beginning of the patch
output, and go with the litteral solution shown as first example.

How do you feel wrt this ?


> A related feature that would benefit from something like your change
> without any downside/complication of output format issues is to boost
> rename similarity score of a path when its neighbouring paths are moved to
> the same location.

Yes, that could be triggered by distinct switches.

Doing this would be related to detecting "directory splits" (ie. when
some files are really not moved, or moved into several dirs instead of
being all moved to the same target dir).  We can start by detecting
dir-splits, then shift the scoring threshold and do another pass on
split dirs.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-07 19:39     ` Yann Dirson
@ 2008-11-07 20:19       ` Junio C Hamano
  2008-11-07 20:39         ` Yann Dirson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2008-11-07 20:19 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson <ydirson@altern.org> writes:

> The direct transposition of the raw output of a dir rename with a
> change would yield something like:
>
> |diff --git a/ppc/ b/foo
> |similarity index 100%
> |rename from ppc/
> |rename to foo/
> |diff --git a/ppc/sha1.h b/foo/sha1.h
> |--- ppc/sha1.h
> |+++ foo/sha1.h
> |...
>
> This would have the usefulness I'm looking for, in that the content
> modifications would not be "hidden" among a whole lot af individual
> file rename hunks.

I am afraid that this is totally unacceptable, as you yourself mentioned,
the end result is unapplicable with any existing tool and would confuse
viewers like gitk and gitweb.

I think it would be a much better approach to emit a hint that talks about
directory rename in a format that does not fool usual "patch" application
tools.  E.g.

|directory moved with similarity index 82%
|rename from ppc
|rename from foo
|diff --git a/ppc/sha1.h b/foo/sha1.h
|similarity index 85%
|rename from ppc/sha1.h
|rename to foo/sha1.h
|index  9b34f76..8fce4b7 100644
|--- ppc/sha1.h
|+++ foo/sha1.h
|@@ ...

IOW, do not add anything that begins with "diff --git" if it is not a
patch.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-07 20:19       ` Junio C Hamano
@ 2008-11-07 20:39         ` Yann Dirson
  2008-11-07 21:11           ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Yann Dirson @ 2008-11-07 20:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Nov 07, 2008 at 12:19:14PM -0800, Junio C Hamano wrote:
> I am afraid that this is totally unacceptable, as you yourself mentioned,
> the end result is unapplicable with any existing tool and would confuse
> viewers like gitk and gitweb.

Well, other tools will still have to be taught about a new syntax, if
they're going to use the new flag - just like it was for --rename.

That makes me think that it could be interesting to have a header
telling which diff extensions it is using.  It could even be
interesting to come with a cross-vc convention, which would ease the
interchange of modern patch files (I suppose some other tools already
have their own extensions for file renames).

What about some hint like the following ?

|extended diff features: git:rename git:binary git:filetypes
|extended diff features: git:unixperms git:dir-rename

Maybe we could leave "filetype" and "unixperms" out, since these do
not replace an information that would otherwise be present in a
standard diff anyway, and would just be ignored by a tool not
supporting them.


> I think it would be a much better approach to emit a hint that talks about
> directory rename in a format that does not fool usual "patch" application
> tools.  E.g.
> 
> |directory moved with similarity index 82%
> |rename from ppc
> |rename from foo
> |diff --git a/ppc/sha1.h b/foo/sha1.h
> |similarity index 85%
> |rename from ppc/sha1.h
> |rename to foo/sha1.h
> |index  9b34f76..8fce4b7 100644
> |--- ppc/sha1.h
> |+++ foo/sha1.h
> |@@ ...

So you're favoring the "patch is not textually splittable approach".
Don't you think it would be good to add some sort of hint about this
as a patch header ?

> IOW, do not add anything that begins with "diff --git" if it is not a
> patch.

OK.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-07 20:39         ` Yann Dirson
@ 2008-11-07 21:11           ` Junio C Hamano
  2008-11-07 22:12             ` Yann Dirson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2008-11-07 21:11 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson <ydirson@altern.org> writes:

> On Fri, Nov 07, 2008 at 12:19:14PM -0800, Junio C Hamano wrote:
>> I am afraid that this is totally unacceptable, as you yourself mentioned,
>> the end result is unapplicable with any existing tool and would confuse
>> viewers like gitk and gitweb.
>
> Well, other tools will still have to be taught about a new syntax, if
> they're going to use the new flag - just like it was for --rename.

You are mistaken.  For a patch, you are dealing with two different
parties: producer and consumer.  If you are adding new feature to the
producer, the output format should be desigend to allow the consumer tell
that it is something it does not know how to handle.

Marking a non patch with "diff --git" to trigger the logic to signal the
beginning of a patch to git-apply (and perhaps other tools) is a non
starter.

And for this "we are giving a patch that your git-apply can apply and gitk
can show, but by the way we also think the whole directory foo moved to
new location bar" is merely an additional information.  You should still
be able to apply the patch with tools that are unaware of this new
directory move detection feature.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-07 21:11           ` Junio C Hamano
@ 2008-11-07 22:12             ` Yann Dirson
  2008-11-07 23:43               ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Yann Dirson @ 2008-11-07 22:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Nov 07, 2008 at 01:11:40PM -0800, Junio C Hamano wrote:
> Yann Dirson <ydirson@altern.org> writes:
> 
> > On Fri, Nov 07, 2008 at 12:19:14PM -0800, Junio C Hamano wrote:
> >> I am afraid that this is totally unacceptable, as you yourself mentioned,
> >> the end result is unapplicable with any existing tool and would confuse
> >> viewers like gitk and gitweb.
> >
> > Well, other tools will still have to be taught about a new syntax, if
> > they're going to use the new flag - just like it was for --rename.
> 
> You are mistaken.  For a patch, you are dealing with two different
> parties: producer and consumer.  If you are adding new feature to the
> producer, the output format should be desigend to allow the consumer tell
> that it is something it does not know how to handle.

Agreed.

> Marking a non patch with "diff --git" to trigger the logic to signal the
> beginning of a patch to git-apply (and perhaps other tools) is a non
> starter.

This is not what I meant.

> And for this "we are giving a patch that your git-apply can apply and gitk
> can show, but by the way we also think the whole directory foo moved to
> new location bar" is merely an additional information.  You should still
> be able to apply the patch with tools that are unaware of this new
> directory move detection feature.

I hope I just miss your point.  Letting unaware tools handle such a
patch "the right way" would imply just adding the information "dir foo
moved to bar", and not removing the individual file moves, which goes
in the way of the exact reason why I have started to work on this.

Compare this to the addition of the "file rename" feature (correct me
if I'm wrong): it was added without bothering whether plain old
"patch" can deal with it, and when feeding a git diff to patch(1) I
cannot expect it to DTRT (indeed it applies the content change but not
the rename part, without complaining, and an unsuspecting user would
just be shot in the foot): we are precisely *not* able to apply the
patch with tools that are unaware of this new file rename feature.

This is precisely to limit this problem in the future that I proposed
this "diff extension" stuff in my last mail: limit how the "backward
compatibility" argument can cripple innovation.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-07 22:12             ` Yann Dirson
@ 2008-11-07 23:43               ` Junio C Hamano
  2008-11-08  0:29                 ` Yann Dirson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2008-11-07 23:43 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson <ydirson@altern.org> writes:

> I hope I just miss your point.  Letting unaware tools handle such a
> patch "the right way" would imply just adding the information "dir foo
> moved to bar", and not removing the individual file moves, which goes
> in the way of the exact reason why I have started to work on this.

If your change is to move a/{1,2,3} to b/{1,2,3} and without content
change to a/{1,2} to b/{1,2}, then do you want to say "a/ moved to b/
and by the way here is the content change from a/3 to b/3" without saying
anything about a/{1,2} and b/{1,2}?

Two points.

 * I do not think it is a good idea to begin with.  If you are to apply
   such a patch (with git-apply that is updated with your patch to
   understand that notation) to the exact tree that has only {1,2,3} under
   a/, you would get an expected result.  But if the recipient of your
   patch has a/4 (or lacks a/2), there is no cue in the patch that
   automatically moving a/4 to b/4 may or may not be what is sane (or the
   patch is unapplicable in general).

 * If you give at least the names of paths that were moved without any
   content changes as I suggested, at least the recipient of your patch
   can catch the case where his tree is structurally different from what
   you used to prepare the patch for by noticing the a/2 in the patch that
   he does not have.

In addition, if you keep the movements for the paths whose contents did
not change, existing tools are perfectly capable of applying (or showing)
the output.  I seriously doubt that keeping 4 lines per perfectly moved
paths is too much a price to pay to keep backward compatibility.

> Compare this to the addition of the "file rename" feature (correct me
> if I'm wrong): it was added without bothering whether plain old
> "patch" can deal with it,...

Sorry, but that's an old history whie git-diff output format was rapidly
being developed, when we did not have that many users, and when we did not
have an old version of git-apply that did not understand the new feature
in majority of user's hands.

We do not have that kind of luxury anymore.  git is much more widespread
now and the majority of people use pre-1.6.1 git now (including me ;-)).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-07 23:43               ` Junio C Hamano
@ 2008-11-08  0:29                 ` Yann Dirson
  2008-11-08  0:47                   ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Yann Dirson @ 2008-11-08  0:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Nov 07, 2008 at 03:43:19PM -0800, Junio C Hamano wrote:
> Yann Dirson <ydirson@altern.org> writes:
> 
> > I hope I just miss your point.  Letting unaware tools handle such a
> > patch "the right way" would imply just adding the information "dir foo
> > moved to bar", and not removing the individual file moves, which goes
> > in the way of the exact reason why I have started to work on this.
> 
> If your change is to move a/{1,2,3} to b/{1,2,3} and without content
> change to a/{1,2} to b/{1,2}, then do you want to say "a/ moved to b/
> and by the way here is the content change from a/3 to b/3" without saying
> anything about a/{1,2} and b/{1,2}?
> 
> Two points.
> 
>  * I do not think it is a good idea to begin with.  If you are to apply
>    such a patch (with git-apply that is updated with your patch to
>    understand that notation) to the exact tree that has only {1,2,3} under
>    a/, you would get an expected result.  But if the recipient of your
>    patch has a/4 (or lacks a/2), there is no cue in the patch that
>    automatically moving a/4 to b/4 may or may not be what is sane (or the
>    patch is unapplicable in general).

Sure in theory.  But in practice I do not remember one time when, if
all files from one dir are moved in one branch, the files added on
another in the same dir were not bound to be moved as well.

Anway, if we feel git-apply should not decide without the user
knowing, we can make it refuse by default, with options to do either
way, and one option to ask for each doubtful file instead.


>  * If you give at least the names of paths that were moved without any
>    content changes as I suggested, at least the recipient of your patch
>    can catch the case where his tree is structurally different from what
>    you used to prepare the patch for by noticing the a/2 in the patch that
>    he does not have.

Right.

> In addition, if you keep the movements for the paths whose contents did
> not change, existing tools are perfectly capable of applying (or showing)
> the output.  I seriously doubt that keeping 4 lines per perfectly moved
> paths is too much a price to pay to keep backward compatibility.

OK, so I realize we need 2 things here: one format for diff-exporting
with complete info, and one for human viewing (which is, again, the
primary reason why I needed this feature, so I'm not very keen on
letting all this work finally not being useful for me :).  Commands
for saving/mailing patches could issue a bold warning if the user
specifies the for-human-viewing flag.


> > Compare this to the addition of the "file rename" feature (correct me
> > if I'm wrong): it was added without bothering whether plain old
> > "patch" can deal with it,...
> 
> Sorry, but that's an old history whie git-diff output format was rapidly
> being developed, when we did not have that many users, and when we did not
> have an old version of git-apply that did not understand the new feature
> in majority of user's hands.
> 
> We do not have that kind of luxury anymore.  git is much more widespread
> now and the majority of people use pre-1.6.1 git now (including me ;-)).

I was talking about exchanging patches with the non-git part of the
world.  The point is that eg. GNU patch still happily accepts
git-generated files but produces nonsense using some, exactly because
it ignores meaningful data which (by design ?) appear to it to be
legal to ignore.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-08  0:29                 ` Yann Dirson
@ 2008-11-08  0:47                   ` Junio C Hamano
  2008-11-08  0:50                     ` Yann Dirson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2008-11-08  0:47 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson <ydirson@altern.org> writes:

> ...  Commands
> for saving/mailing patches could issue a bold warning if the user
> specifies the for-human-viewing flag.

If it is a warning to the user who produces the patch, and not but a
warning in the patch text for the user who receives it, it is not useful.

> I was talking about exchanging patches with the non-git part of the
> world.

Renaming patch needs manual massaging if you want to use GNU patch and
that is not a new issue.  People know that.

The problem is in your example output were that it would break _existing
git tools_ with a backward incompatible change, when you did not have to.
That's the difference.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] Introduce rename factorization in diffcore.
  2008-11-08  0:47                   ` Junio C Hamano
@ 2008-11-08  0:50                     ` Yann Dirson
  0 siblings, 0 replies; 15+ messages in thread
From: Yann Dirson @ 2008-11-08  0:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Nov 07, 2008 at 04:47:46PM -0800, Junio C Hamano wrote:
> Yann Dirson <ydirson@altern.org> writes:
> 
> > ...  Commands
> > for saving/mailing patches could issue a bold warning if the user
> > specifies the for-human-viewing flag.
> 
> If it is a warning to the user who produces the patch, and not but a
> warning in the patch text for the user who receives it, it is not useful.

OK to get everyone warned :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2008-11-08  0:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-01 22:03 [PATCH v3 0/3] Detection of directory renames Yann Dirson
2008-11-01 22:03 ` [PATCH 1/3] Fix error in diff_filepair::status documentation Yann Dirson
2008-11-02  6:31   ` Junio C Hamano
2008-11-01 22:03 ` [PATCH 2/3] Introduce rename factorization in diffcore Yann Dirson
2008-11-07  1:10   ` Junio C Hamano
2008-11-07 19:39     ` Yann Dirson
2008-11-07 20:19       ` Junio C Hamano
2008-11-07 20:39         ` Yann Dirson
2008-11-07 21:11           ` Junio C Hamano
2008-11-07 22:12             ` Yann Dirson
2008-11-07 23:43               ` Junio C Hamano
2008-11-08  0:29                 ` Yann Dirson
2008-11-08  0:47                   ` Junio C Hamano
2008-11-08  0:50                     ` Yann Dirson
2008-11-01 22:03 ` [PATCH 3/3] Add testcases for the --factorize-renames diffcore flag Yann Dirson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).