git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Diff rename, manual correction, round 2
@ 2016-01-20 11:06 Nguyễn Thái Ngọc Duy
  2016-01-20 11:06 ` [PATCH 1/7] diff-no-index: do not take a redundant prefix argument Nguyễn Thái Ngọc Duy
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-01-20 11:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

Git is amazing. It can detect file renames without you telling it. But
even the best fails sometimes. And when it fails, we have no way to
correct it. Not so amazing..

Round 1 was three years ago. The idea back then [1] was to let the
user say "rename this path to this path". Jeff quickly pointed out it
didn't help merge. And it should help merge to reduce merge conflicts.
For merge to take advantage of rename correction, the user needs to
spell out "rename this blob to this blob" instead. Then I went away
doing something and dropped it.

Round 2. I still want to keep "rename path to path", at least on the
user interface level. For git-merge, I do exactly what Jeff wrote back
then, traversing the whole notes tree to collect rename hints. But now
I cache the result, so while traversing is expensive, we don't have to
pay often.

The first two patches are preparation. Patch 3 adds --rename-file
where you can tell Git to "rename this path to that path" by writing a
file with lines like this

    this path => that path

I will need to support quoting, but that can come later. Patch 5 is
similar.

Patch 4 adds --rename-notes, where you can now store the above file in
a notes tree. The note will be used when its associate commit A is
diff'ed against A^.

Patch 6 introduces a new syntax to that file, "blob SHA-1 => SHA-1".
This is the basis for patch 7, where we traverse the whole notes tree,
convert all non-blob lines into blob ones. Then we simply tell
git-merge to use that as a rename instruction file.

Expect round 3 in 2019 (hopefully not)

[1] http://thread.gmane.org/gmane.comp.version-control.git/202654

Nguyễn Thái Ngọc Duy (7):
  diff-no-index: do not take a redundant prefix argument
  diff.c: take "prefix" argument in diff_opt_parse()
  diff: add --rename-file
  log: add --rename-notes to correct renames per commit
  merge: add --rename-file
  diffcore-rename: allow to say "rename this blob to that blob"
  merge: add --rename-notes

 Documentation/diff-options.txt   |   7 +++
 Documentation/pretty-options.txt |   5 ++
 builtin/am.c                     |   2 +-
 builtin/diff.c                   |   2 +-
 builtin/merge.c                  | 123 +++++++++++++++++++++++++++++++++++++++
 diff-no-index.c                  |   7 ++-
 diff.c                           |  13 ++++-
 diff.h                           |   5 +-
 diffcore-rename.c                | 104 ++++++++++++++++++++++++++++++++-
 log-tree.c                       |  32 ++++++++++
 merge-recursive.c                |   1 +
 merge-recursive.h                |   1 +
 revision.c                       |  12 +++-
 revision.h                       |   1 +
 t/t4001-diff-rename.sh           |  82 ++++++++++++++++++++++++++
 15 files changed, 386 insertions(+), 11 deletions(-)

-- 
2.7.0.125.g9eec362

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/7] diff-no-index: do not take a redundant prefix argument
  2016-01-20 11:06 [PATCH 0/7] Diff rename, manual correction, round 2 Nguyễn Thái Ngọc Duy
@ 2016-01-20 11:06 ` Nguyễn Thái Ngọc Duy
  2016-01-20 11:06 ` [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse() Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-01-20 11:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

Prefix is already set up in "revs". The same prefix should be used for
all options parsing. So kill the last argument. This patch does not
actually change anything because the only caller does use the same
prefix for init_revisions() and diff_no_index().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/diff.c  | 2 +-
 diff-no-index.c | 4 ++--
 diff.h          | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/diff.c b/builtin/diff.c
index ed0acca..52c98a9 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -341,7 +341,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 	}
 	if (no_index)
 		/* If this is a no-index diff, just run it and exit there. */
-		diff_no_index(&rev, argc, argv, prefix);
+		diff_no_index(&rev, argc, argv);
 
 	/* Otherwise, we are doing the usual "git" diff */
 	rev.diffopt.skip_stat_unmatch = !!diff_auto_refresh_index;
diff --git a/diff-no-index.c b/diff-no-index.c
index 8e0fd27..491e842 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -237,12 +237,12 @@ static void fixup_paths(const char **path, struct strbuf *replacement)
 }
 
 void diff_no_index(struct rev_info *revs,
-		   int argc, const char **argv,
-		   const char *prefix)
+		   int argc, const char **argv)
 {
 	int i, prefixlen;
 	const char *paths[2];
 	struct strbuf replacement = STRBUF_INIT;
+	const char *prefix = revs->prefix;
 
 	diff_setup(&revs->diffopt);
 	for (i = 1; i < argc - 2; ) {
diff --git a/diff.h b/diff.h
index f7208ad..f61ee54 100644
--- a/diff.h
+++ b/diff.h
@@ -345,7 +345,7 @@ extern int diff_flush_patch_id(struct diff_options *, unsigned char *);
 
 extern int diff_result_code(struct diff_options *, int);
 
-extern void diff_no_index(struct rev_info *, int, const char **, const char *);
+extern void diff_no_index(struct rev_info *, int, const char **);
 
 extern int index_differs_from(const char *def, int diff_flags);
 
-- 
2.7.0.125.g9eec362

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse()
  2016-01-20 11:06 [PATCH 0/7] Diff rename, manual correction, round 2 Nguyễn Thái Ngọc Duy
  2016-01-20 11:06 ` [PATCH 1/7] diff-no-index: do not take a redundant prefix argument Nguyễn Thái Ngọc Duy
@ 2016-01-20 11:06 ` Nguyễn Thái Ngọc Duy
  2016-01-20 20:23   ` Junio C Hamano
  2016-01-20 11:06 ` [PATCH 3/7] diff: add --rename-file Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-01-20 11:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

This will be important later when diff_opt_parse() accepts paths as
arguments. Paths must be prefixed before access because setup code
moves cwd but does not (and cannot) update command line options.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/am.c    | 2 +-
 diff-no-index.c | 3 ++-
 diff.c          | 3 ++-
 diff.h          | 2 +-
 revision.c      | 2 +-
 5 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 9fb42fd..f009b6c 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1657,7 +1657,7 @@ static int fall_back_threeway(const struct am_state *state, const char *index_pa
 
 		init_revisions(&rev_info, NULL);
 		rev_info.diffopt.output_format = DIFF_FORMAT_NAME_STATUS;
-		diff_opt_parse(&rev_info.diffopt, &diff_filter_str, 1);
+		diff_opt_parse(&rev_info.diffopt, &diff_filter_str, 1, rev_info.prefix);
 		add_pending_sha1(&rev_info, "HEAD", our_tree, 0);
 		diff_setup_done(&rev_info.diffopt);
 		run_diff_index(&rev_info, 1);
diff --git a/diff-no-index.c b/diff-no-index.c
index 491e842..03daadb 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -252,7 +252,8 @@ void diff_no_index(struct rev_info *revs,
 		else if (!strcmp(argv[i], "--"))
 			i++;
 		else {
-			j = diff_opt_parse(&revs->diffopt, argv + i, argc - i);
+			j = diff_opt_parse(&revs->diffopt, argv + i, argc - i,
+					   revs->prefix);
 			if (j <= 0)
 				die("invalid diff option/value: %s", argv[i]);
 			i += j;
diff --git a/diff.c b/diff.c
index 80eb0c2..8d38fe8 100644
--- a/diff.c
+++ b/diff.c
@@ -3693,7 +3693,8 @@ static int parse_ws_error_highlight(struct diff_options *opt, const char *arg)
 	return 1;
 }
 
-int diff_opt_parse(struct diff_options *options, const char **av, int ac)
+int diff_opt_parse(struct diff_options *options,
+		   const char **av, int ac, const char *prefix)
 {
 	const char *arg = av[0];
 	const char *optarg;
diff --git a/diff.h b/diff.h
index f61ee54..76b5536 100644
--- a/diff.h
+++ b/diff.h
@@ -268,7 +268,7 @@ extern int parse_long_opt(const char *opt, const char **argv,
 extern int git_diff_basic_config(const char *var, const char *value, void *cb);
 extern int git_diff_ui_config(const char *var, const char *value, void *cb);
 extern void diff_setup(struct diff_options *);
-extern int diff_opt_parse(struct diff_options *, const char **, int);
+extern int diff_opt_parse(struct diff_options *, const char **, int, const char *);
 extern void diff_setup_done(struct diff_options *);
 
 #define DIFF_DETECT_RENAME	1
diff --git a/revision.c b/revision.c
index 0a282f5..14daefb 100644
--- a/revision.c
+++ b/revision.c
@@ -2049,7 +2049,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
 	} else {
-		int opts = diff_opt_parse(&revs->diffopt, argv, argc);
+		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
 		if (!opts)
 			unkv[(*unkc)++] = arg;
 		return opts;
-- 
2.7.0.125.g9eec362

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/7] diff: add --rename-file
  2016-01-20 11:06 [PATCH 0/7] Diff rename, manual correction, round 2 Nguyễn Thái Ngọc Duy
  2016-01-20 11:06 ` [PATCH 1/7] diff-no-index: do not take a redundant prefix argument Nguyễn Thái Ngọc Duy
  2016-01-20 11:06 ` [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse() Nguyễn Thái Ngọc Duy
@ 2016-01-20 11:06 ` Nguyễn Thái Ngọc Duy
  2016-01-20 22:44   ` Junio C Hamano
  2016-01-20 22:47   ` Junio C Hamano
  2016-01-20 11:06 ` [PATCH 4/7] log: add --rename-notes to correct renames per commit Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-01-20 11:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

Git's heuristics to detect renames or copies works most of the time.
This option can be used to correct the result when it goes wrong.
Matching pairs get max rename score and override even exact rename
detection.

Note that --rename-file does not try to break existing diff pairs. So
if you have "abc => def" in your file, but they are already paired up
(e.g. "abc => abc" and "def => def") and not broken down by -B, then
nothing happens.

An assumption is made in this patch, that the rename file only contains
a couple rename pairs, not thousands of them. Looping through all
rename source and destination for each rename line will not affect
performance and we can keep the code simple.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/diff-options.txt |  7 +++++
 diff.c                         | 10 +++++++
 diff.h                         |  1 +
 diffcore-rename.c              | 64 ++++++++++++++++++++++++++++++++++++++++--
 t/t4001-diff-rename.sh         | 33 ++++++++++++++++++++++
 5 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 306b7e3..7ae04a0 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -380,6 +380,13 @@ endif::git-log[]
 	projects, so use it with caution.  Giving more than one
 	`-C` option has the same effect.
 
+--rename-file=<path>::
+	The given file contains explicit rename pairs that override
+	automatic detected renames. Each line contains a rename pair
+	in the following format:
++
+<source path> <space> "=>" <space> <destination path>
+
 -D::
 --irreversible-delete::
 	Omit the preimage for deletes, i.e. print only the header but not
diff --git a/diff.c b/diff.c
index 8d38fe8..36cf08b 100644
--- a/diff.c
+++ b/diff.c
@@ -3773,6 +3773,16 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_OPT_SET(options, RENAME_EMPTY);
 	else if (!strcmp(arg, "--no-rename-empty"))
 		DIFF_OPT_CLR(options, RENAME_EMPTY);
+	else if (skip_prefix(arg, "--rename-file=", &arg)) {
+		struct strbuf sb = STRBUF_INIT;
+		const char *path = arg;
+
+		if (prefix)
+			path = prefix_filename(prefix, strlen(prefix), path);
+		if (strbuf_read_file(&sb, path, 0) == -1)
+			die(_("unable to read %s"), path);
+		options->manual_renames = strbuf_detach(&sb, NULL); /* leak */
+	}
 	else if (!strcmp(arg, "--relative"))
 		DIFF_OPT_SET(options, RELATIVE_NAME);
 	else if (skip_prefix(arg, "--relative=", &arg)) {
diff --git a/diff.h b/diff.h
index 76b5536..37179ba 100644
--- a/diff.h
+++ b/diff.h
@@ -176,6 +176,7 @@ struct diff_options {
 	diff_prefix_fn_t output_prefix;
 	int output_prefix_length;
 	void *output_prefix_data;
+	const char *manual_renames;
 
 	int diff_path_counter;
 };
diff --git a/diffcore-rename.c b/diffcore-rename.c
index af1fe08..79beec8 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -346,8 +346,11 @@ static int find_exact_renames(struct diff_options *options)
 		insert_file_table(&file_table, i, rename_src[i].p->one);
 
 	/* Walk the destinations and find best source match */
-	for (i = 0; i < rename_dst_nr; i++)
+	for (i = 0; i < rename_dst_nr; i++) {
+		if (rename_dst[i].pair)
+			continue; /* dealt with exact match already. */
 		renames += find_identical_files(&file_table, i, options);
+	}
 
 	/* Free the hash data structure and entries */
 	hashmap_free(&file_table, 1);
@@ -355,6 +358,61 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }
 
+static int manual_rename(const char *src, int srclen,
+			 const char *dst, int dstlen)
+{
+	int src_index, dst_index;
+
+	for (src_index = 0; src_index < rename_src_nr; src_index++) {
+		const char *path = rename_src[src_index].p->one->path;
+		if (strlen(path) == srclen && !strncmp(path, src, srclen))
+			break;
+	}
+	if (src_index == rename_src_nr)
+		return 0;
+
+	for (dst_index = 0; dst_index < rename_dst_nr; dst_index++) {
+		const char *path = rename_dst[dst_index].two->path;
+		if (strlen(path) == dstlen && !strncmp(path, dst, dstlen))
+			break;
+	}
+	if (dst_index == rename_dst_nr)
+		return 0;
+
+	record_rename_pair(dst_index, src_index, MAX_SCORE);
+	return 1;
+}
+
+static int find_manual_renames(struct diff_options *options)
+{
+	int renames = 0;
+	const char *p, *end;
+
+	if (!options->manual_renames)
+		return 0;
+
+	p = options->manual_renames;
+	end = p + strlen(p);
+	while (p < end) {
+		const char *line_end = strchr(p, '\n');
+		const char *arrow = strstr(p, " => ");
+		const char *src = p, *dst;
+
+		if (!line_end)
+			line_end = end;
+		p = line_end + 1;
+
+		if (!arrow || arrow >= line_end)
+			continue;
+
+		dst = arrow + strlen(" => ");
+		renames += manual_rename(src, arrow - src,
+					 dst, line_end - dst);
+	}
+
+	return renames;
+}
+
 #define NUM_CANDIDATE_PER_DST 4
 static void record_if_better(struct diff_score m[], struct diff_score *o)
 {
@@ -500,11 +558,13 @@ void diffcore_rename(struct diff_options *options)
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
 		goto cleanup; /* nothing to do */
 
+	rename_count = find_manual_renames(options);
+
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames(options);
+	rename_count += find_exact_renames(options);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
index 2f327b7..ab9a666 100755
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -156,4 +156,37 @@ test_expect_success 'rename pretty print common prefix and suffix overlap' '
 	test_i18ngrep " d/f/{ => f}/e " output
 '
 
+test_expect_success 'manual rename correction' '
+	test_create_repo correct-rename &&
+	(
+		cd correct-rename &&
+		echo one > old-one &&
+		echo two > old-two &&
+		git add old-one old-two &&
+		git commit -m old &&
+		git rm old-one old-two &&
+		echo one > new-one &&
+		echo two > new-two &&
+		git add new-one new-two &&
+		git commit -m new &&
+		git diff -M --summary HEAD^ | grep rename >actual &&
+		cat >expected <<-\EOF &&
+		 rename old-one => new-one (100%)
+		 rename old-two => new-two (100%)
+		EOF
+		test_cmp expected actual &&
+
+		cat >correction <<-\EOF &&
+		old-one => new-two
+		old-two => new-one
+		EOF
+		git diff -M --rename-file=correction --summary HEAD^ | grep rename >actual &&
+		cat >expected <<-\EOF &&
+		 rename old-two => new-one (100%)
+		 rename old-one => new-two (100%)
+		EOF
+		test_cmp expected actual
+	)
+'
+
 test_done
-- 
2.7.0.125.g9eec362

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/7] log: add --rename-notes to correct renames per commit
  2016-01-20 11:06 [PATCH 0/7] Diff rename, manual correction, round 2 Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2016-01-20 11:06 ` [PATCH 3/7] diff: add --rename-file Nguyễn Thái Ngọc Duy
@ 2016-01-20 11:06 ` Nguyễn Thái Ngọc Duy
  2016-01-20 23:29   ` Junio C Hamano
  2016-01-20 11:06 ` [PATCH 5/7] merge: add --rename-file Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-01-20 11:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

For simplicity, the note of commit A implies rename correction between
A^ and A. If parents are manipulated (e.g. "git log --reflog") then
the rename output may surprise people.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/pretty-options.txt |  5 +++++
 log-tree.c                       | 32 ++++++++++++++++++++++++++++++++
 revision.c                       | 10 ++++++++++
 revision.h                       |  1 +
 t/t4001-diff-rename.sh           | 24 ++++++++++++++++++++++++
 5 files changed, 72 insertions(+)

diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt
index 4b659ac..15a2971 100644
--- a/Documentation/pretty-options.txt
+++ b/Documentation/pretty-options.txt
@@ -75,6 +75,11 @@ being displayed. Examples: "--notes=foo" will show only notes from
 --[no-]standard-notes::
 	These options are deprecated. Use the above --notes/--no-notes
 	options instead.
+
+--rename-notes=<ref>::
+	Get per-commit rename instructions from notes. See option
+	`--rename-file` for more information. If both `--rename-notes`
+	and `--rename-file` are specified, the last one takes effect.
 endif::git-rev-list[]
 
 --show-signature::
diff --git a/log-tree.c b/log-tree.c
index f70a30e..e5766a6 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -788,6 +788,36 @@ static int do_diff_combined(struct rev_info *opt, struct commit *commit)
 	return !opt->loginfo;
 }
 
+static void populate_rename_notes(struct rev_info *opt, const struct object_id *oid)
+{
+	static char *last_note;
+	enum object_type type;
+	unsigned long size;
+	const unsigned char *sha1;
+
+	if (!opt->rename_notes)
+		return;
+
+	/*
+	 * "--rename-notes=abc --rename-file=def" is specified in this
+	 * order, --rename-file wins.
+	 */
+	if (opt->diffopt.manual_renames != NULL &&
+	    opt->diffopt.manual_renames != last_note)
+		return;
+
+	free(last_note);
+	opt->diffopt.manual_renames = NULL;
+
+	sha1 = get_note(opt->rename_notes, oid->hash);
+	if (!sha1)
+		return;
+
+	last_note = read_sha1_file(sha1, &type, &size);
+	if (type == OBJ_BLOB)
+		opt->diffopt.manual_renames = last_note;
+}
+
 /*
  * Show the diff of a commit.
  *
@@ -805,6 +835,8 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
 	parse_commit_or_die(commit);
 	oid = &commit->tree->object.oid;
 
+	populate_rename_notes(opt, &commit->object.oid);
+
 	/* Root commit? */
 	parents = get_saved_parents(opt, commit);
 	if (!parents) {
diff --git a/revision.c b/revision.c
index 14daefb..20346c1 100644
--- a/revision.c
+++ b/revision.c
@@ -1958,6 +1958,16 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->notes_opt.use_default_notes = 1;
 	} else if (!strcmp(arg, "--no-standard-notes")) {
 		revs->notes_opt.use_default_notes = 0;
+	} else if (skip_prefix(arg, "--rename-notes=", &optarg)) {
+		struct strbuf buf = STRBUF_INIT;
+		struct notes_tree *nt;
+
+		strbuf_addstr(&buf, optarg);
+		expand_notes_ref(&buf);
+		revs->rename_notes = nt = xcalloc(1, sizeof(*nt));
+		init_notes(nt, buf.buf, NULL, 0);
+		strbuf_release(&buf);
+		revs->diffopt.manual_renames = NULL;
 	} else if (!strcmp(arg, "--oneline")) {
 		revs->verbose_header = 1;
 		get_commit_format("oneline", revs);
diff --git a/revision.h b/revision.h
index 23857c0..db2f225 100644
--- a/revision.h
+++ b/revision.h
@@ -189,6 +189,7 @@ struct rev_info {
 	/* diff info for patches and for paths limiting */
 	struct diff_options diffopt;
 	struct diff_options pruning;
+	struct notes_tree *rename_notes;
 
 	struct reflog_walk_info *reflog_info;
 	struct decoration children;
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
index ab9a666..21d9378 100755
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -189,4 +189,28 @@ test_expect_success 'manual rename correction' '
 	)
 '
 
+test_expect_success 'rename correction from notes' '
+	(
+		cd correct-rename &&
+		git show --summary -M HEAD | grep rename >actual &&
+		cat >expected <<-\EOF &&
+		 rename old-one => new-one (100%)
+		 rename old-two => new-two (100%)
+		EOF
+		test_cmp expected actual &&
+
+		cat >correction <<-\EOF &&
+		old-one => new-two
+		old-two => new-one
+		EOF
+		git notes --ref=rename add -F correction HEAD &&
+		git show --summary -M --rename-notes=rename HEAD | grep rename >actual &&
+		cat >expected <<-\EOF &&
+		 rename old-two => new-one (100%)
+		 rename old-one => new-two (100%)
+		EOF
+		test_cmp expected actual
+	)
+'
+
 test_done
-- 
2.7.0.125.g9eec362

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/7] merge: add --rename-file
  2016-01-20 11:06 [PATCH 0/7] Diff rename, manual correction, round 2 Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2016-01-20 11:06 ` [PATCH 4/7] log: add --rename-notes to correct renames per commit Nguyễn Thái Ngọc Duy
@ 2016-01-20 11:06 ` Nguyễn Thái Ngọc Duy
  2016-01-20 11:06 ` [PATCH 6/7] diffcore-rename: allow to say "rename this blob to that blob" Nguyễn Thái Ngọc Duy
  2016-01-20 11:06 ` [PATCH 7/7] merge: add --rename-notes Nguyễn Thái Ngọc Duy
  6 siblings, 0 replies; 20+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-01-20 11:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

For one-time amateurs, --rename-file could be used to specify what path
should be renamed to what path between heads and the merge base, when
they hit big merge conflicts because diff fails to recognize renames
correctly.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/merge.c   | 9 +++++++++
 merge-recursive.c | 1 +
 merge-recursive.h | 1 +
 3 files changed, 11 insertions(+)

diff --git a/builtin/merge.c b/builtin/merge.c
index 15bf95b..95a6c26 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -67,6 +67,8 @@ static int abort_current_merge;
 static int show_progress = -1;
 static int default_to_upstream = 1;
 static const char *sign_commit;
+static const char *rename_file;
+static struct strbuf manual_renames = STRBUF_INIT;
 
 static struct strategy all_strategy[] = {
 	{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
@@ -225,6 +227,7 @@ static struct option builtin_merge_options[] = {
 	{ OPTION_STRING, 'S', "gpg-sign", &sign_commit, N_("key-id"),
 	  N_("GPG sign commit"), PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
 	OPT_BOOL(0, "overwrite-ignore", &overwrite_ignore, N_("update ignored files (default)")),
+	OPT_FILENAME(0, "rename-file", &rename_file, N_("--rename-file to diff")),
 	OPT_END()
 };
 
@@ -664,6 +667,8 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common,
 		o.renormalize = option_renormalize;
 		o.show_rename_progress =
 			show_progress == -1 ? isatty(2) : show_progress;
+		if (manual_renames.len)
+			o.manual_renames = manual_renames.buf;
 
 		for (x = 0; x < xopts_nr; x++)
 			if (parse_merge_opt(&o, xopts[x]))
@@ -1255,6 +1260,10 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_merge_usage,
 			builtin_merge_options);
 
+	if (rename_file &&
+	    strbuf_read_file(&manual_renames, rename_file, 0) == -1)
+		die(_("unable to read %s"), rename_file);
+
 	if (!head_commit) {
 		struct commit *remote_head;
 		/*
diff --git a/merge-recursive.c b/merge-recursive.c
index 8eabde2..ec7e044 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -492,6 +492,7 @@ static struct string_list *get_renames(struct merge_options *o,
 	opts.rename_score = o->rename_score;
 	opts.show_rename_progress = o->show_rename_progress;
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+	opts.manual_renames = o->manual_renames;
 	diff_setup_done(&opts);
 	diff_tree_sha1(o_tree->object.oid.hash, tree->object.oid.hash, "", &opts);
 	diffcore_std(&opts);
diff --git a/merge-recursive.h b/merge-recursive.h
index 9e090a3..898b169 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -27,6 +27,7 @@ struct merge_options {
 	struct string_list current_file_set;
 	struct string_list current_directory_set;
 	struct string_list df_conflict_file_set;
+	const char *manual_renames;
 };
 
 /* merge_trees() but with recursive ancestor consolidation */
-- 
2.7.0.125.g9eec362

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/7] diffcore-rename: allow to say "rename this blob to that blob"
  2016-01-20 11:06 [PATCH 0/7] Diff rename, manual correction, round 2 Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2016-01-20 11:06 ` [PATCH 5/7] merge: add --rename-file Nguyễn Thái Ngọc Duy
@ 2016-01-20 11:06 ` Nguyễn Thái Ngọc Duy
  2016-01-20 11:06 ` [PATCH 7/7] merge: add --rename-notes Nguyễn Thái Ngọc Duy
  6 siblings, 0 replies; 20+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-01-20 11:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

This will be needed for git-merge, where we do a diff between two
trees far away. We need to translate paths to blobs so that merge can
stll use it.

This patch has another potential use: to reduce processing at rename
detection. We can cache the rename output in a big file with these "blob
to blob" lines. The next diff will be fast because we only need to
compare SHA-1 instead of doing deltas. The naive find_manual_renames()
will have to be rewritten to dealwith large cache file though.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diffcore-rename.c      | 40 ++++++++++++++++++++++++++++++++++++++++
 t/t4001-diff-rename.sh | 25 +++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 79beec8..a06c06a 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -358,6 +358,38 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }
 
+static int manual_rename_blob(const char *src, int srclen,
+			      const char *dst, int dstlen)
+{
+	struct object_id oid;
+	int src_index, dst_index;
+
+	if (srclen != GIT_SHA1_HEXSZ || get_oid_hex(src, &oid))
+		return 0;
+
+	for (src_index = 0; src_index < rename_src_nr; src_index++) {
+		const unsigned char *sha1 = rename_src[src_index].p->one->sha1;
+		if (hashcmp(oid.hash, sha1))
+			break;
+	}
+	if (src_index == rename_src_nr)
+		return 0;
+
+	if (dstlen != GIT_SHA1_HEXSZ || get_oid_hex(dst, &oid))
+		return 0;
+
+	for (dst_index = 0; dst_index < rename_dst_nr; dst_index++) {
+		const unsigned char *sha1 = rename_dst[dst_index].two->sha1;
+		if (hashcmp(oid.hash, sha1))
+			break;
+	}
+	if (dst_index == rename_dst_nr)
+		return 0;
+
+	record_rename_pair(dst_index, src_index, MAX_SCORE);
+	return 1;
+}
+
 static int manual_rename(const char *src, int srclen,
 			 const char *dst, int dstlen)
 {
@@ -406,6 +438,14 @@ static int find_manual_renames(struct diff_options *options)
 			continue;
 
 		dst = arrow + strlen(" => ");
+
+		if (skip_prefix(src, "blob ", &src)) {
+			renames +=
+				manual_rename_blob(src, arrow - src,
+						   dst, line_end - dst);
+			continue;
+		}
+
 		renames += manual_rename(src, arrow - src,
 					 dst, line_end - dst);
 	}
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
index 21d9378..4d5c667 100755
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -189,6 +189,31 @@ test_expect_success 'manual rename correction' '
 	)
 '
 
+test_expect_success 'manual rename correction with blobs' '
+	(
+		cd correct-rename &&
+		git diff -M --summary HEAD^ | grep rename >actual &&
+		cat >expected <<-\EOF &&
+		 rename old-one => new-one (100%)
+		 rename old-two => new-two (100%)
+		EOF
+		test_cmp expected actual &&
+
+		ONE=`echo one | git hash-object --stdin` &&
+		TWO=`echo two | git hash-object --stdin` &&
+		cat >correction <<-EOF &&
+		blob $ONE => $TWO
+		blob $TWO => $ONE
+		EOF
+		git diff -M --rename-file=correction --summary HEAD^ | grep rename >actual &&
+		cat >expected <<-\EOF &&
+		 rename old-two => new-one (100%)
+		 rename old-one => new-two (100%)
+		EOF
+		test_cmp expected actual
+	)
+'
+
 test_expect_success 'rename correction from notes' '
 	(
 		cd correct-rename &&
-- 
2.7.0.125.g9eec362

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/7] merge: add --rename-notes
  2016-01-20 11:06 [PATCH 0/7] Diff rename, manual correction, round 2 Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2016-01-20 11:06 ` [PATCH 6/7] diffcore-rename: allow to say "rename this blob to that blob" Nguyễn Thái Ngọc Duy
@ 2016-01-20 11:06 ` Nguyễn Thái Ngọc Duy
  2016-01-21 17:53   ` Junio C Hamano
  6 siblings, 1 reply; 20+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-01-20 11:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

For merge professionals, --rename-file is as bad a nightmare as
resolving conflicts because they may have to create a rename file for
every merge. --rename-notes takes advantage of rename notes to avoid that.

Because rename notes are between commit A and A^, not between A and its
merge base, we need to convert "rename path A to path B" to "rename blob
A' to blob B'" and hope that mere base still has that blob A'.

Merging notes this way is expensive. So it's cached in $GIT_DIR/rename-cache.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/merge.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/builtin/merge.c b/builtin/merge.c
index 95a6c26..6ad2e52 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -30,6 +30,7 @@
 #include "fmt-merge-msg.h"
 #include "gpg-interface.h"
 #include "sequencer.h"
+#include "notes.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
@@ -69,6 +70,7 @@ static int default_to_upstream = 1;
 static const char *sign_commit;
 static const char *rename_file;
 static struct strbuf manual_renames = STRBUF_INIT;
+static const char *rename_note_ref;
 
 static struct strategy all_strategy[] = {
 	{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
@@ -228,6 +230,7 @@ static struct option builtin_merge_options[] = {
 	  N_("GPG sign commit"), PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
 	OPT_BOOL(0, "overwrite-ignore", &overwrite_ignore, N_("update ignored files (default)")),
 	OPT_FILENAME(0, "rename-file", &rename_file, N_("--rename-file to diff")),
+	OPT_STRING(0, "rename-notes", &rename_note_ref, N_("note-ref"), N_("--rename-notes to diff")),
 	OPT_END()
 };
 
@@ -1161,6 +1164,102 @@ static struct commit_list *collect_parents(struct commit *head_commit,
 	return remoteheads;
 }
 
+static int merge_rename_note(const unsigned char *object_hash,
+			     const unsigned char *note_hash,
+			     char *note_path,
+			     void *cb_data)
+{
+	struct strbuf *cache = cb_data;
+	struct strbuf sb = STRBUF_INIT;
+	const char *p, *end;
+	enum object_type type;
+	unsigned long size;
+	char *note;
+
+	note = read_sha1_file(note_hash, &type, &size);
+	if (type != OBJ_BLOB) {
+		free(note);
+		return 0;
+	}
+
+	p = note;
+	end = p + strlen(p);
+	while (p < end) {
+		const char *line_end = strchr(p, '\n');
+		const char *arrow = strstr(p, " => ");
+		const char *src = p, *dst;
+		struct object_id src_oid;
+		struct object_id dst_oid;
+
+		if (!line_end)
+			line_end = end;
+		p = line_end + 1;
+
+		if (!arrow || arrow >= line_end)
+			continue;
+
+		if (starts_with(src, "blob ") && src + 4 < arrow) {
+			strbuf_addf(cache, "%.*s\n",
+				    (int)(line_end - src), src);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s:%.*s", sha1_to_hex(object_hash),
+			    (int)(arrow - src), src);
+		if (get_sha1(sb.buf, src_oid.hash))
+			continue;
+
+		dst = arrow + strlen(" => ");
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s:%.*s", sha1_to_hex(object_hash),
+			    (int)(line_end - dst), dst);
+		if (get_sha1(sb.buf, dst_oid.hash))
+			continue;
+
+		strbuf_addf(cache, "blob %s => %s\n",
+			    oid_to_hex(&src_oid),
+			    oid_to_hex(&dst_oid));
+	}
+
+	return 0;
+}
+
+/*
+ * Traverse through the given notes tree, convert all "path to path"
+ * rename lines into "blob to blob" and return it. If cache_file is
+ * non-NULL, return it's content if still valid. Otherwise save the
+ * new content in it.
+ */
+static void merge_rename_notes(struct strbuf *cache,
+			       struct notes_tree *t, const char *cache_file)
+{
+	struct object_id notes_oid;
+
+	if (cache_file) {
+		struct object_id cache_oid;
+
+		strbuf_reset(cache);
+		if (!resolve_ref_unsafe(t->ref, RESOLVE_REF_READING,
+					notes_oid.hash, NULL))
+			return;
+
+		if (strbuf_read_file(cache, cache_file, 0) > GIT_SHA1_HEXSZ + 1 &&
+		    cache->buf[GIT_SHA1_HEXSZ] == '\n' &&
+		    !get_oid_hex(cache->buf, &cache_oid) &&
+		    !oidcmp(&notes_oid, &cache_oid)) {
+			strbuf_remove(cache, 0, GIT_SHA1_HEXSZ + 1);
+			return;
+		}
+	}
+
+	strbuf_reset(cache);
+	for_each_note(t, 0, merge_rename_note, cache);
+	if (cache_file && cache->len)
+		write_file(cache_file, "%s\n%s",
+			   oid_to_hex(&notes_oid), cache->buf);
+}
+
 int cmd_merge(int argc, const char **argv, const char *prefix)
 {
 	unsigned char result_tree[20];
@@ -1260,10 +1359,25 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_merge_usage,
 			builtin_merge_options);
 
+	if (rename_file && rename_note_ref)
+		die(_("--rename-file and --rename-notes are incompatible"));
+
 	if (rename_file &&
 	    strbuf_read_file(&manual_renames, rename_file, 0) == -1)
 		die(_("unable to read %s"), rename_file);
 
+	if (rename_note_ref) {
+		struct notes_tree rename_notes;
+		struct strbuf ref = STRBUF_INIT;
+
+		strbuf_addstr(&ref, rename_note_ref);
+		expand_notes_ref(&ref);
+		init_notes(&rename_notes, ref.buf, NULL, 0);
+		strbuf_release(&ref);
+		merge_rename_notes(&manual_renames, &rename_notes,
+				   git_path("GIT_RENAME_CACHE"));
+	}
+
 	if (!head_commit) {
 		struct commit *remote_head;
 		/*
-- 
2.7.0.125.g9eec362

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse()
  2016-01-20 11:06 ` [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse() Nguyễn Thái Ngọc Duy
@ 2016-01-20 20:23   ` Junio C Hamano
  2016-01-20 20:29     ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2016-01-20 20:23 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This will be important later when diff_opt_parse() accepts paths as
> arguments. Paths must be prefixed before access because setup code
> moves cwd but does not (and cannot) update command line options.

The above sounds like a sensible thing to do (note: I didn't read
the patch or remainder of the series), but makes me wonder how the
existing --orderfile option works without this support.  Perhaps it
is not working and needs to be updated to take advantage of this
change, too?

>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/am.c    | 2 +-
>  diff-no-index.c | 3 ++-
>  diff.c          | 3 ++-
>  diff.h          | 2 +-
>  revision.c      | 2 +-
>  5 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/am.c b/builtin/am.c
> index 9fb42fd..f009b6c 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1657,7 +1657,7 @@ static int fall_back_threeway(const struct am_state *state, const char *index_pa
>  
>  		init_revisions(&rev_info, NULL);
>  		rev_info.diffopt.output_format = DIFF_FORMAT_NAME_STATUS;
> -		diff_opt_parse(&rev_info.diffopt, &diff_filter_str, 1);
> +		diff_opt_parse(&rev_info.diffopt, &diff_filter_str, 1, rev_info.prefix);
>  		add_pending_sha1(&rev_info, "HEAD", our_tree, 0);
>  		diff_setup_done(&rev_info.diffopt);
>  		run_diff_index(&rev_info, 1);
> diff --git a/diff-no-index.c b/diff-no-index.c
> index 491e842..03daadb 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -252,7 +252,8 @@ void diff_no_index(struct rev_info *revs,
>  		else if (!strcmp(argv[i], "--"))
>  			i++;
>  		else {
> -			j = diff_opt_parse(&revs->diffopt, argv + i, argc - i);
> +			j = diff_opt_parse(&revs->diffopt, argv + i, argc - i,
> +					   revs->prefix);
>  			if (j <= 0)
>  				die("invalid diff option/value: %s", argv[i]);
>  			i += j;
> diff --git a/diff.c b/diff.c
> index 80eb0c2..8d38fe8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3693,7 +3693,8 @@ static int parse_ws_error_highlight(struct diff_options *opt, const char *arg)
>  	return 1;
>  }
>  
> -int diff_opt_parse(struct diff_options *options, const char **av, int ac)
> +int diff_opt_parse(struct diff_options *options,
> +		   const char **av, int ac, const char *prefix)
>  {
>  	const char *arg = av[0];
>  	const char *optarg;
> diff --git a/diff.h b/diff.h
> index f61ee54..76b5536 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -268,7 +268,7 @@ extern int parse_long_opt(const char *opt, const char **argv,
>  extern int git_diff_basic_config(const char *var, const char *value, void *cb);
>  extern int git_diff_ui_config(const char *var, const char *value, void *cb);
>  extern void diff_setup(struct diff_options *);
> -extern int diff_opt_parse(struct diff_options *, const char **, int);
> +extern int diff_opt_parse(struct diff_options *, const char **, int, const char *);
>  extern void diff_setup_done(struct diff_options *);
>  
>  #define DIFF_DETECT_RENAME	1
> diff --git a/revision.c b/revision.c
> index 0a282f5..14daefb 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2049,7 +2049,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>  	} else if (!strcmp(arg, "--ignore-missing")) {
>  		revs->ignore_missing = 1;
>  	} else {
> -		int opts = diff_opt_parse(&revs->diffopt, argv, argc);
> +		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
>  		if (!opts)
>  			unkv[(*unkc)++] = arg;
>  		return opts;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse()
  2016-01-20 20:23   ` Junio C Hamano
@ 2016-01-20 20:29     ` Jeff King
  2016-01-20 21:49       ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2016-01-20 20:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git

On Wed, Jan 20, 2016 at 12:23:38PM -0800, Junio C Hamano wrote:

> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
> 
> > This will be important later when diff_opt_parse() accepts paths as
> > arguments. Paths must be prefixed before access because setup code
> > moves cwd but does not (and cannot) update command line options.
> 
> The above sounds like a sensible thing to do (note: I didn't read
> the patch or remainder of the series), but makes me wonder how the
> existing --orderfile option works without this support.  Perhaps it
> is not working and needs to be updated to take advantage of this
> change, too?

Yeah, I think it simply does not work.

  $ >main-order
  $ mkdir subdir && >subdir/sub-order
  $ cd subdir
  $ git show -Osub-order
  fatal: failed to read orderfile 'sub-order': No such file or directory
  $ git show -Omain-order
  [shows diff]

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse()
  2016-01-20 20:29     ` Jeff King
@ 2016-01-20 21:49       ` Junio C Hamano
  2016-01-21 11:48         ` Duy Nguyen
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2016-01-20 21:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git

Jeff King <peff@peff.net> writes:

> On Wed, Jan 20, 2016 at 12:23:38PM -0800, Junio C Hamano wrote:
>
>> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>> 
>> > This will be important later when diff_opt_parse() accepts paths as
>> > arguments. Paths must be prefixed before access because setup code
>> > moves cwd but does not (and cannot) update command line options.
>> 
>> The above sounds like a sensible thing to do (note: I didn't read
>> the patch or remainder of the series), but makes me wonder how the
>> existing --orderfile option works without this support.  Perhaps it
>> is not working and needs to be updated to take advantage of this
>> change, too?
>
> Yeah, I think it simply does not work.
>
>   $ >main-order
>   $ mkdir subdir && >subdir/sub-order
>   $ cd subdir
>   $ git show -Osub-order
>   fatal: failed to read orderfile 'sub-order': No such file or directory
>   $ git show -Omain-order
>   [shows diff]

Good.

Then 2/7 can be rerolled and advertised as "make -O to work from
subdirectories", and can gradulate regardless of the remainder of
the series.  Even if the rest needs rerolls to get it right (or
takes until 2019 to mature ;-), we will have one less change to
re-review in the process as we can push these early and obviously
correct part out separately.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/7] diff: add --rename-file
  2016-01-20 11:06 ` [PATCH 3/7] diff: add --rename-file Nguyễn Thái Ngọc Duy
@ 2016-01-20 22:44   ` Junio C Hamano
  2016-01-20 22:47   ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2016-01-20 22:44 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Git's heuristics to detect renames or copies works most of the time.
> This option can be used to correct the result when it goes wrong.
> Matching pairs get max rename score and override even exact rename
> detection.
>
> Note that --rename-file does not try to break existing diff pairs. So
> if you have "abc => def" in your file, but they are already paired up
> (e.g. "abc => abc" and "def => def") and not broken down by -B, then
> nothing happens.
>
> An assumption is made in this patch, that the rename file only contains
> a couple rename pairs, not thousands of them. Looping through all
> rename source and destination for each rename line will not affect
> performance and we can keep the code simple.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

> +--rename-file=<path>::
> +	The given file contains explicit rename pairs that override
> +	automatic detected renames. Each line contains a rename pair
> +	in the following format:
> ++
> +<source path> <space> "=>" <space> <destination path>
> +

Obviously this needs path-quoting support before it can get anywhere
near 'next', but I found the basic idea outlined in the change in
diffcore-rename.c in this patch OK.  The manual one comes even
before the "look for identical ones" step, which is logically the
right thing to do.

One small nit is that using MAX_SCORE may have unintended fallouts.
The manually specified one is chosen among rename candidates not
because it has the highest similarity score---it in fact may not
have a good similarity score at all, but we pair the paths because
the user tells us to (and the user knows better than we do).

I am just wondering if it is easy to make the similarity score shown
at the beginning of diff displayed differently.  For manually paired
paths, we can and should keep "rename from OLDPATH" and "rename to
NEWPATH" entries, but I think "similarity index 100%" should ideally
be not shown for them, and if we have to have the line (e.g. perhaps
existing tools expect a line that begins with "similarity index" to
exist), show something like "similarity index N/A" that is clearly
different from "100%".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/7] diff: add --rename-file
  2016-01-20 11:06 ` [PATCH 3/7] diff: add --rename-file Nguyễn Thái Ngọc Duy
  2016-01-20 22:44   ` Junio C Hamano
@ 2016-01-20 22:47   ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2016-01-20 22:47 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Git's heuristics to detect renames or copies works most of the time.
> This option can be used to correct the result when it goes wrong.
> Matching pairs get max rename score and override even exact rename
> detection.
>
> Note that --rename-file does not try to break existing diff pairs. So
> if you have "abc => def" in your file, but they are already paired up
> (e.g. "abc => abc" and "def => def") and not broken down by -B, then
> nothing happens.

By the way, how would this interact with "diff -R"?

Do you need to spell --rename-file in reverse before you can run the
diff in reverse?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/7] log: add --rename-notes to correct renames per commit
  2016-01-20 11:06 ` [PATCH 4/7] log: add --rename-notes to correct renames per commit Nguyễn Thái Ngọc Duy
@ 2016-01-20 23:29   ` Junio C Hamano
  2016-01-22  1:00     ` Duy Nguyen
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2016-01-20 23:29 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> For simplicity, the note of commit A implies rename correction between
> A^ and A. If parents are manipulated (e.g. "git log --reflog") then
> the rename output may surprise people.

I do not think "git log --reflog" attempts to show changes to bring
the tree of @{2} into the shape of @{1}, even though for traversal
purposes it pretends as if @{1}'s parent is @{2}.  So I am not sure
what you are trying to say in the above sentence.

A path limited "git log -- path1/ path2/..." also manipulates the
commit->parents for traversal purposes, but I think the patch is
still produced against the true parents (there is a call to
get_saved_parents() in log_tree_diff() that shows the change for a
given commit), and in that context, commit A that has notes about
the change to bring the tree of commit A^ to its tree still makes
sense.

I'd be more worried about "git log -m -p"--the diff against the
second and subsequent parents would not want to use the notes that
describes the change between the first parent and the resulting
merge.


>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/pretty-options.txt |  5 +++++
>  log-tree.c                       | 32 ++++++++++++++++++++++++++++++++
>  revision.c                       | 10 ++++++++++
>  revision.h                       |  1 +
>  t/t4001-diff-rename.sh           | 24 ++++++++++++++++++++++++
>  5 files changed, 72 insertions(+)
>
> diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt
> index 4b659ac..15a2971 100644
> --- a/Documentation/pretty-options.txt
> +++ b/Documentation/pretty-options.txt
> @@ -75,6 +75,11 @@ being displayed. Examples: "--notes=foo" will show only notes from
>  --[no-]standard-notes::
>  	These options are deprecated. Use the above --notes/--no-notes
>  	options instead.
> +
> +--rename-notes=<ref>::
> +	Get per-commit rename instructions from notes. See option
> +	`--rename-file` for more information. If both `--rename-notes`
> +	and `--rename-file` are specified, the last one takes effect.
>  endif::git-rev-list[]
>  
>  --show-signature::
> diff --git a/log-tree.c b/log-tree.c
> index f70a30e..e5766a6 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -788,6 +788,36 @@ static int do_diff_combined(struct rev_info *opt, struct commit *commit)
>  	return !opt->loginfo;
>  }
>  
> +static void populate_rename_notes(struct rev_info *opt, const struct object_id *oid)
> +{
> +	static char *last_note;
> +	enum object_type type;
> +	unsigned long size;
> +	const unsigned char *sha1;
> +
> +	if (!opt->rename_notes)
> +		return;
> +
> +	/*
> +	 * "--rename-notes=abc --rename-file=def" is specified in this
> +	 * order, --rename-file wins.
> +	 */
> +	if (opt->diffopt.manual_renames != NULL &&
> +	    opt->diffopt.manual_renames != last_note)
> +		return;
> +
> +	free(last_note);
> +	opt->diffopt.manual_renames = NULL;
> +
> +	sha1 = get_note(opt->rename_notes, oid->hash);
> +	if (!sha1)
> +		return;
> +
> +	last_note = read_sha1_file(sha1, &type, &size);
> +	if (type == OBJ_BLOB)
> +		opt->diffopt.manual_renames = last_note;
> +}
> +
>  /*
>   * Show the diff of a commit.
>   *
> @@ -805,6 +835,8 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
>  	parse_commit_or_die(commit);
>  	oid = &commit->tree->object.oid;
>  
> +	populate_rename_notes(opt, &commit->object.oid);
> +
>  	/* Root commit? */
>  	parents = get_saved_parents(opt, commit);
>  	if (!parents) {
> diff --git a/revision.c b/revision.c
> index 14daefb..20346c1 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1958,6 +1958,16 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>  		revs->notes_opt.use_default_notes = 1;
>  	} else if (!strcmp(arg, "--no-standard-notes")) {
>  		revs->notes_opt.use_default_notes = 0;
> +	} else if (skip_prefix(arg, "--rename-notes=", &optarg)) {
> +		struct strbuf buf = STRBUF_INIT;
> +		struct notes_tree *nt;
> +
> +		strbuf_addstr(&buf, optarg);
> +		expand_notes_ref(&buf);
> +		revs->rename_notes = nt = xcalloc(1, sizeof(*nt));
> +		init_notes(nt, buf.buf, NULL, 0);
> +		strbuf_release(&buf);
> +		revs->diffopt.manual_renames = NULL;
>  	} else if (!strcmp(arg, "--oneline")) {
>  		revs->verbose_header = 1;
>  		get_commit_format("oneline", revs);
> diff --git a/revision.h b/revision.h
> index 23857c0..db2f225 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -189,6 +189,7 @@ struct rev_info {
>  	/* diff info for patches and for paths limiting */
>  	struct diff_options diffopt;
>  	struct diff_options pruning;
> +	struct notes_tree *rename_notes;
>  
>  	struct reflog_walk_info *reflog_info;
>  	struct decoration children;
> diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
> index ab9a666..21d9378 100755
> --- a/t/t4001-diff-rename.sh
> +++ b/t/t4001-diff-rename.sh
> @@ -189,4 +189,28 @@ test_expect_success 'manual rename correction' '
>  	)
>  '
>  
> +test_expect_success 'rename correction from notes' '
> +	(
> +		cd correct-rename &&
> +		git show --summary -M HEAD | grep rename >actual &&
> +		cat >expected <<-\EOF &&
> +		 rename old-one => new-one (100%)
> +		 rename old-two => new-two (100%)
> +		EOF
> +		test_cmp expected actual &&
> +
> +		cat >correction <<-\EOF &&
> +		old-one => new-two
> +		old-two => new-one
> +		EOF
> +		git notes --ref=rename add -F correction HEAD &&
> +		git show --summary -M --rename-notes=rename HEAD | grep rename >actual &&
> +		cat >expected <<-\EOF &&
> +		 rename old-two => new-one (100%)
> +		 rename old-one => new-two (100%)
> +		EOF
> +		test_cmp expected actual
> +	)
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse()
  2016-01-20 21:49       ` Junio C Hamano
@ 2016-01-21 11:48         ` Duy Nguyen
  2016-01-21 23:01           ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Duy Nguyen @ 2016-01-21 11:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On Wed, Jan 20, 2016 at 01:49:21PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > On Wed, Jan 20, 2016 at 12:23:38PM -0800, Junio C Hamano wrote:
> >
> >> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
> >> 
> >> > This will be important later when diff_opt_parse() accepts paths as
> >> > arguments. Paths must be prefixed before access because setup code
> >> > moves cwd but does not (and cannot) update command line options.
> >> 
> >> The above sounds like a sensible thing to do (note: I didn't read
> >> the patch or remainder of the series), but makes me wonder how the
> >> existing --orderfile option works without this support.  Perhaps it
> >> is not working and needs to be updated to take advantage of this
> >> change, too?
> >
> > Yeah, I think it simply does not work.
> >
> >   $ >main-order
> >   $ mkdir subdir && >subdir/sub-order
> >   $ cd subdir
> >   $ git show -Osub-order
> >   fatal: failed to read orderfile 'sub-order': No such file or directory
> >   $ git show -Omain-order
> >   [shows diff]
> 
> Good.
> 
> Then 2/7 can be rerolled and advertised as "make -O to work from
> subdirectories", and can gradulate regardless of the remainder of
> the series.  Even if the rest needs rerolls to get it right (or
> takes until 2019 to mature ;-), we will have one less change to
> re-review in the process as we can push these early and obviously
> correct part out separately.
> 

I didn't know there was already an option that takes a path. I read
through the function and found another one. So here's the standalone
patch that fixes both.

-- 8< --
Subject: [PATCH] diff: make -O and --output work in subdirectory

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/am.c          |  2 +-
 diff-no-index.c       |  3 ++-
 diff.c                | 14 ++++++++++----
 diff.h                |  2 +-
 revision.c            |  2 +-
 t/t4056-diff-order.sh |  6 ++++++
 6 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 9fb42fd..f009b6c 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1657,7 +1657,7 @@ static int fall_back_threeway(const struct am_state *state, const char *index_pa
 
 		init_revisions(&rev_info, NULL);
 		rev_info.diffopt.output_format = DIFF_FORMAT_NAME_STATUS;
-		diff_opt_parse(&rev_info.diffopt, &diff_filter_str, 1);
+		diff_opt_parse(&rev_info.diffopt, &diff_filter_str, 1, rev_info.prefix);
 		add_pending_sha1(&rev_info, "HEAD", our_tree, 0);
 		diff_setup_done(&rev_info.diffopt);
 		run_diff_index(&rev_info, 1);
diff --git a/diff-no-index.c b/diff-no-index.c
index 8e0fd27..aa81a15 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -252,7 +252,8 @@ void diff_no_index(struct rev_info *revs,
 		else if (!strcmp(argv[i], "--"))
 			i++;
 		else {
-			j = diff_opt_parse(&revs->diffopt, argv + i, argc - i);
+			j = diff_opt_parse(&revs->diffopt, argv + i, argc - i,
+					   revs->prefix);
 			if (j <= 0)
 				die("invalid diff option/value: %s", argv[i]);
 			i += j;
diff --git a/diff.c b/diff.c
index 80eb0c2..2136b69 100644
--- a/diff.c
+++ b/diff.c
@@ -3693,12 +3693,16 @@ static int parse_ws_error_highlight(struct diff_options *opt, const char *arg)
 	return 1;
 }
 
-int diff_opt_parse(struct diff_options *options, const char **av, int ac)
+int diff_opt_parse(struct diff_options *options,
+		   const char **av, int ac, const char *prefix)
 {
 	const char *arg = av[0];
 	const char *optarg;
 	int argcount;
 
+	if (!prefix)
+		prefix = "";
+
 	/* Output format options */
 	if (!strcmp(arg, "-p") || !strcmp(arg, "-u") || !strcmp(arg, "--patch")
 	    || opt_arg(arg, 'U', "unified", &options->context))
@@ -3915,7 +3919,8 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 	else if (!strcmp(arg, "--pickaxe-regex"))
 		options->pickaxe_opts |= DIFF_PICKAXE_REGEX;
 	else if ((argcount = short_opt('O', av, &optarg))) {
-		options->orderfile = optarg;
+		const char *path = prefix_filename(prefix, strlen(prefix), optarg);
+		options->orderfile = xstrdup(path);
 		return argcount;
 	}
 	else if ((argcount = parse_long_opt("diff-filter", av, &optarg))) {
@@ -3954,9 +3959,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 	else if (!strcmp(arg, "--no-function-context"))
 		DIFF_OPT_CLR(options, FUNCCONTEXT);
 	else if ((argcount = parse_long_opt("output", av, &optarg))) {
-		options->file = fopen(optarg, "w");
+		const char *path = prefix_filename(prefix, strlen(prefix), optarg);
+		options->file = fopen(path, "w");
 		if (!options->file)
-			die_errno("Could not open '%s'", optarg);
+			die_errno("Could not open '%s'", path);
 		options->close_file = 1;
 		return argcount;
 	} else
diff --git a/diff.h b/diff.h
index 893f446..4537e38 100644
--- a/diff.h
+++ b/diff.h
@@ -268,7 +268,7 @@ extern int parse_long_opt(const char *opt, const char **argv,
 extern int git_diff_basic_config(const char *var, const char *value, void *cb);
 extern int git_diff_ui_config(const char *var, const char *value, void *cb);
 extern void diff_setup(struct diff_options *);
-extern int diff_opt_parse(struct diff_options *, const char **, int);
+extern int diff_opt_parse(struct diff_options *, const char **, int, const char *);
 extern void diff_setup_done(struct diff_options *);
 
 #define DIFF_DETECT_RENAME	1
diff --git a/revision.c b/revision.c
index 0a282f5..14daefb 100644
--- a/revision.c
+++ b/revision.c
@@ -2049,7 +2049,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
 	} else {
-		int opts = diff_opt_parse(&revs->diffopt, argv, argc);
+		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
 		if (!opts)
 			unkv[(*unkc)++] = arg;
 		return opts;
diff --git a/t/t4056-diff-order.sh b/t/t4056-diff-order.sh
index c0460bb..43dd474 100755
--- a/t/t4056-diff-order.sh
+++ b/t/t4056-diff-order.sh
@@ -68,6 +68,12 @@ test_expect_success POSIXPERM,SANITY 'unreadable orderfile' '
 	test_must_fail git diff -Ounreadable_file --name-only HEAD^..HEAD
 '
 
+test_expect_success "orderfile using option from subdir with --output" '
+	mkdir subdir &&
+	git -C subdir diff -O../order_file_1 --output ../actual --name-only HEAD^..HEAD &&
+	test_cmp expect_1 actual
+'
+
 for i in 1 2
 do
 	test_expect_success "orderfile using option ($i)" '
-- 
2.7.0.125.g9eec362

-- 8< --

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/7] merge: add --rename-notes
  2016-01-20 11:06 ` [PATCH 7/7] merge: add --rename-notes Nguyễn Thái Ngọc Duy
@ 2016-01-21 17:53   ` Junio C Hamano
  2016-01-22  3:35     ` Duy Nguyen
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2016-01-21 17:53 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> +/*
> + * Traverse through the given notes tree, convert all "path to path"
> + * rename lines into "blob to blob" and return it. If cache_file is
> + * non-NULL, return it's content if still valid. Otherwise save the
> + * new content in it.
> + */
> +static void merge_rename_notes(struct strbuf *cache,
> +			       struct notes_tree *t, const char *cache_file)
> +{
> +	struct object_id notes_oid;
> +
> +	if (cache_file) {
> +	...
> +	}
> +
> +	strbuf_reset(cache);
> +	for_each_note(t, 0, merge_rename_note, cache);

This uses _all_ merge notes attached to _any_ commit in the history,
without even checking if the commit is involved in the current merge
being done?  How would that be useful?

I also suspect that the data structure to keep track renames by
using notes needs a better design.  You seem to have a note per
commit and one note records a set of "this goes to that", and
that is the reason why you need to discriminately read everything
under the sun.

I think the index into the notes tree for the purpose of this use
case should not be "which commit records this set of renames?",
but by "what is the destination blob of possible rename
operations?".  IOW, if a path that held blob X was moved to
another path that holds blob Z in commit A, and if a path that
held blob Y was moved to another path that holds blob Z in commit
B, attach a note to blob Z that records "moved from X in A" and
"moved from Y in B".

Then, after diffcore-rename has enumerated the potential rename
destinations, look these destination blobs up in the notes.  You
see a path with blob Z created, and you know if you have removed
path that held X or Y in the potential rename source set, you
found a manually recorded rename.  If you have ancestry
information when you do it, you could even reject "move from X to
Z" when you know commit A is not involved in the ancestry path
involved in the merge, but that is optional (as you may be
comparing two states that may not be related with each other).

For the purpose of helping "git log", a notes tree reorganized in
such a way would be useful.  Again, when you find a potential
rename destination that has blob Z as the result of the change,
you read the note attached to the blob to learn that it may have
come from a path that held blob X (if we are dealing with commit
A), or it may have come from a path that held blob Y (if we are
dealing with commit B).  So you can add a new field in diffopt
structure and allow the caller somewhere in the callchain
(log_tree_diff(), perhaps?) to pass which resulting commit it
wants the recorded renames to be used from.  When it is time for
"git log -M -p" to show commit A, diffcore-rename down in the
callchain will get the diff_queue that contains "diff A^..A",
among which there is a path that used to have blob Z that goes
away, reads the notes for blob Z and notice that commit A moved a
path with blob X that is going away was renamed to it, discarding
the other record for blob Z that talks about the change in commit
B that is not relevant for the purpose of showing commit A.

Hmm?

> @@ -1260,10 +1359,25 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
>  		usage_with_options(builtin_merge_usage,
>  			builtin_merge_options);
>  
> +	if (rename_file && rename_note_ref)
> +		die(_("--rename-file and --rename-notes are incompatible"));
> +
>  	if (rename_file &&
>  	    strbuf_read_file(&manual_renames, rename_file, 0) == -1)
>  		die(_("unable to read %s"), rename_file);
>  
> +	if (rename_note_ref) {
> +		struct notes_tree rename_notes;
> +		struct strbuf ref = STRBUF_INIT;
> +
> +		strbuf_addstr(&ref, rename_note_ref);
> +		expand_notes_ref(&ref);
> +		init_notes(&rename_notes, ref.buf, NULL, 0);
> +		strbuf_release(&ref);
> +		merge_rename_notes(&manual_renames, &rename_notes,
> +				   git_path("GIT_RENAME_CACHE"));
> +	}
> +

These new hunks must move way below the codepath and be run only
after we are certain that we need to do a real merge.  The merge
logic that follows this line tries to handle light-weight special
cases (like an obvious fast-forward and already up-to-date cases)
as early as possible to avoid doing unnecessary things.

This patch (and the earlier one adds strbuf-read-file of the
rename-file) breaks the design by doing these new heavyweight
operations whose result may not be used at all too early, I
think.

If you go the route of reorganizing the rename notes around the
destination blobs, then you wouldn't even be reading everything
here very high in the callchain (you would be doing so only after
diffcore-rename decides that rename notes may be worth reading),
and that would fix this.

>  	if (!head_commit) {
>  		struct commit *remote_head;
>  		/*

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse()
  2016-01-21 11:48         ` Duy Nguyen
@ 2016-01-21 23:01           ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2016-01-21 23:01 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, git

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/7] log: add --rename-notes to correct renames per commit
  2016-01-20 23:29   ` Junio C Hamano
@ 2016-01-22  1:00     ` Duy Nguyen
  0 siblings, 0 replies; 20+ messages in thread
From: Duy Nguyen @ 2016-01-22  1:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Thu, Jan 21, 2016 at 6:29 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> For simplicity, the note of commit A implies rename correction between
>> A^ and A. If parents are manipulated (e.g. "git log --reflog") then
>> the rename output may surprise people.
>
> I do not think "git log --reflog" attempts to show changes to bring
> the tree of @{2} into the shape of @{1}, even though for traversal
> purposes it pretends as if @{1}'s parent is @{2}.  So I am not sure
> what you are trying to say in the above sentence.

Hazy memory about hierarchy manipulation by reflog, maybe I'm wrong.

> A path limited "git log -- path1/ path2/..." also manipulates the
> commit->parents for traversal purposes, but I think the patch is
> still produced against the true parents (there is a call to
> get_saved_parents() in log_tree_diff() that shows the change for a
> given commit), and in that context, commit A that has notes about
> the change to bring the tree of commit A^ to its tree still makes
> sense.
>
> I'd be more worried about "git log -m -p"--the diff against the
> second and subsequent parents would not want to use the notes that
> describes the change between the first parent and the resulting
> merge.

I "fix" this by providing the tools and let the user go wild. The line
"A => B" says rename A to B for a diff between any source or target.
But they can also say rename A to B only if diff source is X, or diff
target is Y, or both. The syntax I'm having for that is something like
this

.source X # X can be anything that resolves to a tree SHA-1 with get_sha1()
.target Y
A => B
-- 
Duy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/7] merge: add --rename-notes
  2016-01-21 17:53   ` Junio C Hamano
@ 2016-01-22  3:35     ` Duy Nguyen
  2016-01-22 17:17       ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Duy Nguyen @ 2016-01-22  3:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Fri, Jan 22, 2016 at 12:53 AM, Junio C Hamano <gitster@pobox.com> wrote:
> This uses _all_ merge notes attached to _any_ commit in the history,
> without even checking if the commit is involved in the current merge
> being done?  How would that be useful?
>
> I also suspect that the data structure to keep track renames by
> using notes needs a better design.  You seem to have a note per
> commit and one note records a set of "this goes to that", and
> that is the reason why you need to discriminately read everything
> under the sun.
>
> I think the index into the notes tree for the purpose of this use
> case should not be "which commit records this set of renames?",
> but by "what is the destination blob of possible rename
> operations?".  IOW, if a path that held blob X was moved to
> another path that holds blob Z in commit A, and if a path that
> held blob Y was moved to another path that holds blob Z in commit
> B, attach a note to blob Z that records "moved from X in A" and
> "moved from Y in B".

This is exactly the output after merge_rename_notes() is done. A
bunch of "rename this blob to that blob" pairs.

The problem with indexing notes tree by blob is how the user will
manage it. From the user perspective, I think it's natural to
think "I have made this commit that renames this path to that
path. But Git does not recognize it. I need to correct it by
adding a note to the commit".

The notes tree saves exactly that. Reading and updating can be
done with existing tools. Exchanging git notes is something we
have been avoiding looking at, but per-commit notes would make it
easier to exchange too, I think.

The information in that form may not be the best way to be
consumed by Git. But that is what cache is for. We can either
generate a secondary notes tree, indexed by blob, or naively a
big "blob to blob" rename file that I did in this patch. But it
does not have to be visible to the user.

If notes-indexed-by-blob is exposed to the user, I think we need some
more helper tools to edit and view them because blob SHA-1 is not
something the user deals with, normally. It feels not as easy as
editing a simple text file, to me.
-- 
Duy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/7] merge: add --rename-notes
  2016-01-22  3:35     ` Duy Nguyen
@ 2016-01-22 17:17       ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2016-01-22 17:17 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> The problem with indexing notes tree by blob is how the user will
> manage it.

I do not think they have to be incompatible goals.  Allowing the
user to specify "at this commit, this path was moved to that path"
may be a good thing.  But at the same time, using a better data
structure at runtime is also important.

Unless I am reading your "cache" code wrong, it is always reading a
flat list of unsorted "<blob> to <blob>" entries describing
everything that happened in the entire history of the project into
core, and then forcing the low-level code to find the entries
relevant to the change in the commit it is looking at at runtime.

The way the "cache" code stores and forces the runtime to use the
data makes it proportional to the size of the history of the project
(here I am assuming uniform "rename density" across history, the
longer the history of a project, the more renames there would be) to
find the manually defined renames just to show a single change, no?

What was disappointing is that the way the rename notes is used
during 'log -M' traversal is an example of a good design.  When
showing a single commit, the cost the code pays to get to the data
for a single commit out of the rename notes tree does not grow
proportional to the size of the history.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-01-22 17:17 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-20 11:06 [PATCH 0/7] Diff rename, manual correction, round 2 Nguyễn Thái Ngọc Duy
2016-01-20 11:06 ` [PATCH 1/7] diff-no-index: do not take a redundant prefix argument Nguyễn Thái Ngọc Duy
2016-01-20 11:06 ` [PATCH 2/7] diff.c: take "prefix" argument in diff_opt_parse() Nguyễn Thái Ngọc Duy
2016-01-20 20:23   ` Junio C Hamano
2016-01-20 20:29     ` Jeff King
2016-01-20 21:49       ` Junio C Hamano
2016-01-21 11:48         ` Duy Nguyen
2016-01-21 23:01           ` Junio C Hamano
2016-01-20 11:06 ` [PATCH 3/7] diff: add --rename-file Nguyễn Thái Ngọc Duy
2016-01-20 22:44   ` Junio C Hamano
2016-01-20 22:47   ` Junio C Hamano
2016-01-20 11:06 ` [PATCH 4/7] log: add --rename-notes to correct renames per commit Nguyễn Thái Ngọc Duy
2016-01-20 23:29   ` Junio C Hamano
2016-01-22  1:00     ` Duy Nguyen
2016-01-20 11:06 ` [PATCH 5/7] merge: add --rename-file Nguyễn Thái Ngọc Duy
2016-01-20 11:06 ` [PATCH 6/7] diffcore-rename: allow to say "rename this blob to that blob" Nguyễn Thái Ngọc Duy
2016-01-20 11:06 ` [PATCH 7/7] merge: add --rename-notes Nguyễn Thái Ngọc Duy
2016-01-21 17:53   ` Junio C Hamano
2016-01-22  3:35     ` Duy Nguyen
2016-01-22 17:17       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).