git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
@ 2018-04-02 22:48 Stefan Beller
  2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

This is a re-attempt of [1], which allows the moved code detection to
ignore blanks in various modes.

patches 1-5 are refactoring, patch 6 adds all existing white space options
of regular diff to the move detection. (I am unsure about this patch,
as I presume we want to keep the option space at a minimum if possible).

The fun is in the last patch, which allows white space sensitive
languages to trust the move detection, too. Each block that is marked as
moved will have the same delta in {in-, de-}dentation.
I would think this mode might be a reasonable default eventually.

Thanks,
Stefan

[1] https://public-inbox.org/git/20171025224620.27657-1-sbeller@google.com/

Stefan Beller (7):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations
  diff.c: do not pass diff options as keydata to hashmap
  diff.c: adjust hash function signature to match hashmap expectation
  diff.c: refactor internal representation for coloring moved code
  diff.c: decouple white space treatment for move detection from generic
    option
  diff.c: add --color-moved-ignore-space-delta option

 Documentation/diff-options.txt |  13 ++
 diff.c                         | 155 ++++++++++++---
 diff.h                         |  18 +-
 t/t4015-diff-whitespace.sh     | 341 ++++++++++++++++++++++++++++++++-
 xdiff/xdiff.h                  |   8 -
 xdiff/xdiffi.c                 |  17 --
 6 files changed, 483 insertions(+), 69 deletions(-)

-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/7] xdiff/xdiff.h: remove unused flags
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-02 22:48 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

These flags were there since the beginning (3443546f6e (Use a *real*
built-in diff generator, 2006-03-24), but were never used. Remove them.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiff.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c1937a2911..2356da5f78 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -52,14 +52,6 @@ extern "C" {
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-#define XDL_MMB_READONLY (1 << 0)
-
-#define XDL_MMF_ATOMIC (1 << 0)
-
-#define XDL_BDOP_INS 1
-#define XDL_BDOP_CPY 2
-#define XDL_BDOP_INSB 3
-
 /* merge simplification levels */
 #define XDL_MERGE_MINIMAL 0
 #define XDL_MERGE_EAGER 1
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
  2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

There is no need to forward-declare these functions, as they are used
after their implementation only.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiffi.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463b..3e8aff92bc 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -22,34 +22,17 @@
 
 #include "xinclude.h"
 
-
-
 #define XDL_MAX_COST_MIN 256
 #define XDL_HEUR_MIN_COST 256
 #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
 #define XDL_SNAKE_CNT 20
 #define XDL_K_HEUR 4
 
-
-
 typedef struct s_xdpsplit {
 	long i1, i2;
 	int min_lo, min_hi;
 } xdpsplit_t;
 
-
-
-
-static long xdl_split(unsigned long const *ha1, long off1, long lim1,
-		      unsigned long const *ha2, long off2, long lim2,
-		      long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
-		      xdalgoenv_t *xenv);
-static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
-
-
-
-
-
 /*
  * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
  * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
  2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
  2018-04-02 22:48 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-06 20:04   ` Jeff King
  2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

The diff options are passed to the compare function as
'hashmap_cmp_fn_data', which are given when the hashmaps
are initialized.

A later patch will make use of the keydata to signal
different settings for comparision.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 4c59f5f5d3..52f6e02130 100644
--- a/diff.c
+++ b/diff.c
@@ -842,13 +842,13 @@ static void mark_color_as_moved(struct diff_options *o,
 		case DIFF_SYMBOL_PLUS:
 			hm = del_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		case DIFF_SYMBOL_MINUS:
 			hm = add_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		default:
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (2 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

This makes the follow up patch easier.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 52f6e02130..879b8a5d9d 100644
--- a/diff.c
+++ b/diff.c
@@ -707,11 +707,15 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
-static int moved_entry_cmp(const struct diff_options *diffopt,
-			   const struct moved_entry *a,
-			   const struct moved_entry *b,
+static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
+			   const void *entry,
+			   const void *entry_or_key,
 			   const void *keydata)
 {
+	const struct diff_options *diffopt = hashmap_cmp_fn_data;
+	const struct moved_entry *a = entry;
+	const struct moved_entry *b = entry_or_key;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    diffopt->xdl_opts);
@@ -5534,10 +5538,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			hashmap_init(&del_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
+			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
+			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (3 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-02 23:51   ` Jonathan Tan
  2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
  2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 30+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

At the time the move coloring was implemented we thought an enum of modes
is the best to configure this feature.  However as we want to tack on new
features, the enum would grow exponentially.

Refactor the code such that features are enabled via bits. Currently we can
* activate the move detection,
* enable the block detection on top, and
* enable the dimming inside a block, though this could be done without
  block detection as well (mode "plain, dimmed")

Choose the flags to not be at bit position 2,3,4 as the next patch
will occupy these.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 27 ++++++++++++++-------------
 diff.h | 17 ++++++++++-------
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/diff.c b/diff.c
index 879b8a5d9d..2052a43f7a 100644
--- a/diff.c
+++ b/diff.c
@@ -260,7 +260,7 @@ static int parse_color_moved(const char *arg)
 {
 	switch (git_parse_maybe_bool(arg)) {
 	case 0:
-		return COLOR_MOVED_NO;
+		return 0;
 	case 1:
 		return COLOR_MOVED_DEFAULT;
 	default:
@@ -268,15 +268,17 @@ static int parse_color_moved(const char *arg)
 	}
 
 	if (!strcmp(arg, "no"))
-		return COLOR_MOVED_NO;
+		return 0;
 	else if (!strcmp(arg, "plain"))
-		return COLOR_MOVED_PLAIN;
+		return COLOR_MOVED_ENABLED;
 	else if (!strcmp(arg, "zebra"))
-		return COLOR_MOVED_ZEBRA;
+		return COLOR_MOVED_ENABLED | COLOR_MOVED_FIND_BLOCKS;
 	else if (!strcmp(arg, "default"))
 		return COLOR_MOVED_DEFAULT;
 	else if (!strcmp(arg, "dimmed_zebra"))
-		return COLOR_MOVED_ZEBRA_DIM;
+		return COLOR_MOVED_ENABLED |
+		       COLOR_MOVED_FIND_BLOCKS |
+		       COLOR_MOVED_DIMMED_BLOCKS;
 	else
 		return error(_("color moved setting must be one of 'no', 'default', 'zebra', 'dimmed_zebra', 'plain'"));
 }
@@ -794,7 +796,7 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 }
 
 /*
- * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
+ * If o->color_moved is not set to find blocks, this function does nothing.
  *
  * Otherwise, if the last block has fewer alphanumeric characters than
  * COLOR_MOVED_MIN_ALNUM_COUNT, unset DIFF_SYMBOL_MOVED_LINE on all lines in
@@ -809,7 +811,7 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 static void adjust_last_block(struct diff_options *o, int n, int block_length)
 {
 	int i, alnum_count = 0;
-	if (o->color_moved == COLOR_MOVED_PLAIN)
+	if (!(o->color_moved & COLOR_MOVED_FIND_BLOCKS))
 		return;
 	for (i = 1; i < block_length + 1; i++) {
 		const char *c = o->emitted_symbols->buf[n - i].line;
@@ -868,7 +870,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		l->flags |= DIFF_SYMBOL_MOVED_LINE;
 
-		if (o->color_moved == COLOR_MOVED_PLAIN)
+		if (!(o->color_moved & COLOR_MOVED_FIND_BLOCKS))
 			continue;
 
 		/* Check any potential block runs, advance each or nullify */
@@ -4697,12 +4699,11 @@ int diff_opt_parse(struct diff_options *options,
 	else if (!strcmp(arg, "--no-color"))
 		options->use_color = 0;
 	else if (!strcmp(arg, "--color-moved")) {
-		if (diff_color_moved_default)
-			options->color_moved = diff_color_moved_default;
-		if (options->color_moved == COLOR_MOVED_NO)
+		options->color_moved = diff_color_moved_default;
+		if (!options->color_moved)
 			options->color_moved = COLOR_MOVED_DEFAULT;
 	} else if (!strcmp(arg, "--no-color-moved"))
-		options->color_moved = COLOR_MOVED_NO;
+		options->color_moved = 0;
 	else if (skip_prefix(arg, "--color-moved=", &arg)) {
 		int cm = parse_color_moved(arg);
 		if (cm < 0)
@@ -5543,7 +5544,7 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
-			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
+			if (o->color_moved & COLOR_MOVED_DIMMED_BLOCKS)
 				dim_moved_lines(o);
 
 			hashmap_free(&add_lines, 0);
diff --git a/diff.h b/diff.h
index d29560f822..9542017986 100644
--- a/diff.h
+++ b/diff.h
@@ -205,13 +205,16 @@ struct diff_options {
 	int diff_path_counter;
 
 	struct emitted_diff_symbols *emitted_symbols;
-	enum {
-		COLOR_MOVED_NO = 0,
-		COLOR_MOVED_PLAIN = 1,
-		COLOR_MOVED_ZEBRA = 2,
-		COLOR_MOVED_ZEBRA_DIM = 3,
-	} color_moved;
-	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
+	unsigned color_moved;
+
+	#define COLOR_MOVED_ENABLED		(1 << 0)
+
+	#define COLOR_MOVED_FIND_BLOCKS		(1 << 1)
+	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
+
+	#define COLOR_MOVED_DIMMED_BLOCKS	(1 << 23)
+
+	#define COLOR_MOVED_DEFAULT (COLOR_MOVED_ENABLED | COLOR_MOVED_FIND_BLOCKS)
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
 };
 
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (4 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-03  0:31   ` Jonathan Tan
  2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
  2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
  7 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

In the original implementation of the move detection logic we assumed that
the choice for ignoring white space changes is the same for the move
detection as it is for the generic diff.  It turns out this is wrong.

There are a couple of things where the user wants to input their
decision into the diff machinery:

* Whether or not to ignore white space for the general diff detection.
  This is covered with the -w, -b options already.
* Whether the move detection ought to pay attention to white spaces
  in general. And if so, how are white spaces handled for the block
  detection.

One example would be reviewing a current patch under discussion, that moves
code around.  In that case, you may want to have the general diff machinery
not ignore the white spaces (i.e. exact matching), as that will show you
the diff as the patch sent. The code moved however may have another
indentation level, such that you want to ignore white spaces for the move
detection. The code may be in python, such that spaces matter for program
flow, though. That is why you'd want each block to have the same change
in white space.

This patch decouples white space treatment in the general diff machinery
from the white space treatment in the move detection.

This adds all the the options for ignoring white space that the regular
diff machinery has to the move detection, such that we can cover all the
cases that were introduced in 7a55427094 (Merge branch
'jk/diff-color-moved-fix', 2017-11-06).

The example from above (different lines in the diff with -w for the regular
diff) is covered in a test. Convert the existing tests to be more explicit
on their choice of white space behavior in the move detection as the tests
should not fail if the default is changed.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  13 +++
 diff.c                         |  17 +++-
 t/t4015-diff-whitespace.sh     | 150 +++++++++++++++++++++++++++++++--
 3 files changed, 171 insertions(+), 9 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e3a44f03cd..4ca09c1977 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -288,6 +288,19 @@ dimmed_zebra::
 	blocks are considered interesting, the rest is uninteresting.
 --
 
+--color-moved-[no-]ignore-all-space::
+	Ignore whitespace when comparing lines when performing the move
+	detection for --color-moved.  This ignores differences even if
+	one line has whitespace where the other line has none.
+--color-moved-[no-]ignore-space-change::
+	Ignore changes in amount of whitespace when performing the move
+	detection for --color-moved.  This ignores whitespace
+	at line end, and considers all other sequences of one or
+	more whitespace characters to be equivalent.
+--color-moved-[no-]ignore-space-at-eol::
+	Ignore changes in whitespace at EOL when performing the move
+	detection for --color-moved.
+
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
 	By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 2052a43f7a..5fe2930dca 100644
--- a/diff.c
+++ b/diff.c
@@ -720,7 +720,7 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
-				    diffopt->xdl_opts);
+				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -728,8 +728,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 {
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
+	unsigned flags = o->color_moved & XDF_WHITESPACE_FLAGS;
 
-	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
+	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
 	ret->es = l;
 	ret->next_line = NULL;
 
@@ -4638,6 +4639,18 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--color-moved-no-ignore-all-space"))
+		options->color_moved &= ~XDF_IGNORE_WHITESPACE;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-change"))
+		options->color_moved &= ~XDF_IGNORE_WHITESPACE_CHANGE;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
+		options->color_moved &= ~XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-ignore-all-space"))
+		options->color_moved |= XDF_IGNORE_WHITESPACE;
+	else if (!strcmp(arg, "--color-moved-ignore-space-change"))
+		options->color_moved |= XDF_IGNORE_WHITESPACE_CHANGE;
+	else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
+		options->color_moved |= XDF_IGNORE_WHITESPACE_AT_EOL;
 	else if (!strcmp(arg, "--indent-heuristic"))
 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
 	else if (!strcmp(arg, "--no-indent-heuristic"))
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 17df491a3a..38aaf4c46c 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1395,7 +1395,10 @@ test_expect_success 'move detection ignoring whitespace ' '
 	line 4
 	line 5
 	EOF
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1419,7 +1422,10 @@ test_expect_success 'move detection ignoring whitespace ' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -w --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1459,7 +1465,10 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	line 5
 	EOF
 
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1483,7 +1492,10 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -b --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-at-eol \
+		--color-moved-ignore-space-change |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1526,7 +1538,10 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	# avoid cluttering the output with complaints about our eol whitespace
 	test_config core.whitespace -blank-at-eol &&
 
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1550,7 +1565,10 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1722,7 +1740,125 @@ test_expect_success 'move detection with submodules' '
 
 	# nor did we mess with it another way
 	git diff --submodule=diff --color | test_decode_color >expect &&
-	test_cmp expect decoded_actual
+	test_cmp expect decoded_actual &&
+	rm -rf bananas &&
+	git submodule deinit bananas
+'
+
+test_expect_success 'move detection only ignores white spaces' '
+	git reset --hard &&
+	q_to_tab <<-\EOF >function.c &&
+	int func()
+	{
+	Qif (foo) {
+	QQ// this part of the function
+	QQ// function will be very long
+	QQ// indeed. We must exceed both
+	QQ// per-line and number of line
+	QQ// minimums
+	QQ;
+	Q}
+	Qbaz();
+	Qbar();
+	Q// more unrelated stuff
+	}
+	EOF
+	git add function.c &&
+	git commit -m "add function.c" &&
+	q_to_tab <<-\EOF >function.c &&
+	int do_foo()
+	{
+	Q// this part of the function
+	Q// function will be very long
+	Q// indeed. We must exceed both
+	Q// per-line and number of line
+	Q// minimums
+	Q;
+	}
+
+	int func()
+	{
+	Qif (foo)
+	QQdo_foo();
+	Qbaz();
+	Qbar();
+	Q// more unrelated stuff
+	}
+	EOF
+
+	# Make sure we get a different diff using -w ("moved function header")
+	git diff --color --color-moved -w \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/function.c b/function.c<RESET>
+	<BOLD>--- a/function.c<RESET>
+	<BOLD>+++ b/function.c<RESET>
+	<CYAN>@@ -1,6 +1,5 @@<RESET>
+	<BOLD;MAGENTA>-int func()<RESET>
+	<GREEN>+<RESET><GREEN>int do_foo()<RESET>
+	 {<RESET>
+	<RED>-	if (foo) {<RESET>
+	 Q// this part of the function<RESET>
+	 Q// function will be very long<RESET>
+	 Q// indeed. We must exceed both<RESET>
+	<CYAN>@@ -8,6 +7,11 @@<RESET> <RESET>int func()<RESET>
+	 Q// minimums<RESET>
+	 Q;<RESET>
+	 }<RESET>
+	<GREEN>+<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>int func()<RESET>
+	<GREEN>+<RESET><GREEN>{<RESET>
+	<GREEN>+<RESET>Q<GREEN>if (foo)<RESET>
+	<GREEN>+<RESET>QQ<GREEN>do_foo();<RESET>
+	 Qbaz();<RESET>
+	 Qbar();<RESET>
+	 Q// more unrelated stuff<RESET>
+	EOF
+	test_cmp expected actual &&
+
+	# And now ignoring white space only in the move detection
+	git diff --color --color-moved \
+		--color-moved-ignore-all-space \
+		--color-moved-ignore-space-change \
+		--color-moved-ignore-space-at-eol |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/function.c b/function.c<RESET>
+	<BOLD>--- a/function.c<RESET>
+	<BOLD>+++ b/function.c<RESET>
+	<CYAN>@@ -1,13 +1,17 @@<RESET>
+	<GREEN>+<RESET><GREEN>int do_foo()<RESET>
+	<GREEN>+<RESET><GREEN>{<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// this part of the function<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// function will be very long<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// indeed. We must exceed both<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// per-line and number of line<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// minimums<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>;<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>}<RESET>
+	<GREEN>+<RESET>
+	 int func()<RESET>
+	 {<RESET>
+	<RED>-Qif (foo) {<RESET>
+	<BOLD;MAGENTA>-QQ// this part of the function<RESET>
+	<BOLD;MAGENTA>-QQ// function will be very long<RESET>
+	<BOLD;MAGENTA>-QQ// indeed. We must exceed both<RESET>
+	<BOLD;MAGENTA>-QQ// per-line and number of line<RESET>
+	<BOLD;MAGENTA>-QQ// minimums<RESET>
+	<BOLD;MAGENTA>-QQ;<RESET>
+	<BOLD;MAGENTA>-Q}<RESET>
+	<GREEN>+<RESET>Q<GREEN>if (foo)<RESET>
+	<GREEN>+<RESET>QQ<GREEN>do_foo();<RESET>
+	 Qbaz();<RESET>
+	 Qbar();<RESET>
+	 Q// more unrelated stuff<RESET>
+	EOF
+	test_cmp expected actual
 '
 
 test_done
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (5 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-03  0:41   ` Jonathan Tan
  2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
  7 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

This marks moved code still as blocks when their indentation level
changes uniformly.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c                     |  93 ++++++++++++++++--
 diff.h                     |   1 +
 t/t4015-diff-whitespace.sh | 191 +++++++++++++++++++++++++++++++++++++
 3 files changed, 278 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 5fe2930dca..9f969be588 100644
--- a/diff.c
+++ b/diff.c
@@ -709,6 +709,41 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
+struct ws_delta {
+	int deltachars;
+	char firstchar;
+};
+
+static void compute_ws_delta(const struct emitted_diff_symbol *a,
+			     const struct emitted_diff_symbol *b,
+			     struct ws_delta *out)
+{
+	int i;
+	const struct emitted_diff_symbol *longer = a->len > b->len ? a : b;
+
+
+	out->deltachars = 0;
+	out->firstchar = 0;
+
+	if (longer->len > 0 && isspace(longer->line[0]))
+		out->firstchar = longer->line[0];
+	else
+		return;
+
+	for (i = 0; i < a->len; i++)
+		if (a->line[i] == out->firstchar)
+			out->deltachars ++;
+
+	for (i = 0; i < b->len; i++)
+		if (b->line[i] == out->firstchar)
+			out->deltachars --;
+}
+
+static int compare_ws_delta(const struct ws_delta *a, const struct ws_delta *b)
+{
+	return  a->firstchar == b->firstchar && a->deltachars == b->deltachars;
+}
+
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *entry,
 			   const void *entry_or_key,
@@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct moved_entry *a = entry;
 	const struct moved_entry *b = entry_or_key;
+	unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
+
+	if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
+		/*
+		 * As there is not specific white space config given,
+		 * we'd need to check for a new block, so ignore all
+		 * white space. The setup of the white space
+		 * configuration for the next block is done else where
+		 */
+		flags |= XDF_IGNORE_WHITESPACE;
 
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
-				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
+				    flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -770,7 +815,8 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
-					 int pmb_nr)
+					 int pmb_nr,
+					 struct ws_delta **wsd)
 {
 	int lp, rp;
 
@@ -786,6 +832,8 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 
 		if (lp < pmb_nr && rp > -1 && lp < rp) {
 			pmb[lp] = pmb[rp];
+			if (*wsd)
+				(*wsd)[lp] = (*wsd)[rp];
 			pmb[rp] = NULL;
 			rp--;
 			lp++;
@@ -835,8 +883,11 @@ static void mark_color_as_moved(struct diff_options *o,
 {
 	struct moved_entry **pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
-	int n, flipped_block = 1, block_length = 0;
 
+	struct ws_delta *wsd = NULL; /* white space deltas between pmb */
+	int wsd_alloc = 0;
+
+	int n, flipped_block = 1, block_length = 0;
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
 		struct hashmap *hm = NULL;
@@ -879,14 +930,30 @@ static void mark_color_as_moved(struct diff_options *o,
 			struct moved_entry *p = pmb[i];
 			struct moved_entry *pnext = (p && p->next_line) ?
 					p->next_line : NULL;
-			if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
-				pmb[i] = p->next_line;
+
+			if (o->color_moved & COLOR_MOVED_DELTA_WHITESPACES) {
+				struct ws_delta out;
+
+				if (pnext)
+					compute_ws_delta(l, pnext->es, &out);
+				if (pnext &&
+				    !hm->cmpfn(o, pnext, match, NULL) &&
+				    compare_ws_delta(&out, &wsd[i])) {
+					pmb[i] = p->next_line;
+					/* wsd[i] is the same */
+				} else {
+					pmb[i] = NULL;
+				}
 			} else {
-				pmb[i] = NULL;
+				if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
+					pmb[i] = p->next_line;
+				} else {
+					pmb[i] = NULL;
+				}
 			}
 		}
 
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr, &wsd);
 
 		if (pmb_nr == 0) {
 			/*
@@ -895,6 +962,10 @@ static void mark_color_as_moved(struct diff_options *o,
 			 */
 			for (; match; match = hashmap_get_next(hm, match)) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+				if (o->color_moved & COLOR_MOVED_DELTA_WHITESPACES) {
+					ALLOC_GROW(wsd, pmb_nr + 1, wsd_alloc);
+					compute_ws_delta(l, match->es, &wsd[pmb_nr]);
+				}
 				pmb[pmb_nr++] = match;
 			}
 
@@ -912,6 +983,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	adjust_last_block(o, n, block_length);
 
 	free(pmb);
+	free(wsd);
 }
 
 #define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
@@ -4645,12 +4717,16 @@ int diff_opt_parse(struct diff_options *options,
 		options->color_moved &= ~XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
 		options->color_moved &= ~XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-delta"))
+		options->color_moved &= ~COLOR_MOVED_DELTA_WHITESPACES;
 	else if (!strcmp(arg, "--color-moved-ignore-all-space"))
 		options->color_moved |= XDF_IGNORE_WHITESPACE;
 	else if (!strcmp(arg, "--color-moved-ignore-space-change"))
 		options->color_moved |= XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
 		options->color_moved |= XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-ignore-space-delta"))
+		options->color_moved |= COLOR_MOVED_DELTA_WHITESPACES;
 	else if (!strcmp(arg, "--indent-heuristic"))
 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
 	else if (!strcmp(arg, "--no-indent-heuristic"))
@@ -5555,6 +5631,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
+			if (o->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
+				o->color_moved |= XDF_IGNORE_WHITESPACE;
+
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
 			if (o->color_moved & COLOR_MOVED_DIMMED_BLOCKS)
diff --git a/diff.h b/diff.h
index 9542017986..b8c0cf1232 100644
--- a/diff.h
+++ b/diff.h
@@ -212,6 +212,7 @@ struct diff_options {
 	#define COLOR_MOVED_FIND_BLOCKS		(1 << 1)
 	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
 
+	#define COLOR_MOVED_DELTA_WHITESPACES	(1 << 22)
 	#define COLOR_MOVED_DIMMED_BLOCKS	(1 << 23)
 
 	#define COLOR_MOVED_DEFAULT (COLOR_MOVED_ENABLED | COLOR_MOVED_FIND_BLOCKS)
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 38aaf4c46c..246ccadf9b 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1861,4 +1861,195 @@ test_expect_success 'move detection only ignores white spaces' '
 	test_cmp expected actual
 '
 
+test_expect_success 'compare whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+	QIndented
+	QText
+	Qacross
+	Qfive
+	Qlines
+	QBut!
+	Qthis
+	QQone
+	Qline
+	QQdid
+	Qnot
+	QQadjust
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	q_to_tab <<-\EOF >text.txt &&
+	QQIndented
+	QQText
+	QQacross
+	QQfive
+	QQlines
+	QQQBut!
+	QQthis
+	QQQone
+	QQline
+	QQQdid
+	QQnot
+	QQQadjust
+	EOF
+
+	git diff --color --color-moved-ignore-space-delta |
+		grep -v "index" |
+		test_decode_color >actual &&
+
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,12 +1,12 @@<RESET>
+	<BOLD;MAGENTA>-QIndented<RESET>
+	<BOLD;MAGENTA>-QText<RESET>
+	<BOLD;MAGENTA>-Qacross<RESET>
+	<BOLD;MAGENTA>-Qfive<RESET>
+	<BOLD;MAGENTA>-Qlines<RESET>
+	<RED>-QBut!<RESET>
+	<BOLD;MAGENTA>-Qthis<RESET>
+	<BOLD;MAGENTA>-QQone<RESET>
+	<BOLD;MAGENTA>-Qline<RESET>
+	<BOLD;MAGENTA>-QQdid<RESET>
+	<BOLD;MAGENTA>-Qnot<RESET>
+	<BOLD;MAGENTA>-QQadjust<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>Indented<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>Text<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>across<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>five<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>lines<RESET>
+	<GREEN>+<RESET>QQQ<GREEN>But!<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>this<RESET>
+	<BOLD;YELLOW>+<RESET>QQQ<BOLD;YELLOW>one<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>line<RESET>
+	<BOLD;YELLOW>+<RESET>QQQ<BOLD;YELLOW>did<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>not<RESET>
+	<BOLD;YELLOW>+<RESET>QQQ<BOLD;YELLOW>adjust<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '
+	# alternative: "python programmers would love the move detection, too"
+
+	git reset --hard &&
+	q_to_tab <<-\EOF >test.py &&
+	class test:
+	Qdef f(x):
+	QQ"""
+	QQA simple python function
+	QQthat returns the square of a number
+	QQAlthough it may not be pythonic
+	QQ"""
+	QQdef g(x):
+	QQQ"""
+	QQQNested function that returns the same number
+	QQQWe just multiply by 1.0
+	QQQso we can write a comment about it
+	QQQas we need longer blocks
+	QQQ"""
+	QQQreturn 1.0 * x
+	QQ# Another comment for f(x)
+	QQ# also spanning multiple lines
+	QQ# to make a block
+	QQreturn g(x) * g(x)
+	Qdef h(x):
+	QQ# Another function unrelated to the previous
+	QQ# but building a block,
+	QQ# long enough to call it a block
+	QQreturn x * 1.0
+	EOF
+
+	git add test.py &&
+	git commit -m "add test.py" &&
+
+	q_to_tab <<-\EOF >test.py &&
+	class test:
+	Qdef h(x):
+	QQ# Another function unrelated to the previous
+	QQ# but building a block,
+	QQ# long enough to call it a block
+	QQreturn x * 1.0
+	def f(x):
+	Q"""
+	QA simple python function
+	Qthat returns the square of a number
+	QAlthough it may not be pythonic
+	Q"""
+	Qdef g(x):
+	QQ"""
+	QQNested function that returns the same number
+	QQWe just multiply by 1.0
+	QQso we can write a comment about it
+	QQas we need longer blocks
+	QQ"""
+	QQreturn 1.0 * x
+	Q# Another comment for f(x)
+	Q# also spanning multiple lines
+	Q# to make a block
+	Qreturn g(x) * g(x)
+	EOF
+
+	git diff --color --color-moved-ignore-space-delta |
+		grep -v "index" |
+		test_decode_color >actual &&
+
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/test.py b/test.py<RESET>
+	<BOLD>--- a/test.py<RESET>
+	<BOLD>+++ b/test.py<RESET>
+	<CYAN>@@ -1,24 +1,24 @@<RESET>
+	 class test:<RESET>
+	<BOLD;MAGENTA>-Qdef f(x):<RESET>
+	<BOLD;MAGENTA>-QQ"""<RESET>
+	<BOLD;MAGENTA>-QQA simple python function<RESET>
+	<BOLD;MAGENTA>-QQthat returns the square of a number<RESET>
+	<BOLD;MAGENTA>-QQAlthough it may not be pythonic<RESET>
+	<BOLD;MAGENTA>-QQ"""<RESET>
+	<BOLD;MAGENTA>-QQdef g(x):<RESET>
+	<BOLD;MAGENTA>-QQQ"""<RESET>
+	<BOLD;MAGENTA>-QQQNested function that returns the same number<RESET>
+	<BOLD;MAGENTA>-QQQWe just multiply by 1.0<RESET>
+	<BOLD;MAGENTA>-QQQso we can write a comment about it<RESET>
+	<BOLD;MAGENTA>-QQQas we need longer blocks<RESET>
+	<BOLD;MAGENTA>-QQQ"""<RESET>
+	<BOLD;MAGENTA>-QQQreturn 1.0 * x<RESET>
+	<BOLD;MAGENTA>-QQ# Another comment for f(x)<RESET>
+	<BOLD;MAGENTA>-QQ# also spanning multiple lines<RESET>
+	<BOLD;MAGENTA>-QQ# to make a block<RESET>
+	<BOLD;MAGENTA>-QQreturn g(x) * g(x)<RESET>
+	 Qdef h(x):<RESET>
+	 QQ# Another function unrelated to the previous<RESET>
+	 QQ# but building a block,<RESET>
+	 QQ# long enough to call it a block<RESET>
+	 QQreturn x * 1.0<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>def f(x):<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>"""<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>A simple python function<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>that returns the square of a number<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>Although it may not be pythonic<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>"""<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>def g(x):<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>"""<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Nested function that returns the same number<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>We just multiply by 1.0<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>so we can write a comment about it<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>as we need longer blocks<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>"""<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>return 1.0 * x<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN># Another comment for f(x)<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN># also spanning multiple lines<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN># to make a block<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>return g(x) * g(x)<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (6 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
@ 2018-04-02 23:47 ` Jonathan Tan
  2018-04-03  0:03   ` Jacob Keller
  7 siblings, 1 reply; 30+ messages in thread
From: Jonathan Tan @ 2018-04-02 23:47 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jacob.keller, simon, git

On Mon,  2 Apr 2018 15:48:47 -0700
Stefan Beller <sbeller@google.com> wrote:

> This is a re-attempt of [1], which allows the moved code detection to
> ignore blanks in various modes.
> 
> patches 1-5 are refactoring, patch 6 adds all existing white space options
> of regular diff to the move detection. (I am unsure about this patch,
> as I presume we want to keep the option space at a minimum if possible).

My preference is to not do this until a need has been demonstrated, but
this sounds like it could be useful one day. I'll review the patches
from the viewpoint that we do want this feature.

> The fun is in the last patch, which allows white space sensitive
> languages to trust the move detection, too. Each block that is marked as
> moved will have the same delta in {in-, de-}dentation.
> I would think this mode might be a reasonable default eventually.

This sounds like a good idea. "Trust" is probably too strong a word, but
I can see this being useful even in non-whitespace-sensitive languages
with nested blocks (like C).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
@ 2018-04-02 23:51   ` Jonathan Tan
  2018-04-03 18:59     ` Stefan Beller
  2018-04-06 21:28     ` Stefan Beller
  2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 30+ messages in thread
From: Jonathan Tan @ 2018-04-02 23:51 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jacob.keller, simon, git

On Mon,  2 Apr 2018 15:48:52 -0700
Stefan Beller <sbeller@google.com> wrote:

> At the time the move coloring was implemented we thought an enum of modes
> is the best to configure this feature.  However as we want to tack on new
> features, the enum would grow exponentially.
> 
> Refactor the code such that features are enabled via bits. Currently we can
> * activate the move detection,
> * enable the block detection on top, and
> * enable the dimming inside a block, though this could be done without
>   block detection as well (mode "plain, dimmed")

Firstly, patches 1-4 are obviously correct.

As for this patch, I don't think that using flags is the right way to do
this. We are not under any size pressure for struct diff_options, and
the additional options that we plan to add (color-moved-whitespace-flags
and ignore-space-delta) can easily be additional fields instead.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
@ 2018-04-03  0:03   ` Jacob Keller
  2018-04-03 19:00     ` Stefan Beller
  0 siblings, 1 reply; 30+ messages in thread
From: Jacob Keller @ 2018-04-03  0:03 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Stefan Beller, Simon Ruderich, Git mailing list

On Mon, Apr 2, 2018 at 4:47 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon,  2 Apr 2018 15:48:47 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> This is a re-attempt of [1], which allows the moved code detection to
>> ignore blanks in various modes.
>>
>> patches 1-5 are refactoring, patch 6 adds all existing white space options
>> of regular diff to the move detection. (I am unsure about this patch,
>> as I presume we want to keep the option space at a minimum if possible).
>
> My preference is to not do this until a need has been demonstrated, but
> this sounds like it could be useful one day. I'll review the patches
> from the viewpoint that we do want this feature.
>
>> The fun is in the last patch, which allows white space sensitive
>> languages to trust the move detection, too. Each block that is marked as
>> moved will have the same delta in {in-, de-}dentation.
>> I would think this mode might be a reasonable default eventually.
>
> This sounds like a good idea. "Trust" is probably too strong a word, but
> I can see this being useful even in non-whitespace-sensitive languages
> with nested blocks (like C).

The ability to detect moved code despite whitespace changes would be
good, even while showing diffs with the whitespace intact.

We may not need *all* the options available, but allowing the internal
settings to enable this but have the user-visible controls be limited
should be totally fine.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option
  2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
@ 2018-04-03  0:31   ` Jonathan Tan
  0 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2018-04-03  0:31 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jacob.keller, simon, git

On Mon,  2 Apr 2018 15:48:53 -0700
Stefan Beller <sbeller@google.com> wrote:

> In the original implementation of the move detection logic we assumed that
> the choice for ignoring white space changes is the same for the move
> detection as it is for the generic diff.  It turns out this is wrong.

I don't think we can say that this is wrong - just that we decided that
it would be useful to allow different whitespace handling methods when
calculating the diff and when detecting moves.

> There are a couple of things where the user wants to input their
> decision into the diff machinery:
> 
> * Whether or not to ignore white space for the general diff detection.
>   This is covered with the -w, -b options already.
> * Whether the move detection ought to pay attention to white spaces
>   in general. And if so, how are white spaces handled for the block
>   detection.
> 
> One example would be reviewing a current patch under discussion, that moves
> code around.  In that case, you may want to have the general diff machinery
> not ignore the white spaces (i.e. exact matching), as that will show you
> the diff as the patch sent. The code moved however may have another
> indentation level, such that you want to ignore white spaces for the move
> detection. The code may be in python, such that spaces matter for program
> flow, though. That is why you'd want each block to have the same change
> in white space.
> 
> This patch decouples white space treatment in the general diff machinery
> from the white space treatment in the move detection.
> 
> This adds all the the options for ignoring white space that the regular
> diff machinery has to the move detection, such that we can cover all the
> cases that were introduced in 7a55427094 (Merge branch
> 'jk/diff-color-moved-fix', 2017-11-06).

I would just write the above as follows:

  Allow the user to specify that whitespace should be ignored
  differently during detection of moved lines than during generation of
  added and removed lines. This is done by providing analogs to the
  --ignore-space-at-eol, -b, and -w options (namely,
  --color-moved-[no-]ignore-space-at-eol <fill in the rest>) that affect
  only the color of the output, and making the existing
  --ignore-space-at-eol, -b, and -w options no longer affect the color
  of the output.

(And if the above is not clear enough that this is a
backwards-incompatible change, we might need to call it out.)

> The example from above (different lines in the diff with -w for the regular
> diff) is covered in a test. Convert the existing tests to be more explicit
> on their choice of white space behavior in the move detection as the tests
> should not fail if the default is changed.

This statement is confusing to me - of course the tests should fail,
since we changed the defaults. I think it is sufficient to just mention
that whitespace handling has to be explicitly specified for the move
detection part.

> +--color-moved-[no-]ignore-all-space::
> +	Ignore whitespace when comparing lines when performing the move
> +	detection for --color-moved.  This ignores differences even if
> +	one line has whitespace where the other line has none.
> +--color-moved-[no-]ignore-space-change::
> +	Ignore changes in amount of whitespace when performing the move
> +	detection for --color-moved.  This ignores whitespace
> +	at line end, and considers all other sequences of one or
> +	more whitespace characters to be equivalent.
> +--color-moved-[no-]ignore-space-at-eol::
> +	Ignore changes in whitespace at EOL when performing the move
> +	detection for --color-moved.

Optional: I would reorder this to be in the same order as the analogous
options (--ignore-space-at-eol first, then -b, then -w).

> @@ -720,7 +720,7 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>  
>  	return !xdiff_compare_lines(a->es->line, a->es->len,
>  				    b->es->line, b->es->len,
> -				    diffopt->xdl_opts);
> +				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
>  }
>  
>  static struct moved_entry *prepare_entry(struct diff_options *o,
> @@ -728,8 +728,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
>  {
>  	struct moved_entry *ret = xmalloc(sizeof(*ret));
>  	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
> +	unsigned flags = o->color_moved & XDF_WHITESPACE_FLAGS;
>  
> -	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
> +	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
>  	ret->es = l;
>  	ret->next_line = NULL;

These 2 changes looked suspicious to me at first, but then I saw that
moved_entry_cmp() and prepare_entry() are only used during move
detection.

> @@ -1419,7 +1422,10 @@ test_expect_success 'move detection ignoring whitespace ' '
>  	EOF
>  	test_cmp expected actual &&
>  
> -	git diff HEAD --no-renames -w --color-moved --color |
> +	git diff HEAD --no-renames --color-moved --color \
> +		--color-moved-ignore-all-space \
> +		--color-moved-no-ignore-space-change \
> +		--color-moved-no-ignore-space-at-eol |
>  		grep -v "index" |
>  		test_decode_color >actual &&
>  	cat <<-\EOF >expected &&

This change (removal of -w) looked suspicious to me at first, but then I
saw that the "-w" was only used to trigger whitespace insensitivity
during the move detection phase. (The line in question was moved, so "-"
and "+" lines would have been generated in the diff anyway regardless of
whether whitespace was ignored.) Same as to the other test
modifications.

> +test_expect_success 'move detection only ignores white spaces' '

This seems to be "only move detection ignores white spaces", not as
written. I would title this "--color-moved-ignore-all-space only ignores
whitespace during move detection".

> +	q_to_tab <<-\EOF >function.c &&
> +	int func()
> +	{
> +	Qif (foo) {
> +	QQ// this part of the function
> +	QQ// function will be very long
> +	QQ// indeed. We must exceed both
> +	QQ// per-line and number of line
> +	QQ// minimums
> +	QQ;
> +	Q}
> +	Qbaz();
> +	Qbar();
> +	Q// more unrelated stuff
> +	}
> +	EOF

Could the sample data be more concise? In particular, a few lines should
be sufficient for before:

  a long line to exceed per-line minimum
  another long line to exceed per-line minimum
  original file

and after:

  Qa long line to exceed per-line minimum
  Qanother long line to exceed per-line minimum
  new file

Then the ordinary diff (with -w) will only have "-" and "+" for
"original file" and "new file", and the one with only
--color-moved-ignore-all-space would have "-" and "+" for all lines.

> +	# Make sure we get a different diff using -w ("moved function header")

-w is not "moved function header", maybe reword.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
@ 2018-04-03  0:41   ` Jonathan Tan
  2018-04-03 19:22     ` Stefan Beller
  0 siblings, 1 reply; 30+ messages in thread
From: Jonathan Tan @ 2018-04-03  0:41 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jacob.keller, simon, git

On Mon,  2 Apr 2018 15:48:54 -0700
Stefan Beller <sbeller@google.com> wrote:

> +struct ws_delta {
> +	int deltachars;
> +	char firstchar;
> +};

I'll just make some overall design comments.

Shouldn't this be a string of characters (or a char* and len) and
whether it was added or removed? If you're only checking the first
character, this would not work if the other characters were different.

> @@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>  	const struct diff_options *diffopt = hashmap_cmp_fn_data;
>  	const struct moved_entry *a = entry;
>  	const struct moved_entry *b = entry_or_key;
> +	unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
> +
> +	if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
> +		/*
> +		 * As there is not specific white space config given,
> +		 * we'd need to check for a new block, so ignore all
> +		 * white space. The setup of the white space
> +		 * configuration for the next block is done else where
> +		 */
> +		flags |= XDF_IGNORE_WHITESPACE;
>  
>  	return !xdiff_compare_lines(a->es->line, a->es->len,
>  				    b->es->line, b->es->len,
> -				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
> +				    flags);
>  }

I think we should just prohibit combining this with any of the
whitespace ignoring flags except for the space-at-eol one. They seem to
contradict anyway.

> +test_expect_success 'compare whitespace delta across moved blocks' '
> +
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >text.txt &&
> +	QIndented
> +	QText
> +	Qacross
> +	Qfive
> +	Qlines
> +	QBut!
> +	Qthis
> +	QQone
> +	Qline
> +	QQdid
> +	Qnot
> +	QQadjust
> +	EOF

Do we need 5 lines? I thought 2 would suffice. (It's the ALNUM_COUNT
that matters, as far as I know.) This makes it hard to see that the
"But!" line is the one that counts.

> +test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '
> +	# alternative: "python programmers would love the move detection, too"
> +
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >test.py &&
> +	class test:
> +	Qdef f(x):
> +	QQ"""
> +	QQA simple python function
> +	QQthat returns the square of a number
> +	QQAlthough it may not be pythonic
> +	QQ"""
> +	QQdef g(x):
> +	QQQ"""
> +	QQQNested function that returns the same number
> +	QQQWe just multiply by 1.0
> +	QQQso we can write a comment about it
> +	QQQas we need longer blocks
> +	QQQ"""
> +	QQQreturn 1.0 * x
> +	QQ# Another comment for f(x)
> +	QQ# also spanning multiple lines
> +	QQ# to make a block
> +	QQreturn g(x) * g(x)
> +	Qdef h(x):
> +	QQ# Another function unrelated to the previous
> +	QQ# but building a block,
> +	QQ# long enough to call it a block
> +	QQreturn x * 1.0
> +	EOF

If the objective it just to show that the functions f and g are treated
as one unit despite their lines being of multiple indentation levels,
the test file could be much shorter.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 23:51   ` Jonathan Tan
@ 2018-04-03 18:59     ` Stefan Beller
  2018-04-06 21:28     ` Stefan Beller
  1 sibling, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2018-04-03 18:59 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jacob Keller, Simon Ruderich, git

On Mon, Apr 2, 2018 at 4:51 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon,  2 Apr 2018 15:48:52 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> At the time the move coloring was implemented we thought an enum of modes
>> is the best to configure this feature.  However as we want to tack on new
>> features, the enum would grow exponentially.
>>
>> Refactor the code such that features are enabled via bits. Currently we can
>> * activate the move detection,
>> * enable the block detection on top, and
>> * enable the dimming inside a block, though this could be done without
>>   block detection as well (mode "plain, dimmed")
>
> Firstly, patches 1-4 are obviously correct.
>
> As for this patch, I don't think that using flags is the right way to do
> this. We are not under any size pressure for struct diff_options, and
> the additional options that we plan to add (color-moved-whitespace-flags
> and ignore-space-delta) can easily be additional fields instead.

Gah! I'll give it a try.

When I came to the conclusion that the features of this series is
orthogonal to the existing code, bit fields came first to mind.

Let's see if the alternative is easier to read.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-03  0:03   ` Jacob Keller
@ 2018-04-03 19:00     ` Stefan Beller
  2018-04-03 19:55       ` Jacob Keller
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-03 19:00 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jonathan Tan, Simon Ruderich, Git mailing list

>>> The fun is in the last patch, which allows white space sensitive
>>> languages to trust the move detection, too. Each block that is marked as
>>> moved will have the same delta in {in-, de-}dentation.
>>> I would think this mode might be a reasonable default eventually.
>>
>> This sounds like a good idea. "Trust" is probably too strong a word, but
>> I can see this being useful even in non-whitespace-sensitive languages
>> with nested blocks (like C).
>
> The ability to detect moved code despite whitespace changes would be
> good, even while showing diffs with the whitespace intact.

That is what the last patch is about.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-03  0:41   ` Jonathan Tan
@ 2018-04-03 19:22     ` Stefan Beller
  2018-04-03 20:38       ` Jonathan Tan
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-03 19:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jacob Keller, Simon Ruderich, git

On Mon, Apr 2, 2018 at 5:41 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon,  2 Apr 2018 15:48:54 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> +struct ws_delta {
>> +     int deltachars;
>> +     char firstchar;
>> +};
>
> I'll just make some overall design comments.
>
> Shouldn't this be a string of characters (or a char* and len) and
> whether it was added or removed? If you're only checking the first
> character, this would not work if the other characters were different.

I considered diving into this, but it seemed to be too complicated for
>95 % of the use cases, which can be approximated in change of the
first character.

Because if we take a string of characters, we'd also need to take care of
tricky conversions (e.g. Are 8 white spaces equal to a tab, and if so do
we break blocks if one line converts 8 ws to a tab?)

So I would definitely pursue the string instead of change of first
character, but what are all the heuristics to put in?

Just to be clear: The string would contain only the change in
white space up front, or would we also somehow store white space
in other parts?

- # This is a sample comment
- # across multiple lines
- # maybe even a license header
+ #     This is a sample comment
+ #     across multiple lines
+ #     maybe even a license header

How about this?


>
>> @@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>>       const struct diff_options *diffopt = hashmap_cmp_fn_data;
>>       const struct moved_entry *a = entry;
>>       const struct moved_entry *b = entry_or_key;
>> +     unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
>> +
>> +     if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
>> +             /*
>> +              * As there is not specific white space config given,
>> +              * we'd need to check for a new block, so ignore all
>> +              * white space. The setup of the white space
>> +              * configuration for the next block is done else where
>> +              */
>> +             flags |= XDF_IGNORE_WHITESPACE;
>>
>>       return !xdiff_compare_lines(a->es->line, a->es->len,
>>                                   b->es->line, b->es->len,
>> -                                 diffopt->color_moved & XDF_WHITESPACE_FLAGS);
>> +                                 flags);
>>  }
>
> I think we should just prohibit combining this with any of the
> whitespace ignoring flags except for the space-at-eol one. They seem to
> contradict anyway.

ok, we can narrow this one down to ignore all white space.

>
>> +test_expect_success 'compare whitespace delta across moved blocks' '
>> +
>> +     git reset --hard &&
>> +     q_to_tab <<-\EOF >text.txt &&
>> +     QIndented
>> +     QText
>> +     Qacross
>> +     Qfive
>> +     Qlines
>> +     QBut!
>> +     Qthis
>> +     QQone
>> +     Qline
>> +     QQdid
>> +     Qnot
>> +     QQadjust
>> +     EOF
>
> Do we need 5 lines? I thought 2 would suffice. (It's the ALNUM_COUNT
> that matters, as far as I know.) This makes it hard to see that the
> "But!" line is the one that counts.

I did not want to go with the bare minimum as then adjusting the minimum
would be a pain as these unrelated (to the minimum) test cases would
break.

>
>> +test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '

>> +     EOF
>
> If the objective it just to show that the functions f and g are treated
> as one unit despite their lines being of multiple indentation levels,
> the test file could be much shorter.

yeah, I noticed that we already test that in the test above where we
have that test after the "But!", where lines ziggy-zag. Will drop this test.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
  2018-04-02 23:51   ` Jonathan Tan
@ 2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
  2018-04-03 19:49     ` Stefan Beller
  1 sibling, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-03 19:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jonathantanmy, jacob.keller, simon, git


On Mon, Apr 02 2018, Stefan Beller wrote:

> At the time the move coloring was implemented we thought an enum of modes
> is the best to configure this feature.  However as we want to tack on new
> features, the enum would grow exponentially.
>
> Refactor the code such that features are enabled via bits. Currently we can
> * activate the move detection,
> * enable the block detection on top, and
> * enable the dimming inside a block, though this could be done without
>   block detection as well (mode "plain, dimmed")
>
> Choose the flags to not be at bit position 2,3,4 as the next patch
> will occupy these.

When I've been playing with colorMoved the thing I've found really
confusing is that the current config has confused two completely
unrelated things (at least, from a user perspective), what underlying
algorithm you use, and how the colors look.

I was helping someone at work the other day where they were trying:

    git -c color.diff.new="green bold" \
        -c color.diff.old="red bold" \
        -c color.diff.newMoved="green" \
        -c color.diff.oldMoved="red" \
        -c diff.colorMoved=plain show <commit>

But what gave better results was:

    git -c color.diff.new="green bold" \
        -c color.diff.old="red bold" \
        -c color.diff.newMoved="green" \
        -c color.diff.oldMoved="red" \
        -c diff.colorMoved=zebra \
        -c color.diff.oldMovedAlternative=red \
        -c color.diff.newMovedAlternative=green show <commit>

I don't have a public test commit to share (sorry), but I have an
internal example where "plain" will consider a thing as falling under
color.diff.old OR color.diff.oldMoved, but zebra will consider that
whole part only color.diff.old.

I see now that that might be since only the "zebra" supports the
*Alternative that it ends up "stealing" chunks from something that would
have otherwise been classified differently, so I have no idea if there's
an easy "solution", or if it's even a problem.

Sorry about being vague, I just dug this up from some old notes now
after this patch jolted my memory about it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
@ 2018-04-03 19:49     ` Stefan Beller
  2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-03 19:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jonathan Tan, Jacob Keller, Simon Ruderich, git

On Tue, Apr 3, 2018 at 12:39 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Mon, Apr 02 2018, Stefan Beller wrote:
>
>> At the time the move coloring was implemented we thought an enum of modes
>> is the best to configure this feature.  However as we want to tack on new
>> features, the enum would grow exponentially.
>>
>> Refactor the code such that features are enabled via bits. Currently we can
>> * activate the move detection,
>> * enable the block detection on top, and
>> * enable the dimming inside a block, though this could be done without
>>   block detection as well (mode "plain, dimmed")
>>
>> Choose the flags to not be at bit position 2,3,4 as the next patch
>> will occupy these.
>
> When I've been playing with colorMoved the thing I've found really
> confusing is that the current config has confused two completely
> unrelated things (at least, from a user perspective), what underlying
> algorithm you use, and how the colors look.

Not sure I follow. The colors are in color.diff.X and the algorithm is in
diff.colorMoved, whereas some colors are reused for different algorithms?

>
> I was helping someone at work the other day where they were trying:
>
>     git -c color.diff.new="green bold" \
>         -c color.diff.old="red bold" \
>         -c color.diff.newMoved="green" \
>         -c color.diff.oldMoved="red" \
>         -c diff.colorMoved=plain show <commit>
>
> But what gave better results was:
>
>     git -c color.diff.new="green bold" \
>         -c color.diff.old="red bold" \
>         -c color.diff.newMoved="green" \
>         -c color.diff.oldMoved="red" \
>         -c diff.colorMoved=zebra \
>         -c color.diff.oldMovedAlternative=red \
>         -c color.diff.newMovedAlternative=green show <commit>
>
> I don't have a public test commit to share (sorry), but I have an
> internal example where "plain" will consider a thing as falling under
> color.diff.old OR color.diff.oldMoved, but zebra will consider that
> whole part only color.diff.old.

What do you mean by "OR" ?
Is the hunk present multiple times and colored one or the other way?
Is it colored differently in different invocations of Git?
Is one hunk mixing up both colors?

Is the hunk "small" ?
small hunks are un-colored, to avoid showing empty lines
or closing braces as moved. But plain mode ignores this heuristic.

> I see now that that might be since only the "zebra" supports the
> *Alternative that it ends up "stealing" chunks from something that would
> have otherwise been classified differently, so I have no idea if there's
> an easy "solution", or if it's even a problem.

Can you describe the issue more to see if it is a problem?
(It sounds like a problem in the documentation/UX to me already
as the docs could not tell you what to expect)

> Sorry about being vague, I just dug this up from some old notes now
> after this patch jolted my memory about it.

ok.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-03 19:00     ` Stefan Beller
@ 2018-04-03 19:55       ` Jacob Keller
  0 siblings, 0 replies; 30+ messages in thread
From: Jacob Keller @ 2018-04-03 19:55 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, Simon Ruderich, Git mailing list

On Tue, Apr 3, 2018 at 12:00 PM, Stefan Beller <sbeller@google.com> wrote:
>>>> The fun is in the last patch, which allows white space sensitive
>>>> languages to trust the move detection, too. Each block that is marked as
>>>> moved will have the same delta in {in-, de-}dentation.
>>>> I would think this mode might be a reasonable default eventually.
>>>
>>> This sounds like a good idea. "Trust" is probably too strong a word, but
>>> I can see this being useful even in non-whitespace-sensitive languages
>>> with nested blocks (like C).
>>
>> The ability to detect moved code despite whitespace changes would be
>> good, even while showing diffs with the whitespace intact.
>
> That is what the last patch is about.

Right. I was trying to say "I agree with the goal, even if we don't
necessarily allow every possible white-space + color-moved lines
combination" (i.e. to avoid polluting the option space too much)

Thanks,
Jake

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-03 19:22     ` Stefan Beller
@ 2018-04-03 20:38       ` Jonathan Tan
  0 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2018-04-03 20:38 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jacob Keller, Simon Ruderich, git

On Tue, 3 Apr 2018 12:22:32 -0700
Stefan Beller <sbeller@google.com> wrote:

> On Mon, Apr 2, 2018 at 5:41 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> > On Mon,  2 Apr 2018 15:48:54 -0700
> > Stefan Beller <sbeller@google.com> wrote:
> >
> >> +struct ws_delta {
> >> +     int deltachars;
> >> +     char firstchar;
> >> +};
> >
> > I'll just make some overall design comments.
> >
> > Shouldn't this be a string of characters (or a char* and len) and
> > whether it was added or removed? If you're only checking the first
> > character, this would not work if the other characters were different.
> 
> I considered diving into this, but it seemed to be too complicated for
> >95 % of the use cases, which can be approximated in change of the
> first character.

It's true that most use cases can be approximated this way, but I don't
think that it's worth the approximation.

> Because if we take a string of characters, we'd also need to take care of
> tricky conversions (e.g. Are 8 white spaces equal to a tab, and if so do
> we break blocks if one line converts 8 ws to a tab?)

No conversions - spaces are spaces and tabs are tabs.

> So I would definitely pursue the string instead of change of first
> character, but what are all the heuristics to put in?

No heuristics - a few lines make a block if the same prefix (which
consists of all whitespace) was added or removed.

> Just to be clear: The string would contain only the change in
> white space up front, or would we also somehow store white space
> in other parts?

Only change in white space at the start of the line - this option only
handles space at the start of the line, right?

> - # This is a sample comment
> - # across multiple lines
> - # maybe even a license header
> + #     This is a sample comment
> + #     across multiple lines
> + #     maybe even a license header
> 
> How about this?

My understanding is that this patch does not handle this case.

> >> @@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
> >>       const struct diff_options *diffopt = hashmap_cmp_fn_data;
> >>       const struct moved_entry *a = entry;
> >>       const struct moved_entry *b = entry_or_key;
> >> +     unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
> >> +
> >> +     if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
> >> +             /*
> >> +              * As there is not specific white space config given,
> >> +              * we'd need to check for a new block, so ignore all
> >> +              * white space. The setup of the white space
> >> +              * configuration for the next block is done else where
> >> +              */
> >> +             flags |= XDF_IGNORE_WHITESPACE;
> >>
> >>       return !xdiff_compare_lines(a->es->line, a->es->len,
> >>                                   b->es->line, b->es->len,
> >> -                                 diffopt->color_moved & XDF_WHITESPACE_FLAGS);
> >> +                                 flags);
> >>  }
> >
> > I think we should just prohibit combining this with any of the
> > whitespace ignoring flags except for the space-at-eol one. They seem to
> > contradict anyway.
> 
> ok, we can narrow this one down to ignore all white space.

What do you mean? The rationale for my comment is that I saw that you
need to specify a special flag to xdiff_compare_lines if
COLOR_MOVED_DELTA_WHITESPACES is set, which could conflict with other
flags that the user has explicitly set. So avoiding that case entirely
seems like a good idea, especially since it is logical to do so.

> >> +test_expect_success 'compare whitespace delta across moved blocks' '
> >> +
> >> +     git reset --hard &&
> >> +     q_to_tab <<-\EOF >text.txt &&
> >> +     QIndented
> >> +     QText
> >> +     Qacross
> >> +     Qfive
> >> +     Qlines
> >> +     QBut!
> >> +     Qthis
> >> +     QQone
> >> +     Qline
> >> +     QQdid
> >> +     Qnot
> >> +     QQadjust
> >> +     EOF
> >
> > Do we need 5 lines? I thought 2 would suffice. (It's the ALNUM_COUNT
> > that matters, as far as I know.) This makes it hard to see that the
> > "But!" line is the one that counts.
> 
> I did not want to go with the bare minimum as then adjusting the minimum
> would be a pain as these unrelated (to the minimum) test cases would
> break.

That is true, but it makes the test case harder to read now. If you're
worried about bumping into the minimum if we do adjust the minimum,
making the lines longer should be sufficient.

> >> +test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '
> 
> >> +     EOF
> >
> > If the objective it just to show that the functions f and g are treated
> > as one unit despite their lines being of multiple indentation levels,
> > the test file could be much shorter.
> 
> yeah, I noticed that we already test that in the test above where we
> have that test after the "But!", where lines ziggy-zag. Will drop this test.

OK, sounds good.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-03 19:49     ` Stefan Beller
@ 2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
  2018-04-03 21:05         ` [PATCH] diff: add a blocks mode for moved code detection Stefan Beller
  0 siblings, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-03 20:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, Jacob Keller, Simon Ruderich, git


On Tue, Apr 03 2018, Stefan Beller wrote:

> On Tue, Apr 3, 2018 at 12:39 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Mon, Apr 02 2018, Stefan Beller wrote:
>>
>>> At the time the move coloring was implemented we thought an enum of modes
>>> is the best to configure this feature.  However as we want to tack on new
>>> features, the enum would grow exponentially.
>>>
>>> Refactor the code such that features are enabled via bits. Currently we can
>>> * activate the move detection,
>>> * enable the block detection on top, and
>>> * enable the dimming inside a block, though this could be done without
>>>   block detection as well (mode "plain, dimmed")
>>>
>>> Choose the flags to not be at bit position 2,3,4 as the next patch
>>> will occupy these.
>>
>> When I've been playing with colorMoved the thing I've found really
>> confusing is that the current config has confused two completely
>> unrelated things (at least, from a user perspective), what underlying
>> algorithm you use, and how the colors look.
>
> Not sure I follow. The colors are in color.diff.X and the algorithm is in
> diff.colorMoved, whereas some colors are reused for different algorithms?
>
>>
>> I was helping someone at work the other day where they were trying:
>>
>>     git -c color.diff.new="green bold" \
>>         -c color.diff.old="red bold" \
>>         -c color.diff.newMoved="green" \
>>         -c color.diff.oldMoved="red" \
>>         -c diff.colorMoved=plain show <commit>
>>
>> But what gave better results was:
>>
>>     git -c color.diff.new="green bold" \
>>         -c color.diff.old="red bold" \
>>         -c color.diff.newMoved="green" \
>>         -c color.diff.oldMoved="red" \
>>         -c diff.colorMoved=zebra \
>>         -c color.diff.oldMovedAlternative=red \
>>         -c color.diff.newMovedAlternative=green show <commit>
>>
>> I don't have a public test commit to share (sorry), but I have an
>> internal example where "plain" will consider a thing as falling under
>> color.diff.old OR color.diff.oldMoved, but zebra will consider that
>> whole part only color.diff.old.
>
> What do you mean by "OR" ?
> Is the hunk present multiple times and colored one or the other way?
> Is it colored differently in different invocations of Git?
> Is one hunk mixing up both colors?
>
> Is the hunk "small" ?
> small hunks are un-colored, to avoid showing empty lines
> or closing braces as moved. But plain mode ignores this heuristic.
>
>> I see now that that might be since only the "zebra" supports the
>> *Alternative that it ends up "stealing" chunks from something that would
>> have otherwise been classified differently, so I have no idea if there's
>> an easy "solution", or if it's even a problem.
>
> Can you describe the issue more to see if it is a problem?
> (It sounds like a problem in the documentation/UX to me already
> as the docs could not tell you what to expect)
>
>> Sorry about being vague, I just dug this up from some old notes now
>> after this patch jolted my memory about it.

Forget about what I said so far, sorry, that was a really shitty
report. I dug into this a bit more and here's a better one.

I still can't share the actual diff I have in front of me (internal
code).

Currently we have plain, zebra & dimmed_zebra, and zebra is the
default.

I got an internal report from someone who had, because zebra looked
crappy in his terminal, moved to "plain", and was reporting getting
worse moved diffs as a result.

I found that there's essentially a missing setting between "plain" and
"zebra", in git command terms:

    # The "plain" setting
    git -c diff.colorMoved=true \
        -c diff.colorMoved=plain \
        show <commit>

    # We don't have this, it's "plain" but with "zebra" heuristics,
    # plain_zebra?
    git -c diff.colorMoved=true \
        -c color.diff.oldMovedAlternative="bold magenta" \
        -c color.diff.newMovedAlternative="bold yellow" \
        -c diff.colorMoved=zebra \
        show <commit>

    # The "zebra" setting.
    git -c diff.colorMoved=true \
        -c diff.colorMoved=zebra \
        show <commit>

Which is what I mean by the current config conflating two (to me)
unrelated things. One is how we, via any method, detect what's moved or
not, and the other is what color/format we use to present this to the
user.

You can feed that plain_zebra invocation input where it'll color-wise
produce something that looks *almost* like "plain", but will differ (and
usually be better) in what lines it decides to show as moved, which of
course is due to *MovedAlternative.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] diff: add a blocks mode for moved code detection
  2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
@ 2018-04-03 21:05         ` Stefan Beller
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2018-04-03 21:05 UTC (permalink / raw)
  To: avarab; +Cc: git, jacob.keller, jonathantanmy, sbeller, simon

> Currently we have plain, zebra & dimmed_zebra, and zebra is the
> default.
>
> I got an internal report from someone who had, because zebra looked
> crappy in his terminal, moved to "plain", and was reporting getting
> worse moved diffs as a result.
>
> I found that there's essentially a missing setting between "plain" and
> "zebra", in git command terms:
>
>     # The "plain" setting
>     git -c diff.colorMoved=true \
>         -c diff.colorMoved=plain \
>         show <commit>
>
>     # We don't have this, it's "plain" but with "zebra" heuristics,
>     # plain_zebra?
>     git -c diff.colorMoved=true \
>         -c color.diff.oldMovedAlternative="bold magenta" \
>         -c color.diff.newMovedAlternative="bold yellow" \
>         -c diff.colorMoved=zebra \
>         show <commit>
>
>     # The "zebra" setting.
>     git -c diff.colorMoved=true \
>         -c diff.colorMoved=zebra \
>         show <commit>
>
> Which is what I mean by the current config conflating two (to me)
> unrelated things. One is how we, via any method, detect what's moved or
> not, and the other is what color/format we use to present this to the
> user.

Oh I see.

Reading the docs again, maybe we want to have a "blocks" mode,
that is zebra with the same color for any block?

> You can feed that plain_zebra invocation input where it'll color-wise
> produce something that looks *almost* like "plain", but will differ (and
> usually be better) in what lines it decides to show as moved, which of
> course is due to *MovedAlternative.

I would think this is close to what you want (module implementation errors,
I did not run/test this code).

One could also argue that this is *too* weak, as when there are
multiple blocks of say 15 chars adjacent, they might be one large block.

---8<----

From dde04f6afa35a313fac3575100fe83b554ec2b59 Mon Sep 17 00:00:00 2001
From: Stefan Beller <sbeller@google.com>
Date: Tue, 3 Apr 2018 14:03:01 -0700
Subject: [PATCH] diff: add a blocks mode for moved code detection

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt | 5 +++++
 diff.c                         | 4 +++-
 diff.h                         | 5 +++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index c330c01ff0..abce5142d2 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -268,6 +268,11 @@ plain::
 	that are added somewhere else in the diff. This mode picks up any
 	moved line, but it is not very useful in a review to determine
 	if a block of code was moved without permutation.
+blocks:
+	Blocks of moved text of at least 20 alphanumeric characters
+	are detected greedily. The detected blocks are
+	painted using either the 'color.diff.{old,new}Moved' color.
+	Adjacent blocks cannot be told apart.
 zebra::
 	Blocks of moved text of at least 20 alphanumeric characters
 	are detected greedily. The detected blocks are
diff --git a/diff.c b/diff.c
index 21c3838b25..80dd8cbd9a 100644
--- a/diff.c
+++ b/diff.c
@@ -271,6 +271,8 @@ static int parse_color_moved(const char *arg)
 		return COLOR_MOVED_NO;
 	else if (!strcmp(arg, "plain"))
 		return COLOR_MOVED_PLAIN;
+	else if (!strcmp(arg, "blocks"))
+		return COLOR_MOVED_BLOCKS;
 	else if (!strcmp(arg, "zebra"))
 		return COLOR_MOVED_ZEBRA;
 	else if (!strcmp(arg, "default"))
@@ -899,7 +901,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		block_length++;
 
-		if (flipped_block)
+		if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
 	adjust_last_block(o, n, block_length);
diff --git a/diff.h b/diff.h
index 6bd278aac1..3a228861d9 100644
--- a/diff.h
+++ b/diff.h
@@ -207,8 +207,9 @@ struct diff_options {
 	enum {
 		COLOR_MOVED_NO = 0,
 		COLOR_MOVED_PLAIN = 1,
-		COLOR_MOVED_ZEBRA = 2,
-		COLOR_MOVED_ZEBRA_DIM = 3,
+		COLOR_MOVED_BLOCKS = 2,
+		COLOR_MOVED_ZEBRA = 3,
+		COLOR_MOVED_ZEBRA_DIM = 4,
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap
  2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
@ 2018-04-06 20:04   ` Jeff King
  2018-04-06 20:41     ` Stefan Beller
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2018-04-06 20:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jonathantanmy, jacob.keller, simon, git

On Mon, Apr 02, 2018 at 03:48:50PM -0700, Stefan Beller wrote:

> The diff options are passed to the compare function as
> 'hashmap_cmp_fn_data', which are given when the hashmaps
> are initialized.
> 
> A later patch will make use of the keydata to signal
> different settings for comparision.

I had to scratch my head here for a moment. Don't we use those options
as part of the comparison?

I took the "which" to mean "the compare function", but I think you mean
"we pass these diff options already when the hashmap is initialized".

Maybe something like this would be more clear:

  When we initialize the hashmap, we give it a pointer to the
  diff_options, which it then passes along to each call of the
  hashmap_cmp_fn function. There's no need to pass it a second time as
  the "keydata" parameter, and our comparison functions never look at
  keydata.

  This was a mistake left over from an earlier round of 2e2d5ac184
  (diff.c: color moved lines differently, 2017-06-30), before hashmap
  learned to pass the data pointer for us.

(I'm just guessing on the second paragraph based on a quick look at
git-blame and my recollection from the time).

-Peff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap
  2018-04-06 20:04   ` Jeff King
@ 2018-04-06 20:41     ` Stefan Beller
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2018-04-06 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, Jacob Keller, Simon Ruderich, git

Hi Jeff,

On Fri, Apr 6, 2018 at 1:04 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Apr 02, 2018 at 03:48:50PM -0700, Stefan Beller wrote:
>
>> The diff options are passed to the compare function as
>> 'hashmap_cmp_fn_data', which are given when the hashmaps
>> are initialized.
>>
>> A later patch will make use of the keydata to signal
>> different settings for comparision.
>
> I had to scratch my head here for a moment. Don't we use those options
> as part of the comparison?

Yes we do, but not as passed in here.

Stepping back a bit: There are 2 void pointers passed to the cmp function:

typedef int (*hashmap_cmp_fn)(const void *hashmap_cmp_fn_data,
      const void *entry, const void *entry_or_key,
      const void *keydata);

The hashmap_cmp_fn_data is the same pointer as we pass as
the equals_function data in

extern void hashmap_init(struct hashmap *map,
        hashmap_cmp_fn equals_function,
        const void *equals_function_data,
        size_t initial_size);

whereas the keydata is passed in either directly when calling cmp directly
or in hashmap_get_from_hash:

static inline void *hashmap_get_from_hash(const struct hashmap *map,
          unsigned int hash,
         const void *keydata)
{
        struct hashmap_entry key;
        hashmap_entry_init(&key, hash);
        return hashmap_get(map, &key, keydata);
}

It turns out we are passing the struct diff_options *o into the cmp
function twice, once via the inits equals_function_data, as well as
keydata directly. Omit the direct pass in.

This is mostly a cleanup as of now, as it turns out I do not need to
reuse the keydata field anyway.

> I took the "which" to mean "the compare function", but I think you mean
> "we pass these diff options already when the hashmap is initialized".

Oh, yes.

> Maybe something like this would be more clear:
>
>   When we initialize the hashmap, we give it a pointer to the
>   diff_options, which it then passes along to each call of the
>   hashmap_cmp_fn function. There's no need to pass it a second time as
>   the "keydata" parameter, and our comparison functions never look at
>   keydata.
>

Thanks for clearing this up, will take as-is.

>   This was a mistake left over from an earlier round of 2e2d5ac184
>   (diff.c: color moved lines differently, 2017-06-30), before hashmap
>   learned to pass the data pointer for us.

That sounds about right.

>
> (I'm just guessing on the second paragraph based on a quick look at
> git-blame and my recollection from the time).
>
> -Peff

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 23:51   ` Jonathan Tan
  2018-04-03 18:59     ` Stefan Beller
@ 2018-04-06 21:28     ` Stefan Beller
  2018-04-06 22:27       ` Jonathan Tan
  1 sibling, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-06 21:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jacob Keller, Simon Ruderich, git

On Mon, Apr 2, 2018 at 4:51 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon,  2 Apr 2018 15:48:52 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> At the time the move coloring was implemented we thought an enum of modes
>> is the best to configure this feature.  However as we want to tack on new
>> features, the enum would grow exponentially.
>>
>> Refactor the code such that features are enabled via bits. Currently we can
>> * activate the move detection,
>> * enable the block detection on top, and
>> * enable the dimming inside a block, though this could be done without
>>   block detection as well (mode "plain, dimmed")
>
> Firstly, patches 1-4 are obviously correct.
>
> As for this patch, I don't think that using flags is the right way to do
> this.

Now that I redid it another way[1], I have the impression that this was the
right approach, because it allows for a short
  if (o->color_moved)
condition. If we treat white spaces separately, then we'd have to
have implications such as:

  if (some white space option)
    the enum = default if not given explicitely.

which we do not need in case of a flags field.

[1] Keeping the enum around and having an extra variable for the
white space related configuration.

> We are not under any size pressure for struct diff_options, and
> the additional options that we plan to add (color-moved-whitespace-flags
> and ignore-space-delta) can easily be additional fields instead.

The  traditional white space flags would want to be a field and occupy
the same bits in that field for ease of implementation, and the new option
would just fit in by picking a new place in the bit field.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-06 21:28     ` Stefan Beller
@ 2018-04-06 22:27       ` Jonathan Tan
  0 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2018-04-06 22:27 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jacob Keller, Simon Ruderich, git

On Fri, 6 Apr 2018 14:28:40 -0700
Stefan Beller <sbeller@google.com> wrote:

> Now that I redid it another way[1], I have the impression that this was the
> right approach, because it allows for a short
>   if (o->color_moved)
> condition. If we treat white spaces separately, then we'd have to
> have implications such as:
> 
>   if (some white space option)
>     the enum = default if not given explicitely.
> 
> which we do not need in case of a flags field.
> 
> [1] Keeping the enum around and having an extra variable for the
> white space related configuration.

Ah, I think I see what you mean. In your way, move detection can be
activated either by explicitly setting a color-move pattern (e.g.
zebra), or by setting a move-detection whitespace option, both of which
will set bits in the uint32.

As opposed to my proposed way, where you either have to set the default
explicitly (like you describe), or write "if (o->color_moved ||
o->color_moved_whitespace_handling)" instead of "if (o->color_moved)".

I don't think such an implicit dependence (whitespace option to
color-move pattern) is reason enough to use a bitfield, and I think the
opposite actually - this dependence should be in fact explicit, not
implicit. But I'm open to opinions from others.

> > We are not under any size pressure for struct diff_options, and
> > the additional options that we plan to add (color-moved-whitespace-flags
> > and ignore-space-delta) can easily be additional fields instead.
> 
> The  traditional white space flags would want to be a field and occupy
> the same bits in that field for ease of implementation, and the new option
> would just fit in by picking a new place in the bit field.

If we were to use bitfields, this would be the way, yes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-24 21:03 [PATCHv2 " Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  2018-04-24 22:35   ` Jonathan Tan
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

This marks moved code still as blocks when their indentation level
changes uniformly.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  4 ++
 diff.c                         | 83 +++++++++++++++++++++++++++++++---
 diff.h                         |  2 +
 t/t4015-diff-whitespace.sh     | 54 ++++++++++++++++++++++
 4 files changed, 137 insertions(+), 6 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 7b2527b9a1..facdbc8f95 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -304,6 +304,10 @@ dimmed_zebra::
 	Ignore whitespace when comparing lines when performing the move
 	detection for --color-moved.  This ignores differences even if
 	one line has whitespace where the other line has none.
+--color-moved-[no-]ignore-space-prefix-delta::
+	Ignores whitespace when comparing lines when performing the move
+	detection for --color-moved. This ignores uniform differences
+	of white space at the beginning lines in moved blocks.
 
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
diff --git a/diff.c b/diff.c
index b5819dd538..1227a4d2a8 100644
--- a/diff.c
+++ b/diff.c
@@ -709,6 +709,31 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
+struct ws_delta {
+	char *string; /* The prefix delta, which is the same in the block */
+	int direction; /* adding or removing the line? */
+	int missmatch; /* in the remainder */
+};
+#define WS_DELTA_INIT { NULL, 0, 0 }
+
+static void compute_ws_delta(const struct emitted_diff_symbol *a,
+			     const struct emitted_diff_symbol *b,
+			     struct ws_delta *out)
+{
+	const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
+	const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
+	int d = longer->len - shorter->len;
+
+	out->missmatch = !memcmp(longer->line + d, shorter->line, shorter->len);
+	out->string = xmemdupz(longer->line, d);
+	out->direction = (a == longer);
+}
+
+static int compare_ws_delta(const struct ws_delta *a, const struct ws_delta *b)
+{
+	return a->direction == b->direction && !strcmp(a->string, b->string);
+}
+
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *entry,
 			   const void *entry_or_key,
@@ -720,6 +745,15 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
+	if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
+		/*
+		 * As there is not specific white space config given,
+		 * we'd need to check for a new block, so ignore all
+		 * white space. The setup of the white space
+		 * configuration for the next block is done else where
+		 */
+		flags |= XDF_IGNORE_WHITESPACE;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    flags);
@@ -772,7 +806,8 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
-					 int pmb_nr)
+					 int pmb_nr,
+					 struct ws_delta **wsd)
 {
 	int lp, rp;
 
@@ -788,6 +823,10 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 
 		if (lp < pmb_nr && rp > -1 && lp < rp) {
 			pmb[lp] = pmb[rp];
+			if (*wsd) {
+				free((*wsd)[lp].string);
+				(*wsd)[lp] = (*wsd)[rp];
+			}
 			pmb[rp] = NULL;
 			rp--;
 			lp++;
@@ -837,8 +876,11 @@ static void mark_color_as_moved(struct diff_options *o,
 {
 	struct moved_entry **pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
-	int n, flipped_block = 1, block_length = 0;
 
+	struct ws_delta *wsd = NULL; /* white space deltas between pmb */
+	int wsd_alloc = 0;
+
+	int n, flipped_block = 1, block_length = 0;
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
 		struct hashmap *hm = NULL;
@@ -881,14 +923,31 @@ static void mark_color_as_moved(struct diff_options *o,
 			struct moved_entry *p = pmb[i];
 			struct moved_entry *pnext = (p && p->next_line) ?
 					p->next_line : NULL;
-			if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
-				pmb[i] = p->next_line;
+
+			if (o->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES) {
+				struct ws_delta out = WS_DELTA_INIT;
+
+				if (pnext)
+					compute_ws_delta(l, pnext->es, &out);
+				if (pnext &&
+				    !hm->cmpfn(o, pnext, match, NULL) &&
+				    compare_ws_delta(&out, &wsd[i])) {
+					pmb[i] = p->next_line;
+					/* wsd[i] is the same */
+				} else {
+					pmb[i] = NULL;
+				}
+				free(out.string);
 			} else {
-				pmb[i] = NULL;
+				if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
+					pmb[i] = p->next_line;
+				} else {
+					pmb[i] = NULL;
+				}
 			}
 		}
 
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr, &wsd);
 
 		if (pmb_nr == 0) {
 			/*
@@ -897,6 +956,10 @@ static void mark_color_as_moved(struct diff_options *o,
 			 */
 			for (; match; match = hashmap_get_next(hm, match)) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+				if (o->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES) {
+					ALLOC_GROW(wsd, pmb_nr + 1, wsd_alloc);
+					compute_ws_delta(l, match->es, &wsd[pmb_nr]);
+				}
 				pmb[pmb_nr++] = match;
 			}
 
@@ -914,6 +977,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	adjust_last_block(o, n, block_length);
 
 	free(pmb);
+	free(wsd);
 }
 
 #define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
@@ -4647,12 +4711,16 @@ int diff_opt_parse(struct diff_options *options,
 		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
 		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-prefix-delta"))
+		options->color_moved_ws_handling &= ~COLOR_MOVED_DELTA_WHITESPACES;
 	else if (!strcmp(arg, "--color-moved-ignore-all-space"))
 		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
 	else if (!strcmp(arg, "--color-moved-ignore-space-change"))
 		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
 		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-ignore-space-prefix-delta"))
+		options->color_moved_ws_handling |= COLOR_MOVED_DELTA_WHITESPACES;
 	else if (!strcmp(arg, "--indent-heuristic"))
 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
 	else if (!strcmp(arg, "--no-indent-heuristic"))
@@ -5558,6 +5626,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
+			if (o->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
+				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
+
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
diff --git a/diff.h b/diff.h
index de5dc68005..b00ea76c08 100644
--- a/diff.h
+++ b/diff.h
@@ -214,6 +214,8 @@ struct diff_options {
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
+	#define COLOR_MOVED_DELTA_WHITESPACES	(1 << 22)
 	int color_moved_ws_handling;
 };
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 751fc478dd..3e94cf051d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1847,4 +1847,58 @@ test_expect_success 'only move detection ignores white spaces' '
 	test_cmp expected actual
 '
 
+test_expect_success 'compare whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+	QIndented
+	QText across
+	Qthree lines
+	QBut! <- this stands out
+	Qthis one
+	QQline did
+	Qnot adjust
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	q_to_tab <<-\EOF >text.txt &&
+	QQIndented
+	QQText across
+	QQthree lines
+	QQQBut! <- this stands out
+	this one
+	Qline did
+	not adjust
+	EOF
+
+	git diff --color --color-moved-ignore-space-prefix-delta |
+		grep -v "index" |
+		test_decode_color >actual &&
+
+	q_to_tab <<-\EOF >expected &&
+		<BOLD>diff --git a/text.txt b/text.txt<RESET>
+		<BOLD>--- a/text.txt<RESET>
+		<BOLD>+++ b/text.txt<RESET>
+		<CYAN>@@ -1,7 +1,7 @@<RESET>
+		<RED>-QIndented<RESET>
+		<RED>-QText across<RESET>
+		<RED>-Qthree lines<RESET>
+		<RED>-QBut! <- this stands out<RESET>
+		<RED>-Qthis one<RESET>
+		<RED>-QQline did<RESET>
+		<RED>-Qnot adjust<RESET>
+		<GREEN>+<RESET>QQ<GREEN>Indented<RESET>
+		<GREEN>+<RESET>QQ<GREEN>Text across<RESET>
+		<GREEN>+<RESET>QQ<GREEN>three lines<RESET>
+		<GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
+		<GREEN>+<RESET><GREEN>this one<RESET>
+		<GREEN>+<RESET>Q<GREEN>line did<RESET>
+		<GREEN>+<RESET><GREEN>not adjust<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-24 21:03 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
@ 2018-04-24 22:35   ` Jonathan Tan
       [not found]     ` <CAGZ79kbGkHFSS9K8KKTwNikx3Tw+m+RMLY3RAf8SW_iK9a2OJQ@mail.gmail.com>
  0 siblings, 1 reply; 30+ messages in thread
From: Jonathan Tan @ 2018-04-24 22:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, simon, avarab, jacob.keller

On Tue, 24 Apr 2018 14:03:30 -0700
Stefan Beller <sbeller@google.com> wrote:

> +--color-moved-[no-]ignore-space-prefix-delta::
> +	Ignores whitespace when comparing lines when performing the move
> +	detection for --color-moved. This ignores uniform differences
> +	of white space at the beginning lines in moved blocks.

Setting this option means that moved lines may be indented or dedented,
and if they have been indented or dedented by the same amount, they are
still considered to be the same block. Maybe call this
--color-moved-allow-indentation-change.

> +struct ws_delta {
> +	char *string; /* The prefix delta, which is the same in the block */

Probably better described as "the difference between the '-' line and
the '+' line". This may be the empty string if there is no difference.

> +	int direction; /* adding or removing the line? */

What is the value when "added" and what when "removed"? Also, it is not
truly "added" or "removed", so a better way might be: 1 if the '-' line
is longer than the '+' line, and 0 otherwise. (And make sure that the
documentation is correct with respect to equal lines.)

> +	int missmatch; /* in the remainder */

s/missmatch/mismatch/
Also, what is this used for?

> +	if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
> +		/*
> +		 * As there is not specific white space config given,
> +		 * we'd need to check for a new block, so ignore all
> +		 * white space. The setup of the white space
> +		 * configuration for the next block is done else where
> +		 */
> +		flags |= XDF_IGNORE_WHITESPACE;
> +
>  	return !xdiff_compare_lines(a->es->line, a->es->len,
>  				    b->es->line, b->es->len,
>  				    flags);

I wrote in [1]:

  I think we should just prohibit combining this with any of the
  whitespace ignoring flags except for the space-at-eol one. They seem
  to contradict anyway.

To elaborate, adding a specific flag that may interfere with other
user-provided flags sounds like we're unnecessarily adding combinations
that we must keep track of, and that it's both logical (from a user's
point of view) and clearer (as for the code) to just forbid such
combinations.

[1] https://public-inbox.org/git/20180402174118.d204ec0d4b9d2fa7ebd77739@google.com/

>  	struct moved_entry **pmb = NULL; /* potentially moved blocks */
>  	int pmb_nr = 0, pmb_alloc = 0;
> -	int n, flipped_block = 1, block_length = 0;
>  
> +	struct ws_delta *wsd = NULL; /* white space deltas between pmb */
> +	int wsd_alloc = 0;
> +
> +	int n, flipped_block = 1, block_length = 0;

It seems like pmb and wsd are parallel arrays - could each wsd be
embedded into the corresponding entry of pmb instead?

> --- a/diff.h
> +++ b/diff.h
> @@ -214,6 +214,8 @@ struct diff_options {
>  	} color_moved;
>  	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>  	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
> +	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
> +	#define COLOR_MOVED_DELTA_WHITESPACES	(1 << 22)
>  	int color_moved_ws_handling;
>  };

Setting of DELTA_WHITESPACES should be a separate field, not as part of
ws_handling. It's fine for the ws_handling to be a bitset, since that's
how it's passed to xdiff_compare_lines(), but we don't need to do the
same for DELTA_WHITESPACES.

> +test_expect_success 'compare whitespace delta across moved blocks' '
> +
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >text.txt &&
> +	QIndented
> +	QText across
> +	Qthree lines
> +	QBut! <- this stands out
> +	Qthis one
> +	QQline did
> +	Qnot adjust
> +	EOF
> +
> +	git add text.txt &&
> +	git commit -m "add text.txt" &&
> +
> +	q_to_tab <<-\EOF >text.txt &&
> +	QQIndented
> +	QQText across
> +	QQthree lines
> +	QQQBut! <- this stands out
> +	this one
> +	Qline did
> +	not adjust
> +	EOF
> +
> +	git diff --color --color-moved-ignore-space-prefix-delta |
> +		grep -v "index" |
> +		test_decode_color >actual &&
> +
> +	q_to_tab <<-\EOF >expected &&
> +		<BOLD>diff --git a/text.txt b/text.txt<RESET>
> +		<BOLD>--- a/text.txt<RESET>
> +		<BOLD>+++ b/text.txt<RESET>
> +		<CYAN>@@ -1,7 +1,7 @@<RESET>
> +		<RED>-QIndented<RESET>
> +		<RED>-QText across<RESET>
> +		<RED>-Qthree lines<RESET>
> +		<RED>-QBut! <- this stands out<RESET>
> +		<RED>-Qthis one<RESET>
> +		<RED>-QQline did<RESET>
> +		<RED>-Qnot adjust<RESET>
> +		<GREEN>+<RESET>QQ<GREEN>Indented<RESET>
> +		<GREEN>+<RESET>QQ<GREEN>Text across<RESET>
> +		<GREEN>+<RESET>QQ<GREEN>three lines<RESET>
> +		<GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
> +		<GREEN>+<RESET><GREEN>this one<RESET>
> +		<GREEN>+<RESET>Q<GREEN>line did<RESET>
> +		<GREEN>+<RESET><GREEN>not adjust<RESET>
> +	EOF
> +
> +	test_cmp expected actual
> +'

I would have expected every line except the "this stands out" line to be
colored differently than the usual RED and GREEN. Is this test output
expected?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
       [not found]     ` <CAGZ79kbGkHFSS9K8KKTwNikx3Tw+m+RMLY3RAf8SW_iK9a2OJQ@mail.gmail.com>
@ 2018-04-24 23:23       ` Stefan Beller
  2018-04-25  0:11         ` Jonathan Tan
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2018-04-24 23:23 UTC (permalink / raw)
  To: Jonathan Tan, git, Simon Ruderich,
	Ævar Arnfjörð Bjarmason, Jacob Keller

Replied to Jonathan only instead of all. My reply is below:

On Tue, Apr 24, 2018 at 3:55 PM, Stefan Beller <sbeller@google.com> wrote:
> On Tue, Apr 24, 2018 at 3:35 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
>> On Tue, 24 Apr 2018 14:03:30 -0700
>> Stefan Beller <sbeller@google.com> wrote:
>>
>>> +--color-moved-[no-]ignore-space-prefix-delta::
>>> +     Ignores whitespace when comparing lines when performing the move
>>> +     detection for --color-moved. This ignores uniform differences
>>> +     of white space at the beginning lines in moved blocks.
>>
>> Setting this option means that moved lines may be indented or dedented,
>> and if they have been indented or dedented by the same amount, they are
>> still considered to be the same block. Maybe call this
>> --color-moved-allow-indentation-change.
>
> ok, sounds good as well. I tried coming up with a name that refers to
> the block check as that is the important part.
>
>>> +struct ws_delta {
>>> +     char *string; /* The prefix delta, which is the same in the block */
>>
>> Probably better described as "the difference between the '-' line and
>> the '+' line". This may be the empty string if there is no difference.
>
> Makes sense.
>
>>
>>> +     int direction; /* adding or removing the line? */
>>
>> What is the value when "added" and what when "removed"? Also, it is not
>> truly "added" or "removed", so a better way might be: 1 if the '-' line
>> is longer than the '+' line, and 0 otherwise. (And make sure that the
>> documentation is correct with respect to equal lines.)
>>
>>> +     int missmatch; /* in the remainder */
>>
>> s/missmatch/mismatch/
>> Also, what is this used for?
>
> The mismatch should be used for (thanks for catching!)
> checking if the remainder of a line is the same, although a boolean
> may be not the correct choice. We know that the two strings
> passed into compute_ws_delta come from a complete white space
> agnostic comparison, so consider:
>
> + SP SP more TAB more
> + SP SP text TAB text
>
> - SP more TAB more
> - SP text TAB text
>
> which would mark this as a moved block. This is the feature
> working as intended, but what about
>
> + SP SP more TAB more
> + SP SP text TAB text
>
> - SP more SP more
> - SP text TAB text
>
> Note how the length of the strings is the same, hence the current
> code of
>
>     compute_ws_delta(...) {
>         int d = longer->len - shorter->len;
>         out->string = xmemdupz(longer->line, d);
>     }
>
> would work fine and not notice the "change in the remainder".
> That ought to be caught by the mismatch variable, that
> is assigned, but not used.
>
> The compare_ws_delta(a, b) needs to be extended to
>
>   !a->mismatch && !b->mismatch && existing_condition
>
> Ideally the example from above would be different depending
> on whether the other white space flags are given or not.
>
>>> +     if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
>>> +             /*
>>> +              * As there is not specific white space config given,
>>> +              * we'd need to check for a new block, so ignore all
>>> +              * white space. The setup of the white space
>>> +              * configuration for the next block is done else where
>>> +              */
>>> +             flags |= XDF_IGNORE_WHITESPACE;
>>> +
>>>       return !xdiff_compare_lines(a->es->line, a->es->len,
>>>                                   b->es->line, b->es->len,
>>>                                   flags);
>>
>> I wrote in [1]:
>>
>>   I think we should just prohibit combining this with any of the
>>   whitespace ignoring flags except for the space-at-eol one. They seem
>>   to contradict anyway.
>
> As outlined above, I think there are corner cases in which they do not
> contradict. So I think the COLOR_MOVED_DELTA_WHITESPACES
> will go into its own variable, and then we can solve the corner cases
> correctly.
>
>> To elaborate, adding a specific flag that may interfere with other
>> user-provided flags sounds like we're unnecessarily adding combinations
>> that we must keep track of, and that it's both logical (from a user's
>> point of view) and clearer (as for the code) to just forbid such
>> combinations.
>
> Yes, I think you mentioned this before. Thanks for reminding!
>
>>> +     struct ws_delta *wsd = NULL; /* white space deltas between pmb */
>>> +     int wsd_alloc = 0;
>>> +
>>> +     int n, flipped_block = 1, block_length = 0;
>>
>> It seems like pmb and wsd are parallel arrays - could each wsd be
>> embedded into the corresponding entry of pmb instead?
>
> I'll explore that. It sounds like a good idea for code hygiene.
> Although if you do not intend to use this feature, then keeping it separate
> would allow for a smaller footprint in memory.
>
>>
>>> --- a/diff.h
>>> +++ b/diff.h
>>> @@ -214,6 +214,8 @@ struct diff_options {
>>>       } color_moved;
>>>       #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>>>       #define COLOR_MOVED_MIN_ALNUM_COUNT 20
>>> +     /* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
>>> +     #define COLOR_MOVED_DELTA_WHITESPACES   (1 << 22)
>>>       int color_moved_ws_handling;
>>>  };
>>
>> Setting of DELTA_WHITESPACES should be a separate field, not as part of
>> ws_handling. It's fine for the ws_handling to be a bitset, since that's
>> how it's passed to xdiff_compare_lines(), but we don't need to do the
>> same for DELTA_WHITESPACES.
>
> You are correct. Thanks for your patience in this series!
>
>> +     git diff --color --color-moved-ignore-space-prefix-delta |
>> +             grep -v "index" |
>> +             test_decode_color >actual &&
>
>>> +     q_to_tab <<-\EOF >expected &&
>>> +             <BOLD>diff --git a/text.txt b/text.txt<RESET>
>>> +             <BOLD>--- a/text.txt<RESET>
>>> +             <BOLD>+++ b/text.txt<RESET>
>>> +             <CYAN>@@ -1,7 +1,7 @@<RESET>
>>> +             <RED>-QIndented<RESET>
>>> +             <RED>-QText across<RESET>
>>> +             <RED>-Qthree lines<RESET>
>>> +             <RED>-QBut! <- this stands out<RESET>
>>> +             <RED>-Qthis one<RESET>
>>> +             <RED>-QQline did<RESET>
>>> +             <RED>-Qnot adjust<RESET>
>>> +             <GREEN>+<RESET>QQ<GREEN>Indented<RESET>
>>> +             <GREEN>+<RESET>QQ<GREEN>Text across<RESET>
>>> +             <GREEN>+<RESET>QQ<GREEN>three lines<RESET>
>>> +             <GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
>>> +             <GREEN>+<RESET><GREEN>this one<RESET>
>>> +             <GREEN>+<RESET>Q<GREEN>line did<RESET>
>>> +             <GREEN>+<RESET><GREEN>not adjust<RESET>
>>> +     EOF
>>> +
>>> +     test_cmp expected actual
>>> +'
>>
>> I would have expected every line except the "this stands out" line to be
>> colored differently than the usual RED and GREEN. Is this test output
>> expected?
>
> It is wrong indeed. I blindly copied the actual file once interactive testing
> confirmed it worked.
>
> The command is missing a --color-moved, as the --color-moved-whitespace-settings
> do not imply --color-moved, yet(?)
>
> Thanks,
> Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-24 23:23       ` Stefan Beller
@ 2018-04-25  0:11         ` Jonathan Tan
  0 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2018-04-25  0:11 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Simon Ruderich, Ævar Arnfjörð Bjarmason,
	Jacob Keller

On Tue, 24 Apr 2018 16:23:28 -0700
Stefan Beller <sbeller@google.com> wrote:

> >> s/missmatch/mismatch/
> >> Also, what is this used for?
> >
> > The mismatch should be used for (thanks for catching!)
> > checking if the remainder of a line is the same, although a boolean
> > may be not the correct choice. We know that the two strings
> > passed into compute_ws_delta come from a complete white space
> > agnostic comparison, so consider:
> >
> > + SP SP more TAB more
> > + SP SP text TAB text
> >
> > - SP more TAB more
> > - SP text TAB text
> >
> > which would mark this as a moved block. This is the feature
> > working as intended, but what about
> >
> > + SP SP more TAB more
> > + SP SP text TAB text
> >
> > - SP more SP more
> > - SP text TAB text
> >
> > Note how the length of the strings is the same, hence the current
> > code of
> >
> >     compute_ws_delta(...) {
> >         int d = longer->len - shorter->len;
> >         out->string = xmemdupz(longer->line, d);
> >     }
> >
> > would work fine and not notice the "change in the remainder".
> > That ought to be caught by the mismatch variable, that
> > is assigned, but not used.
> >
> > The compare_ws_delta(a, b) needs to be extended to
> >
> >   !a->mismatch && !b->mismatch && existing_condition
> >
> > Ideally the example from above would be different depending
> > on whether the other white space flags are given or not.

Thanks - this gives me food for thought.

I'm starting to think that it is impossible to avoid creating our own
string comparison function that:
 - seeks to the first non-whitespace character in both strings
 - checks that both strings, from that first non-whitespace characters,
   are equal for some definition of equal (whether through strcmp or
   xdiff_compare_lines)
 - walks backwards from that first non-whitespace characters to look for
   the first non-matching whitespace character between the 2 strings

The existing diff whitespace modes (to be passed to xdiff_compare_lines)
do not give the exact result we want. For example, if
XDF_IGNORE_WHITESPACE is used (as is in this patch), lines like "a b"
and "ab " would match even though they shouldn't.

This comparison function can be used both in moved_entry_cmp() and when
constructing the ws_delta (in which case it should be made to output
whatever information is needed as out parameters or similar).

> >>> +     if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
> >>> +             /*
> >>> +              * As there is not specific white space config given,
> >>> +              * we'd need to check for a new block, so ignore all
> >>> +              * white space. The setup of the white space
> >>> +              * configuration for the next block is done else where
> >>> +              */
> >>> +             flags |= XDF_IGNORE_WHITESPACE;
> >>> +
> >>>       return !xdiff_compare_lines(a->es->line, a->es->len,
> >>>                                   b->es->line, b->es->len,
> >>>                                   flags);

We can't add XDF_IGNORE_WHITESPACE here - as far as I can tell, this
means that more lines will be treated as moved than the user wants (if
the user did not set --color-moved-ignore-all-space).

> >> It seems like pmb and wsd are parallel arrays - could each wsd be
> >> embedded into the corresponding entry of pmb instead?
> >
> > I'll explore that. It sounds like a good idea for code hygiene.
> > Although if you do not intend to use this feature, then keeping it separate
> > would allow for a smaller footprint in memory.

If you're worried about memory, wsd can be embedded as a pointer.

> > The command is missing a --color-moved, as the --color-moved-whitespace-settings
> > do not imply --color-moved, yet(?)

Maybe one should imply the other (or at least a warning).

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-04-25  0:11 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
2018-04-02 22:48 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
2018-04-06 20:04   ` Jeff King
2018-04-06 20:41     ` Stefan Beller
2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
2018-04-02 23:51   ` Jonathan Tan
2018-04-03 18:59     ` Stefan Beller
2018-04-06 21:28     ` Stefan Beller
2018-04-06 22:27       ` Jonathan Tan
2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
2018-04-03 19:49     ` Stefan Beller
2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
2018-04-03 21:05         ` [PATCH] diff: add a blocks mode for moved code detection Stefan Beller
2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
2018-04-03  0:31   ` Jonathan Tan
2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-03  0:41   ` Jonathan Tan
2018-04-03 19:22     ` Stefan Beller
2018-04-03 20:38       ` Jonathan Tan
2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
2018-04-03  0:03   ` Jacob Keller
2018-04-03 19:00     ` Stefan Beller
2018-04-03 19:55       ` Jacob Keller
  -- strict thread matches above, loose matches on Subject: below --
2018-04-24 21:03 [PATCHv2 " Stefan Beller
2018-04-24 21:03 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-24 22:35   ` Jonathan Tan
     [not found]     ` <CAGZ79kbGkHFSS9K8KKTwNikx3Tw+m+RMLY3RAf8SW_iK9a2OJQ@mail.gmail.com>
2018-04-24 23:23       ` Stefan Beller
2018-04-25  0:11         ` Jonathan Tan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).