* [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling
@ 2017-08-11 22:49 Jonathan Tan
2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
` (6 more replies)
0 siblings, 7 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-11 22:49 UTC (permalink / raw)
To: git; +Cc: Jonathan Tan, sbeller
Note that these patches are for "next", depending on the "--color-moved"
patches.
While working on something else [1], I noticed some irregularities with
how "diff --color-moved" treats the minimum block size, occasionally
coloring blocks smaller than that as moved.
I've marked this as RFC because now I'm not sure if minimum number of
lines is the best way to handle the minimum block size. In particular,
an existing test (in t4015) assumes that 2-line blocks can be
legitimately colored as moved (and notice that it is now marked
test_expect_failure in patch 3), and in my work, I would also like the
rules to be relaxed. What do you think of also changing the rule to,
say, "minimum 10 non-whitespace characters"?
[1] https://public-inbox.org/git/cover.1502483486.git.jonathantanmy@google.com/
Jonathan Tan (3):
diff: avoid redundantly clearing a flag
diff: respect MIN_BLOCK_LENGTH for last block
diff: check MIN_BLOCK_LENGTH at start of new block
diff.c | 35 +++++++++++++++-----
t/t4015-diff-whitespace.sh | 80 +++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 106 insertions(+), 9 deletions(-)
--
2.14.0.434.g98096fd7a8-goog
^ permalink raw reply [flat|nested] 30+ messages in thread* [RFC PATCH 1/3] diff: avoid redundantly clearing a flag 2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan @ 2017-08-11 22:49 ` Jonathan Tan 2017-08-14 17:13 ` Stefan Beller 2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan ` (5 subsequent siblings) 6 siblings, 1 reply; 30+ messages in thread From: Jonathan Tan @ 2017-08-11 22:49 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in mark_color_as_moved(), so it is redundant to clear it for the current line. Therefore, clear it only for previous lines. This makes a refactoring in a subsequent patch easier. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index f84346b47..4965ffbc4 100644 --- a/diff.c +++ b/diff.c @@ -895,7 +895,7 @@ static void mark_color_as_moved(struct diff_options *o, if (!match) { if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && o->color_moved != COLOR_MOVED_PLAIN) { - for (i = 0; i < block_length + 1; i++) { + for (i = 1; i < block_length + 1; i++) { l = &o->emitted_symbols->buf[n - i]; l->flags &= ~DIFF_SYMBOL_MOVED_LINE; } -- 2.14.0.434.g98096fd7a8-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 1/3] diff: avoid redundantly clearing a flag 2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan @ 2017-08-14 17:13 ` Stefan Beller 0 siblings, 0 replies; 30+ messages in thread From: Stefan Beller @ 2017-08-14 17:13 UTC (permalink / raw) To: Jonathan Tan; +Cc: git@vger.kernel.org On Fri, Aug 11, 2017 at 3:49 PM, Jonathan Tan <jonathantanmy@google.com> wrote: > No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in > mark_color_as_moved(), so it is redundant to clear it for the current > line. Therefore, clear it only for previous lines. Oh that part. I remember debating with myself if I rather want to have the upper bound adjusted by one less (block_length instead of 'block_length + 1'), and then add a constant to 'buf[n - i];' The patch as implemented is fine, too. > This makes a refactoring in a subsequent patch easier. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > diff.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/diff.c b/diff.c > index f84346b47..4965ffbc4 100644 > --- a/diff.c > +++ b/diff.c > @@ -895,7 +895,7 @@ static void mark_color_as_moved(struct diff_options *o, > if (!match) { > if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && > o->color_moved != COLOR_MOVED_PLAIN) { > - for (i = 0; i < block_length + 1; i++) { > + for (i = 1; i < block_length + 1; i++) { > l = &o->emitted_symbols->buf[n - i]; > l->flags &= ~DIFF_SYMBOL_MOVED_LINE; > } > -- > 2.14.0.434.g98096fd7a8-goog > ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block 2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan 2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan @ 2017-08-11 22:49 ` Jonathan Tan 2017-08-14 17:17 ` Stefan Beller 2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan ` (4 subsequent siblings) 6 siblings, 1 reply; 30+ messages in thread From: Jonathan Tan @ 2017-08-11 22:49 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line that does not belong to the current block. In particular, this means that MIN_BLOCK_LENGTH is not checked after all lines are encountered. Perform that check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 29 ++++++++++++++++++++++------- t/t4015-diff-whitespace.sh | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 7 deletions(-) diff --git a/diff.c b/diff.c index 4965ffbc4..95620b130 100644 --- a/diff.c +++ b/diff.c @@ -858,6 +858,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb, return rp + 1; } +/* + * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing. + * + * Otherwise, if the last block has fewer lines than + * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in + * that block. + * + * The last block consists of the (n - block_length)'th line up to but not + * including the nth line. + */ +static void adjust_last_block(struct diff_options *o, int n, int block_length) +{ + int i; + if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH || + o->color_moved == COLOR_MOVED_PLAIN) + return; + for (i = 1; i < block_length + 1; i++) + o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; +} + /* Find blocks of moved code, delegate actual coloring decision to helper */ static void mark_color_as_moved(struct diff_options *o, struct hashmap *add_lines, @@ -893,13 +913,7 @@ static void mark_color_as_moved(struct diff_options *o, } if (!match) { - if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && - o->color_moved != COLOR_MOVED_PLAIN) { - for (i = 1; i < block_length + 1; i++) { - l = &o->emitted_symbols->buf[n - i]; - l->flags &= ~DIFF_SYMBOL_MOVED_LINE; - } - } + adjust_last_block(o, n, block_length); pmb_nr = 0; block_length = 0; continue; @@ -941,6 +955,7 @@ static void mark_color_as_moved(struct diff_options *o, if (flipped_block) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } + adjust_last_block(o, n, block_length); free(pmb); } diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index c3b697411..6f7758e5c 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1382,6 +1382,41 @@ EOF test_cmp expected actual ' +test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' ' + git reset --hard && + >bar && + cat <<-\EOF >foo && + irrelevant_line + line1 + EOF + git add foo bar && + git commit -m x && + + cat <<-\EOF >bar && + line1 + EOF + cat <<-\EOF >foo && + irrelevant_line + EOF + + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1 @@<RESET> + <GREEN>+<RESET><GREEN>line1<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,2 +1 @@<RESET> + irrelevant_line<RESET> + <RED>-line1<RESET> + EOF + + test_cmp expected actual +' + test_expect_success 'move detection with submodules' ' test_create_repo bananas && echo ripe >bananas/recipe && -- 2.14.0.434.g98096fd7a8-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block 2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan @ 2017-08-14 17:17 ` Stefan Beller 0 siblings, 0 replies; 30+ messages in thread From: Stefan Beller @ 2017-08-14 17:17 UTC (permalink / raw) To: Jonathan Tan; +Cc: git@vger.kernel.org On Fri, Aug 11, 2017 at 3:49 PM, Jonathan Tan <jonathantanmy@google.com> wrote: > Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line > that does not belong to the current block. In particular, this means > that MIN_BLOCK_LENGTH is not checked after all lines are encountered. > > Perform that check. Thanks for spotting! This fix looks straightforward correct. (Also thanks for factoring out the adjustment, I am tempted to start a bike shedding discussion about its name, though. :P) > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > diff.c | 29 ++++++++++++++++++++++------- > t/t4015-diff-whitespace.sh | 35 +++++++++++++++++++++++++++++++++++ > 2 files changed, 57 insertions(+), 7 deletions(-) > > diff --git a/diff.c b/diff.c > index 4965ffbc4..95620b130 100644 > --- a/diff.c > +++ b/diff.c > @@ -858,6 +858,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb, > return rp + 1; > } > > +/* > + * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing. > + * > + * Otherwise, if the last block has fewer lines than > + * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in > + * that block. > + * > + * The last block consists of the (n - block_length)'th line up to but not > + * including the nth line. > + */ > +static void adjust_last_block(struct diff_options *o, int n, int block_length) > +{ > + int i; > + if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH || > + o->color_moved == COLOR_MOVED_PLAIN) > + return; > + for (i = 1; i < block_length + 1; i++) > + o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; > +} > + > /* Find blocks of moved code, delegate actual coloring decision to helper */ > static void mark_color_as_moved(struct diff_options *o, > struct hashmap *add_lines, > @@ -893,13 +913,7 @@ static void mark_color_as_moved(struct diff_options *o, > } > > if (!match) { > - if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && > - o->color_moved != COLOR_MOVED_PLAIN) { > - for (i = 1; i < block_length + 1; i++) { > - l = &o->emitted_symbols->buf[n - i]; > - l->flags &= ~DIFF_SYMBOL_MOVED_LINE; > - } > - } > + adjust_last_block(o, n, block_length); > pmb_nr = 0; > block_length = 0; > continue; > @@ -941,6 +955,7 @@ static void mark_color_as_moved(struct diff_options *o, > if (flipped_block) > l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; > } > + adjust_last_block(o, n, block_length); > > free(pmb); > } > diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh > index c3b697411..6f7758e5c 100755 > --- a/t/t4015-diff-whitespace.sh > +++ b/t/t4015-diff-whitespace.sh > @@ -1382,6 +1382,41 @@ EOF > test_cmp expected actual > ' > > +test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' ' > + git reset --hard && > + >bar && > + cat <<-\EOF >foo && > + irrelevant_line > + line1 > + EOF > + git add foo bar && > + git commit -m x && > + > + cat <<-\EOF >bar && > + line1 > + EOF > + cat <<-\EOF >foo && > + irrelevant_line > + EOF > + > + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && > + cat >expected <<-\EOF && > + <BOLD>diff --git a/bar b/bar<RESET> > + <BOLD>--- a/bar<RESET> > + <BOLD>+++ b/bar<RESET> > + <CYAN>@@ -0,0 +1 @@<RESET> > + <GREEN>+<RESET><GREEN>line1<RESET> > + <BOLD>diff --git a/foo b/foo<RESET> > + <BOLD>--- a/foo<RESET> > + <BOLD>+++ b/foo<RESET> > + <CYAN>@@ -1,2 +1 @@<RESET> > + irrelevant_line<RESET> > + <RED>-line1<RESET> > + EOF > + > + test_cmp expected actual > +' > + > test_expect_success 'move detection with submodules' ' > test_create_repo bananas && > echo ripe >bananas/recipe && > -- > 2.14.0.434.g98096fd7a8-goog > ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block 2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan 2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan 2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan @ 2017-08-11 22:49 ` Jonathan Tan 2017-08-14 17:22 ` Stefan Beller 2017-08-12 0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano ` (3 subsequent siblings) 6 siblings, 1 reply; 30+ messages in thread From: Jonathan Tan @ 2017-08-11 22:49 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller When noticing that the current line is not the continuation of the current block, but the start of a new one, mark_color_as_moved() does not check the length of the current block. Perform that check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 6 +++++- t/t4015-diff-whitespace.sh | 45 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 49 insertions(+), 2 deletions(-) diff --git a/diff.c b/diff.c index 95620b130..04fc0d65d 100644 --- a/diff.c +++ b/diff.c @@ -920,7 +920,6 @@ static void mark_color_as_moved(struct diff_options *o, } l->flags |= DIFF_SYMBOL_MOVED_LINE; - block_length++; if (o->color_moved == COLOR_MOVED_PLAIN) continue; @@ -950,8 +949,13 @@ static void mark_color_as_moved(struct diff_options *o, } flipped_block = (flipped_block + 1) % 2; + + adjust_last_block(o, n, block_length); + block_length = 0; } + block_length++; + if (flipped_block) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index 6f7758e5c..401dc8f08 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1015,7 +1015,7 @@ test_expect_success 'detect moved code, complete file' ' test_cmp expected actual ' -test_expect_success 'detect malicious moved code, inside file' ' +test_expect_failure 'detect malicious moved code, inside file' ' test_config color.diff.oldMoved "normal red" && test_config color.diff.newMoved "normal green" && test_config color.diff.oldMovedAlternative "blue" && @@ -1417,6 +1417,49 @@ test_expect_success '--color-moved block at end of diff output respects MIN_BLOC test_cmp expected actual ' +test_expect_success '--color-moved treats adjacent blocks as separate for MIN_BLOCK_LENGTH' ' + git reset --hard && + cat <<-\EOF >foo && + line1 + irrelevant_line + line2 + line3 + EOF + >bar && + git add foo bar && + git commit -m x && + + cat <<-\EOF >foo && + irrelevant_line + EOF + cat <<-\EOF >bar && + line2 + line3 + line1 + EOF + + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1,3 @@<RESET> + <GREEN>+<RESET><GREEN>line2<RESET> + <GREEN>+<RESET><GREEN>line3<RESET> + <GREEN>+<RESET><GREEN>line1<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,4 +1 @@<RESET> + <RED>-line1<RESET> + irrelevant_line<RESET> + <RED>-line2<RESET> + <RED>-line3<RESET> + EOF + + test_cmp expected actual +' + test_expect_success 'move detection with submodules' ' test_create_repo bananas && echo ripe >bananas/recipe && -- 2.14.0.434.g98096fd7a8-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block 2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan @ 2017-08-14 17:22 ` Stefan Beller 0 siblings, 0 replies; 30+ messages in thread From: Stefan Beller @ 2017-08-14 17:22 UTC (permalink / raw) To: Jonathan Tan; +Cc: git@vger.kernel.org On Fri, Aug 11, 2017 at 3:49 PM, Jonathan Tan <jonathantanmy@google.com> wrote: > When noticing that the current line is not the continuation of the > current block, but the start of a new one, mark_color_as_moved() does > not check the length of the current block. Perform that check. As far as I remember that behavior is intentional, as indicated by the succeeding test. The whole MIN_BLOCK_LENGTH thing is a hack IMHO as we did not have a better heuristic for suppressing uninteresting "moved" lines such as closing braces in C. The information that a thing is moved in between two blocks is more valuable than pointing out it is just 'new' or 'old'. As this is changing behavior in a way that seems controversial, can you give your motivation/example for why this behavior is better? (Do we want to put it into an option/mode?) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling 2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan ` (2 preceding siblings ...) 2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan @ 2017-08-12 0:39 ` Junio C Hamano 2017-08-14 17:29 ` Stefan Beller 2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan ` (2 subsequent siblings) 6 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2017-08-12 0:39 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, sbeller Jonathan Tan <jonathantanmy@google.com> writes: > Note that these patches are for "next", depending on the "--color-moved" > patches. As we have finished Git 2.14 cycle, in preparation for the next one, the 'next' branch will be rewound and rebuilt early next week. I do not mind tentatively ejecting some topics that needs fix-ups out of 'next' to give them a clean restart. My preference however is to keep sb/diff-color-move topic as-is without replacing and fixing it with incremental updates like these patches. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling 2017-08-12 0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano @ 2017-08-14 17:29 ` Stefan Beller 2017-08-14 19:37 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Stefan Beller @ 2017-08-14 17:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Tan, git@vger.kernel.org On Fri, Aug 11, 2017 at 5:39 PM, Junio C Hamano <gitster@pobox.com> wrote: > Jonathan Tan <jonathantanmy@google.com> writes: > >> Note that these patches are for "next", depending on the "--color-moved" >> patches. > > As we have finished Git 2.14 cycle, in preparation for the next one, > the 'next' branch will be rewound and rebuilt early next week. I do > not mind tentatively ejecting some topics that needs fix-ups out of > 'next' to give them a clean restart. These would cleanly apply on sb/diff-color-move. > > My preference however is to keep sb/diff-color-move topic as-is > without replacing and fixing it with incremental updates like these > patches. I would have hoped to not need to reroll that topic. Though I do find patches 1&2 valuable either on top or squashed into "[PATCH] diff.c: color moved lines differently" and "[PATCH] diff.c: color moved lines differently, plain mode" respectively. So I'd ask to pick at least patches 1&2 on top of that series, please? (I am missing the context for *why* you preference is to not do exactly this). Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling 2017-08-14 17:29 ` Stefan Beller @ 2017-08-14 19:37 ` Junio C Hamano 2017-08-14 19:51 ` Stefan Beller 0 siblings, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2017-08-14 19:37 UTC (permalink / raw) To: Stefan Beller; +Cc: Jonathan Tan, git@vger.kernel.org Stefan Beller <sbeller@google.com> writes: > On Fri, Aug 11, 2017 at 5:39 PM, Junio C Hamano <gitster@pobox.com> wrote: > ... >> My preference however is to keep sb/diff-color-move topic as-is >> without replacing and fixing it with incremental updates like these >> patches. > > I would have hoped to not need to reroll that topic. > Though I do find patches 1&2 valuable either on top or squashed > into "[PATCH] diff.c: color moved lines differently" and > "[PATCH] diff.c: color moved lines differently, plain mode" > respectively. > > So I'd ask to pick at least patches 1&2 on top of that series, please? Yeah, that is exactly what I did before reading this message but after reading your comments on the patches ;-) > (I am missing the context for *why* you preference is to not do > exactly this). I see what I wrote can be misread, especially due to its lack of ",instead", that I want to keep the broken one as-is, with neither reroll nor fixup. That is not what I meant. - If you choose to squash so that the resulting history after the series graduates to 'master' will be simpler to read (due to lack of "oops, that was a mistake"), I do not mind a reroll. - On the other hand, as the topic has been in 'next' for some time and presumably people tried it in their real daily work when needed, keeping what is queued as-is has a value---we have a fixed reference point that we can go back to to compare the code with and without the fix. I do not have a strong preference, but if I were asked to choose, I'd choose the latter. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling 2017-08-14 19:37 ` Junio C Hamano @ 2017-08-14 19:51 ` Stefan Beller 2017-08-15 17:07 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Stefan Beller @ 2017-08-14 19:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Tan, git@vger.kernel.org On Mon, Aug 14, 2017 at 12:37 PM, Junio C Hamano <gitster@pobox.com> wrote: > Stefan Beller <sbeller@google.com> writes: > >> On Fri, Aug 11, 2017 at 5:39 PM, Junio C Hamano <gitster@pobox.com> wrote: >> ... >>> My preference however is to keep sb/diff-color-move topic as-is >>> without replacing and fixing it with incremental updates like these >>> patches. >> >> I would have hoped to not need to reroll that topic. >> Though I do find patches 1&2 valuable either on top or squashed >> into "[PATCH] diff.c: color moved lines differently" and >> "[PATCH] diff.c: color moved lines differently, plain mode" >> respectively. >> >> So I'd ask to pick at least patches 1&2 on top of that series, please? > > Yeah, that is exactly what I did before reading this message but > after reading your comments on the patches ;-) > >> (I am missing the context for *why* you preference is to not do >> exactly this). > > I see what I wrote can be misread, especially due to its lack of > ",instead", that I want to keep the broken one as-is, with neither > reroll nor fixup. That is not what I meant. > > - If you choose to squash so that the resulting history after the > series graduates to 'master' will be simpler to read (due to lack > of "oops, that was a mistake"), I do not mind a reroll. > > - On the other hand, as the topic has been in 'next' for some time > and presumably people tried it in their real daily work when > needed, keeping what is queued as-is has a value---we have a > fixed reference point that we can go back to to compare the code > with and without the fix. > > I do not have a strong preference, but if I were asked to choose, > I'd choose the latter. We'll go with the latter then. Thanks! Other reasons for the latter that I want to add: * The patches are written 2 month apart, which may indicate that there was real usage and hence fixes with a more substantiated understanding of the new feature. * We should not strive for "perfect" history IMHO. That is because commit messages provide a lot of reasoning and add a lot of value for understanding the code. If I were to squash and reroll, I would need to make sure these points are addressed in the commit message to have a result that is equally good. The history only needs to be "good-enough", which we defined to "bisectable on all platforms that we care about", fixups/bugfixes are like the cherry on the cake, it draw attention on its own. Not a bad thing IMHO. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling 2017-08-14 19:51 ` Stefan Beller @ 2017-08-15 17:07 ` Junio C Hamano 0 siblings, 0 replies; 30+ messages in thread From: Junio C Hamano @ 2017-08-15 17:07 UTC (permalink / raw) To: Stefan Beller; +Cc: Jonathan Tan, git@vger.kernel.org Stefan Beller <sbeller@google.com> writes: >> I see what I wrote can be misread, especially due to its lack of >> ",instead", that I want to keep the broken one as-is, with neither >> reroll nor fixup. That is not what I meant. >> >> - If you choose to squash so that the resulting history after the >> series graduates to 'master' will be simpler to read (due to lack >> of "oops, that was a mistake"), I do not mind a reroll. >> >> - On the other hand, as the topic has been in 'next' for some time >> and presumably people tried it in their real daily work when >> needed, keeping what is queued as-is has a value---we have a >> fixed reference point that we can go back to to compare the code >> with and without the fix. >> >> I do not have a strong preference, but if I were asked to choose, >> I'd choose the latter. > > We'll go with the latter then. Thanks! > > Other reasons for the latter that I want to add: Yup, we are on the same page. You articulated what I meant in the "On the other hand" bullet point in a better way. Even though we generally do not tolerate stupid mistakes and design errors to clutter the history if they are found early in the review process while the patches are still in flight, code that have been "in" for extended period of time and then found it has room for improvement is a different matter. There is a reason why we thought it was good enough initially, and there is a reason why we later found it needing improvement. Doing the latter as an incremental fix-up is a good way to leave records for both in our history. And to make that kind of incremental refinement useful, it helps to keep the history clean from an initial attempt riddled with trivial issues that are found early in the review is also important. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling 2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan ` (3 preceding siblings ...) 2017-08-12 0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano @ 2017-08-14 21:31 ` Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 1/3] diff: avoid redundantly clearing a flag Jonathan Tan ` (2 more replies) 2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan 6 siblings, 3 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-14 21:31 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster These patches are on sb/diff-color-move. Patches 1 and 2 are unchanged. It was pointed out to me that the documentation for "--color-moved=zebra" is ambiguous, but could plausibly describe the current behavior. I still think that the current behavior is confusing, so I have retained patch 3, but have updated the commit message and documentation to describe this situation. I also proceeded with updating the test mentioned in the previous cover letter to produce the expected result (describing the modification in the commit message). (If we do decide that the current behavior is correct, I can send another patch to update the documentation to be less ambiguous and maybe update the code to not use a name like MIN_BLOCK_LENGTH, as it is the number of adjacent moved lines that we are checking, not the length of a block.) Jonathan Tan (3): diff: avoid redundantly clearing a flag diff: respect MIN_BLOCK_LENGTH for last block diff: check MIN_BLOCK_LENGTH at start of new block Documentation/diff-options.txt | 6 +-- diff.c | 35 ++++++++++---- t/t4015-diff-whitespace.sh | 106 +++++++++++++++++++++++++++++++++++------ 3 files changed, 122 insertions(+), 25 deletions(-) -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2 1/3] diff: avoid redundantly clearing a flag 2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan @ 2017-08-14 21:31 ` Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan 2 siblings, 0 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-14 21:31 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in mark_color_as_moved(), so it is redundant to clear it for the current line. Therefore, clear it only for previous lines. This makes a refactoring in a subsequent patch easier. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 4af73a7e0..23311f9c0 100644 --- a/diff.c +++ b/diff.c @@ -898,7 +898,7 @@ static void mark_color_as_moved(struct diff_options *o, if (!match) { if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && o->color_moved != COLOR_MOVED_PLAIN) { - for (i = 0; i < block_length + 1; i++) { + for (i = 1; i < block_length + 1; i++) { l = &o->emitted_symbols->buf[n - i]; l->flags &= ~DIFF_SYMBOL_MOVED_LINE; } -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block 2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 1/3] diff: avoid redundantly clearing a flag Jonathan Tan @ 2017-08-14 21:31 ` Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan 2 siblings, 0 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-14 21:31 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line that does not belong to the current block. In particular, this means that MIN_BLOCK_LENGTH is not checked after all lines are encountered. Perform that check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 29 ++++++++++++++++++++++------- t/t4015-diff-whitespace.sh | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 7 deletions(-) diff --git a/diff.c b/diff.c index 23311f9c0..f598d8a3a 100644 --- a/diff.c +++ b/diff.c @@ -861,6 +861,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb, return rp + 1; } +/* + * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing. + * + * Otherwise, if the last block has fewer lines than + * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in + * that block. + * + * The last block consists of the (n - block_length)'th line up to but not + * including the nth line. + */ +static void adjust_last_block(struct diff_options *o, int n, int block_length) +{ + int i; + if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH || + o->color_moved == COLOR_MOVED_PLAIN) + return; + for (i = 1; i < block_length + 1; i++) + o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; +} + /* Find blocks of moved code, delegate actual coloring decision to helper */ static void mark_color_as_moved(struct diff_options *o, struct hashmap *add_lines, @@ -896,13 +916,7 @@ static void mark_color_as_moved(struct diff_options *o, } if (!match) { - if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && - o->color_moved != COLOR_MOVED_PLAIN) { - for (i = 1; i < block_length + 1; i++) { - l = &o->emitted_symbols->buf[n - i]; - l->flags &= ~DIFF_SYMBOL_MOVED_LINE; - } - } + adjust_last_block(o, n, block_length); pmb_nr = 0; block_length = 0; continue; @@ -944,6 +958,7 @@ static void mark_color_as_moved(struct diff_options *o, if (flipped_block) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } + adjust_last_block(o, n, block_length); free(pmb); } diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index c3b697411..6f7758e5c 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1382,6 +1382,41 @@ EOF test_cmp expected actual ' +test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' ' + git reset --hard && + >bar && + cat <<-\EOF >foo && + irrelevant_line + line1 + EOF + git add foo bar && + git commit -m x && + + cat <<-\EOF >bar && + line1 + EOF + cat <<-\EOF >foo && + irrelevant_line + EOF + + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1 @@<RESET> + <GREEN>+<RESET><GREEN>line1<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,2 +1 @@<RESET> + irrelevant_line<RESET> + <RED>-line1<RESET> + EOF + + test_cmp expected actual +' + test_expect_success 'move detection with submodules' ' test_create_repo bananas && echo ripe >bananas/recipe && -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block 2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 1/3] diff: avoid redundantly clearing a flag Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan @ 2017-08-14 21:31 ` Jonathan Tan 2017-08-14 22:46 ` Stefan Beller 2 siblings, 1 reply; 30+ messages in thread From: Jonathan Tan @ 2017-08-14 21:31 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster The existing documentation states "If there are fewer than 3 adjacent moved lines, they are not marked up as moved", which is ambiguous as to whether "adjacent moved lines" must be adjacent both at the source and at the destination, or be adjacent merely at the source or merely at the destination. The behavior of the current code takes the latter interpretation, but the behavior of blocks being conceptually painted as blocks and then "unpainted" as lines is confusing to me. Therefore, clarify the ambiguity in the documentation in the stricter direction - a block is completely painted or not at all - and update the code accordingly. This requires a change in the test "detect malicious moved code, inside file" in that the malicious change is now marked without the move colors (because the blocks involved are too small), contrasting with the subsequent test where the non-malicious change is marked with move colors. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- Documentation/diff-options.txt | 6 ++-- diff.c | 6 +++- t/t4015-diff-whitespace.sh | 71 +++++++++++++++++++++++++++++++++--------- 3 files changed, 65 insertions(+), 18 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index bc52bd0b9..1ee3ca3f6 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -257,10 +257,10 @@ zebra:: Blocks of moved code are detected greedily. The detected blocks are painted using either the 'color.diff.{old,new}Moved' color or 'color.diff.{old,new}MovedAlternative'. The change between - the two colors indicates that a new block was detected. If there - are fewer than 3 adjacent moved lines, they are not marked up + the two colors indicates that a new block was detected. If a block + has fewer than 3 adjacent moved lines, it is not marked up as moved, but the regular colors 'color.diff.{old,new}' will be - used. + used instead. dimmed_zebra:: Similar to 'zebra', but additional dimming of uninteresting parts of moved code is performed. The bordering lines of two adjacent diff --git a/diff.c b/diff.c index f598d8a3a..20b784736 100644 --- a/diff.c +++ b/diff.c @@ -923,7 +923,6 @@ static void mark_color_as_moved(struct diff_options *o, } l->flags |= DIFF_SYMBOL_MOVED_LINE; - block_length++; if (o->color_moved == COLOR_MOVED_PLAIN) continue; @@ -953,8 +952,13 @@ static void mark_color_as_moved(struct diff_options *o, } flipped_block = (flipped_block + 1) % 2; + + adjust_last_block(o, n, block_length); + block_length = 0; } + block_length++; + if (flipped_block) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index 6f7758e5c..d0613a189 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1097,13 +1097,13 @@ test_expect_success 'detect malicious moved code, inside file' ' printf("World\n");<RESET> }<RESET> <RESET> - <BRED>-int secure_foo(struct user *u)<RESET> - <BRED>-{<RESET> - <BLUE>-if (!u->is_allowed_foo)<RESET> - <BLUE>-return;<RESET> - <BRED>-foo(u);<RESET> - <BLUE>-}<RESET> - <BLUE>-<RESET> + <RED>-int secure_foo(struct user *u)<RESET> + <RED>-{<RESET> + <RED>-if (!u->is_allowed_foo)<RESET> + <RED>-return;<RESET> + <RED>-foo(u);<RESET> + <RED>-}<RESET> + <RED>-<RESET> int main()<RESET> {<RESET> foo();<RESET> @@ -1115,13 +1115,13 @@ test_expect_success 'detect malicious moved code, inside file' ' printf("Hello World, but different\n");<RESET> }<RESET> <RESET> - <BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET> - <BGREEN>+<RESET><BGREEN>{<RESET> - <YELLOW>+<RESET><YELLOW>foo(u);<RESET> - <BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET> - <BGREEN>+<RESET><BGREEN>return;<RESET> - <YELLOW>+<RESET><YELLOW>}<RESET> - <YELLOW>+<RESET> + <GREEN>+<RESET><GREEN>int secure_foo(struct user *u)<RESET> + <GREEN>+<RESET><GREEN>{<RESET> + <GREEN>+<RESET><GREEN>foo(u);<RESET> + <GREEN>+<RESET><GREEN>if (!u->is_allowed_foo)<RESET> + <GREEN>+<RESET><GREEN>return;<RESET> + <GREEN>+<RESET><GREEN>}<RESET> + <GREEN>+<RESET> int another_function()<RESET> {<RESET> bar();<RESET> @@ -1417,6 +1417,49 @@ test_expect_success '--color-moved block at end of diff output respects MIN_BLOC test_cmp expected actual ' +test_expect_success '--color-moved treats adjacent blocks as separate for MIN_BLOCK_LENGTH' ' + git reset --hard && + cat <<-\EOF >foo && + line1 + irrelevant_line + line2 + line3 + EOF + >bar && + git add foo bar && + git commit -m x && + + cat <<-\EOF >foo && + irrelevant_line + EOF + cat <<-\EOF >bar && + line2 + line3 + line1 + EOF + + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1,3 @@<RESET> + <GREEN>+<RESET><GREEN>line2<RESET> + <GREEN>+<RESET><GREEN>line3<RESET> + <GREEN>+<RESET><GREEN>line1<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,4 +1 @@<RESET> + <RED>-line1<RESET> + irrelevant_line<RESET> + <RED>-line2<RESET> + <RED>-line3<RESET> + EOF + + test_cmp expected actual +' + test_expect_success 'move detection with submodules' ' test_create_repo bananas && echo ripe >bananas/recipe && -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block 2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan @ 2017-08-14 22:46 ` Stefan Beller 0 siblings, 0 replies; 30+ messages in thread From: Stefan Beller @ 2017-08-14 22:46 UTC (permalink / raw) To: Jonathan Tan; +Cc: git@vger.kernel.org, Junio C Hamano On Mon, Aug 14, 2017 at 2:31 PM, Jonathan Tan <jonathantanmy@google.com> wrote: > The existing documentation states "If there are fewer than 3 adjacent > moved lines, they are not marked up as moved", which is ambiguous as to > whether "adjacent moved lines" must be adjacent both at the source and > at the destination, or be adjacent merely at the source or merely at the > destination. The behavior of the current code takes the latter > interpretation, but the behavior of blocks being conceptually painted as > blocks and then "unpainted" as lines is confusing to me. > > Therefore, clarify the ambiguity in the documentation in the stricter > direction - a block is completely painted or not at all - and update the > code accordingly. > > This requires a change in the test "detect malicious moved code, inside > file" in that the malicious change is now marked without the move > colors (because the blocks involved are too small), contrasting with > the subsequent test where the non-malicious change is marked with move > colors. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> I wonder if these changes ought to be a new mode (C.f. "mountain zebra" and "imperial zebra" for slight changes in coloring ;) or if we can settle on one true way. The 3 lines heuristic is a bad heuristic IMHO (it works reasonable well for little effort but the fact that we discuss this patch makes it a bad heuristic as we discuss corner cases that are not relevant. The heuristic originally wanted to filter out stray single braces that were "moved", it did not want to suppress small original moved pieces of code), which this covers up a bit. Maybe we'll cook this in next for a while to see how people react to it? > --- > Documentation/diff-options.txt | 6 ++-- > diff.c | 6 +++- > t/t4015-diff-whitespace.sh | 71 +++++++++++++++++++++++++++++++++--------- > 3 files changed, 65 insertions(+), 18 deletions(-) > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > index bc52bd0b9..1ee3ca3f6 100644 > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -257,10 +257,10 @@ zebra:: > Blocks of moved code are detected greedily. The detected blocks are > painted using either the 'color.diff.{old,new}Moved' color or > 'color.diff.{old,new}MovedAlternative'. The change between > - the two colors indicates that a new block was detected. If there > - are fewer than 3 adjacent moved lines, they are not marked up > + the two colors indicates that a new block was detected. If a block > + has fewer than 3 adjacent moved lines, it is not marked up > as moved, but the regular colors 'color.diff.{old,new}' will be > - used. > + used instead. Thanks for clarifying the docs. > diff --git a/diff.c b/diff.c > index f598d8a3a..20b784736 100644 > --- a/diff.c > +++ b/diff.c > @@ -923,7 +923,6 @@ static void mark_color_as_moved(struct diff_options *o, > } > > l->flags |= DIFF_SYMBOL_MOVED_LINE; > - block_length++; > > if (o->color_moved == COLOR_MOVED_PLAIN) > continue; > @@ -953,8 +952,13 @@ static void mark_color_as_moved(struct diff_options *o, > } > > flipped_block = (flipped_block + 1) % 2; > + > + adjust_last_block(o, n, block_length); > + block_length = 0; > } > > + block_length++; > + > if (flipped_block) > l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; > } This changes the algorithm in a non-obvious way. When the min-length heuristic is strictly bound to each block, the function can be simplified more than adding on these tweaks, 1) remove variable block_length, needing to count in the adjust function 2) assign DIFF_SYMBOL_MOVED_LINE either in COLOR_MOVED_PLAIN case (and continue) or later (where block_length is increased in this patch) No need to do these, just as thoughts on how to reduce complexity. > diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh > index 6f7758e5c..d0613a189 100755 > --- a/t/t4015-diff-whitespace.sh > +++ b/t/t4015-diff-whitespace.sh > @@ -1097,13 +1097,13 @@ test_expect_success 'detect malicious moved code, inside file' ' This test would not 'detect malicious moved code, inside file' any more, I think instead we'd rather want to have a more realistic test case, which has more lines in it? (This test is about the block detection not about the omission of short blocks, which was an after thought) > +test_expect_success '--color-moved treats adjacent blocks as separate for MIN_BLOCK_LENGTH' ' Thanks for providing a test here! For testing MIN_BLOCK_LENGTH for each block I would have imagined the tests would have a block of length (1,)2,3(,4) lines and then we'd see that the blocks are highlighted or not. This only has length=1 blocks? ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 0/3] "diff --color-moved" with different heuristic 2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan ` (4 preceding siblings ...) 2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan @ 2017-08-14 23:57 ` Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 1/3] diff: avoid redundantly clearing a flag Jonathan Tan ` (2 more replies) 2017-08-16 1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan 6 siblings, 3 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-14 23:57 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster These patches are on sb/diff-color-move. Patches 1 and 2 are unchanged. If we're planning to cook a heuristic, we might as well try a better one. What do you think of this? A heuristic of number of non-whitespace characters, applied at the block level, and not dependent on the block's neighbors. As for the test, good point about testing the boundary conditions - I have included another test that does that. Jonathan Tan (3): diff: avoid redundantly clearing a flag diff: respect MIN_BLOCK_LENGTH for last block diff: define block by number of non-space chars Documentation/diff-options.txt | 8 +-- diff.c | 44 +++++++++++--- diff.h | 2 +- t/t4015-diff-whitespace.sh | 129 +++++++++++++++++++++++++++++++++++++++-- 4 files changed, 163 insertions(+), 20 deletions(-) -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 1/3] diff: avoid redundantly clearing a flag 2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan @ 2017-08-14 23:57 ` Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan 2 siblings, 0 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-14 23:57 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in mark_color_as_moved(), so it is redundant to clear it for the current line. Therefore, clear it only for previous lines. This makes a refactoring in a subsequent patch easier. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 4af73a7e0..23311f9c0 100644 --- a/diff.c +++ b/diff.c @@ -898,7 +898,7 @@ static void mark_color_as_moved(struct diff_options *o, if (!match) { if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && o->color_moved != COLOR_MOVED_PLAIN) { - for (i = 0; i < block_length + 1; i++) { + for (i = 1; i < block_length + 1; i++) { l = &o->emitted_symbols->buf[n - i]; l->flags &= ~DIFF_SYMBOL_MOVED_LINE; } -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block 2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 1/3] diff: avoid redundantly clearing a flag Jonathan Tan @ 2017-08-14 23:57 ` Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan 2 siblings, 0 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-14 23:57 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line that does not belong to the current block. In particular, this means that MIN_BLOCK_LENGTH is not checked after all lines are encountered. Perform that check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 29 ++++++++++++++++++++++------- t/t4015-diff-whitespace.sh | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 7 deletions(-) diff --git a/diff.c b/diff.c index 23311f9c0..f598d8a3a 100644 --- a/diff.c +++ b/diff.c @@ -861,6 +861,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb, return rp + 1; } +/* + * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing. + * + * Otherwise, if the last block has fewer lines than + * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in + * that block. + * + * The last block consists of the (n - block_length)'th line up to but not + * including the nth line. + */ +static void adjust_last_block(struct diff_options *o, int n, int block_length) +{ + int i; + if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH || + o->color_moved == COLOR_MOVED_PLAIN) + return; + for (i = 1; i < block_length + 1; i++) + o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; +} + /* Find blocks of moved code, delegate actual coloring decision to helper */ static void mark_color_as_moved(struct diff_options *o, struct hashmap *add_lines, @@ -896,13 +916,7 @@ static void mark_color_as_moved(struct diff_options *o, } if (!match) { - if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && - o->color_moved != COLOR_MOVED_PLAIN) { - for (i = 1; i < block_length + 1; i++) { - l = &o->emitted_symbols->buf[n - i]; - l->flags &= ~DIFF_SYMBOL_MOVED_LINE; - } - } + adjust_last_block(o, n, block_length); pmb_nr = 0; block_length = 0; continue; @@ -944,6 +958,7 @@ static void mark_color_as_moved(struct diff_options *o, if (flipped_block) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } + adjust_last_block(o, n, block_length); free(pmb); } diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index c3b697411..6f7758e5c 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1382,6 +1382,41 @@ EOF test_cmp expected actual ' +test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' ' + git reset --hard && + >bar && + cat <<-\EOF >foo && + irrelevant_line + line1 + EOF + git add foo bar && + git commit -m x && + + cat <<-\EOF >bar && + line1 + EOF + cat <<-\EOF >foo && + irrelevant_line + EOF + + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1 @@<RESET> + <GREEN>+<RESET><GREEN>line1<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,2 +1 @@<RESET> + irrelevant_line<RESET> + <RED>-line1<RESET> + EOF + + test_cmp expected actual +' + test_expect_success 'move detection with submodules' ' test_create_repo bananas && echo ripe >bananas/recipe && -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 3/3] diff: define block by number of non-space chars 2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 1/3] diff: avoid redundantly clearing a flag Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan @ 2017-08-14 23:57 ` Jonathan Tan 2017-08-15 2:29 ` Stefan Beller 2017-08-15 19:54 ` Junio C Hamano 2 siblings, 2 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-14 23:57 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster The existing behavior of diff --color-moved=zebra does not define the minimum size of a block at all, instead relying on a heuristic applied later to filter out sets of adjacent moved lines that are shorter than 3 lines long. This can be confusing, because a block could thus be colored as moved at the source but not at the destination (or vice versa), depending on its neighbors. Instead, teach diff that the minimum size of a block is 10 non-whitespace characters. This allows diff to still exclude uninteresting lines appearing on their own (such as those solely consisting of one or a few closing braces), as was the intention of the adjacent-moved-line heuristic. This requires a change in the test "detect malicious moved code, inside file" in that some of its lines are no longer considered to be part of a block, because they are too short. The malicious move can still be observed, however. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- Documentation/diff-options.txt | 8 ++-- diff.c | 27 +++++++++--- diff.h | 2 +- t/t4015-diff-whitespace.sh | 96 +++++++++++++++++++++++++++++++++++++++--- 4 files changed, 113 insertions(+), 20 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index bc52bd0b9..d0bdc849f 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -254,13 +254,11 @@ plain:: moved line, but it is not very useful in a review to determine if a block of code was moved without permutation. zebra:: - Blocks of moved code are detected greedily. The detected blocks are + Blocks of moved text of at least 10 non-whitespace characters + are detected greedily. The detected blocks are painted using either the 'color.diff.{old,new}Moved' color or 'color.diff.{old,new}MovedAlternative'. The change between - the two colors indicates that a new block was detected. If there - are fewer than 3 adjacent moved lines, they are not marked up - as moved, but the regular colors 'color.diff.{old,new}' will be - used. + the two colors indicates that a new block was detected. dimmed_zebra:: Similar to 'zebra', but additional dimming of uninteresting parts of moved code is performed. The bordering lines of two adjacent diff --git a/diff.c b/diff.c index f598d8a3a..305ce4126 100644 --- a/diff.c +++ b/diff.c @@ -864,19 +864,28 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb, /* * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing. * - * Otherwise, if the last block has fewer lines than - * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in - * that block. + * Otherwise, if the last block has fewer non-space characters than + * COLOR_MOVED_MIN_NON_SPACE_COUNT, unset DIFF_SYMBOL_MOVED_LINE on all lines + * in that block. * * The last block consists of the (n - block_length)'th line up to but not * including the nth line. */ static void adjust_last_block(struct diff_options *o, int n, int block_length) { - int i; - if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH || - o->color_moved == COLOR_MOVED_PLAIN) + int i, non_space_count = 0; + if (o->color_moved == COLOR_MOVED_PLAIN) return; + for (i = 1; i < block_length + 1; i++) { + const char *c = o->emitted_symbols->buf[n - i].line; + for (; *c; c++) { + if (isspace(*c)) + continue; + non_space_count++; + if (non_space_count >= COLOR_MOVED_MIN_NON_SPACE_COUNT) + return; + } + } for (i = 1; i < block_length + 1; i++) o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; } @@ -923,7 +932,6 @@ static void mark_color_as_moved(struct diff_options *o, } l->flags |= DIFF_SYMBOL_MOVED_LINE; - block_length++; if (o->color_moved == COLOR_MOVED_PLAIN) continue; @@ -953,8 +961,13 @@ static void mark_color_as_moved(struct diff_options *o, } flipped_block = (flipped_block + 1) % 2; + + adjust_last_block(o, n, block_length); + block_length = 0; } + block_length++; + if (flipped_block) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } diff --git a/diff.h b/diff.h index 5755f465d..9e2fece5b 100644 --- a/diff.h +++ b/diff.h @@ -195,7 +195,7 @@ struct diff_options { COLOR_MOVED_ZEBRA_DIM = 3, } color_moved; #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA - #define COLOR_MOVED_MIN_BLOCK_LENGTH 3 + #define COLOR_MOVED_MIN_NON_SPACE_COUNT 10 }; void diff_emit_submodule_del(struct diff_options *o, const char *line); diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index 6f7758e5c..d8e7b77b9 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1101,9 +1101,9 @@ test_expect_success 'detect malicious moved code, inside file' ' <BRED>-{<RESET> <BLUE>-if (!u->is_allowed_foo)<RESET> <BLUE>-return;<RESET> - <BRED>-foo(u);<RESET> - <BLUE>-}<RESET> - <BLUE>-<RESET> + <RED>-foo(u);<RESET> + <RED>-}<RESET> + <RED>-<RESET> int main()<RESET> {<RESET> foo();<RESET> @@ -1117,11 +1117,11 @@ test_expect_success 'detect malicious moved code, inside file' ' <RESET> <BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET> <BGREEN>+<RESET><BGREEN>{<RESET> - <YELLOW>+<RESET><YELLOW>foo(u);<RESET> + <GREEN>+<RESET><GREEN>foo(u);<RESET> <BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET> <BGREEN>+<RESET><BGREEN>return;<RESET> - <YELLOW>+<RESET><YELLOW>}<RESET> - <YELLOW>+<RESET> + <GREEN>+<RESET><GREEN>}<RESET> + <GREEN>+<RESET> int another_function()<RESET> {<RESET> bar();<RESET> @@ -1382,7 +1382,7 @@ EOF test_cmp expected actual ' -test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' ' +test_expect_success '--color-moved block at end of diff output respects MIN_NON_SPACE_COUNT' ' git reset --hard && >bar && cat <<-\EOF >foo && @@ -1417,6 +1417,88 @@ test_expect_success '--color-moved block at end of diff output respects MIN_BLOC test_cmp expected actual ' +test_expect_success '--color-moved respects MIN_NON_SPACE_COUNT' ' + git reset --hard && + cat <<-\EOF >foo && + nine chars + irrelevant_line + ten chars__ + EOF + >bar && + git add foo bar && + git commit -m x && + + cat <<-\EOF >foo && + irrelevant_line + EOF + cat <<-\EOF >bar && + ten chars__ + nine chars + EOF + + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1,2 @@<RESET> + <BOLD;CYAN>+<RESET><BOLD;CYAN>ten chars__<RESET> + <GREEN>+<RESET><GREEN>nine chars<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,3 +1 @@<RESET> + <RED>-nine chars<RESET> + irrelevant_line<RESET> + <BOLD;MAGENTA>-ten chars__<RESET> + EOF + + test_cmp expected actual +' + +test_expect_success '--color-moved treats adjacent blocks as separate for MIN_NON_SPACE_COUNT' ' + git reset --hard && + cat <<-\EOF >foo && + lin1 + irrelevant_line + lin2 + lin3 + EOF + >bar && + git add foo bar && + git commit -m x && + + cat <<-\EOF >foo && + irrelevant_line + EOF + cat <<-\EOF >bar && + lin2 + lin3 + lin1 + EOF + + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1,3 @@<RESET> + <GREEN>+<RESET><GREEN>lin2<RESET> + <GREEN>+<RESET><GREEN>lin3<RESET> + <GREEN>+<RESET><GREEN>lin1<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,4 +1 @@<RESET> + <RED>-lin1<RESET> + irrelevant_line<RESET> + <RED>-lin2<RESET> + <RED>-lin3<RESET> + EOF + + test_cmp expected actual +' + test_expect_success 'move detection with submodules' ' test_create_repo bananas && echo ripe >bananas/recipe && -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] diff: define block by number of non-space chars 2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan @ 2017-08-15 2:29 ` Stefan Beller 2017-08-15 19:54 ` Junio C Hamano 1 sibling, 0 replies; 30+ messages in thread From: Stefan Beller @ 2017-08-15 2:29 UTC (permalink / raw) To: Jonathan Tan; +Cc: git@vger.kernel.org, Junio C Hamano On Mon, Aug 14, 2017 at 4:57 PM, Jonathan Tan <jonathantanmy@google.com> wrote: > The existing behavior of diff --color-moved=zebra does not define the > minimum size of a block at all, instead relying on a heuristic applied > later to filter out sets of adjacent moved lines that are shorter than 3 > lines long. This can be confusing, because a block could thus be colored > as moved at the source but not at the destination (or vice versa), > depending on its neighbors. > > Instead, teach diff that the minimum size of a block is 10 > non-whitespace characters. This allows diff to still exclude > uninteresting lines appearing on their own (such as those solely > consisting of one or a few closing braces), as was the intention of the > adjacent-moved-line heuristic. After some thought, I really like this heuristic, however allow me a moment to bikeshed 10 as a number here. One could think that 10 equals roughly 3 lines a 3 characters and in C based languages the shortest meaningful lines have more than 3 characters ("i++;", "a();", "int i;" have 4 or 5 each), but I would still think that 10 is too much, as we'd want to detect the closing braces in their own lines. > dimmed_zebra:: > Similar to 'zebra', but additional dimming of uninteresting parts > of moved code is performed. The bordering lines of two adjacent > diff --git a/diff.c b/diff.c > index f598d8a3a..305ce4126 100644 > --- a/diff.c > +++ b/diff.c > @@ -864,19 +864,28 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb, > /* > * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing. > * > - * Otherwise, if the last block has fewer lines than > - * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in > - * that block. > + * Otherwise, if the last block has fewer non-space characters than > + * COLOR_MOVED_MIN_NON_SPACE_COUNT, unset DIFF_SYMBOL_MOVED_LINE on all lines > + * in that block. > * > * The last block consists of the (n - block_length)'th line up to but not > * including the nth line. > */ > static void adjust_last_block(struct diff_options *o, int n, int block_length) > { > - int i; > - if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH || > - o->color_moved == COLOR_MOVED_PLAIN) > + int i, non_space_count = 0; > + if (o->color_moved == COLOR_MOVED_PLAIN) > return; > + for (i = 1; i < block_length + 1; i++) { > + const char *c = o->emitted_symbols->buf[n - i].line; > + for (; *c; c++) { > + if (isspace(*c)) > + continue; > + non_space_count++; > + if (non_space_count >= COLOR_MOVED_MIN_NON_SPACE_COUNT) > + return; When we do this counting, we could count the lines ourselves here as well. `n-block_count` should be equal to the line that has a different (flags & (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT)) pattern than those before. (although we'd also have to check for i > 0, too) Your choice. > + } > + } > for (i = 1; i < block_length + 1; i++) > o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; > } > @@ -923,7 +932,6 @@ static void mark_color_as_moved(struct diff_options *o, > } > > l->flags |= DIFF_SYMBOL_MOVED_LINE; > - block_length++; > > if (o->color_moved == COLOR_MOVED_PLAIN) > continue; > @@ -953,8 +961,13 @@ static void mark_color_as_moved(struct diff_options *o, > } > > flipped_block = (flipped_block + 1) % 2; > + > + adjust_last_block(o, n, block_length); > + block_length = 0; > } > > + block_length++; > + > if (flipped_block) > l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; > } > diff --git a/diff.h b/diff.h > index 5755f465d..9e2fece5b 100644 > --- a/diff.h > +++ b/diff.h > @@ -195,7 +195,7 @@ struct diff_options { > COLOR_MOVED_ZEBRA_DIM = 3, > } color_moved; > #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA > - #define COLOR_MOVED_MIN_BLOCK_LENGTH 3 > + #define COLOR_MOVED_MIN_NON_SPACE_COUNT 10 > }; > > void diff_emit_submodule_del(struct diff_options *o, const char *line); > diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh > index 6f7758e5c..d8e7b77b9 100755 > --- a/t/t4015-diff-whitespace.sh > +++ b/t/t4015-diff-whitespace.sh > @@ -1101,9 +1101,9 @@ test_expect_success 'detect malicious moved code, inside file' ' > <BRED>-{<RESET> > <BLUE>-if (!u->is_allowed_foo)<RESET> > <BLUE>-return;<RESET> > - <BRED>-foo(u);<RESET> > - <BLUE>-}<RESET> > - <BLUE>-<RESET> > + <RED>-foo(u);<RESET> > + <RED>-}<RESET> > + <RED>-<RESET> Here we have 2 blocks, the first has 7 character, which we may want to detect, the second has only 1 char. The longest "uninteresting" line in C like languages might be "\t } else {" which has 6 non-ws characters. Thinking of other languages (shell "fi" is uninteresting, others are interesting, Latex \"custom" all bets are off), I think we may want to go lower and have COLOR_MOVED_MIN_NON_SPACE_COUNT to be about 6 (~ 2 characters on a 3 line block). That said this is all bikeshedding, feel free to ignore. Acked-by: Stefan Beller <sbeller@google.com> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] diff: define block by number of non-space chars 2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan 2017-08-15 2:29 ` Stefan Beller @ 2017-08-15 19:54 ` Junio C Hamano 2017-08-15 20:06 ` Stefan Beller 1 sibling, 1 reply; 30+ messages in thread From: Junio C Hamano @ 2017-08-15 19:54 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, sbeller Jonathan Tan <jonathantanmy@google.com> writes: > The existing behavior of diff --color-moved=zebra does not define the > minimum size of a block at all, instead relying on a heuristic applied > later to filter out sets of adjacent moved lines that are shorter than 3 > lines long. This can be confusing, because a block could thus be colored > as moved at the source but not at the destination (or vice versa), > depending on its neighbors. > > Instead, teach diff that the minimum size of a block is 10 > non-whitespace characters. This allows diff to still exclude > uninteresting lines appearing on their own (such as those solely > consisting of one or a few closing braces), as was the intention of the > adjacent-moved-line heuristic. I recall that there is a logic backed by a similar rationale in blame.c::blame_entry_score() but over there we count alnum, not !isspace, to judge if a block has been split into too small a piece to be significant. I do not know which one is better, but if there is no strong reason, perhaps we want to unify the two, so that we can improve both heuristics at the same time? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] diff: define block by number of non-space chars 2017-08-15 19:54 ` Junio C Hamano @ 2017-08-15 20:06 ` Stefan Beller 2017-08-15 20:53 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Stefan Beller @ 2017-08-15 20:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Tan, git@vger.kernel.org On Tue, Aug 15, 2017 at 12:54 PM, Junio C Hamano <gitster@pobox.com> wrote: > Jonathan Tan <jonathantanmy@google.com> writes: > >> The existing behavior of diff --color-moved=zebra does not define the >> minimum size of a block at all, instead relying on a heuristic applied >> later to filter out sets of adjacent moved lines that are shorter than 3 >> lines long. This can be confusing, because a block could thus be colored >> as moved at the source but not at the destination (or vice versa), >> depending on its neighbors. >> >> Instead, teach diff that the minimum size of a block is 10 >> non-whitespace characters. This allows diff to still exclude >> uninteresting lines appearing on their own (such as those solely >> consisting of one or a few closing braces), as was the intention of the >> adjacent-moved-line heuristic. > > I recall that there is a logic backed by a similar rationale in > blame.c::blame_entry_score() but over there we count alnum, not > !isspace, to judge if a block has been split into too small a piece > to be significant. I do not know which one is better, but if there > is no strong reason, perhaps we want to unify the two, so that we > can improve both heuristics at the same time? In an ideal world we would use entropy of the diffed characters as that is a best approximation on how much "interesting" things are going on in that particular diff. Computing the entropy is cumbersome, but maybe ok for this purpose (we're most likely IO bound anyway, specifically when including the human understanding as the last part of IO). Reasons that may influence the choice * Non latin/ASCII characters should also work. * Do we care about the whitespace esoteric programming language? The function blame_entry_score is documented to approach this exact problem that we are trying to solve here, so I agree we should have a common heuristic. Stefan ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 3/3] diff: define block by number of non-space chars 2017-08-15 20:06 ` Stefan Beller @ 2017-08-15 20:53 ` Junio C Hamano 0 siblings, 0 replies; 30+ messages in thread From: Junio C Hamano @ 2017-08-15 20:53 UTC (permalink / raw) To: Stefan Beller; +Cc: Jonathan Tan, git@vger.kernel.org Stefan Beller <sbeller@google.com> writes: > The function blame_entry_score is documented to approach > this exact problem that we are trying to solve here, so I agree > we should have a common heuristic. While I do not think it is necessary to do such a refactoring within the scope of this series, we should at least leave a NEEDSWORK in-code comment next to the new heuristics code to remind those who revisit the code of this possible small project idea, I would think. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 0/3] "diff --color-moved" with yet another heuristic 2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan ` (5 preceding siblings ...) 2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan @ 2017-08-16 1:27 ` Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 1/3] diff: avoid redundantly clearing a flag Jonathan Tan ` (3 more replies) 6 siblings, 4 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-16 1:27 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster These patches are on sb/diff-color-move. Patches 1 and 2 are unchanged, except for some line wrapping in a test in patch 2. This has been updated to use the same alphanumeric heuristic as blame (20 alnum characters). I tried it out and I thought the results were reasonable in a patch set that I'm working on (the pack-related function refactoring one). As for refactoring blame.c and this file, I'm not sure where best to put the new function, so I've added a NEEDSWORK for now. As for detecting block boundaries in adjust_last_block(), I've left it as-is for now. I think it's clearer if the parent function provides that information, since it already tracks that. In addition, we avoid corner cases such as what happens if the block is at the start of the diff output (we must ensure that we don't read off the beginning edge, for example). Jonathan Tan (3): diff: avoid redundantly clearing a flag diff: respect MIN_BLOCK_LENGTH for last block diff: define block by number of alphanumeric chars Documentation/diff-options.txt | 8 +- diff.c | 47 ++++++-- diff.h | 2 +- t/t4015-diff-whitespace.sh | 261 ++++++++++++++++++++++++++++++----------- 4 files changed, 236 insertions(+), 82 deletions(-) -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 1/3] diff: avoid redundantly clearing a flag 2017-08-16 1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan @ 2017-08-16 1:27 ` Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan ` (2 subsequent siblings) 3 siblings, 0 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-16 1:27 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in mark_color_as_moved(), so it is redundant to clear it for the current line. Therefore, clear it only for previous lines. This makes a refactoring in a subsequent patch easier. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 4af73a7e0..23311f9c0 100644 --- a/diff.c +++ b/diff.c @@ -898,7 +898,7 @@ static void mark_color_as_moved(struct diff_options *o, if (!match) { if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && o->color_moved != COLOR_MOVED_PLAIN) { - for (i = 0; i < block_length + 1; i++) { + for (i = 1; i < block_length + 1; i++) { l = &o->emitted_symbols->buf[n - i]; l->flags &= ~DIFF_SYMBOL_MOVED_LINE; } -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block 2017-08-16 1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 1/3] diff: avoid redundantly clearing a flag Jonathan Tan @ 2017-08-16 1:27 ` Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 3/3] diff: define block by number of alphanumeric chars Jonathan Tan 2017-08-16 5:55 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Stefan Beller 3 siblings, 0 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-16 1:27 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line that does not belong to the current block. In particular, this means that MIN_BLOCK_LENGTH is not checked after all lines are encountered. Perform that check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- diff.c | 29 ++++++++++++++++++++++------- t/t4015-diff-whitespace.sh | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+), 7 deletions(-) diff --git a/diff.c b/diff.c index 23311f9c0..f598d8a3a 100644 --- a/diff.c +++ b/diff.c @@ -861,6 +861,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb, return rp + 1; } +/* + * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing. + * + * Otherwise, if the last block has fewer lines than + * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in + * that block. + * + * The last block consists of the (n - block_length)'th line up to but not + * including the nth line. + */ +static void adjust_last_block(struct diff_options *o, int n, int block_length) +{ + int i; + if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH || + o->color_moved == COLOR_MOVED_PLAIN) + return; + for (i = 1; i < block_length + 1; i++) + o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; +} + /* Find blocks of moved code, delegate actual coloring decision to helper */ static void mark_color_as_moved(struct diff_options *o, struct hashmap *add_lines, @@ -896,13 +916,7 @@ static void mark_color_as_moved(struct diff_options *o, } if (!match) { - if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH && - o->color_moved != COLOR_MOVED_PLAIN) { - for (i = 1; i < block_length + 1; i++) { - l = &o->emitted_symbols->buf[n - i]; - l->flags &= ~DIFF_SYMBOL_MOVED_LINE; - } - } + adjust_last_block(o, n, block_length); pmb_nr = 0; block_length = 0; continue; @@ -944,6 +958,7 @@ static void mark_color_as_moved(struct diff_options *o, if (flipped_block) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } + adjust_last_block(o, n, block_length); free(pmb); } diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index c3b697411..6ae62a6bf 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1382,6 +1382,43 @@ EOF test_cmp expected actual ' +test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' ' + git reset --hard && + >bar && + cat <<-\EOF >foo && + irrelevant_line + line1 + EOF + git add foo bar && + git commit -m x && + + cat <<-\EOF >bar && + line1 + EOF + cat <<-\EOF >foo && + irrelevant_line + EOF + + git diff HEAD --color-moved=zebra --no-renames | + grep -v "index" | + test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1 @@<RESET> + <GREEN>+<RESET><GREEN>line1<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,2 +1 @@<RESET> + irrelevant_line<RESET> + <RED>-line1<RESET> + EOF + + test_cmp expected actual +' + test_expect_success 'move detection with submodules' ' test_create_repo bananas && echo ripe >bananas/recipe && -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v4 3/3] diff: define block by number of alphanumeric chars 2017-08-16 1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 1/3] diff: avoid redundantly clearing a flag Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan @ 2017-08-16 1:27 ` Jonathan Tan 2017-08-16 5:55 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Stefan Beller 3 siblings, 0 replies; 30+ messages in thread From: Jonathan Tan @ 2017-08-16 1:27 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, sbeller, gitster The existing behavior of diff --color-moved=zebra does not define the minimum size of a block at all, instead relying on a heuristic applied later to filter out sets of adjacent moved lines that are shorter than 3 lines long. This can be confusing, because a block could thus be colored as moved at the source but not at the destination (or vice versa), depending on its neighbors. Instead, teach diff that the minimum size of a block is 20 alphanumeric characters, the same heuristic used by "git blame". This allows diff to still exclude uninteresting lines appearing on their own (such as those solely consisting of one or a few closing braces), as was the intention of the adjacent-moved-line heuristic. This requires a change in some tests in that some of their lines are no longer considered to be part of a block, because they are too short. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- Documentation/diff-options.txt | 8 +- diff.c | 28 +++-- diff.h | 2 +- t/t4015-diff-whitespace.sh | 226 ++++++++++++++++++++++++++++------------- 4 files changed, 183 insertions(+), 81 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index bc52bd0b9..b8c881605 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -254,13 +254,11 @@ plain:: moved line, but it is not very useful in a review to determine if a block of code was moved without permutation. zebra:: - Blocks of moved code are detected greedily. The detected blocks are + Blocks of moved text of at least 20 alphanumeric characters + are detected greedily. The detected blocks are painted using either the 'color.diff.{old,new}Moved' color or 'color.diff.{old,new}MovedAlternative'. The change between - the two colors indicates that a new block was detected. If there - are fewer than 3 adjacent moved lines, they are not marked up - as moved, but the regular colors 'color.diff.{old,new}' will be - used. + the two colors indicates that a new block was detected. dimmed_zebra:: Similar to 'zebra', but additional dimming of uninteresting parts of moved code is performed. The bordering lines of two adjacent diff --git a/diff.c b/diff.c index f598d8a3a..c50fcc7ea 100644 --- a/diff.c +++ b/diff.c @@ -864,19 +864,31 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb, /* * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing. * - * Otherwise, if the last block has fewer lines than - * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in + * Otherwise, if the last block has fewer alphanumeric characters than + * COLOR_MOVED_MIN_ALNUM_COUNT, unset DIFF_SYMBOL_MOVED_LINE on all lines in * that block. * * The last block consists of the (n - block_length)'th line up to but not * including the nth line. + * + * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c. + * Think of a way to unify them. */ static void adjust_last_block(struct diff_options *o, int n, int block_length) { - int i; - if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH || - o->color_moved == COLOR_MOVED_PLAIN) + int i, alnum_count = 0; + if (o->color_moved == COLOR_MOVED_PLAIN) return; + for (i = 1; i < block_length + 1; i++) { + const char *c = o->emitted_symbols->buf[n - i].line; + for (; *c; c++) { + if (!isalnum(*c)) + continue; + alnum_count++; + if (alnum_count >= COLOR_MOVED_MIN_ALNUM_COUNT) + return; + } + } for (i = 1; i < block_length + 1; i++) o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE; } @@ -923,7 +935,6 @@ static void mark_color_as_moved(struct diff_options *o, } l->flags |= DIFF_SYMBOL_MOVED_LINE; - block_length++; if (o->color_moved == COLOR_MOVED_PLAIN) continue; @@ -953,8 +964,13 @@ static void mark_color_as_moved(struct diff_options *o, } flipped_block = (flipped_block + 1) % 2; + + adjust_last_block(o, n, block_length); + block_length = 0; } + block_length++; + if (flipped_block) l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT; } diff --git a/diff.h b/diff.h index 5755f465d..aca150ba2 100644 --- a/diff.h +++ b/diff.h @@ -195,7 +195,7 @@ struct diff_options { COLOR_MOVED_ZEBRA_DIM = 3, } color_moved; #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA - #define COLOR_MOVED_MIN_BLOCK_LENGTH 3 + #define COLOR_MOVED_MIN_ALNUM_COUNT 20 }; void diff_emit_submodule_del(struct diff_options *o, const char *line); diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh index 6ae62a6bf..12d182dc1 100755 --- a/t/t4015-diff-whitespace.sh +++ b/t/t4015-diff-whitespace.sh @@ -1101,9 +1101,9 @@ test_expect_success 'detect malicious moved code, inside file' ' <BRED>-{<RESET> <BLUE>-if (!u->is_allowed_foo)<RESET> <BLUE>-return;<RESET> - <BRED>-foo(u);<RESET> - <BLUE>-}<RESET> - <BLUE>-<RESET> + <RED>-foo(u);<RESET> + <RED>-}<RESET> + <RED>-<RESET> int main()<RESET> {<RESET> foo();<RESET> @@ -1117,11 +1117,11 @@ test_expect_success 'detect malicious moved code, inside file' ' <RESET> <BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET> <BGREEN>+<RESET><BGREEN>{<RESET> - <YELLOW>+<RESET><YELLOW>foo(u);<RESET> + <GREEN>+<RESET><GREEN>foo(u);<RESET> <BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET> <BGREEN>+<RESET><BGREEN>return;<RESET> - <YELLOW>+<RESET><YELLOW>}<RESET> - <YELLOW>+<RESET> + <GREEN>+<RESET><GREEN>}<RESET> + <GREEN>+<RESET> int another_function()<RESET> {<RESET> bar();<RESET> @@ -1182,9 +1182,9 @@ test_expect_success 'plain moved code, inside file' ' test_expect_success 'detect permutations inside moved code -- dimmed_zebra' ' git reset --hard && cat <<-\EOF >lines.txt && - line 1 - line 2 - line 3 + long line 1 + long line 2 + long line 3 line 4 line 5 line 6 @@ -1195,9 +1195,9 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' ' line 11 line 12 line 13 - line 14 - line 15 - line 16 + long line 14 + long line 15 + long line 16 EOF git add lines.txt && git commit -m "add poetry" && @@ -1208,12 +1208,12 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' ' line 7 line 8 line 9 - line 1 - line 2 - line 3 - line 14 - line 15 - line 16 + long line 1 + long line 2 + long line 3 + long line 14 + long line 15 + long line 16 line 10 line 11 line 12 @@ -1227,35 +1227,36 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' ' test_config color.diff.newMovedDimmed "normal cyan" && test_config color.diff.oldMovedAlternativeDimmed "normal blue" && test_config color.diff.newMovedAlternativeDimmed "normal yellow" && - git diff HEAD --no-renames --color-moved=dimmed_zebra| test_decode_color >actual && + git diff HEAD --no-renames --color-moved=dimmed_zebra | + grep -v "index" | + test_decode_color >actual && cat <<-\EOF >expected && <BOLD>diff --git a/lines.txt b/lines.txt<RESET> - <BOLD>index 47ea9c3..ba96a38 100644<RESET> <BOLD>--- a/lines.txt<RESET> <BOLD>+++ b/lines.txt<RESET> <CYAN>@@ -1,16 +1,16 @@<RESET> - <BMAGENTA>-line 1<RESET> - <BMAGENTA>-line 2<RESET> - <BMAGENTA>-line 3<RESET> + <BMAGENTA>-long line 1<RESET> + <BMAGENTA>-long line 2<RESET> + <BMAGENTA>-long line 3<RESET> line 4<RESET> line 5<RESET> line 6<RESET> line 7<RESET> line 8<RESET> line 9<RESET> - <BCYAN>+<RESET><BCYAN>line 1<RESET> - <BCYAN>+<RESET><BCYAN>line 2<RESET> - <CYAN>+<RESET><CYAN>line 3<RESET> - <YELLOW>+<RESET><YELLOW>line 14<RESET> - <BYELLOW>+<RESET><BYELLOW>line 15<RESET> - <BYELLOW>+<RESET><BYELLOW>line 16<RESET> + <BCYAN>+<RESET><BCYAN>long line 1<RESET> + <BCYAN>+<RESET><BCYAN>long line 2<RESET> + <CYAN>+<RESET><CYAN>long line 3<RESET> + <YELLOW>+<RESET><YELLOW>long line 14<RESET> + <BYELLOW>+<RESET><BYELLOW>long line 15<RESET> + <BYELLOW>+<RESET><BYELLOW>long line 16<RESET> line 10<RESET> line 11<RESET> line 12<RESET> line 13<RESET> - <BMAGENTA>-line 14<RESET> - <BMAGENTA>-line 15<RESET> - <BMAGENTA>-line 16<RESET> + <BMAGENTA>-long line 14<RESET> + <BMAGENTA>-long line 15<RESET> + <BMAGENTA>-long line 16<RESET> EOF test_cmp expected actual ' @@ -1270,35 +1271,36 @@ test_expect_success 'cmd option assumes configured colored-moved' ' test_config color.diff.oldMovedAlternativeDimmed "normal blue" && test_config color.diff.newMovedAlternativeDimmed "normal yellow" && test_config diff.colorMoved zebra && - git diff HEAD --no-renames --color-moved| test_decode_color >actual && + git diff HEAD --no-renames --color-moved | + grep -v "index" | + test_decode_color >actual && cat <<-\EOF >expected && <BOLD>diff --git a/lines.txt b/lines.txt<RESET> - <BOLD>index 47ea9c3..ba96a38 100644<RESET> <BOLD>--- a/lines.txt<RESET> <BOLD>+++ b/lines.txt<RESET> <CYAN>@@ -1,16 +1,16 @@<RESET> - <MAGENTA>-line 1<RESET> - <MAGENTA>-line 2<RESET> - <MAGENTA>-line 3<RESET> + <MAGENTA>-long line 1<RESET> + <MAGENTA>-long line 2<RESET> + <MAGENTA>-long line 3<RESET> line 4<RESET> line 5<RESET> line 6<RESET> line 7<RESET> line 8<RESET> line 9<RESET> - <CYAN>+<RESET><CYAN>line 1<RESET> - <CYAN>+<RESET><CYAN>line 2<RESET> - <CYAN>+<RESET><CYAN>line 3<RESET> - <YELLOW>+<RESET><YELLOW>line 14<RESET> - <YELLOW>+<RESET><YELLOW>line 15<RESET> - <YELLOW>+<RESET><YELLOW>line 16<RESET> + <CYAN>+<RESET><CYAN>long line 1<RESET> + <CYAN>+<RESET><CYAN>long line 2<RESET> + <CYAN>+<RESET><CYAN>long line 3<RESET> + <YELLOW>+<RESET><YELLOW>long line 14<RESET> + <YELLOW>+<RESET><YELLOW>long line 15<RESET> + <YELLOW>+<RESET><YELLOW>long line 16<RESET> line 10<RESET> line 11<RESET> line 12<RESET> line 13<RESET> - <MAGENTA>-line 14<RESET> - <MAGENTA>-line 15<RESET> - <MAGENTA>-line 16<RESET> + <MAGENTA>-long line 14<RESET> + <MAGENTA>-long line 15<RESET> + <MAGENTA>-long line 16<RESET> EOF test_cmp expected actual ' @@ -1324,16 +1326,16 @@ line 1 line 2 line 3 line 4 -line 5 -line 6 -line 7 +long line 5 +long line 6 +long line 7 EOF git add lines.txt && git commit -m "add poetry" && cat <<\EOF >lines.txt && - line 5 - line 6 - line 7 + long line 5 + long line 6 + long line 7 line 1 line 2 line 3 @@ -1341,48 +1343,50 @@ line 4 EOF test_config color.diff.oldMoved "magenta" && test_config color.diff.newMoved "cyan" && - git diff HEAD --no-renames --color-moved| test_decode_color >actual && + git diff HEAD --no-renames --color-moved | + grep -v "index" | + test_decode_color >actual && cat <<-\EOF >expected && <BOLD>diff --git a/lines.txt b/lines.txt<RESET> - <BOLD>index 734156d..eb89ead 100644<RESET> <BOLD>--- a/lines.txt<RESET> <BOLD>+++ b/lines.txt<RESET> <CYAN>@@ -1,7 +1,7 @@<RESET> - <GREEN>+<RESET> <GREEN>line 5<RESET> - <GREEN>+<RESET> <GREEN>line 6<RESET> - <GREEN>+<RESET> <GREEN>line 7<RESET> + <GREEN>+<RESET> <GREEN>long line 5<RESET> + <GREEN>+<RESET> <GREEN>long line 6<RESET> + <GREEN>+<RESET> <GREEN>long line 7<RESET> line 1<RESET> line 2<RESET> line 3<RESET> line 4<RESET> - <RED>-line 5<RESET> - <RED>-line 6<RESET> - <RED>-line 7<RESET> + <RED>-long line 5<RESET> + <RED>-long line 6<RESET> + <RED>-long line 7<RESET> EOF test_cmp expected actual && - git diff HEAD --no-renames -w --color-moved| test_decode_color >actual && + git diff HEAD --no-renames -w --color-moved | + grep -v "index" | + test_decode_color >actual && cat <<-\EOF >expected && <BOLD>diff --git a/lines.txt b/lines.txt<RESET> - <BOLD>index 734156d..eb89ead 100644<RESET> <BOLD>--- a/lines.txt<RESET> <BOLD>+++ b/lines.txt<RESET> <CYAN>@@ -1,7 +1,7 @@<RESET> - <CYAN>+<RESET> <CYAN>line 5<RESET> - <CYAN>+<RESET> <CYAN>line 6<RESET> - <CYAN>+<RESET> <CYAN>line 7<RESET> + <CYAN>+<RESET> <CYAN>long line 5<RESET> + <CYAN>+<RESET> <CYAN>long line 6<RESET> + <CYAN>+<RESET> <CYAN>long line 7<RESET> line 1<RESET> line 2<RESET> line 3<RESET> line 4<RESET> - <MAGENTA>-line 5<RESET> - <MAGENTA>-line 6<RESET> - <MAGENTA>-line 7<RESET> + <MAGENTA>-long line 5<RESET> + <MAGENTA>-long line 6<RESET> + <MAGENTA>-long line 7<RESET> EOF test_cmp expected actual ' -test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' ' +test_expect_success '--color-moved block at end of diff output respects MIN_ALNUM_COUNT' ' git reset --hard && >bar && cat <<-\EOF >foo && @@ -1419,6 +1423,90 @@ test_expect_success '--color-moved block at end of diff output respects MIN_BLOC test_cmp expected actual ' +test_expect_success '--color-moved respects MIN_ALNUM_COUNT' ' + git reset --hard && + cat <<-\EOF >foo && + nineteen chars 456789 + irrelevant_line + twenty chars 234567890 + EOF + >bar && + git add foo bar && + git commit -m x && + + cat <<-\EOF >foo && + irrelevant_line + EOF + cat <<-\EOF >bar && + twenty chars 234567890 + nineteen chars 456789 + EOF + + git diff HEAD --color-moved=zebra --no-renames | + grep -v "index" | + test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1,2 @@<RESET> + <BOLD;CYAN>+<RESET><BOLD;CYAN>twenty chars 234567890<RESET> + <GREEN>+<RESET><GREEN>nineteen chars 456789<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,3 +1 @@<RESET> + <RED>-nineteen chars 456789<RESET> + irrelevant_line<RESET> + <BOLD;MAGENTA>-twenty chars 234567890<RESET> + EOF + + test_cmp expected actual +' + +test_expect_success '--color-moved treats adjacent blocks as separate for MIN_ALNUM_COUNT' ' + git reset --hard && + cat <<-\EOF >foo && + 7charsA + irrelevant_line + 7charsB + 7charsC + EOF + >bar && + git add foo bar && + git commit -m x && + + cat <<-\EOF >foo && + irrelevant_line + EOF + cat <<-\EOF >bar && + 7charsB + 7charsC + 7charsA + EOF + + git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual && + cat >expected <<-\EOF && + <BOLD>diff --git a/bar b/bar<RESET> + <BOLD>--- a/bar<RESET> + <BOLD>+++ b/bar<RESET> + <CYAN>@@ -0,0 +1,3 @@<RESET> + <GREEN>+<RESET><GREEN>7charsB<RESET> + <GREEN>+<RESET><GREEN>7charsC<RESET> + <GREEN>+<RESET><GREEN>7charsA<RESET> + <BOLD>diff --git a/foo b/foo<RESET> + <BOLD>--- a/foo<RESET> + <BOLD>+++ b/foo<RESET> + <CYAN>@@ -1,4 +1 @@<RESET> + <RED>-7charsA<RESET> + irrelevant_line<RESET> + <RED>-7charsB<RESET> + <RED>-7charsC<RESET> + EOF + + test_cmp expected actual +' + test_expect_success 'move detection with submodules' ' test_create_repo bananas && echo ripe >bananas/recipe && -- 2.14.1.480.gb18f417b89-goog ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v4 0/3] "diff --color-moved" with yet another heuristic 2017-08-16 1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan ` (2 preceding siblings ...) 2017-08-16 1:27 ` [PATCH v4 3/3] diff: define block by number of alphanumeric chars Jonathan Tan @ 2017-08-16 5:55 ` Stefan Beller 3 siblings, 0 replies; 30+ messages in thread From: Stefan Beller @ 2017-08-16 5:55 UTC (permalink / raw) To: Jonathan Tan; +Cc: git@vger.kernel.org, Junio C Hamano On Tue, Aug 15, 2017 at 6:27 PM, Jonathan Tan <jonathantanmy@google.com> wrote: > These patches are on sb/diff-color-move. > > Patches 1 and 2 are unchanged, except for some line wrapping in a test > in patch 2. > > This has been updated to use the same alphanumeric heuristic as blame > (20 alnum characters). I tried it out and I thought the results were > reasonable in a patch set that I'm working on (the pack-related function > refactoring one). ok, great! > As for refactoring blame.c and this file, I'm not sure where best to put > the new function, so I've added a NEEDSWORK for now. Let's just cook this heuristic a bit and see if we want to refactor them or tweak them differently. > As for detecting block boundaries in adjust_last_block(), I've left it > as-is for now. I think it's clearer if the parent function provides that > information, since it already tracks that. In addition, we avoid corner > cases such as what happens if the block is at the start of the diff > output (we must ensure that we don't read off the beginning edge, for > example). ok. Thanks! Stefan > > Jonathan Tan (3): > diff: avoid redundantly clearing a flag > diff: respect MIN_BLOCK_LENGTH for last block > diff: define block by number of alphanumeric chars > > Documentation/diff-options.txt | 8 +- > diff.c | 47 ++++++-- > diff.h | 2 +- > t/t4015-diff-whitespace.sh | 261 ++++++++++++++++++++++++++++++----------- > 4 files changed, 236 insertions(+), 82 deletions(-) > > -- > 2.14.1.480.gb18f417b89-goog > ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2017-08-16 5:55 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan 2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan 2017-08-14 17:13 ` Stefan Beller 2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan 2017-08-14 17:17 ` Stefan Beller 2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan 2017-08-14 17:22 ` Stefan Beller 2017-08-12 0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano 2017-08-14 17:29 ` Stefan Beller 2017-08-14 19:37 ` Junio C Hamano 2017-08-14 19:51 ` Stefan Beller 2017-08-15 17:07 ` Junio C Hamano 2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 1/3] diff: avoid redundantly clearing a flag Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan 2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan 2017-08-14 22:46 ` Stefan Beller 2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 1/3] diff: avoid redundantly clearing a flag Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan 2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan 2017-08-15 2:29 ` Stefan Beller 2017-08-15 19:54 ` Junio C Hamano 2017-08-15 20:06 ` Stefan Beller 2017-08-15 20:53 ` Junio C Hamano 2017-08-16 1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 1/3] diff: avoid redundantly clearing a flag Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan 2017-08-16 1:27 ` [PATCH v4 3/3] diff: define block by number of alphanumeric chars Jonathan Tan 2017-08-16 5:55 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Stefan Beller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.