* [PATCH] word diff: handle zero length matches
@ 2021-05-04 9:27 Phillip Wood via GitGitGadget
0 siblings, 0 replies; only message in thread
From: Phillip Wood via GitGitGadget @ 2021-05-04 9:27 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Phillip Wood, Phillip Wood
From: Phillip Wood <phillip.wood@dunelm.org.uk>
If find_word_boundaries() encounters a zero length match (which can be
caused by matching a newline or using '*' instead of '+' in the regex)
we stop splitting the input into words which generates an inaccurate
diff. To fix this increment the start point when there is a zero
length match and try a new match. This is safe as posix regular
expressions always return the longest available match so a zero length
match means there are no longer matches available from the current
position.
Commit bf82940dbf1 (color-words: enable REG_NEWLINE to help user,
2009-01-17) prevented matching newlines in negated character classes
but it is still possible for the user to have an explicit newline
match in the regex which could cause a zero length match.
One could argue that having explicit newline matches or using '*'
rather than '+' are user errors but it seems to be better to work
round them than produce inaccurate diffs.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
word diff: handle zero length matches
If find_word_boundaries() encounters a zero length match (which can be
caused by matching a newline or using '*' instead of '+' in the regex)
we stop splitting the input into words which generates an inaccurate
diff. To fix this increment the start point when there is a zero length
match and try a new match.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-947%2Fphillipwood%2Fwip%2Fword-diff-zero-length-matches-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-947/phillipwood/wip/word-diff-zero-length-matches-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/947
diff.c | 10 +++++++---
t/t4034-diff-words.sh | 5 +++++
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/diff.c b/diff.c
index 4acccd9d7edb..c8b1d724349c 100644
--- a/diff.c
+++ b/diff.c
@@ -2053,7 +2053,7 @@ static void fn_out_diff_words_aux(void *priv,
static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
int *begin, int *end)
{
- if (word_regex && *begin < buffer->size) {
+ while (word_regex && *begin < buffer->size) {
regmatch_t match[1];
if (!regexec_buf(word_regex, buffer->ptr + *begin,
buffer->size - *begin, 1, match, 0)) {
@@ -2061,9 +2061,13 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
'\n', match[0].rm_eo - match[0].rm_so);
*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
*begin += match[0].rm_so;
- return *begin >= *end;
+ if (*begin == *end)
+ (*begin)++;
+ else
+ return *begin > *end;
+ } else {
+ return -1;
}
- return -1;
}
/* find the next word */
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index ee7721ab9135..561c582d1615 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -184,6 +184,11 @@ test_expect_success 'word diff with a regular expression' '
word_diff --color-words="[a-z]+"
'
+test_expect_success 'word diff with zero length matches' '
+ cp expect.letter-runs-are-words expect &&
+ word_diff --color-words="[a-z${LF}]*"
+'
+
test_expect_success 'set up a diff driver' '
git config diff.testdriver.wordRegex "[^[:space:]]" &&
cat <<-\EOF >.gitattributes
base-commit: 7e391989789db82983665667013a46eabc6fc570
--
gitgitgadget
^ permalink raw reply related [flat|nested] only message in thread
only message in thread, other threads:[~2021-05-04 9:27 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-05-04 9:27 [PATCH] word diff: handle zero length matches Phillip Wood via GitGitGadget
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.