From: "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Phillip Wood <phillip.wood@dunelm.org.uk>,
Phillip Wood <phillip.wood@dunelm.org.uk>
Subject: [PATCH] word diff: handle zero length matches
Date: Tue, 04 May 2021 09:27:34 +0000 [thread overview]
Message-ID: <pull.947.git.1620120455364.gitgitgadget@gmail.com> (raw)
From: Phillip Wood <phillip.wood@dunelm.org.uk>
If find_word_boundaries() encounters a zero length match (which can be
caused by matching a newline or using '*' instead of '+' in the regex)
we stop splitting the input into words which generates an inaccurate
diff. To fix this increment the start point when there is a zero
length match and try a new match. This is safe as posix regular
expressions always return the longest available match so a zero length
match means there are no longer matches available from the current
position.
Commit bf82940dbf1 (color-words: enable REG_NEWLINE to help user,
2009-01-17) prevented matching newlines in negated character classes
but it is still possible for the user to have an explicit newline
match in the regex which could cause a zero length match.
One could argue that having explicit newline matches or using '*'
rather than '+' are user errors but it seems to be better to work
round them than produce inaccurate diffs.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
word diff: handle zero length matches
If find_word_boundaries() encounters a zero length match (which can be
caused by matching a newline or using '*' instead of '+' in the regex)
we stop splitting the input into words which generates an inaccurate
diff. To fix this increment the start point when there is a zero length
match and try a new match.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-947%2Fphillipwood%2Fwip%2Fword-diff-zero-length-matches-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-947/phillipwood/wip/word-diff-zero-length-matches-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/947
diff.c | 10 +++++++---
t/t4034-diff-words.sh | 5 +++++
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/diff.c b/diff.c
index 4acccd9d7edb..c8b1d724349c 100644
--- a/diff.c
+++ b/diff.c
@@ -2053,7 +2053,7 @@ static void fn_out_diff_words_aux(void *priv,
static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
int *begin, int *end)
{
- if (word_regex && *begin < buffer->size) {
+ while (word_regex && *begin < buffer->size) {
regmatch_t match[1];
if (!regexec_buf(word_regex, buffer->ptr + *begin,
buffer->size - *begin, 1, match, 0)) {
@@ -2061,9 +2061,13 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
'\n', match[0].rm_eo - match[0].rm_so);
*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
*begin += match[0].rm_so;
- return *begin >= *end;
+ if (*begin == *end)
+ (*begin)++;
+ else
+ return *begin > *end;
+ } else {
+ return -1;
}
- return -1;
}
/* find the next word */
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index ee7721ab9135..561c582d1615 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -184,6 +184,11 @@ test_expect_success 'word diff with a regular expression' '
word_diff --color-words="[a-z]+"
'
+test_expect_success 'word diff with zero length matches' '
+ cp expect.letter-runs-are-words expect &&
+ word_diff --color-words="[a-z${LF}]*"
+'
+
test_expect_success 'set up a diff driver' '
git config diff.testdriver.wordRegex "[^[:space:]]" &&
cat <<-\EOF >.gitattributes
base-commit: 7e391989789db82983665667013a46eabc6fc570
--
gitgitgadget
reply other threads:[~2021-05-04 9:27 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.947.git.1620120455364.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=phillip.wood@dunelm.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.