* [PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
@ 2024-07-18 14:23 Phillip Wood via GitGitGadget
2024-07-18 16:29 ` Junio C Hamano
2024-07-20 16:01 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
0 siblings, 2 replies; 6+ messages in thread
From: Phillip Wood via GitGitGadget @ 2024-07-18 14:23 UTC (permalink / raw)
To: git; +Cc: Jeff King, Phillip Wood, Phillip Wood
From: Phillip Wood <phillip.wood@dunelm.org.uk>
When "add -p" parses diffs, it looks for context lines starting with a
single space. But when diff.suppressBlankEmpty is in effect, an empty
context line will omit the space, giving us a true empty line. This
confuses the parser, which is unable to split based on such a line.
It's tempting to say that we should just make sure that we generate a
diff without that option. However, although we do not parse hunks that
the user has manually edited with parse_diff() we do allow the user
to split such hunks. As POSIX calls the decision of whether to print the
space here "implementation-defined" we need to handle edited hunks where
empty context lines omit the space.
So let's handle both cases: a context line either starts with a space or
consists of a totally empty line by normalizing the first character to a
space when we parse them. Normalizing the first character rather than
changing the code to check for a space or newline will hopefully future
proof against introducing similar bugs if the code is changed.
Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
add-patch: handle splitting hunks with diff.suppressBlankEmpty
This is an alternative to jk/add-patch-with-suppress-blank-empty which
was recently discarded from next. I hope that normalizing the context
marker will simplify any future changes to the code.
While I was writing this I realized that we should be recalculating
hunk->splittable_into when we re-count the hunk header of it is edited
but I've left to for a future series.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1763%2Fphillipwood%2Fadd-p-suppress-blank-empty-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1763/phillipwood/add-p-suppress-blank-empty-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1763
add-patch.c | 17 ++++++++++++-----
t/t3701-add-interactive.sh | 11 +++++++++++
2 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/add-patch.c b/add-patch.c
index d8ea05ff108..13b2607f544 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
hunk->splittable_into++;
}
+/* Empty context lines may omit the leading ' ' */
+static int normalize_marker(char *p)
+{
+ return p[0] == '\n' || (p[0] == '\r' && p[1] == '\n') ? ' ' : p[0];
+}
+
static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
{
struct strvec args = STRVEC_INIT;
@@ -485,6 +491,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
while (p != pend) {
char *eol = memchr(p, '\n', pend - p);
const char *deleted = NULL, *mode_change = NULL;
+ char ch = normalize_marker(p);
if (!eol)
eol = pend;
@@ -532,7 +539,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
* Start counting into how many hunks this one can be
* split
*/
- marker = *p;
+ marker = ch;
} else if (hunk == &file_diff->head &&
starts_with(p, "new file")) {
file_diff->added = 1;
@@ -586,10 +593,10 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
(int)(eol - (plain->buf + file_diff->head.start)),
plain->buf + file_diff->head.start);
- if ((marker == '-' || marker == '+') && *p == ' ')
+ if ((marker == '-' || marker == '+') && ch == ' ')
hunk->splittable_into++;
- if (marker && *p != '\\')
- marker = *p;
+ if (marker && ch != '\\')
+ marker = ch;
p = eol == pend ? pend : eol + 1;
hunk->end = p - plain->buf;
@@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
context_line_count = 0;
while (splittable_into > 1) {
- ch = s->plain.buf[current];
+ ch = normalize_marker(s->plain.buf + current);
if (!ch)
BUG("buffer overrun while splitting hunks");
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 5d78868ac16..351dd2b4332 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -1164,4 +1164,15 @@ test_expect_success 'reset -p with unmerged files' '
test_must_be_empty staged
'
+test_expect_success 'hunk splitting works with diff.suppressBlankEmpty' '
+ test_config diff.suppressBlankEmpty true &&
+ test_write_lines a b c "" d e f >file &&
+ git add file &&
+ test_write_lines p q r "" s t u >file &&
+ test_write_lines s n y q | git add -p &&
+ git cat-file blob :file >actual &&
+ test_write_lines a b c "" s t u >expect &&
+ test_cmp expect actual
+'
+
test_done
base-commit: 790a17fb19d6eadd16c52e5d284a5c6921744766
--
gitgitgadget
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
2024-07-18 14:23 [PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty Phillip Wood via GitGitGadget
@ 2024-07-18 16:29 ` Junio C Hamano
2024-07-19 15:17 ` Phillip Wood
2024-07-20 16:01 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2024-07-18 16:29 UTC (permalink / raw)
To: Phillip Wood via GitGitGadget; +Cc: git, Jeff King, Phillip Wood
"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> When "add -p" parses diffs, it looks for context lines starting with a
> single space. But when diff.suppressBlankEmpty is in effect, an empty
> context line will omit the space, giving us a true empty line. This
> confuses the parser, which is unable to split based on such a line.
>
> It's tempting to say that we should just make sure that we generate a
> diff without that option. However, although we do not parse hunks that
> the user has manually edited with parse_diff() we do allow the user
> to split such hunks. As POSIX calls the decision of whether to print the
> space here "implementation-defined" we need to handle edited hunks where
> empty context lines omit the space.
>
> So let's handle both cases: a context line either starts with a space or
> consists of a totally empty line by normalizing the first character to a
> space when we parse them. Normalizing the first character rather than
> changing the code to check for a space or newline will hopefully future
> proof against introducing similar bugs if the code is changed.
>
> Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
Well written. Thanks for a pleasant read.
> @@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
> context_line_count = 0;
>
> while (splittable_into > 1) {
> - ch = s->plain.buf[current];
> + ch = normalize_marker(s->plain.buf + current);
I wondered if &s->plain.buf[current] is easier to grok, but what's
written already is good enough ;-)
There is another explicit mention of ' ' in merge_hunks(). Unless
we are normalizing the buffer here (which I do not think we do),
wouldn't we have to make sure that the code over there also knows
that an empty line in a patch is an unmodified empty line?
if (plain[overlap_end] != ' ')
return error(_("expected context line "
"#%d in\n%.*s"),
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
2024-07-18 16:29 ` Junio C Hamano
@ 2024-07-19 15:17 ` Phillip Wood
0 siblings, 0 replies; 6+ messages in thread
From: Phillip Wood @ 2024-07-19 15:17 UTC (permalink / raw)
To: Junio C Hamano, Phillip Wood via GitGitGadget
Cc: git, Jeff King, Phillip Wood
Hi Junio
On 18/07/2024 17:29, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> When "add -p" parses diffs, it looks for context lines starting with a
>> single space. But when diff.suppressBlankEmpty is in effect, an empty
>> context line will omit the space, giving us a true empty line. This
>> confuses the parser, which is unable to split based on such a line.
>>
>> It's tempting to say that we should just make sure that we generate a
>> diff without that option. However, although we do not parse hunks that
>> the user has manually edited with parse_diff() we do allow the user
>> to split such hunks. As POSIX calls the decision of whether to print the
>> space here "implementation-defined" we need to handle edited hunks where
>> empty context lines omit the space.
>>
>> So let's handle both cases: a context line either starts with a space or
>> consists of a totally empty line by normalizing the first character to a
>> space when we parse them. Normalizing the first character rather than
>> changing the code to check for a space or newline will hopefully future
>> proof against introducing similar bugs if the code is changed.
>>
>> Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
>> Helped-by: Jeff King <peff@peff.net>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>
> Well written. Thanks for a pleasant read.
Thanks to Peff for letting me steal his commit message
>> @@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
>> context_line_count = 0;
>>
>> while (splittable_into > 1) {
>> - ch = s->plain.buf[current];
>> + ch = normalize_marker(s->plain.buf + current);
>
> I wondered if &s->plain.buf[current] is easier to grok, but what's
> written already is good enough ;-)
Yes I think that would be better
> There is another explicit mention of ' ' in merge_hunks(). Unless
> we are normalizing the buffer here (which I do not think we do),
> wouldn't we have to make sure that the code over there also knows
> that an empty line in a patch is an unmodified empty line?
>
> if (plain[overlap_end] != ' ')
> return error(_("expected context line "
> "#%d in\n%.*s"),
Oh dear, I'm not sure how I missed that. I'll fix that and update the
test to make sure it exercises that code path as well.
Thanks for your comments
Phillip
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 0/2] add-patch: handle splitting hunks with diff.suppressBlankEmpty
2024-07-18 14:23 [PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty Phillip Wood via GitGitGadget
2024-07-18 16:29 ` Junio C Hamano
@ 2024-07-20 16:01 ` Phillip Wood via GitGitGadget
2024-07-20 16:01 ` [PATCH v2 1/2] " Phillip Wood via GitGitGadget
2024-07-20 16:02 ` [PATCH v2 2/2] add-patch: use normalize_marker() when recounting edited hunk Phillip Wood via GitGitGadget
1 sibling, 2 replies; 6+ messages in thread
From: Phillip Wood via GitGitGadget @ 2024-07-20 16:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Phillip Wood, Phillip Wood
This is an alternative to jk/add-patch-with-suppress-blank-empty which was
recently discarded from next. I hope that normalizing the context marker
will simplify any future changes to the code.
Changes since V1
* Updated merge_hunks() to use normalize_marker() as spotted by Junio
* Updated the test so it checks merge_hunks() as well.
Phillip Wood (2):
add-patch: handle splitting hunks with diff.suppressBlankEmpty
add-patch: use normalize_marker() when recounting edited hunk
add-patch.c | 23 +++++++++++++++--------
t/t3701-add-interactive.sh | 19 +++++++++++++++++++
2 files changed, 34 insertions(+), 8 deletions(-)
base-commit: 790a17fb19d6eadd16c52e5d284a5c6921744766
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1763%2Fphillipwood%2Fadd-p-suppress-blank-empty-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1763/phillipwood/add-p-suppress-blank-empty-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1763
Range-diff vs v1:
1: fd09e66727a ! 1: 34d8fd44a97 add-patch: handle splitting hunks with diff.suppressBlankEmpty
@@ add-patch.c: static void complete_file(char marker, struct hunk *hunk)
}
+/* Empty context lines may omit the leading ' ' */
-+static int normalize_marker(char *p)
++static int normalize_marker(const char *p)
+{
+ return p[0] == '\n' || (p[0] == '\r' && p[1] == '\n') ? ' ' : p[0];
+}
@@ add-patch.c: static int parse_diff(struct add_p_state *s, const struct pathspec
p = eol == pend ? pend : eol + 1;
hunk->end = p - plain->buf;
+@@ add-patch.c: static int merge_hunks(struct add_p_state *s, struct file_diff *file_diff,
+ (int)(hunk->end - hunk->start),
+ plain + hunk->start);
+
+- if (plain[overlap_end] != ' ')
++ if (normalize_marker(&plain[overlap_end]) != ' ')
+ return error(_("expected context line "
+ "#%d in\n%.*s"),
+ (int)(j + 1),
@@ add-patch.c: static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
context_line_count = 0;
while (splittable_into > 1) {
- ch = s->plain.buf[current];
-+ ch = normalize_marker(s->plain.buf + current);
++ ch = normalize_marker(&s->plain.buf[current]);
if (!ch)
BUG("buffer overrun while splitting hunks");
@@ t/t3701-add-interactive.sh: test_expect_success 'reset -p with unmerged files' '
+test_expect_success 'hunk splitting works with diff.suppressBlankEmpty' '
+ test_config diff.suppressBlankEmpty true &&
-+ test_write_lines a b c "" d e f >file &&
++ write_script fake-editor.sh <<-\EOF &&
++ tr F G <"$1" >"$1.tmp" &&
++ mv "$1.tmp" "$1"
++ EOF
++
++ test_write_lines a b "" c d "" e f "" >file &&
+ git add file &&
-+ test_write_lines p q r "" s t u >file &&
-+ test_write_lines s n y q | git add -p &&
++ test_write_lines A b "" c D "" e F "" >file &&
++ (
++ test_set_editor "$(pwd)/fake-editor.sh" &&
++ test_write_lines s n y e q | git add -p file
++ ) &&
+ git cat-file blob :file >actual &&
-+ test_write_lines a b c "" s t u >expect &&
++ test_write_lines a b "" c D "" e G "" >expect &&
+ test_cmp expect actual
+'
+
-: ----------- > 2: 7bdcd2df012 add-patch: use normalize_marker() when recounting edited hunk
--
gitgitgadget
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH v2 1/2] add-patch: handle splitting hunks with diff.suppressBlankEmpty
2024-07-20 16:01 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
@ 2024-07-20 16:01 ` Phillip Wood via GitGitGadget
2024-07-20 16:02 ` [PATCH v2 2/2] add-patch: use normalize_marker() when recounting edited hunk Phillip Wood via GitGitGadget
1 sibling, 0 replies; 6+ messages in thread
From: Phillip Wood via GitGitGadget @ 2024-07-20 16:01 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Phillip Wood, Phillip Wood,
Phillip Wood
From: Phillip Wood <phillip.wood@dunelm.org.uk>
When "add -p" parses diffs, it looks for context lines starting with a
single space. But when diff.suppressBlankEmpty is in effect, an empty
context line will omit the space, giving us a true empty line. This
confuses the parser, which is unable to split based on such a line.
It's tempting to say that we should just make sure that we generate a
diff without that option. However, although we do not parse hunks that
the user has manually edited with parse_diff() we do allow the user
to split such hunks. As POSIX calls the decision of whether to print the
space here "implementation-defined" we need to handle edited hunks where
empty context lines omit the space.
So let's handle both cases: a context line either starts with a space or
consists of a totally empty line by normalizing the first character to a
space when we parse them. Normalizing the first character rather than
changing the code to check for a space or newline will hopefully future
proof against introducing similar bugs if the code is changed.
Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
add-patch.c | 19 +++++++++++++------
t/t3701-add-interactive.sh | 19 +++++++++++++++++++
2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/add-patch.c b/add-patch.c
index d8ea05ff108..8feb719483f 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
hunk->splittable_into++;
}
+/* Empty context lines may omit the leading ' ' */
+static int normalize_marker(const char *p)
+{
+ return p[0] == '\n' || (p[0] == '\r' && p[1] == '\n') ? ' ' : p[0];
+}
+
static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
{
struct strvec args = STRVEC_INIT;
@@ -485,6 +491,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
while (p != pend) {
char *eol = memchr(p, '\n', pend - p);
const char *deleted = NULL, *mode_change = NULL;
+ char ch = normalize_marker(p);
if (!eol)
eol = pend;
@@ -532,7 +539,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
* Start counting into how many hunks this one can be
* split
*/
- marker = *p;
+ marker = ch;
} else if (hunk == &file_diff->head &&
starts_with(p, "new file")) {
file_diff->added = 1;
@@ -586,10 +593,10 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
(int)(eol - (plain->buf + file_diff->head.start)),
plain->buf + file_diff->head.start);
- if ((marker == '-' || marker == '+') && *p == ' ')
+ if ((marker == '-' || marker == '+') && ch == ' ')
hunk->splittable_into++;
- if (marker && *p != '\\')
- marker = *p;
+ if (marker && ch != '\\')
+ marker = ch;
p = eol == pend ? pend : eol + 1;
hunk->end = p - plain->buf;
@@ -813,7 +820,7 @@ static int merge_hunks(struct add_p_state *s, struct file_diff *file_diff,
(int)(hunk->end - hunk->start),
plain + hunk->start);
- if (plain[overlap_end] != ' ')
+ if (normalize_marker(&plain[overlap_end]) != ' ')
return error(_("expected context line "
"#%d in\n%.*s"),
(int)(j + 1),
@@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
context_line_count = 0;
while (splittable_into > 1) {
- ch = s->plain.buf[current];
+ ch = normalize_marker(&s->plain.buf[current]);
if (!ch)
BUG("buffer overrun while splitting hunks");
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 5d78868ac16..9a48933cecf 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -1164,4 +1164,23 @@ test_expect_success 'reset -p with unmerged files' '
test_must_be_empty staged
'
+test_expect_success 'hunk splitting works with diff.suppressBlankEmpty' '
+ test_config diff.suppressBlankEmpty true &&
+ write_script fake-editor.sh <<-\EOF &&
+ tr F G <"$1" >"$1.tmp" &&
+ mv "$1.tmp" "$1"
+ EOF
+
+ test_write_lines a b "" c d "" e f "" >file &&
+ git add file &&
+ test_write_lines A b "" c D "" e F "" >file &&
+ (
+ test_set_editor "$(pwd)/fake-editor.sh" &&
+ test_write_lines s n y e q | git add -p file
+ ) &&
+ git cat-file blob :file >actual &&
+ test_write_lines a b "" c D "" e G "" >expect &&
+ test_cmp expect actual
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH v2 2/2] add-patch: use normalize_marker() when recounting edited hunk
2024-07-20 16:01 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
2024-07-20 16:01 ` [PATCH v2 1/2] " Phillip Wood via GitGitGadget
@ 2024-07-20 16:02 ` Phillip Wood via GitGitGadget
1 sibling, 0 replies; 6+ messages in thread
From: Phillip Wood via GitGitGadget @ 2024-07-20 16:02 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Phillip Wood, Phillip Wood,
Phillip Wood
From: Phillip Wood <phillip.wood@dunelm.org.uk>
After the user has edited a hunk the number of lines in the pre- and
post- image lines is recounted the hunk header can be updated before
passing the hunk to "git apply". The recounting code correctly handles
empty context lines where the leading ' ' is omitted by treating '\n'
and '\r' as context lines.
Update this code to use normalize_marker() so that the handling of empty
context lines is consistent with the rest of the hunk parsing
code. There is a small change in behavior as normalize_marker() only
treats "\r\n" as an empty context line rather than any line starting
with '\r'. This should not matter in practice as Macs have used Unix
line endings since MacOs 10 was released in 2001 and if it transpires
that someone is still using an earlier version of MacOs where lines end
with '\r' then we will need to change the handling of '\r' in
normalize_marker() anyway.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
add-patch.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/add-patch.c b/add-patch.c
index 8feb719483f..2a72ad63d14 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -1178,14 +1178,14 @@ static ssize_t recount_edited_hunk(struct add_p_state *s, struct hunk *hunk,
header->old_count = header->new_count = 0;
for (i = hunk->start; i < hunk->end; ) {
- switch (s->plain.buf[i]) {
+ switch(normalize_marker(&s->plain.buf[i])) {
case '-':
header->old_count++;
break;
case '+':
header->new_count++;
break;
- case ' ': case '\r': case '\n':
+ case ' ':
header->old_count++;
header->new_count++;
break;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-07-20 16:02 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-18 14:23 [PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty Phillip Wood via GitGitGadget
2024-07-18 16:29 ` Junio C Hamano
2024-07-19 15:17 ` Phillip Wood
2024-07-20 16:01 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
2024-07-20 16:01 ` [PATCH v2 1/2] " Phillip Wood via GitGitGadget
2024-07-20 16:02 ` [PATCH v2 2/2] add-patch: use normalize_marker() when recounting edited hunk Phillip Wood via GitGitGadget
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).