git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git crashes in `git commit --patch` with diff.suppressBlankEmpty = true
@ 2024-07-03 14:41 Ilya Tumaykin
  2024-07-04 13:14 ` Phillip Wood
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Tumaykin @ 2024-07-03 14:41 UTC (permalink / raw)
  To: git

Hello.

`git commit --patch` crashes with diff.suppressBlankEmpty option enabled 
under certain conditions.


Steps to reproduce:
1. Prepare .gitconfig:
[user]
	name = User
	email = user@example.com
[diff]
	suppressBlankEmpty = true

2. Initialize repo:
$ mkdir git_bug && cd git_bug
$ git init
$ echo -e 'test\n\n test \n\ntest' > test.txt
$ git add test.txt
$ git commit test.txt -m 'initial'

3. Make changes:
$ echo -e 'test\n\n test\ntest\n \n' > test.txt

4. Try to commit new changes
$ git commit --patch test.txt

5. Try to split the first hunk, press 's' in the git-commit interactive 
interface.


Actual results:
diff --git a/test.txt b/test.txt
index 366cd4b..611ca9d 100644
--- a/test.txt
+++ b/test.txt
@@ -1,5 +1,6 @@
  test

- test
-
+ test
  test
+
+
(1/1) Stage this hunk [y,n,q,a,d,s,e,p,?]? s
BUG: add-patch.c:994: unhandled diff marker: '
'
Aborted (core dumped)


Expected results:
git-commit splits the hunk and continues.


Comment:
If I set diff.suppressBlankEmpty = false, then I get the expected behavior.

git --version: 2.45.2
OS: up-to-date Fedora 40

-- 
Best regards.
Ilya Tumaykin.

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: git crashes in `git commit --patch` with diff.suppressBlankEmpty = true
  2024-07-03 14:41 git crashes in `git commit --patch` with diff.suppressBlankEmpty = true Ilya Tumaykin
@ 2024-07-04 13:14 ` Phillip Wood
  2024-07-05 16:39   ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Phillip Wood @ 2024-07-04 13:14 UTC (permalink / raw)
  To: Ilya Tumaykin, git; +Cc: Junio C Hamano, Jeff King, Johannes Schindelin

Hi Ilya

Thanks for reporting this and for the example reproduction. The problem 
is that the code that splits hunks expects context lines to begin with ' 
'. We could fix that fairly simply but I wonder if we should change 
'diff-index' and 'diff-files' to ignore diff.suppressBlankEmpty instead. 
The plumbing diff commands already ignore most of the options that 
change diff output so I'm not quite sure why they respect this 
particular config setting. I've cc'd a few people to see what they think.

Best Wishes

Phillip

On 03/07/2024 15:41, Ilya Tumaykin wrote:
> Hello.
> 
> `git commit --patch` crashes with diff.suppressBlankEmpty option enabled 
> under certain conditions.
> 
> 
> Steps to reproduce:
> 1. Prepare .gitconfig:
> [user]
>      name = User
>      email = user@example.com
> [diff]
>      suppressBlankEmpty = true
> 
> 2. Initialize repo:
> $ mkdir git_bug && cd git_bug
> $ git init
> $ echo -e 'test\n\n test \n\ntest' > test.txt
> $ git add test.txt
> $ git commit test.txt -m 'initial'
> 
> 3. Make changes:
> $ echo -e 'test\n\n test\ntest\n \n' > test.txt
> 
> 4. Try to commit new changes
> $ git commit --patch test.txt
> 
> 5. Try to split the first hunk, press 's' in the git-commit interactive 
> interface.
> 
> 
> Actual results:
> diff --git a/test.txt b/test.txt
> index 366cd4b..611ca9d 100644
> --- a/test.txt
> +++ b/test.txt
> @@ -1,5 +1,6 @@
>   test
> 
> - test
> -
> + test
>   test
> +
> +
> (1/1) Stage this hunk [y,n,q,a,d,s,e,p,?]? s
> BUG: add-patch.c:994: unhandled diff marker: '
> '
> Aborted (core dumped)
> 
> 
> Expected results:
> git-commit splits the hunk and continues.
> 
> 
> Comment:
> If I set diff.suppressBlankEmpty = false, then I get the expected behavior.
> 
> git --version: 2.45.2
> OS: up-to-date Fedora 40
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git crashes in `git commit --patch` with diff.suppressBlankEmpty = true
  2024-07-04 13:14 ` Phillip Wood
@ 2024-07-05 16:39   ` Junio C Hamano
  2024-07-10  9:36     ` [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2024-07-05 16:39 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Ilya Tumaykin, git, Jeff King, Johannes Schindelin

Phillip Wood <phillip.wood123@gmail.com> writes:

> ... but I wonder if we
> should change 'diff-index' and 'diff-files' to ignore
> diff.suppressBlankEmpty instead. The plumbing diff commands already
> ignore most of the options that change diff output so I'm not quite
> sure why they respect this particular config setting.

Very true.  Even though POSIX adopted the text:

    It is implementation-defined whether an empty unaffected line is
    written as an empty line or a line containing a single <space>
    character.

it merely allows the implementation to show an empty unaffected line
as an empty line (where traditionally such an output was not allowed),
and we probably want to give priority to consistent and stable output.
If the addition of diff.suppressBlankEmpty were done in the usually
recommended way to first add command line option to prove that the
feature is useful and then add configuration variable to pretend as
if it were passed, then it would have been perfect.  We then could
have made the plumbing to completely ignore the configuration to
make the output more stable, while allowing script writers a choice
to invoke plumbing commands with explicit comand line options.

But that was not what happened, unfortunately.

If we really wanted to force the world line to where we did what we
should have done back then, I would say we need to do a two-step
transition.

 - Add the --[no-]suppress-blank-empty option from the command line
   to all commands in the diff family.  Plumbing diff trio will
   still pay attention to diff.suppressBlankEmpty but when they see
   it is set to any non-default value (i.e. true) without being set
   by the new command line option, we loudly warn that we will fix
   this historical mistake in Git 3.0 and encourage script writers
   to update their invocation of plumbing diff trio to use the
   command line to custimze.

 - At Git 3.0, plumbing diff trio stops paying attention to the
   diff.suppressBlankEmpty configuration.  By this time, the warning
   we gave in earlier versions would have helped existing scripts to
   migrate away from relying on it and if they want they would
   instead explicitly pass the command line option, so we stop
   warning.

As to the "commit -p" issue, I think the patch parser is in the
wrong and needs to be corrected, period.  As long as the patches
given as input are well-formed, we should be prepared to grok
them (we even allow manual editing of patches, right?).

Thanks.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-05 16:39   ` Junio C Hamano
@ 2024-07-10  9:36     ` Jeff King
  2024-07-10 13:46       ` Phillip Wood
  2024-07-10 17:06       ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2024-07-10  9:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Ilya Tumaykin, git, Johannes Schindelin

On Fri, Jul 05, 2024 at 09:39:52AM -0700, Junio C Hamano wrote:

> As to the "commit -p" issue, I think the patch parser is in the
> wrong and needs to be corrected, period.  As long as the patches
> given as input are well-formed, we should be prepared to grok
> them (we even allow manual editing of patches, right?).

Maybe this?

-- >8 --
Subject: add-patch: handle splitting hunks with diff.suppressBlankEmpty

When "add -p" parses diffs, it looks for context lines starting with a
single space. But when diff.suppressBlankEmpty is in effect, an empty
context line will omit the space, giving us a true empty line. This
confuses the parser, which is unable to split based on such a line.

It's tempting to say that we should just make sure that we generate a
diff without that option. But we may parse diffs not only generated by
Git, but ones that users have manually edited. And POSIX calls the
decision of whether to print the space here "implementation-defined".

So let's handle both cases: a context line either starts with a space or
consists of a totally empty line.

Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
I'm a little worried that this creates ambiguities, since I don't think
we are careful about following the hunk header's line counts. Imagine
you had an input like this:

  @@ -1,2 +1,2 @@
  -old
  +new
   stuff

  some garbage

We obviously know that "some garbage" is not a context line and is just
trailing junk, because it does not begin with "-", "+" or space. But
what about the blank line in between? It looks like an empty context
line, but we can only know that it is not by respecting the counts in
the hunk header.

I don't think we'd ever generate this ourselves, but could somebody
manually edit a hunk into this shape? When I tried it in practice, it
looks like we fail to apply the result even before my patch, though. I'm
not sure why that is. If I put "some garbage" without the blank line, we
correctly realize it should be discarded. It's possible I'm just holding
it wrong.

 add-patch.c                |  8 ++++----
 t/t3701-add-interactive.sh | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 6e176cd21a..7beead1d0a 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -588,7 +588,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			    (int)(eol - (plain->buf + file_diff->head.start)),
 			    plain->buf + file_diff->head.start);
 
-		if ((marker == '-' || marker == '+') && *p == ' ')
+		if ((marker == '-' || marker == '+') && (*p == ' ' || *p == '\n'))
 			hunk->splittable_into++;
 		if (marker && *p != '\\')
 			marker = *p;
@@ -964,7 +964,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
 		 * Is this the first context line after a chain of +/- lines?
 		 * Then record the start of the next split hunk.
 		 */
-		if ((marker == '-' || marker == '+') && ch == ' ') {
+		if ((marker == '-' || marker == '+') && (ch == ' ' || ch == '\n')) {
 			first = 0;
 			hunk[1].start = current;
 			if (colored)
@@ -979,14 +979,14 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
 		 * Then just increment the appropriate counter and continue
 		 * with the next line.
 		 */
-		if (marker != ' ' || (ch != '-' && ch != '+')) {
+		if ((marker != ' ' && marker != '\n') || (ch != '-' && ch != '+')) {
 next_hunk_line:
 			/* Comment lines are attached to the previous line */
 			if (ch == '\\')
 				ch = marker ? marker : ' ';
 
 			/* current hunk not done yet */
-			if (ch == ' ')
+			if (ch == ' ' || ch == '\n')
 				context_line_count++;
 			else if (ch == '-')
 				header->old_count++;
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 5d78868ac1..92c8e6dc8c 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -1164,4 +1164,36 @@ test_expect_success 'reset -p with unmerged files' '
 	test_must_be_empty staged
 '
 
+test_expect_success 'splitting handles diff.suppressBlankEmpty' '
+	test_when_finished "git reset --hard" &&
+	cat >file <<-\EOF &&
+	1
+	2
+
+	3
+	4
+	EOF
+	git add file &&
+
+	cat >file <<-\EOF &&
+	one
+	two
+
+	three
+	four
+	EOF
+	test_write_lines s n y |
+	git -c diff.suppressBlankEmpty=true add -p &&
+
+	git cat-file blob :file >actual &&
+	cat >expect <<-\EOF &&
+	1
+	2
+
+	three
+	four
+	EOF
+	test_cmp expect actual
+'
+
 test_done
-- 
2.45.2.1249.gb036353db5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-10  9:36     ` [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty Jeff King
@ 2024-07-10 13:46       ` Phillip Wood
  2024-07-11 21:26         ` Jeff King
  2024-07-10 17:06       ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Phillip Wood @ 2024-07-10 13:46 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Ilya Tumaykin, git, Johannes Schindelin

Hi Peff

On 10/07/2024 10:36, Jeff King wrote:
> Subject: add-patch: handle splitting hunks with diff.suppressBlankEmpty

> When "add -p" parses diffs, it looks for context lines starting with a
> single space. But when diff.suppressBlankEmpty is in effect, an empty
> context line will omit the space, giving us a true empty line. This
> confuses the parser, which is unable to split based on such a line.

> It's tempting to say that we should just make sure that we generate a
> diff without that option. But we may parse diffs not only generated by
> Git, but ones that users have manually edited. And POSIX calls the
> decision of whether to print the space here "implementation-defined".

Do we ever parse an edited hunk with this code? I'm not sure there is a
way of splitting a hunk after it has been edited as I don't think we
ever display it again.

> So let's handle both cases: a context line either starts with a space or
> consists of a totally empty line.
> 
> Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I'm a little worried that this creates ambiguities, since I don't think
> we are careful about following the hunk header's line counts. Imagine
> you had an input like this:
> 
>    @@ -1,2 +1,2 @@> 
>    -old
>    +new
>     stuff
> 
>    some garbage
> 
> We obviously know that "some garbage" is not a context line and is just
> trailing junk, because it does not begin with "-", "+" or space. But
> what about the blank line in between? It looks like an empty context
> line, but we can only know that it is not by respecting the counts in
> the hunk header.
> 
> I don't think we'd ever generate this ourselves, but could somebody
> manually edit a hunk into this shape? When I tried it in practice, it
> looks like we fail to apply the result even before my patch, though. I'm> not sure why that is. If I put "some garbage" without the blank line, we
> correctly realize it should be discarded. It's possible I'm just holding
> it wrong.

When we recount the hunk after it has been edited we ignore lines that
don't begin with '+', '-', ' ', or '\n' so if you add some garbage at
the end of the hunk the recounted hunk header excludes it when it gets
applied.

I think your patch looks good. I did wonder if we wanted to fix this
by normalizing context lines instead as shown in the diff below. That
might make it less likely to miss adding "|| '\n'" in future code that
is looking for a context line but I don't have a strong preference
either way.

Best Wishes

Phillip

---- >8 ----
diff --git a/add-patch.c b/add-patch.c
index d8ea05ff108..795aa772b7a 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
  		hunk->splittable_into++;
  }
  
+/* Empty context lines may omit the leading ' ' */
+static int normalize_marker(char marker)
+{
+	return marker == '\n' ? ' ' : marker;
+}
+
  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
  {
  	struct strvec args = STRVEC_INIT;
@@ -485,6 +491,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
  	while (p != pend) {
  		char *eol = memchr(p, '\n', pend - p);
  		const char *deleted = NULL, *mode_change = NULL;
+		char ch = normalize_marker(*p);
  
  		if (!eol)
  			eol = pend;
@@ -532,7 +539,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
  			 * Start counting into how many hunks this one can be
  			 * split
  			 */
-			marker = *p;
+			marker = ch;
  		} else if (hunk == &file_diff->head &&
  			   starts_with(p, "new file")) {
  			file_diff->added = 1;
@@ -586,10 +593,10 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
  			    (int)(eol - (plain->buf + file_diff->head.start)),
  			    plain->buf + file_diff->head.start);
  
-		if ((marker == '-' || marker == '+') && *p == ' ')
+		if ((marker == '-' || marker == '+') && ch == ' ')
  			hunk->splittable_into++;
-		if (marker && *p != '\\')
-			marker = *p;
+		if (marker && ch != '\\')
+			marker = ch;
  
  		p = eol == pend ? pend : eol + 1;
  		hunk->end = p - plain->buf;
@@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
  	context_line_count = 0;
  
  	while (splittable_into > 1) {
-		ch = s->plain.buf[current];
+		ch = normalize_marker(s->plain.buf[current]);
  
  		if (!ch)
  			BUG("buffer overrun while splitting hunks");
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 5d78868ac16..385c246baf0 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -1164,4 +1164,31 @@ test_expect_success 'reset -p with unmerged files' '
  	test_must_be_empty staged
  '
  
+test_expect_success 'hunk splitting works with diff.suppressBlankEmpty' '
+	cat >expect <<-\EOF &&
+	diff --git a/file b/file
+	index 777b174..ebc9684 100755
+	--- a/file
+	+++ b/file
+	@@ -2,6 +2,6 @@ p
+	 q
+	 r
+
+	-d
+	-e
+	-f
+	+s
+	+t
+	+u
+	EOF
+
+	test_config diff.suppressBlankEmpty true &&
+	test_write_lines a b c "" d e f >file &&
+	git add file &&
+	test_write_lines p q r "" s t u >file &&
+	test_write_lines s y n q | git add -p &&
+	git diff >actual &&
+	diff_cmp expect actual
+'
+
  test_done

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-10  9:36     ` [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty Jeff King
  2024-07-10 13:46       ` Phillip Wood
@ 2024-07-10 17:06       ` Junio C Hamano
  2024-07-11 21:32         ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2024-07-10 17:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Phillip Wood, Ilya Tumaykin, git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> I'm a little worried that this creates ambiguities, since I don't think
> we are careful about following the hunk header's line counts. Imagine
> you had an input like this:
>
>   @@ -1,2 +1,2 @@
>   -old
>   +new
>    stuff
>
>   some garbage

True.  Especially if you allow editing of hunks, the only thing you
could make available to you are the version you gave the user and
the version you got back from the user, but comparing them would not
help resolve the ambiguity to correct the hunk header all that much.
"diff" edit mode various editors may have _could_ help as they know
each and every keystroke by the user and how the modified patch was
constructed to guess the intention better than a mere comparison of
before- and after- shape of the patch (but the last time I checked,
diff edit mode of GNU Emacs did not adjust the hunk header correctly
in some corner cases).

> I don't think we'd ever generate this ourselves, but could somebody
> manually edit a hunk into this shape? When I tried it in practice, it
> looks like we fail to apply the result even before my patch, though. I'm
> not sure why that is. If I put "some garbage" without the blank line, we
> correctly realize it should be discarded. It's possible I'm just holding
> it wrong.

I've given up on the "hunk edit" doing wrong things to a patch
already.  

The "edit" codepath used to be a lot less careless before Phillip
whipped it into a much better shape with the series that ended at
436d18f2 (Merge branch 'pw/add-p-recount', 2018-03-14), that
introduced recount_edited_hunk() that special cases "+/-/ ".  It
already knows that a completely empty line in a patch is an empty
and unmodified line (which was ported from f4d35a6b (add -p: fix
counting empty context lines in edited patches, 2018-06-11)), so
this patch does not have to do anything new there.

But "recounting" will be fooled by garbage left in the edited
result, so I think we have done all we can do to resolve possible
ambiguities.  The patch under discussion is not adding anything new
and making it any worse, I would say.

> diff --git a/add-patch.c b/add-patch.c
> index 6e176cd21a..7beead1d0a 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -588,7 +588,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>  			    (int)(eol - (plain->buf + file_diff->head.start)),
>  			    plain->buf + file_diff->head.start);
>  
> -		if ((marker == '-' || marker == '+') && *p == ' ')
> +		if ((marker == '-' || marker == '+') && (*p == ' ' || *p == '\n'))
>  			hunk->splittable_into++;
>  		if (marker && *p != '\\')
>  			marker = *p;
> @@ -964,7 +964,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
>  		 * Is this the first context line after a chain of +/- lines?
>  		 * Then record the start of the next split hunk.
>  		 */
> -		if ((marker == '-' || marker == '+') && ch == ' ') {
> +		if ((marker == '-' || marker == '+') && (ch == ' ' || ch == '\n')) {
>  			first = 0;
>  			hunk[1].start = current;
>  			if (colored)
> @@ -979,14 +979,14 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
>  		 * Then just increment the appropriate counter and continue
>  		 * with the next line.
>  		 */
> -		if (marker != ' ' || (ch != '-' && ch != '+')) {
> +		if ((marker != ' ' && marker != '\n') || (ch != '-' && ch != '+')) {
>  next_hunk_line:
>  			/* Comment lines are attached to the previous line */
>  			if (ch == '\\')
>  				ch = marker ? marker : ' ';
>  
>  			/* current hunk not done yet */
> -			if (ch == ' ')
> +			if (ch == ' ' || ch == '\n')
>  				context_line_count++;
>  			else if (ch == '-')
>  				header->old_count++;
> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
> index 5d78868ac1..92c8e6dc8c 100755
> --- a/t/t3701-add-interactive.sh
> +++ b/t/t3701-add-interactive.sh
> @@ -1164,4 +1164,36 @@ test_expect_success 'reset -p with unmerged files' '
>  	test_must_be_empty staged
>  '
>  
> +test_expect_success 'splitting handles diff.suppressBlankEmpty' '
> +	test_when_finished "git reset --hard" &&
> +	cat >file <<-\EOF &&
> +	1
> +	2
> +
> +	3
> +	4
> +	EOF
> +	git add file &&
> +
> +	cat >file <<-\EOF &&
> +	one
> +	two
> +
> +	three
> +	four
> +	EOF
> +	test_write_lines s n y |
> +	git -c diff.suppressBlankEmpty=true add -p &&
> +
> +	git cat-file blob :file >actual &&
> +	cat >expect <<-\EOF &&
> +	1
> +	2
> +
> +	three
> +	four
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-10 13:46       ` Phillip Wood
@ 2024-07-11 21:26         ` Jeff King
  2024-07-13 13:19           ` phillip.wood123
  2024-07-18 14:23           ` Phillip Wood
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2024-07-11 21:26 UTC (permalink / raw)
  To: phillip.wood; +Cc: Junio C Hamano, Ilya Tumaykin, git, Johannes Schindelin

On Wed, Jul 10, 2024 at 02:46:30PM +0100, Phillip Wood wrote:

> > It's tempting to say that we should just make sure that we generate a
> > diff without that option. But we may parse diffs not only generated by
> > Git, but ones that users have manually edited. And POSIX calls the
> > decision of whether to print the space here "implementation-defined".
> 
> Do we ever parse an edited hunk with this code? I'm not sure there is a
> way of splitting a hunk after it has been edited as I don't think we
> ever display it again.

Hmm, I just assumed this code would see the edited hunk, but now I'm not
sure. Note that the changes required do go outside of split_hunk(); the
initial parse_diff() needs to decide whether the hunk is splittable in
the first place (as an aside, that puzzled me at first why just changing
split_hunk() was enough for the case that started this thread, but not
the one in the included test. The difference is that without the empty
line as context, the hunk in the test wouldn't be splittable at all).

But looking closer: yes, I do think parse_diff() is used only for the
initial patch. So we really would only see git-generated patches here.
Which I think takes away my ambiguity concern, but does mean the commit
message is wrong.

> > I don't think we'd ever generate this ourselves, but could somebody
> > manually edit a hunk into this shape? When I tried it in practice, it
> > looks like we fail to apply the result even before my patch, though. I'm
> > not sure why that is. If I put "some garbage" without the blank line, we
> > correctly realize it should be discarded. It's possible I'm just holding
> > it wrong.
> 
> When we recount the hunk after it has been edited we ignore lines that
> don't begin with '+', '-', ' ', or '\n' so if you add some garbage at
> the end of the hunk the recounted hunk header excludes it when it gets
> applied.

OK, that makes sense. And we could never rely on the hunk header in the
edited hunk anyway, since the whole point is that we have to recount it.
So the user must accept that an extra blank line is potential context
(and that goes all the way back to bcdd297b78 (built-in add -p:
implement hunk editing, 2019-12-13).

> I think your patch looks good. I did wonder if we wanted to fix this
> by normalizing context lines instead as shown in the diff below. That
> might make it less likely to miss adding "|| '\n'" in future code that
> is looking for a context line but I don't have a strong preference
> either way.

Yeah, I had a similar thought, but it got messy because we have to deal
with the source buffer. But the extra "char ch" you added in the patch
below fixes that. I think the result is better.

Looking at the blank-line handling in recount_edited_hunk(), we also
handle a CRLF empty line there. Should we do so here, too? If so, then
it would just be a matter of touching normalize_marker() in your patch.

Do you want to just re-send your patch with a commit message to replace
mine? (Feel free to steal the non-wrong parts of my message ;) ).

> ---- >8 ----
> diff --git a/add-patch.c b/add-patch.c
> index d8ea05ff108..795aa772b7a 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
>  		hunk->splittable_into++;
>  }
> +/* Empty context lines may omit the leading ' ' */
> +static int normalize_marker(char marker)
> +{
> +	return marker == '\n' ? ' ' : marker;
> +}
> +
>  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)

Minor nit: missing blank line between functions.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-10 17:06       ` Junio C Hamano
@ 2024-07-11 21:32         ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2024-07-11 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Ilya Tumaykin, git, Johannes Schindelin

On Wed, Jul 10, 2024 at 10:06:28AM -0700, Junio C Hamano wrote:

> > I don't think we'd ever generate this ourselves, but could somebody
> > manually edit a hunk into this shape? When I tried it in practice, it
> > looks like we fail to apply the result even before my patch, though. I'm
> > not sure why that is. If I put "some garbage" without the blank line, we
> > correctly realize it should be discarded. It's possible I'm just holding
> > it wrong.
> 
> I've given up on the "hunk edit" doing wrong things to a patch
> already.  
> 
> The "edit" codepath used to be a lot less careless before Phillip
> whipped it into a much better shape with the series that ended at
> 436d18f2 (Merge branch 'pw/add-p-recount', 2018-03-14), that
> introduced recount_edited_hunk() that special cases "+/-/ ".  It
> already knows that a completely empty line in a patch is an empty
> and unmodified line (which was ported from f4d35a6b (add -p: fix
> counting empty context lines in edited patches, 2018-06-11)), so
> this patch does not have to do anything new there.

Yeah, I was being paranoid without actually thinking through the
implications. As Phillip noted, we do not even run the code I changed on
the edited hunk (which already does handle the empty line). So not only
could I not break it, but it is already doing the right thing thanks to
that earlier work.

> But "recounting" will be fooled by garbage left in the edited
> result, so I think we have done all we can do to resolve possible
> ambiguities.  The patch under discussion is not adding anything new
> and making it any worse, I would say.

Yep, agreed. So modulo a slight inaccuracy in the commit message, I
think my patch is OK. That said, I like what Phillip showed in his
reply. If he wants to wrap that up into a patch, I think it could
replace mine.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-11 21:26         ` Jeff King
@ 2024-07-13 13:19           ` phillip.wood123
  2024-07-13 21:21             ` Jeff King
  2024-07-18 14:23           ` Phillip Wood
  1 sibling, 1 reply; 12+ messages in thread
From: phillip.wood123 @ 2024-07-13 13:19 UTC (permalink / raw)
  To: Jeff King, phillip.wood
  Cc: Junio C Hamano, Ilya Tumaykin, git, Johannes Schindelin

Hi Peff

On 11/07/2024 22:26, Jeff King wrote:
> On Wed, Jul 10, 2024 at 02:46:30PM +0100, Phillip Wood wrote:
 >
>> I think your patch looks good. I did wonder if we wanted to fix this
>> by normalizing context lines instead as shown in the diff below. That
>> might make it less likely to miss adding "|| '\n'" in future code that
>> is looking for a context line but I don't have a strong preference
>> either way.
> 
> Yeah, I had a similar thought, but it got messy because we have to deal
> with the source buffer. But the extra "char ch" you added in the patch
> below fixes that. I think the result is better.
> 
> Looking at the blank-line handling in recount_edited_hunk(), we also
> handle a CRLF empty line there. Should we do so here, too? If so, then
> it would just be a matter of touching normalize_marker() in your patch.
> 
> Do you want to just re-send your patch with a commit message to replace
> mine? (Feel free to steal the non-wrong parts of my message ;) ).

Thanks, I'll do that

Best Wishes

Phillip

>> ---- >8 ----
>> diff --git a/add-patch.c b/add-patch.c
>> index d8ea05ff108..795aa772b7a 100644
>> --- a/add-patch.c
>> +++ b/add-patch.c
>> @@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
>>   		hunk->splittable_into++;
>>   }
>> +/* Empty context lines may omit the leading ' ' */
>> +static int normalize_marker(char marker)
>> +{
>> +	return marker == '\n' ? ' ' : marker;
>> +}
>> +
>>   static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
> 
> Minor nit: missing blank line between functions.
> 
> -Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-13 13:19           ` phillip.wood123
@ 2024-07-13 21:21             ` Jeff King
  2024-07-18 14:22               ` Phillip Wood
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2024-07-13 21:21 UTC (permalink / raw)
  To: phillip.wood; +Cc: Junio C Hamano, Ilya Tumaykin, git, Johannes Schindelin

On Sat, Jul 13, 2024 at 02:19:39PM +0100, phillip.wood123@gmail.com wrote:

> > Do you want to just re-send your patch with a commit message to replace
> > mine? (Feel free to steal the non-wrong parts of my message ;) ).
> 
> Thanks, I'll do that

Thanks. Note that mine is in 'next', but I think it would not be a big
deal to revert and replace (I'm not sure it is even on track for 2.46
anyway).

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-13 21:21             ` Jeff King
@ 2024-07-18 14:22               ` Phillip Wood
  0 siblings, 0 replies; 12+ messages in thread
From: Phillip Wood @ 2024-07-18 14:22 UTC (permalink / raw)
  To: Jeff King, phillip.wood
  Cc: Junio C Hamano, Ilya Tumaykin, git, Johannes Schindelin

On 13/07/2024 22:21, Jeff King wrote:
> On Sat, Jul 13, 2024 at 02:19:39PM +0100, phillip.wood123@gmail.com wrote:
> 
>>> Do you want to just re-send your patch with a commit message to replace
>>> mine? (Feel free to steal the non-wrong parts of my message ;) ).
>>
>> Thanks, I'll do that
> 
> Thanks. Note that mine is in 'next', but I think it would not be a big
> deal to revert and replace (I'm not sure it is even on track for 2.46
> anyway).

Thanks for the heads-up, I see that Junio has reverted it in the latest 
what's cooking email

Phillip

> -Peff
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty
  2024-07-11 21:26         ` Jeff King
  2024-07-13 13:19           ` phillip.wood123
@ 2024-07-18 14:23           ` Phillip Wood
  1 sibling, 0 replies; 12+ messages in thread
From: Phillip Wood @ 2024-07-18 14:23 UTC (permalink / raw)
  To: Jeff King, phillip.wood
  Cc: Junio C Hamano, Ilya Tumaykin, git, Johannes Schindelin

On 11/07/2024 22:26, Jeff King wrote:
> On Wed, Jul 10, 2024 at 02:46:30PM +0100, Phillip Wood wrote:
> 
>>> It's tempting to say that we should just make sure that we generate a
>>> diff without that option. But we may parse diffs not only generated by
>>> Git, but ones that users have manually edited. And POSIX calls the
>>> decision of whether to print the space here "implementation-defined".
>>
>> Do we ever parse an edited hunk with this code? I'm not sure there is a
>> way of splitting a hunk after it has been edited as I don't think we
>> ever display it again.
> 
> Hmm, I just assumed this code would see the edited hunk, but now I'm not
> sure. Note that the changes required do go outside of split_hunk(); the
> initial parse_diff() needs to decide whether the hunk is splittable in
> the first place (as an aside, that puzzled me at first why just changing
> split_hunk() was enough for the case that started this thread, but not
> the one in the included test. The difference is that without the empty
> line as context, the hunk in the test wouldn't be splittable at all).
> 
> But looking closer: yes, I do think parse_diff() is used only for the
> initial patch.

That's true but I've realized that we do in fact allow the user to 
re-display and split an edited hunk. If there are two changes in a file 
then you can edit the first hunk and when prompted about the second 
press 'K' to go back to the first one and then split it with 's' (I 
messed up my test for this before sending my previous mail as I changed 
two separate files rather than putting two changes in a single file). So 
split_hunk() can encounter empty context lines even without 
diff.suppressBlankEmpty being set as lots of editors strip the leading 
space when the rest of the line is empty[1].

As we haven't had any bug reports about that I suspect people are not 
splitting the hunks they edited which I guess makes sense. I think there 
is another bug lurking for anyone who does try to split an edited hunk 
as we don't update `hunk->splittable_into` after it has been edited and 
the edit might change a deleted of an empty line to a context line or 
vice versa. We should make sure any garbage at the end of the hunk is 
discarded as well so we don't show it when we display the edited hunk.

Best Wishes

Phillip

[1] When I added the code to recount the hunk header rather than relying 
on "git apply --recount" initially it did not support empty context 
lines and we received quite a few bug reports pretty quickly after it 
was released.

> So we really would only see git-generated patches here.
> Which I think takes away my ambiguity concern, but does mean the commit
> message is wrong.
> 
>>> I don't think we'd ever generate this ourselves, but could somebody
>>> manually edit a hunk into this shape? When I tried it in practice, it
>>> looks like we fail to apply the result even before my patch, though. I'm
>>> not sure why that is. If I put "some garbage" without the blank line, we
>>> correctly realize it should be discarded. It's possible I'm just holding
>>> it wrong.
>>
>> When we recount the hunk after it has been edited we ignore lines that
>> don't begin with '+', '-', ' ', or '\n' so if you add some garbage at
>> the end of the hunk the recounted hunk header excludes it when it gets
>> applied.
> 
> OK, that makes sense. And we could never rely on the hunk header in the
> edited hunk anyway, since the whole point is that we have to recount it.
> So the user must accept that an extra blank line is potential context
> (and that goes all the way back to bcdd297b78 (built-in add -p:
> implement hunk editing, 2019-12-13).
> 
>> I think your patch looks good. I did wonder if we wanted to fix this
>> by normalizing context lines instead as shown in the diff below. That
>> might make it less likely to miss adding "|| '\n'" in future code that
>> is looking for a context line but I don't have a strong preference
>> either way.
> 
> Yeah, I had a similar thought, but it got messy because we have to deal
> with the source buffer. But the extra "char ch" you added in the patch
> below fixes that. I think the result is better.
> 
> Looking at the blank-line handling in recount_edited_hunk(), we also
> handle a CRLF empty line there. Should we do so here, too? If so, then
> it would just be a matter of touching normalize_marker() in your patch.
> 
> Do you want to just re-send your patch with a commit message to replace
> mine? (Feel free to steal the non-wrong parts of my message ;) ).
> 
>> ---- >8 ----
>> diff --git a/add-patch.c b/add-patch.c
>> index d8ea05ff108..795aa772b7a 100644
>> --- a/add-patch.c
>> +++ b/add-patch.c
>> @@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
>>   		hunk->splittable_into++;
>>   }
>> +/* Empty context lines may omit the leading ' ' */
>> +static int normalize_marker(char marker)
>> +{
>> +	return marker == '\n' ? ' ' : marker;
>> +}
>> +
>>   static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
> 
> Minor nit: missing blank line between functions.
> 
> -Peff
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-07-18 14:23 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-03 14:41 git crashes in `git commit --patch` with diff.suppressBlankEmpty = true Ilya Tumaykin
2024-07-04 13:14 ` Phillip Wood
2024-07-05 16:39   ` Junio C Hamano
2024-07-10  9:36     ` [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty Jeff King
2024-07-10 13:46       ` Phillip Wood
2024-07-11 21:26         ` Jeff King
2024-07-13 13:19           ` phillip.wood123
2024-07-13 21:21             ` Jeff King
2024-07-18 14:22               ` Phillip Wood
2024-07-18 14:23           ` Phillip Wood
2024-07-10 17:06       ` Junio C Hamano
2024-07-11 21:32         ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).