git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Inconsistent handling of corrupt patches based on line endings
@ 2024-10-28 16:57 Peregi Tamás
  2024-10-28 18:11 ` Taylor Blau
  2024-10-29 16:32 ` Torsten Bögershausen
  0 siblings, 2 replies; 6+ messages in thread
From: Peregi Tamás @ 2024-10-28 16:57 UTC (permalink / raw)
  To: git


[-- Attachment #1.1: Type: text/plain, Size: 3508 bytes --]

Hi all,

I might've found an inconsistency in how git-apply treats corrupt patches
(representing empty context lines with completely empty lines instead of
lines containing a single space - usually a result of a "trim trailing
whitespace" editor setting) based on whether the patch file uses
Windows-style line endings (CRLF) or Unix-style line endings (LF only).

Regardless of whether I use LF or CRLF in original.txt, "git apply --check
--ignore-whitespace" will happily apply intact-win.patch (uncorrupted patch
file with CRLF), intact-unix.patch (uncorrupted patch file with LF) and
corrupt-unix.patch (corrupted patch file with LF), but reports "error:
corrupt patch at line 6" for corrupt-win.patch (corrupted patch file with
CRLF).

My intuition tells me git-apply should either fail for all corrupt patch
files (regardless of EOL) or it should apply the corrupt patch to work
around the issue caused by misconfigured text editors (regardless of EOL).

Am I missing something, or is this a bug/inconsistency in
git-apply's behaviour?

Thanks in advance: Tamás Peregi

**What did you do before the bug happened? (Steps to reproduce your issue)**

1. Rename either original-win.txt or original-unix.txt to original.txt
2. Run the following commands:
2.1. git apply --check --ignore-whitespace intact-unix.patch
2.2. git apply --check --ignore-whitespace intact-win.patch
2.3. git apply --check --ignore-whitespace corrupt-unix.patch
2.4. git apply --check --ignore-whitespace corrupt-win.patch

**What did you expect to happen? (Expected behavior)**

I'd expect one of the following:
- All 4 commands succeed.
- 2.1 and 2.2 succeeds, 2.3 and 2.4 both fail with "error: corrupt patch at
line 6"

In other words: corrupted patches should either be accepted or not,
regardless of what EOL markers are used in the patch file.

**What happened instead? (Actual behavior)**

Only 2.4, i.e. applying corrupt-win.patch has failed.

**What's different between what you expected and what actually happened?**

2.3 and 2.4 should have the same result.

**Anything else you want to add:**

While the example seems contrived, it has been encountered in the wild when
I was copying a port from https://github.com/microsoft/vcpkg (set up with
autocrlf=false and all files treated as binaries in .gitattributes) to my
company's repository (set up with autocrlf=true and patch files treated as
text). The patches and application worked in Microsoft's repo, then worked
on my machine (Windows) in our repor, but failed in CI on Windows, and
started failing on my machine after switching to a different branch and
switching back due to "git checkout" converting the newline characters.

I think these kind of corrupt patch files should fail to apply
consistently, but being able to apply them is also acceptable. However, the
current inconsistent behaviour (apply if patch is LF, but fail if patch is
CRLF) is confusing, and causes "works on my machine" issues.


[System Info]
git version:
git version 2.46.0.windows.1
cpu: x86_64
built from commit: 2e6a859ffc0471f60f79c1256f766042b0d5d17d
sizeof-long: 4
sizeof-size_t: 8
shell-path: D:/git-sdk-64-build-installers/usr/bin/sh
feature: fsmonitor--daemon
libcurl: 8.9.0
OpenSSL: OpenSSL 3.2.2 4 Jun 2024
zlib: 1.3.1
uname: Windows 10.0 19045
compiler info: gnuc: 14.1
libc info: no libc information available
$SHELL (typically, interactive shell): C:\Program Files\Git\usr\bin\bash.exe

[-- Attachment #1.2: Type: text/html, Size: 3821 bytes --]

[-- Attachment #2: original-unix.txt --]
[-- Type: text/plain, Size: 14 bytes --]

a

b
c
d
e

f

[-- Attachment #3: intact-win.patch --]
[-- Type: application/octet-stream, Size: 165 bytes --]

diff --git a/original.txt b/original.txt
index e7582b7..940b5a6 100644
--- a/original.txt
+++ b/original.txt
@@ -2,6 +2,7 @@ a
 
 b
 c
+NEW LINE
 d
 e
 

[-- Attachment #4: intact-unix.patch --]
[-- Type: application/octet-stream, Size: 153 bytes --]

diff --git a/original.txt b/original.txt
index e7582b7..940b5a6 100644
--- a/original.txt
+++ b/original.txt
@@ -2,6 +2,7 @@ a
 
 b
 c
+NEW LINE
 d
 e
 

[-- Attachment #5: corrupt-unix.patch --]
[-- Type: application/octet-stream, Size: 151 bytes --]

diff --git a/original.txt b/original.txt
index e7582b7..940b5a6 100644
--- a/original.txt
+++ b/original.txt
@@ -2,6 +2,7 @@ a

 b
 c
+NEW LINE
 d
 e


[-- Attachment #6: corrupt-win.patch --]
[-- Type: application/octet-stream, Size: 163 bytes --]

diff --git a/original.txt b/original.txt
index e7582b7..940b5a6 100644
--- a/original.txt
+++ b/original.txt
@@ -2,6 +2,7 @@ a

 b
 c
+NEW LINE
 d
 e


[-- Attachment #7: original-win.txt --]
[-- Type: text/plain, Size: 22 bytes --]

a

b
c
d
e

f

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Inconsistent handling of corrupt patches based on line endings
  2024-10-28 16:57 Inconsistent handling of corrupt patches based on line endings Peregi Tamás
@ 2024-10-28 18:11 ` Taylor Blau
  2024-10-28 18:39   ` Elijah Newren
  2024-10-29 16:32 ` Torsten Bögershausen
  1 sibling, 1 reply; 6+ messages in thread
From: Taylor Blau @ 2024-10-28 18:11 UTC (permalink / raw)
  To: Peregi Tamás
  Cc: git, Elijah Newren, Patrick Steinhardt, Jeff King,
	René Scharfe

On Mon, Oct 28, 2024 at 05:57:54PM +0100, Peregi Tamás wrote:
> Hi all,
>
> I might've found an inconsistency in how git-apply treats corrupt patches
> (representing empty context lines with completely empty lines instead of
> lines containing a single space - usually a result of a "trim trailing
> whitespace" editor setting) based on whether the patch file uses
> Windows-style line endings (CRLF) or Unix-style line endings (LF only).

Let's see if any recent apply.c folks have thoughts...:

$ git shortlog -nse --no-merges --since=3.years.ago.. -- apply.c
    21  Elijah Newren <newren@gmail.com>
    12  Patrick Steinhardt <ps@pks.im>
     7  Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     5  Junio C Hamano <gitster@pobox.com>
     4  Jeff King <peff@peff.net>
     4  René Scharfe <l.s.r@web.de>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inconsistent handling of corrupt patches based on line endings
  2024-10-28 18:11 ` Taylor Blau
@ 2024-10-28 18:39   ` Elijah Newren
  2024-10-28 22:07     ` René Scharfe
  0 siblings, 1 reply; 6+ messages in thread
From: Elijah Newren @ 2024-10-28 18:39 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Peregi Tamás, git, Patrick Steinhardt, Jeff King,
	René Scharfe

On Mon, Oct 28, 2024 at 11:11 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Mon, Oct 28, 2024 at 05:57:54PM +0100, Peregi Tamás wrote:
> > Hi all,
> >
> > I might've found an inconsistency in how git-apply treats corrupt patches
> > (representing empty context lines with completely empty lines instead of
> > lines containing a single space - usually a result of a "trim trailing
> > whitespace" editor setting) based on whether the patch file uses
> > Windows-style line endings (CRLF) or Unix-style line endings (LF only).
>
> Let's see if any recent apply.c folks have thoughts...:
>
> $ git shortlog -nse --no-merges --since=3.years.ago.. -- apply.c
>     21  Elijah Newren <newren@gmail.com>
>     12  Patrick Steinhardt <ps@pks.im>
>      7  Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>      5  Junio C Hamano <gitster@pobox.com>
>      4  Jeff King <peff@peff.net>
>      4  René Scharfe <l.s.r@web.de>

...and thus we learn why no one wanted to clean up the header files in
git.git before me.  ;-)

I believe this behavior was caused by:

$ git log -1 b507b465f7831612b9d9fc643e3e5218b64e5bfa
commit b507b465f7831612b9d9fc643e3e5218b64e5bfa
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Oct 19 19:26:08 2006 -0700

    git-apply: prepare for upcoming GNU diff -u format change.

    The latest GNU diff from CVS emits an empty line to express
    an empty context line, instead of more traditional "single
    white space followed by a newline".  Do not get broken by it.

    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    Signed-off-by: Junio C Hamano <junkio@cox.net>

That code special-cased a line containing '\n' but not a line
containing only '\r\n'.

As to whether that's correct, personally I'd rather only special case
workaround important existing clients.  Back in 2006, working with GNU
diff was incredibly important, and I'd say is still important today.
I can see Peregi's comment that this make line ending slightly
inconsistent, but I feel like the blank-line handling is a workaround
for an existing client we want to interoperate with and absent a
similar important client with mis-behaving '\r\n'-only lines, I
wouldn't be interested in adding support for it.  But that's just my
off-the-cuff feeling and I don't feel strongly about it.  Further, all
but one of my contributions above were mere header changes, so if
others have other opinions, they should probably be weighted more
heavily than mine on this topic.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inconsistent handling of corrupt patches based on line endings
  2024-10-28 18:39   ` Elijah Newren
@ 2024-10-28 22:07     ` René Scharfe
  2024-10-28 22:59       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: René Scharfe @ 2024-10-28 22:07 UTC (permalink / raw)
  To: Elijah Newren, Taylor Blau
  Cc: Peregi Tamás, git, Patrick Steinhardt, Jeff King

Am 28.10.24 um 19:39 schrieb Elijah Newren:
> On Mon, Oct 28, 2024 at 11:11 AM Taylor Blau <me@ttaylorr.com> wrote:
>>
>> On Mon, Oct 28, 2024 at 05:57:54PM +0100, Peregi Tamás wrote:
>>> Hi all,
>>>
>>> I might've found an inconsistency in how git-apply treats corrupt patches
>>> (representing empty context lines with completely empty lines instead of
>>> lines containing a single space - usually a result of a "trim trailing
>>> whitespace" editor setting) based on whether the patch file uses
>>> Windows-style line endings (CRLF) or Unix-style line endings (LF only).
>
> I believe this behavior was caused by:
>
> $ git log -1 b507b465f7831612b9d9fc643e3e5218b64e5bfa
> commit b507b465f7831612b9d9fc643e3e5218b64e5bfa
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Thu Oct 19 19:26:08 2006 -0700
>
>     git-apply: prepare for upcoming GNU diff -u format change.
>
>     The latest GNU diff from CVS emits an empty line to express
>     an empty context line, instead of more traditional "single
>     white space followed by a newline".  Do not get broken by it.
>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>     Signed-off-by: Junio C Hamano <junkio@cox.net>
>
> That code special-cased a line containing '\n' but not a line
> containing only '\r\n'.
>
> As to whether that's correct, personally I'd rather only special case
> workaround important existing clients.  Back in 2006, working with GNU
> diff was incredibly important, and I'd say is still important today.
> I can see Peregi's comment that this make line ending slightly
> inconsistent, but I feel like the blank-line handling is a workaround
> for an existing client we want to interoperate with and absent a
> similar important client with mis-behaving '\r\n'-only lines, I
> wouldn't be interested in adding support for it.  But that's just my
> off-the-cuff feeling and I don't feel strongly about it.  Further, all
> but one of my contributions above were mere header changes, so if
> others have other opinions, they should probably be weighted more
> heavily than mine on this topic.

What would the patch(1) do?

The first column is the exit code of the subsequent command (0: success,
1: one or more rejected lines, 2: failure):

0 patch                     -p1 --dry-run original-unix.txt <corrupt-unix.patch
0 patch                     -p1 --dry-run original-unix.txt <intact-unix.patch
0 patch                     -p1 --dry-run original-win.txt  <intact-win.patch
0 patch --ignore-whitespace -p1 --dry-run original-unix.txt <corrupt-unix.patch
0 patch --ignore-whitespace -p1 --dry-run original-unix.txt <intact-unix.patch
0 patch --ignore-whitespace -p1 --dry-run original-unix.txt <intact-win.patch
0 patch --ignore-whitespace -p1 --dry-run original-win.txt  <corrupt-unix.patch
0 patch --ignore-whitespace -p1 --dry-run original-win.txt  <intact-unix.patch
0 patch --ignore-whitespace -p1 --dry-run original-win.txt  <intact-win.patch
1 patch                     -p1 --dry-run original-unix.txt <intact-win.patch
1 patch                     -p1 --dry-run original-win.txt  <corrupt-unix.patch
1 patch                     -p1 --dry-run original-win.txt  <intact-unix.patch
2 patch                     -p1 --dry-run original-unix.txt <corrupt-win.patch
2 patch                     -p1 --dry-run original-win.txt  <corrupt-win.patch
2 patch --ignore-whitespace -p1 --dry-run original-unix.txt <corrupt-win.patch
2 patch --ignore-whitespace -p1 --dry-run original-win.txt  <corrupt-win.patch

So basically the same as git apply?


What does current GNU diff do?

diff -u                        <(printf 'a\n\n') <(printf 'b\n\n') | tail -1 | od -a
0000000   sp  nl
0000002
diff -u --suppress-blank-empty <(printf 'a\n\n') <(printf 'b\n\n') | tail -1 | od -a
0000000   nl
0000001
diff -u                        <(printf 'a\r\n\r\n') <(printf 'b\r\n\r\n') | tail -1 | od -a
0000000   sp  cr  nl
0000003
diff -u --suppress-blank-empty <(printf 'a\r\n\r\n') <(printf 'b\r\n\r\n') | tail -1 | od -a
0000000   sp  cr  nl
0000003

So it omits the space only if --suppress-blank-empty is given and the
blank line ends with \n, not with \r\n.


Anyway, I agree with Elijah: The targeted support for GNU diff's
eccentricity is fine, and we're in good company with that.  We could
remove it, since it doesn't seem to be the default (anymore?), but I
don't see much of a benefit.  We could add support for blank context
lines that end in CRLF if there's a notable source of that kind of
that deviation from the original format.

René


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inconsistent handling of corrupt patches based on line endings
  2024-10-28 22:07     ` René Scharfe
@ 2024-10-28 22:59       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2024-10-28 22:59 UTC (permalink / raw)
  To: René Scharfe
  Cc: Elijah Newren, Taylor Blau, Peregi Tamás, git,
	Patrick Steinhardt, Jeff King

(On vacation, pardon GMail web client).

As POSIX.1 says
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/diff.html#tag_20_34_10_07
"""It is implementation-defined whether an empty unaffected line is
written as an empty line or a line containing a single <space>
character.""", it might be unfair to call it "GNU's eccentricity".

But I found that the POSIX text is fairly clear that the OP's
CRLF-only line does not even qualify as an context line that is
totally empty.
So perhaps there is nothing to see here, and not just we and patch(1),
but GNU diff is doing the right thing?


2024年10月29日(火) 7:07 René Scharfe <l.s.r@web.de>:
>
> Am 28.10.24 um 19:39 schrieb Elijah Newren:
> > On Mon, Oct 28, 2024 at 11:11 AM Taylor Blau <me@ttaylorr.com> wrote:
> >>
> >> On Mon, Oct 28, 2024 at 05:57:54PM +0100, Peregi Tamás wrote:
> >>> Hi all,
> >>>
> >>> I might've found an inconsistency in how git-apply treats corrupt patches
> >>> (representing empty context lines with completely empty lines instead of
> >>> lines containing a single space - usually a result of a "trim trailing
> >>> whitespace" editor setting) based on whether the patch file uses
> >>> Windows-style line endings (CRLF) or Unix-style line endings (LF only).
> >
> > I believe this behavior was caused by:
> >
> > $ git log -1 b507b465f7831612b9d9fc643e3e5218b64e5bfa
> > commit b507b465f7831612b9d9fc643e3e5218b64e5bfa
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date:   Thu Oct 19 19:26:08 2006 -0700
> >
> >     git-apply: prepare for upcoming GNU diff -u format change.
> >
> >     The latest GNU diff from CVS emits an empty line to express
> >     an empty context line, instead of more traditional "single
> >     white space followed by a newline".  Do not get broken by it.
> >
> >     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> >     Signed-off-by: Junio C Hamano <junkio@cox.net>
> >
> > That code special-cased a line containing '\n' but not a line
> > containing only '\r\n'.
> >
> > As to whether that's correct, personally I'd rather only special case
> > workaround important existing clients.  Back in 2006, working with GNU
> > diff was incredibly important, and I'd say is still important today.
> > I can see Peregi's comment that this make line ending slightly
> > inconsistent, but I feel like the blank-line handling is a workaround
> > for an existing client we want to interoperate with and absent a
> > similar important client with mis-behaving '\r\n'-only lines, I
> > wouldn't be interested in adding support for it.  But that's just my
> > off-the-cuff feeling and I don't feel strongly about it.  Further, all
> > but one of my contributions above were mere header changes, so if
> > others have other opinions, they should probably be weighted more
> > heavily than mine on this topic.
>
> What would the patch(1) do?
>
> The first column is the exit code of the subsequent command (0: success,
> 1: one or more rejected lines, 2: failure):
>
> 0 patch                     -p1 --dry-run original-unix.txt <corrupt-unix.patch
> 0 patch                     -p1 --dry-run original-unix.txt <intact-unix.patch
> 0 patch                     -p1 --dry-run original-win.txt  <intact-win.patch
> 0 patch --ignore-whitespace -p1 --dry-run original-unix.txt <corrupt-unix.patch
> 0 patch --ignore-whitespace -p1 --dry-run original-unix.txt <intact-unix.patch
> 0 patch --ignore-whitespace -p1 --dry-run original-unix.txt <intact-win.patch
> 0 patch --ignore-whitespace -p1 --dry-run original-win.txt  <corrupt-unix.patch
> 0 patch --ignore-whitespace -p1 --dry-run original-win.txt  <intact-unix.patch
> 0 patch --ignore-whitespace -p1 --dry-run original-win.txt  <intact-win.patch
> 1 patch                     -p1 --dry-run original-unix.txt <intact-win.patch
> 1 patch                     -p1 --dry-run original-win.txt  <corrupt-unix.patch
> 1 patch                     -p1 --dry-run original-win.txt  <intact-unix.patch
> 2 patch                     -p1 --dry-run original-unix.txt <corrupt-win.patch
> 2 patch                     -p1 --dry-run original-win.txt  <corrupt-win.patch
> 2 patch --ignore-whitespace -p1 --dry-run original-unix.txt <corrupt-win.patch
> 2 patch --ignore-whitespace -p1 --dry-run original-win.txt  <corrupt-win.patch
>
> So basically the same as git apply?
>
>
> What does current GNU diff do?
>
> diff -u                        <(printf 'a\n\n') <(printf 'b\n\n') | tail -1 | od -a
> 0000000   sp  nl
> 0000002
> diff -u --suppress-blank-empty <(printf 'a\n\n') <(printf 'b\n\n') | tail -1 | od -a
> 0000000   nl
> 0000001
> diff -u                        <(printf 'a\r\n\r\n') <(printf 'b\r\n\r\n') | tail -1 | od -a
> 0000000   sp  cr  nl
> 0000003
> diff -u --suppress-blank-empty <(printf 'a\r\n\r\n') <(printf 'b\r\n\r\n') | tail -1 | od -a
> 0000000   sp  cr  nl
> 0000003
>
> So it omits the space only if --suppress-blank-empty is given and the
> blank line ends with \n, not with \r\n.
>
>
> Anyway, I agree with Elijah: The targeted support for GNU diff's
> eccentricity is fine, and we're in good company with that.  We could
> remove it, since it doesn't seem to be the default (anymore?), but I
> don't see much of a benefit.  We could add support for blank context
> lines that end in CRLF if there's a notable source of that kind of
> that deviation from the original format.
>
> René
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inconsistent handling of corrupt patches based on line endings
  2024-10-28 16:57 Inconsistent handling of corrupt patches based on line endings Peregi Tamás
  2024-10-28 18:11 ` Taylor Blau
@ 2024-10-29 16:32 ` Torsten Bögershausen
  1 sibling, 0 replies; 6+ messages in thread
From: Torsten Bögershausen @ 2024-10-29 16:32 UTC (permalink / raw)
  To: Peregi Tamás; +Cc: git

On Mon, Oct 28, 2024 at 05:57:54PM +0100, Peregi Tamás wrote:
> Hi all,
[]

Thanks for an extensive bug (?) report.
I can not answer, if this a bug or working as designed.

However, please allow a few words:
>
> While the example seems contrived, it has been encountered in the wild when
> I was copying a port from https://github.com/microsoft/vcpkg (set up with
> autocrlf=false and all files treated as binaries in .gitattributes) to my
[]

That very repo is not in a shape I would like to see it.
Basically, Git is designed to work with line endings using LF and nothing else.
You can configure Git to see CRLF in the working tree, if needed.
But all the diff machinery, and git-apply, is build on this assumption.

If I look into the repo, this very commit
===========================
    commit ebdb41039450383ec7f41bd5f589cc46b7fd6a59
    Author: ...
    Date:   Mon Feb 26 18:18:05 2018 -0800

        [everything] Use -text to ensure consistent files across machines.

        If you experience trouble, you can use the following to renormalize your local working directory:

        git add --renormalize .
        git reset .
        git checkout .
===========================
is probably made with good intentions.
However, `git add --renormalize .` will not do anything at all.
All files are marked as "-text", so there is nothing to normalize.
The commit message here is simply wrong.

Something like this would have made more sense:
===========================
* text=auto eol=crlf

# Declare files that will always have LF line endings on checkout.
*.patch text eol=lf
scripts/ci.baseline.txt text eol=lf

ports/** -linguist-detectable
===========================
(But I didn't check each and every file).

After changing the .gitattributes file, run
git add --renormalize .
git add .
warning: in the working copy of 'crlf', LF will be replaced by CRLF the next time Git touches it
git commit -m "all files with LF in the repo"

git ls-files | xargs rm
git checkout .gitattributes
git checkout .

After this, we can fine-tune:
git ls-files --eol | egrep -v "json|cmake|patch"
shows that we can add lines like
*.exe binary
and may be
scripts/vcpkg_completion* text eol=lf
to .gitattributes

Does this makes sense ?


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-10-29 16:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-28 16:57 Inconsistent handling of corrupt patches based on line endings Peregi Tamás
2024-10-28 18:11 ` Taylor Blau
2024-10-28 18:39   ` Elijah Newren
2024-10-28 22:07     ` René Scharfe
2024-10-28 22:59       ` Junio C Hamano
2024-10-29 16:32 ` Torsten Bögershausen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).