git-apply: warn/fail on *changed* end of line (eol) *only*?

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* git-apply: warn/fail on *changed* end of line (eol) *only*?
@ 2016-12-25 23:49 Igor Djordjevic BugA
  2016-12-25 23:55 ` Igor Djordjevic BugA
  2016-12-26  3:25 ` Junio C Hamano
  0 siblings, 2 replies; 6+ messages in thread
From: Igor Djordjevic BugA @ 2016-12-25 23:49 UTC (permalink / raw)
  To: git

Hi to all,

This is a follow-up of a message posted at git-users Google group[1],
as the topic may actually be better suited for this mailing list
instead. If needed, some elaboration on the context can be found there,
as well as a detailed example describing the motive for the question
itself (or directly here[2], second half of the message).

In short -- git-apply warns on applying the patch with CRLF line endings
(new), considered whitespace errors, even when previous hunk version
(old) has/had that very same CRLF line endings, too, so nothing actually
changed in this regards. Even worse, it happily applies a patch with LF
line endings (new) without any warning/hint, even though previous (old)
line endings were CRLF, thus effectively (and silently) breaking the
(previous) line endings.

I understand that what should be considered "correct" line endings can
be set explicitly (and should be communicated on the project level in
the first place), but would it make sense to have an additional
git-apply --whitespace option (like "warn-changed" and "error-changed",
or something) to warn/fail on *changed* end of line *only*, without any
further knowledge needed?

So not to have Git need to assume or know which line endings should be
considered "correct", nor to think/bother too much with it, but just to
warn/fail on actual line ending *change* (in comparison between old and
new part/hunk of the patch).

This seems more intuitive to me, and much more cross-platform
collaboration friendly, at least as an option, as one wouldn`t need to
think much about "correct" project/file line endings (which would still
be advised, anyway), but would be promptly warned about possibly
breaking previous/existing line endings, maybe even with an option to
keep those previous ones, or force the new ones... yet as I`m still new
around, I accept that I might be missing some bigger picture here :)

Thanks, BugA

P.S. I`m discussing git-apply and end-of-line handling here in
particular as that`s what I`m currently concerned with, and for the sake
of simplicity, but might be that whole thing could theoretically be
broadened to other Git tools (like git-diff) and whitespace (error)
handling in general, too (warn/fail only if actually introduced in new
hunk, not previously being present in the old one).
--
[1] https://groups.google.com/d/msg/git-users/9Os_48yJdn0/j9S_W7h-EAAJ
[2] https://groups.google.com/d/msg/git-users/9Os_48yJdn0/5S9Ja1fEEAAJ

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-apply: warn/fail on *changed* end of line (eol) *only*?
  2016-12-25 23:49 git-apply: warn/fail on *changed* end of line (eol) *only*? Igor Djordjevic BugA
@ 2016-12-25 23:55 ` Igor Djordjevic BugA
  2016-12-26  3:25 ` Junio C Hamano
  1 sibling, 0 replies; 6+ messages in thread
From: Igor Djordjevic BugA @ 2016-12-25 23:55 UTC (permalink / raw)
  To: git

On 26/12/2016 00:49, Igor Djordjevic BugA wrote:
> This is a follow-up of a message posted at git-users Google group[1],
> as the topic may actually be better suited for this mailing list
> instead. If needed, some elaboration on the context can be found there,
> as well as a detailed example describing the motive for the question
> itself (or directly here[2], second half of the message).

For some reason reference links got excluded, so here they are again:

[1] https://groups.google.com/d/msg/git-users/9Os_48yJdn0/j9S_W7h-EAAJ
[2] https://groups.google.com/d/msg/git-users/9Os_48yJdn0/5S9Ja1fEEAAJ

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-apply: warn/fail on *changed* end of line (eol) *only*?
  2016-12-25 23:49 git-apply: warn/fail on *changed* end of line (eol) *only*? Igor Djordjevic BugA
  2016-12-25 23:55 ` Igor Djordjevic BugA
@ 2016-12-26  3:25 ` Junio C Hamano
  2016-12-27  0:26   ` Igor Djordjevic BugA
  2016-12-27 17:27   ` Junio C Hamano
  1 sibling, 2 replies; 6+ messages in thread
From: Junio C Hamano @ 2016-12-26  3:25 UTC (permalink / raw)
  To: Igor Djordjevic BugA; +Cc: git

Igor Djordjevic BugA <igor.d.djordjevic@gmail.com> writes:

> In short -- git-apply warns on applying the patch with CRLF line endings
> (new), considered whitespace errors, even when previous hunk version
> (old) has/had that very same CRLF line endings, too, so nothing actually
> changed in this regards. Even worse, it happily applies a patch with LF
> line endings (new) without any warning/hint, even though previous (old)
> line endings were CRLF, thus effectively (and silently) breaking the
> (previous) line endings.

Let me see if I understood your problem description correctly.

Imagine that the project wants LF line endings, i.e. it considers
that a line with CRLF ending has an unwanted "whitespace" at the
end.  Now, you start from this source file:

    1 <CRLF>
    3 <CRLF>
    5 <CRLF>

and a patch like this comes in:

     1 <CRLF>
    -3 <CRLF>
    +three <CRLF>
     5 <CRLF>

You think that "3 <CRLF>" was replaced by "three <CRLF>", and the
claim is "the 'previous' contents already had <CRLF> ending, so the
change is not making things worse".

But what if the patch was like this?

     1 <CRLF>
    -3 <CRLF>
    +three <CRLF>
    +four <CRLF>
     5 <CRLF>

Do you want to warn on "four <CRLF>" because it does not have any
"previous" corresponding line?

Extending the thought further, which line do you want to warn and
which line do you not want to, if the patch were like this instead?

     1 <CRLF>
    -3 <CRLF>
    +four <CRLF>
    +three <CRLF>
     5 <CRLF>

Extending this thought experiment further, you would realize that
fundamentally the concept of "previous contents" has no sensible
definition.

Incidentally, not realizing "there is no sensible definition of
'previous contents'" is a source of another often seen confusion by
new users of Git and any version control systems regarding the
"blame" feature.  When given the final answer by "blame", they often
say "what content did the line have _before_ the commit blame gave
me?"  As there is no sensible definition of 'previous contents' that
corresponds to the line, that question does not in general have a
sensible answer.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-apply: warn/fail on *changed* end of line (eol) *only*?
  2016-12-26  3:25 ` Junio C Hamano
@ 2016-12-27  0:26   ` Igor Djordjevic BugA
  2016-12-27 17:27   ` Junio C Hamano
  1 sibling, 0 replies; 6+ messages in thread
From: Igor Djordjevic BugA @ 2016-12-27  0:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio, and thanks for sharing your thoughts.

You understood it pretty well, except the context "scope". Example does
show a single line change, and a single line old/new comparison, but
that is for simplicity sake of the initial example. What I was
discussing about was the whole patch "hunk" instead (mentioned inside
the part you quoted, too).

> But what if the patch was like this?
> 
>      1 <CRLF>
>     -3 <CRLF>
>     +three <CRLF>
>     +four <CRLF>
>      5 <CRLF>
> 
> Do you want to warn on "four <CRLF>" because it does not have any
> "previous" corresponding line?

No, because the old hunk (single - line) has <CRLF> line ending(s), the
same as the new hunk (both + lines), so effectively there is no end of
line change between old and new _hunks_, so no warning needed.

> Extending the thought further, which line do you want to warn and
> which line do you not want to, if the patch were like this instead?
> 
>      1 <CRLF>
>     -3 <CRLF>
>     +four <CRLF>
>     +three <CRLF>
>      5 <CRLF>

Again, no warnings here, same as described above.

> Extending this thought experiment further, you would realize that
> fundamentally the concept of "previous contents" has no sensible
> definition.

This one is interesting, and maybe arguable - I certainly don`t have
enough knowledge about the overall matter (Git heuristics for diff
creation, in terms of which parts get marked as "old" (-), and which as
"new" (+), which probably even depends on user defined settings)...

... BUT, the overall idea here seems rather simple (to me :P) - if *all*
removed lines inside the old hunk have the same line endings (<CRLF>,
for example), where there is *any* (at least one) added line inside the
new hunk that has a different line ending than the ones in the old hunk
(like <LF>), then warning or failure might be preferred (optionally,
at least), as it does seem like something fishy is going on.

I appreciate and understand a need for Git being able to know "correct"
line endings, but in case where the whole previous hunk (or the whole
file, even) has "incorrect" line endings (<CRLF>, as far as Git is
concerned, as in the given example), it seems more sensible to me for
Git to warn you about _that_ first, rather than keep silent and apply a
new hunk introducing <LF> line endings without you even knowing - heck,
maybe your core end of line setting is incorrect, indeed, so actually
having someone to let you know seems nice, before potentially corrupting
your file.

It felt like - "Hey, I do have a mean to know that your _whole file_
has incorrect line endings (as per my current settings), but I don`t
want to tell you, yet I`ll rather happily apply this patch which
introduces n lines with _correct_ line endings... So, now you still have
your whole incorrect file, but don`t worry, I fixed those n lines for
you behind your back, life is good, and you`re welcome."

Ok, I might be exaggerating, but it just doesn`t seem right, the end
result seems even worse - having "incorrect" and "correct" endings
silently mixed.

To prevent this, the most important question seems to be - do we have a
sensible way to tell "this is THE line ending of the old hunk (the only
line ending used)"?

And the worst case answer would probably be - you need to check the
whole (old) file, if all line endings are the same, that is THE line
ending you`ll use for comparison against line endings of the newly added
lines.

For the best case scenario, I don`t know enough of Git diff heuristics
to say if we would even be able to determine THE line endings based on
the old _hunk_ only, not to examine the whole old file.

_If_ we can find THE old hunk (or file) line endings, and patch
introduces different ones, then warn/fail option seems sensible...?

Of course, in case where old hunk/file already has mixed line endings
(like both <CRLF> and <LF>), we can either choose to warn right away (as
mixed lines endings are found already), or maybe even not warn at all,
as nothing actually changes in regards to line endings no matter which
one gets added, and we`re discussing "warn/fail on eol *change* only".

Otherwise, I agree about "previous contents" sensibility, especially in
the "blame" case, but this does seem a bit different... or does it?

Regards, BugA

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-apply: warn/fail on *changed* end of line (eol) *only*?
  2016-12-26  3:25 ` Junio C Hamano
  2016-12-27  0:26   ` Igor Djordjevic BugA
@ 2016-12-27 17:27   ` Junio C Hamano
  2016-12-27 22:14     ` Igor Djordjevic BugA
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2016-12-27 17:27 UTC (permalink / raw)
  To: Igor Djordjevic BugA; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Imagine that the project wants LF line endings, i.e. it considers
> that a line with CRLF ending has an unwanted "whitespace" at the
> end.  Now, you start from this source file:
>
>     1 <CRLF>
>     3 <CRLF>
>     5 <CRLF>
>
> and a patch like this comes in:
>
>      1 <CRLF>
>     -3 <CRLF>
>     +three <CRLF>
>      5 <CRLF>
>
> You think that "3 <CRLF>" was replaced by "three <CRLF>", and the
> claim is "the 'previous' contents already had <CRLF> ending, so the
> change is not making things worse".

To see the problem with "check existing lines", it probably is
easier to extend the above example to start from a file with one
more line, like this:

    1 <CRLF>
    3 <CRLF>
    4 <LF>
    5 <CRLF>

and extend all the example patches to remove "4 <LF>" line as well,
where they remove "3 <CRLF>", making the first example patch like
so:

     1 <CRLF>
    -3 <CRLF>
    -4 <LF>
    +three <CRLF>
     5 <CRLF>

Now, if you take "three <CRLF>" to be replacing "3 <CRLF>", then you
may feel that not warning on the CRLF would be the right thing, but
there is no reason (other than the fact you, a human, understand
what 'three' means) to choose "3 <CRLF>" over "4 <LF>" as the
original.  If you take "three <CRLF>" to be replacing "4 <LF>", you
would need to warn.

A totally uninteresting special case is when the original is all
<CRLF>, but in that case, as you already said in the original
message, the project wants <CRLF> and you can configure it as such.
Then <CR> in the line-end <CRLF> won't be mistaken as if it is a
whitespace character <CR> at the end of a line terminated with <LF>.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git-apply: warn/fail on *changed* end of line (eol) *only*?
  2016-12-27 17:27   ` Junio C Hamano
@ 2016-12-27 22:14     ` Igor Djordjevic BugA
  0 siblings, 0 replies; 6+ messages in thread
From: Igor Djordjevic BugA @ 2016-12-27 22:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 27/12/2016 18:27, Junio C Hamano wrote:

> To see the problem with "check existing lines", it probably is
> easier to extend the above example to start from a file with one
> more line, like this:
>
>     1 <CRLF>
>     3 <CRLF>
>     4 <LF>
>     5 <CRLF>
>
> and extend all the example patches to remove "4 <LF>" line as well,
> where they remove "3 <CRLF>", making the first example patch like
> so:
>
>      1 <CRLF>
>     -3 <CRLF>
>     -4 <LF>
>     +three <CRLF>
>      5 <CRLF>
>
> Now, if you take "three <CRLF>" to be replacing "3 <CRLF>", then you
> may feel that not warning on the CRLF would be the right thing, but
> there is no reason (other than the fact you, a human, understand
> what 'three' means) to choose "3 <CRLF>" over "4 <LF>" as the
> original.  If you take "three <CRLF>" to be replacing "4 <LF>", you
> would need to warn.

Hmm, but why do you keep insisting on knowing what the lines are about,
and still making a 'per line' end of line comparison? Besides, your
example falls within the (only?) special case I mentioned as the one Git
probably shouldn`t be acting upon, as both <LF> and <CRLF> are present
among the the old hunk lines already (lines starting with '-').

The logic should be simple, what I tried to describe in the previous
message:

  1. Examine all '-' lines line endings.

  1.1. If not all line endings are the same (like in your example, no
       matter the line content), do nothing.

  1.2. If all line endings _are_ the same within the old hunk ('-'
       lines), check if any of the '+' lines (new hunk) has a different
       line ending (still no matter the line content).

  1.2.1. If no different line endings among '+' lines in comparison to
         unique line ending found in '-' lines, do nothing.

  1.2.2. If _any_ of the '+' lines _has_ a different line ending in
         comparison to unique line ending found in '-' lines, then do
         warn/fail.

This even might seem as the most sensible approach, no matter the
overall project end of line setting, which is exactly why I find it
interesting - it seems to 'just work', being beginner friendly _and_
also watching your back on unexpected end of line changes.

> A totally uninteresting special case is when the original is all
> <CRLF>, but in that case, as you already said in the original
> message, the project wants <CRLF> and you can configure it as such.
> Then <CR> in the line-end <CRLF> won't be mistaken as if it is a
> whitespace character <CR> at the end of a line terminated with <LF>.

But this is exactly the most confusing case, especially for beginners,
where project configuration is incorrect, and you get warned about
'whitespace errors' where all line endings (old/new) are in fact still
the same (confusing).

Yet worse, you don`t get warned of different line endings being applied,
as these are considered 'correct' due to current (incorrect) setting, no
matter the whole file is actually different, which you don`t get warned
about in the first place, and you may discover the file line ending
breakage only when other contributors start complaining.

Also, would it be sensible to account for a possible file inside the
project having different line endings than project in general...? This
would help there as well, without unintentionally breaking the file.

Regards, BugA

P.S. I guess you don`t need me to tell you that, but please feel free
to drop out of this discussion at your own discretion, even though I
enjoy sharing thoughts (and learning new stuff!), I do kind of feel I`m
wasting your time for something that might not be that important, if at
all, otherwise someone would have probably addressed it so far :/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-12-27 22:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-25 23:49 git-apply: warn/fail on *changed* end of line (eol) *only*? Igor Djordjevic BugA
2016-12-25 23:55 ` Igor Djordjevic BugA
2016-12-26  3:25 ` Junio C Hamano
2016-12-27  0:26   ` Igor Djordjevic BugA
2016-12-27 17:27   ` Junio C Hamano
2016-12-27 22:14     ` Igor Djordjevic BugA

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).