* git-apply: warn/fail on *changed* end of line (eol) *only*? @ 2016-12-25 23:49 Igor Djordjevic BugA 2016-12-25 23:55 ` Igor Djordjevic BugA 2016-12-26 3:25 ` Junio C Hamano 0 siblings, 2 replies; 6+ messages in thread From: Igor Djordjevic BugA @ 2016-12-25 23:49 UTC (permalink / raw) To: git Hi to all, This is a follow-up of a message posted at git-users Google group[1], as the topic may actually be better suited for this mailing list instead. If needed, some elaboration on the context can be found there, as well as a detailed example describing the motive for the question itself (or directly here[2], second half of the message). In short -- git-apply warns on applying the patch with CRLF line endings (new), considered whitespace errors, even when previous hunk version (old) has/had that very same CRLF line endings, too, so nothing actually changed in this regards. Even worse, it happily applies a patch with LF line endings (new) without any warning/hint, even though previous (old) line endings were CRLF, thus effectively (and silently) breaking the (previous) line endings. I understand that what should be considered "correct" line endings can be set explicitly (and should be communicated on the project level in the first place), but would it make sense to have an additional git-apply --whitespace option (like "warn-changed" and "error-changed", or something) to warn/fail on *changed* end of line *only*, without any further knowledge needed? So not to have Git need to assume or know which line endings should be considered "correct", nor to think/bother too much with it, but just to warn/fail on actual line ending *change* (in comparison between old and new part/hunk of the patch). This seems more intuitive to me, and much more cross-platform collaboration friendly, at least as an option, as one wouldn`t need to think much about "correct" project/file line endings (which would still be advised, anyway), but would be promptly warned about possibly breaking previous/existing line endings, maybe even with an option to keep those previous ones, or force the new ones... yet as I`m still new around, I accept that I might be missing some bigger picture here :) Thanks, BugA P.S. I`m discussing git-apply and end-of-line handling here in particular as that`s what I`m currently concerned with, and for the sake of simplicity, but might be that whole thing could theoretically be broadened to other Git tools (like git-diff) and whitespace (error) handling in general, too (warn/fail only if actually introduced in new hunk, not previously being present in the old one). -- [1] https://groups.google.com/d/msg/git-users/9Os_48yJdn0/j9S_W7h-EAAJ [2] https://groups.google.com/d/msg/git-users/9Os_48yJdn0/5S9Ja1fEEAAJ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-apply: warn/fail on *changed* end of line (eol) *only*? 2016-12-25 23:49 git-apply: warn/fail on *changed* end of line (eol) *only*? Igor Djordjevic BugA @ 2016-12-25 23:55 ` Igor Djordjevic BugA 2016-12-26 3:25 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Igor Djordjevic BugA @ 2016-12-25 23:55 UTC (permalink / raw) To: git On 26/12/2016 00:49, Igor Djordjevic BugA wrote: > This is a follow-up of a message posted at git-users Google group[1], > as the topic may actually be better suited for this mailing list > instead. If needed, some elaboration on the context can be found there, > as well as a detailed example describing the motive for the question > itself (or directly here[2], second half of the message). For some reason reference links got excluded, so here they are again: [1] https://groups.google.com/d/msg/git-users/9Os_48yJdn0/j9S_W7h-EAAJ [2] https://groups.google.com/d/msg/git-users/9Os_48yJdn0/5S9Ja1fEEAAJ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-apply: warn/fail on *changed* end of line (eol) *only*? 2016-12-25 23:49 git-apply: warn/fail on *changed* end of line (eol) *only*? Igor Djordjevic BugA 2016-12-25 23:55 ` Igor Djordjevic BugA @ 2016-12-26 3:25 ` Junio C Hamano 2016-12-27 0:26 ` Igor Djordjevic BugA 2016-12-27 17:27 ` Junio C Hamano 1 sibling, 2 replies; 6+ messages in thread From: Junio C Hamano @ 2016-12-26 3:25 UTC (permalink / raw) To: Igor Djordjevic BugA; +Cc: git Igor Djordjevic BugA <igor.d.djordjevic@gmail.com> writes: > In short -- git-apply warns on applying the patch with CRLF line endings > (new), considered whitespace errors, even when previous hunk version > (old) has/had that very same CRLF line endings, too, so nothing actually > changed in this regards. Even worse, it happily applies a patch with LF > line endings (new) without any warning/hint, even though previous (old) > line endings were CRLF, thus effectively (and silently) breaking the > (previous) line endings. Let me see if I understood your problem description correctly. Imagine that the project wants LF line endings, i.e. it considers that a line with CRLF ending has an unwanted "whitespace" at the end. Now, you start from this source file: 1 <CRLF> 3 <CRLF> 5 <CRLF> and a patch like this comes in: 1 <CRLF> -3 <CRLF> +three <CRLF> 5 <CRLF> You think that "3 <CRLF>" was replaced by "three <CRLF>", and the claim is "the 'previous' contents already had <CRLF> ending, so the change is not making things worse". But what if the patch was like this? 1 <CRLF> -3 <CRLF> +three <CRLF> +four <CRLF> 5 <CRLF> Do you want to warn on "four <CRLF>" because it does not have any "previous" corresponding line? Extending the thought further, which line do you want to warn and which line do you not want to, if the patch were like this instead? 1 <CRLF> -3 <CRLF> +four <CRLF> +three <CRLF> 5 <CRLF> Extending this thought experiment further, you would realize that fundamentally the concept of "previous contents" has no sensible definition. Incidentally, not realizing "there is no sensible definition of 'previous contents'" is a source of another often seen confusion by new users of Git and any version control systems regarding the "blame" feature. When given the final answer by "blame", they often say "what content did the line have _before_ the commit blame gave me?" As there is no sensible definition of 'previous contents' that corresponds to the line, that question does not in general have a sensible answer. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-apply: warn/fail on *changed* end of line (eol) *only*? 2016-12-26 3:25 ` Junio C Hamano @ 2016-12-27 0:26 ` Igor Djordjevic BugA 2016-12-27 17:27 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Igor Djordjevic BugA @ 2016-12-27 0:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi Junio, and thanks for sharing your thoughts. You understood it pretty well, except the context "scope". Example does show a single line change, and a single line old/new comparison, but that is for simplicity sake of the initial example. What I was discussing about was the whole patch "hunk" instead (mentioned inside the part you quoted, too). > But what if the patch was like this? > > 1 <CRLF> > -3 <CRLF> > +three <CRLF> > +four <CRLF> > 5 <CRLF> > > Do you want to warn on "four <CRLF>" because it does not have any > "previous" corresponding line? No, because the old hunk (single - line) has <CRLF> line ending(s), the same as the new hunk (both + lines), so effectively there is no end of line change between old and new _hunks_, so no warning needed. > Extending the thought further, which line do you want to warn and > which line do you not want to, if the patch were like this instead? > > 1 <CRLF> > -3 <CRLF> > +four <CRLF> > +three <CRLF> > 5 <CRLF> Again, no warnings here, same as described above. > Extending this thought experiment further, you would realize that > fundamentally the concept of "previous contents" has no sensible > definition. This one is interesting, and maybe arguable - I certainly don`t have enough knowledge about the overall matter (Git heuristics for diff creation, in terms of which parts get marked as "old" (-), and which as "new" (+), which probably even depends on user defined settings)... ... BUT, the overall idea here seems rather simple (to me :P) - if *all* removed lines inside the old hunk have the same line endings (<CRLF>, for example), where there is *any* (at least one) added line inside the new hunk that has a different line ending than the ones in the old hunk (like <LF>), then warning or failure might be preferred (optionally, at least), as it does seem like something fishy is going on. I appreciate and understand a need for Git being able to know "correct" line endings, but in case where the whole previous hunk (or the whole file, even) has "incorrect" line endings (<CRLF>, as far as Git is concerned, as in the given example), it seems more sensible to me for Git to warn you about _that_ first, rather than keep silent and apply a new hunk introducing <LF> line endings without you even knowing - heck, maybe your core end of line setting is incorrect, indeed, so actually having someone to let you know seems nice, before potentially corrupting your file. It felt like - "Hey, I do have a mean to know that your _whole file_ has incorrect line endings (as per my current settings), but I don`t want to tell you, yet I`ll rather happily apply this patch which introduces n lines with _correct_ line endings... So, now you still have your whole incorrect file, but don`t worry, I fixed those n lines for you behind your back, life is good, and you`re welcome." Ok, I might be exaggerating, but it just doesn`t seem right, the end result seems even worse - having "incorrect" and "correct" endings silently mixed. To prevent this, the most important question seems to be - do we have a sensible way to tell "this is THE line ending of the old hunk (the only line ending used)"? And the worst case answer would probably be - you need to check the whole (old) file, if all line endings are the same, that is THE line ending you`ll use for comparison against line endings of the newly added lines. For the best case scenario, I don`t know enough of Git diff heuristics to say if we would even be able to determine THE line endings based on the old _hunk_ only, not to examine the whole old file. _If_ we can find THE old hunk (or file) line endings, and patch introduces different ones, then warn/fail option seems sensible...? Of course, in case where old hunk/file already has mixed line endings (like both <CRLF> and <LF>), we can either choose to warn right away (as mixed lines endings are found already), or maybe even not warn at all, as nothing actually changes in regards to line endings no matter which one gets added, and we`re discussing "warn/fail on eol *change* only". Otherwise, I agree about "previous contents" sensibility, especially in the "blame" case, but this does seem a bit different... or does it? Regards, BugA ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-apply: warn/fail on *changed* end of line (eol) *only*? 2016-12-26 3:25 ` Junio C Hamano 2016-12-27 0:26 ` Igor Djordjevic BugA @ 2016-12-27 17:27 ` Junio C Hamano 2016-12-27 22:14 ` Igor Djordjevic BugA 1 sibling, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2016-12-27 17:27 UTC (permalink / raw) To: Igor Djordjevic BugA; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > Imagine that the project wants LF line endings, i.e. it considers > that a line with CRLF ending has an unwanted "whitespace" at the > end. Now, you start from this source file: > > 1 <CRLF> > 3 <CRLF> > 5 <CRLF> > > and a patch like this comes in: > > 1 <CRLF> > -3 <CRLF> > +three <CRLF> > 5 <CRLF> > > You think that "3 <CRLF>" was replaced by "three <CRLF>", and the > claim is "the 'previous' contents already had <CRLF> ending, so the > change is not making things worse". To see the problem with "check existing lines", it probably is easier to extend the above example to start from a file with one more line, like this: 1 <CRLF> 3 <CRLF> 4 <LF> 5 <CRLF> and extend all the example patches to remove "4 <LF>" line as well, where they remove "3 <CRLF>", making the first example patch like so: 1 <CRLF> -3 <CRLF> -4 <LF> +three <CRLF> 5 <CRLF> Now, if you take "three <CRLF>" to be replacing "3 <CRLF>", then you may feel that not warning on the CRLF would be the right thing, but there is no reason (other than the fact you, a human, understand what 'three' means) to choose "3 <CRLF>" over "4 <LF>" as the original. If you take "three <CRLF>" to be replacing "4 <LF>", you would need to warn. A totally uninteresting special case is when the original is all <CRLF>, but in that case, as you already said in the original message, the project wants <CRLF> and you can configure it as such. Then <CR> in the line-end <CRLF> won't be mistaken as if it is a whitespace character <CR> at the end of a line terminated with <LF>. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-apply: warn/fail on *changed* end of line (eol) *only*? 2016-12-27 17:27 ` Junio C Hamano @ 2016-12-27 22:14 ` Igor Djordjevic BugA 0 siblings, 0 replies; 6+ messages in thread From: Igor Djordjevic BugA @ 2016-12-27 22:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 27/12/2016 18:27, Junio C Hamano wrote: > To see the problem with "check existing lines", it probably is > easier to extend the above example to start from a file with one > more line, like this: > > 1 <CRLF> > 3 <CRLF> > 4 <LF> > 5 <CRLF> > > and extend all the example patches to remove "4 <LF>" line as well, > where they remove "3 <CRLF>", making the first example patch like > so: > > 1 <CRLF> > -3 <CRLF> > -4 <LF> > +three <CRLF> > 5 <CRLF> > > Now, if you take "three <CRLF>" to be replacing "3 <CRLF>", then you > may feel that not warning on the CRLF would be the right thing, but > there is no reason (other than the fact you, a human, understand > what 'three' means) to choose "3 <CRLF>" over "4 <LF>" as the > original. If you take "three <CRLF>" to be replacing "4 <LF>", you > would need to warn. Hmm, but why do you keep insisting on knowing what the lines are about, and still making a 'per line' end of line comparison? Besides, your example falls within the (only?) special case I mentioned as the one Git probably shouldn`t be acting upon, as both <LF> and <CRLF> are present among the the old hunk lines already (lines starting with '-'). The logic should be simple, what I tried to describe in the previous message: 1. Examine all '-' lines line endings. 1.1. If not all line endings are the same (like in your example, no matter the line content), do nothing. 1.2. If all line endings _are_ the same within the old hunk ('-' lines), check if any of the '+' lines (new hunk) has a different line ending (still no matter the line content). 1.2.1. If no different line endings among '+' lines in comparison to unique line ending found in '-' lines, do nothing. 1.2.2. If _any_ of the '+' lines _has_ a different line ending in comparison to unique line ending found in '-' lines, then do warn/fail. This even might seem as the most sensible approach, no matter the overall project end of line setting, which is exactly why I find it interesting - it seems to 'just work', being beginner friendly _and_ also watching your back on unexpected end of line changes. > A totally uninteresting special case is when the original is all > <CRLF>, but in that case, as you already said in the original > message, the project wants <CRLF> and you can configure it as such. > Then <CR> in the line-end <CRLF> won't be mistaken as if it is a > whitespace character <CR> at the end of a line terminated with <LF>. But this is exactly the most confusing case, especially for beginners, where project configuration is incorrect, and you get warned about 'whitespace errors' where all line endings (old/new) are in fact still the same (confusing). Yet worse, you don`t get warned of different line endings being applied, as these are considered 'correct' due to current (incorrect) setting, no matter the whole file is actually different, which you don`t get warned about in the first place, and you may discover the file line ending breakage only when other contributors start complaining. Also, would it be sensible to account for a possible file inside the project having different line endings than project in general...? This would help there as well, without unintentionally breaking the file. Regards, BugA P.S. I guess you don`t need me to tell you that, but please feel free to drop out of this discussion at your own discretion, even though I enjoy sharing thoughts (and learning new stuff!), I do kind of feel I`m wasting your time for something that might not be that important, if at all, otherwise someone would have probably addressed it so far :/ ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-12-27 22:15 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-12-25 23:49 git-apply: warn/fail on *changed* end of line (eol) *only*? Igor Djordjevic BugA 2016-12-25 23:55 ` Igor Djordjevic BugA 2016-12-26 3:25 ` Junio C Hamano 2016-12-27 0:26 ` Igor Djordjevic BugA 2016-12-27 17:27 ` Junio C Hamano 2016-12-27 22:14 ` Igor Djordjevic BugA
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).