git.vger.kernel.org archive mirror
* [BUG] Git does not convert CRLF=>LF on files with \r not before \n
@ 2015-04-21 13:51 Alexandre Garnier
  2015-04-21 17:41 ` Junio C Hamano
  2015-04-21 19:28 ` Torsten Bögershausen
  0 siblings, 2 replies; 5+ messages in thread
From: Alexandre Garnier @ 2015-04-21 13:51 UTC
  To: git

Here is a test:

git init -q crlf-test
cd crlf-test
echo '*       text=auto' > .gitattributes
git add .gitattributes
git commit -q -m "Normalize EOL"
echo -ne 'some content\r\nother \rcontent with CR\r\ncontent\r\nagain content with\r\r\n' > inline-cr.txt
echo "Working directory content:"
cat -A inline-cr.txt
echo
git add inline-cr.txt
echo "Indexed content:"
git show :inline-cr.txt | cat -A

Result
------
File content:
some content^M$
other ^Mcontent with CR^M$
content^M$
again content with^M^M$

Indexed content:
some content^M$
other ^Mcontent with CR^M$
content^M$
again content with^M^M$

Expected result
---------------
File content:
some content^M$
other ^Mcontent with CR^M$
content^M$
again content with^M^M$

Indexed content:
some content$
other ^Mcontent with CR$
content$
again content with^M$
# or even 'again content with$' for this last line

If you remove the \r characters that are not at the end of a line, the
EOLs are converted as expected:
File content:
some content^M$
other content with CR^M$
content^M$
again content with^M$

Indexed content:
some content$
other content with CR$
content$
again content with$

-- 
Alex


* Re: [BUG] Git does not convert CRLF=>LF on files with \r not before \n
  2015-04-21 13:51 [BUG] Git does not convert CRLF=>LF on files with \r not before \n Alexandre Garnier
@ 2015-04-21 17:41 ` Junio C Hamano
  2015-04-22 17:42   ` Junio C Hamano
  2015-04-21 19:28 ` Torsten Bögershausen
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2015-04-21 17:41 UTC
  To: Alexandre Garnier
  Cc: git, Steffen Prohaska, Alex Riesen, Eyvind Bernhardsen,
	Carlos Martín Nieto

Alexandre Garnier <zigarn@gmail.com> writes:

> echo '*       text=auto' > .gitattributes
> git add .gitattributes
> git commit -q -m "Normalize EOL"
> echo -ne 'some content\r\nother \rcontent with CR\r\ncontent\r\nagain

With text=auto, the user instructs us to guess, and we expect either
LF or CRLF line-terminated files that are *TEXT*.  A lone CR in the
middle of the line would mean we cannot reliably guess: it may be an
LF-terminated file with CRs sprinkled inside the text, some of which
happen to be at the end of a line, or it may be a CRLF-terminated
file with CRs sprinkled in.  We try to preserve the user input by
not munging when we are not sure.
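
A rough way to see the ambiguity described above is to count the CR, LF
and CRLF occurrences in a file and check whether any CR is left over.
This is only a sketch: `count_eol` is a hypothetical helper written for
this illustration and is not how Git itself is implemented.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): count line-ending bytes in FILE
# and report how many CRs are not part of a CRLF pair.
count_eol() {
    cr_char=$(printf '\r')
    # Total CR and LF bytes anywhere in the file.
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    # Lines whose last byte before the LF is a CR, i.e. CRLF pairs.
    # Caveat: a final line that ends in CR without an LF is counted too.
    crlf=$(grep -c "${cr_char}\$" < "$1" || true)
    echo "cr=$cr lf=$lf crlf=$crlf lone_cr=$(( cr - crlf ))"
}

# Demo with the sample content from the original report.
sample=$(mktemp)
printf 'some content\r\nother \rcontent with CR\r\ncontent\r\nagain content with\r\r\n' > "$sample"
count_eol "$sample"
rm -f "$sample"
```

For the sample from the report this prints
`cr=6 lf=4 crlf=4 lone_cr=2`: two CRs are not followed by an LF, which
is the situation in which guessing the line-ending style is unreliable.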

You are seeing the designed and intended behaviour.

But it would be a bug if the same thing happens when the user
explicitly tells us that the file has CRLF line endings, and I
suspect we have that bug, which may want to be corrected.

I've Cc'ed various people who worked on convert.c around line
endings.  I recall we saw a few other discussion threads on
text=auto and eol settings.  Stakeholders may want to have a unified
discussion to first list the issues in the current implementation
and come up with fixes for them.

Thanks.


* Re: [BUG] Git does not convert CRLF=>LF on files with \r not before \n
  2015-04-21 13:51 [BUG] Git does not convert CRLF=>LF on files with \r not before \n Alexandre Garnier
  2015-04-21 17:41 ` Junio C Hamano
@ 2015-04-21 19:28 ` Torsten Bögershausen
  2015-04-22 13:06   ` Alexandre Garnier
  1 sibling, 1 reply; 5+ messages in thread
From: Torsten Bögershausen @ 2015-04-21 19:28 UTC
  To: Alexandre Garnier, git

First of all, thanks for the info.

The current implementation of Git auto-detects whether a file is
text or binary.

A file that is "suspected to be text" is expected to have either LF or
CRLF line endings, but a "bare CR" makes Git wonder:
Should this still be treated as a text file?
If yes, should the CR be kept as is, or should it be converted into LF
(or CRLF)?

The current implementation may simply be explained by the fact that
nobody has so far asked to treat such a file as "text", so the
implementation assumes it to be binary.

(That made the code a little bit simpler at the time it was written.)

So the status today is that you can force Git to leave the CR as is
by specifying that the file is "text".

Is there a real-life problem behind it?
And what should happen to the CRs?


* Re: [BUG] Git does not convert CRLF=>LF on files with \r not before \n
  2015-04-21 19:28 ` Torsten Bögershausen
@ 2015-04-22 13:06   ` Alexandre Garnier
  0 siblings, 0 replies; 5+ messages in thread
From: Alexandre Garnier @ 2015-04-22 13:06 UTC
  To: Torsten Bögershausen; +Cc: git

Indeed, when changing the gitattributes for '* text', the replacement is OK.
Thanks for all the explanations.
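
For reference, the change described above presumably amounts to the
following .gitattributes content. This is a sketch that assumes the same
catch-all pattern as the original test; a real repository would likely
want narrower patterns, since `* text` also declares binary files to be
text.

```
# .gitattributes
# Original test: let Git guess per file.
# *       text=auto
# Variant tried here: declare every path as text.
*       text
```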

At first, my use case was some source files (imported from another
VCS) with CR in different contexts:
 - lines ending with CRCRLF
 - all content in LF or CRLF but some CR that should be EOL...
 - CR in the middle of the line for no reason!
For all this, I will fix the files during import.
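
An import-time cleanup along these lines could be sketched as follows.
The helper names are hypothetical and the policy is an assumption: every
CR directly before the end of a line is dropped (so CRCRLF becomes LF,
the "or even" variant from the original report), and any CR that
remains is only reported, because a script cannot tell whether it
should have been an EOL or is a stray character.

```shell
#!/bin/sh
# Hypothetical import-time helpers; names are illustrative only.
CR=$(printf '\r')

# Strip every CR that directly precedes the end of a line, so both
# CRLF and CRCRLF become a plain LF. Reads stdin, writes stdout.
# Caveat: a CR ending a final line that has no LF is stripped too.
strip_cr_before_lf() {
    sed "s/${CR}*\$//"
}

# Print, with line numbers, the lines that still contain a CR.
# These need a human decision: real EOL, or stray character?
list_remaining_cr() {
    grep -n "${CR}" || true
}

# Demo: line 2 has a CR in the middle and ends with CRCRLF.
printf 'a\r\nb \rc\r\r\nd\n' | strip_cr_before_lf | list_remaining_cr
```

In the demo only line 2 is reported, because its mid-line CR survives
the end-of-line cleanup.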

But when digging I found some shell or awk scripts with CR as a valid
char in search/replacement string. I know that the EOL should not be
CRLF in this case, but I don't know if this situation could happen in
DOS batch files or PowerShell scripts with CRLF EOL.


* Re: [BUG] Git does not convert CRLF=>LF on files with \r not before \n
  2015-04-21 17:41 ` Junio C Hamano
@ 2015-04-22 17:42   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2015-04-22 17:42 UTC
  To: Alexandre Garnier; +Cc: Torsten Bögershausen, git

Alexandre Garnier <zigarn@gmail.com> writes:

> Indeed, when changing the gitattributes for '* text', the replacement is OK.

OK.  Earlier I said:

>> But it would be a bug if the same thing happens when the user
>> explicitly tells us that the file has CRLF line endings, and I
>> suspect we have that bug, which may want to be corrected.

but you are saying that my suspicion is incorrect and we do not have
such a bug.

Thanks for digging further.
