Git development
 help / color / mirror / Atom feed
* Bug-ish: CRLF endings and conflict markers
@ 2007-01-11  9:41 Andy Parkins
  2007-01-11  9:46 ` Johannes Schindelin
  2007-01-11  9:50 ` Shawn O. Pearce
  0 siblings, 2 replies; 8+ messages in thread
From: Andy Parkins @ 2007-01-11  9:41 UTC (permalink / raw)
  To: git

Hello,

I am sure the response to this will be "tough", however I'll mention it 
anyway.  It wasn't a problem for me, as I know git is LF endings only, but 
others might be a bit confused.

 * Track a file that has CR-LF endings.  Not a problem, git works fine with 
   these and treats the CR as an extra character at the end of the line.
 * Have two branches make conflicting changes to that file
 * Merge, conflict
 * Open the file
 * Find that (almost) every line now displays with a "^M" at the end.  vim has
   found some lines that don't have CR-LF ends so dropped back to UNIX mode.
 * Be careful that your conflict resolution doesn't introduce any new lines as 
   they will not have CR-LF endings.

The "(almost)" above refers of course to the conflict markers.  These have LF 
endings only so force the editor into UNIX mode.  Assuming a binary safe 
editor, things will be fine if the conflict resolution is simply to remove 
the markers, and edit the existing lines.  However, if the user adds any 
lines during conflict resolution, those new lines will only have LF ends.

This is a problem for compilers that are expecting CR-LF input.

Note: this is not on Windows (spit), this is simply editing a CR-LF file on 
UNIX.

The best solution is probably to use the line ending of the conflicted lines.  
I've had a look, but I can only fine builtin-rerere.c that generates the 
markers - would that be the place to make this change?


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug-ish: CRLF endings and conflict markers
  2007-01-11  9:41 Bug-ish: CRLF endings and conflict markers Andy Parkins
@ 2007-01-11  9:46 ` Johannes Schindelin
  2007-01-11 11:36   ` Andy Parkins
  2007-01-11  9:50 ` Shawn O. Pearce
  1 sibling, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2007-01-11  9:46 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Hi,

On Thu, 11 Jan 2007, Andy Parkins wrote:

> [describes that conflict markers have LF-only line ending, even if file 
>  has CR-LF line endings]
> 
> The best solution is probably to use the line ending of the conflicted 
> lines.

Question is: how to find out. Especially if your file already has mixed 
line endings...

> I've had a look, but I can only fine builtin-rerere.c that generates the 
> markers - would that be the place to make this change?

No. It would be in xdiff/xmerge.c:{139,147,160}. But I think that xdiff 
really is LF-only throughout, so you'd have to do much more work anyway.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug-ish: CRLF endings and conflict markers
  2007-01-11  9:41 Bug-ish: CRLF endings and conflict markers Andy Parkins
  2007-01-11  9:46 ` Johannes Schindelin
@ 2007-01-11  9:50 ` Shawn O. Pearce
  2007-01-11  9:59   ` Johannes Schindelin
  1 sibling, 1 reply; 8+ messages in thread
From: Shawn O. Pearce @ 2007-01-11  9:50 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins <andyparkins@gmail.com> wrote:
> The best solution is probably to use the line ending of the conflicted lines.  
> I've had a look, but I can only fine builtin-rerere.c that generates the 
> markers - would that be the place to make this change?

builtin-rerere may need to change but the code that's actually
creating the conflict markers isn't there.  Its somewhere in
xdiff/xmerge.c.  I say somewhere as I haven't dredged down into
that code myself, but I know that's where xdl_merge() is and I
know its xdl_merge() that actually created the content of the
conflict file during the merge.

That said I don't really care about this problem that much.
The problem that I care about is its far too easy to convert the
lineendings in a file (e.g. CRLF->LF, LF->CRLF).  This causes the
entire file to differ, making merges very difficult.  I really
should just fix it (in the one place where it matters to me) by
modifying the pre-commit hook to look for such a case and abort.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug-ish: CRLF endings and conflict markers
  2007-01-11  9:50 ` Shawn O. Pearce
@ 2007-01-11  9:59   ` Johannes Schindelin
  2007-01-11 10:16     ` Shawn O. Pearce
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2007-01-11  9:59 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Andy Parkins, git

Hi,

On Thu, 11 Jan 2007, Shawn O. Pearce wrote:

> That said I don't really care about this problem that much. The problem 
> that I care about is its far too easy to convert the lineendings in a 
> file (e.g. CRLF->LF, LF->CRLF).  This causes the entire file to differ, 
> making merges very difficult.  I really should just fix it (in the one 
> place where it matters to me) by modifying the pre-commit hook to look 
> for such a case and abort.

Why not just introduce a config variable, and do the conversion in-flight?

Or, alternatively, do the merge ignoring white space? (Of course, this is 
somewhat pointless when merging whitespace fixes...)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug-ish: CRLF endings and conflict markers
  2007-01-11  9:59   ` Johannes Schindelin
@ 2007-01-11 10:16     ` Shawn O. Pearce
  2007-01-11 10:26       ` Johannes Schindelin
  0 siblings, 1 reply; 8+ messages in thread
From: Shawn O. Pearce @ 2007-01-11 10:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Andy Parkins, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Thu, 11 Jan 2007, Shawn O. Pearce wrote:
> 
> > That said I don't really care about this problem that much. The problem 
> > that I care about is its far too easy to convert the lineendings in a 
> > file (e.g. CRLF->LF, LF->CRLF).  This causes the entire file to differ, 
> > making merges very difficult.  I really should just fix it (in the one 
> > place where it matters to me) by modifying the pre-commit hook to look 
> > for such a case and abort.
> 
> Why not just introduce a config variable, and do the conversion in-flight?

That's a lot of work and goes very much against the Git mindset that
we never alter content, just store it as-is.  If Linus sees this
thread I'm suspecting he will jump in and point out that altering
content transparently like this just wrong.  I think he's stated
something like that in the past.  :-)

All I want is to make the user realize what they have done.  "Hey
dummy, you just changed the entire file and the only difference I see
for most lines is simply the addition/removal of a CR.  You shouldn't
do that!".  The pre-commit hook is the perfect place for that.

It should be pretty easy to do.  For every line in the diff stuff
it into a perl hash after removing any trailing CR.  If both an
add and a delete for the exact same line of text appear in the diff
(with the only difference being the CR on the end) and the number
of such lines number is at least 50% of the modified lines in the
file, something's amiss.  Smack the user and tell them to fix the
file.
 
> Or, alternatively, do the merge ignoring white space? (Of course, this is 
> somewhat pointless when merging whitespace fixes...)

Lets not go down that road.  That's just asking for trouble.
And it sounds like a lot of work from what you pointed out in
another message.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug-ish: CRLF endings and conflict markers
  2007-01-11 10:16     ` Shawn O. Pearce
@ 2007-01-11 10:26       ` Johannes Schindelin
  2007-01-11 10:41         ` Shawn O. Pearce
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2007-01-11 10:26 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Andy Parkins, git

Hi,

On Thu, 11 Jan 2007, Shawn O. Pearce wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > On Thu, 11 Jan 2007, Shawn O. Pearce wrote:
> > 
> > > That said I don't really care about this problem that much. The problem 
> > > that I care about is its far too easy to convert the lineendings in a 
> > > file (e.g. CRLF->LF, LF->CRLF).  This causes the entire file to differ, 
> > > making merges very difficult.  I really should just fix it (in the one 
> > > place where it matters to me) by modifying the pre-commit hook to look 
> > > for such a case and abort.
> > 
> > Why not just introduce a config variable, and do the conversion in-flight?
> 
> That's a lot of work and goes very much against the Git mindset that
> we never alter content, just store it as-is.

While that is correct, we also _use_ the content, and very much alter it 
all the time. A diff, for example, is nothing but altered content.

> All I want is to make the user realize what they have done.  "Hey dummy, 
> you just changed the entire file and the only difference I see for most 
> lines is simply the addition/removal of a CR.  You shouldn't do that!".  
> The pre-commit hook is the perfect place for that.

This sounds sensible. I mistook your task to be integrator for people you 
can't smack.

> > Or, alternatively, do the merge ignoring white space? (Of course, this 
> > is somewhat pointless when merging whitespace fixes...)
> 
> Lets not go down that road.  That's just asking for trouble. And it 
> sounds like a lot of work from what you pointed out in another message.

But to the contrary, ignoring white space in xdl_merge() is just a 
question of passing XDF_IGNORE_WHITESPACE_CHANGE as part of the xpparam's.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug-ish: CRLF endings and conflict markers
  2007-01-11 10:26       ` Johannes Schindelin
@ 2007-01-11 10:41         ` Shawn O. Pearce
  0 siblings, 0 replies; 8+ messages in thread
From: Shawn O. Pearce @ 2007-01-11 10:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Andy Parkins, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Thu, 11 Jan 2007, Shawn O. Pearce wrote:
> > That's a lot of work and goes very much against the Git mindset that
> > we never alter content, just store it as-is.
> 
> While that is correct, we also _use_ the content, and very much alter it 
> all the time. A diff, for example, is nothing but altered content.

Yes, of course.  But we don't take A and store A' in the ODB during
a commit.  We always assert that if you give us A, you'll get back A.
You can always ask us to compute some sort of A' later on, but that
is always based on the originally stored A.

You can also ask us to combine an A and B and get some new thing C.
But again, we don't then go off and store C' instead of C.
 
> > All I want is to make the user realize what they have done.  "Hey dummy, 
> > you just changed the entire file and the only difference I see for most 
> > lines is simply the addition/removal of a CR.  You shouldn't do that!".  
> > The pre-commit hook is the perfect place for that.
> 
> This sounds sensible. I mistook your task to be integrator for people you 
> can't smack.

To some extent I am.  But I have patched the local Cygwin package
I use to distribute Git to everyone to have a 1 line pre-commit
update hook (and no other hooks) which just invokes a Perl script
in /usr/local/bin, and that Perl script is also in the same Cygwin
package.

So Cygwin's "feature" of making all Git hooks executable by default
in a new repository plays in my favor here.  The pre-commit hook is
active in every repository by default, and it just execs a standard
script which can be modified at any time.  So I can easily update
everyone's pre-commit hook at once.

Currently the hook script is just the stock one that comes with Git,
except it ignores CRLF line endings.  I'll likely tweak it as I'm
suggesting, make the CRLF->LF/LF->CRLF check an optional feature
that can be toggled on/off in the top of the script, and submit the
patch to update the stock hook.  Others may find the check useful.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug-ish: CRLF endings and conflict markers
  2007-01-11  9:46 ` Johannes Schindelin
@ 2007-01-11 11:36   ` Andy Parkins
  0 siblings, 0 replies; 8+ messages in thread
From: Andy Parkins @ 2007-01-11 11:36 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

On Thursday 2007 January 11 09:46, Johannes Schindelin wrote:

> > The best solution is probably to use the line ending of the conflicted
> > lines.
>
> Question is: how to find out. Especially if your file already has mixed
> line endings...

That's why I said use the line endings of the conflicted lines.  Even if the 
file is mixed, at least lines added are the same as the ones near them.  
Let's say for example a conflict like this:

<<<<
Whole file has DOS endings^M
====
The whole file has DOS endings^M
>>>>

As both of the conflicted parts end in CRLF, the conflict markers would 
default to having CRLF.  This would cope with mixed-ending files as well, as 
the ending is always determined locally.

If we were being really clever, we could make each marker use the ending of 
the line following it; making it impossible to accuse git of doing anything 
worse than already existed.

> No. It would be in xdiff/xmerge.c:{139,147,160}. But I think that xdiff

Thanks; I see why I couldn't find it: it's generated as a loop "marker_size" 
long, rather than the literal that is in rerere.  I've had a quick glance at 
xdl_fill_merge_buffer() and can see it's not an easy thing to do (at least 
for me).  Maybe if it itches me a bit more I'll put some effort into my 
scratching :-)

> really is LF-only throughout, so you'd have to do much more work anyway.

That doesn't seem to be a problem.  git is performing very nicely on the CR-LF 
file I'm tracking.  As I mentioned, it's treating the CR as just another 
character on the line - perfect: merges, diffs, diffstats all work just fine.


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-01-11 11:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-01-11  9:41 Bug-ish: CRLF endings and conflict markers Andy Parkins
2007-01-11  9:46 ` Johannes Schindelin
2007-01-11 11:36   ` Andy Parkins
2007-01-11  9:50 ` Shawn O. Pearce
2007-01-11  9:59   ` Johannes Schindelin
2007-01-11 10:16     ` Shawn O. Pearce
2007-01-11 10:26       ` Johannes Schindelin
2007-01-11 10:41         ` Shawn O. Pearce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox