Wrong file diff for merge conflict

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Wrong file diff for merge conflict
@ 2009-07-04  7:53 Stefan Bucur
  2009-07-05 18:43 ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Bucur @ 2009-07-04  7:53 UTC (permalink / raw)
  To: git

Hi!

I would like to point out a very strange behavior of git merge, which led to
data loss or data duplication in the conflict file generated during a diff merge.

I had to merge two branches (A and B) which contained more files (e.g.
http://pastebin.ca/1483691 - before splitting branches) which were affected in
the following way:
* The files were formatted (indented) in branch A: http://pastebin.ca/1483684
* In branch B, their contents were altered in various points, but not
significantly (refactored some statements into macros): http://pastebin.ca/1483683

I checked out branch A, and I ran "git merge B", and (obviously) there was a
conflict with this file. The big surprise was to see that the generated diff
file looks like this:

http://pastebin.ca/1483228

The problem is with the last diff in the file, where the left portion is empty,
and the right portion contains code which already was marked as merged (common),
right before the start of the diff. Therefore, the mark at line 127 should
really have been before line 114.

Is this a bug or I am missing something?

Thanks,
Stefan Bucur

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Wrong file diff for merge conflict
  2009-07-04  7:53 Wrong file diff for merge conflict Stefan Bucur
@ 2009-07-05 18:43 ` Linus Torvalds
  2009-07-05 19:22   ` Jakub Narebski
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Linus Torvalds @ 2009-07-05 18:43 UTC (permalink / raw)
  To: Stefan Bucur; +Cc: git

On Sat, 4 Jul 2009, Stefan Bucur wrote:
> 
> http://pastebin.ca/1483228
> 
> The problem is with the last diff in the file, where the left portion is empty,
> and the right portion contains code which already was marked as merged (common),
> right before the start of the diff. Therefore, the mark at line 127 should
> really have been before line 114.
> 
> Is this a bug or I am missing something?

I suspect (but without the origin files/history I can't verify) that what 
happens is that the "successfully merged" code was seen as a fully new 
snippet of code (probably due to getting re-indented?), and the other part 
of that action on that branch was the removal of the old code.

That _removal_ is then shown as a conflict against the other branch, which 
presumably didn't re-indent things (of course, it could be exactly the 
other way around too), and so now you end up having the "conflict" being 
seen as "one branch removes this code (empty conflict part), the other one 
presumably changed it some way".

Is that what you wanted? Obviously not. To you, the conflict makes no 
sense. You're a human, who tries to understand what wen't wrong, and to 
you, the end result of the conflict resolution makes no sense.

On the other hand, it probably _does_ make sense from the standpoint of a 
fairly stupid simple three-way merge that is also (on purpose) tuned to 
try to give minimal merge conflicts (the "XDL_MERGE_ZEALOUS" flag) that 
are often easier for humans to then correct because they have limited 
context.

And the thing is, that zealous merge conflict minimization generally does 
result in conflicts that are easier to work with, because you end up with 
a much smaller conflict - rather than having one huge conflict that in 
your case would probably have covered pretty much the whole 'realloc()' 
function, the zealous merge has tried to really zoom in on just the 
critical parts, and has turned it into three conflicts that are fairly 
small (one is just two trivial lines).

In this case, I can also guess at why you then end up with a very odd 
conflict: when one side has re-indented the code, the only lines that tend 
to match after the re-indentation are the trivial lines that have just a 
single closing brace, or perhaps a "} else {" that actually ends up 
matching the _wrong_ indentation level, but because it existed in both 
indentation levels, the stupid line-based merge logic picked it as a 
common feature - even though it very much is exactly the _wrong_ kind of 
common feature to pick.

So is it a bug? Nope. I can pretty much guarantee that it's not a bug. The 
whole zealous merge thing is very much a feature. The "take whitespace 
into account when comparing lines" is also a feature, even though it 
obviously matches up the wrong lines in re-indentation cases like this. I 
bet you've seen diffs (not just merges) that you think look "stupid", and 
they look stupid exactly for the same reason - diff matches the wrong 
closing braces across re-indentation.

Now, that said - it may not be a bug, but it's clearly not what you want, 
so how do you fix it?

There's a couple of things to do:

 - Try to make the merging smarter, and not do this mistake.

   I'd love to have that, but quite frankly, I don't think it's 
   reasonable. As mentioned, aggressively minimizing merge conflicts 
   really does do the right thing most of the time, and non-aggressive 
   merges - while obviously "safer" - are really irritating.

   So the merge is not hugely smart, and _despite_ it being pretty stupid 
   it's also pretty aggressive. Not generally a good combination, but so 
   far, the alternatives have always been way worse.

 - Tune the zealous merge a bit, and in particular try to avoid the cases 
   where this happens (as mentioned, in C code, it's often re-indentation 
   together with trivial lines that match after it).

   This is kind of what the "patience" diff algorithm tries to do: use 
   primarily non-trivial sequences as the anchors for similarity, and 
   ignore the trivial ones. We have a "--patience" switch to "git diff", 
   maybe we could tune the three-way merge somehow similarly.

   (We did already soem time ago add _some_ tuning, in the form of 
   ZEALOUS_ALNUM that only combines non-conflicting lines if they are 
   non-trivial in the sense that they have some alphanumeric characters. 
   But all that happens after the diff algorithm itself has generated the 
   chunks, so the empty '}' lines still do matter for merging. Also, it 
   still considers things like '} else {' to be "meaningful" by virtue of 
   having alphanumerics on it.)

 - Don't rely so heavily on just the traditional three-way merge result.

   This is what I personally do. The trivial 3-way merge result is 
   wondeful for the truly trivial merges, when it gives trivial results 
   that are easy to fix up. But let's face it, the traditional 3-way merge 
   result just sucks for anything more complicated. When you have an empty 
   side on one of the conflicts, is that because the other side added 
   everything, or is it because oen side removed it? Or is it, like in 
   this case, simply because trivially similar lines got the whole diff 
   confused about which parts didn't change at all?

   The good news is that git does have a few nice merge tools. One is 
   "git diff", which actually shows way more than the trivial three-way 
   end result, in that you can diff against either side, and by default it 
   does that fairly complex "diff against both sides" thing that is also 
   quite useful once you get used to it.

   Another is "gitk --merge [filename]" which is wonderful as a tool to 
   see what both sides actually did, to figure out what the intent of both 
   branches were when the three-way merge result is just noise.

The right answer is probably some combination of "all of the above". In 
the meantime, right now, only the last one is somethign git itself will 
help you with.

			Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Wrong file diff for merge conflict
  2009-07-05 18:43 ` Linus Torvalds
@ 2009-07-05 19:22   ` Jakub Narebski
  2009-07-05 19:23   ` Junio C Hamano
  2009-07-05 22:23   ` Stefan Bucur
  2 siblings, 0 replies; 7+ messages in thread
From: Jakub Narebski @ 2009-07-05 19:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stefan Bucur, git

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Sat, 4 Jul 2009, Stefan Bucur wrote:
> > 
> > http://pastebin.ca/1483228
> > 
> > The problem is with the last diff in the file, where the left portion is empty,
> > and the right portion contains code which already was marked as merged (common),
> > right before the start of the diff. Therefore, the mark at line 127 should
> > really have been before line 114.
> > 
> > Is this a bug or I am missing something?
> 
> I suspect (but without the origin files/history I can't verify) that what 
> happens is that the "successfully merged" code was seen as a fully new 
> snippet of code (probably due to getting re-indented?), and the other part 
> of that action on that branch was the removal of the old code.
> 
> That _removal_ is then shown as a conflict against the other branch, which 
> presumably didn't re-indent things (of course, it could be exactly the 
> other way around too), and so now you end up having the "conflict" being 
> seen as "one branch removes this code (empty conflict part), the other one 
> presumably changed it some way".
> 
> Is that what you wanted? Obviously not. To you, the conflict makes no 
> sense. You're a human, who tries to understand what wen't wrong, and to 
> you, the end result of the conflict resolution makes no sense.

[...]
>  - Don't rely so heavily on just the traditional three-way merge result.
> 
>    This is what I personally do. The trivial 3-way merge result is 
>    wondeful for the truly trivial merges, when it gives trivial results 
>    that are easy to fix up. But let's face it, the traditional 3-way merge 
>    result just sucks for anything more complicated. When you have an empty 
>    side on one of the conflicts, is that because the other side added 
>    everything, or is it because oen side removed it? Or is it, like in 
>    this case, simply because trivially similar lines got the whole diff 
>    confused about which parts didn't change at all?
> 
>    The good news is that git does have a few nice merge tools. One is 
>    "git diff", which actually shows way more than the trivial three-way 
>    end result, in that you can diff against either side, and by default it 
>    does that fairly complex "diff against both sides" thing that is also 
>    quite useful once you get used to it.
> 
>    Another is "gitk --merge [filename]" which is wonderful as a tool to 
>    see what both sides actually did, to figure out what the intent of both 
>    branches were when the three-way merge result is just noise.

There is also "git mergetool" which runs graphical merge tool of your
choice.  It can be easier to work with GUI here.


And git also supports diff3 conflict merge markers, which shows our
and their side, and also the ancestor side; in your case it would make
easy to distinguish between 'one side added' and 'other side removed'.
You can get it using 

  $ git checkout --conflict=diff3 <file>
 
(where <file> can be '.').  You would need modern git (post 1.6.2
I think) for that.

> The right answer is probably some combination of "all of the above". In 
> the meantime, right now, only the last one is something git itself will 
> help you with.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Wrong file diff for merge conflict
  2009-07-05 18:43 ` Linus Torvalds
  2009-07-05 19:22   ` Jakub Narebski
@ 2009-07-05 19:23   ` Junio C Hamano
  2009-07-05 22:23   ` Stefan Bucur
  2 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2009-07-05 19:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stefan Bucur, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So is it a bug? Nope. I can pretty much guarantee that it's not a bug....

Without the precise common ancestor version it is hard to tell, but using
the "one old version http://pastebin.ca/1483691" as a base and running
diff3 (or merge from RCS suite) yields identical results as ours, so in
that sense it is not a bug.

>  - Tune the zealous merge a bit, and in particular try to avoid the cases 
>    where this happens (as mentioned, in C code, it's often re-indentation 
>    together with trivial lines that match after it).
>
>    This is kind of what the "patience" diff algorithm tries to do: use 
>    primarily non-trivial sequences as the anchors for similarity, and 
>    ignore the trivial ones. We have a "--patience" switch to "git diff", 
>    maybe we could tune the three-way merge somehow similarly.

Hmm. I have to think about this a bit.

>  - Don't rely so heavily on just the traditional three-way merge result.
>
>    This is what I personally do. The trivial 3-way merge result is 
>    wondeful for the truly trivial merges, when it gives trivial results 
>    that are easy to fix up. But let's face it, the traditional 3-way merge 
>    result just sucks for anything more complicated. When you have an empty 
>    side on one of the conflicts, is that because the other side added 
>    everything, or is it because oen side removed it? Or is it, like in 
>    this case, simply because trivially similar lines got the whole diff 
>    confused about which parts didn't change at all?
>
>    The good news is that git does have a few nice merge tools. One is 
>    "git diff", which actually shows way more than the trivial three-way 
>    end result, in that you can diff against either side, and by default it 
>    does that fairly complex "diff against both sides" thing that is also 
>    quite useful once you get used to it.
>
>    Another is "gitk --merge [filename]" which is wonderful as a tool to 
>    see what both sides actually did, to figure out what the intent of both 
>    branches were when the three-way merge result is just noise.

Two other are "git log --merge -p" and "git checkout --conflicts=diff3".

> The right answer is probably some combination of "all of the above". In 
> the meantime, right now, only the last one is somethign git itself will 
> help you with.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Wrong file diff for merge conflict
  2009-07-05 18:43 ` Linus Torvalds
  2009-07-05 19:22   ` Jakub Narebski
  2009-07-05 19:23   ` Junio C Hamano
@ 2009-07-05 22:23   ` Stefan Bucur
  2009-07-06  0:33     ` Linus Torvalds
  2 siblings, 1 reply; 7+ messages in thread
From: Stefan Bucur @ 2009-07-05 22:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, H. Peter Anvin

On Sun, Jul 5, 2009 at 9:43 PM, Linus
Torvalds<torvalds@linux-foundation.org> wrote:
>
>
> On Sat, 4 Jul 2009, Stefan Bucur wrote:
>>
>> http://pastebin.ca/1483228
>>
>> The problem is with the last diff in the file, where the left portion is empty,
>> and the right portion contains code which already was marked as merged (common),
>> right before the start of the diff. Therefore, the mark at line 127 should
>> really have been before line 114.
>>
>> Is this a bug or I am missing something?
>
> I suspect (but without the origin files/history I can't verify) that what
> happens is that the "successfully merged" code was seen as a fully new
> snippet of code (probably due to getting re-indented?), and the other part
> of that action on that branch was the removal of the old code.
>
> That _removal_ is then shown as a conflict against the other branch, which
> presumably didn't re-indent things (of course, it could be exactly the
> other way around too), and so now you end up having the "conflict" being
> seen as "one branch removes this code (empty conflict part), the other one
> presumably changed it some way".
>
> Is that what you wanted? Obviously not. To you, the conflict makes no
> sense. You're a human, who tries to understand what wen't wrong, and to
> you, the end result of the conflict resolution makes no sense.

Thank you for your suggestions for a better and more efficient merging
experience, as I'm sure they will help me from now on. However, I
think I did not make myself clear: I was not arguing the fact that the
merge result was suboptimal, but the fact that _the generated conflict
file is semantically wrong_. So, to reiterate, we have this git
generated file:

http://pastebin.ca/1483228

Basically, if one would take the common (successfully merged) parts
and keep the "left" parts in the conflict sections, one would obtain
the first branch version of the file (with minor modifications).
Similarly, if one would opt to keep only the "right" parts in the
conflict section, one would obtain the version found in the second
branch before merge.

However, this is _not_ true in my case. If you take the last conflict
section in the file, if you would keep the left part, you would obtain
the correct left branch file, while if you keep the right part, you
would obtain duplicate code (the common code right before + the
disputed part). And that's why I think this is wrong behavior.

Moreover, now I was able to trigger the bug in a way that leads to
_data loss_. You can find here [1] an archive with a simple git
repository with two branches, "a" and "b", right after a merge between
the two, in a conflict state. The conflict file contains code which is
missing data in one of the two diff sides:

http://pastebin.ca/1485051

You can notice, in the first conflict section, that the right brace of
the inner structure is present in the left part, while it is missing
in the right part (feel free to reset the merge and do it again to see
it for yourself). If you would blindly select the right part for
conflict resolution, you would get erroneous code.

Do you still think it is not a bug?

Thank you,
Stefan Bucur

[1] http://terminus.zytor.com/~stefanb/git/testrepo.tgz

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Wrong file diff for merge conflict
  2009-07-05 22:23   ` Stefan Bucur
@ 2009-07-06  0:33     ` Linus Torvalds
  2009-07-06 14:44       ` Stefan Bucur
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2009-07-06  0:33 UTC (permalink / raw)
  To: Stefan Bucur; +Cc: git, H. Peter Anvin

On Mon, 6 Jul 2009, Stefan Bucur wrote:
> 
> Thank you for your suggestions for a better and more efficient merging
> experience, as I'm sure they will help me from now on. However, I
> think I did not make myself clear: I was not arguing the fact that the
> merge result was suboptimal, but the fact that _the generated conflict
> file is semantically wrong_.

Oh. 

No, you're confused about what a conflict file is.

> Basically, if one would take the common (successfully merged) parts
> and keep the "left" parts in the conflict sections, one would obtain
> the first branch version of the file (with minor modifications).

No. This is not how conflicts work.

If you blindly do that, you'll always get the wrong answer. Why? Because 
you're ignoring the parts of the file that didn't conflict. Those will be 
_outside_ the conflict markers in all cases.

> Similarly, if one would opt to keep only the "right" parts in the
> conflict section, one would obtain the version found in the second
> branch before merge.

Same problem. What you expect from a conflicting file is simply not true.

The fact is, a traditional rcs three-way merge (which is pretty much what 
you get with git, ignoring the fact that we have other tools in addition 
to it, and ignoring things like criss-cross merges etc) just doesn't work 
the way you seem to think it should work. You simply don't get the 
original of one side by picking one side of the conflict markers. It will 
have merged the stuff that it thought merged cleanly, and not have any 
conflict markers at all for those parts.

Of course, "what it thought merged cleanly" may not be what you want it to 
be. Sometimes you get a clean merge for things that you'd have wanted to 
conflict. And sometimes you get conflicts for stuff that you'd think is 
just silly and shouldn't have. 

There are no perfect file merge algorithms that I know of. Lots of people 
hate the diff3/merge behavior - it's by no means perfect. But so far, I've 
never seen anybody successfully advocate anything better either.

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Wrong file diff for merge conflict
  2009-07-06  0:33     ` Linus Torvalds
@ 2009-07-06 14:44       ` Stefan Bucur
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Bucur @ 2009-07-06 14:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, H. Peter Anvin

On Mon, Jul 6, 2009 at 3:33 AM, Linus
Torvalds<torvalds@linux-foundation.org> wrote:
[...]
>
> The fact is, a traditional rcs three-way merge (which is pretty much what
> you get with git, ignoring the fact that we have other tools in addition
> to it, and ignoring things like criss-cross merges etc) just doesn't work
> the way you seem to think it should work. You simply don't get the
> original of one side by picking one side of the conflict markers. It will
> have merged the stuff that it thought merged cleanly, and not have any
> conflict markers at all for those parts.
>
> Of course, "what it thought merged cleanly" may not be what you want it to
> be. Sometimes you get a clean merge for things that you'd have wanted to
> conflict. And sometimes you get conflicts for stuff that you'd think is
> just silly and shouldn't have.
>
> There are no perfect file merge algorithms that I know of. Lots of people
> hate the diff3/merge behavior - it's by no means perfect. But so far, I've
> never seen anybody successfully advocate anything better either.
>
>                Linus
>

Indeed it seems I had a wrong image of how a conflict file should
look. In the light of what you've told me now, everything makes sense
:) Thank you very much for taking your time to explain these things!

Cheers,
Stefan Bucur

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-07-06 14:45 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-04  7:53 Wrong file diff for merge conflict Stefan Bucur
2009-07-05 18:43 ` Linus Torvalds
2009-07-05 19:22   ` Jakub Narebski
2009-07-05 19:23   ` Junio C Hamano
2009-07-05 22:23   ` Stefan Bucur
2009-07-06  0:33     ` Linus Torvalds
2009-07-06 14:44       ` Stefan Bucur

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).