* Wrong file diff for merge conflict @ 2009-07-04 7:53 Stefan Bucur 2009-07-05 18:43 ` Linus Torvalds 0 siblings, 1 reply; 7+ messages in thread From: Stefan Bucur @ 2009-07-04 7:53 UTC (permalink / raw) To: git Hi! I would like to point out a very strange behavior of git merge, which led to data loss or data duplication in the conflict file generated during a diff merge. I had to merge two branches (A and B) which contained more files (e.g. http://pastebin.ca/1483691 - before splitting branches) which were affected in the following way: * The files were formatted (indented) in branch A: http://pastebin.ca/1483684 * In branch B, their contents were altered in various points, but not significantly (refactored some statements into macros): http://pastebin.ca/1483683 I checked out branch A, and I ran "git merge B", and (obviously) there was a conflict with this file. The big surprise was to see that the generated diff file looks like this: http://pastebin.ca/1483228 The problem is with the last diff in the file, where the left portion is empty, and the right portion contains code which already was marked as merged (common), right before the start of the diff. Therefore, the mark at line 127 should really have been before line 114. Is this a bug or I am missing something? Thanks, Stefan Bucur ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Wrong file diff for merge conflict 2009-07-04 7:53 Wrong file diff for merge conflict Stefan Bucur @ 2009-07-05 18:43 ` Linus Torvalds 2009-07-05 19:22 ` Jakub Narebski ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Linus Torvalds @ 2009-07-05 18:43 UTC (permalink / raw) To: Stefan Bucur; +Cc: git On Sat, 4 Jul 2009, Stefan Bucur wrote: > > http://pastebin.ca/1483228 > > The problem is with the last diff in the file, where the left portion is empty, > and the right portion contains code which already was marked as merged (common), > right before the start of the diff. Therefore, the mark at line 127 should > really have been before line 114. > > Is this a bug or I am missing something? I suspect (but without the origin files/history I can't verify) that what happens is that the "successfully merged" code was seen as a fully new snippet of code (probably due to getting re-indented?), and the other part of that action on that branch was the removal of the old code. That _removal_ is then shown as a conflict against the other branch, which presumably didn't re-indent things (of course, it could be exactly the other way around too), and so now you end up having the "conflict" being seen as "one branch removes this code (empty conflict part), the other one presumably changed it some way". Is that what you wanted? Obviously not. To you, the conflict makes no sense. You're a human, who tries to understand what wen't wrong, and to you, the end result of the conflict resolution makes no sense. On the other hand, it probably _does_ make sense from the standpoint of a fairly stupid simple three-way merge that is also (on purpose) tuned to try to give minimal merge conflicts (the "XDL_MERGE_ZEALOUS" flag) that are often easier for humans to then correct because they have limited context. And the thing is, that zealous merge conflict minimization generally does result in conflicts that are easier to work with, because you end up with a much smaller conflict - rather than having one huge conflict that in your case would probably have covered pretty much the whole 'realloc()' function, the zealous merge has tried to really zoom in on just the critical parts, and has turned it into three conflicts that are fairly small (one is just two trivial lines). In this case, I can also guess at why you then end up with a very odd conflict: when one side has re-indented the code, the only lines that tend to match after the re-indentation are the trivial lines that have just a single closing brace, or perhaps a "} else {" that actually ends up matching the _wrong_ indentation level, but because it existed in both indentation levels, the stupid line-based merge logic picked it as a common feature - even though it very much is exactly the _wrong_ kind of common feature to pick. So is it a bug? Nope. I can pretty much guarantee that it's not a bug. The whole zealous merge thing is very much a feature. The "take whitespace into account when comparing lines" is also a feature, even though it obviously matches up the wrong lines in re-indentation cases like this. I bet you've seen diffs (not just merges) that you think look "stupid", and they look stupid exactly for the same reason - diff matches the wrong closing braces across re-indentation. Now, that said - it may not be a bug, but it's clearly not what you want, so how do you fix it? There's a couple of things to do: - Try to make the merging smarter, and not do this mistake. I'd love to have that, but quite frankly, I don't think it's reasonable. As mentioned, aggressively minimizing merge conflicts really does do the right thing most of the time, and non-aggressive merges - while obviously "safer" - are really irritating. So the merge is not hugely smart, and _despite_ it being pretty stupid it's also pretty aggressive. Not generally a good combination, but so far, the alternatives have always been way worse. - Tune the zealous merge a bit, and in particular try to avoid the cases where this happens (as mentioned, in C code, it's often re-indentation together with trivial lines that match after it). This is kind of what the "patience" diff algorithm tries to do: use primarily non-trivial sequences as the anchors for similarity, and ignore the trivial ones. We have a "--patience" switch to "git diff", maybe we could tune the three-way merge somehow similarly. (We did already soem time ago add _some_ tuning, in the form of ZEALOUS_ALNUM that only combines non-conflicting lines if they are non-trivial in the sense that they have some alphanumeric characters. But all that happens after the diff algorithm itself has generated the chunks, so the empty '}' lines still do matter for merging. Also, it still considers things like '} else {' to be "meaningful" by virtue of having alphanumerics on it.) - Don't rely so heavily on just the traditional three-way merge result. This is what I personally do. The trivial 3-way merge result is wondeful for the truly trivial merges, when it gives trivial results that are easy to fix up. But let's face it, the traditional 3-way merge result just sucks for anything more complicated. When you have an empty side on one of the conflicts, is that because the other side added everything, or is it because oen side removed it? Or is it, like in this case, simply because trivially similar lines got the whole diff confused about which parts didn't change at all? The good news is that git does have a few nice merge tools. One is "git diff", which actually shows way more than the trivial three-way end result, in that you can diff against either side, and by default it does that fairly complex "diff against both sides" thing that is also quite useful once you get used to it. Another is "gitk --merge [filename]" which is wonderful as a tool to see what both sides actually did, to figure out what the intent of both branches were when the three-way merge result is just noise. The right answer is probably some combination of "all of the above". In the meantime, right now, only the last one is somethign git itself will help you with. Linus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Wrong file diff for merge conflict 2009-07-05 18:43 ` Linus Torvalds @ 2009-07-05 19:22 ` Jakub Narebski 2009-07-05 19:23 ` Junio C Hamano 2009-07-05 22:23 ` Stefan Bucur 2 siblings, 0 replies; 7+ messages in thread From: Jakub Narebski @ 2009-07-05 19:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: Stefan Bucur, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Sat, 4 Jul 2009, Stefan Bucur wrote: > > > > http://pastebin.ca/1483228 > > > > The problem is with the last diff in the file, where the left portion is empty, > > and the right portion contains code which already was marked as merged (common), > > right before the start of the diff. Therefore, the mark at line 127 should > > really have been before line 114. > > > > Is this a bug or I am missing something? > > I suspect (but without the origin files/history I can't verify) that what > happens is that the "successfully merged" code was seen as a fully new > snippet of code (probably due to getting re-indented?), and the other part > of that action on that branch was the removal of the old code. > > That _removal_ is then shown as a conflict against the other branch, which > presumably didn't re-indent things (of course, it could be exactly the > other way around too), and so now you end up having the "conflict" being > seen as "one branch removes this code (empty conflict part), the other one > presumably changed it some way". > > Is that what you wanted? Obviously not. To you, the conflict makes no > sense. You're a human, who tries to understand what wen't wrong, and to > you, the end result of the conflict resolution makes no sense. [...] > - Don't rely so heavily on just the traditional three-way merge result. > > This is what I personally do. The trivial 3-way merge result is > wondeful for the truly trivial merges, when it gives trivial results > that are easy to fix up. But let's face it, the traditional 3-way merge > result just sucks for anything more complicated. When you have an empty > side on one of the conflicts, is that because the other side added > everything, or is it because oen side removed it? Or is it, like in > this case, simply because trivially similar lines got the whole diff > confused about which parts didn't change at all? > > The good news is that git does have a few nice merge tools. One is > "git diff", which actually shows way more than the trivial three-way > end result, in that you can diff against either side, and by default it > does that fairly complex "diff against both sides" thing that is also > quite useful once you get used to it. > > Another is "gitk --merge [filename]" which is wonderful as a tool to > see what both sides actually did, to figure out what the intent of both > branches were when the three-way merge result is just noise. There is also "git mergetool" which runs graphical merge tool of your choice. It can be easier to work with GUI here. And git also supports diff3 conflict merge markers, which shows our and their side, and also the ancestor side; in your case it would make easy to distinguish between 'one side added' and 'other side removed'. You can get it using $ git checkout --conflict=diff3 <file> (where <file> can be '.'). You would need modern git (post 1.6.2 I think) for that. > The right answer is probably some combination of "all of the above". In > the meantime, right now, only the last one is something git itself will > help you with. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Wrong file diff for merge conflict 2009-07-05 18:43 ` Linus Torvalds 2009-07-05 19:22 ` Jakub Narebski @ 2009-07-05 19:23 ` Junio C Hamano 2009-07-05 22:23 ` Stefan Bucur 2 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2009-07-05 19:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Stefan Bucur, git Linus Torvalds <torvalds@linux-foundation.org> writes: > So is it a bug? Nope. I can pretty much guarantee that it's not a bug.... Without the precise common ancestor version it is hard to tell, but using the "one old version http://pastebin.ca/1483691" as a base and running diff3 (or merge from RCS suite) yields identical results as ours, so in that sense it is not a bug. > - Tune the zealous merge a bit, and in particular try to avoid the cases > where this happens (as mentioned, in C code, it's often re-indentation > together with trivial lines that match after it). > > This is kind of what the "patience" diff algorithm tries to do: use > primarily non-trivial sequences as the anchors for similarity, and > ignore the trivial ones. We have a "--patience" switch to "git diff", > maybe we could tune the three-way merge somehow similarly. Hmm. I have to think about this a bit. > - Don't rely so heavily on just the traditional three-way merge result. > > This is what I personally do. The trivial 3-way merge result is > wondeful for the truly trivial merges, when it gives trivial results > that are easy to fix up. But let's face it, the traditional 3-way merge > result just sucks for anything more complicated. When you have an empty > side on one of the conflicts, is that because the other side added > everything, or is it because oen side removed it? Or is it, like in > this case, simply because trivially similar lines got the whole diff > confused about which parts didn't change at all? > > The good news is that git does have a few nice merge tools. One is > "git diff", which actually shows way more than the trivial three-way > end result, in that you can diff against either side, and by default it > does that fairly complex "diff against both sides" thing that is also > quite useful once you get used to it. > > Another is "gitk --merge [filename]" which is wonderful as a tool to > see what both sides actually did, to figure out what the intent of both > branches were when the three-way merge result is just noise. Two other are "git log --merge -p" and "git checkout --conflicts=diff3". > The right answer is probably some combination of "all of the above". In > the meantime, right now, only the last one is somethign git itself will > help you with. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Wrong file diff for merge conflict 2009-07-05 18:43 ` Linus Torvalds 2009-07-05 19:22 ` Jakub Narebski 2009-07-05 19:23 ` Junio C Hamano @ 2009-07-05 22:23 ` Stefan Bucur 2009-07-06 0:33 ` Linus Torvalds 2 siblings, 1 reply; 7+ messages in thread From: Stefan Bucur @ 2009-07-05 22:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, H. Peter Anvin On Sun, Jul 5, 2009 at 9:43 PM, Linus Torvalds<torvalds@linux-foundation.org> wrote: > > > On Sat, 4 Jul 2009, Stefan Bucur wrote: >> >> http://pastebin.ca/1483228 >> >> The problem is with the last diff in the file, where the left portion is empty, >> and the right portion contains code which already was marked as merged (common), >> right before the start of the diff. Therefore, the mark at line 127 should >> really have been before line 114. >> >> Is this a bug or I am missing something? > > I suspect (but without the origin files/history I can't verify) that what > happens is that the "successfully merged" code was seen as a fully new > snippet of code (probably due to getting re-indented?), and the other part > of that action on that branch was the removal of the old code. > > That _removal_ is then shown as a conflict against the other branch, which > presumably didn't re-indent things (of course, it could be exactly the > other way around too), and so now you end up having the "conflict" being > seen as "one branch removes this code (empty conflict part), the other one > presumably changed it some way". > > Is that what you wanted? Obviously not. To you, the conflict makes no > sense. You're a human, who tries to understand what wen't wrong, and to > you, the end result of the conflict resolution makes no sense. Thank you for your suggestions for a better and more efficient merging experience, as I'm sure they will help me from now on. However, I think I did not make myself clear: I was not arguing the fact that the merge result was suboptimal, but the fact that _the generated conflict file is semantically wrong_. So, to reiterate, we have this git generated file: http://pastebin.ca/1483228 Basically, if one would take the common (successfully merged) parts and keep the "left" parts in the conflict sections, one would obtain the first branch version of the file (with minor modifications). Similarly, if one would opt to keep only the "right" parts in the conflict section, one would obtain the version found in the second branch before merge. However, this is _not_ true in my case. If you take the last conflict section in the file, if you would keep the left part, you would obtain the correct left branch file, while if you keep the right part, you would obtain duplicate code (the common code right before + the disputed part). And that's why I think this is wrong behavior. Moreover, now I was able to trigger the bug in a way that leads to _data loss_. You can find here [1] an archive with a simple git repository with two branches, "a" and "b", right after a merge between the two, in a conflict state. The conflict file contains code which is missing data in one of the two diff sides: http://pastebin.ca/1485051 You can notice, in the first conflict section, that the right brace of the inner structure is present in the left part, while it is missing in the right part (feel free to reset the merge and do it again to see it for yourself). If you would blindly select the right part for conflict resolution, you would get erroneous code. Do you still think it is not a bug? Thank you, Stefan Bucur [1] http://terminus.zytor.com/~stefanb/git/testrepo.tgz ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Wrong file diff for merge conflict 2009-07-05 22:23 ` Stefan Bucur @ 2009-07-06 0:33 ` Linus Torvalds 2009-07-06 14:44 ` Stefan Bucur 0 siblings, 1 reply; 7+ messages in thread From: Linus Torvalds @ 2009-07-06 0:33 UTC (permalink / raw) To: Stefan Bucur; +Cc: git, H. Peter Anvin On Mon, 6 Jul 2009, Stefan Bucur wrote: > > Thank you for your suggestions for a better and more efficient merging > experience, as I'm sure they will help me from now on. However, I > think I did not make myself clear: I was not arguing the fact that the > merge result was suboptimal, but the fact that _the generated conflict > file is semantically wrong_. Oh. No, you're confused about what a conflict file is. > Basically, if one would take the common (successfully merged) parts > and keep the "left" parts in the conflict sections, one would obtain > the first branch version of the file (with minor modifications). No. This is not how conflicts work. If you blindly do that, you'll always get the wrong answer. Why? Because you're ignoring the parts of the file that didn't conflict. Those will be _outside_ the conflict markers in all cases. > Similarly, if one would opt to keep only the "right" parts in the > conflict section, one would obtain the version found in the second > branch before merge. Same problem. What you expect from a conflicting file is simply not true. The fact is, a traditional rcs three-way merge (which is pretty much what you get with git, ignoring the fact that we have other tools in addition to it, and ignoring things like criss-cross merges etc) just doesn't work the way you seem to think it should work. You simply don't get the original of one side by picking one side of the conflict markers. It will have merged the stuff that it thought merged cleanly, and not have any conflict markers at all for those parts. Of course, "what it thought merged cleanly" may not be what you want it to be. Sometimes you get a clean merge for things that you'd have wanted to conflict. And sometimes you get conflicts for stuff that you'd think is just silly and shouldn't have. There are no perfect file merge algorithms that I know of. Lots of people hate the diff3/merge behavior - it's by no means perfect. But so far, I've never seen anybody successfully advocate anything better either. Linus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Wrong file diff for merge conflict 2009-07-06 0:33 ` Linus Torvalds @ 2009-07-06 14:44 ` Stefan Bucur 0 siblings, 0 replies; 7+ messages in thread From: Stefan Bucur @ 2009-07-06 14:44 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, H. Peter Anvin On Mon, Jul 6, 2009 at 3:33 AM, Linus Torvalds<torvalds@linux-foundation.org> wrote: [...] > > The fact is, a traditional rcs three-way merge (which is pretty much what > you get with git, ignoring the fact that we have other tools in addition > to it, and ignoring things like criss-cross merges etc) just doesn't work > the way you seem to think it should work. You simply don't get the > original of one side by picking one side of the conflict markers. It will > have merged the stuff that it thought merged cleanly, and not have any > conflict markers at all for those parts. > > Of course, "what it thought merged cleanly" may not be what you want it to > be. Sometimes you get a clean merge for things that you'd have wanted to > conflict. And sometimes you get conflicts for stuff that you'd think is > just silly and shouldn't have. > > There are no perfect file merge algorithms that I know of. Lots of people > hate the diff3/merge behavior - it's by no means perfect. But so far, I've > never seen anybody successfully advocate anything better either. > > Linus > Indeed it seems I had a wrong image of how a conflict file should look. In the light of what you've told me now, everything makes sense :) Thank you very much for taking your time to explain these things! Cheers, Stefan Bucur ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-07-06 14:45 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-07-04 7:53 Wrong file diff for merge conflict Stefan Bucur 2009-07-05 18:43 ` Linus Torvalds 2009-07-05 19:22 ` Jakub Narebski 2009-07-05 19:23 ` Junio C Hamano 2009-07-05 22:23 ` Stefan Bucur 2009-07-06 0:33 ` Linus Torvalds 2009-07-06 14:44 ` Stefan Bucur
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).