git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* The criss-cross merge case
@ 2005-04-27 20:25 Bram Cohen
  2005-04-27 23:32 ` Daniel Barkalow
  0 siblings, 1 reply; 12+ messages in thread
From: Bram Cohen @ 2005-04-27 20:25 UTC (permalink / raw)
  To: git

Here's an example of where simple three-way merge can't do the right
thing. Each letter represents a snapshot of the history, and time goes
downwards. The numbers after some letters refer to which line number was
modified at that time.


A
|\
| \
|  \
|   \
|    \
|     \
|      \
B8      C3
|\     /|
| \   / |
|  \ /  |
|   X   |
|  / \  |
| /   \ |
|/     \|
D8      E3
 \      |
  \     |
   \    |
    \   |
     \  |
      \ |
       \|
        ?

In this case the ? should have a clean merge with the D vesion of line 8
(because it was made with the B version already in the history) and the E
version of line 3 (because it was made with the C version already in the
history).

The problem is that there's no single ancestor for the three-way merge
which does the right thing. If one picks B, then there will be an
unnecessary merge conflict at line 3, because D will have the C version
and E will have the E version but B will have neither. Likewise if one
picks C, there will be an unnecessary conflict at line 8 because D will
have the D version and E will have the B version but C will have neither.
Picking A will cause unnecessary conflicts on *both* lines.

The problem can actually be much worse than a simple unnecesary conflict,
because if the later updates were strict undos of the earlier updates,
then picking either B or C will merge something *wrong*. Using A as the
ancestor will keep that from happening, but it also maximizes unnecessary
conflicts.

Note that the above criss-cross case only involves two branches, using the
methodology of each one modifying their own section and pulling in old
versions of the other one from time to time. Cogito's interface encourages
exactly this work flow, which is not a bad thing from a work flow
perspective, but does make it hit this case regularly.

The way Git handles this currently is very bad, because it forces the
common ancestor to be from the same snapshot across all files, so this
problem will happen if the modifications are made even in different files,
not just different lines within the same file. That could be improved
greatly by finding an LCA for each file individually, which is what
Monotone does. Darcs, Codeville, and all the Arch descendants have better
merge algorithms which don't have to pick a single common ancestor.

-Bram


^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: The criss-cross merge case
@ 2005-04-28 14:25 Adam J. Richter
  2005-04-29 12:19 ` Wayne Scott
  0 siblings, 1 reply; 12+ messages in thread
From: Adam J. Richter @ 2005-04-28 14:25 UTC (permalink / raw)
  To: ry102; +Cc: barkalow, bram, droundry, git, tupshin

On 2005-04-28, Benedikt Schmidt wrote:
>AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
>to an earlier paper by the same author titled
>"A File Comparison Program" - Miller, Myers - 1985.
[...]
>[1] http://citeseer.ist.psu.edu/myers86ond.html

	Monotone apparently uses a futher acceleration of that algorithm
from the 1989 paper, also co-authored by the Myers, "An O(NP) Sequence
Comparison Algorithm" by Sun Wu, Udi Manber, and Gene Myers.
http://www.eecs.berkeley.edu/~gene/Papers/np_diff.pdf .  The Monotone
implementation was apparently a port of an implementation originally
written in Scheme by Aubrey Jaffer.

	I don't fully understand the 1989 paper, but I get the
general impression that is a small change to the previous algorithm
(the one in GNU diff) that might be a 30 line patch if someone
got around to submitting it, and seems to make the code run more
than twice as fast in practice.  One of these days, I will probably get
around to coding up a patch to GNU diff if nobody beats me to it.

	Making diff run faster may have at least one potentially useful
benefit for merging.  A faster diff makes it more practical run diff
on smaller units of comparison.  I posted a note here before about
converting the input files to diff3 to have just one character per
line, and then undoing that transformation of the result to produce
a character based merge that seemed to work pretty well in the
couple of tests that I tried.

                    __     ______________ 
Adam J. Richter        \ /
adam@yggdrasil.com      | g g d r a s i l

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: The criss-cross merge case
@ 2005-04-30 12:32 Adam J. Richter
  0 siblings, 0 replies; 12+ messages in thread
From: Adam J. Richter @ 2005-04-30 12:32 UTC (permalink / raw)
  To: wsc9tt; +Cc: barkalow, bram, droundry, git, ry102, tupshin

On Fri, 29 Apr 2005 07:19:18 -0500, Wayne Scott wrote:
>On 4/28/05, Adam J. Richter <adam@yggdrasil.com> wrote:
>> On 2005-04-28, Benedikt Schmidt wrote:
>> >AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
>> >to an earlier paper by the same author titled
>> >"A File Comparison Program" - Miller, Myers - 1985.
>> [...]
>> >[1] http://citeseer.ist.psu.edu/myers86ond.html
>> 
>>         Monotone apparently uses a futher acceleration of that algorithm
>> from the 1989 paper, also co-authored by the Myers, "An O(NP) Sequence
>> Comparison Algorithm" by Sun Wu, Udi Manber, and Gene Myers.
>> http://www.eecs.berkeley.edu/~gene/Papers/np_diff.pdf .  The Monotone
>> implementation was apparently a port of an implementation originally
>> written in Scheme by Aubrey Jaffer.
>> 
>>         I don't fully understand the 1989 paper, but I get the
>> general impression that is a small change to the previous algorithm
>> (the one in GNU diff) that might be a 30 line patch if someone
>> got around to submitting it, and seems to make the code run more
>> than twice as fast in practice.  One of these days, I will probably get
>> around to coding up a patch to GNU diff if nobody beats me to it.
>> 
>>         Making diff run faster may have at least one potentially useful
>> benefit for merging.  A faster diff makes it more practical run diff
>> on smaller units of comparison.  I posted a note here before about
>> converting the input files to diff3 to have just one character per
>> line, and then undoing that transformation of the result to produce
>> a character based merge that seemed to work pretty well in the
>> couple of tests that I tried.

>I just read that paper and unless I am mistaken, it already describes
>the basis for how GNU diff works.  I don't think anything in that
>paper would make it faster.
>
>I also don't find anything to suggest the Monotone guys have rewritten
>diff.  Just some notes from graydon that notes python's difflib uses a
>non-optimal diff that is faster in some cases.

	In terminology that can only be understood by reading
the 1985 paper, the 1989 paper describes a possible reduction
in the number of diagonals in the edit graph that iterations of the
1989 algorithm have to consider.  I say "possible reduction" because
the reduction can be zero in the worse case, although I get the
impression that it should be a reduction of 50% or better
typically, and it makes the case where the changes is just
a bunch of inserts run in linear time.

	I believe that the longest common subsequence finder
at the core of GNU diff does not currently perform this optimization,
but the one in monotone-0.18/lcs.{cc,hh} does.

                    __     ______________
Adam J. Richter        \ /
adam@yggdrasil.com      | g g d r a s i l

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2005-04-30 13:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-27 20:25 The criss-cross merge case Bram Cohen
2005-04-27 23:32 ` Daniel Barkalow
2005-04-28  0:43   ` Tupshin Harper
2005-04-28  1:16     ` Daniel Barkalow
2005-04-28  2:15       ` Benedikt Schmidt
2005-04-28  2:19         ` Daniel Barkalow
2005-04-28 11:16           ` David Roundy
2005-04-28  3:41   ` suffix array/tree deltas (Was: The criss-cross merge case) Zed A. Shaw
2005-04-28  4:30     ` Daniel Barkalow
  -- strict thread matches above, loose matches on Subject: below --
2005-04-28 14:25 The criss-cross merge case Adam J. Richter
2005-04-29 12:19 ` Wayne Scott
2005-04-30 12:32 Adam J. Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).