RFC: How to handle merge-recursive corner cases?

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* RFC: How to handle merge-recursive corner cases?
@ 2010-09-29 19:16 Elijah Newren
  2010-09-29 19:51 ` Jonathan Nieder
  0 siblings, 1 reply; 2+ messages in thread
From: Elijah Newren @ 2010-09-29 19:16 UTC (permalink / raw)
  To: Git Mailing List

Hi,

Sorry for the lengthy email, but I'm still working on my patch series
auditing merge-recursive corner cases, and I've run into criss-cross
merges that I don't know how to solve.  In some cases, even the
'right' answer (in terms of what unmerged entries the index should
contain when there are conflicts) isn't entirely clear to me, much
less how to implement it.  I'm hoping someone else has some bright
ideas.

The basic setup is the following:
#      B   D
#      o---o
#     / \ / \
#  A o   X   ? F
#     \ / \ /
#      o---o
#      C   E
In other words, there

Let's start with a simple though very contrived case that will
illustrate issues later, namely a repository with exactly one file
named 'file', with the following contents at different points in
history:

Commit A: file has contents 'A\n'
Commit B: file has contents 'B\n'
Commit C: file has contents 'C\n'
Commit D: file has contents 'D\n'
Commit E: file has the following 5 lines in it:
<<<<<<< Temporary merge branch 1
C
=======
B
>>>>>>> Temporary merge branch 2

Now, if we try to merge D & E, clearly there should be a conflict.
But git merges cleanly, giving file the contents 'D\n'.  The reason
for this is that git notices there are two merge bases, B & C, tries
to merge them in-core, notices the conflict, stores the 3-way merged
contents as the version of 'file' in the new virtual ancestor, and
then does a three-way merge between that virtual ancestor and D & E.
Since the contents of 'file' match between the virtual ancestor and E,
the contents of file from D are selected as the merge resolution.
It's wrong, but the only way to fix it is to somehow have an entry in
the virtual ancestor tree where the conflict region(s) of file are
guaranteed to not match the corresponding content locations of file
for either D or E.  (Perhaps by adding random content on the line
after the ='s?)

Given the extremely contrived nature of this example, perhaps we don't
care.  But the basic problem that we need some kind of entry in the
virtual ancestor that makes sense for a conflict case can come up in
other less-contrived scenarios.  For example, with directory/file
conflicts...

Let's look at a slightly more complicated example, a repository with
up to three files: 'junk', 'a' and 'a/file':

Commit A: Only 'junk' exists
Commit B: Adds file 'a'
Commit C: Adds file 'a/file'
Commit D: Merges B & C, keeping 'a' and deleting 'a/file'
Commit E: Merges B & C, deleting 'a' but keeping 'a/file'

Now, merging D & E should result in a (directory/file) conflict, with
both 'a' and 'a/file' left in the index (at stages 2 and 3 -- what, if
anything, exists at stage 1 is unclear to me) and both 'a/file' and
'a~<branch>' existing in the working copy.  But how/if that happens,
depends on the virtual ancestor created when merging B & C in-core.
Possibilities:

Possibility 1: Virtual ancestor somehow contains both 'a' and 'a/file'
(currently impossible in git, since this virtual ancestor is written
as a tree and there's code to prevent writing trees with D/F
conflicts).  This would mean that 'a' is unchanged between the virtual
ancestor and D and thus should be deleted as in E.  Also, 'a/file' is
unchanged between the virtual ancestor and E and thus should be
deleted as in D.  So we get a clean merge without either 'a' or
'a/file' present.  Bad.

Possibility 2: Virtual ancestor contains 'a' but not 'a/file'.  This
means the virtual ancestor matches commit D exactly, so git can
cleanly merge by just selecting the contents of E.  Bad.

Possibility 3: Virtual ancestor contains 'a/file' but not 'a'.  Like
possibility 2, in reverse: this time the merge is clean having the
contents of D.  Bad.

Possibility 4: Virtual ancestor contains neither 'a' nor 'a/file'.
This case will correctly detect an (add/add) conflict of the
directory/file variety.  Stage 1 will be empty, but stages 2 and 3
will have the appropriate contents.

So possibility 4 sounds great, right?  Let's say we use that
algorithm, but now we have a different situation:
Commits A, B, C, & D: all as above
Commit E2: Merges B & C, keeping a slightly modified version of 'a'
and deleting 'a/file'.

Merging D & E2 should be clean, with 'a/file' being removed (since
both D and E2 removed it) and the slightly modified version of 'a'
from E2 appearing in the result (since 'a' is unmodified in D).
However, having the virtual ancestor contain neither 'a' nor 'a/file'
will cause an add/add conflict when merging D & E2.  Bad.

The only way out of this that I see (and I'm hoping I'm just missing
something), is a modification of possibility 1: make the virtual
ancestor contain both 'a' and 'a/file', but with both files modified
somehow to ensure they don't match the contents of D, E, or E2 (maybe
by appending some random data? something extremely unlikely to match
the contents of the file in any of D, E, or E2 such as a timestamp?).
That would result in merging D & E giving the appropriate conflict
(detected as a modify/delete), and merging D & E2 to go cleanly so
long as the modification of 'a' in E2 didn't appear near the random
modification of 'a' in the virtual ancestor.

Ugh.  Seems like a mess.  Anyone have any bright ideas?

Elijah

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: RFC: How to handle merge-recursive corner cases?
  2010-09-29 19:16 RFC: How to handle merge-recursive corner cases? Elijah Newren
@ 2010-09-29 19:51 ` Jonathan Nieder
  0 siblings, 0 replies; 2+ messages in thread
From: Jonathan Nieder @ 2010-09-29 19:51 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

Hi Elijah,

Elijah Newren wrote:

> #      B   D
> #      o---o
> #     / \ / \
> #  A o   X   ? F
> #     \ / \ /
> #      o---o
> #      C   E
> In other words, there
> 
> Let's start with a simple though very contrived case that will
> illustrate issues later, namely a repository with exactly one file
> named 'file', with the following contents at different points in
> history:
> 
> Commit A: file has contents 'A\n'
> Commit B: file has contents 'B\n'
> Commit C: file has contents 'C\n'
> Commit D: file has contents 'D\n'
> Commit E: file has the following 5 lines in it:
> <<<<<<< Temporary merge branch 1
> C
> =======
> B
> >>>>>>> Temporary merge branch 2
> 
> Now, if we try to merge D & E, clearly there should be a conflict.
> But git merges cleanly, giving file the contents 'D\n'.

Funny.  A different problem is that with a less unusual history (i.e.
no conflict hunks committed) with "[merge] conflictstyle=diff3", if
there are conflicts in an early stage of recursive merge, we get
nested conflict hunks, leaving rerere and humans confused.

I have wondered: why doesn't merge-recursive implicitly use the union
merge driver for its in-core merge of ancestors?  The resulting
diff3-style conflict hunks from the final 3-way merge would be more
readable, but now you've provided an answer: coincidences of matching
content like you describe would become more likely.

[...]
> It's wrong, but the only way to fix it is to somehow have an entry in
> the virtual ancestor tree where the conflict region(s) of file are
> guaranteed to not match the corresponding content locations of file
> for either D or E.  (Perhaps by adding random content on the line
> after the ='s?)

Random content does not provides a guarantee.

Maybe merge_recursive()/merge_trees() could steal a flag bit from
struct object for "conflicted"?  No, that would give only one bit
for the entire tree, and we need one bit per path.

How about using index entries for this?

>From the signature of git_merge_trees(1, ...) it is not obvious
to me where it reports conflicts.  From the start to
write_tree_from_memory() I would guess it does not at all.

[...]
> Possibility 1: Virtual ancestor somehow contains both 'a' and 'a/file'
> (currently impossible in git, since this virtual ancestor is written
> as a tree
[...]
> The only way out of this that I see (and I'm hoping I'm just missing
> something), is a modification of possibility 1: make the virtual
> ancestor contain both 'a' and 'a/file'

Hmm, D/F is a little harder than the content conflict case, I guess.
;-)  But I think in general, the thing to do is to keep something
other than a tree.

Thanks for the food for thought.
Jonathan

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2010-09-29 19:51 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-29 19:16 RFC: How to handle merge-recursive corner cases? Elijah Newren
2010-09-29 19:51 ` Jonathan Nieder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).