Multi-ancestor read-tree notes

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Multi-ancestor read-tree notes
@ 2005-09-05  5:41 Daniel Barkalow
  2005-09-06  5:42 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Daniel Barkalow @ 2005-09-05  5:41 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

I've got a version of read-tree which accepts multiple ancestors and does 
a merge using information from all of them.

The basic features are that it looks for an ancestor which would permit a 
trivial merge, and uses that. However, if it finds ancestors which permit 
different trivial merges, it does not merge (which I call case #16).

In case #16, I'm not sure what I should produce. I think the best thing 
might be to not leave anything in stage 1. The desired end effect is that 
the user is given a file with a section like:

  {
    *t = NULL;
    *m = 0;
<<<<<<<<
    return Z_DATA_ERROR;
========
    return Z_OK;
>>>>>>>>
  }

In other news, the merge that was giving Len Brown problems a while ago 
turns out to have the above conflict, and he happened to end up doing the 
right thing and not reverting Linus's revert of an unnecessary (but 
harmless) change. I only noticed this just now, when I was testing that 
merge, and got it to generate only two conflicts regardless of order of 
ancestors (didn't try to resolve the other one, drivers/acpi/osl.c, with 
"merge" either way).

So this test is encouraging: I get fewer non-trivial cases than either of 
the ancestors alone gives, and I catch a case that both single ancestors 
gets wrong.

Note that there are still some memory leaks for me to fix, but that's the 
only flaw I know of with this.

Patches against mainline to follow shortly.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-05  5:41 Multi-ancestor read-tree notes Daniel Barkalow
@ 2005-09-06  5:42 ` Junio C Hamano
  2005-09-06 17:43   ` Daniel Barkalow
  2005-09-08 17:16 ` Darrin Thompson
  2005-09-09 17:29 ` Junio C Hamano
  2 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2005-09-06  5:42 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> I've got a version of read-tree which accepts multiple ancestors and does 
> a merge using information from all of them.

After disabling the debugging printf(), I used this read-tree to
try resolving the parents of four commits Fredrik Kuivinen gave
us in <20050827014009.GB18880@c165.ib.student.liu.se> using
their two merge bases, and compared the resulting tree with the
tree recorded in the commit.  The results are really promising.

For the following two commits, multi-base merge resolved their
parents trivially and produced the same result as the tree in
the commit.  The current "best-base merge" in the master branch
performed far worse and left many conflicts.

 - 467ca22d3371f132ee225a5591a1ed0cd518cb3d 
 - da28c12089dfcfb8695b6b555cdb8e03dda2b690

Another one, 0e396ee43e445cb7c215a98da4e76d0ce354d9d7,
multi-base merge left only one conflicting path to be hand
resolved.  The best-base merge again performed far worse.

The other one, 3190186362466658f01b2e354e639378ce07e1a9, is
resolved trivially with both algorithms.

> In case #16, I'm not sure what I should produce. I think the best thing 
> might be to not leave anything in stage 1.

Because?  I know it would affect the readers of index files if
you did so, but it would seem the most natural in git
architecture to have merge-cache look at the resulting cache
with such multiple stage 1 entries (and other stages) and let
the script make a decision.

> The desired end effect is that the user is given a file with a
> section like:
>
>   {
>     *t = NULL;
>     *m = 0;
> <<<<<<<<
>     return Z_DATA_ERROR;
> ========
>     return Z_OK;
>>>>>>>>>
>   }

Sounds fine.

Anyway, I really am happy to see this multi-base merge perform
well on real-world data, and you are certainly the git hero of
the week ;-).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-06  5:42 ` Junio C Hamano
@ 2005-09-06 17:43   ` Daniel Barkalow
  2005-09-06 20:03     ` Junio C Hamano
  2005-09-10 22:50     ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Daniel Barkalow @ 2005-09-06 17:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, 5 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > I've got a version of read-tree which accepts multiple ancestors and does 
> > a merge using information from all of them.
> 
> After disabling the debugging printf(), I used this read-tree to
> try resolving the parents of four commits Fredrik Kuivinen gave
> us in <20050827014009.GB18880@c165.ib.student.liu.se> using
> their two merge bases, and compared the resulting tree with the
> tree recorded in the commit.  The results are really promising.
> 
> For the following two commits, multi-base merge resolved their
> parents trivially and produced the same result as the tree in
> the commit.  The current "best-base merge" in the master branch
> performed far worse and left many conflicts.
> 
>  - 467ca22d3371f132ee225a5591a1ed0cd518cb3d 
>  - da28c12089dfcfb8695b6b555cdb8e03dda2b690
> 
> Another one, 0e396ee43e445cb7c215a98da4e76d0ce354d9d7,
> multi-base merge left only one conflicting path to be hand
> resolved.  The best-base merge again performed far worse.
> 
> The other one, 3190186362466658f01b2e354e639378ce07e1a9, is
> resolved trivially with both algorithms.

Do you know if there's anything like case #16 in there? I'd be interested 
to know if there's anything that gets handled automatically in different 
ways depending on which single base is used, and doesn't require manual 
intervention with multiple bases, because that's probably wrong.

> > In case #16, I'm not sure what I should produce. I think the best thing 
> > might be to not leave anything in stage 1.
> 
> Because?  I know it would affect the readers of index files if
> you did so, but it would seem the most natural in git
> architecture to have merge-cache look at the resulting cache
> with such multiple stage 1 entries (and other stages) and let
> the script make a decision.

I didn't want to break the assumption of only one entry per stage in the 
initial version. I'm also not sure that listing the ancestors is 
particularly useful in this case. They have to be exactly the contents of 
stages 2 and 3, plus possibly more stuff that's not been kept by either 
side. What you actually want is a two-way merge (i.e., a diff between the 
two sides, presented in "merge" format), so you don't really need any 
ancestors, unless it would fit some more general case that way.

> > The desired end effect is that the user is given a file with a
> > section like:
> >
> >   {
> >     *t = NULL;
> >     *m = 0;
> > <<<<<<<<
> >     return Z_DATA_ERROR;
> > ========
> >     return Z_OK;
> >>>>>>>>>
> >   }
> 
> Sounds fine.
> 
> Anyway, I really am happy to see this multi-base merge perform
> well on real-world data, and you are certainly the git hero of
> the week ;-).

Great. Want me to send the patches with better organization, or are you 
set with what I've sent?

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-06 17:43   ` Daniel Barkalow
@ 2005-09-06 20:03     ` Junio C Hamano
  2005-09-06 20:25       ` Daniel Barkalow
  2005-09-10 22:50     ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2005-09-06 20:03 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> Do you know if there's anything like case #16 in there? I'd be interested 
> to know if there's anything that gets handled automatically in different 
> ways depending on which single base is used, and doesn't require manual 
> intervention with multiple bases, because that's probably wrong.

Re-running the tests with the attached patch shows there weren't any.

> I didn't want to break the assumption of only one entry per stage in the 
> initial version. I'm also not sure that listing the ancestors is 
> particularly useful in this case. They have to be exactly the contents of 
> stages 2 and 3, plus possibly more stuff that's not been kept by either 
> side.

Ah, I see, that's true.

> Great. Want me to send the patches with better organization, or are you 
> set with what I've sent?

That's up to you.  If you are content with what I have in the pu
branch, there is no need to bother resending.  OTOH if you have
further clean-ups in mind, i.e. "better organization" above, I
do not mind dropping the current ones from "pu" and replace them
with another set from you.

------------
[PATCH] Add debugging help for case #16 to read-tree.c

This will help us detect if real-world example merges have multiple
merge-base candidates and one of them matches one head while another
matches the other head.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 read-tree.c                 |   36 ++++++++++++++++++++++++++++--------
 t/t1000-read-tree-m-3way.sh |   16 ++++++++++++++++
 2 files changed, 44 insertions(+), 8 deletions(-)

6405d0aa729f4b060123e1235b3ddc074fdd01b7
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -3,6 +3,8 @@
  *
  * Copyright (C) Linus Torvalds, 2005
  */
+#define DBRT_DEBUG 1
+
 #include "cache.h"
 
 #include "object.h"
@@ -47,8 +49,6 @@ static int entcmp(char *name1, int dir1,
 	return ret;
 }
 
-#define DBRT_DEBUG 0
-
 static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 			    const char *base, merge_fn_t fn, int *indpos)
 {
@@ -101,14 +101,14 @@ static int unpack_trees_rec(struct tree_
 			}
 		}
 
-#if DBRT_DEBUG
+#if DBRT_DEBUG > 1
 		if (first)
 			printf("index %s\n", first);
 #endif
 		for (i = 0; i < len; i++) {
 			if (!posns[i] || posns[i] == &df_conflict_list)
 				continue;
-#if DBRT_DEBUG
+#if DBRT_DEBUG > 1
 			printf("%d %s\n", i + 1, posns[i]->name);
 #endif
 			if (!first || entcmp(first, firstdir,
@@ -188,7 +188,7 @@ static int unpack_trees_rec(struct tree_
 			if (merge) {
 				int ret;
 
-#if DBRT_DEBUG
+#if DBRT_DEBUG > 1
 				printf("%s:\n", first);
 				for (i = 0; i < src_size; i++) {
 					printf(" %d ", i);
@@ -200,7 +200,7 @@ static int unpack_trees_rec(struct tree_
 #endif
 				ret = fn(src);
 				
-#if DBRT_DEBUG
+#if DBRT_DEBUG > 1
 				printf("Added %d entries\n", ret);
 #endif
 				*indpos += ret;
@@ -353,6 +353,19 @@ static int keep_entry(struct cache_entry
 	return 1;
 }
 
+#if DBRT_DEBUG
+static void show_stage_entry(FILE *o,
+			     const char *label, const struct cache_entry *ce)
+{
+	fprintf(stderr, "%s%06o %s %d\t%s\n",
+		label,
+		ntohl(ce->ce_mode),
+		sha1_to_hex(ce->sha1),
+		ce_stage(ce),
+		ce->name);
+}
+#endif
+
 static int threeway_merge(struct cache_entry **stages)
 {
 	struct cache_entry *index;
@@ -392,10 +405,10 @@ static int threeway_merge(struct cache_e
 	if (!same(remote, head)) {
 		for (i = 1; i < head_idx; i++) {
 			if (same(stages[i], head)) {
-				head_match = 1;
+				head_match = i;
 			}
 			if (same(stages[i], remote)) {
-				remote_match = 1;
+				remote_match = i;
 			}
 		}
 	}
@@ -450,6 +463,13 @@ static int threeway_merge(struct cache_e
 			}
 		}
 	}
+#if DBRT_DEBUG
+	else {
+		fprintf(stderr, "read-tree: warning #16 detected\n");
+		show_stage_entry(stderr, "head   ", stages[head_match]);
+		show_stage_entry(stderr, "remote ", stages[remote_match]);
+	}
+#endif
 	if (head) { count += keep_entry(head); }
 	if (remote) { count += keep_entry(remote); }
 	return count;
diff --git a/t/t1000-read-tree-m-3way.sh b/t/t1000-read-tree-m-3way.sh
--- a/t/t1000-read-tree-m-3way.sh
+++ b/t/t1000-read-tree-m-3way.sh
@@ -218,6 +218,9 @@ currently implemented.
                                            or (2) match B.
  ------------------------------------------------------------------
  15  exists  O==A    O==B      take B      must match A if exists.
+ ------------------------------------------------------------------
+ 16  exists  O==A    O==B      barf        must match A if exists.
+     *multi* in one  in another
 -------------------------------------------------------------------
 
 Note: if we want to implement 2ALT and 3ALT we need to be careful.
@@ -514,4 +517,17 @@ test_expect_failure \
      git-update-cache --add NN &&
      git-read-tree -m $tree_O $tree_A $tree_B"
 
+# #16
+test_expect_success \
+    '16 - A matches in one and B matches in another.' \
+    'rm -f .git/index F16 &&
+    echo F16 >F16 &&
+    git-update-cache --add F16 &&
+    tree0=`git-write-tree` &&
+    echo E16 >F16 &&
+    git-update-cache F16 &&
+    tree1=`git-write-tree` &&
+    git-read-tree -m $tree0 $tree1 $tree1 $tree0 &&
+    git-ls-files --stage'
+
 test_done

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-06 20:03     ` Junio C Hamano
@ 2005-09-06 20:25       ` Daniel Barkalow
  2005-09-06 21:53         ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Barkalow @ 2005-09-06 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, 6 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > Do you know if there's anything like case #16 in there? I'd be interested 
> > to know if there's anything that gets handled automatically in different 
> > ways depending on which single base is used, and doesn't require manual 
> > intervention with multiple bases, because that's probably wrong.
> 
> Re-running the tests with the attached patch shows there weren't any.

Good. (Although that patch doesn't seem to be directly on top of my 
version; I can tell what it's doing anyway)

> > Great. Want me to send the patches with better organization, or are you 
> > set with what I've sent?
> 
> That's up to you.  If you are content with what I have in the pu
> branch, there is no need to bother resending.  OTOH if you have
> further clean-ups in mind, i.e. "better organization" above, I
> do not mind dropping the current ones from "pu" and replace them
> with another set from you.

I'm happy with the content in "pu"; the issue is just whether you want the 
history cleaned up more. In the series I sent, I kept forgetting parts 
that belonged in earlier patches.

Could you look over the documentation in 
Documentation/technical/trivial-merge.txt, and see if it's a suitable 
replacement for the table in t1000-read-tree-m-3way.sh? It should be the 
same, except for ALT or non-ALT versions that we're not using, combining a 
few matching cases, describing the rules behind index requirements rather 
than listing outcomes, and the addition of info on how multiple ancestors 
are handled.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-06 20:25       ` Daniel Barkalow
@ 2005-09-06 21:53         ` Junio C Hamano
  2005-09-06 22:59           ` Daniel Barkalow
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2005-09-06 21:53 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> Good. (Although that patch doesn't seem to be directly on top of my 
> version; I can tell what it's doing anyway)

That one was against the proposed updates head.  I've updated it
again to include the patch.

> I'm happy with the content in "pu"; the issue is just whether you want the 
> history cleaned up more. In the series I sent, I kept forgetting parts 
> that belonged in earlier patches.

Again, that is up to you.  I am not _that_ perfectionist but I
do not mind reapplying updated ones if you are ;-).

> Could you look over the documentation in
> Documentation/technical/trivial-merge.txt, and see if it's a
> suitable replacement for the table in
> t1000-read-tree-m-3way.sh?

I do not understand what you meant by '*' and 'index+' in
one-way merge table.  I take the first row ('*') to mean "If the
tree is missing a path, that path is removed from the index."

I like the second sentence in three-way merge description.  That
is a very easy-to-understand description of what the index
requirements are.

You have 2 2ALTs.  Also 14 and 14ALT look like they are the same
rule now.

What's "(empty)^" in "ancest"?  All of them must be empty for
this rule to apply?

I am not quite sure it is 'a suitable replacement' yet; the
existing table you can see it covers all the cases, but with
things like "'ancestor+' means one of them matches", I cannot
really tell the table covers all the cases or some cases fall of
the end of the chain.

Also when we have more than one ancestors or one remotes and we
say "no merge", it is still unspecified (and I have to admit I
cannot readily say what the result should be for all of them,
except that I agree #16 will be fine with an empty stage1) what
are left in which stages.

I personally think the exotic cases (i.e. no rule applies, or
"no merge" result with more than one ancestors/remotes) needs to
be handled outside read-tree anyway, by the script that drives
read-tree to attempt trivial merges.  That is the core of the
git-resolve-script would become:

    bases=`git-merge-base -a $ours $theirs` || exit
    git-read-tree -m -u $bases $ours $theirs
    case "$?" in
    0)
        # trivially resolved; nothing to do.
        ;;
    1)
        # no exotic cases -- just do it the old way.
    	git-merge-cache -o git-merge-one-file-script -a ;;
    2)
        # exotic cases like #16 and any other.
        # maybe try different heuristics, like best-base, from
        # scratch.
        git-read-tree --reset $ours
        ... attempt different merge policies ...
        ;;
    esac

So in that sense it probably does not matter what we leave in
stage1/2/3 in such cases as long as the command fails in such a
way that allows us to tell what happened.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-06 21:53         ` Junio C Hamano
@ 2005-09-06 22:59           ` Daniel Barkalow
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Barkalow @ 2005-09-06 22:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, 6 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > Good. (Although that patch doesn't seem to be directly on top of my 
> > version; I can tell what it's doing anyway)
> 
> That one was against the proposed updates head.  I've updated it
> again to include the patch.
> 
> > I'm happy with the content in "pu"; the issue is just whether you want the 
> > history cleaned up more. In the series I sent, I kept forgetting parts 
> > that belonged in earlier patches.
> 
> Again, that is up to you.  I am not _that_ perfectionist but I
> do not mind reapplying updated ones if you are ;-).

What's there is fine with me.

(I'll work on improving the documentation as a further patch)

> > Could you look over the documentation in
> > Documentation/technical/trivial-merge.txt, and see if it's a
> > suitable replacement for the table in
> > t1000-read-tree-m-3way.sh?
> 
> I do not understand what you meant by '*' and 'index+' in
> one-way merge table.  I take the first row ('*') to mean "If the
> tree is missing a path, that path is removed from the index."

'*' means that that case applies regardless of what's there. 'index+' 
means that it's the index, with the stat information. I forgot to actually 
explain the table before going on to the interesting section.

> I like the second sentence in three-way merge description.  That
> is a very easy-to-understand description of what the index
> requirements are.
> 
> You have 2 2ALTs.  Also 14 and 14ALT look like they are the same
> rule now.

Ah, right. I had originally listed "index" in the table, with separate 
cases for having it match the head and having it match the result, but 
then ditched that when I figured out how that actually works.

> What's "(empty)^" in "ancest"?  All of them must be empty for
> this rule to apply?

The '^' means that all must be like that. 

I have to check, but I think that 8ALT and 10ALT should be '+'.

> I am not quite sure it is 'a suitable replacement' yet; the
> existing table you can see it covers all the cases, but with
> things like "'ancestor+' means one of them matches", I cannot
> really tell the table covers all the cases or some cases fall of
> the end of the chain.

All of the "any ancestor" spots are good for covering things. Case #11 
(which actually needs to be at the bottom) is basically "everything else".

> Also when we have more than one ancestors or one remotes and we
> say "no merge", it is still unspecified (and I have to admit I
> cannot readily say what the result should be for all of them,
> except that I agree #16 will be fine with an empty stage1) what
> are left in which stages.

Presently, except for case #16, only the first ancestor is used in "no 
merge" output. The right thing should be worked out and documented, of 
course.

I'm not at all convinced at this point that we can do much with multiple 
remotes in a single application of the rules; you won't necessarily have 
the same merge base for all pairs, and all sorts of things go wrong if you 
start including ancestors that aren't related to something, or not 
including common ancestors of some pair.

What might work is to have the error for an unmerged index only happen 
when you get to a "no merge" result, so that you can get as many conflicts 
as possible (in different files) resolved by the user at the same time.

> I personally think the exotic cases (i.e. no rule applies, or
> "no merge" result with more than one ancestors/remotes) needs to
> be handled outside read-tree anyway, by the script that drives
> read-tree to attempt trivial merges.

I think case #16 would benefit from doing more stuff, but there aren't any 
holes in the rules, and I think that, for the multiple ancestors in "no 
merge", we just want to use the one with the least conflict. (Or, if we 
write our own merge, do a #16/#13,#14/#11 decision per-hunk in our merge, 
which is the really right thing). I think the common case for multiple 
ancestors will really be that you've got a side branch that split before 
the split you're resolving, and was merged into both sides before now; in 
this case, there's no big problem, and it's not the exotic cross-merge 
case. Of course, we won't see this in projects like the kernel and git, 
which aren't that amorphous.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-05  5:41 Multi-ancestor read-tree notes Daniel Barkalow
  2005-09-06  5:42 ` Junio C Hamano
@ 2005-09-08 17:16 ` Darrin Thompson
  2005-09-08 20:37   ` Fredrik Kuivinen
  2005-09-08 21:39   ` Daniel Barkalow
  2005-09-09 17:29 ` Junio C Hamano
  2 siblings, 2 replies; 18+ messages in thread
From: Darrin Thompson @ 2005-09-08 17:16 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

On Mon, 2005-09-05 at 01:41 -0400, Daniel Barkalow wrote:
> I've got a version of read-tree which accepts multiple ancestors and does 
> a merge using information from all of them.

Do the multiple ancestors have to share a common parent? More to the
point, is this read-tree any more friendly to baseless merges?

--
Darrin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-08 17:16 ` Darrin Thompson
@ 2005-09-08 20:37   ` Fredrik Kuivinen
  2005-09-08 21:39   ` Daniel Barkalow
  1 sibling, 0 replies; 18+ messages in thread
From: Fredrik Kuivinen @ 2005-09-08 20:37 UTC (permalink / raw)
  To: Darrin Thompson; +Cc: Daniel Barkalow, git

On Thu, Sep 08, 2005 at 12:16:05PM -0500, Darrin Thompson wrote:
> On Mon, 2005-09-05 at 01:41 -0400, Daniel Barkalow wrote:
> > I've got a version of read-tree which accepts multiple ancestors and does 
> > a merge using information from all of them.
> 
> Do the multiple ancestors have to share a common parent? More to the
> point, is this read-tree any more friendly to baseless merges?
> 

Is a baseless merge a merge with two branches that do not have a
common ancestor? That is, if we want to merge the branches A and B and
git-merge-base --all A B do not return any commits, is that a baseless
merge?

If that is the case then my merge code handles baseless merges
fine. Or, it is _supposed_ to handle them, I have done some basic
tests but it hasn't been tested thoroughly yet.

I don't know how the new read-tree code handles those cases.

- Fredrik

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-08 17:16 ` Darrin Thompson
  2005-09-08 20:37   ` Fredrik Kuivinen
@ 2005-09-08 21:39   ` Daniel Barkalow
  2005-09-08 21:54     ` Darrin Thompson
  2005-09-08 22:00     ` Junio C Hamano
  1 sibling, 2 replies; 18+ messages in thread
From: Daniel Barkalow @ 2005-09-08 21:39 UTC (permalink / raw)
  To: Darrin Thompson; +Cc: git

On Thu, 8 Sep 2005, Darrin Thompson wrote:

> On Mon, 2005-09-05 at 01:41 -0400, Daniel Barkalow wrote:
> > I've got a version of read-tree which accepts multiple ancestors and does 
> > a merge using information from all of them.
> 
> Do the multiple ancestors have to share a common parent? More to the
> point, is this read-tree any more friendly to baseless merges?

read-tree doesn't care about the relationships between its inputs; it's 
only interested in the trees. But using ancestors which aren't common is 
unlikely to give you desired results. I think, if you do read-tree a^ b^ a 
b, you will get everything into the index, but it'll all going to be 
conflicts.

I assume that what you want is something to include everything from two 
commits, which would give conflicts if a name is reused?

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-08 21:39   ` Daniel Barkalow
@ 2005-09-08 21:54     ` Darrin Thompson
  2005-09-08 22:00     ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Darrin Thompson @ 2005-09-08 21:54 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

On Thu, 2005-09-08 at 17:39 -0400, Daniel Barkalow wrote:
> I assume that what you want is something to include everything from two 
> commits, which would give conflicts if a name is reused?
> 

Yes.

--
Darrin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-08 21:39   ` Daniel Barkalow
  2005-09-08 21:54     ` Darrin Thompson
@ 2005-09-08 22:00     ` Junio C Hamano
  2005-09-08 22:10       ` Daniel Barkalow
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2005-09-08 22:00 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> I assume that what you want is something to include everything from two 
> commits, which would give conflicts if a name is reused?

My understanding is that Darrin wants to do what Linus did when
he merged gitk into git.git.

Personally I think that is a specialized application and
something like the git-merge-projects script I posted as a
follow-up would be more appropriate than adding it to the
current merge discussion.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-08 22:00     ` Junio C Hamano
@ 2005-09-08 22:10       ` Daniel Barkalow
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Barkalow @ 2005-09-08 22:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 8 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > I assume that what you want is something to include everything from two 
> > commits, which would give conflicts if a name is reused?
> 
> My understanding is that Darrin wants to do what Linus did when
> he merged gitk into git.git.
> 
> Personally I think that is a specialized application and
> something like the git-merge-projects script I posted as a
> follow-up would be more appropriate than adding it to the
> current merge discussion.

Well, it's an easy addition to read-tree; just need a merge function which 
takes two entries and adds the non-NULL one in stage 0, or adds both if 
they both exist. git-merge-script probably shouldn't be the entry point to 
it, of course, but that part isn't my area anyway.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-05  5:41 Multi-ancestor read-tree notes Daniel Barkalow
  2005-09-06  5:42 ` Junio C Hamano
  2005-09-08 17:16 ` Darrin Thompson
@ 2005-09-09 17:29 ` Junio C Hamano
  2005-09-09 20:44   ` Daniel Barkalow
  2 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2005-09-09 17:29 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> In case #16, I'm not sure what I should produce. I think the best thing 
> might be to not leave anything in stage 1. The desired end effect is that 
> the user is given a file with a section like:
>
>   {
>     *t = NULL;
>     *m = 0;
> <<<<<<<<
>     return Z_DATA_ERROR;
> ========
>     return Z_OK;
>>>>>>>>>
>   }

I was thinking a bit more about this.  Let's rephrase case #16.
I'll call merge bases O1, O2,... and merge heads A and B, and we
are interested in one path.

If O1 and O2, the path has quite different contents.  A has the
same contents as O1 and B has the same contents as O2.  We
should not just pick one or the other and do two-file merge
between the version in A and B (we could prototype by massaging
'diff A B' output to produce what is common between A and B and
run (RCS) merge of A and B pretending that the common contents
is the original to produce something like the above).

If A has slight changes since O1 but B did not change since O2,
ideally I think we would want the same thing to happen.  Let's
call it case #16+.

What does the current implementation do?  It is not case #16
because A and O1 does not exactly match.  I suspect the result
will be skewed because B has an exact match with O2.  The
situation becomes more interesting if both A and B has slight
changes since O1 and O2 respectively.  They do not exactly match
with their bases, but I think ideally we would like something
very similar to case #16 resolution to happen.

One way to solve this would be to try doing things entirely in
read-tree by doing not just exact matches but also checking the
amount of changes -- if each heads has similar but different
base call it case #16 and try two-file merge between the heads
disregarding the bases.

But I am a bit reluctant to suggest this.  My gut feeling tells
me that these 'interesting' cases are easier if scripted outside
read-tree machinery to later enhance and improve the heuristics.

Of course, the current case #16 detected by the exact match rule
should be something we can automatically handle, but to make
things safer to use I think we should have a way to detect case
#16+ situlation and avoid mistakenly favoring A over B (or vice
versa) only because one has slight modification while the other
does not.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-09 17:29 ` Junio C Hamano
@ 2005-09-09 20:44   ` Daniel Barkalow
  2005-09-11 16:45     ` Matthias Urlichs
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Barkalow @ 2005-09-09 20:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, 9 Sep 2005, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > In case #16, I'm not sure what I should produce. I think the best thing 
> > might be to not leave anything in stage 1. The desired end effect is that 
> > the user is given a file with a section like:
> >
> >   {
> >     *t = NULL;
> >     *m = 0;
> > <<<<<<<<
> >     return Z_DATA_ERROR;
> > ========
> >     return Z_OK;
> >>>>>>>>>
> >   }
> 
> I was thinking a bit more about this.  Let's rephrase case #16.
> I'll call merge bases O1, O2,... and merge heads A and B, and we
> are interested in one path.
> 
> If O1 and O2, the path has quite different contents.  A has the
> same contents as O1 and B has the same contents as O2. 

There's a bit more subtlety here: since these are common ancestors, A must 
have somehow changed O2's version to O1's version, and B must have changed 
O1's version to O2's version. It's isn't just that each side left the file 
the same, but from different ancestral versions; both of the other 
versions must have gotten rejected somehow. I think the real key is to 
identify what was going on in between.

> We should not just pick one or the other and do two-file merge
> between the version in A and B (we could prototype by massaging
> 'diff A B' output to produce what is common between A and B and
> run (RCS) merge of A and B pretending that the common contents
> is the original to produce something like the above).
> 
> If A has slight changes since O1 but B did not change since O2,
> ideally I think we would want the same thing to happen.  Let's
> call it case #16+.
> 
> What does the current implementation do?  It is not case #16
> because A and O1 does not exactly match.  I suspect the result
> will be skewed because B has an exact match with O2. 

Yes, in this case we miss whatever caused A to reject O2, and we use the 
modified O2, because we don't realize that A's rejection of O2 should also 
apply to the version in B. Unfortunately, this looks just like the 
situation where both sides took O1, and B did a further modification to 
that.

> The situation becomes more interesting if both A and B has slight
> changes since O1 and O2 respectively.  They do not exactly match
> with their bases, but I think ideally we would like something
> very similar to case #16 resolution to happen.

I think the right thing, ideally, is to have the content merge also take 
multiple ancestors and have a #16 case itself when it's deciding which 
version of a block to use. The #16+ case is actually trickier, because we 
have fewer cues.

> One way to solve this would be to try doing things entirely in
> read-tree by doing not just exact matches but also checking the
> amount of changes -- if each heads has similar but different
> base call it case #16 and try two-file merge between the heads
> disregarding the bases.
> 
> But I am a bit reluctant to suggest this.  My gut feeling tells
> me that these 'interesting' cases are easier if scripted outside
> read-tree machinery to later enhance and improve the heuristics.
> 
> Of course, the current case #16 detected by the exact match rule
> should be something we can automatically handle, but to make
> things safer to use I think we should have a way to detect case
> #16+ situlation and avoid mistakenly favoring A over B (or vice
> versa) only because one has slight modification while the other
> does not.

I think #16+ is extra uncommon, because it involves someone making an 
irrelevant modification to a patched version of a file while someone else 
reverts the patch. I'm actually interested in doing a big spiffy program 
to do merges with information drawn as needed from the history, stuff 
happening on a per-hunk level, and support for block moves. It'll take a 
while before it gets anywhere, but I still think it's likely that people 
won't hit #16+ and get unexpected behavior before it's ready.

The main thing I'm unsure of is whether Fredrick's algorithm is actually 
not a better solution: it is possible to understand what happened leading 
up to a merge either by looking at the time after the common ancestors or 
by looking at the time before them. I think that the more recent history 
is a better guide, but the older history is easier to use; the case his 
version isn't good for, I think, is when the common ancestors of the sides 
are even more complicated to merge.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-06 17:43   ` Daniel Barkalow
  2005-09-06 20:03     ` Junio C Hamano
@ 2005-09-10 22:50     ` Junio C Hamano
  2005-09-10 22:56       ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2005-09-10 22:50 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> Do you know if there's anything like case #16 in there? I'd be interested 
> to know if there's anything that gets handled automatically in different 
> ways depending on which single base is used, and doesn't require manual 
> intervention with multiple bases, because that's probably wrong.

I was playing with a handful of non-Linus 2.6 kernel
repositories today, and found two case #16 merges.

0c3e091838f02c537ccab3b6e8180091080f7df2

	case #16; stupid resolves the same way as the actual
	commit, but that might mean the actual commit is bad.

84ffa747520edd4556b136bdfc9df9eb1673ce12

	case #16; stupid also fails.

0c168775709faa74c1b87f1e61046e0c51ade7f3
0e396ee43e445cb7c215a98da4e76d0ce354d9d7

	These two fails both in stupid and resolve.

a855f9a4d29f0ec338c337358844389171b0bae0

	This one stupid fails but resolve succeeds.

19925d7e0af7de645d808fd973ef99049fce52f0
3190186362466658f01b2e354e639378ce07e1a9
467ca22d3371f132ee225a5591a1ed0cd518cb3d
539aeb871b52233b189ae976dfded20e14db645a
6358924b06cd1aaa031b8ba0c33e5a15e795bef0
cf70be167c085af820c0c2a606cab8c3ee819dc6
da28c12089dfcfb8695b6b555cdb8e03dda2b690

	These give good merges in both stupid and resolve.

Here are the logs for the commits involved in the above
experiments; some of them might be contained in only one
maintainer tree but I do not keep track of whose tree they came
from.  The repository I am testing in is `fetch but not merge
many random trees derived from Linus linux-2.6' repository.

----------------------------------------------------------------
commit 0c168775709faa74c1b87f1e61046e0c51ade7f3
tree c40fd8818c64c5d7d1d90afab0bd6ffd94505526
parent 9bd481f85940726bf66aae5cd03c5b912ad0ae4c
parent 9b4311eedb17fa88f02e4876cd6aa9a08e383cd6
author Jeff Garzik <jgarzik@pobox.com> 1120106958 -0400
committer Jeff Garzik <jgarzik@pobox.com> 1120106958 -0400

    Merge upstream 2.6.13-rc1-git1 into 'ieee80211' branch of netdev-2.6.

commit 0c3e091838f02c537ccab3b6e8180091080f7df2
tree 61a407356d1e897e0badea552ce69e657cab6108
parent 7ffacc1a2527c219b834fe226a7a55dc67ca3637
parent a4cce10492358b33d33bb43f98284c80482037e8
author Tony Luck <tony.luck@intel.com> 1124808655 -0700
committer Tony Luck <tony.luck@intel.com> 1124808655 -0700

    Pull release into test branch

commit 0e396ee43e445cb7c215a98da4e76d0ce354d9d7
tree a6fde6a33965abb6077420cda31e3f1fbe8d3891
parent b8112df71cae7d6a86158caeb19d215f56c4f9ab
parent 2089a0d38bc9c2cdd084207ebf7082b18cf4bf58
author Linus Torvalds <torvalds@ppc970.osdl.org> 1119120155 -0700
committer Linus Torvalds <torvalds@ppc970.osdl.org> 1119120155 -0700

    Manual merge of rsync://rsync.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
    
    This is a fixed-up version of the broken "upstream-2.6.13" branch, where
    I re-did the manual merge of drivers/net/r8169.c by hand, and made sure
    the history is all good.

commit 19925d7e0af7de645d808fd973ef99049fce52f0
tree 01e7bf7717582bd70fbf1ba86132c33e61d044d5
parent cce3217e147b46ec4b7d20d922dadd3016b5fd49
parent 85f265d887d2389376f1caa191e9682085feb76e
author Tony Luck <tony.luck@intel.com> 1124143503 -0700
committer Tony Luck <tony.luck@intel.com> 1124143503 -0700

    Pull CONFIG_PCI description fix

commit 3190186362466658f01b2e354e639378ce07e1a9
tree 4ef50e96c385ed076465aac23f52902467e7d825
parent 08848e446bcd2130c26945be966446389d25bcc2
parent f60f700876cd51de9de69f3a3c865d95e287a24d
author Tony Luck <tony.luck@intel.com> 1121628606 -0700
committer Tony Luck <tony.luck@intel.com> 1121628606 -0700

    Auto merge with rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

commit 467ca22d3371f132ee225a5591a1ed0cd518cb3d
tree 0e3d6de84b4ffa379c2c7ddcebd3f55de52b9844
parent 12725675e26d52c39e856d341035b94bf7802458
parent 1e86d1c648508fd50e6c9960576b87906a7906ad
author Steve French <sfrench@hera.kernel.org> 1117748543 -0700
committer Steve French <sfrench@hera.kernel.org> 1117748543 -0700

    Merge with rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    

commit 539aeb871b52233b189ae976dfded20e14db645a
tree ad028270f2b79d74886418014f971d52cb729e11
parent 04141b215a2a0ba7c2052b53a912c9412f2ed8ea
parent 30d5b64b63fa69af31b2cba32e6d71d68526eec9
author Tony Luck <tony.luck@intel.com> 1124407778 -0700
committer Tony Luck <tony.luck@intel.com> 1124407778 -0700

    Auto-update from upstream

commit 6358924b06cd1aaa031b8ba0c33e5a15e795bef0
tree 9cc45ad48ba1c1171ceb949294223839c89d1f8c
parent 1da21e73bdc458f8131e6071072632b4c3b0430f
parent 344a076110f4ecb16ea6d286b63be696604982ed
author Tony Luck <tony.luck@intel.com> 1126214972 -0700
committer Tony Luck <tony.luck@intel.com> 1126214972 -0700

    Pull release into test branch

commit 84ffa747520edd4556b136bdfc9df9eb1673ce12
tree 1cfe20bd31fce1b5b3024384fcb324c3338d1d32
parent 702c7e7626deeabb057b6f529167b65ec2eefbdb
parent 81065e2f415af6c028eac13f481fb9e60a0b487b
author Len Brown <len.brown@intel.com> 1124849543 -0400
committer Len Brown <len.brown@intel.com> 1124849543 -0400

    Merge from-linus to-akpm

commit a855f9a4d29f0ec338c337358844389171b0bae0
tree 211824376a8b170a8087956c8d5ec25269f2bc49
parent 04c573e1d1625b48b3c90f988579d7835f4c55f3
parent f505380ba7b98ec97bf25300c2a58aeae903530b
author Tony Luck <tony.luck@intel.com> 1125729479 -0700
committer Tony Luck <tony.luck@intel.com> 1125729479 -0700

    Update from linus with manual merge for include/asm-ia64/sn/sn_sal.h

commit cf70be167c085af820c0c2a606cab8c3ee819dc6
tree 0a587cec3a6bd4fdd53fcfb75f87bc45da5d1a7f
parent 3844dcf8faae3a163ca2d263ba6468085ffaceb8
parent ff67b59726a8cd3549b069dfa78de2f538d3b8e3
author Tony Luck <tony.luck@intel.com> 1125439229 -0700
committer Tony Luck <tony.luck@intel.com> 1125439229 -0700

    Pull release into test branch

commit da28c12089dfcfb8695b6b555cdb8e03dda2b690
tree b3ff509f21352ef053cb3d490cb13528090d32ac
parent 6de7dc2c4c713d037c19aa1e310d240f16973414
parent 577a4f8102d54b504cb22eb021b89e957e8df18f
author Dave Kleikamp <shaggy@austin.ibm.com> 1122559416 -0500
committer Dave Kleikamp <shaggy@austin.ibm.com> 1122559416 -0500

    Merge with /home/shaggy/git/linus-clean/
    /home/shaggy/git/linus-clean/
    /home/shaggy/git/linus-clean/
    
    Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-10 22:50     ` Junio C Hamano
@ 2005-09-10 22:56       ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2005-09-10 22:56 UTC (permalink / raw)
  To: Russell King; +Cc: git

While I was doing merge experiments, I found two commits whose
parents are actually fast forwards.  This does not mean you made
mistakes when you made these commits, but it suggests that the
tool that made these commits may have screwed up.  Do you use
'git commit' to make your commits, or do you use something else?


commit b129a8ccd53f74c43e4c83c8e0031a4990040830
Merge: 6b39374a27eb4be7e9d82145ae270ba02ea90dc8 194d0710e1a7fe92dcf860ddd31fded8c3103b7a
Author: Russell King <rmk@dyn-67.arm.linux.org.uk>
Date:   Wed Aug 31 10:12:14 2005 +0100

    [SERIAL] Clean up and fix tty transmission start/stoping
    
    The start_tx and stop_tx methods were passed a flag to indicate
    whether the start/stop was from the tty start/stop callbacks, and
    some drivers used this flag to decide whether to ask the UART to
    immediately stop transmission (where the UART supports such a
    feature.)
    
    There are other cases when we wish this to occur - when CTS is
    lowered, or if we change from soft to hard flow control and CTS
    is inactive.  In these cases, this flag was false, and we would
    allow the transmitter to drain before stopping.
    
    There is really only one case where we want to let the transmitter
    drain before disabling, and that's when we run out of characters
    to send.
    
    Hence, re-jig the start_tx and stop_tx methods to eliminate this
    flag, and introduce new functions for the special "disable and
    allow transmitter to drain" case.
    
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

commit 603fff54420a0ccc4c3b48bfef43896fb4e33161
Merge: 99f95e5286df2f69edab8a04c7080d986ee4233b 8b22c249e7de453961e4d253b19fc2a0bdd65d53
Author: Russell King <rmk@dyn-67.arm.linux.org.uk>
Date:   Tue Jun 28 13:40:39 2005 +0100

    [PATCH] ARM SMP: TLB implementations only affect local CPU
    
    The existing TLB flush implementations only have an effect on
    the local CPU.  Prefix them with local_.
    
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Multi-ancestor read-tree notes
  2005-09-09 20:44   ` Daniel Barkalow
@ 2005-09-11 16:45     ` Matthias Urlichs
  0 siblings, 0 replies; 18+ messages in thread
From: Matthias Urlichs @ 2005-09-11 16:45 UTC (permalink / raw)
  To: git

Hi, Daniel Barkalow wrote:

>  I'm actually interested in doing a big spiffy program 
> to do merges with information drawn as needed from the history, stuff 
> happening on a per-hunk level, and support for block moves.

You should look at "meld", which is a nice Python program to do
graphical merges.

In addition to that, I've written a Python interface for direct
(read-only) access to git objects, so you can do more interesting
things without forking off a lot of programs. Expect that work to appear
here shortly.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
"The majority of the stupid is invincible and guaranteed for all time. The
terror of their tyranny, however, is alleviated by their lack of consistency."
-- Albert Einstein

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2005-09-11 16:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-09-05  5:41 Multi-ancestor read-tree notes Daniel Barkalow
2005-09-06  5:42 ` Junio C Hamano
2005-09-06 17:43   ` Daniel Barkalow
2005-09-06 20:03     ` Junio C Hamano
2005-09-06 20:25       ` Daniel Barkalow
2005-09-06 21:53         ` Junio C Hamano
2005-09-06 22:59           ` Daniel Barkalow
2005-09-10 22:50     ` Junio C Hamano
2005-09-10 22:56       ` Junio C Hamano
2005-09-08 17:16 ` Darrin Thompson
2005-09-08 20:37   ` Fredrik Kuivinen
2005-09-08 21:39   ` Daniel Barkalow
2005-09-08 21:54     ` Darrin Thompson
2005-09-08 22:00     ` Junio C Hamano
2005-09-08 22:10       ` Daniel Barkalow
2005-09-09 17:29 ` Junio C Hamano
2005-09-09 20:44   ` Daniel Barkalow
2005-09-11 16:45     ` Matthias Urlichs

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).