git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Update the documentation for git-merge-base
@ 2006-05-16  5:58 Fredrik Kuivinen
  2006-05-16  6:13 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Fredrik Kuivinen @ 2006-05-16  5:58 UTC (permalink / raw)
  To: junkio; +Cc: git


Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se>

---

Is the code guaranteed to return a least common ancestor? If that is
the case we should probably mention it in the documentation.


 Documentation/git-merge-base.txt |   18 ++++++++++++++----
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-merge-base.txt b/Documentation/git-merge-base.txt
index d1d56f1..6099be2 100644
--- a/Documentation/git-merge-base.txt
+++ b/Documentation/git-merge-base.txt
@@ -8,16 +8,26 @@ git-merge-base - Finds as good a common 
 
 SYNOPSIS
 --------
-'git-merge-base' <commit> <commit>
+'git-merge-base' [--all] <commit> <commit>
 
 DESCRIPTION
 -----------
-"git-merge-base" finds as good a common ancestor as possible. Given a
-selection of equally good common ancestors it should not be relied on
-to decide in any particular way.
+
+"git-merge-base" finds as good a common ancestor as possible between
+the two commits. That is, given two commits A and B 'git-merge-base A
+B' will output a commit which is reachable from both A and B through
+the parent relationship.
+
+Given a selection of equally good common ancestors it should not be
+relied on to decide in any particular way.
 
 The "git-merge-base" algorithm is still in flux - use the source...
 
+OPTIONS
+-------
+--all::
+	Output all common ancestors for the two commits instead of
+	just one.
 
 Author
 ------

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Update the documentation for git-merge-base
  2006-05-16  5:58 [PATCH] Update the documentation for git-merge-base Fredrik Kuivinen
@ 2006-05-16  6:13 ` Junio C Hamano
  2006-05-16  6:54   ` Fredrik Kuivinen
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2006-05-16  6:13 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git

Fredrik Kuivinen <freku045@student.liu.se> writes:

> Is the code guaranteed to return a least common ancestor? If that is
> the case we should probably mention it in the documentation.

Unfortunately, no, if you mean by "least common" closest to the
tips.

See the big illustration at the top of the source for how you
can construct pathological case to defeat an attempt to
guarantee such.  --all guarantees that the output contains all
interesting ones, but does not guarantee the output has no
suboptimal merge bases.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Update the documentation for git-merge-base
  2006-05-16  6:13 ` Junio C Hamano
@ 2006-05-16  6:54   ` Fredrik Kuivinen
  2006-05-16  7:51     ` Junio C Hamano
  2006-05-16 15:32     ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Fredrik Kuivinen @ 2006-05-16  6:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, git

On Mon, May 15, 2006 at 11:13:12PM -0700, Junio C Hamano wrote:
> Fredrik Kuivinen <freku045@student.liu.se> writes:
> 
> > Is the code guaranteed to return a least common ancestor? If that is
> > the case we should probably mention it in the documentation.
> 
> Unfortunately, no, if you mean by "least common" closest to the
> tips.
>

By "least" I mean the following:

C is a least common ancestor of A and B if:

* C is a common ancestor of A and B, and
* for every other common ancestor D (different from C) of A and B, C
  is not reacheable from D.

> See the big illustration at the top of the source for how you
> can construct pathological case to defeat an attempt to
> guarantee such.  --all guarantees that the output contains all
> interesting ones, but does not guarantee the output has no
> suboptimal merge bases.

There are two examples at the top of the source. In the first one a
least common ancestor is returned. As I interpret the second one, it
is an example of how the old algorithm without the postprocessing step
produced a common ancestor which is not least.

Am I wrong? Do we have any cases where the current merge-base
algorithm gives us common ancestors which are not least?

- Fredrik

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Update the documentation for git-merge-base
  2006-05-16  6:54   ` Fredrik Kuivinen
@ 2006-05-16  7:51     ` Junio C Hamano
  2006-05-16 15:32     ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2006-05-16  7:51 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git

Fredrik Kuivinen <freku045@student.liu.se> writes:

>> See the big illustration at the top of the source for how you
>> can construct pathological case to defeat an attempt to
>> guarantee such.  --all guarantees that the output contains all
>> interesting ones, but does not guarantee the output has no
>> suboptimal merge bases.
>
> There are two examples at the top of the source. In the first one a
> least common ancestor is returned. As I interpret the second one, it
> is an example of how the old algorithm without the postprocessing step
> produced a common ancestor which is not least.

Ah, yes, I remember now.

The drawing was done while we were working on the solution to
that pathological case; mark_reachable_commits() solves that
horizon effect.

    http://article.gmane.org/gmane.comp.version-control.git/11410
    http://article.gmane.org/gmane.comp.version-control.git/11429
    http://article.gmane.org/gmane.comp.version-control.git/11552
    http://article.gmane.org/gmane.comp.version-control.git/11613

However, our inability to come up with one is not a nonexistence
proof of cases the current algorithm can fail, so math minded
people _might_ want to prove the algorithm is optimal.

Not me, though.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Update the documentation for git-merge-base
  2006-05-16  6:54   ` Fredrik Kuivinen
  2006-05-16  7:51     ` Junio C Hamano
@ 2006-05-16 15:32     ` Linus Torvalds
  2006-05-16 16:32       ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2006-05-16 15:32 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: Junio C Hamano, git



On Tue, 16 May 2006, Fredrik Kuivinen wrote:
> 
> By "least" I mean the following:
> 
> C is a least common ancestor of A and B if:
> 
> * C is a common ancestor of A and B, and
> * for every other common ancestor D (different from C) of A and B, C
>   is not reacheable from D.

Yes, git-merge-base should always return a least common ancestor, never 
anything less. The only question is what happens when there are multiple 
LCA commits.

In fact, even in that case git-merge-base will have a pretty strong 
specification:

  "git-merge-base with the '--all' flag will return the complete set of
   least common ancestors, sorted by most recent (as defined purely by
   the commit date order, not any graph ordering) first.

   Without the '--all' flag, it will return just one LCA commit (the most 
   recent one, by the same date-based definition).

   In the case two or more LCA commits have exactly the same committer 
   date, the ordering between them is arbitrary"

I don't think you can be more specific, or do a better job. The definition 
of "most recent" is arbitrary, of course - and going by commit date is 
just _one_ way to order them, but it happens to be an easy one, and one 
that is as good as any other choice.

Of course, the defined ordering probably really matters only for the case 
where we return just one LCA out of many, but it's nice to be able to tell 
what the order will be even for the multi-commit case.

> There are two examples at the top of the source. In the first one a
> least common ancestor is returned. As I interpret the second one, it
> is an example of how the old algorithm without the postprocessing step
> produced a common ancestor which is not least.

Correct. We used to occasionally get it wrong, and return a common 
ancestor that wasn't least.

> Am I wrong? Do we have any cases where the current merge-base
> algorithm gives us common ancestors which are not least?

Modulo bugs, no. And I don't think there are any bugs in that respect.

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Update the documentation for git-merge-base
  2006-05-16 15:32     ` Linus Torvalds
@ 2006-05-16 16:32       ` Linus Torvalds
  2006-05-16 23:20         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2006-05-16 16:32 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: Junio C Hamano, git



On Tue, 16 May 2006, Linus Torvalds wrote:
> 
> I don't think you can be more specific, or do a better job. The definition 
> of "most recent" is arbitrary, of course - and going by commit date is 
> just _one_ way to order them, but it happens to be an easy one, and one 
> that is as good as any other choice.

Side note: since LCA's are (by definition) never going to have a direct 
graph relationship, there is obviously no ordering enforced by the graph 
itself. So they're all unordered in the graph sense, but you could use 
other measures of "distance" in the graph.

Example other orderings that we _could_ do, and I considered (some purely 
graph based, others based on content):

 - order by number of commits between the LCA and the two commits that we 
   are trying to find the LCA for (the "tips").
 - order by diff size between the LCA and the tips
 - order by lack of conflicts between the LCA and the tips.

however, none of the alternate orderings really seem to make a lot of 
sense.

The date-based one trumps them all by being straightforward and simple to 
both implement and explain. And iirc, I actually verified that it happened 
to pick the "better" one for at least one of my tests when using the old 
stupid straigth three-way merge, so I think it was actually objectively a 
good measure at least once.

Anybody who cares can obviously always just do "git-merge-base --all" and 
do their own sorting for the (relatively unlikely) case that you get more 
than one parent.

Anyway, just out of interest I just did some statistics using some shell 
scripts:

 - For the current kernel tree, of 1857 merges, only 17 had more than one 
   merge base (and none had more than two):

   1840 o
     17 oo

 - In contrast, for git (current master branch), the numbers are 35 out of 
   540, and there are lots of merges with many LCA's:

    505 o
     15 oo
     13 ooo
      2 oooo
      3 ooooo
      2 ooooooo

I think the difference is that Junio does a lot of these branches where he 
keeps on pulling from them, and never syncs back (which is a great 
workflow). In contrast, the kernel tends to try to avoid that because the 
history gets messy enough as it is ;)

Anyway, the two commits that apparently have seven (!) LCA's in the git 
tree should probably be checked out. They are probably a good thing to see 
if git-merge-base really _really_ does the right thing, and whether they 
really are true LCA's.

They are commits ad0b46bf.. and e6a933bd.. respectively.

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Update the documentation for git-merge-base
  2006-05-16 16:32       ` Linus Torvalds
@ 2006-05-16 23:20         ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2006-05-16 23:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Fredrik Kuivinen, git

Linus Torvalds <torvalds@osdl.org> writes:

>  - In contrast, for git (current master branch), the numbers are 35 out of 
>    540, and there are lots of merges with many LCA's:
>
>     505 o
>      15 oo
>      13 ooo
>       2 oooo
>       3 ooooo
>       2 ooooooo
>
> I think the difference is that Junio does a lot of these branches where he 
> keeps on pulling from them, and never syncs back (which is a great 
> workflow). In contrast, the kernel tends to try to avoid that because the 
> history gets messy enough as it is ;)
>
> Anyway, the two commits that apparently have seven (!) LCA's in the git 
> tree should probably be checked out. They are probably a good thing to see 
> if git-merge-base really _really_ does the right thing, and whether they 
> really are true LCA's.
>
> They are commits ad0b46bf.. and e6a933bd.. respectively.

The first one is because at 1.3.0 I pulled everything from
"next" to "master".

Usually "next" incorporates topic branches that stem from
different commits on "master", and when a new topic is merged to
"next", it gets the updates to "master" up to that point along
with the new topic.  When topics graduate (i.e. merged back) to
"master", they do so at different pace.


      topic2          o---o---o---o---H---.
                     /                 \   \
      next   -----------o---o---E---o---I-------B
                   /   /       /             \   \
      topic1      /   /   o---D---.           \   \
                 /   /   /         \           \   \
      master ---G---o---C---o---o---F---o---o---A---X

The above illustration shows that two topics branched from
master were cooked in next.  Topic 1 branched from master at C,
added two commits (its tip is at D), merged to next at E and
then later merged to master at F.  Similarly, topic 2 branched
from master at G, added five commits (its tip is at H), merged
to next at I and then later merged to master at A.

When merging "next" into "master" by merging A and B to produce
X, tips of topics 1 and 2 (D and H, respectively) become the
merge base.

Merging "next" wholesale to "master" is hopefully a rare event,
but the seven bases you are seeing are the topic tips.

The other one is the other way around.  From time to time,
"next" itself gets updates from "master" to keep it in sync with
fixes that occurred on "master" directly.  Such a merge into
"next" will have this picture but the principles are the same.

      topic2          o---o---o---o---H---.
                     /                 \   \
      next   -----------o---o---E---o---I-------B---Y
                   /   /       /             \     /
      topic1      /   /   o---D---.           \   /
                 /   /   /         \           \ /
      master ---G---o---C---o---o---F---o---o---A

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-05-16 23:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-16  5:58 [PATCH] Update the documentation for git-merge-base Fredrik Kuivinen
2006-05-16  6:13 ` Junio C Hamano
2006-05-16  6:54   ` Fredrik Kuivinen
2006-05-16  7:51     ` Junio C Hamano
2006-05-16 15:32     ` Linus Torvalds
2006-05-16 16:32       ` Linus Torvalds
2006-05-16 23:20         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).