git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Rast <trast@student.ethz.ch>
To: Junio C Hamano <gitster@pobox.com>
Cc: Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Git Mailing List <git@vger.kernel.org>, Jeff King <peff@peff.net>
Subject: Re: in_merge_bases() is too expensive for recent "pu" update
Date: Fri, 24 Aug 2012 11:43:40 +0200	[thread overview]
Message-ID: <87393cbp6b.fsf@thomas.inf.ethz.ch> (raw)
In-Reply-To: <7v393d49xs.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Thu, 23 Aug 2012 13:42:23 -0700")

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Thomas Rast <trast@student.ethz.ch> writes:
>>
>>> At the very least it should be possible to change in_merge_bases() to
>>> not do any of the post-filtering; perhaps like the patch below.
>>
>> I do not recall the details but the post-filtering was added after
>> the initial naive version without it that was posted to the list did
>> not work correctly in corner cases.  I wouldn't doubt that N*(N-1)
>> loop can be optimized to something with log(N), but I offhand find
>> it hard to believe "to not do any" could be correct without seeing
>> more detailed analysis.
>
> If "one on one side, many on the other side" in merge_bases_many()
> confuses any of you in the readers, you can just ignore many-ness.
> Getting merge bases for "one" against many "two"s using
> merge_bases_many() is exactly the same as getting merge bases for
> "one" against a (fictitious) single commit you can make by merging
> all "two"s.
>
> So we can think about the degenerate case of merge base between two
> commits A and B in the following discussion.
>
> A merge-base between A and B is defined to be a commit that is a
> common ancestor of both A and B, and that is not an ancestor of any
> other merge-base between A and B.

Ok, under that definition I really do think that it's "easy".

You have all the pieces here but one:

> Start from A and B.  Follow from B to find 'x' and paint it in blue,
> follow from A to find 'y' and paint it in amber.  Follow from 'x' to
> '1', paint it in blue.  Follow from 'y' to '1', paint it in amber
> but notice that it already is painted in blue.
[...]
>             o-------o
>            /         \
>           /       y---A
>          /       /
>      ---2---z---1---x---B
>          \         /
>           o-------o
[...]
> So we need to notice that '1' and '2' have ancestry relation in
> order to reject '2' as "common but not merge-base".  One way of
> doing so is not to stop at '1' and keep digging (and eventually we
> find that '2' is what we could reach from '1' that already is a
> merge base), but then we will be susceptible to the same kind of
> clock skew issue as the revision traverser.

I think that is *the* way to do it.

And the way to fix the clock skew issue (also in the revision traversal)
is Peff's generation number cache.  Just keep propagating marks, digging
in generation order, until the generations prove that nothing new can
happen.

  [Side note, in reply to an earlier comment in the rev-list thread: I
  agree with Peff's reasoning that a cache is better than a commit
  header.]

The precise termination condition should be:

  There are no more one-colored commits to walk, and

  The maximum generation of the remaining commits in the walk is less
  than the minimum generation of the merge base candidates

Then the generation numbers ensure that no commit in the walk (which by
then only propagates STALE marks) can possibly be a descendant of any of
the candidates.  At that point, the candidates are the true set of merge
bases.


I conjecture that every history walking problem can be solved in time
linear in the number of commits once we properly use the generation
numbers ;-)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

  parent reply	other threads:[~2012-08-24  9:43 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-23 12:32 in_merge_bases() is too expensive for recent "pu" update Nguyen Thai Ngoc Duy
2012-08-23 14:20 ` Thomas Rast
2012-08-23 18:08   ` Junio C Hamano
2012-08-23 20:42     ` Junio C Hamano
2012-08-23 21:03       ` Junio C Hamano
2012-08-24  9:32         ` Thomas Rast
2012-08-24 15:50           ` Junio C Hamano
2012-08-24  9:43       ` Thomas Rast [this message]
2012-08-24 15:15         ` Jeff King
2012-08-24 16:15           ` Junio C Hamano
2012-08-24 15:52         ` Junio C Hamano
2012-08-24 11:42   ` Nguyen Thai Ngoc Duy
2012-08-24 11:51     ` Nguyen Thai Ngoc Duy
2012-08-28  1:50   ` Junio C Hamano
2012-08-28  8:12     ` Thomas Rast
2012-08-28 16:03       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87393cbp6b.fsf@thomas.inf.ethz.ch \
    --to=trast@student.ethz.ch \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).