[PATCH] rebase: be cleverer with rebased upstream branches

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] rebase: be cleverer with rebased upstream branches
@ 2011-02-14 13:51 Martin von Zweigbergk
  2011-02-15 11:28 ` Santi Béjar
  2011-02-15 20:30 ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Martin von Zweigbergk @ 2011-02-14 13:51 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, santi, Martin von Zweigbergk

Since c85c792 (pull --rebase: be cleverer with rebased upstream
branches, 2008-01-26), 'git pull --rebase' has used the reflog to try
to rebase from the old upstream onto the new upstream.

However, if, instead of 'git pull --rebase', the user were to do 'git
fetch' followed by 'git rebase', the reflog would not be walked. This
patch teaches "git rebase" the same reflog-walking tricks that 'git
pull --rebase' already knows.

This may be useful for rebasing one branch against another local
branch that has been rebased. Currently, you would have to do that
using 'git rebase --onto' or by configuring it on the branch.

It might seem like most of the related code in git-pull.sh can be
removed once git-rebase.sh supports reflog walking. Unfortunately, not
much of it can be removed, though. The reason is that git-pull.sh
simulates one step of "reflog walking" by keeping track of the
position of the remote-tracking branch before and after the fetch
operation. This does not rely on reflogs. There are at least two cases
where the reflog is not used: a) when it is disabled, b) when the
remote branch was specified on the command line (as in 'git pull
--rebase origin master').  In both of these cases, git-pull.sh
remembers the position of the reference before the fetch and uses that
as a kind of '$upstream@{1}'.

Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
---
    This applies on top of mz/rebase.

    I have been using this in combination with the patch for defaulting
    rebase to @{upstream} for a few months now. I find it very convenient
    e.g. when Junio has published some changes and I want to rebase my
    branches. Then I just go to each of my branches and run 'git rebase'
    without any arguments, and they get rebased correctly, whether they
    are based on origin/master, origin/pu (they shouldn't be, but let's
    say they are anyway), or on a local branch that is in turn based on
    e.g. origin/master.

    HOWEVER, this causes a very noticable delay in some cases. With this
    patch, 'git rebase' walks the reflog of the upstream ref until it
    finds a commit that the branch-to-rebase contains. If the upstream ref
    has moved a lot since the branch was last rebased, there may be quite
    a few commits to test before the old upstream commit is found.

    The same thing can already occur with 'git pull --rebase' for exactly
    the same reasons. For example, assume that your upstream remote branch
    changes quite frequently and that you often fetch from the remote so
    that your origin/master gets a long reflog. If you then checkout some
    branch you had not been working on for a while, and run 'git pull',
    you get into the same situation. The delay is probably less likely to
    be noticed in the case of 'git pull --rebase', however, since most
    users will probably assume it is a problem with the network or the
    server.

    Of course, 'git pull --rebase' can also be used with a local branch
    configured as upstream. In this case, the behavior today is just like
    what this patch introduces for 'git rebase'.

    What do you think? I think it's a useful feature, but how do we handle
    the delay problem? Maybe simply by making it configurable?

    Should such a configuration variable apply to 'git pull --rebase' as
    well? It would seem inconsistent otherwise, but maybe that's ok since
    'git pull --rebase' is usually used with remote-tracking branches,
    which probably change less frequently. Btw, is this a correct
    assumption? It is definitely true for my own work on git, but I
    actually think it's the other way around for my work at $dayjob. Am I
    missing some part to the puzzle that explains why I had not noticed
    the delay until I started using this patch?


 Documentation/git-rebase.txt |    7 ++++++-
 git-rebase.sh                |   13 +++++++++++++
 t/t3400-rebase.sh            |   26 ++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 095a67f..d4dbe28 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -24,7 +24,12 @@ it remains on the current branch.
 All changes made by commits in the current branch but that are not
 in <upstream> are saved to a temporary area.  This is the same set
 of commits that would be shown by `git log <upstream>..HEAD` (or
-`git log HEAD`, if --root is specified).
+`git log HEAD`, if --root is specified). If, however, there is a ref
+for the upstream branch, and this branch was rebased since the
+current branch was last rebased, the rebase uses that information to
+avoid rebasing changes done on the upstream branch. If you do not want
+'git rebase' to use this intelligence, refer to the upstream without
+using a reference (e.g. 'master~0').
 
 The current branch is reset to <upstream>, or <newbase> if the
 --onto option was supplied.  This has the exact same effect as
diff --git a/git-rebase.sh b/git-rebase.sh
index 5abfeac..1bc0c29 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -466,6 +466,19 @@ esac
 
 require_clean_work_tree "rebase" "Please commit or stash them."
 
+test -n "$upstream_name" && for reflog in \
+	$(git rev-list -g $upstream_name 2>/dev/null)
+do
+	if test $reflog = $(git merge-base $reflog $orig_head)
+	then
+		if test $reflog != $(git merge-base $onto $reflog)
+		then
+			upstream=$reflog
+		fi
+		break
+	fi
+done
+
 # Now we are rebasing commits $upstream..$orig_head (or with --root,
 # everything leading up to $orig_head) on top of $onto
 
diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
index 349eebd..b64df31 100755
--- a/t/t3400-rebase.sh
+++ b/t/t3400-rebase.sh
@@ -209,4 +209,30 @@ test_expect_success 'rebase -m can copy notes' '
 	test "a note" = "$(git notes show HEAD)"
 '
 
+test_expect_success 'rebase against rebased upstream uses reflog' '
+	git checkout my-topic-branch &&
+	echo "Conflicting modification" > B &&
+	git add B &&
+	git commit -m "Modify B" &&
+	git reset --hard nonlinear &&
+	git checkout -b old-topic my-topic-branch@{1} &&
+	echo def > D &&
+	git add D &&
+	git commit -m "Add D" &&
+	git rebase my-topic-branch &&
+	test $(git rev-parse HEAD^) = $(git rev-parse my-topic-branch)
+'
+
+test_expect_success 'rebase against forwarded upstream does not reapply patches' '
+	git checkout my-topic-branch &&
+	echo abc > B &&
+	git add B &&
+	git commit -m "Conficting B" &&
+	git reset HEAD~2 &&
+	git reset HEAD@{1} &&
+	git checkout old-topic &&
+	git rebase my-topic-branch &&
+	test $(git rev-parse HEAD^) = $(git rev-parse my-topic-branch)
+'
+
 test_done
-- 
1.7.4.rc2.33.g8a14f

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-14 13:51 [PATCH] rebase: be cleverer with rebased upstream branches Martin von Zweigbergk
@ 2011-02-15 11:28 ` Santi Béjar
  2011-02-16  1:37   ` Martin von Zweigbergk
  2011-02-15 20:30 ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Santi Béjar @ 2011-02-15 11:28 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: git, Junio C Hamano

On Mon, Feb 14, 2011 at 1:51 PM, Martin von Zweigbergk
<martin.von.zweigbergk@gmail.com> wrote:
> Since c85c792 (pull --rebase: be cleverer with rebased upstream
> branches, 2008-01-26), 'git pull --rebase' has used the reflog to try
> to rebase from the old upstream onto the new upstream.
>
> However, if, instead of 'git pull --rebase', the user were to do 'git
> fetch' followed by 'git rebase', the reflog would not be walked. This
> patch teaches "git rebase" the same reflog-walking tricks that 'git
> pull --rebase' already knows.
>
> This may be useful for rebasing one branch against another local
> branch that has been rebased. Currently, you would have to do that
> using 'git rebase --onto' or by configuring it on the branch.

It make sense.

>
> It might seem like most of the related code in git-pull.sh can be
> removed once git-rebase.sh supports reflog walking. Unfortunately, not
> much of it can be removed, though. The reason is that git-pull.sh
> simulates one step of "reflog walking" by keeping track of the
> position of the remote-tracking branch before and after the fetch
> operation. This does not rely on reflogs. There are at least two cases
> where the reflog is not used: a) when it is disabled, b) when the
> remote branch was specified on the command line (as in 'git pull
> --rebase origin master').  In both of these cases, git-pull.sh
> remembers the position of the reference before the fetch and uses that
> as a kind of '$upstream@{1}'.

I don't agree with point b). In line 190:

	remoteref="$(get_remote_merge_branch "$@" 2>/dev/null)" &&

It returns the local tracking branch for repo=origin and branch=master
and uses its reflog.

The end result is the same, there is one case where you need the old
value of the tracking branch, so it should be done in git-pull.

But I wonder if it is possible to write a function shared by
git-pull.sh and git-rebase.sh that computes the branch forking points,
the number of arguments could detect if it has the old-remote-hash or
not.

>
> Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
> ---

[...]

>
>    HOWEVER, this causes a very noticable delay in some cases. With this
>    patch, 'git rebase' walks the reflog of the upstream ref until it
>    finds a commit that the branch-to-rebase contains. If the upstream ref
>    has moved a lot since the branch was last rebased, there may be quite
>    a few commits to test before the old upstream commit is found.
>
>    The same thing can already occur with 'git pull --rebase' for exactly
>    the same reasons. For example, assume that your upstream remote branch
>    changes quite frequently and that you often fetch from the remote so
>    that your origin/master gets a long reflog. If you then checkout some
>    branch you had not been working on for a while, and run 'git pull',
>    you get into the same situation. The delay is probably less likely to
>    be noticed in the case of 'git pull --rebase', however, since most
>    users will probably assume it is a problem with the network or the
>    server.
>
>    Of course, 'git pull --rebase' can also be used with a local branch
>    configured as upstream. In this case, the behavior today is just like
>    what this patch introduces for 'git rebase'.
>
>    What do you think? I think it's a useful feature, but how do we handle
>    the delay problem? Maybe simply by making it configurable?
>
>    Should such a configuration variable apply to 'git pull --rebase' as
>    well? It would seem inconsistent otherwise, but maybe that's ok since
>    'git pull --rebase' is usually used with remote-tracking branches,
>    which probably change less frequently. Btw, is this a correct
>    assumption? It is definitely true for my own work on git, but I
>    actually think it's the other way around for my work at $dayjob. Am I
>    missing some part to the puzzle that explains why I had not noticed
>    the delay until I started using this patch?

I agree with you that it may add a long delay in some cases. I
normally rebase branches based on an upstream branch and this may
explain why I haven't seen the delay (or maybe I thought it was a
network delay).

I think the delay could be much shorter if the computation was not in
shell, but in C. Or maybe change the algorithm. So I don't think a
configuration item is the answer here.

Other than that I think the change make sense it include docs and
tests and works, thanks.

Santi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-15 11:28 ` Santi Béjar
@ 2011-02-16  1:37   ` Martin von Zweigbergk
  2011-02-16  9:26     ` Santi Béjar
  0 siblings, 1 reply; 19+ messages in thread
From: Martin von Zweigbergk @ 2011-02-16  1:37 UTC (permalink / raw)
  To: Santi Béjar; +Cc: Martin von Zweigbergk, git, Junio C Hamano

On Tue, 15 Feb 2011, Santi B?jar wrote:

> On Mon, Feb 14, 2011 at 1:51 PM, Martin von Zweigbergk
> <martin.von.zweigbergk@gmail.com> wrote:
> > It might seem like most of the related code in git-pull.sh can be
> > removed once git-rebase.sh supports reflog walking. Unfortunately, not
> > much of it can be removed, though. The reason is that git-pull.sh
> > simulates one step of "reflog walking" by keeping track of the
> > position of the remote-tracking branch before and after the fetch
> > operation. This does not rely on reflogs. There are at least two cases
> > where the reflog is not used: a) when it is disabled, b) when the
> > remote branch was specified on the command line (as in 'git pull
> > --rebase origin master').  In both of these cases, git-pull.sh
> > remembers the position of the reference before the fetch and uses that
> > as a kind of '$upstream@{1}'.
> 
> I don't agree with point b). In line 190:
> 
> 	remoteref="$(get_remote_merge_branch "$@" 2>/dev/null)" &&
> 
> It returns the local tracking branch for repo=origin and branch=master
> and uses its reflog.

Yes, but the local tracking branch is not updated when the
two-argument version of 'git pull' is used [1].
 
> The end result is the same, there is one case where you need the old
> value of the tracking branch, so it should be done in git-pull.

True, case a) is still there. I was just trying to explain why I
didn't just move the code from git-pull.sh to git-rebase.sh, but maybe
it confused more than it clarified...

> But I wonder if it is possible to write a function shared by
> git-pull.sh and git-rebase.sh that computes the branch forking points,
> the number of arguments could detect if it has the old-remote-hash or
> not.

Makes sense. I will have a look at it.

> I think the delay could be much shorter if the computation was not in
> shell, but in C. Or maybe change the algorithm. So I don't think a
> configuration item is the answer here.

It would definitely be nice if we can make it fast enough so we don't
have to make it configurable.


[1] http://thread.gmane.org/gmane.comp.version-control.git/165758

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-16  1:37   ` Martin von Zweigbergk
@ 2011-02-16  9:26     ` Santi Béjar
  0 siblings, 0 replies; 19+ messages in thread
From: Santi Béjar @ 2011-02-16  9:26 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: git, Junio C Hamano

On Wed, Feb 16, 2011 at 1:37 AM, Martin von Zweigbergk
<martin.von.zweigbergk@gmail.com> wrote:
> On Tue, 15 Feb 2011, Santi B?jar wrote:
>
>> On Mon, Feb 14, 2011 at 1:51 PM, Martin von Zweigbergk
>> <martin.von.zweigbergk@gmail.com> wrote:
>> > It might seem like most of the related code in git-pull.sh can be
>> > removed once git-rebase.sh supports reflog walking. Unfortunately, not
>> > much of it can be removed, though. The reason is that git-pull.sh
>> > simulates one step of "reflog walking" by keeping track of the
>> > position of the remote-tracking branch before and after the fetch
>> > operation. This does not rely on reflogs. There are at least two cases
>> > where the reflog is not used: a) when it is disabled, b) when the
>> > remote branch was specified on the command line (as in 'git pull
>> > --rebase origin master').  In both of these cases, git-pull.sh
>> > remembers the position of the reference before the fetch and uses that
>> > as a kind of '$upstream@{1}'.
>>
>> I don't agree with point b). In line 190:
>>
>>       remoteref="$(get_remote_merge_branch "$@" 2>/dev/null)" &&
>>
>> It returns the local tracking branch for repo=origin and branch=master
>> and uses its reflog.
>
> Yes, but the local tracking branch is not updated when the
> two-argument version of 'git pull' is used [1].

Yes, but the reflog is used nevertheless and it can use the local
tracking branch as the old-remote-hash.

>
>> The end result is the same, there is one case where you need the old
>> value of the tracking branch, so it should be done in git-pull.
>
> True, case a) is still there. I was just trying to explain why I
> didn't just move the code from git-pull.sh to git-rebase.sh, but maybe
> it confused more than it clarified...

Yes, with only case a) is sufficient (and complete).

HTH,
Santi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-14 13:51 [PATCH] rebase: be cleverer with rebased upstream branches Martin von Zweigbergk
  2011-02-15 11:28 ` Santi Béjar
@ 2011-02-15 20:30 ` Junio C Hamano
  2011-02-16  3:03   ` Martin von Zweigbergk
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2011-02-15 20:30 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: git, santi

Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> writes:

> diff --git a/git-rebase.sh b/git-rebase.sh
> index 5abfeac..1bc0c29 100755
> --- a/git-rebase.sh
> +++ b/git-rebase.sh
> @@ -466,6 +466,19 @@ esac
>  
>  require_clean_work_tree "rebase" "Please commit or stash them."
>  
> +test -n "$upstream_name" && for reflog in \
> +	$(git rev-list -g $upstream_name 2>/dev/null)

Ugly.

	test -n "$upstream_name" &&
        for reflog in $(git rev-list ...)
        do
        	...
	done

Don't you need to make sure $upstream_name is a branch (or a ref in
general that can have a reflog), or does it not matter because the
"rev-list -g" will die without producing anything and you are discarding
the error message?

Now, a handful of random questions, none of them rhetorical, as I don't
know the answers to any of them.

Would it help if the code is made just as clever as the patch attempts to
be, when the user says

	git rebase origin/next~4

IOW, use the reflog of origin/next even in such a case?

> +do
> +	if test $reflog = $(git merge-base $reflog $orig_head)
> +	then
> +		if test $reflog != $(git merge-base $onto $reflog)
> +		then
> +			upstream=$reflog
> +		fi
> +		break
> +	fi

Do we always traverse down to the beginning of the reflog in the worst
case?  Would bisection help to avoid the cost?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-15 20:30 ` Junio C Hamano
@ 2011-02-16  3:03   ` Martin von Zweigbergk
  2011-02-16 12:10     ` Santi Béjar
  0 siblings, 1 reply; 19+ messages in thread
From: Martin von Zweigbergk @ 2011-02-16  3:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, git, santi, Thomas Rast

On Tue, 15 Feb 2011, Junio C Hamano wrote:

> Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> writes:
> 
> > diff --git a/git-rebase.sh b/git-rebase.sh
> > index 5abfeac..1bc0c29 100755
> > --- a/git-rebase.sh
> > +++ b/git-rebase.sh
> > @@ -466,6 +466,19 @@ esac
> >  
> >  require_clean_work_tree "rebase" "Please commit or stash them."
> >  
> > +test -n "$upstream_name" && for reflog in \
> > +	$(git rev-list -g $upstream_name 2>/dev/null)
> 
> Ugly.

Very. Fixed. Thanks.

> 	test -n "$upstream_name" &&
>         for reflog in $(git rev-list ...)
>         do
>         	...
> 	done
> 
> Don't you need to make sure $upstream_name is a branch (or a ref in
> general that can have a reflog), or does it not matter because the
> "rev-list -g" will die without producing anything and you are discarding
> the error message?

Exactly as you suspect. Is it too ugly?

> Now, a handful of random questions, none of them rhetorical, as I don't
> know the answers to any of them.
> 
> Would it help if the code is made just as clever as the patch attempts to
> be, when the user says
> 
> 	git rebase origin/next~4
> 
> IOW, use the reflog of origin/next even in such a case?

Not sure. I think it seems too rare to worry about. In those cases,
one could still use the good old '--onto' option manually. Also, if we
don't handle the ref~4 case, the "cleverness" can be disabled by using
ref~0.

> > +do
> > +	if test $reflog = $(git merge-base $reflog $orig_head)
> > +	then
> > +		if test $reflog != $(git merge-base $onto $reflog)
> > +		then
> > +			upstream=$reflog
> > +		fi
> > +		break
> > +	fi
> 
> Do we always traverse down to the beginning of the reflog in the worst
> case?

Yes.

> Would bisection help to avoid the cost?

I don't think the straight-forward use of bisection would work. If the
history looks something like below, where 'b' is the branch to rebase
and 'u' is the upstream, we have to go through each entry in the
reflog to find u@{3}.

        .-u@{0}
       /
      .---u@{1}
     /
x---y-----u@{2}
     \
      .---u@{3}---b
       \
        .-u@{4}

I have an idea inspired by bisection, Thomas's exponential stride, and
what someone (you?) mentioned the other day about virtual merge
commits. I haven't tried it out, but let me know what you think. I'll
try to explain it using an example only:

Exponential stride phase:
1. candidates={ u@{0} }
   merge-base b $candidates -> y, _not_ in $candidates
2. candidates={ u@{1} u@{2} }
   merge-base b $candidates -> y, _not_ in $candidates
3. candidates={ u@{3} u@{4} u@{5} u@{6} }
   merge-base b $candidates -> u@{3}, in $candidates
Bisection phase:
1. candidates={ u@{3} u@{4} }
   merge-base b $candidates -> u@{3}, in $candidates
2. candidates={ u@{3} }
   merge-base b $candidates -> u@{3}, in $candidates, done

It works for the few cases I have thought of, but it may break in
other other cases. I just read about the virtual merge commits, so I'm
not sure I understand correctly how that works eiter.

Would it even perform better than searching linearly? I tried stepping
through it manually a few times and it seems faster.

Maybe something based on timestamps would be better?

/Martin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-16  3:03   ` Martin von Zweigbergk
@ 2011-02-16 12:10     ` Santi Béjar
  2011-02-16 13:22       ` Santi Béjar
  2011-02-16 16:45       ` Martin von Zweigbergk
  0 siblings, 2 replies; 19+ messages in thread
From: Santi Béjar @ 2011-02-16 12:10 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Junio C Hamano, git, Thomas Rast

On Wed, Feb 16, 2011 at 3:03 AM, Martin von Zweigbergk
<martin.von.zweigbergk@gmail.com> wrote:
> On Tue, 15 Feb 2011, Junio C Hamano wrote:
>
>> Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> writes:
>>
>> > diff --git a/git-rebase.sh b/git-rebase.sh
>> > index 5abfeac..1bc0c29 100755
>> > --- a/git-rebase.sh
>> > +++ b/git-rebase.sh
>>       test -n "$upstream_name" &&
>>         for reflog in $(git rev-list ...)
>>         do
>>               ...
>>       done
>>
>> Don't you need to make sure $upstream_name is a branch (or a ref in
>> general that can have a reflog), or does it not matter because the
>> "rev-list -g" will die without producing anything and you are discarding
>> the error message?
>
> Exactly as you suspect. Is it too ugly?

I also prefer Junio's version.

>
>> Now, a handful of random questions, none of them rhetorical, as I don't
>> know the answers to any of them.
>>
>> Would it help if the code is made just as clever as the patch attempts to
>> be, when the user says
>>
>>       git rebase origin/next~4
>>
>> IOW, use the reflog of origin/next even in such a case?
>
> Not sure. I think it seems too rare to worry about. In those cases,
> one could still use the good old '--onto' option manually. Also, if we
> don't handle the ref~4 case, the "cleverness" can be disabled by using
> ref~0.

With ref~4 you are specifying a commit, so I would expect to rebase to
use it as such, not also as a branch ref.

>
>> > +do
>> > +   if test $reflog = $(git merge-base $reflog $orig_head)
>> > +   then
>> > +           if test $reflog != $(git merge-base $onto $reflog)
>> > +           then
>> > +                   upstream=$reflog
>> > +           fi
>> > +           break
>> > +   fi
>>
>> Do we always traverse down to the beginning of the reflog in the worst
>> case?
>
> Yes.
>
>> Would bisection help to avoid the cost?
>
> I don't think the straight-forward use of bisection would work. If the
> history looks something like below, where 'b' is the branch to rebase
> and 'u' is the upstream, we have to go through each entry in the
> reflog to find u@{3}.
>
>
>        .-u@{0}
>       /
>      .---u@{1}
>     /
> x---y-----u@{2}
>     \
>      .---u@{3}---b
>       \
>        .-u@{4}
>
>
> I have an idea inspired by bisection, Thomas's exponential stride, and
> what someone (you?) mentioned the other day about virtual merge
> commits. I haven't tried it out, but let me know what you think. I'll
> try to explain it using an example only:
>
> Exponential stride phase:
> 1. candidates={ u@{0} }
>   merge-base b $candidates -> y, _not_ in $candidates
> 2. candidates={ u@{1} u@{2} }
>   merge-base b $candidates -> y, _not_ in $candidates
> 3. candidates={ u@{3} u@{4} u@{5} u@{6} }
>   merge-base b $candidates -> u@{3}, in $candidates

Doesn't it indicate that u@{3} is the commit we are looking for? I
haven't found a counterexample...

If this is true the following patch can implement it for git-pull.sh and
git-rebase.sh (sorry if it is space damaged):

diff --git i/git-pull.sh w/git-pull.sh
index 2cdea26..09ef0a9 100755
--- i/git-pull.sh
+++ w/git-pull.sh
@@ -189,14 +189,7 @@ test true = "$rebase" && {
 	. git-parse-remote &&
 	remoteref="$(get_remote_merge_branch "$@" 2>/dev/null)" &&
 	oldremoteref="$(git rev-parse -q --verify "$remoteref")" &&
-	for reflog in $(git rev-list -g $remoteref 2>/dev/null)
-	do
-		if test "$reflog" = "$(git merge-base $reflog $curr_branch)"
-		then
-			oldremoteref="$reflog"
-			break
-		fi
-	done
+	oldremoteref=$(git merge-base $curr_branch $oldremoteref $(git
rev-list -g $remoteref 2>/dev/null))
 }
 orig_head=$(git rev-parse -q --verify HEAD)
 git fetch $verbosity $progress $dry_run $recurse_submodules
--update-head-ok "$@" || exit 1
diff --git i/git-rebase.sh w/git-rebase.sh
index 0d245fe..4b3e131 100755
--- i/git-rebase.sh
+++ w/git-rebase.sh
@@ -448,18 +448,8 @@ esac

 require_clean_work_tree "rebase" "Please commit or stash them."

-test -n "$upstream_name" && for reflog in \
-	$(git rev-list -g $upstream_name 2>/dev/null)
-do
-	if test $reflog = $(git merge-base $reflog $orig_head)
-	then
-		if test $reflog != $(git merge-base $onto $reflog)
-		then
-			upstream=$reflog
-		fi
-		break
-	fi
-done
+test -n "$upstream_name" &&
+upstream=$(git merge-base $orig_head $(git rev-list -g $upstream_name
2>/dev/null))

 # Now we are rebasing commits $upstream..$orig_head (or with --root,
 # everything leading up to $orig_head) on top of $onto
diff --git i/t/t3408-rebase-multi-line.sh w/t/t3408-rebase-multi-line.sh
index 6b84e60..bee4494 100755
--- i/t/t3408-rebase-multi-line.sh
+++ w/t/t3408-rebase-multi-line.sh
@@ -10,7 +10,12 @@ test_expect_success setup '
 	git add file &&
 	test_tick &&
 	git commit -m initial &&
+	>elif &&
+	git add elif &&
+	test_tick &&
+	git commit -m second &&

+	git checkout -b side HEAD^
 	echo hello >file &&
 	test_tick &&
 	git commit -a -m "A sample commit log message that has a long
@@ -18,13 +23,7 @@ summary that spills over multiple lines.

 But otherwise with a sane description." &&

-	git branch side &&
-
-	git reset --hard HEAD^ &&
-	>elif &&
-	git add elif &&
-	test_tick &&
-	git commit -m second
+	git checkout master

 '


It passes the "git pull --rebase" test and the basic "git rebase branch"
tests, but it fails basically with two type of tests: 1) those involving "git
rebase -i" (I'll try to debug those but I find it difficult to debug all those
FAKE_LINES), and those with "bad" reflogs as shown in the above patch to
t3408 (see the next paragraph).

Trying to find the counterexample (and debugging the failing test
above) I've found one corner we don't handle (neither in git-pull.sh
nor in git-rebase.sh). It is the case when the upstream branch is
"fast-backwards" into an older commit without extra commits on top.
Something like this:

x---y----u@{1}---u@{2}---b
          \
           .---u@{0}

In this case the algorithm picks u@{1} instead of u@{2} (the
alternative algorithm has the same problem when u@{n} and u@{n+1} are
in different exponential phases.

Or the simple case in:

u@{2}---u@{0}
 \
  .---u@{1}=b

I think this is a very rare corner case as the upstream branch has to
be "fast-backward",  and you have to fetch this state. So far nobody
has found it, at least.

> Bisection phase:
> 1. candidates={ u@{3} u@{4} }
>   merge-base b $candidates -> u@{3}, in $candidates
> 2. candidates={ u@{3} }
>   merge-base b $candidates -> u@{3}, in $candidates, done
>
>
> It works for the few cases I have thought of, but it may break in
> other other cases. I just read about the virtual merge commits, so I'm
> not sure I understand correctly how that works eiter.

Me too.

HTH,
Santi

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-16 12:10     ` Santi Béjar
@ 2011-02-16 13:22       ` Santi Béjar
  2011-02-16 19:07         ` Junio C Hamano
  2011-02-16 16:45       ` Martin von Zweigbergk
  1 sibling, 1 reply; 19+ messages in thread
From: Santi Béjar @ 2011-02-16 13:22 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Junio C Hamano, git, Thomas Rast

[-- Attachment #1: Type: text/plain, Size: 3096 bytes --]

On Wed, Feb 16, 2011 at 1:10 PM, Santi Béjar <santi@agolina.net> wrote:
> On Wed, Feb 16, 2011 at 3:03 AM, Martin von Zweigbergk
> <martin.von.zweigbergk@gmail.com> wrote:
>> On Tue, 15 Feb 2011, Junio C Hamano wrote:
>>
>>> Would bisection help to avoid the cost?
>>
>> I don't think the straight-forward use of bisection would work. If the
>> history looks something like below, where 'b' is the branch to rebase
>> and 'u' is the upstream, we have to go through each entry in the
>> reflog to find u@{3}.
>>
>>
>>        .-u@{0}
>>       /
>>      .---u@{1}
>>     /
>> x---y-----u@{2}
>>     \
>>      .---u@{3}---b
>>       \
>>        .-u@{4}
>>
>>
>> I have an idea inspired by bisection, Thomas's exponential stride, and
>> what someone (you?) mentioned the other day about virtual merge
>> commits. I haven't tried it out, but let me know what you think. I'll
>> try to explain it using an example only:
>>
>> Exponential stride phase:
>> 1. candidates={ u@{0} }
>>   merge-base b $candidates -> y, _not_ in $candidates
>> 2. candidates={ u@{1} u@{2} }
>>   merge-base b $candidates -> y, _not_ in $candidates
>> 3. candidates={ u@{3} u@{4} u@{5} u@{6} }
>>   merge-base b $candidates -> u@{3}, in $candidates
>
> Doesn't it indicate that u@{3} is the commit we are looking for? I
> haven't found a counterexample...
>
> If this is true the following patch can implement it for git-pull.sh and
> git-rebase.sh (sorry if it is space damaged):
>
> diff --git i/git-pull.sh w/git-pull.sh
> index 2cdea26..09ef0a9 100755
> --- i/git-pull.sh
> +++ w/git-pull.sh
> @@ -189,14 +189,7 @@ test true = "$rebase" && {
>        . git-parse-remote &&
>        remoteref="$(get_remote_merge_branch "$@" 2>/dev/null)" &&
>        oldremoteref="$(git rev-parse -q --verify "$remoteref")" &&
> -       for reflog in $(git rev-list -g $remoteref 2>/dev/null)
> -       do
> -               if test "$reflog" = "$(git merge-base $reflog $curr_branch)"
> -               then
> -                       oldremoteref="$reflog"
> -                       break
> -               fi
> -       done
> +       oldremoteref=$(git merge-base $curr_branch $oldremoteref $(git
> rev-list -g $remoteref 2>/dev/null))

One thing I forgot to say is that it seems to perform quite well:

Hot cache:

$ git rev-list -g origin/next | wc -l
36
$ git rev-list -g origin/master | wc -l
52
$ time git merge-base $(git rev-list -g origin/next)
3b781df0a25d5ba23bd2603b0e3e9bb4731369df

real	0m0.155s
user	0m0.064s
sys	0m0.044s
$ time git merge-base $(git rev-list -g origin/master)
7811d9600f02e70c9f835719c71156c967a684f7

real	0m0.161s
user	0m0.064s
sys	0m0.040s
$ time git merge-base $(git rev-list -g origin/master origin/master)
7811d9600f02e70c9f835719c71156c967a684f7

real	0m0.175s
user	0m0.076s
sys	0m0.036s

Cold-cache around 1 second. And it's not linear with the number of
reflog entries, but with the number of independent branches, I
suppose.

HTH,
Santi

P.D.: Attached is the patch, in case someone wants to try it.

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 2185 bytes --]

diff --git i/git-pull.sh w/git-pull.sh
index 2cdea26..09ef0a9 100755
--- i/git-pull.sh
+++ w/git-pull.sh
@@ -189,14 +189,7 @@ test true = "$rebase" && {
 	. git-parse-remote &&
 	remoteref="$(get_remote_merge_branch "$@" 2>/dev/null)" &&
 	oldremoteref="$(git rev-parse -q --verify "$remoteref")" &&
-	for reflog in $(git rev-list -g $remoteref 2>/dev/null)
-	do
-		if test "$reflog" = "$(git merge-base $reflog $curr_branch)"
-		then
-			oldremoteref="$reflog"
-			break
-		fi
-	done
+	oldremoteref=$(git merge-base $curr_branch $oldremoteref $(git rev-list -g $remoteref 2>/dev/null))
 }
 orig_head=$(git rev-parse -q --verify HEAD)
 git fetch $verbosity $progress $dry_run $recurse_submodules --update-head-ok "$@" || exit 1
diff --git i/git-rebase.sh w/git-rebase.sh
index 0d245fe..4b3e131 100755
--- i/git-rebase.sh
+++ w/git-rebase.sh
@@ -448,18 +448,8 @@ esac
 
 require_clean_work_tree "rebase" "Please commit or stash them."
 
-test -n "$upstream_name" && for reflog in \
-	$(git rev-list -g $upstream_name 2>/dev/null)
-do
-	if test $reflog = $(git merge-base $reflog $orig_head)
-	then
-		if test $reflog != $(git merge-base $onto $reflog)
-		then
-			upstream=$reflog
-		fi
-		break
-	fi
-done
+test -n "$upstream_name" &&
+upstream=$(git merge-base $orig_head $(git rev-list -g $upstream_name 2>/dev/null))
 
 # Now we are rebasing commits $upstream..$orig_head (or with --root,
 # everything leading up to $orig_head) on top of $onto
diff --git i/t/t3408-rebase-multi-line.sh w/t/t3408-rebase-multi-line.sh
index 6b84e60..bee4494 100755
--- i/t/t3408-rebase-multi-line.sh
+++ w/t/t3408-rebase-multi-line.sh
@@ -10,7 +10,12 @@ test_expect_success setup '
 	git add file &&
 	test_tick &&
 	git commit -m initial &&
+	>elif &&
+	git add elif &&
+	test_tick &&
+	git commit -m second &&
 
+	git checkout -b side HEAD^
 	echo hello >file &&
 	test_tick &&
 	git commit -a -m "A sample commit log message that has a long
@@ -18,13 +23,7 @@ summary that spills over multiple lines.
 
 But otherwise with a sane description." &&
 
-	git branch side &&
-
-	git reset --hard HEAD^ &&
-	>elif &&
-	git add elif &&
-	test_tick &&
-	git commit -m second
+	git checkout master
 
 '
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-16 13:22       ` Santi Béjar
@ 2011-02-16 19:07         ` Junio C Hamano
  2011-02-16 21:16           ` Santi Béjar
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2011-02-16 19:07 UTC (permalink / raw)
  To: Santi Béjar; +Cc: Martin von Zweigbergk, git, Thomas Rast

Santi Béjar <santi@agolina.net> writes:

>> +    oldremoteref=$(git merge-base $curr_branch $oldremoteref $(git
>> rev-list -g $remoteref 2>/dev/null))

Yuck; the entire set of commits that appear in reflog can be quite long.
What will happen when this exceeds the shell command line limit or when
you get E2BIG from execve(2)?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-16 19:07         ` Junio C Hamano
@ 2011-02-16 21:16           ` Santi Béjar
  0 siblings, 0 replies; 19+ messages in thread
From: Santi Béjar @ 2011-02-16 21:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, git, Thomas Rast

On Wed, Feb 16, 2011 at 8:07 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Santi Béjar <santi@agolina.net> writes:
>
>>> +    oldremoteref=$(git merge-base $curr_branch $oldremoteref $(git
>>> rev-list -g $remoteref 2>/dev/null))
>
> Yuck; the entire set of commits that appear in reflog can be quite long.
> What will happen when this exceeds the shell command line limit or when
> you get E2BIG from execve(2)?
>

Sure, but it was just a proof of concept to test the algorithm.

Santi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-16 12:10     ` Santi Béjar
  2011-02-16 13:22       ` Santi Béjar
@ 2011-02-16 16:45       ` Martin von Zweigbergk
  2011-02-17 10:24         ` Santi Béjar
  1 sibling, 1 reply; 19+ messages in thread
From: Martin von Zweigbergk @ 2011-02-16 16:45 UTC (permalink / raw)
  To: Santi Béjar; +Cc: Martin von Zweigbergk, Junio C Hamano, git, Thomas Rast

On Wed, 16 Feb 2011, Santi B?jar wrote:

> On Wed, Feb 16, 2011 at 3:03 AM, Martin von Zweigbergk
> <martin.von.zweigbergk@gmail.com> wrote:
> > On Tue, 15 Feb 2011, Junio C Hamano wrote:
> >
> >> Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> writes:
> >>
> >> > diff --git a/git-rebase.sh b/git-rebase.sh
> >> > index 5abfeac..1bc0c29 100755
> >> > --- a/git-rebase.sh
> >> > +++ b/git-rebase.sh
> >>       test -n "$upstream_name" &&
> >>         for reflog in $(git rev-list ...)
> >>         do
> >>               ...
> >>       done
> >>
> >> Don't you need to make sure $upstream_name is a branch (or a ref in
> >> general that can have a reflog), or does it not matter because the
> >> "rev-list -g" will die without producing anything and you are discarding
> >> the error message?
> >
> > Exactly as you suspect. Is it too ugly?
> 
> I also prefer Junio's version.

I fixed the test + for loop, if that's what you mean by "Junio's
version". Or did you mean "make sure $upstream_name is a branch"? I
could do that as well if you like. I have no preference.

> >        .-u@{0}
> >       /
> >      .---u@{1}
> >     /
> > x---y-----u@{2}
> >     \
> >      .---u@{3}---b
> >       \
> >        .-u@{4}
> >
> >
> > I have an idea inspired by bisection, Thomas's exponential stride, and
> > what someone (you?) mentioned the other day about virtual merge
> > commits. I haven't tried it out, but let me know what you think. I'll
> > try to explain it using an example only:
> >
> > Exponential stride phase:
> > 1. candidates={ u@{0} }
> >   merge-base b $candidates -> y, _not_ in $candidates
> > 2. candidates={ u@{1} u@{2} }
> >   merge-base b $candidates -> y, _not_ in $candidates
> > 3. candidates={ u@{3} u@{4} u@{5} u@{6} }
> >   merge-base b $candidates -> u@{3}, in $candidates
> 
> Doesn't it indicate that u@{3} is the commit we are looking for? I
> haven't found a counterexample...

Yes, of course. Stupid me ;-). Forget about the other half. (I think
that's what I did manually to match the sha1 back to the ref name, but
that is of course complete non-sense to do in the script.)

> If this is true the following patch can implement it for git-pull.sh and
> git-rebase.sh (sorry if it is space damaged):

Thanks! Will have a closer look at it later today. If I understand
correctly, you simply call merge-base with the _entire_ reflog. I
would have thought that would be slow, but it's great if that is fast
enough. The resulting code looks very nice and short. Thanks again.


/Martin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-16 16:45       ` Martin von Zweigbergk
@ 2011-02-17 10:24         ` Santi Béjar
  2011-03-12 21:15           ` Martin von Zweigbergk
  0 siblings, 1 reply; 19+ messages in thread
From: Santi Béjar @ 2011-02-17 10:24 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Junio C Hamano, git, Thomas Rast

On Wed, Feb 16, 2011 at 5:45 PM, Martin von Zweigbergk
<martin.von.zweigbergk@gmail.com> wrote:
> On Wed, 16 Feb 2011, Santi B?jar wrote:
>
>> On Wed, Feb 16, 2011 at 3:03 AM, Martin von Zweigbergk
>> <martin.von.zweigbergk@gmail.com> wrote:
>> > On Tue, 15 Feb 2011, Junio C Hamano wrote:
>> >
>> >> Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> writes:
>> >>
>> >> > diff --git a/git-rebase.sh b/git-rebase.sh
>> >> > index 5abfeac..1bc0c29 100755
>> >> > --- a/git-rebase.sh
>> >> > +++ b/git-rebase.sh
>> >>       test -n "$upstream_name" &&
>> >>         for reflog in $(git rev-list ...)
>> >>         do
>> >>               ...
>> >>       done
>> >>
>> >> Don't you need to make sure $upstream_name is a branch (or a ref in
>> >> general that can have a reflog), or does it not matter because the
>> >> "rev-list -g" will die without producing anything and you are discarding
>> >> the error message?
>> >
>> > Exactly as you suspect. Is it too ugly?
>>
>> I also prefer Junio's version.
>
> I fixed the test + for loop, if that's what you mean by "Junio's
> version". Or did you mean "make sure $upstream_name is a branch"? I
> could do that as well if you like. I have no preference.

I meant the test + loop, but I would also "make sure $upstream_name is
a branch", as done in git-pull.sh with:

git rev-parse -q --verify "$remoteref"

>
>> >        .-u@{0}
>> >       /
>> >      .---u@{1}
>> >     /
>> > x---y-----u@{2}
>> >     \
>> >      .---u@{3}---b
>> >       \
>> >        .-u@{4}
>> >
>> >
>> > I have an idea inspired by bisection, Thomas's exponential stride, and
>> > what someone (you?) mentioned the other day about virtual merge
>> > commits. I haven't tried it out, but let me know what you think. I'll
>> > try to explain it using an example only:
>> >
>> > Exponential stride phase:
>> > 1. candidates={ u@{0} }
>> >   merge-base b $candidates -> y, _not_ in $candidates
>> > 2. candidates={ u@{1} u@{2} }
>> >   merge-base b $candidates -> y, _not_ in $candidates
>> > 3. candidates={ u@{3} u@{4} u@{5} u@{6} }
>> >   merge-base b $candidates -> u@{3}, in $candidates
>>
>> Doesn't it indicate that u@{3} is the commit we are looking for? I
>> haven't found a counterexample...
>
> Yes, of course. Stupid me ;-). Forget about the other half. (I think
> that's what I did manually to match the sha1 back to the ref name, but
> that is of course complete non-sense to do in the script.)
>
>> If this is true the following patch can implement it for git-pull.sh and
>> git-rebase.sh (sorry if it is space damaged):
>
> Thanks! Will have a closer look at it later today. If I understand
> correctly, you simply call merge-base with the _entire_ reflog. I

Yes, that is the idea (plus the old remote hash in case of git-pull)

> would have thought that would be slow, but it's great if that is fast
> enough.

Yes, I think it is fast enough in the normal case. Even feeding the
entire git.git's master, ~25000 revisions, it takes around 2-4 seconds
only:

$ git rev-list origin/master | wc -l
24380
$ time git merge-base $(git rev-list origin/master)
9971d6d52c5afeb8ba60ae6ddcffb34af23eeadd

real	0m4.014s
user	0m1.520s
sys	0m2.284s

(2.5GHz CPU)

But, as Junio showed, it has problems when the reflog lenght is too
large. Maybe git-merge-base can learn the --stdin flag, or we could
process the reflog in batches of 1000 (?) entries, ... but the nice
property of using the entire reflog is that the output is what you are
looking for, if you take the first 1000 entries you have to check if
the output is one of these entries.

> The resulting code looks very nice and short. Thanks again.

No, thanks to you for the nice idea!

HTH,
Santi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-02-17 10:24         ` Santi Béjar
@ 2011-03-12 21:15           ` Martin von Zweigbergk
  2011-03-12 23:51             ` Santi Béjar
  2011-03-13 23:09             ` Santi Béjar
  0 siblings, 2 replies; 19+ messages in thread
From: Martin von Zweigbergk @ 2011-03-12 21:15 UTC (permalink / raw)
  To: Santi Béjar; +Cc: Martin von Zweigbergk, Junio C Hamano, git, Thomas Rast

On Thu, 17 Feb 2011, Santi B?jar wrote:

> On Wed, Feb 16, 2011 at 5:45 PM, Martin von Zweigbergk
> <martin.von.zweigbergk@gmail.com> wrote:
> > On Wed, 16 Feb 2011, Santi B?jar wrote:
> >
> >> On Wed, Feb 16, 2011 at 3:03 AM, Martin von Zweigbergk
> >> <martin.von.zweigbergk@gmail.com> wrote:
> >> >
> >> >        .-u@{0}
> >> >       /
> >> >      .---u@{1}
> >> >     /
> >> > x---y-----u@{2}
> >> >     \
> >> >      .---u@{3}---b
> >> >       \
> >> >        .-u@{4}
> >> >
> >> >
> >> > I have an idea inspired by bisection, Thomas's exponential stride, and
> >> > what someone (you?) mentioned the other day about virtual merge
> >> > commits. I haven't tried it out, but let me know what you think. I'll
> >> > try to explain it using an example only:
> >> >
> >> > Exponential stride phase:
> >> > 1. candidates={ u@{0} }
> >> >   merge-base b $candidates -> y, _not_ in $candidates
> >> > 2. candidates={ u@{1} u@{2} }
> >> >   merge-base b $candidates -> y, _not_ in $candidates
> >> > 3. candidates={ u@{3} u@{4} u@{5} u@{6} }
> >> >   merge-base b $candidates -> u@{3}, in $candidates
> >>
> >> Doesn't it indicate that u@{3} is the commit we are looking for? I
> >> haven't found a counterexample...
> >
> > Yes, of course. Stupid me ;-). Forget about the other half. (I think
> > that's what I did manually to match the sha1 back to the ref name, but
> > that is of course complete non-sense to do in the script.)
> >
> >> If this is true the following patch can implement it for git-pull.sh and
> >> git-rebase.sh (sorry if it is space damaged):
> >
> > Thanks! Will have a closer look at it later today. If I understand
> > correctly, you simply call merge-base with the _entire_ reflog. I
> 
> Yes, that is the idea (plus the old remote hash in case of git-pull)
> 
> > would have thought that would be slow, but it's great if that is fast
> > enough.
> 
> Yes, I think it is fast enough in the normal case. Even feeding the
> entire git.git's master, ~25000 revisions, it takes around 2-4 seconds
> only:
> 
> $ git rev-list origin/master | wc -l
> 24380
> $ time git merge-base $(git rev-list origin/master)
> 9971d6d52c5afeb8ba60ae6ddcffb34af23eeadd
> 
> real	0m4.014s
> user	0m1.520s
> sys	0m2.284s
> 
> (2.5GHz CPU)

I finally got around to doing some tests on this myself. I used
git.git as of mid Feb, which at that time had 10010 commits in master,
following only the first parent. I took the first 563 commits from the
todo branch and transplanted onto master~10000 (there were some
conflicts after about 563 commits and I figured that would be enough
anyway). I then rebased the resulting branch (let's call it 'u')
against master~9990, then against master~9980 and so on to get a
reflog with 1001 entries for u. I then created another branch 'b'
based on u@{10}, u@{100} and @{1000}, for different runs of the
tests. I created one additional commit on b in each case. I then
rebased b with master, using the following algorithms to find the base
to rebase from:

 manual: simply calling 'git rebase --onto u b~1'

 linear: same algorithm as in 'git pull', which linearly walks the
 reflog until a commit that b contains is found

 merge-base: the base will be calculated as 'git merge-base b $(git
 ref-list -g u)'

 exponential: like merge-base, but start with only u@{0}, then
 {u@{1},u@{2}} and so on until a commit that b contains is found

These are the results:

                 u@{10}     u@{100}    u@{1000}
manual         0m0.535s    0m1.164s    0m1.415s
linear         0m1.245s   0m37.367s   5m10.068s
merge-base    0m14.490s   0m15.409s   0m15.508s
exponential    0m1.056s    0m6.175s   0m27.221s

(1.8 GHz Athlon 64).

This clearly shows that the linear algorithm from git pull is not good
enough when rebasing older branches (i.e. branches whose upstream has
many reflog entries created after the branch itself was created).

The time it takes the "merge-base" algorithm is quite independent on
how old the branch is, but with this quite long and branchy reflog
(but not too dissimilar from git.git's pu?), it takes quite a while to
calculate it. I think this is also too slow to be acceptable as a
default.

I would personnally be happy if the "exponential" algorithm was used
by git rebase default. I suppose not everyone would agree that the
convenience outweighs the performance cost, though. OTOH, a slower
algorithm has been used in git pull for a long time and it seems like
not many people have really been bothered by that. Also see the
following paragraphs.

I also ran the same tests with an upstream branch that was never
force-updated. For these test cases, I created a reflog such that
u@{$i} = master~$((10 * $i)). Since the upstream branch was know never
to have been force-updated in this case, the "manual" test case was
simply 'git rebase u'. These are the results:

                 u@{10}     u@{100}    u@{1000}
manual         0m0.885s    0m6.126s   0m52.248s
linear         0m1.349s   0m39.688s   5m28.753s
merge-base     0m1.160s    0m1.699s    0m1.901s
exponential    0m0.769s    0m4.342s    0m7.360s

Not surprisingly, the linear algorithm is slow in these cases as well.

What's more interesting here is that the last two algorithms are
actually faster than the plain 'git rebase u'. This is caused by
--ignore-if-in-upstream flag to format-patch. Since the other three
algorithms try to figure out what the base was and pass the range from
the guessed base to the branch (e.g. u@{100}..b) to format-patch, the
--ignore-if-in-upstream to that command effectively becomes a no-op.

Although this makes rebase faster in the case of a non-force-updated
upstream, it may also be a problem in some cases. This was something
that I had not thought about until I started timing the calls. One
reason I can think of when the --ignore-if-in-upstream is useful is
when the upstream branch has been rebased, but this is exactly the
case when guessing the old base is useful and solves the problem in a
better way anyway. However, if a commit on the upstream branch was
cherry-picked from some commit on the current branch above its base
(i.e. in u@{x}..b), then that would not be detected by
--ignore-if-in-upstream and could result in unnecessary merge
conflicts. I don't know how common this case is.

The above also applies to 'git pull', of course, but the ways of
getting identical patches in upstream are probably different (more
likely by 'git am' than 'git cherry-pick' perhaps).

I think this is a useful feature. I'm just not sure how to balance the
performance vs convenience. Worst case, this could probably become a
command line option and configuration. I guess 'git pull' should use
the same algorithm. If we decide to use configation, maybe git-pull's
default would need to be different to be backward compatible.

Any thoughts?

> But, as Junio showed, it has problems when the reflog lenght is too
> large. Maybe git-merge-base can learn the --stdin flag, or we could
> process the reflog in batches of 1000 (?) entries, ... but the nice
> property of using the entire reflog is that the output is what you are
> looking for, if you take the first 1000 entries you have to check if
> the output is one of these entries.

Since I think the exponential algorithm seems the best choice, we
could probably just limit it to a certain number of entries, but maybe
it's better to implement a --stdin flag to merge-base. It could be
useful for others too.

/Martin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-03-12 21:15           ` Martin von Zweigbergk
@ 2011-03-12 23:51             ` Santi Béjar
  2011-03-13  1:32               ` Martin von Zweigbergk
  2011-03-13 23:09             ` Santi Béjar
  1 sibling, 1 reply; 19+ messages in thread
From: Santi Béjar @ 2011-03-12 23:51 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Junio C Hamano, git, Thomas Rast

Thanks for pushing this further.

I'll read it all carefully later, but let me just comment one thing.

On Sat, Mar 12, 2011 at 10:15 PM, Martin von Zweigbergk
<martin.von.zweigbergk@gmail.com> wrote:
> On Thu, 17 Feb 2011, Santi B?jar wrote:
>
>> On Wed, Feb 16, 2011 at 5:45 PM, Martin von Zweigbergk
>> <martin.von.zweigbergk@gmail.com> wrote:
>> > On Wed, 16 Feb 2011, Santi B?jar wrote:
>> >
>> >> On Wed, Feb 16, 2011 at 3:03 AM, Martin von Zweigbergk
>> >> <martin.von.zweigbergk@gmail.com> wrote:
>> >> >
>> >> >        .-u@{0}
>> >> >       /
>> >> >      .---u@{1}
>> >> >     /
>> >> > x---y-----u@{2}
>> >> >     \
>> >> >      .---u@{3}---b
>> >> >       \
>> >> >        .-u@{4}
>> >> >
>> >> >
>> >> > I have an idea inspired by bisection, Thomas's exponential stride, and
>> >> > what someone (you?) mentioned the other day about virtual merge
>> >> > commits. I haven't tried it out, but let me know what you think. I'll
>> >> > try to explain it using an example only:
>> >> >
>> >> > Exponential stride phase:
>> >> > 1. candidates={ u@{0} }
>> >> >   merge-base b $candidates -> y, _not_ in $candidates
>> >> > 2. candidates={ u@{1} u@{2} }
>> >> >   merge-base b $candidates -> y, _not_ in $candidates
>> >> > 3. candidates={ u@{3} u@{4} u@{5} u@{6} }
>> >> >   merge-base b $candidates -> u@{3}, in $candidates
>> >>
>> >> Doesn't it indicate that u@{3} is the commit we are looking for? I
>> >> haven't found a counterexample...
>> >
>> > Yes, of course. Stupid me ;-). Forget about the other half. (I think
>> > that's what I did manually to match the sha1 back to the ref name, but
>> > that is of course complete non-sense to do in the script.)
>> >
>> >> If this is true the following patch can implement it for git-pull.sh and
>> >> git-rebase.sh (sorry if it is space damaged):
>> >
>> > Thanks! Will have a closer look at it later today. If I understand
>> > correctly, you simply call merge-base with the _entire_ reflog. I
>>
>> Yes, that is the idea (plus the old remote hash in case of git-pull)
>>
>> > would have thought that would be slow, but it's great if that is fast
>> > enough.
>>
>> Yes, I think it is fast enough in the normal case. Even feeding the
>> entire git.git's master, ~25000 revisions, it takes around 2-4 seconds
>> only:
>>
>> $ git rev-list origin/master | wc -l
>> 24380
>> $ time git merge-base $(git rev-list origin/master)
>> 9971d6d52c5afeb8ba60ae6ddcffb34af23eeadd
>>
>> real  0m4.014s
>> user  0m1.520s
>> sys   0m2.284s
>>
>> (2.5GHz CPU)
>
>
> I finally got around to doing some tests on this myself. I used
> git.git as of mid Feb, which at that time had 10010 commits in master,
> following only the first parent. I took the first 563 commits from the
> todo branch and transplanted onto master~10000 (there were some
> conflicts after about 563 commits and I figured that would be enough
> anyway). I then rebased the resulting branch (let's call it 'u')
> against master~9990, then against master~9980 and so on to get a
> reflog with 1001 entries for u. I then created another branch 'b'
> based on u@{10}, u@{100} and @{1000}, for different runs of the
> tests. I created one additional commit on b in each case. I then
> rebased b with master, using the following algorithms to find the base
> to rebase from:
>
>  manual: simply calling 'git rebase --onto u b~1'
>
>  linear: same algorithm as in 'git pull', which linearly walks the
>  reflog until a commit that b contains is found
>
>  merge-base: the base will be calculated as 'git merge-base b $(git
>  ref-list -g u)'
>
>  exponential: like merge-base, but start with only u@{0}, then
>  {u@{1},u@{2}} and so on until a commit that b contains is found
>

First, care to share the scripts/patches for the timings? Thanks.

Could you test also variants of the exponential strategy?

exponential(n,m): like merge-base, but start with n candidates {u@{0},
..., u@{n-1}}, then n*m candidates and so on until a commit that b
contains is found.

Your exponential would be exponential(1,2).

Timings for something like exponential(10,2) or exponential(10,10),
maybe others.

Thanks,
Santi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-03-12 23:51             ` Santi Béjar
@ 2011-03-13  1:32               ` Martin von Zweigbergk
  2011-03-13  3:14                 ` Martin von Zweigbergk
  0 siblings, 1 reply; 19+ messages in thread
From: Martin von Zweigbergk @ 2011-03-13  1:32 UTC (permalink / raw)
  To: Santi Béjar; +Cc: Martin von Zweigbergk, Junio C Hamano, git, Thomas Rast

On Sun, 13 Mar 2011, Santi B?jar wrote:

> First, care to share the scripts/patches for the timings? Thanks.

Sure, see end of mail for the changes to git-rebase.sh. It applies on
top of the patch that started this thread. There are some minor
differences from what I used when I ran the tests, but nothing that
should impact the timings.

To run the tests, I just did modified versions of

git reset --hard u@{100}
touch foo && git add foo && git ci -m foo
time git rebase -n --guess-base=merge u

I don't have a script that creates the initial setup, but that should
be easy enough to do. I should warn you that it took 16 hours to run
it on my 6 year old computer :-).


> Could you test also variants of the exponential strategy?

I guess I could :-). Will see if I get time for that later today.

> exponential(n,m): like merge-base, but start with n candidates {u@{0},
> ..., u@{n-1}}, then n*m candidates and so on until a commit that b
> contains is found.
> 
> Your exponential would be exponential(1,2).
> 
> Timings for something like exponential(10,2) or exponential(10,10),
> maybe others.
> 
> Thanks,
> Santi
> 


-- 8< --

diff --git a/git-rebase.sh b/git-rebase.sh
index b50c91e..8a4efab 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -56,6 +56,7 @@ ignore-date!       passed to 'git am'
 whitespace=!       passed to 'git apply'
 ignore-whitespace! passed to 'git apply'
 C=!                passed to 'git apply'
+guess-base=!       to guess the base
  Actions:
 continue!          continue rebasing process
 abort!             abort rebasing process and restore original branch
@@ -87,6 +88,7 @@ git_am_opt=
 rebase_root=
 force_rebase=
 allow_rerere_autoupdate=
+guess_base=
 # Non-empty if a rebase was in progress when 'git rebase' was invoked
 in_progress=
 # One of {am, merge, interactive}
@@ -228,6 +230,10 @@ do
 	--no-autosquash)
 		autosquash=
 		;;
+	--guess-base)
+		shift
+		guess_base=$1
+		;;
 	-M|-m)
 		do_merge=t
 		;;
@@ -459,19 +465,51 @@ esac
 
 require_clean_work_tree "rebase" "Please commit or stash them."
 
-upstream_ref=$(git rev-parse -q --verify --symbolic-full-name \
-	"$upstream_name") && test -n "$upstream_ref" &&
-for reflog in $(git rev-list -g $upstream_name 2>/dev/null)
-do
-	if test $reflog = $(git merge-base $reflog $orig_head)
-	then
-		if test $reflog != $(git merge-base $onto $reflog)
-		then
-			upstream=$reflog
-		fi
-		break
-	fi
-done
+if test -n "$guess_base" && test -n "$(git rev-parse -q --verify \
+	--symbolic-full-name "$upstream_name")"
+then
+	case $guess_base in
+	linear)
+		for reflog in $(git rev-list -g $upstream_name 2>/dev/null)
+		do
+			if test $reflog = $(git merge-base $reflog $orig_head)
+			then
+				if test $reflog != $(git merge-base $onto $reflog)
+				then
+					upstream=$reflog
+				fi
+				break
+			fi
+		done
+		;;
+	merge)
+		upstream=$(git merge-base $orig_head $(git rev-list -g $upstream_name 2>/dev/null))
+		;;
+	exponential)
+		reflogs=$(git rev-list -g $upstream_name 2>/dev/null)
+		limit=$(echo "$reflogs" | wc -l)
+		lo=0
+		hi=1
+		while true
+		do
+			echo $lo - $hi
+			candidates=$(echo "$reflogs" | head -$hi | tail -$(($hi - $lo)))
+			reflog=$(git merge-base $orig_head $candidates)
+			if test -n "$(echo $candidates | grep $reflog)"
+			then
+				upstream=$reflog
+				break
+			fi
+			if test $hi -ge $limit
+			then
+				break
+			fi
+			lo=$hi
+			hi=$((2 * $lo))
+		done
+		;;
+	esac
+fi
 
 # Now we are rebasing commits $upstream..$orig_head (or with --root,
 # everything leading up to $orig_head) on top of $onto

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-03-13  1:32               ` Martin von Zweigbergk
@ 2011-03-13  3:14                 ` Martin von Zweigbergk
  2011-03-13 22:57                   ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Martin von Zweigbergk @ 2011-03-13  3:14 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Santi Béjar, Junio C Hamano, git, Thomas Rast

On Sat, 12 Mar 2011, Martin von Zweigbergk wrote:

> On Sun, 13 Mar 2011, Santi B?jar wrote:
> 
> > Could you test also variants of the exponential strategy?
> 
> I guess I could :-). Will see if I get time for that later today.

So here are the updated figures for the force-updated history
(pu-like):

                 u@{10}     u@{100}    u@{1000}
manual         0m0.535s    0m1.164s    0m1.415s
linear         0m1.245s   0m37.367s   5m10.068s
merge-base    0m14.490s   0m15.409s   0m15.508s
exp(1,2)       0m1.056s    0m6.175s   0m27.221s
exp(10,10)     0m1.950s   0m20.031s   0m18.215s
exp(7,7)       0m1.310s    0m6.851s   0m16.757s

and for the non-force-updated history (master-like):

                 u@{10}     u@{100}    u@{1000}
manual         0m0.885s    0m6.126s   0m52.248s
linear         0m1.349s   0m39.688s   5m28.753s
merge-base     0m1.160s    0m1.699s    0m1.901s
exp(1,2)       0m0.769s    0m4.342s    0m7.360s
exp(10,10)     0m0.700s    0m2.535s    0m3.110s
exp(7,7)       0m0.653s    0m2.332s    0m3.506s


exp(10,10) is worst possible for the test cases I picked, since the
wanted reflog entry is always the first one in an interval, so almost
10 times as many entries as necessary are considered. I therefore also
tried with exp(7,7) to get more fair figures.


/Martin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-03-13  3:14                 ` Martin von Zweigbergk
@ 2011-03-13 22:57                   ` Junio C Hamano
  2011-03-13 23:42                     ` Santi Béjar
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2011-03-13 22:57 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Santi Béjar, git, Thomas Rast

Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> writes:

> So here are the updated figures for the force-updated history
> (pu-like):
>
>                  u@{10}     u@{100}    u@{1000}
> manual         0m0.535s    0m1.164s    0m1.415s
> linear         0m1.245s   0m37.367s   5m10.068s
> merge-base    0m14.490s   0m15.409s   0m15.508s
> exp(1,2)       0m1.056s    0m6.175s   0m27.221s
> exp(10,10)     0m1.950s   0m20.031s   0m18.215s
> exp(7,7)       0m1.310s    0m6.851s   0m16.757s
>
> and for the non-force-updated history (master-like):
>
>                  u@{10}     u@{100}    u@{1000}
> manual         0m0.885s    0m6.126s   0m52.248s
> linear         0m1.349s   0m39.688s   5m28.753s
> merge-base     0m1.160s    0m1.699s    0m1.901s
> exp(1,2)       0m0.769s    0m4.342s    0m7.360s
> exp(10,10)     0m0.700s    0m2.535s    0m3.110s
> exp(7,7)       0m0.653s    0m2.332s    0m3.506s

I have a suspicion that merge-base shouldn't be taken as "one of the
candidates among others".

It is merely a technique to check multiple heads simultaneously and
cheaply, instead of checking things one by one, and should be applicable
regardless of these "multiple heads" come from.  It may come from a linear
walk of reflog, or it may come from a leaky exponential or fibonacci walk.

Instead of "linear" that checks "Is the tip good?  Is the tip@{1} good?
Is the tip@{2} good? How about tip@{3}?" repeatedly, we can check "is the
tip through tip@{N} good?" with a single invocation using the merge-base
technique.

Similarly if your exp(i,j) checks "Is the tip good? tip@{1}? tip@{2}?
tip@{4}?  tip@{8}? ..."  iteratively, you can grab these commit object
names out of reflog and still use the merge-base optimization, effectively
making "one at a time" check into "N at a time" check (perhaps checking a
dozen or two at a time).

If you generate list of reglog entries way beyond what matters to the end
result, and feed all of them to the machinery, it is not surprising that
it would spend more time than linear that can stop early upon the first
success.

We should optimize for common case by picking the default that performs
well for history that is never rewound.  If the the default algorithm on
pathological histories performs badly, it is Ok to have an alternative
heuristics that is only triggered on such a history, but if we are going
to do so, we need to make sure that we can cheaply detect if we need to
use the alternative heuristics in the first place.

Is it easy to tell in an earlier phase if the history is "force-updated"
or not?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-03-13 22:57                   ` Junio C Hamano
@ 2011-03-13 23:42                     ` Santi Béjar
  0 siblings, 0 replies; 19+ messages in thread
From: Santi Béjar @ 2011-03-13 23:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, git, Thomas Rast

On Sun, Mar 13, 2011 at 11:57 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> writes:
>
>> So here are the updated figures for the force-updated history
>> (pu-like):
>>
>>                  u@{10}     u@{100}    u@{1000}
>> manual         0m0.535s    0m1.164s    0m1.415s
>> linear         0m1.245s   0m37.367s   5m10.068s
>> merge-base    0m14.490s   0m15.409s   0m15.508s
>> exp(1,2)       0m1.056s    0m6.175s   0m27.221s
>> exp(10,10)     0m1.950s   0m20.031s   0m18.215s
>> exp(7,7)       0m1.310s    0m6.851s   0m16.757s
>>
>> and for the non-force-updated history (master-like):
>>
>>                  u@{10}     u@{100}    u@{1000}
>> manual         0m0.885s    0m6.126s   0m52.248s
>> linear         0m1.349s   0m39.688s   5m28.753s
>> merge-base     0m1.160s    0m1.699s    0m1.901s
>> exp(1,2)       0m0.769s    0m4.342s    0m7.360s
>> exp(10,10)     0m0.700s    0m2.535s    0m3.110s
>> exp(7,7)       0m0.653s    0m2.332s    0m3.506s
>
> I have a suspicion that merge-base shouldn't be taken as "one of the
> candidates among others".
>
> It is merely a technique to check multiple heads simultaneously and
> cheaply, instead of checking things one by one, and should be applicable
> regardless of these "multiple heads" come from.  It may come from a linear
> walk of reflog, or it may come from a leaky exponential or fibonacci walk.

Here the merge-base strategy mean take all the reflog at once. And in this
case the reflog is 1000 entries long, it is only relevant to the u@{1000}
case. For me it is more like: take the solution and check that indeed it is
the solution. In this sense it is the "minimum" time required to find the
solution.

>
> Instead of "linear" that checks "Is the tip good?  Is the tip@{1} good?
> Is the tip@{2} good? How about tip@{3}?" repeatedly, we can check "is the
> tip through tip@{N} good?" with a single invocation using the merge-base
> technique.

exp(n,m) is similar to this. But, yes, we could have a merge-base(N)
strategy which checks using the merge-base technique the first N reflog
entries, then the next N entries, and so on. But I think it would scale worst
then the exponential strategy.

>
> Similarly if your exp(i,j) checks "Is the tip good? tip@{1}? tip@{2}?
> tip@{4}?  tip@{8}? ..."  iteratively, you can grab these commit object
> names out of reflog and still use the merge-base optimization, effectively
> making "one at a time" check into "N at a time" check (perhaps checking a
> dozen or two at a time).

Indeed it is "Is the tip good? tip@{1},tip@{2}? tip@{4},tip{5},...,tip@{8}?
... and the merge-base techinque is used to answer each question.

>
> If you generate list of reglog entries way beyond what matters to the end
> result, and feed all of them to the machinery, it is not surprising that
> it would spend more time than linear that can stop early upon the first
> success.

But if the answer is the last entry in the reflog (as in the u@{1000} case) it
is a lower limit, see above.

>
> We should optimize for common case by picking the default that performs
> well for history that is never rewound.  If the the default algorithm on
> pathological histories performs badly, it is Ok to have an alternative
> heuristics that is only triggered on such a history, but if we are going
> to do so, we need to make sure that we can cheaply detect if we need to
> use the alternative heuristics in the first place.

Maybe I'm wrong, but from the number I see that the exp strategy this
"optimize for the common case and works reasonably work for pathological
histories".

>
> Is it easy to tell in an earlier phase if the history is "force-updated"
> or not?
>

I'm afraid this is a similar question.

HTH,
Santi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rebase: be cleverer with rebased upstream branches
  2011-03-12 21:15           ` Martin von Zweigbergk
  2011-03-12 23:51             ` Santi Béjar
@ 2011-03-13 23:09             ` Santi Béjar
  1 sibling, 0 replies; 19+ messages in thread
From: Santi Béjar @ 2011-03-13 23:09 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Junio C Hamano, git, Thomas Rast

[Also quoted text from On Sun, Mar 13, 2011 at 4:14 AM, Martin von
Zweigbergk <martin.von.zweigbergk@gmail.com>]

On Sat, Mar 12, 2011 at 10:15 PM, Martin von Zweigbergk
<martin.von.zweigbergk@gmail.com> wrote:
> On Thu, 17 Feb 2011, Santi B?jar wrote:
>
>> On Wed, Feb 16, 2011 at 5:45 PM, Martin von Zweigbergk
>> <martin.von.zweigbergk@gmail.com> wrote:
>> > On Wed, 16 Feb 2011, Santi B?jar wrote:
>> >
>> >> On Wed, Feb 16, 2011 at 3:03 AM, Martin von Zweigbergk
>> >> <martin.von.zweigbergk@gmail.com> wrote:
>> >> >
>> >> >        .-u@{0}
>> >> >       /
>> >> >      .---u@{1}
>> >> >     /
>> >> > x---y-----u@{2}
>> >> >     \
>> >> >      .---u@{3}---b
>> >> >       \
>> >> >        .-u@{4}
>> >> >
>> >> >
>> >> > I have an idea inspired by bisection, Thomas's exponential stride, and
>> >> > what someone (you?) mentioned the other day about virtual merge
>> >> > commits. I haven't tried it out, but let me know what you think. I'll
>> >> > try to explain it using an example only:
>> >> >
>> >> > Exponential stride phase:
>> >> > 1. candidates={ u@{0} }
>> >> >   merge-base b $candidates -> y, _not_ in $candidates
>> >> > 2. candidates={ u@{1} u@{2} }
>> >> >   merge-base b $candidates -> y, _not_ in $candidates
>> >> > 3. candidates={ u@{3} u@{4} u@{5} u@{6} }
>> >> >   merge-base b $candidates -> u@{3}, in $candidates
>> >>
>> >> Doesn't it indicate that u@{3} is the commit we are looking for? I
>> >> haven't found a counterexample...
>> >
>> > Yes, of course. Stupid me ;-). Forget about the other half. (I think
>> > that's what I did manually to match the sha1 back to the ref name, but
>> > that is of course complete non-sense to do in the script.)
>> >
>> >> If this is true the following patch can implement it for git-pull.sh and
>> >> git-rebase.sh (sorry if it is space damaged):
>> >
>> > Thanks! Will have a closer look at it later today. If I understand
>> > correctly, you simply call merge-base with the _entire_ reflog. I
>>
>> Yes, that is the idea (plus the old remote hash in case of git-pull)
>>
>> > would have thought that would be slow, but it's great if that is fast
>> > enough.
>>
>> Yes, I think it is fast enough in the normal case. Even feeding the
>> entire git.git's master, ~25000 revisions, it takes around 2-4 seconds
>> only:
>>
>> $ git rev-list origin/master | wc -l
>> 24380
>> $ time git merge-base $(git rev-list origin/master)
>> 9971d6d52c5afeb8ba60ae6ddcffb34af23eeadd
>>
>> real  0m4.014s
>> user  0m1.520s
>> sys   0m2.284s
>>
>> (2.5GHz CPU)
>
>
> I finally got around to doing some tests on this myself. I used
> git.git as of mid Feb, which at that time had 10010 commits in master,
> following only the first parent. I took the first 563 commits from the
> todo branch and transplanted onto master~10000 (there were some
> conflicts after about 563 commits and I figured that would be enough
> anyway). I then rebased the resulting branch (let's call it 'u')
> against master~9990, then against master~9980 and so on to get a
> reflog with 1001 entries for u. I then created another branch 'b'
> based on u@{10}, u@{100} and @{1000}, for different runs of the
> tests. I created one additional commit on b in each case. I then
> rebased b with master, using the following algorithms to find the base
> to rebase from:
>
>  manual: simply calling 'git rebase --onto u b~1'
>
>  linear: same algorithm as in 'git pull', which linearly walks the
>  reflog until a commit that b contains is found
>
>  merge-base: the base will be calculated as 'git merge-base b $(git
>  ref-list -g u)'
>
>  exponential: like merge-base, but start with only u@{0}, then
>  {u@{1},u@{2}} and so on until a commit that b contains is found

exp(n,m): like merge-base, but start with n candidates {u@{0},
..., u@{n-1}}, then n*m candidates and so on until a commit that b
contains is found.

>
> These are the results:

These are best timing out of three runs, mean, only the first one? Hot-cache
for all tests?

>
>                 u@{10}     u@{100}    u@{1000}
> manual         0m0.535s    0m1.164s    0m1.415s
> linear         0m1.245s   0m37.367s   5m10.068s
> merge-base    0m14.490s   0m15.409s   0m15.508s
> exp(1,2)       0m1.056s    0m6.175s   0m27.221s
> exp(10,10)     0m1.950s   0m20.031s   0m18.215s
> exp(7,7)       0m1.310s    0m6.851s   0m16.757s
>
> (1.8 GHz Athlon 64).
>
> This clearly shows that the linear algorithm from git pull is not good
> enough when rebasing older branches (i.e. branches whose upstream has
> many reflog entries created after the branch itself was created).
>
> The time it takes the "merge-base" algorithm is quite independent on
> how old the branch is, but with this quite long and branchy reflog
> (but not too dissimilar from git.git's pu?), it takes quite a while to
> calculate it. I think this is also too slow to be acceptable as a
> default.
>
> I would personnally be happy if the "exponential" algorithm was used
> by git rebase default. I suppose not everyone would agree that the
> convenience outweighs the performance cost, though.

I don't agree with this. For the normal case there is no performance cost
(manual 0.5s, exp(7,7) 1.3s). There is performance cost (manual 1.4s, exp(7,7)
16.7s) when you need it, when your upstream has been rebased a long ago (in
reflog entries).

> OTOH, a slower
> algorithm has been used in git pull for a long time and it seems like
> not many people have really been bothered by that. Also see the
> following paragraphs.
>
> I also ran the same tests with an upstream branch that was never
> force-updated. For these test cases, I created a reflog such that
> u@{$i} = master~$((10 * $i)). Since the upstream branch was know never
> to have been force-updated in this case, the "manual" test case was
> simply 'git rebase u'. These are the results:
>
>                 u@{10}     u@{100}    u@{1000}
> manual         0m0.885s    0m6.126s   0m52.248s
> linear         0m1.349s   0m39.688s   5m28.753s
> merge-base     0m1.160s    0m1.699s    0m1.901s
> exp(1,2)       0m0.769s    0m4.342s    0m7.360s
> exp(10,10)     0m0.700s    0m2.535s    0m3.110s
> exp(7,7)       0m0.653s    0m2.332s    0m3.506s
>
> Not surprisingly, the linear algorithm is slow in these cases as well.
>
> What's more interesting here is that the last two algorithms are
> actually faster than the plain 'git rebase u'. This is caused by
> --ignore-if-in-upstream flag to format-patch. Since the other three
> algorithms try to figure out what the base was and pass the range from
> the guessed base to the branch (e.g. u@{100}..b) to format-patch, the
> --ignore-if-in-upstream to that command effectively becomes a no-op.
>
> Although this makes rebase faster in the case of a non-force-updated
> upstream, it may also be a problem in some cases. This was something
> that I had not thought about until I started timing the calls. One
> reason I can think of when the --ignore-if-in-upstream is useful is
> when the upstream branch has been rebased, but this is exactly the
> case when guessing the old base is useful and solves the problem in a
> better way anyway. However, if a commit on the upstream branch was
> cherry-picked from some commit on the current branch above its base
> (i.e. in u@{x}..b), then that would not be detected by
> --ignore-if-in-upstream and could result in unnecessary merge
> conflicts. I don't know how common this case is.
>
> The above also applies to 'git pull', of course, but the ways of
> getting identical patches in upstream are probably different (more
> likely by 'git am' than 'git cherry-pick' perhaps).
>
> I think this is a useful feature. I'm just not sure how to balance the
> performance vs convenience. Worst case, this could probably become a
> command line option and configuration. I guess 'git pull' should use
> the same algorithm. If we decide to use configation, maybe git-pull's
> default would need to be different to be backward compatible.
>
> Any thoughts?

I think it is worth, as it looks like it only affects those who need
the feature.
The exp(7,7) or similar seems a good candidate.

HTH,
Santi

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-03-13 23:42 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-14 13:51 [PATCH] rebase: be cleverer with rebased upstream branches Martin von Zweigbergk
2011-02-15 11:28 ` Santi Béjar
2011-02-16  1:37   ` Martin von Zweigbergk
2011-02-16  9:26     ` Santi Béjar
2011-02-15 20:30 ` Junio C Hamano
2011-02-16  3:03   ` Martin von Zweigbergk
2011-02-16 12:10     ` Santi Béjar
2011-02-16 13:22       ` Santi Béjar
2011-02-16 19:07         ` Junio C Hamano
2011-02-16 21:16           ` Santi Béjar
2011-02-16 16:45       ` Martin von Zweigbergk
2011-02-17 10:24         ` Santi Béjar
2011-03-12 21:15           ` Martin von Zweigbergk
2011-03-12 23:51             ` Santi Béjar
2011-03-13  1:32               ` Martin von Zweigbergk
2011-03-13  3:14                 ` Martin von Zweigbergk
2011-03-13 22:57                   ` Junio C Hamano
2011-03-13 23:42                     ` Santi Béjar
2011-03-13 23:09             ` Santi Béjar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).