* best practices against long git rebase times?
@ 2015-12-04 15:05 Andreas Krey
2015-12-04 15:31 ` John Keeping
2015-12-04 17:09 ` demerphq
0 siblings, 2 replies; 13+ messages in thread
From: Andreas Krey @ 2015-12-04 15:05 UTC (permalink / raw)
To: Git Mailing List
Hi all,
our workflow has so far been pretty rebase-free, for various reasons.
One obstacle now appearing is that rebases simply take
very long: by the time you might want to do a rebase, there are
several hundred commits on the remote branch, and our tree
isn't small either.
This produces rebase times in the minute range.
I suppose this is because rebase tries to see
if there are new commits in the destination
branch that are identical to one of the local
commits, to be able to skip them. (I didn't
try to verify this hypothesis.)
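A rough way to test that hypothesis without running a full rebase is to
time the cherry detection on its own (the upstream name origin/master
below is a placeholder; adjust to your setup):
$ time git log --cherry origin/master...HEAD >/dev/null
$ time git cherry origin/master >/dev/null
Both commands compare the local commits against the other side by patch
equivalence, which should be essentially the check rebase performs up
front.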
What can we do to make this faster?
Andreas
--
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800
* Re: best practices against long git rebase times?
2015-12-04 15:05 best practices against long git rebase times? Andreas Krey
@ 2015-12-04 15:31 ` John Keeping
2015-12-06 16:43 ` Andreas Krey
2015-12-04 17:09 ` demerphq
1 sibling, 1 reply; 13+ messages in thread
From: John Keeping @ 2015-12-04 15:31 UTC (permalink / raw)
To: Andreas Krey; +Cc: Git Mailing List
On Fri, Dec 04, 2015 at 04:05:46PM +0100, Andreas Krey wrote:
> our workflow has so far been pretty rebase-free, for various reasons.
>
> One obstacle now appearing is that rebases simply take
> very long: by the time you might want to do a rebase, there are
> several hundred commits on the remote branch, and our tree
> isn't small either.
>
> This produces rebase times in the minute range.
> I suppose this is because rebase tries to see
> if there are new commits in the destination
> branch that are identical to one of the local
> commits, to be able to skip them. (I didn't
> try to verify this hypothesis.)
>
> What can we do to make this faster?
I'm pretty sure that you're right and the cherry-pick analysis is where
the time is spent.
I looked into this a couple of years ago and I have a variety of
(half-finished) experiments that might improve the performance of this:
https://github.com/johnkeeping/git/commits/log-cherry-no-merges
https://github.com/johnkeeping/git/commits/patch-id-limit-paths
https://github.com/johnkeeping/git/commits/revision-cherry-respect-ancestry-path
https://github.com/johnkeeping/git/commits/patch-id-notes-cache
http://comments.gmane.org/gmane.comp.version-control.git/224006
I have no idea if any of these changes will apply to modern Git (or if
some of them are even correct) but I can try to clean them up if there's
interest.
The commit for patch-id-limit-paths includes some numbers that might be
relevant for your case:
Before:
$ time git log --cherry master...jk/submodule-subdirectory-ok >/dev/null
real 0m0.373s
user 0m0.341s
sys 0m0.031s
After:
$ time git log --cherry master...jk/submodule-subdirectory-ok >/dev/null
real 0m0.060s
user 0m0.055s
sys 0m0.005s
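For context, the quantity this analysis compares is the patch ID: a hash
of the diff a commit introduces against its parent, ignoring whitespace
and line numbers. It can be computed by hand for a single commit, e.g.:
$ git show HEAD | git patch-id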
* Re: best practices against long git rebase times?
2015-12-04 15:05 best practices against long git rebase times? Andreas Krey
2015-12-04 15:31 ` John Keeping
@ 2015-12-04 17:09 ` demerphq
2015-12-04 17:28 ` John Keeping
2015-12-06 16:40 ` Andreas Krey
1 sibling, 2 replies; 13+ messages in thread
From: demerphq @ 2015-12-04 17:09 UTC (permalink / raw)
To: Andreas Krey; +Cc: Git Mailing List
On 4 December 2015 at 16:05, Andreas Krey <a.krey@gmx.de> wrote:
> Hi all,
>
> our workflow has so far been pretty rebase-free, for various reasons.
>
> One obstacle now appearing is that rebases simply take
> very long: by the time you might want to do a rebase, there are
> several hundred commits on the remote branch, and our tree
> isn't small either.
>
> This produces rebase times in the minute range.
> I suppose this is because rebase tries to see
> if there are new commits in the destination
> branch that are identical to one of the local
> commits, to be able to skip them. (I didn't
> try to verify this hypothesis.)
>
> What can we do to make this faster?
I bet you have a lot of refs: tags or branches.
git rebase performance, along with that of many other operations, seems
to scale with the number of tags.
At $work we create a tag every time we "roll out" a "server type".
This produces many tags a day.
Over time, rebase (and many other operations, actually) starts slowing
down to the point of being painful.
The workaround we ended up using was to set up a cron job and related
infra that removed old tags.
Once we got rid of most of our old tags git became nice to use again.
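A minimal local sketch of such a sweep; the 90-day cutoff, the tag
namespace and the use of GNU date are assumptions here, and a real setup
would also push the deletions to the server:
$ cutoff=$(date -d '90 days ago' +%s)
$ git for-each-ref --format='%(refname:short) %(creatordate:raw)' refs/tags |
    awk -v c="$cutoff" '$2+0 < c+0 { print $1 }' |
    xargs -r git tag -d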
Try making a clone, nuking all the refs in it, and then timing rebase and friends.
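Something along these lines, for example; /path/to/repo, topic and
origin/master are placeholders, and only the tags are deleted here since
the rebase still needs its upstream ref:
$ git clone /path/to/repo /tmp/ref-test && cd /tmp/ref-test
$ git checkout -b topic origin/topic
$ time git rebase origin/master      # baseline, all refs present
$ git reset --hard ORIG_HEAD         # undo the rebase so it can be re-timed
$ git for-each-ref --format='delete %(refname)' refs/tags | git update-ref --stdin
$ time git rebase origin/master      # the same rebase with the tags gone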
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
* Re: best practices against long git rebase times?
2015-12-04 17:09 ` demerphq
@ 2015-12-04 17:28 ` John Keeping
2015-12-04 17:33 ` demerphq
2015-12-06 16:40 ` Andreas Krey
1 sibling, 1 reply; 13+ messages in thread
From: John Keeping @ 2015-12-04 17:28 UTC (permalink / raw)
To: demerphq; +Cc: Andreas Krey, Git Mailing List
On Fri, Dec 04, 2015 at 06:09:33PM +0100, demerphq wrote:
> On 4 December 2015 at 16:05, Andreas Krey <a.krey@gmx.de> wrote:
> > Hi all,
> >
> > our workflow has so far been pretty rebase-free, for various reasons.
> >
> > One obstacle now appearing is that rebases simply take
> > very long: by the time you might want to do a rebase, there are
> > several hundred commits on the remote branch, and our tree
> > isn't small either.
> >
> > This produces rebase times in the minute range.
> > I suppose this is because rebase tries to see
> > if there are new commits in the destination
> > branch that are identical to one of the local
> > commits, to be able to skip them. (I didn't
> > try to verify this hypothesis.)
> >
> > What can we do to make this faster?
>
> I bet you have a lot of refs: tags or branches.
>
> git rebase performance, along with that of many other operations, seems
> to scale with the number of tags.
>
> At $work we create a tag every time we "roll out" a "server type".
>
> This produces many tags a day.
>
> Over time, rebase (and many other operations, actually) starts slowing
> down to the point of being painful.
>
> The workaround we ended up using was to set up a cron job and related
> infra that removed old tags.
>
> Once we got rid of most of our old tags git became nice to use again.
This is quite surprising. Were you using packed or loose tags?
It would be interesting to run git-rebase with GIT_TRACE_PERFORMANCE to
see which subcommand is slow in this particular scenario.
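For example (the upstream branch and the log path here are arbitrary):
$ GIT_TRACE_PERFORMANCE=/tmp/rebase-perf.log git rebase origin/master
$ grep 'performance:' /tmp/rebase-perf.log
Each trace line reports elapsed seconds for one command or operation, so
the slow step should stand out.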
* Re: best practices against long git rebase times?
2015-12-04 17:28 ` John Keeping
@ 2015-12-04 17:33 ` demerphq
2015-12-04 18:10 ` Stefan Beller
0 siblings, 1 reply; 13+ messages in thread
From: demerphq @ 2015-12-04 17:33 UTC (permalink / raw)
To: John Keeping, Ævar Arnfjörð Bjarmason
Cc: Andreas Krey, Git Mailing List
On 4 December 2015 at 18:28, John Keeping <john@keeping.me.uk> wrote:
> On Fri, Dec 04, 2015 at 06:09:33PM +0100, demerphq wrote:
>> On 4 December 2015 at 16:05, Andreas Krey <a.krey@gmx.de> wrote:
>> > Hi all,
>> >
>> > our workflow has so far been pretty rebase-free, for various reasons.
>> >
>> > One obstacle now appearing is that rebases simply take
>> > very long: by the time you might want to do a rebase, there are
>> > several hundred commits on the remote branch, and our tree
>> > isn't small either.
>> >
>> > This produces rebase times in the minute range.
>> > I suppose this is because rebase tries to see
>> > if there are new commits in the destination
>> > branch that are identical to one of the local
>> > commits, to be able to skip them. (I didn't
>> > try to verify this hypothesis.)
>> >
>> > What can we do to make this faster?
>>
>> I bet you have a lot of refs: tags or branches.
>>
>> git rebase performance, along with that of many other operations, seems
>> to scale with the number of tags.
>>
>> At $work we create a tag every time we "roll out" a "server type".
>>
>> This produces many tags a day.
>>
>> Over time, rebase (and many other operations, actually) starts slowing
>> down to the point of being painful.
>>
>> The workaround we ended up using was to set up a cron job and related
>> infra that removed old tags.
>>
>> Once we got rid of most of our old tags git became nice to use again.
>
> This is quite surprising. Were you using packed or loose tags?
It didn't matter.
> It would be interesting to run git-rebase with GIT_TRACE_PERFORMANCE to
> see which subcommand is slow in this particular scenario.
These days it isn't that slow :-)
But I cc'ed Ævar; he did the work on that. All I know is that when he
finished the tag remover I stopped cursing every time I rebased.
I believe I remember him saying that you can reproduce it using a
public repo by taking the linux repo and creating a tag every 10
commits or so. Once you are done, many git operations will be nice
and slow!
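Roughly, against a throwaway clone of linux.git (the tag names and the
topic branch below are made up):
$ git rev-list master | awk 'NR % 10 == 0' | nl -ba |
    while read n sha; do git tag "perf-$n" "$sha"; done
$ time git rebase master my-topic
Re-timing the same rebase before and after the tagging loop should show
the effect.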
In all fairness, however, I do believe that some of the recent changes
to git helped, but I don't know how much or which ones. What I do know
is that we still have the cron sweeper process cleaning refs. (It broke
one of my repos that I had set up with --reference just the other day.)
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
* Re: best practices against long git rebase times?
2015-12-04 17:33 ` demerphq
@ 2015-12-04 18:10 ` Stefan Beller
0 siblings, 0 replies; 13+ messages in thread
From: Stefan Beller @ 2015-12-04 18:10 UTC (permalink / raw)
To: demerphq
Cc: John Keeping, Ævar Arnfjörð Bjarmason,
Andreas Krey, Git Mailing List
On Fri, Dec 4, 2015 at 9:33 AM, demerphq <demerphq@gmail.com> wrote:
> In all fairness, however, I do believe that some of the recent changes
> to git helped, but I don't know how much or which ones. What I do know
> is that we still have the cron sweeper process cleaning refs. (It broke
> one of my repos that I had set up with --reference just the other day.)
>
> Yves
git-am, which is used to apply patches, was rewritten in C.
This also speeds up rebase.
* Re: best practices against long git rebase times?
2015-12-04 17:09 ` demerphq
2015-12-04 17:28 ` John Keeping
@ 2015-12-06 16:40 ` Andreas Krey
1 sibling, 0 replies; 13+ messages in thread
From: Andreas Krey @ 2015-12-06 16:40 UTC (permalink / raw)
To: demerphq; +Cc: Git Mailing List
On Fri, 04 Dec 2015 18:09:33 +0000, demerphq wrote:
...
> I bet you have a lot of refs; tags, or branches.
We do, but removing them doesn't noticeably change the times
(12k refs vs. 120, mostly tags). I'm just running the
second series; the first (with many refs) ended
with rebasing over 3000 commits, for which git log -p
outputs 64 MByte, and the rebase takes 12 minutes.
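For reference, ref counts like the ones above can be checked with:
$ git for-each-ref | wc -l
$ git for-each-ref refs/tags | wc -l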
Andreas
--
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800
* Re: best practices against long git rebase times?
2015-12-04 15:31 ` John Keeping
@ 2015-12-06 16:43 ` Andreas Krey
2015-12-07 21:02 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Andreas Krey @ 2015-12-06 16:43 UTC (permalink / raw)
To: John Keeping; +Cc: Git Mailing List
On Fri, 04 Dec 2015 15:31:03 +0000, John Keeping wrote:
...
> I'm pretty sure that you're right and the cherry-pick analysis is where
> the time is spent.
But I'm pretty surprised by the amount of CPU time that goes there.
I'm now rebasing a single commit with a single blank line added,
and for 3000 new commits to rebase over (with 64 MByte of git log -p
output for them) it takes twelve minutes, or about four commits per
second, and all user CPU, no I/O. It's pretty linear in the number of
commits, too.
Andreas
--
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800
* Re: best practices against long git rebase times?
2015-12-06 16:43 ` Andreas Krey
@ 2015-12-07 21:02 ` Jeff King
2015-12-07 22:56 ` Junio C Hamano
0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2015-12-07 21:02 UTC (permalink / raw)
To: Andreas Krey; +Cc: John Keeping, Git Mailing List
On Sun, Dec 06, 2015 at 05:43:45PM +0100, Andreas Krey wrote:
> On Fri, 04 Dec 2015 15:31:03 +0000, John Keeping wrote:
> ...
> > I'm pretty sure that you're right and the cherry-pick analysis is where
> > the time is spent.
>
> But I'm pretty surprised by the amount of CPU time that goes there.
>
> I'm now rebasing a single commit with a single blank line added,
> and for 3000 new commits to rebase over (with 64 MByte of git log -p
> output for them) it takes twelve minutes, or about four commits per
> second, and all user CPU, no I/O. It's pretty linear in the number of
> commits, too.
You're computing the patch against the parent for each of those 3000
commits (to get a hash of it to compare against the single hash on the
other side). Twelve minutes sounds long, but if you have a really
gigantic tree, it might not be unreasonable.
You can also try compiling with "make XDL_FAST_HASH=" (i.e., setting
that option to the empty string). Last year I found there were some
pretty suboptimal corner cases, and you may be hitting one (we should
probably turn that option off by default; I got stuck on trying to find
a hash that would perform faster and never followed up[1].
I doubt that is your problem, but it's possible).
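From a git source tree, one concrete way to try that without touching
the installed git, for example (the install prefix is arbitrary):
$ make XDL_FAST_HASH= prefix=$HOME/git-noxdl install
$ $HOME/git-noxdl/bin/git rebase origin/master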
-Peff
[1] http://thread.gmane.org/gmane.comp.version-control.git/261638
* Re: best practices against long git rebase times?
2015-12-07 21:02 ` Jeff King
@ 2015-12-07 22:56 ` Junio C Hamano
2015-12-07 22:59 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2015-12-07 22:56 UTC (permalink / raw)
To: Jeff King; +Cc: Andreas Krey, John Keeping, Git Mailing List
Jeff King <peff@peff.net> writes:
> You're computing the patch against the parent for each of those 3000
> commits (to get a hash of it to compare against the single hash on the
> other side). Twelve minutes sounds long, but if you have a really
> gigantic tree, it might not be unreasonable.
>
> You can also try compiling with "make XDL_FAST_HASH=" (i.e., setting
> that option to the empty string). Last year I found there were some
> pretty suboptimal corner cases, and you may be hitting one (we should
> probably turn that option off by default; I got stuck on trying to find
> a hash that would perform faster and never followed up[1].
>
> I doubt that is your problem, but it's possible).
>
> -Peff
>
> [1] http://thread.gmane.org/gmane.comp.version-control.git/261638
I vaguely recall having discussed caching the patch-ids somewhere so
that this does not have to be done every time. Would such an
extension help here, I wonder?
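Conceptually such a cache only needs to map a commit to its patch ID so
the diff is not recomputed on every traversal. A very rough illustration
using a notes ref (the ref name patch-ids is made up, this is not
necessarily what John's patch-id-notes-cache branch does, and it ignores
the invalidation problems mentioned later in the thread):
$ git rev-list --no-merges origin/master...HEAD |
    while read c; do
      id=$(git diff-tree -p "$c" | git patch-id | cut -d' ' -f1)
      test -n "$id" && git notes --ref=patch-ids add -f -m "$id" "$c"
    done
$ git notes --ref=patch-ids show HEAD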
* Re: best practices against long git rebase times?
2015-12-07 22:56 ` Junio C Hamano
@ 2015-12-07 22:59 ` Jeff King
2015-12-08 0:18 ` Junio C Hamano
2015-12-08 17:45 ` Christian Couder
0 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2015-12-07 22:59 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Andreas Krey, John Keeping, Git Mailing List
On Mon, Dec 07, 2015 at 02:56:33PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > You're computing the patch against the parent for each of those 3000
> > commits (to get a hash of it to compare against the single hash on the
> > other side). Twelve minutes sounds long, but if you have a really
> > gigantic tree, it might not be unreasonable.
> >
> > You can also try compiling with "make XDL_FAST_HASH=" (i.e., setting
> > that option to the empty string). Last year I found there were some
> > pretty suboptimal corner cases, and you may be hitting one (we should
> > probably turn that option off by default; I got stuck on trying to find
> > a hash that would perform faster and never followed up[1].
> >
> > I doubt that is your problem, but it's possible).
> >
> > -Peff
> >
> > [1] http://thread.gmane.org/gmane.comp.version-control.git/261638
>
> I vaguely recall having discussed caching the patch-ids somewhere so
> that this does not have to be done every time. Would such an
> extension help here, I wonder?
I think you missed John's earlier response which gave several pointers
to such caching schemes. :)
I used to run with patch-id-caching in my personal fork (I frequently
use "git log --cherry-mark" to see what has made it upstream), but I
haven't for a while. It did make a big difference in speed, but I never
resolved the corner cases around cache invalidation.
-Peff
* Re: best practices against long git rebase times?
2015-12-07 22:59 ` Jeff King
@ 2015-12-08 0:18 ` Junio C Hamano
2015-12-08 17:45 ` Christian Couder
1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2015-12-08 0:18 UTC (permalink / raw)
To: Jeff King; +Cc: Andreas Krey, John Keeping, Git Mailing List
Jeff King <peff@peff.net> writes:
> I think you missed John's earlier response which gave several pointers
> to such caching schemes. :)
Yeah, you're right.
>
> I used to run with patch-id-caching in my personal fork (I frequently
> use "git log --cherry-mark" to see what has made it upstream), but I
> haven't for a while. It did make a big difference in speed, but I never
> resolved the corner cases around cache invalidation.
>
> -Peff
* Re: best practices against long git rebase times?
2015-12-07 22:59 ` Jeff King
2015-12-08 0:18 ` Junio C Hamano
@ 2015-12-08 17:45 ` Christian Couder
1 sibling, 0 replies; 13+ messages in thread
From: Christian Couder @ 2015-12-08 17:45 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, Andreas Krey, John Keeping, Git Mailing List,
Ævar Arnfjörð Bjarmason
On Mon, Dec 7, 2015 at 11:59 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Dec 07, 2015 at 02:56:33PM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> > You're computing the patch against the parent for each of those 3000
>> > commits (to get a hash of it to compare against the single hash on the
>> > other side). Twelve minutes sounds long, but if you have a really
>> > gigantic tree, it might not be unreasonable.
>> >
>> > You can also try compiling with "make XDL_FAST_HASH=" (i.e., setting
>> > that option to the empty string). Last year I found there were some
>> > pretty suboptimal corner cases, and you may be hitting one (we should
>> > probably turn that option off by default; I got stuck on trying to find
>> > a hash that would perform faster and never followed up[1].
>> >
>> > I doubt that is your problem, but it's possible).
>> >
>> > -Peff
>> >
>> > [1] http://thread.gmane.org/gmane.comp.version-control.git/261638
>>
>> I vaguely recall having discussed caching the patch-ids somewhere so
>> that this does not have to be done every time. Would such an
>> extension help here, I wonder?
>
> I think you missed John's earlier response which gave several pointers
> to such caching schemes. :)
Yeah, he also gave very interesting performance numbers. Thanks John!
> I used to run with patch-id-caching in my personal fork (I frequently
> use "git log --cherry-mark" to see what has made it upstream), but I
> haven't for a while. It did make a big difference in speed, but I never
> resolved the corner cases around cache invalidation.
I will see if I can work on that after I am done with untracked cache...
end of thread
Thread overview: 13+ messages
2015-12-04 15:05 best practices against long git rebase times? Andreas Krey
2015-12-04 15:31 ` John Keeping
2015-12-06 16:43 ` Andreas Krey
2015-12-07 21:02 ` Jeff King
2015-12-07 22:56 ` Junio C Hamano
2015-12-07 22:59 ` Jeff King
2015-12-08 0:18 ` Junio C Hamano
2015-12-08 17:45 ` Christian Couder
2015-12-04 17:09 ` demerphq
2015-12-04 17:28 ` John Keeping
2015-12-04 17:33 ` demerphq
2015-12-04 18:10 ` Stefan Beller
2015-12-06 16:40 ` Andreas Krey