git.vger.kernel.org archive mirror
* git branch --contains is slow with a lot of branches
@ 2015-03-31  1:45 Mike Hommey
  2015-03-31  3:03 ` Jeff King
  0 siblings, 1 reply; 2+ messages in thread
From: Mike Hommey @ 2015-03-31  1:45 UTC (permalink / raw)
  To: git

Hi,

Sometimes I want to know what (possibly remote) branch contains a given
commit. The repository where I do that has thousands of branches:

$ git for-each-ref | wc -l
7657

And a lot of commits:

$ git rev-list --all | wc -l
538174

Using git branch --contains can be a very expensive thing:

$ time git branch --contains 0812b94 --all > /dev/null

real    3m0.871s
user    3m0.828s
sys     0m0.084s

I'd argue this shouldn't take much more time than enumerating all revs:

$ time git rev-list --all | wc -l
538174

real    0m4.842s
user    0m4.488s
sys     0m1.332s

This can be reproduced to a certain degree with the git.git repository:

$ git clone https://github.com/git/git
$ cd git
$ for i in $(seq 1 1000); do git branch branch$i master; done
$ git gc # will pack the refs
$ time git rev-list --all | wc -l
40886

real    0m0.505s
user    0m0.464s
sys     0m0.108s

$ time git branch --contains v2.0.0 > /dev/null

real    0m6.207s
user    0m6.204s
sys     0m0.004s

(especially in this case where all branches point to the same commit)

It's also essentially linear in the number of branches:

$ for i in $(seq 1001 7000); do git branch branch$i master; done
$ git gc
$ time git rev-list --all | wc -l
40886

real    0m0.493s
user    0m0.484s
sys     0m0.076s

$ time git branch --contains v2.0.0 > /dev/null

real    0m43.446s
user    0m43.436s
sys     0m0.040s

Mike


* Re: git branch --contains is slow with a lot of branches
  2015-03-31  1:45 git branch --contains is slow with a lot of branches Mike Hommey
@ 2015-03-31  3:03 ` Jeff King
  0 siblings, 0 replies; 2+ messages in thread
From: Jeff King @ 2015-03-31  3:03 UTC (permalink / raw)
  To: Mike Hommey; +Cc: git

On Tue, Mar 31, 2015 at 10:45:11AM +0900, Mike Hommey wrote:

> Using git branch --contains can be a very expensive thing:
> [...]

Yes, this is well known. It does a separate traversal for each branch,
which is why you noticed that it's linear in the number of branches.
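
To make the linear cost concrete, here is a minimal sketch on a toy commit graph. It is not git's code: the `parents` mapping, the commit names, and both helper functions are hypothetical, and the only point is that each branch pays for its own walk.

```python
from collections import deque

def reaches(parents, tip, target):
    """Walk the ancestors of `tip`; return (found, commits_visited)."""
    seen = {tip}
    queue = deque([tip])
    visited = 0
    while queue:
        commit = queue.popleft()
        visited += 1
        if commit == target:
            return True, visited
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False, visited

def branches_containing(parents, tips, target):
    """One independent walk per branch; nothing is shared between walks."""
    containing, total_visited = [], 0
    for name, tip in tips.items():
        found, visited = reaches(parents, tip, target)
        total_visited += visited
        if found:
            containing.append(name)
    return containing, total_visited

# Hypothetical linear history c0 <- c1 <- ... <- c99, every branch at c99.
parents = {f"c{i}": ([f"c{i - 1}"] if i else []) for i in range(100)}
for n_branches in (10, 70):
    tips = {f"branch{i}": "c99" for i in range(n_branches)}
    names, visited = branches_containing(parents, tips, "c50")
    print(n_branches, len(names), visited)
# prints "10 10 500" then "70 70 3500"
```

Seven times as many branches means seven times as many commits visited, even though every branch points at the same commit.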

I changed the "tag --contains" algorithm a while ago to do it all in one
traversal. The downside is that it uses a depth-first approach which
means it almost always goes to the roots. This is more appropriate for
tags (as you often have old tags), but less so for branches.
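
A rough illustration of that trade-off, again on a hypothetical toy graph rather than git's actual data structures: a depth-first check whose results are cached and shared by all tips. The names `make_contains`, `parents`, and `tips` are made up for this sketch.

```python
def make_contains(parents, target):
    """Return (contains, cache): a memoized depth-first check shared by all tips."""
    cache = {}  # commit -> bool, filled in as the walk proceeds

    def contains(commit):
        if commit == target:
            return True
        if commit in cache:
            return cache[commit]
        # Depth-first: follow each parent chain until the target or a root.
        result = any(contains(p) for p in parents[commit])
        cache[commit] = result
        return result

    return contains, cache

# Hypothetical history c0 <- ... <- c99, plus a side branch s0 <- s1 forked at c10.
parents = {f"c{i}": ([f"c{i - 1}"] if i else []) for i in range(100)}
parents["s0"] = ["c10"]
parents["s1"] = ["s0"]
tips = {"master": "c99", "old-topic": "s1", "copy": "c99"}

contains, cache = make_contains(parents, "c50")
print([name for name, tip in tips.items() if contains(tip)])  # ['master', 'copy']
print(len(cache))  # 62
```

The second tip at c99 costs a single cache lookup, which is the gain from sharing one traversal. The negative answer for `old-topic`, however, is only known after walking all the way down to the root c0, which is the downside described above.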

I did some work on a contains() implementation that is
breadth-first, but handles multiple tips in a single traversal. It needs
a little polish, and then to be hooked into "git branch". This is part
of the proposed GSoC project for unifying "tag -l", "branch -l", and
"for-each-ref".
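
For readers wondering what a multi-tip, non-depth-first traversal could look like, here is one possible sketch. It is not the implementation mentioned above, whose details are not given in this thread. It assumes a toy graph where every commit has a precomputed generation number (roots are 0, otherwise one more than the highest parent), and all names are hypothetical.

```python
import heapq

def compute_generations(parents):
    """Generation = 0 for roots, else 1 + the highest parent generation."""
    gen = {}
    def visit(commit):
        if commit not in gen:
            gen[commit] = 1 + max((visit(p) for p in parents[commit]), default=-1)
        return gen[commit]
    for commit in parents:
        visit(commit)
    return gen

def branches_containing(parents, generation, tips, target):
    """Single newest-first walk from all tips; returns (names, commits_popped)."""
    names = list(tips)
    reached_by = {}  # commit -> bitmask of tips known to reach it
    heap = []
    for bit, name in enumerate(names):
        commit = tips[name]
        if commit not in reached_by:
            reached_by[commit] = 0
            heapq.heappush(heap, (-generation[commit], commit))
        reached_by[commit] |= 1 << bit
    popped = 0
    while heap:
        neg_gen, commit = heapq.heappop(heap)
        popped += 1
        if commit == target:
            break  # every descendant was popped earlier, so the mask is final
        if -neg_gen <= generation[target]:
            continue  # too old to have the target as an ancestor
        for parent in parents[commit]:
            if parent not in reached_by:
                reached_by[parent] = 0
                heapq.heappush(heap, (-generation[parent], parent))
            reached_by[parent] |= reached_by[commit]
    mask = reached_by.get(target, 0)
    return [n for bit, n in enumerate(names) if (mask >> bit) & 1], popped

# Hypothetical history c0 <- ... <- c99, plus a side branch s0 <- s1 forked at c10.
parents = {f"c{i}": ([f"c{i - 1}"] if i else []) for i in range(100)}
parents["s0"] = ["c10"]
parents["s1"] = ["s0"]
generation = compute_generations(parents)
tips = {"master": "c99", "old-topic": "s1", "copy": "c99"}
print(branches_containing(parents, generation, tips, "c50"))
# (['master', 'copy'], 50)
```

In this sketch the walk stops as soon as the target is popped, so the old side branch is never expanded and nothing is walked down to the root. A real implementation would need a substitute for the generation numbers assumed here, such as commit dates with some slack.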

-Peff
