* git branch --contains is slow with a lot of branches
From: Mike Hommey @ 2015-03-31 1:45 UTC
To: git
Hi,
Sometimes I want to know what (possibly remote) branch contains a given
commit. The repository where I do that has thousands of branches:
$ git for-each-ref | wc -l
7657
And a lot of commits:
$ git rev-list --all | wc -l
538174
Using git branch --contains can be a very expensive thing:
$ time git branch --contains 0812b94 --all > /dev/null
real 3m0.871s
user 3m0.828s
sys 0m0.084s
I'd argue this shouldn't take much more time than enumerating all revs:
$ time git rev-list --all | wc -l
538174
real 0m4.842s
user 0m4.488s
sys 0m1.332s
This can be reproduced to a certain degree with the git.git repo:
$ git clone https://github.com/git/git
$ cd git
$ for i in $(seq 1 1000); do git branch branch$i master; done
$ git gc # will pack the refs
$ time git rev-list --all | wc -l
40886
real 0m0.505s
user 0m0.464s
sys 0m0.108s
$ time git branch --contains v2.0.0 > /dev/null
real 0m6.207s
user 0m6.204s
sys 0m0.004s
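(That is especially surprising in this case, where all branches point to
the same commit.)
The time is also essentially linear in the number of branches; with 7000
branches instead of 1000, it takes about seven times as long: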
$ for i in $(seq 1001 7000); do git branch branch$i master; done
$ git gc
$ time git rev-list --all | wc -l
40886
real 0m0.493s
user 0m0.484s
sys 0m0.076s
$ time git branch --contains v2.0.0 > /dev/null
real 0m43.446s
user 0m43.436s
sys 0m0.040s
Mike
* Re: git branch --contains is slow with a lot of branches
From: Jeff King @ 2015-03-31 3:03 UTC
To: Mike Hommey; +Cc: git
On Tue, Mar 31, 2015 at 10:45:11AM +0900, Mike Hommey wrote:
> Using git branch --contains can be a very expensive thing:
> [...]
Yes, this is well known. It does a separate traversal for each branch,
which is why you noticed that it's linear in the number of branches.
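To illustrate why the cost adds up per branch, here is a minimal Python
sketch on a toy commit graph. It is not git's code: PARENTS, contains()
and branches_containing() are hypothetical names, and the graph is made
up. Every branch gets its own walk, so branches pointing at the same
commit repeat exactly the same work.

```python
# Toy commit graph: commit -> list of parents (hypothetical data, not git's structures).
PARENTS = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["c"],
}


def contains(tip, target, parents):
    """Walk back from a single tip; return (found, commits_visited)."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if commit == target:
            return True, len(seen)
        stack.extend(parents[commit])
    return False, len(seen)


def branches_containing(branches, target, parents):
    """One independent walk per branch; nothing is shared between walks."""
    hits, visited = [], 0
    for name, tip in branches.items():
        found, count = contains(tip, target, parents)
        visited += count
        if found:
            hits.append(name)
    return hits, visited


# All branches point at the same commit, as in the reproduction recipe.
few = {"branch%d" % i: "d" for i in range(1, 11)}
many = {"branch%d" % i: "d" for i in range(1, 71)}
hits_few, visited_few = branches_containing(few, "b", PARENTS)
hits_many, visited_many = branches_containing(many, "b", PARENTS)
print(len(hits_few), visited_few)
print(len(hits_many), visited_many)
```

With seven times as many branches, the number of visited commits is
seven times larger, even though every walk is identical.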
I changed the "tag --contains" algorithm a while ago to do it all in one
traversal. The downside is that it uses a depth-first approach which
means it almost always goes to the roots. This is more appropriate for
tags (as you often have old tags), but less so for branches.
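As a rough illustration of that trade-off (a sketch only, not the actual
"tag --contains" code), the following Python uses a depth-first walk with
a result cache shared across all tips. The names make_contains(), cache
and the toy graph are assumptions made for the example. The shared cache
means each commit is examined at most once no matter how many tips there
are, but the walk follows the first parent down to the root before it
finds the target on the other side of a merge.

```python
# Toy history (hypothetical): "m" merges an old side line into the line
# that contains "target".
PARENTS = {
    "r": [],
    "x": ["r"],
    "target": ["x"],
    "y": ["target"],
    "old1": ["r"],
    "old2": ["old1"],
    "m": ["old2", "y"],  # first parent is the old side line
}


def make_contains(target, parents):
    """Return (contains, cache); the cache is shared by every query."""
    cache = {target: True}  # commit -> does it contain target?

    def contains(tip):
        stack = [tip]  # explicit stack: depth-first without recursion
        while stack:
            commit = stack[-1]
            if commit in cache:
                stack.pop()
                continue
            ps = parents[commit]
            if any(cache.get(p, False) for p in ps):
                cache[commit] = True
                stack.pop()
                continue
            unknown = next((p for p in ps if p not in cache), None)
            if unknown is None:
                cache[commit] = False  # every parent is known not to contain it
                stack.pop()
            else:
                stack.append(unknown)  # go deeper before trying siblings
        return cache[tip]

    return contains, cache


contains, cache = make_contains("target", PARENTS)
results = [contains("m") for _ in range(1000)]  # 1000 tips at the same commit
print(all(results), len(cache), "r" in cache)
```

The first query does all the work; the other 999 are answered from the
cache. The root "r" ends up in the cache even though the target is only
two commits below the tip.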
I did some work on a contains() implementation that is breadth-first,
but handles multiple tips in a single traversal. It needs
a little polish, and then to be hooked into "git branch". This is part
of the proposed GSoC project for unifying "tag -l", "branch -l", and
"for-each-ref".
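One possible shape for such a traversal, sketched in Python on a toy
graph. This is an assumption about how it could work, not the
implementation mentioned above: it orders commits by a generation number
(longest distance from a root) instead of whatever ordering the real code
uses, and branches_containing(), generation() and the graph are
hypothetical names. All tips start in one priority queue, each commit
carries the set of branches that reach it, and the walk stops as soon as
the target is popped.

```python
import heapq

# Toy history (hypothetical), newest commit is the merge "m".
PARENTS = {
    "r": [],
    "x": ["r"],
    "target": ["x"],
    "y": ["target"],
    "old1": ["r"],
    "old2": ["old1"],
    "m": ["old2", "y"],
}


def generation(commit, parents, memo):
    """1 for roots, otherwise 1 + the largest generation among the parents."""
    if commit not in memo:
        memo[commit] = 1 + max(
            (generation(p, parents, memo) for p in parents[commit]), default=0
        )
    return memo[commit]


def branches_containing(branches, target, parents):
    """Single walk from all tips at once; return (branch_names, commits_visited)."""
    gen = {}
    reach = {}  # commit -> names of branches whose tip reaches this commit
    for name, tip in branches.items():
        reach.setdefault(tip, set()).add(name)
    heap = [(-generation(tip, parents, gen), tip) for tip in reach]
    heapq.heapify(heap)
    cutoff = generation(target, parents, gen)
    visited = 0
    while heap:
        neg_gen, commit = heapq.heappop(heap)  # highest generation first
        if -neg_gen < cutoff:
            break  # everything left is older than the target
        visited += 1
        if commit == target:
            return sorted(reach[commit]), visited
        for p in parents[commit]:
            if p not in reach:
                reach[p] = set()
                heapq.heappush(heap, (-generation(p, parents, gen), p))
            reach[p] |= reach[commit]
    return [], visited


branches = {"topic": "m", "main": "y", "side": "old2"}
names, visited = branches_containing(branches, "target", PARENTS)
print(names, visited)
```

Because children always have a higher generation than their parents, a
commit is only popped after every descendant that can reach it, so its
branch set is complete at that point. In this example the walk never
touches "x", "old1" or the root.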
-Peff