* [BUG] "git describe --match" performance
@ 2024-10-30 4:43 Josh Poimboeuf
2024-10-31 11:47 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Josh Poimboeuf @ 2024-10-30 4:43 UTC (permalink / raw)
To: git
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
$ git clone https://github.com/torvalds/linux
$ cd linux
$ git checkout c61e41121036
$ time git describe --match=v6.10-rc7 --debug
describe HEAD
No exact match on refs or tags, searching to describe
finished search at d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
annotated 1844 v6.10-rc7
traversed 1282750 commits
v6.10-rc7-1844-gc61e41121036
real 0m9.243s
user 0m8.940s
sys 0m0.268s
$ time git describe
v6.10-rc7-1844-gc61e41121036
real 0m0.149s
user 0m0.111s
sys 0m0.036s
> What did you expect to happen? (Expected behavior)
I expected "git describe --match=v6.10-rc7" to be faster than plain "git
describe".
> What happened instead? (Actual behavior)
It takes over 9 seconds and traverses 1282750 commits.
(In my actual Linux git repo it's even worse at 15 seconds due to more
git history.)
> What's different between what you expected and what actually happened?
Over 9 seconds :-)
> Anything else you want to add:
I see this with both version 2.47.0 and the next branch.
This command is used by the kernel setlocalversion script, which is run
for every kernel build, so it adds 10-15 seconds to every build on an
untagged commit.
I suspect the problem is that there's only a single match for
"v6.10-rc7", but it tries to find 10 candidates so it ends up searching
the entire history. But "--candidates=1" doesn't seem to help unless I
add a second match like so:
$ time git describe --match=v6.10-rc7 --match=v6.10-rc6 --candidates=1
v6.10-rc7-1844-gc61e41121036
real 0m0.112s
user 0m0.081s
sys 0m0.031s
> Please review the rest of the bug report below.
> You can delete any lines you don't wish to share.
[System Info]
git version:
git version 2.47.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
libcurl: 8.6.0
OpenSSL: OpenSSL 3.2.2 4 Jun 2024
zlib: 1.3.1.zlib-ng
uname: Linux 6.10.12-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 30 21:38:25 UTC 2024 x86_64
compiler info: gnuc: 14.2
libc info: glibc: 2.39
$SHELL (typically, interactive shell): /bin/bash
[Enabled Hooks]
--
Josh
* Re: [BUG] "git describe --match" performance
From: Jeff King @ 2024-10-31 11:47 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: git
On Tue, Oct 29, 2024 at 09:43:22PM -0700, Josh Poimboeuf wrote:
> $ time git describe --match=v6.10-rc7 --debug
> describe HEAD
> No exact match on refs or tags, searching to describe
> finished search at d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
> annotated 1844 v6.10-rc7
> traversed 1282750 commits
> v6.10-rc7-1844-gc61e41121036
>
> real 0m9.243s
> user 0m8.940s
> sys 0m0.268s
>
> $ time git describe
> v6.10-rc7-1844-gc61e41121036
>
> real 0m0.149s
> user 0m0.111s
> sys 0m0.036s
There's more discussion of the actual solution in the nearby thread from
Rasmus. But I did want to note one thing here: when I initially tried to
reproduce your problem, my "slow" case was a lot less bad.
The reason is that I had a commit graph file to speed up traversal. So
independent of the git-describe fix, you might want to try:
git commit-graph write --reachable
That reduces the slow case for me by a factor of 10. And likewise other
traversal operations should get faster.
I think we'll build the commit graph file by default these days when you
run "git gc". But we don't build it immediately after cloning. Perhaps
we should change that.
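
As a rough sketch of the suggestion above (not something posted in this
thread), a shell helper could write a commit-graph only when the
repository does not appear to have one yet. The function names
has_commit_graph and ensure_commit_graph are made up for illustration,
and the two paths checked under objects/info are an assumption about
Git's default on-disk layout.

```shell
# Hypothetical helper names; only the git commands themselves are real.

# Succeeds if the repository at $1 (default: current directory) already
# has a commit-graph file or a split commit-graph chain.
has_commit_graph() (
    cd "${1:-.}" || exit 1
    info=$(git rev-parse --git-path objects/info) || exit 1
    [ -f "$info/commit-graph" ] ||
        [ -f "$info/commit-graphs/commit-graph-chain" ]
)

# Writes a commit-graph covering all reachable commits, unless one exists.
ensure_commit_graph() {
    if has_commit_graph "${1:-.}"; then
        echo "commit-graph already present"
    else
        git -C "${1:-.}" commit-graph write --reachable
    fi
}
```

Running "ensure_commit_graph /path/to/linux" once after a fresh clone
should be enough; later "git gc" runs are expected to keep the file up
to date.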
-Peff
* Re: [BUG] "git describe --match" performance
From: Josh Poimboeuf @ 2024-10-31 15:10 UTC (permalink / raw)
To: Jeff King; +Cc: git
On Thu, Oct 31, 2024 at 07:47:31AM -0400, Jeff King wrote:
> On Tue, Oct 29, 2024 at 09:43:22PM -0700, Josh Poimboeuf wrote:
>
> > $ time git describe --match=v6.10-rc7 --debug
> > describe HEAD
> > No exact match on refs or tags, searching to describe
> > finished search at d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
> > annotated 1844 v6.10-rc7
> > traversed 1282750 commits
> > v6.10-rc7-1844-gc61e41121036
> >
> > real 0m9.243s
> > user 0m8.940s
> > sys 0m0.268s
> >
> > $ time git describe
> > v6.10-rc7-1844-gc61e41121036
> >
> > real 0m0.149s
> > user 0m0.111s
> > sys 0m0.036s
>
> There's more discussion of the actual solution in the nearby thread from
> Rasmus. But I did want to note one thing here: when I initially tried to
> reproduce your problem, my "slow" case was a lot less bad.
>
> The reason is that I had a commit graph file to speed up traversal. So
> independent of the git-describe fix, you might want to try:
>
> git commit-graph write --reachable
>
> That reduces the slow case for me by a factor of 10. And likewise other
> traversal operations should get faster.
>
> I think we'll build the commit graph file by default these days when you
> run "git gc". But we don't build it immediately after cloning. Perhaps
> we should change that.
Hm... I actually ran "git gc" and it didn't seem to help at all.
--
Josh
* Re: [BUG] "git describe --match" performance
From: Jeff King @ 2024-10-31 15:31 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: git
On Thu, Oct 31, 2024 at 08:10:00AM -0700, Josh Poimboeuf wrote:
> > I think we'll build the commit graph file by default these days when you
> > run "git gc". But we don't build it immediately after cloning. Perhaps
> > we should change that.
>
> Hm... I actually ran "git gc" and it didn't seem to help at all.
What version of Git are you running? I think gc enabled it by default in
31b1de6a09 (commit-graph: turn on commit-graph by default, 2019-08-13),
which is v2.24.0.
You could also try "git commit-graph write --reachable" and see if that
improves things. If it doesn't, then maybe you have the reading side
turned off explicitly for some reason? Try "git config core.commitgraph"
to see if you have that set to "false".
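
A bare "git config core.commitgraph" prints nothing when the key is
unset, which is easy to misread. As a small sketch (the function name
commit_graph_reading_enabled is hypothetical), the effective value can
be printed explicitly by supplying the default. This assumes Git 2.24
or later, where the setting defaults to true.

```shell
# Hypothetical helper: prints "true" or "false" depending on whether the
# repository at $1 (default: current directory) is configured to read
# commit-graph files. When core.commitGraph is unset, the supplied
# default of "true" is printed (the default in Git 2.24 and later).
commit_graph_reading_enabled() {
    git -C "${1:-.}" config --type=bool --default=true core.commitGraph
}
```

The --type=bool option also normalizes spellings such as "yes" or "0"
to "true" or "false", so the output is easy to compare in scripts.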
-Peff
* Re: [BUG] "git describe --match" performance
From: Kristoffer Haugsbakk @ 2024-10-31 16:14 UTC (permalink / raw)
To: Jeff King, Josh Poimboeuf; +Cc: git, rohitner1
On Thu, Oct 31, 2024, at 12:47, Jeff King wrote:
> git commit-graph write --reachable
>
> That reduces the slow case for me by a factor of 10. And likewise other
> traversal operations should get faster.
>
> I think we'll build the commit graph file by default these days when you
> run "git gc". But we don't build it immediately after cloning. Perhaps
> we should change that.
>
> -Peff
There was this thread from last year where Rohit cloned Linux and the
command took more than twelve seconds. Then git-commit-graph(1) fixed
it.
https://lore.kernel.org/git/CAKazavxTXwcZFtL2XyU3MpaUR=snWY8w8Lwpco+mkbqm2nWE=w@mail.gmail.com/
It would be nice if the graph was written on clone. With the status quo
you might think that there is a performance bug (if that’s the term)
somewhere. Then you make a reproduction script using git-clone(1) in
order to have a blank slate. Of course it reproduces every time. But
the slow git-log(1) doesn’t happen for people who have had the repo long
enough for a GC to hit.
* Re: [BUG] "git describe --match" performance
From: Josh Poimboeuf @ 2024-10-31 16:25 UTC (permalink / raw)
To: Jeff King; +Cc: git
On Thu, Oct 31, 2024 at 11:31:43AM -0400, Jeff King wrote:
> On Thu, Oct 31, 2024 at 08:10:00AM -0700, Josh Poimboeuf wrote:
>
> > > I think we'll build the commit graph file by default these days when you
> > > run "git gc". But we don't build it immediately after cloning. Perhaps
> > > we should change that.
> >
> > Hm... I actually ran "git gc" and it didn't seem to help at all.
>
> What version of Git are you running? I think gc enabled it by default in
> 31b1de6a09 (commit-graph: turn on commit-graph by default, 2019-08-13),
> which is v2.24.0.
>
> You could also try "git commit-graph write --reachable" and see if that
> improves things. If it doesn't, then maybe you have the reading side
> turned off explicitly for some reason? Try "git config core.commitgraph"
> to see if you have that set to "false".
Actually, I take that back. You're right, in the freshly cloned repo,
"git gc" did help ~10x:
$ git describe --match=v6.10-rc7
v6.10-rc7-1844-gc61e41121036
real 0m10.681s
user 0m9.442s
sys 0m0.507s
$ git gc
Enumerating objects: 10460179, done.
Counting objects: 100% (10460179/10460179), done.
Delta compression using up to 12 threads
Compressing objects: 100% (1886458/1886458), done.
Writing objects: 100% (10460179/10460179), done.
Total 10460179 (delta 8517403), reused 10460179 (delta 8517403), pack-reused 0 (from 0)
Expanding reachable commits in commit graph: 1310355, done.
Writing out commit graph in 5 passes: 100% (6551775/6551775), done.
$ time git describe --match=v6.10-rc7
v6.10-rc7-1844-gc61e41121036
real 0m1.173s
user 0m1.002s
sys 0m0.136s
But my real development repo, which has many branches and remotes plus
the historical git repo grafted, still takes 10+ seconds.
$ git --version
git version 2.47.0
$ git gc
Enumerating objects: 14656254, done.
Counting objects: 100% (12534942/12534942), done.
Delta compression using up to 12 threads
Compressing objects: 100% (1829918/1829918), done.
Writing objects: 100% (12534942/12534942), done.
Total 12534942 (delta 10652548), reused 12534853 (delta 10652487), pack-reused 0 (from 0)
Enumerating cruft objects: 6133, done.
Traversing cruft objects: 14736, done.
Counting objects: 100% (6133/6133), done.
Delta compression using up to 12 threads
Compressing objects: 100% (1179/1179), done.
Writing objects: 100% (6133/6133), done.
Total 6133 (delta 4876), reused 6117 (delta 4865), pack-reused 0 (from 0)
$ git commit-graph write --reachable
Expanding reachable commits in commit graph: 1941353, done.
Finding extra edges in commit graph: 100% (1941353/1941353), done.
Writing out commit graph in 5 passes: 100% (9706765/9706765), done.
$ git config core.commitgraph
$
$ git describe --match=v6.12-rc5 --debug
describe HEAD
No exact match on refs or tags, searching to describe
finished search at d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
annotated 297 v6.12-rc5
traversed 1310258 commits
v6.12-rc5-297-ge7427640278f
real 0m11.626s
user 0m11.298s
sys 0m0.289s
Note the commit it finishes at is from almost 20 years ago (I have
historical Linux git history grafted in which goes back to 1991):
commit d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
Author: Karsten Keil <kkeil@suse.de>
Date: Thu Apr 21 08:30:30 2005 -0700
[PATCH] fix for ISDN ippp filtering
We do not longer use DLT_LINUX_SLL for activ/pass filters but
DLT_PPP_WITHDIRECTION witch need 1 as outbound flag.
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Presumably only one candidate matches the "v6.12-rc5" glob (which is an
exact string, not a wildcard) so it tries to find 9 more but never finds
any?
Since it's not a wildcard pattern, I would expect it to stop immediately
when it reaches the exact match.
--
Josh
* Re: [BUG] "git describe --match" performance
From: Jeff King @ 2024-10-31 17:05 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: git
On Thu, Oct 31, 2024 at 09:25:22AM -0700, Josh Poimboeuf wrote:
> But my real development repo, which has many branches and remotes plus
> the historical git repo grafted, still takes 10+ seconds.
If grafts are present, the use of the commit-graph is disabled. The
point of the commit-graph is to precompute and cache various properties
of commits, and those are immutable only when nothing is grafted.
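
To check whether a given repository is in that situation, something
like the following sketch may help. The function name
commit_graph_blockers is hypothetical, and the three conditions it
looks for (replace refs, a legacy grafts file, a shallow clone) are my
understanding of what makes Git skip the commit-graph, not a statement
of the exact internal check.

```shell
# Hypothetical helper: prints one line for each condition found in the
# repository at $1 (default: current directory) that is believed to make
# Git ignore commit-graph files. Prints nothing if none is found.
commit_graph_blockers() (
    cd "${1:-.}" || exit 1
    # Replace refs, e.g. those created by "git replace --graft".
    if [ -n "$(git for-each-ref refs/replace/)" ]; then
        echo "replace-refs"
    fi
    # Legacy grafts file with at least one entry.
    grafts=$(git rev-parse --git-path info/grafts) || exit 1
    if [ -s "$grafts" ]; then
        echo "grafts-file"
    fi
    # Shallow clones cut history off, which acts like a graft as well.
    if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
        echo "shallow"
    fi
)
```

If this prints anything, writing a commit-graph is unlikely to speed up
traversal in that repository until the listed condition is removed.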
I think we talked long ago about computing commit-graphs over the
grafted state, and then using those graphs as long as the graft state
remained the same. But I don't think we ever implemented anything.
Another possibility (that I don't recall we've ever discussed) is
partially using commit graphs. Some commit properties, like generation
numbers, depend on other commits. So a graft at the bottom of history is
going to rewrite the generations for all of the descendants. But we
could still use the graph information to load the parents and trees of
all of the non-grafted commits. Those are still valid even in a grafted
situation, and that's what's providing most of the speed up in this case
(without it, we're literally zlib inflating each commit we traverse in
order to find its parents, versus an integer lookup via the
commit-graph).
That might not be _too_ hard to implement. In theory, anyway. :)
> Note the commit it finishes at is from almost 20 years ago (I have
> historical Linux git history grafted in which goes back to 1991):
>
> commit d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
> Author: Karsten Keil <kkeil@suse.de>
> Date: Thu Apr 21 08:30:30 2005 -0700
>
> [PATCH] fix for ISDN ippp filtering
>
> We do not longer use DLT_LINUX_SLL for activ/pass filters but
> DLT_PPP_WITHDIRECTION witch need 1 as outbound flag.
>
> Signed-off-by: Karsten Keil <kkeil@suse.de>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
>
> Presumably only one candidate matches the "v6.12-rc5" glob (which is an
> exact string, not a wildcard) so it tries to find 9 more but never finds
> any?
>
> Since it's not a wildcard pattern, I would expect it to stop immediately
> when it reaches the exact match.
Yeah, I think this is just the same issue we've been discussing.
-Peff
* Re: [BUG] "git describe --match" performance
From: Jeff King @ 2024-10-31 17:20 UTC (permalink / raw)
To: Kristoffer Haugsbakk; +Cc: Josh Poimboeuf, git, rohitner1
On Thu, Oct 31, 2024 at 05:14:01PM +0100, Kristoffer Haugsbakk wrote:
> On Thu, Oct 31, 2024, at 12:47, Jeff King wrote:
> > git commit-graph write --reachable
> >
> > That reduces the slow case for me by a factor of 10. And likewise other
> > traversal operations should get faster.
> >
> > I think we'll build the commit graph file by default these days when you
> > run "git gc". But we don't build it immediately after cloning. Perhaps
> > we should change that.
> >
> > -Peff
>
> There was this thread from last year where Rohit cloned Linux and the
> command took more than twelve seconds. Then git-commit-graph(1) fixed
> it.
>
> https://lore.kernel.org/git/CAKazavxTXwcZFtL2XyU3MpaUR=snWY8w8Lwpco+mkbqm2nWE=w@mail.gmail.com/
Yeah, there's discussion in the linked thread there about running "git
gc" after clone. If "gc --auto" was pruned down to run some minimal
bits, that might be OK. What we definitely _don't_ want to do is run
"git repack", because it's very expensive to do the full object graph
walk (and buys nothing on a freshly cloned repo unless you do the even
more expensive "-f").
So the simplest path forward, though it is a little messy, is to just
run "commit-graph write" after the clone. In fact, we already have
fetch.writeCommitGraph (though it does still default to "false"). I'd
expect that to work with clone (since it's conceptually init+fetch), but
it doesn't seem to. Looks like the code to trigger it is directly in
builtin/fetch.c, and clone triggers the fetch itself internally.
But I don't think factoring that out and calling it from both places
would be too hard.
It's a bigger question whether people might be annoyed by some extra
computation at clone time. But I suspect it's OK in practice. Even
ignoring the cost of moving it over the network, my 8-core machine takes
almost 3 minutes to index the linux.git packfile on clone. Building the
commit graph after that takes about 13 seconds.
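
Until something like that exists in Git itself, the same effect can be
approximated from the outside. The wrapper below is only a sketch with
a made-up name (clone_with_commit_graph); it is not how a built-in
option would be implemented.

```shell
# Hypothetical wrapper: clone a repository, then immediately write a
# commit-graph covering all reachable commits in the new clone.
# Usage: clone_with_commit_graph <url-or-path> <directory>
clone_with_commit_graph() {
    git clone "$1" "$2" &&
        git -C "$2" commit-graph write --reachable
}
```

For example, "clone_with_commit_graph https://github.com/torvalds/linux
linux" would pay the graph-writing cost once, right after the clone.
Setting fetch.writeCommitGraph to true separately should keep the graph
fresh on later fetches.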
-Peff
* Re: [BUG] "git describe --match" performance
From: Taylor Blau @ 2024-10-31 19:00 UTC (permalink / raw)
To: Jeff King; +Cc: Josh Poimboeuf, git
On Thu, Oct 31, 2024 at 07:47:31AM -0400, Jeff King wrote:
> I think we'll build the commit graph file by default these days when you
> run "git gc". But we don't build it immediately after cloning. Perhaps
> we should change that.
I think that would be a reasonable thing to do. We already have
fetch.writeCommitGraph and gc.writeCommitGraph, so it seems like a
natural extension to add clone.writeCommitGraph.
I don't have a strong feeling about what the default should be, although
I err on the side of "true". Most repositories cloned will be small
enough that writing a commit-graph upon clone shouldn't take too long.
So the savings it will provide will be well worth the marginal increase
in perceived clone time.
Thanks,
Taylor
* Re: [BUG] "git describe --match" performance
From: Taylor Blau @ 2024-10-31 19:01 UTC (permalink / raw)
To: Jeff King; +Cc: Josh Poimboeuf, git, Derrick Stolee
On Thu, Oct 31, 2024 at 01:05:26PM -0400, Jeff King wrote:
> I think we talked long ago about computing commit-graphs over the
> grafted state, and then using those graphs as long as the graft state
> remained the same. But I don't think we ever implemented anything.
>
> Another possibility (that I don't recall we've ever discussed) is
> partially using commit graphs. Some commit properties, like generation
> numbers, depend on other commits. So a graft at the bottom of history is
> going to rewrite the generations for all of the descendants. But we
> could still use the graph information to load the parents and trees of
> all of the non-grafted commits. Those are still valid even in a grafted
> situation, and that's what's providing most of the speed up in this case
> (without it, we're literally zlib inflating each commit we traverse in
> order to find its parents, versus an integer lookup via the
> commit-graph).
>
> That might not be _too_ hard to implement. In theory, anyway. :)
Adding Stolee (CC'd), our resident commit-graph expert, to see if they
have any thoughts.
Thanks,
Taylor