* [BUG] "git describe --match" performance
From: Josh Poimboeuf @ 2024-10-30 4:43 UTC
To: git

> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)

  $ git clone https://github.com/torvalds/linux
  $ cd linux
  $ git checkout c61e41121036

  $ time git describe --match=v6.10-rc7 --debug
  describe HEAD
  No exact match on refs or tags, searching to describe
  finished search at d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
   annotated       1844 v6.10-rc7
  traversed 1282750 commits
  v6.10-rc7-1844-gc61e41121036

  real    0m9.243s
  user    0m8.940s
  sys     0m0.268s

  $ time git describe
  v6.10-rc7-1844-gc61e41121036

  real    0m0.149s
  user    0m0.111s
  sys     0m0.036s

> What did you expect to happen? (Expected behavior)

I expected "git describe --match=v6.10-rc7" to be faster than plain
"git describe".

> What happened instead? (Actual behavior)

It takes over 9 seconds and traverses 1282750 commits.

(In my actual Linux git repo it's even worse at 15 seconds due to more
git history.)

> What's different between what you expected and what actually happened?

Over 9 seconds :-)

> Anything else you want to add:

I see this with both version 2.47.0 and the next branch.

This command is used by the kernel setlocalversion script, which is run
for every kernel build, so it adds 10-15 seconds to every build on an
untagged commit.

I suspect the problem is that there's only a single match for
"v6.10-rc7", but it tries to find 10 candidates so it ends up searching
the entire history.

But "--candidates=1" doesn't seem to help unless I add a second match
like so:

  $ time git describe --match=v6.10-rc7 --match=v6.10-rc6 --candidates=1
  v6.10-rc7-1844-gc61e41121036

  real    0m0.112s
  user    0m0.081s
  sys     0m0.031s

> Please review the rest of the bug report below.
> You can delete any lines you don't wish to share.

[System Info]
git version:
git version 2.47.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
libcurl: 8.6.0
OpenSSL: OpenSSL 3.2.2 4 Jun 2024
zlib: 1.3.1.zlib-ng
uname: Linux 6.10.12-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 30 21:38:25 UTC 2024 x86_64
compiler info: gnuc: 14.2
libc info: glibc: 2.39
$SHELL (typically, interactive shell): /bin/bash

[Enabled Hooks]

--
Josh
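The hypothesis in this report can be illustrated with a toy model. The sketch below is not Git's describe implementation: it assumes a purely linear history and assumes the walk only stops once it meets one more matching tag than --candidates allows, which is consistent with the timings reported above. The function name and its parameters are hypothetical.

```python
def commits_traversed(history_len, tagged_positions, max_candidates):
    """Toy model of the candidate search on a linear history.

    Position 0 is HEAD and position history_len - 1 is the root commit.
    tagged_positions holds the positions whose tag matches --match.
    """
    candidates = 0
    traversed = 0
    for position in range(history_len):
        traversed += 1
        if position in tagged_positions:
            if candidates < max_candidates:
                candidates += 1   # still room: remember this tag, keep walking
            else:
                break             # one tag too many: give up the walk here
    return traversed


# One matching tag, default of 10 candidates: the walk reaches the root.
print(commits_traversed(1000, {5}, 10))
# One matching tag, one candidate allowed: still reaches the root.
print(commits_traversed(1000, {5}, 1))
# Two matching tags, one candidate allowed: stops at the second tag.
print(commits_traversed(1000, {5, 8}, 1))
```

Under these assumptions, lowering the candidate limit alone changes nothing when only one tag can ever match, because the stop condition is never reached; a second matching tag gives the walk something to stop on.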
* Re: [BUG] "git describe --match" performance
From: Jeff King @ 2024-10-31 11:47 UTC
To: Josh Poimboeuf; +Cc: git

On Tue, Oct 29, 2024 at 09:43:22PM -0700, Josh Poimboeuf wrote:

>   $ time git describe --match=v6.10-rc7 --debug
>   describe HEAD
>   No exact match on refs or tags, searching to describe
>   finished search at d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
>    annotated       1844 v6.10-rc7
>   traversed 1282750 commits
>   v6.10-rc7-1844-gc61e41121036
>
>   real    0m9.243s
>   user    0m8.940s
>   sys     0m0.268s
>
>   $ time git describe
>   v6.10-rc7-1844-gc61e41121036
>
>   real    0m0.149s
>   user    0m0.111s
>   sys     0m0.036s

There's more discussion of the actual solution in the nearby thread from
Rasmus. But I did want to note one thing here: when I initially tried to
reproduce your problem, my "slow" case was a lot less bad.

The reason is that I had a commit graph file to speed up traversal. So
independent of the git-describe fix, you might want to try:

  git commit-graph write --reachable

That reduces the slow case for me by a factor of 10. And likewise other
traversal operations should get faster.

I think we'll build the commit graph file by default these days when you
run "git gc". But we don't build it immediately after cloning. Perhaps
we should change that.

-Peff
* Re: [BUG] "git describe --match" performance
From: Josh Poimboeuf @ 2024-10-31 15:10 UTC
To: Jeff King; +Cc: git

On Thu, Oct 31, 2024 at 07:47:31AM -0400, Jeff King wrote:
> On Tue, Oct 29, 2024 at 09:43:22PM -0700, Josh Poimboeuf wrote:
>
> >   $ time git describe --match=v6.10-rc7 --debug
> >   describe HEAD
> >   No exact match on refs or tags, searching to describe
> >   finished search at d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
> >    annotated       1844 v6.10-rc7
> >   traversed 1282750 commits
> >   v6.10-rc7-1844-gc61e41121036
> >
> >   real    0m9.243s
> >   user    0m8.940s
> >   sys     0m0.268s
> >
> >   $ time git describe
> >   v6.10-rc7-1844-gc61e41121036
> >
> >   real    0m0.149s
> >   user    0m0.111s
> >   sys     0m0.036s
>
> There's more discussion of the actual solution in the nearby thread from
> Rasmus. But I did want to note one thing here: when I initially tried to
> reproduce your problem, my "slow" case was a lot less bad.
>
> The reason is that I had a commit graph file to speed up traversal. So
> independent of the git-describe fix, you might want to try:
>
>   git commit-graph write --reachable
>
> That reduces the slow case for me by a factor of 10. And likewise other
> traversal operations should get faster.
>
> I think we'll build the commit graph file by default these days when you
> run "git gc". But we don't build it immediately after cloning. Perhaps
> we should change that.

Hm... I actually ran "git gc" and it didn't seem to help at all.

--
Josh
* Re: [BUG] "git describe --match" performance
From: Jeff King @ 2024-10-31 15:31 UTC
To: Josh Poimboeuf; +Cc: git

On Thu, Oct 31, 2024 at 08:10:00AM -0700, Josh Poimboeuf wrote:

> > I think we'll build the commit graph file by default these days when you
> > run "git gc". But we don't build it immediately after cloning. Perhaps
> > we should change that.
>
> Hm... I actually ran "git gc" and it didn't seem to help at all.

What version of Git are you running? I think gc enabled it by default in
31b1de6a09 (commit-graph: turn on commit-graph by default, 2019-08-13),
which is v2.24.0.

You could also try "git commit-graph write --reachable" and see if that
improves things. If it doesn't, then maybe you have the reading side
turned off explicitly for some reason? Try "git config core.commitgraph"
to see if you have that set to "false".

-Peff
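The version question above can be checked mechanically. The following is a minimal sketch that takes the v2.24.0 threshold from the message above as an assumption (the message itself hedges it with "I think"); the helper names are hypothetical and the input is the kind of string printed by "git --version".

```python
import re


def git_version_tuple(version_output):
    """Extract (major, minor, patch) from `git --version` style output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return tuple(int(part) for part in match.groups())


# Assumption taken from the message above: "git gc" writes the
# commit-graph by default starting with v2.24.0.
GC_WRITES_COMMIT_GRAPH_SINCE = (2, 24, 0)


def gc_writes_commit_graph_by_default(version_output):
    """Compare the parsed version against the assumed threshold."""
    return git_version_tuple(version_output) >= GC_WRITES_COMMIT_GRAPH_SINCE


# The version string from the original bug report.
print(gc_writes_commit_graph_by_default("git version 2.47.0"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "2.9" sorting after "2.24".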
* Re: [BUG] "git describe --match" performance
From: Josh Poimboeuf @ 2024-10-31 16:25 UTC
To: Jeff King; +Cc: git

On Thu, Oct 31, 2024 at 11:31:43AM -0400, Jeff King wrote:
> On Thu, Oct 31, 2024 at 08:10:00AM -0700, Josh Poimboeuf wrote:
>
> > > I think we'll build the commit graph file by default these days when you
> > > run "git gc". But we don't build it immediately after cloning. Perhaps
> > > we should change that.
> >
> > Hm... I actually ran "git gc" and it didn't seem to help at all.
>
> What version of Git are you running? I think gc enabled it by default in
> 31b1de6a09 (commit-graph: turn on commit-graph by default, 2019-08-13),
> which is v2.24.0.
>
> You could also try "git commit-graph write --reachable" and see if that
> improves things. If it doesn't, then maybe you have the reading side
> turned off explicitly for some reason? Try "git config core.commitgraph"
> to see if you have that set to "false".

Actually, I take that back. You're right, in the freshly cloned repo,
"git gc" did help ~10x:

  $ git describe --match=v6.10-rc7
  v6.10-rc7-1844-gc61e41121036

  real    0m10.681s
  user    0m9.442s
  sys     0m0.507s

  $ git gc
  Enumerating objects: 10460179, done.
  Counting objects: 100% (10460179/10460179), done.
  Delta compression using up to 12 threads
  Compressing objects: 100% (1886458/1886458), done.
  Writing objects: 100% (10460179/10460179), done.
  Total 10460179 (delta 8517403), reused 10460179 (delta 8517403), pack-reused 0 (from 0)
  Expanding reachable commits in commit graph: 1310355, done.
  Writing out commit graph in 5 passes: 100% (6551775/6551775), done.

  $ time git describe --match=v6.10-rc7
  v6.10-rc7-1844-gc61e41121036

  real    0m1.173s
  user    0m1.002s
  sys     0m0.136s

But my real development repo, which has many branches and remotes plus
the historical git repo grafted, still takes 10+ seconds.

  $ git --version
  git version 2.47.0

  $ git gc
  Enumerating objects: 14656254, done.
  Counting objects: 100% (12534942/12534942), done.
  Delta compression using up to 12 threads
  Compressing objects: 100% (1829918/1829918), done.
  Writing objects: 100% (12534942/12534942), done.
  Total 12534942 (delta 10652548), reused 12534853 (delta 10652487), pack-reused 0 (from 0)
  Enumerating cruft objects: 6133, done.
  Traversing cruft objects: 14736, done.
  Counting objects: 100% (6133/6133), done.
  Delta compression using up to 12 threads
  Compressing objects: 100% (1179/1179), done.
  Writing objects: 100% (6133/6133), done.
  Total 6133 (delta 4876), reused 6117 (delta 4865), pack-reused 0 (from 0)

  $ git commit-graph write --reachable
  Expanding reachable commits in commit graph: 1941353, done.
  Finding extra edges in commit graph: 100% (1941353/1941353), done.
  Writing out commit graph in 5 passes: 100% (9706765/9706765), done.

  $ git config core.commitgraph
  $

  $ git describe --match=v6.12-rc5 --debug
  describe HEAD
  No exact match on refs or tags, searching to describe
  finished search at d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
   annotated        297 v6.12-rc5
  traversed 1310258 commits
  v6.12-rc5-297-ge7427640278f

  real    0m11.626s
  user    0m11.298s
  sys     0m0.289s

Note the commit it finishes at is from almost 20 years ago (I have
historical Linux git history grafted in which goes back to 1991):

  commit d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
  Author: Karsten Keil <kkeil@suse.de>
  Date:   Thu Apr 21 08:30:30 2005 -0700

      [PATCH] fix for ISDN ippp filtering

      We do not longer use DLT_LINUX_SLL for activ/pass filters but
      DLT_PPP_WITHDIRECTION witch need 1 as outbound flag.

      Signed-off-by: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Presumably only one candidate matches the "v6.12-rc5" glob (which is an
exact string, not a wildcard) so it tries to find 9 more but never finds
any?

Since it's not a wildcard pattern, I would expect it to stop immediately
when it reaches the exact match.

--
Josh
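The "~10x" figure can be recomputed from the two "real" timings quoted for the freshly cloned repo. This is only a small arithmetic sketch; the helper name is hypothetical, and it handles just the "NmS.SSSs" format shown in this thread.

```python
import re


def seconds(time_field):
    """Convert a shell `time` field such as '0m10.681s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", time_field)
    if match is None:
        raise ValueError(f"unrecognized time field: {time_field!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


before_gc = seconds("0m10.681s")   # "real" before running "git gc"
after_gc = seconds("0m1.173s")     # "real" after the commit graph was written
speedup = before_gc / after_gc

print(f"speedup: {speedup:.1f}x")
```

With these two measurements the ratio comes out a little above 9, which matches the rough "~10x" in the message.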
* Re: [BUG] "git describe --match" performance
From: Jeff King @ 2024-10-31 17:05 UTC
To: Josh Poimboeuf; +Cc: git

On Thu, Oct 31, 2024 at 09:25:22AM -0700, Josh Poimboeuf wrote:

> But my real development repo, which has many branches and remotes plus
> the historical git repo grafted, still takes 10+ seconds.

If grafts are present, the use of the commit-graph is disabled, because
the point of the commit-graph is precomputing and caching various
properties of commits. Which, absent grafting, are immutable.

I think we talked long ago about computing commit-graphs over the
grafted state, and then using those graphs as long as the graft state
remained the same. But I don't think we ever implemented anything.

Another possibility (that I don't recall we've ever discussed) is
partially using commit graphs. Some commit properties, like generation
numbers, depend on other commits. So a graft at the bottom of history is
going to rewrite the generations for all of the descendants. But we
could still use the graph information to load the parents and trees of
all of the non-grafted commits. Those are still valid even in a grafted
situation, and that's what's providing most of the speed up in this case
(without it, we're literally zlib inflating each commit we traverse in
order to find its parents, versus an integer lookup via the
commit-graph).

That might not be _too_ hard to implement. In theory, anyway. :)

> Note the commit it finishes at is from almost 20 years ago (I have
> historical Linux git history grafted in which goes back to 1991):
>
>   commit d8470b7c13e11c18cf14a7e3180f0b00e715e4f0
>   Author: Karsten Keil <kkeil@suse.de>
>   Date:   Thu Apr 21 08:30:30 2005 -0700
>
>       [PATCH] fix for ISDN ippp filtering
>
>       We do not longer use DLT_LINUX_SLL for activ/pass filters but
>       DLT_PPP_WITHDIRECTION witch need 1 as outbound flag.
>
>       Signed-off-by: Karsten Keil <kkeil@suse.de>
>       Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
>
> Presumably only one candidate matches the "v6.12-rc5" glob (which is an
> exact string, not a wildcard) so it tries to find 9 more but never finds
> any?
>
> Since it's not a wildcard pattern, I would expect it to stop immediately
> when it reaches the exact match.

Yeah, I think this is just the same issue we've been discussing.

-Peff
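The point about which cached properties survive a graft can be sketched with a toy commit DAG. This is not Git's commit-graph code: the commit names are made up, and the generation number is assumed to be the simple topological level (1 for a root, otherwise one more than the largest parent generation).

```python
def generations(parents):
    """Topological level per commit: 1 for a root, else 1 + max over parents."""
    gen = {}

    def visit(commit):
        if commit not in gen:
            gen[commit] = 1 + max((visit(p) for p in parents[commit]), default=0)
        return gen[commit]

    for commit in parents:
        visit(commit)
    return gen


# Hypothetical linear history: A is the root, D is the tip.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}

# Hypothetical older history (X <- Y) grafted underneath A.
grafted = {**history, "X": [], "Y": ["X"], "A": ["Y"]}

print(generations(history))   # generations before the graft
print(generations(grafted))   # every descendant of A shifts by two

# Parent lists of the non-grafted commits are identical in both views.
print(all(grafted[c] == history[c] for c in ("B", "C", "D")))
```

In this model only the grafted commit A has a different parent list, while the generation of every commit above it changes, which is the distinction the partial-use idea relies on.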
* Re: [BUG] "git describe --match" performance
From: Taylor Blau @ 2024-10-31 19:01 UTC
To: Jeff King; +Cc: Josh Poimboeuf, git, Derrick Stolee

On Thu, Oct 31, 2024 at 01:05:26PM -0400, Jeff King wrote:
> I think we talked long ago about computing commit-graphs over the
> grafted state, and then using those graphs as long as the graft state
> remained the same. But I don't think we ever implemented anything.
>
> Another possibility (that I don't recall we've ever discussed) is
> partially using commit graphs. Some commit properties, like generation
> numbers, depend on other commits. So a graft at the bottom of history is
> going to rewrite the generations for all of the descendants. But we
> could still use the graph information to load the parents and trees of
> all of the non-grafted commits. Those are still valid even in a grafted
> situation, and that's what's providing most of the speed up in this case
> (without it, we're literally zlib inflating each commit we traverse in
> order to find its parents, versus an integer lookup via the
> commit-graph).
>
> That might not be _too_ hard to implement. In theory, anyway. :)

Adding Stolee (CC'd), our resident commit-graph expert, to see if they
have any thoughts.

Thanks,
Taylor
* Re: [BUG] "git describe --match" performance
From: Kristoffer Haugsbakk @ 2024-10-31 16:14 UTC
To: Jeff King, Josh Poimboeuf; +Cc: git, rohitner1

On Thu, Oct 31, 2024, at 12:47, Jeff King wrote:
>   git commit-graph write --reachable
>
> That reduces the slow case for me by a factor of 10. And likewise other
> traversal operations should get faster.
>
> I think we'll build the commit graph file by default these days when you
> run "git gc". But we don't build it immediately after cloning. Perhaps
> we should change that.
>
> -Peff

There was this thread from last year where Rohit cloned Linux and the
command took more than twelve seconds. Then git-commit-graph(1) fixed
it.

https://lore.kernel.org/git/CAKazavxTXwcZFtL2XyU3MpaUR=snWY8w8Lwpco+mkbqm2nWE=w@mail.gmail.com/

It would be nice if the graph was written on clone. With the status quo
you might think that there is a performance bug (if that’s the term)
somewhere. Then you make a reproduction script using git-clone(1) in
order to have a blank slate. Of course it reproduces every time. But
the slow git-log(1) doesn’t happen for people who have had the repo long
enough for a GC to hit.
* Re: [BUG] "git describe --match" performance
From: Jeff King @ 2024-10-31 17:20 UTC
To: Kristoffer Haugsbakk; +Cc: Josh Poimboeuf, git, rohitner1

On Thu, Oct 31, 2024 at 05:14:01PM +0100, Kristoffer Haugsbakk wrote:

> On Thu, Oct 31, 2024, at 12:47, Jeff King wrote:
> >   git commit-graph write --reachable
> >
> > That reduces the slow case for me by a factor of 10. And likewise other
> > traversal operations should get faster.
> >
> > I think we'll build the commit graph file by default these days when you
> > run "git gc". But we don't build it immediately after cloning. Perhaps
> > we should change that.
> >
> > -Peff
>
> There was this thread from last year where Rohit cloned Linux and the
> command took more than twelve seconds. Then git-commit-graph(1) fixed
> it.
>
> https://lore.kernel.org/git/CAKazavxTXwcZFtL2XyU3MpaUR=snWY8w8Lwpco+mkbqm2nWE=w@mail.gmail.com/

Yeah, there's discussion in the linked thread there about running "git
gc" after clone. If "gc --auto" was pruned down to run some minimal
bits, that might be OK. What we definitely _don't_ want to do is run
"git repack", because it's very expensive to do the full object graph
walk (and buys nothing on a freshly cloned repo unless you do the even
more expensive "-f").

So the simplest path forward, but which is a little messy, is to just
run "commit-graph write" after the clone. In fact, we already have
fetch.writeCommitGraph (though it does still default to "false"). I'd
expect that to work with clone (since it's conceptually init+fetch), but
it doesn't seem to.

Looks like the code to trigger it is directly in builtin/fetch.c, and
clone triggers the fetch itself internally. But I don't think factoring
that out and calling it from both places would be too hard.

It's a bigger question whether people might be annoyed by some extra
computation at clone time. But I suspect it's OK in practice. Even
ignoring the cost of moving it over the network, my 8-core machine takes
almost 3 minutes to index the linux.git packfile on clone. Building the
commit graph after that takes about 13 seconds.

-Peff
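The cost estimate above can be put into a rough number. The sketch below rounds "almost 3 minutes" up to 180 seconds, which is an assumption, and takes the 13 seconds at face value; the function name is hypothetical.

```python
def relative_overhead(extra_seconds, base_seconds):
    """Extra work expressed as a fraction of the existing work."""
    return extra_seconds / base_seconds


# Assumption: "almost 3 minutes" of packfile indexing is rounded up to 180 s.
index_seconds = 3 * 60
# "about 13 seconds" to build the commit graph afterwards.
graph_seconds = 13

overhead = relative_overhead(graph_seconds, index_seconds)
print(f"commit-graph write adds roughly {overhead:.1%} on top of indexing")
```

Since the real indexing time was somewhat under 180 seconds, the true fraction is slightly higher than this estimate, and it ignores network transfer time entirely, as the message does.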
* Re: [BUG] "git describe --match" performance
From: Taylor Blau @ 2024-10-31 19:00 UTC
To: Jeff King; +Cc: Josh Poimboeuf, git

On Thu, Oct 31, 2024 at 07:47:31AM -0400, Jeff King wrote:
> I think we'll build the commit graph file by default these days when you
> run "git gc". But we don't build it immediately after cloning. Perhaps
> we should change that.

I think that would be a reasonable thing to do. We already have
fetch.writeCommitGraph and gc.writeCommitGraph, so it seems like a
natural extension to add clone.writeCommitGraph.

I don't have a strong feeling about what the default should be, although
I err on the side of "true". Most repositories cloned will be small
enough that writing a commit-graph upon clone shouldn't take too long.
So the savings it will provide will be well worth the marginal increase
in perceived clone time.

Thanks,
Taylor