* "submodule foreach" much slower than removed "submodule--helper --list"
@ 2022-10-15 16:50 Jonas Bernoulli
2022-10-15 17:34 ` Jonas Bernoulli
0 siblings, 1 reply; 14+ messages in thread
From: Jonas Bernoulli @ 2022-10-15 16:50 UTC (permalink / raw)
To: git
In v2.38.0 (31955475d1c283120d5d84247eb3fd55d9f5fdd9)
"submodule--helper --list" was remove because
> We're not getting anything useful from the "list | cut -f2"
> invocation that we couldn't get from "foreach 'echo $sm_path'".
But we get speed (this is with about one hundred modules):
$ time git submodule foreach -q 'echo $sm_path' > /dev/null
real 0m0.585s
user 0m0.413s
sys 0m0.182s
$ time git submodule--helper list > /dev/null
real 0m0.008s
user 0m0.004s
sys 0m0.004s
Please consider restoring this subcommand or providing something
equivalent that is just as fast.
Thanks,
Jonas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 16:50 "submodule foreach" much slower than removed "submodule--helper --list" Jonas Bernoulli @ 2022-10-15 17:34 ` Jonas Bernoulli 2022-10-15 18:13 ` Jeff King 2022-10-17 16:50 ` Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 14+ messages in thread From: Jonas Bernoulli @ 2022-10-15 17:34 UTC (permalink / raw) To: git I just noticed that "submodule--helper name" was also removed, which I also found useful in scripts. Please tell me if I am missing something, but it seems I now have to do something like this instead: git config -f .gitmodules --list | sed -n "s|^submodule.\([^.]*\).path=$path\$|\1|p" The old way was nicer: git submodule--helper name $path I realize submodule--helper is for internal use and using it anyway comes with the risk of such removals and other changes, but again, please consider restoring that or providing something similar in the public interface. Cheers, Jonas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 17:34 ` Jonas Bernoulli @ 2022-10-15 18:13 ` Jeff King 2022-10-15 18:16 ` Junio C Hamano 2022-10-15 22:40 ` Junio C Hamano 2022-10-17 16:50 ` Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 14+ messages in thread From: Jeff King @ 2022-10-15 18:13 UTC (permalink / raw) To: Jonas Bernoulli; +Cc: git On Sat, Oct 15, 2022 at 06:50:36PM +0200, Jonas Bernoulli wrote: > > We're not getting anything useful from the "list | cut -f2" > > invocation that we couldn't get from "foreach 'echo $sm_path'". > > But we get speed (this is with about one hundred modules): > > $ time git submodule foreach -q 'echo $sm_path' > /dev/null > > real 0m0.585s > user 0m0.413s > sys 0m0.182s > > $ time git submodule--helper list > /dev/null > > real 0m0.008s > user 0m0.004s > sys 0m0.004s Yeah, that's not too surprising that it's slower. It's exec-ing a bunch of shells to do that. I don't really use submodules and haven't worked much on them either, but maybe one of these works: - the output of the old "submodule--helper list" looks a lot like "ls-files" dumping the index and filtering on submodule entries. Running: git ls-files -s | grep ^160000 produces the same output. - If you just want the oid/path of each, then "git submodule status" prints those. You might want "--cached" to avoid spending extra time checking the status (though it kind of looks like we may run "git describe" in each anyway; yikes!) I'm not sure if those are exactly equivalent, either. It looks like the old code was probably respecting submodule active markers (though in my test repo without the submodule actually checked out, it's still reported). There is probably room for a user-facing "git submodule list" command, but again, I don't really know enough about the space to say what it should report. -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 18:13 ` Jeff King @ 2022-10-15 18:16 ` Junio C Hamano 2022-10-15 18:23 ` Jeff King 2022-10-15 22:40 ` Junio C Hamano 1 sibling, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2022-10-15 18:16 UTC (permalink / raw) To: Jeff King; +Cc: Jonas Bernoulli, git Jeff King <peff@peff.net> writes: > Yeah, that's not too surprising that it's slower. It's exec-ing a bunch > of shells to do that. > ... > I'm not sure if those are exactly equivalent, either. It looks like the > old code was probably respecting submodule active markers (though in my > test repo without the submodule actually checked out, it's still > reported). There is probably room for a user-facing "git submodule list" > command, but again, I don't really know enough about the space to say > what it should report. We could mimic "find" and make "foreach" that has no action default to an equivalent of "echo" that is done internally without forking. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 18:16 ` Junio C Hamano @ 2022-10-15 18:23 ` Jeff King 2022-10-15 18:37 ` Junio C Hamano 0 siblings, 1 reply; 14+ messages in thread From: Jeff King @ 2022-10-15 18:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonas Bernoulli, git On Sat, Oct 15, 2022 at 11:16:06AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Yeah, that's not too surprising that it's slower. It's exec-ing a bunch > > of shells to do that. > > ... > > I'm not sure if those are exactly equivalent, either. It looks like the > > old code was probably respecting submodule active markers (though in my > > test repo without the submodule actually checked out, it's still > > reported). There is probably room for a user-facing "git submodule list" > > command, but again, I don't really know enough about the space to say > > what it should report. > > We could mimic "find" and make "foreach" that has no action default > to an equivalent of "echo" that is done internally without forking. That would be reasonable to me, though I wonder what the output format should be. Just name, or name/oid? Once you start having options, you might as well just add a separate "git submodule list" that can take them. -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 18:23 ` Jeff King @ 2022-10-15 18:37 ` Junio C Hamano 0 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2022-10-15 18:37 UTC (permalink / raw) To: Jeff King; +Cc: Jonas Bernoulli, git Jeff King <peff@peff.net> writes: >> We could mimic "find" and make "foreach" that has no action default >> to an equivalent of "echo" that is done internally without forking. > > That would be reasonable to me, though I wonder what the output format > should be. Just name, or name/oid? Once you start having options, you > might as well just add a separate "git submodule list" that can take > them. Users of "find" seem to be happy with "-print" being the default, so whatever we were doing with the internal "helper --list" would be appropriate here. Folks who want different output can feed their own action. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 18:13 ` Jeff King 2022-10-15 18:16 ` Junio C Hamano @ 2022-10-15 22:40 ` Junio C Hamano 2022-10-17 17:02 ` Jeff King 2022-11-07 11:01 ` Matti Möll 1 sibling, 2 replies; 14+ messages in thread From: Junio C Hamano @ 2022-10-15 22:40 UTC (permalink / raw) To: Jeff King; +Cc: Jonas Bernoulli, git Jeff King <peff@peff.net> writes: > - the output of the old "submodule--helper list" looks a lot like > "ls-files" dumping the index and filtering on submodule entries. > Running: > > git ls-files -s | grep ^160000 > > produces the same output. Indeed that was what we wrote in the scripted Porcelain. This is one of the times I wish we didn't turn things to C in piecemeal. > I'm not sure if those are exactly equivalent, either. It looks like the > old code was probably respecting submodule active markers (though in my > test repo without the submodule actually checked out, it's still > reported). "submodule--helper list" being the "give me the gitlinks", to be filtered with things like pathspec and presence of .git (i.e. being activated), you are right that those who liked the output from it would not be happy with "foreach" anyway. The comparison in the original was apples and oranges because "foreach" should show only the ones that are active in the current checkout. If one likes the output from "submodule--helper list" so much, I think your "ls-files" above should be the closest. There seems to have existed some logic to squash unmerged entries down to a single one, too (git-submodule.sh in Git 2.0.0 era has a module_list shell function that shows what "helper list" should be doing), though. Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 22:40 ` Junio C Hamano @ 2022-10-17 17:02 ` Jeff King 2022-11-07 11:01 ` Matti Möll 1 sibling, 0 replies; 14+ messages in thread From: Jeff King @ 2022-10-17 17:02 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonas Bernoulli, git On Sat, Oct 15, 2022 at 03:40:16PM -0700, Junio C Hamano wrote: > If one likes the output from "submodule--helper list" so much, I > think your "ls-files" above should be the closest. There seems to > have existed some logic to squash unmerged entries down to a single > one, too (git-submodule.sh in Git 2.0.0 era has a module_list shell > function that shows what "helper list" should be doing), though. Ah, good digging. I briefly looked at the patch removing the submodule--helper version, saw the word "active", and assumed it was checking that. On closer inspection, it is just the global "active_nr" for iterating over the index. ;) So yeah, using "ls-files" is a suitable replacement for Jonas's original complaint. I'll leave it up to folks more interested and knowledgeable on submodules to discuss whether there ought to be a listing command that's more aware of gitmodules, populated fields, active flags, etc. -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 22:40 ` Junio C Hamano 2022-10-17 17:02 ` Jeff King @ 2022-11-07 11:01 ` Matti Möll 2022-11-08 2:50 ` Taylor Blau 1 sibling, 1 reply; 14+ messages in thread From: Matti Möll @ 2022-11-07 11:01 UTC (permalink / raw) To: Junio C Hamano, Jeff King Cc: Jonas Bernoulli, git, Ævar Arnfjörð Bjarmason On 16.10.22 00:40, Junio C Hamano wrote: > > If one likes the output from "submodule--helper list" so much, I > think your "ls-files" above should be the closest. There seems to > have existed some logic to squash unmerged entries down to a single > one, too (git-submodule.sh in Git 2.0.0 era has a module_list shell > function that shows what "helper list" should be doing), though. I do actually like that submodule is hoisted out of the scripts but as a matter of fact it seems that the openembedded folks relied on the "submodule--helper list" call and with later git versions that doesn't work anymore. https://github.com/openembedded/openembedded-core/commit/6d9364e5f3535954f65cbbc694ee7933ac1d664f At the end of the day it's kinda their fault to depend on the submodule--helper but it does hinder adoption of new git versions for people using openembedded. Cheers, Matti ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-11-07 11:01 ` Matti Möll @ 2022-11-08 2:50 ` Taylor Blau 2022-11-08 3:37 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 14+ messages in thread From: Taylor Blau @ 2022-11-08 2:50 UTC (permalink / raw) To: Matti Möll Cc: Junio C Hamano, Jeff King, Jonas Bernoulli, git, Ævar Arnfjörð Bjarmason On Mon, Nov 07, 2022 at 12:01:40PM +0100, Matti Möll wrote: > On 16.10.22 00:40, Junio C Hamano wrote: > > > > If one likes the output from "submodule--helper list" so much, I > > think your "ls-files" above should be the closest. There seems to > > have existed some logic to squash unmerged entries down to a single > > one, too (git-submodule.sh in Git 2.0.0 era has a module_list shell > > function that shows what "helper list" should be doing), though. > > I do actually like that submodule is hoisted out of the scripts but as a > matter of fact it seems that the openembedded folks relied on the > "submodule--helper list" call and with later git versions that doesn't work > anymore. > > https://github.com/openembedded/openembedded-core/commit/6d9364e5f3535954f65cbbc694ee7933ac1d664f > > At the end of the day it's kinda their fault to depend on the > submodule--helper but it does hinder adoption of new git versions for people > using openembedded. Yeah, I agree. It's unfortunate, but I think we consider anything with '--' in the builtin name to be outside of our compatibility guarantee. I recall a recent message on the list which where Junio said as much, but I can't seem to find it :-<. Thanks, Taylor ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-11-08 2:50 ` Taylor Blau @ 2022-11-08 3:37 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 14+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-08 3:37 UTC (permalink / raw) To: Taylor Blau Cc: Matti Möll, Junio C Hamano, Jeff King, Jonas Bernoulli, git On Mon, Nov 07 2022, Taylor Blau wrote: > On Mon, Nov 07, 2022 at 12:01:40PM +0100, Matti Möll wrote: >> On 16.10.22 00:40, Junio C Hamano wrote: >> > >> > If one likes the output from "submodule--helper list" so much, I >> > think your "ls-files" above should be the closest. There seems to >> > have existed some logic to squash unmerged entries down to a single >> > one, too (git-submodule.sh in Git 2.0.0 era has a module_list shell >> > function that shows what "helper list" should be doing), though. >> >> I do actually like that submodule is hoisted out of the scripts but as a >> matter of fact it seems that the openembedded folks relied on the >> "submodule--helper list" call and with later git versions that doesn't work >> anymore. >> >> https://github.com/openembedded/openembedded-core/commit/6d9364e5f3535954f65cbbc694ee7933ac1d664f >> >> At the end of the day it's kinda their fault to depend on the >> submodule--helper but it does hinder adoption of new git versions for people >> using openembedded. > > Yeah, I agree. It's unfortunate, but I think we consider anything with > '--' in the builtin name to be outside of our compatibility guarantee. > > I recall a recent message on the list which where Junio said as much, > but I can't seem to find it :-<. We've never even documented it, so the uses that have been made of it have been the result of source spelunking. B.t.w. the openembedded commit above using "git config" is not going to get the same results as "git submodule--helper list", or the equivalent "git submodule foreach", but perhaps it works well enough for what they need it for... ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-15 17:34 ` Jonas Bernoulli 2022-10-15 18:13 ` Jeff King @ 2022-10-17 16:50 ` Ævar Arnfjörð Bjarmason 2022-10-17 17:46 ` Jeff King 2022-10-21 14:58 ` Jonas Bernoulli 1 sibling, 2 replies; 14+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-10-17 16:50 UTC (permalink / raw) To: Jonas Bernoulli; +Cc: git Replying to both this & the parent post: On Sat, Oct 15 2022, Jonas Bernoulli wrote: > In v2.38.0 (31955475d1c283120d5d84247eb3fd55d9f5fdd9) > "submodule--helper --list" was remove because > >> We're not getting anything useful from the "list | cut -f2" >> invocation that we couldn't get from "foreach 'echo $sm_path'". > > But we get speed (this is with about one hundred modules): > > $ time git submodule foreach -q 'echo $sm_path' > /dev/null > > real 0m0.585s > user 0m0.413s > sys 0m0.182s > > $ time git submodule--helper list > /dev/null > > real 0m0.008s > user 0m0.004s > sys 0m0.004s > > Please consider restoring this subcommand or providing something > equivalent that is just as fast. Sorry about the slowdown, the removal of "list" was just an in-between step to migrating "submodule" to a full built-in. I can't reproduce anything like the 8ms v.s. ~600ms difference you note here, but for me migrating it to a built-in is around 10% slower with "foreach" than the old "list". I wonder what results you get? I sent in a topic to migrate it since you sent this report. I was going to do it in this development cycle, but this prompted me to do it earlier: https://lore.kernel.org/git/cover-00.10-00000000000-20221017T115544Z-avarab@gmail.com/ On Sat, Oct 15 2022, Jonas Bernoulli wrote: > I just noticed that "submodule--helper name" was also removed, which I > also found useful in scripts. Please tell me if I am missing something, > but it seems I now have to do something like this instead: > > git config -f .gitmodules --list | > sed -n "s|^submodule.\([^.]*\).path=$path\$|\1|p" > > The old way was nicer: > > git submodule--helper name $path > > I realize submodule--helper is for internal use and using it anyway > comes with the risk of such removals and other changes, but again, > please consider restoring that or providing something similar in the > public interface. This however is another case, I removed "name" along with "list" and other leftover code we weren't using anymore for the internal-only "submodule--helper" (which is at turns out, was not as internal-only as we'd hoped). For "list" it's clear how to use "foreach" instead, but for "name" then AFAICT the "best" replacement is to do a: git submodule foreach 'echo $displaypath $name' And pipe that into grep/sed. If that's fast enough would it satisfy your use-case, or would a "name" equivalent be handy? I think the best way to prove that would be e.g.: git submodule foreach-format '%{name}' -- <pathspec> Which, due to the "foreach" taking N number of arguments isn't easy to add to "foreach" without the interface becoming somewhat tortured (we could add a [---pathspec=<pathspec>]...), but "-- <pathspec>" with a different subcommand name seems better. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-17 16:50 ` Ævar Arnfjörð Bjarmason @ 2022-10-17 17:46 ` Jeff King 2022-10-21 14:58 ` Jonas Bernoulli 1 sibling, 0 replies; 14+ messages in thread From: Jeff King @ 2022-10-17 17:46 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Jonas Bernoulli, git On Mon, Oct 17, 2022 at 06:50:06PM +0200, Ævar Arnfjörð Bjarmason wrote: > Sorry about the slowdown, the removal of "list" was just an in-between > step to migrating "submodule" to a full built-in. > > I can't reproduce anything like the 8ms v.s. ~600ms difference you note > here, but for me migrating it to a built-in is around 10% slower with > "foreach" than the old "list". I wonder what results you get? I don't think "foreach" can ever be as performant as a bare "list". It inherently is going to do O(n) fork+execs. So it's OK if you have one or two submodules, but not if you have hundreds. If it treated "echo" as a builtin and served it in-process, it could avoid that extra overhead. But there are so many corner cases there, you'd probably do better to add "list --format" instead. -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: "submodule foreach" much slower than removed "submodule--helper --list" 2022-10-17 16:50 ` Ævar Arnfjörð Bjarmason 2022-10-17 17:46 ` Jeff King @ 2022-10-21 14:58 ` Jonas Bernoulli 1 sibling, 0 replies; 14+ messages in thread From: Jonas Bernoulli @ 2022-10-21 14:58 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git Sorry for going silent right after bringing this up. Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > I can't reproduce anything like the 8ms v.s. ~600ms difference you note > here, but for me migrating it to a built-in is around 10% slower with > "foreach" than the old "list". I wonder what results you get? The repository, in which I observed this slowdown, has one hundred modules. > I sent in a topic to migrate it since you sent this report. I was going > to do it in this development cycle, but this prompted me to do it > earlier: Thanks! A lot more is happening here than I can quickly understand, but the last commit mentions that the slowdown is now just 0.1, which would be good enough for me, I think. > On Sat, Oct 15 2022, Jonas Bernoulli wrote: > >> I just noticed that "submodule--helper name" was also removed, which I >> also found useful in scripts. Please tell me if I am missing something, >> but it seems I now have to do something like this instead: >> >> git config -f .gitmodules --list | >> sed -n "s|^submodule.\([^.]*\).path=$path\$|\1|p" >> >> The old way was nicer: >> >> git submodule--helper name $path >> >> I realize submodule--helper is for internal use and using it anyway >> comes with the risk of such removals and other changes, but again, >> please consider restoring that or providing something similar in the >> public interface. > > This however is another case, I removed "name" along with "list" and > other leftover code we weren't using anymore for the internal-only > "submodule--helper" (which is at turns out, was not as internal-only as > we'd hoped). > > For "list" it's clear how to use "foreach" instead, but for "name" then > AFAICT the "best" replacement is to do a: > > git submodule foreach 'echo $displaypath $name' > > And pipe that into grep/sed. If that's fast enough would it satisfy your > use-case, or would a "name" equivalent be handy? > > I think the best way to prove that would be e.g.: > > git submodule foreach-format '%{name}' -- <pathspec> > > Which, due to the "foreach" taking N number of arguments isn't easy to > add to "foreach" without the interface becoming somewhat tortured (we > could add a [---pathspec=<pathspec>]...), but "-- <pathspec>" with a > different subcommand name seems better. I agree, that adding support for "-- <pathspec>" to an existing or new subcommand, would make it unnecessary to bring back a "name" subcommand. Will "foreach"/"foreach-format" continue to be limited to active modules? Sometimes it would be nice to list all modules, including those that are inactive. As mentioned earlier "git ls-files -s | grep ^160000" is enough to get a list of the module paths, but sometimes we want more information, e.g., "git submodule list --include-inactive --format '$name $is_active submodule.$name.url' -- <pathspec>". Jonas ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-11-08 3:41 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-10-15 16:50 "submodule foreach" much slower than removed "submodule--helper --list" Jonas Bernoulli 2022-10-15 17:34 ` Jonas Bernoulli 2022-10-15 18:13 ` Jeff King 2022-10-15 18:16 ` Junio C Hamano 2022-10-15 18:23 ` Jeff King 2022-10-15 18:37 ` Junio C Hamano 2022-10-15 22:40 ` Junio C Hamano 2022-10-17 17:02 ` Jeff King 2022-11-07 11:01 ` Matti Möll 2022-11-08 2:50 ` Taylor Blau 2022-11-08 3:37 ` Ævar Arnfjörð Bjarmason 2022-10-17 16:50 ` Ævar Arnfjörð Bjarmason 2022-10-17 17:46 ` Jeff King 2022-10-21 14:58 ` Jonas Bernoulli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).