* [PATCH] kbuild: try readelf first in gen_symversions
From: Wentao Guan @ 2026-06-03 16:17 UTC
To: nathan; +Cc: nsc, tamird, linux-kbuild, linux-kernel, petr.pavlu, Wentao Guan

Use readelf to check whether <file>.o contains a __export_symbol_*.

readelf is faster than nm and significantly improves build speed
when CONFIG_MODVERSIONS is enabled.

Build of x86_64_defconfig on a 2C4T cloud server with CONFIG_MODVERSIONS=y:
With patch:
real 17m21.019s
user 61m48.388s
sys 4m27.709s
Without patch:
real 17m39.435s
user 62m24.686s
sys 5m3.200s

Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
---
 scripts/Makefile.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 3498d25b15e85..54a91bc144cce 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
 # be compiled and linked to the kernel and/or modules.

 gen_symversions =								\
-	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
+	if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then	\
 		$(cmd_gensymtypes_$1) >> $(dot-target).cmd;			\
 	fi

--
2.30.2

* Re: [PATCH] kbuild: try readelf first in gen_symversions
From: Nathan Chancellor @ 2026-06-04 1:38 UTC
To: Wentao Guan; +Cc: nsc, tamird, linux-kbuild, linux-kernel, petr.pavlu

On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> Use readelf to check whether <file>.o contains a __export_symbol_*.
>
> readelf is faster than nm and significantly improves build speed
> when CONFIG_MODVERSIONS is enabled.
>
> Build of x86_64_defconfig on a 2C4T cloud server with CONFIG_MODVERSIONS=y:
> With patch:
> real 17m21.019s
> user 61m48.388s
> sys 4m27.709s
> Without patch:
> real 17m39.435s
> user 62m24.686s
> sys 5m3.200s
>
> Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
> Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
> ---
>  scripts/Makefile.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index 3498d25b15e85..54a91bc144cce 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
>  # be compiled and linked to the kernel and/or modules.
>
>  gen_symversions =								\
> -	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
> +	if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then	\

This breaks modversioning for Clang LTO builds: llvm-nm can read LLVM
bitcode, but llvm-readelf cannot, because it expects strictly ELF.

Is there any performance gain with adding '-m1' to the grep command so
that it stops looking for a match after the first export symbol is
found?

>  		$(cmd_gensymtypes_$1) >> $(dot-target).cmd;			\
>  	fi
>
> --
> 2.30.2
>

--
Cheers,
Nathan

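[Editorial sketch] The '-m1' question above is about grep stopping at the first export marker. The following is a minimal shell sketch of the two grep variants, under stated assumptions: it does not run nm or readelf, `fake_symbols` is a hypothetical stand-in that prints nm-like lines, and the symbol names are invented. One thing worth keeping in mind when reading the timings in this thread: GNU grep documents that '-q' already exits as soon as a match is found, so on GNU grep '-m1' may add little on top of '-q'. Other grep implementations may differ.

```shell
#!/bin/sh
# Hypothetical stand-in for `$(NM) file.o` output; the symbol names are invented.
fake_symbols() {
    printf '%s\n' \
        '0000000000000000 T do_work' \
        '0000000000000000 r __export_symbol_do_work' \
        '0000000000000010 T helper'
}

# Same test as in gen_symversions: succeed if any export marker is present.
has_export() {
    grep -q ' __export_symbol_'
}

# Variant under discussion: also tell grep to stop after the first matching line.
has_export_m1() {
    grep -m1 -q ' __export_symbol_'
}

if fake_symbols | has_export; then echo "plain -q: export found"; fi
if fake_symbols | has_export_m1; then echo "-m1 -q:   export found"; fi

# Input without an export marker: the check fails, so gensymtypes would be skipped.
if printf '%s\n' '0000000000000000 T helper' | has_export; then
    echo "unexpected match"
else
    echo "plain -q: no export"
fi
```

Both variants return the same exit status for the same input; any difference between them is only in how much of the input grep reads before exiting.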
* Re: [PATCH] kbuild: try readelf first in gen_symversions
From: Wentao Guan @ 2026-06-04 3:44 UTC
To: Nathan Chancellor; +Cc: nsc, tamird, linux-kbuild, linux-kernel, Petr Pavlu

Hello,

> On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> > Use readelf to check whether <file>.o contains a __export_symbol_*.
> >
> > readelf is faster than nm and significantly improves build speed
> > when CONFIG_MODVERSIONS is enabled.
> >
> > Build of x86_64_defconfig on a 2C4T cloud server with CONFIG_MODVERSIONS=y:
> > With patch:
> > real 17m21.019s
> > user 61m48.388s
> > sys 4m27.709s
> > Without patch:
> > real 17m39.435s
> > user 62m24.686s
> > sys 5m3.200s
> >
> > Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
> > Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
> > ---
> >  scripts/Makefile.build | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > index 3498d25b15e85..54a91bc144cce 100644
> > --- a/scripts/Makefile.build
> > +++ b/scripts/Makefile.build
> > @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
> >  # be compiled and linked to the kernel and/or modules.
> >
> >  gen_symversions =								\
> > -	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
> > +	if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then	\
>
> This breaks modversioning for Clang LTO builds: llvm-nm can read LLVM
> bitcode, but llvm-readelf cannot, because it expects strictly ELF.

Oh, would it be worth using logic like the following to detect LLVM
(or LLVM LTO)?

+ifeq ($(LLVM),)
+  SYM_CHECK = $(READELF) -sW
+else
+  SYM_CHECK = $(NM)
+endif
 gen_symversions =								\
-	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
+	if $(SYM_CHECK) $@ 2>/dev/null | grep -q ' __export_symbol_'; then	\

> Is there any performance gain with adding '-m1' to the grep command so
> that it stops looking for a match after the first export symbol is
> found?

Small. Here are my test results for x86_64_defconfig with
CONFIG_MODVERSIONS enabled:

1. readelf
if $(READELF) $@ 2>/dev/null | grep -q ' __export_symbol_';
real 10m44.359s
user 37m43.596s
sys 3m2.424s

2. nm
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_';
real 11m8.008s
user 38m51.644s
sys 3m29.798s

3. nm + grep -m1 -q
if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_';
real 10m56.891s
user 38m8.136s
sys 3m28.096s

These tests use the default GCC toolchain in Ubuntu noble.
I will do more tests with llvm-nm and llvm-readelf.

BRs
Wentao Guan

* Re: [PATCH] kbuild: try readelf first in gen_symversions
From: Nathan Chancellor @ 2026-06-05 6:22 UTC
To: Wentao Guan; +Cc: nsc, tamird, linux-kbuild, linux-kernel, Petr Pavlu

On Thu, Jun 04, 2026 at 11:44:29AM +0800, Wentao Guan wrote:
> Hello,
>
> > On Thu, Jun 04, 2026 at 12:17:32AM +0800, Wentao Guan wrote:
> > > Use readelf to check whether <file>.o contains a __export_symbol_*.
> > >
> > > readelf is faster than nm and significantly improves build speed
> > > when CONFIG_MODVERSIONS is enabled.
> > >
> > > Build of x86_64_defconfig on a 2C4T cloud server with CONFIG_MODVERSIONS=y:
> > > With patch:
> > > real 17m21.019s
> > > user 61m48.388s
> > > sys 4m27.709s
> > > Without patch:
> > > real 17m39.435s
> > > user 62m24.686s
> > > sys 5m3.200s
> > >
> > > Link: https://lore.kernel.org/all/tencent_2FA16E0A18D6D0C0703F5D49@qq.com/
> > > Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
> > > ---
> > >  scripts/Makefile.build | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > > index 3498d25b15e85..54a91bc144cce 100644
> > > --- a/scripts/Makefile.build
> > > +++ b/scripts/Makefile.build
> > > @@ -233,7 +233,7 @@ ifdef CONFIG_MODVERSIONS
> > >  # be compiled and linked to the kernel and/or modules.
> > >
> > >  gen_symversions =								\
> > > -	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
> > > +	if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then	\
> >
> > This breaks modversioning for Clang LTO builds: llvm-nm can read LLVM
> > bitcode, but llvm-readelf cannot, because it expects strictly ELF.
> Oh, would it be worth using logic like the following to detect LLVM
> (or LLVM LTO)?
> +ifeq ($(LLVM),)

This should probably be CONFIG_LTO_CLANG with flipped branches but...

> +  SYM_CHECK = $(READELF) -sW
> +else
> +  SYM_CHECK = $(NM)
> +endif
>  gen_symversions =								\
> -	if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then		\
> +	if $(SYM_CHECK) $@ 2>/dev/null | grep -q ' __export_symbol_'; then	\
>
> > Is there any performance gain with adding '-m1' to the grep command so
> > that it stops looking for a match after the first export symbol is
> > found?
> Small. Here are my test results for x86_64_defconfig with
> CONFIG_MODVERSIONS enabled:
> 1. readelf
> if $(READELF) $@ 2>/dev/null | grep -q ' __export_symbol_';
> real 10m44.359s
> user 37m43.596s
> sys 3m2.424s
> 2. nm
> if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_';
> real 11m8.008s
> user 38m51.644s
> sys 3m29.798s
> 3. nm + grep -m1 -q
> if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_';
> real 10m56.891s
> user 38m8.136s
> sys 3m28.096s

'-m1' appears to get us 50% (12s) of the speed up of 'readelf' (24s) in
your environment while sticking with 'nm'. I would be more inclined to
take that change since it is small and correct, rather than switching on
NM or READELF, as I don't think it is worth the additional complexity.

FWIW, on one of my test machines with 8 cores and 16 threads, the
difference is much less noticeable. I think that is going to be in line
with most developer and build farm hardware, rather than a 2C/4T machine
like you mention in the initial commit message.

GCC 16.1.0 + binutils 2.46:

Benchmark 1: $(NM)
  Time (mean ± σ):     75.203 s ±  0.283 s    [User: 659.465 s, System: 185.605 s]
  Range (min … max):   74.898 s … 75.457 s    3 runs

Benchmark 2: $(READELF) -sW
  Time (mean ± σ):     73.055 s ±  0.465 s    [User: 642.365 s, System: 175.908 s]
  Range (min … max):   72.523 s … 73.385 s    3 runs

Summary
  $(READELF) -sW ran
    1.03 ± 0.01 times faster than $(NM)

LLVM 22:

Benchmark 1: $(NM)
  Time (mean ± σ):     75.030 s ±  0.736 s    [User: 659.603 s, System: 185.257 s]
  Range (min … max):   74.207 s … 75.623 s    3 runs

Benchmark 2: $(READELF) -sW
  Time (mean ± σ):     73.405 s ±  0.457 s    [User: 642.512 s, System: 176.440 s]
  Range (min … max):   72.878 s … 73.679 s    3 runs

Summary
  $(READELF) -sW ran
    1.02 ± 0.01 times faster than $(NM)

--
Cheers,
Nathan

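[Editorial sketch] The "50% (12s) of the speed up of 'readelf' (24s)" estimate above can be rechecked from the three 'real' times reported for the 2C/4T machine. The shell sketch below does that arithmetic; `to_seconds` is a hypothetical helper written for this note, and the three inputs are copied from the quoted results. The deltas come out close to the rounded figures in the reply.

```shell
#!/bin/sh
# Convert a `time`-style value such as "10m44.359s" to seconds.
# Splits on 'm' and 's', so field 1 is minutes and field 2 is seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# 'real' times from the 2C/4T results quoted above.
nm=$(to_seconds 11m8.008s)         # $(NM) | grep -q
nm_m1=$(to_seconds 10m56.891s)     # $(NM) | grep -m1 -q
readelf=$(to_seconds 10m44.359s)   # $(READELF) | grep -q

awk -v nm="$nm" -v m1="$nm_m1" -v re="$readelf" 'BEGIN {
    printf "nm -> nm + grep -m1: %.1f s saved\n", nm - m1
    printf "nm -> readelf:       %.1f s saved\n", nm - re
    printf "share of the readelf gain kept by -m1: %.0f%%\n", 100 * (nm - m1) / (nm - re)
}'
```

Each figure is a single run on a shared cloud machine, so the deltas are indicative only.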
* Re: [PATCH] kbuild: try readelf first in gen_symversions
From: Wentao Guan @ 2026-06-05 11:03 UTC
To: Nathan Chancellor; +Cc: nsc, tamird, linux-kbuild, linux-kernel, Petr Pavlu

Hello,

> This should probably be CONFIG_LTO_CLANG with flipped branches but...

Right!

> '-m1' appears to get us 50% (12s) of the speed up of 'readelf' (24s) in
> your environment while sticking with 'nm'. I would be more inclined to
> take that change since it is small and correct, rather than switching on
> NM or READELF, as I don't think it is worth the additional complexity.

> FWIW, on one of my test machines with 8 cores and 16 threads, the
> difference is much less noticeable. I think that is going to be in line
> with most developer and build farm hardware, rather than a 2C/4T machine
> like you mention in the initial commit message.

Sorry, it seems my cloud service provider made my results fluctuate :(
The first compile time may also be unstable, so I tested again in a
20 cores/28 threads bare-metal environment. Here are the results:

Intel(R) Core(TM) i7-14700HX + 32GB + NVMe SSD
gcc version 12.3.0
binutils 2.46
clang version 18.1.7
source kernel tag v7.0

Summary:
1. Switching from nm to readelf still helps on 20 cores/28 threads.
   (I think nm spends more time in libbfd, which shows up as a large
   drop in sys time with readelf; my guess is that this causes a memory
   access bottleneck that affects the overall compile process.)
   However, there seems to be no such difference when changing
   llvm-18-nm to llvm-18-readelf.
2. -m1 does not seem to have the expected effect...

Test scripts:
https://gist.github.com/opsiff/832baa9a6986343dddbe530fbee57f52
Makefile.build-nm-m1     : 'grep -q' -> 'grep -m1 -q'
Makefile.build-orig      : orig Makefile.build
Makefile.build-readelf   : 'NM' -> 'READELF -sW'
Makefile.build-readelf-m1: 'NM' -> 'READELF -sW', 'grep -q' -> 'grep -m1 -q'

Full results:

1. run x86_64_defconfig + modversions x3 (base)
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
real 2m2.876s    real 2m2.578s    real 2m2.262s
user 42m15.871s  user 42m35.250s  user 42m33.679s
sys 5m52.904s    sys 5m52.478s    sys 5m49.009s

2.
if $(READELF) -sW $@ 2>/dev/null | grep -q __export_symbol_; then
real 1m54.931s   real 1m55.192s   real 1m55.207s
user 41m4.162s   user 41m7.754s   user 41m5.791s
sys 4m8.422s     sys 4m8.431s     sys 4m9.219s

3.
if $(NM) $@ 2>/dev/null | grep -m1 -q __export_symbol_; then \
real 2m1.865s    real 2m1.866s    real 2m2.108s
user 42m32.891s  user 42m35.047s  user 42m33.834s
sys 5m48.045s    sys 5m47.700s    sys 5m48.200s

4.
if $(READELF) -sW $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \
real 1m55.386s   real 1m56.528s   real 1m55.489s
user 41m6.156s   user 41m12.321s  user 41m10.545s
sys 4m10.093s    sys 4m9.838s     sys 4m9.367s

5. LLVM run x86_64_defconfig + modversions x3 (base)
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
real 2m35.758s   real 2m32.696s   real 2m32.127s
user 58m2.416s   user 57m55.030s  user 57m54.806s
sys 4m20.735s    sys 4m18.473s    sys 4m18.090s

6. LLVM
if $(READELF) -sW $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
real 2m32.448s   real 2m32.419s   real 2m32.509s
user 57m57.262s  user 57m53.001s  user 57m48.842s
sys 4m20.508s    sys 4m20.693s    sys 4m20.490s

7. LLVM
if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \
real 2m32.003s   real 2m31.900s   real 2m32.276s
user 57m45.786s  user 57m46.982s  user 57m49.907s
sys 4m18.184s    sys 4m17.923s    sys 4m18.354s

8. LLVM
if $(READELF) -sW $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \
real 2m33.365s   real 2m32.186s   real 2m32.114s
user 57m49.533s  user 57m47.865s  user 57m46.591s
sys 4m19.809s    sys 4m20.652s    sys 4m19.954s

9. LLVM LTO_THIN run x86_64_defconfig + modversions x3 (base)
if $(NM) $@ 2>/dev/null | grep -q ' __export_symbol_'; then \
real 3m59.411s   real 3m55.945s   real 3m56.557s
user 59m38.877s  user 59m20.007s  user 59m19.009s
sys 4m21.582s    sys 4m22.313s    sys 4m23.793s

10. LLVM LTO_THIN
if $(NM) $@ 2>/dev/null | grep -m1 -q ' __export_symbol_'; then \
real 3m55.722s   real 3m56.641s   real 3m57.979s
user 59m21.865s  user 59m25.634s  user 59m20.872s
sys 4m21.303s    sys 4m24.174s    sys 4m22.695s

Full log:
https://gist.github.com/opsiff/1cd7e0a0553c8416dd13a7e92590a440

If you have any other ideas, I will happily test them. I will also try
using llvm-nm instead of nm.

BRs
Wentao Guan

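[Editorial sketch] The summary above attributes most of the nm-to-readelf gain on the GCC toolchain to lower sys time. One way to check that against the three-run tables is to average each metric. The shell sketch below does so for configurations 1 (nm) and 2 (readelf); `mean_seconds` is a hypothetical helper written for this note, and the inputs are copied from the results above.

```shell
#!/bin/sh
# Mean of several `time`-style values (for example "5m52.904s"), in seconds.
# Each argument becomes one input line; awk splits it on 'm' and 's'.
mean_seconds() {
    printf '%s\n' "$@" |
        awk -F'[ms]' '{ sum += $1 * 60 + $2 } END { printf "%.3f\n", sum / NR }'
}

# Configuration 1 (nm) and configuration 2 (readelf), GCC toolchain, 3 runs each.
nm_real=$(mean_seconds 2m2.876s 2m2.578s 2m2.262s)
readelf_real=$(mean_seconds 1m54.931s 1m55.192s 1m55.207s)
nm_sys=$(mean_seconds 5m52.904s 5m52.478s 5m49.009s)
readelf_sys=$(mean_seconds 4m8.422s 4m8.431s 4m9.219s)

awk -v a="$nm_real" -v b="$readelf_real" -v c="$nm_sys" -v d="$readelf_sys" 'BEGIN {
    printf "real: nm %.1f s, readelf %.1f s (%.1f%% lower)\n", a, b, 100 * (a - b) / a
    printf "sys:  nm %.1f s, readelf %.1f s (%.1f%% lower)\n", c, d, 100 * (c - d) / c
}'
```

The same helper can be pointed at the LLVM rows (5 to 8) to compare llvm-nm and llvm-readelf in the same way.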