* [PATCH 0/2] Add option for generating BTF types of global variables
@ 2025-02-07 1:20 Stephen Brennan
2025-02-07 1:20 ` [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata" Stephen Brennan
2025-02-07 1:20 ` [PATCH 2/2] btf: Add the option to include global variable types Stephen Brennan
0 siblings, 2 replies; 17+ messages in thread
From: Stephen Brennan @ 2025-02-07 1:20 UTC (permalink / raw)
To: Masahiro Yamada, Arnd Bergmann
Cc: Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
Stephen Brennan, Martin KaFai Lau, Sami Tolvanen,
Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
linux-kbuild, Daniel Borkmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, linux-kernel, bpf
Hello everyone,
These patches add the CONFIG_DEBUG_INFO_BTF_GLOBAL_VARS option, which instructs
pahole to include types of global variables. Pahole >= 1.28 is required. More
context for what this feature enables can be seen in patch 2, as well as the
series which introduced this feature to pahole [1].
To demonstrate the functionality, my "btf_2024" branch of drgn (the current
development branch for the BTF debugging feature, despite the name) can be used
as below to debug a running kernel with these patches enabled.
git clone https://github.com/brenns10/drgn -b btf_2024
cd drgn
python setup.py build_ext -i
sudo python -m drgn --no-default-symbols --btf -k
The "--no-default-symbols" option ensures that drgn doesn't accidentally find
& use your DWARF debuginfo :)
The resulting debugging session supports a similar level of capability as drgn
with DWARF debuginfo: variable & function types are available, stack traces may
be unwound (using ORC), and the kallsyms symbol table is available. You can also
try various drgn "contrib" scripts which implement useful utilities. All of the
ones I could readily test are working with BTF, for example:
sudo python -m drgn --no-default-symbols --btf -k contrib/slabinfo.py
[1] https://lore.kernel.org/all/20241002235253.487251-1-stephen.s.brennan@oracle.com/#t
Stephen Brennan (2):
kallsyms: output rodata to ".kallsyms_rodata"
btf: Add the option to include global variable types
include/asm-generic/vmlinux.lds.h | 1 +
lib/Kconfig.debug | 10 ++++++++++
scripts/Makefile.btf | 3 +++
scripts/kallsyms.c | 2 +-
4 files changed, 15 insertions(+), 1 deletion(-)
--
2.43.5
* [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
2025-02-07 1:20 [PATCH 0/2] Add option for generating BTF types of global variables Stephen Brennan
@ 2025-02-07 1:20 ` Stephen Brennan
2025-02-15 14:21 ` Masahiro Yamada
2025-02-07 1:20 ` [PATCH 2/2] btf: Add the option to include global variable types Stephen Brennan
1 sibling, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-07 1:20 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
KP Singh, Stephen Brennan, Martin KaFai Lau, Sami Tolvanen,
Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
linux-kbuild, Daniel Borkmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, linux-kernel, bpf
When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
within the .rodata section. The linking process is repeated several
times, since the kallsyms data size changes, which shifts symbols,
requiring re-generating the data and re-linking.
BTF is generated during the first link only. For variables, BTF includes
a BTF_KIND_DATASEC for each data section that may contain a variable, which
includes the variable's name, type, and offset within the data section.
Because the size of kallsyms data changes during later links, the
offsets of variables placed after it in .rodata will change. This means
that BTF_KIND_DATASEC information for those variables becomes inaccurate.
This is not currently a problem, because BTF currently only generates
variable data for percpu variables. However, the next commit will add
support for generating BTF for all global variables, including for the
.rodata section.
We could re-generate BTF each time vmlinux is linked, but this is quite
expensive, and should be avoided at all costs. Further, as each chunk of
data (BTF and kallsyms) is re-generated, there's no guarantee that
their sizes will converge anyway.
Instead, we can take advantage of the fact that BTF only cares to store
the offset of variables from the start of their section. Therefore, so
long as the kallsyms data is stored last in the .rodata section, no
offsets will be affected. Adjust kallsyms to output to .kallsyms_rodata,
and update the linker script to include this at the end of .rodata.
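The reasoning above can be sketched with a toy model. This is only an
illustration, not how the linker or pahole work internally: a section is
modelled as an ordered list of (symbol, size) entries, alignment is ignored,
and every name and size below is made up.

```python
# Toy model of a linked section: an ordered list of (symbol, size) entries.
# All symbol names and sizes are hypothetical; alignment padding is ignored.

def offsets(layout):
    """Return {symbol: offset from section start} for an ordered layout."""
    result = {}
    cursor = 0
    for name, size in layout:
        result[name] = cursor
        cursor += size
    return result


def layout_with_kallsyms(kallsyms_size, kallsyms_last):
    """Build a toy .rodata layout with the kallsyms blob in the middle or last."""
    variables = [("var_a", 8), ("var_b", 16), ("var_c", 4)]
    blob = ("kallsyms_blob", kallsyms_size)
    if kallsyms_last:
        return variables + [blob]
    return variables[:1] + [blob] + variables[1:]


def variable_offsets(kallsyms_size, kallsyms_last):
    """Offsets of the ordinary variables only, as a DATASEC would record them."""
    offs = offsets(layout_with_kallsyms(kallsyms_size, kallsyms_last))
    del offs["kallsyms_blob"]
    return offs


# Blob in the middle: growing it between links moves var_b and var_c.
moved = variable_offsets(100, False) != variable_offsets(160, False)
# Blob last: the same growth leaves every variable offset unchanged.
stable = variable_offsets(100, True) == variable_offsets(160, True)
print(moved, stable)
```

With the blob placed last, offsets recorded during the first link remain
valid no matter how the blob size changes in later links.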
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
include/asm-generic/vmlinux.lds.h | 1 +
scripts/kallsyms.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 54504013c7491..9284f0e502e27 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -463,6 +463,7 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
. = ALIGN(8); \
BOUNDED_SECTION_BY(__tracepoints_ptrs, ___tracepoints_ptrs) \
*(__tracepoints_strings)/* Tracepoints: strings */ \
+ *(.kallsyms_rodata) \
} \
\
.rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) { \
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 03852da3d2490..743d3dd453599 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -365,7 +365,7 @@ static void write_src(void)
printf("#define ALGN .balign 4\n");
printf("#endif\n");
- printf("\t.section .rodata, \"a\"\n");
+ printf("\t.section .kallsyms_rodata, \"a\"\n");
output_label("kallsyms_num_syms");
printf("\t.long\t%u\n", table_cnt);
--
2.43.5
* [PATCH 2/2] btf: Add the option to include global variable types
2025-02-07 1:20 [PATCH 0/2] Add option for generating BTF types of global variables Stephen Brennan
2025-02-07 1:20 ` [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata" Stephen Brennan
@ 2025-02-07 1:20 ` Stephen Brennan
2025-02-07 23:50 ` Alexei Starovoitov
1 sibling, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-07 1:20 UTC (permalink / raw)
To: Masahiro Yamada
Cc: Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
Stephen Brennan, Martin KaFai Lau, Sami Tolvanen,
Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
linux-kbuild, Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
linux-debuggers, Alexei Starovoitov, Song Liu, linux-kernel, bpf
Since pahole 1.28, BTF can include types for all global variables.
Previously, BTF only included types for functions and percpu
variables.
There are a few applications for this type information. For one, runtime
debuggers like drgn[1] can consume it in the absence of DWARF debuginfo.
The support in drgn is currently implemented and moving through the
review process, see [2]. For distributions which don't distribute DWARF
debuginfo, or for situations where it can't be made available, the
compact BTF, combined with ORC for stack unwinding, and the kallsyms
symbol table, can be used for simple runtime debugging and
introspection.
Another application is verifying types of ksyms in BPF programs. libbpf
already supports resolving global variables with "__ksym", but they must
be declared as void. For example, in
tools/bpf/bpftool/skeleton/pid_iter.bpf.c we have:
extern const void bpf_map_fops __ksym;
With global variable information, declarations like these would be able
to use the actual variable types, for example:
extern const struct file_operations bpf_map_fops __ksym;
When the feature was implemented in pahole, my measurements indicated
that vmlinux BTF size increased by about 25.8%, and module BTF size
increased by 53.2%. Due to these increases, the feature is implemented
behind a new config option, allowing users sensitive to increased memory
usage to disable it.
[1]: https://github.com/osandov/drgn
[2]: https://github.com/osandov/drgn/issues/176
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
lib/Kconfig.debug | 10 ++++++++++
scripts/Makefile.btf | 3 +++
2 files changed, 13 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af972a92d06f..3fbdc5ba2d017 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -409,6 +409,16 @@ config PAHOLE_HAS_LANG_EXCLUDE
otherwise it would emit malformed kernel and module binaries when
using DEBUG_INFO_BTF_MODULES.
+config DEBUG_INFO_BTF_GLOBAL_VARS
+ bool "Generate BTF type information for all global variables"
+ default y
+ depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
+ help
+ Include type information for all global variables in the BTF. This
+ increases the size of the BTF information, which increases memory
+ usage at runtime. With global variable types available, runtime
+ debugging and tracers may be able to provide more detail.
+
config DEBUG_INFO_BTF_MODULES
bool "Generate BTF type information for kernel modules"
default y
diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index c3cbeb13de503..ad3c05a96a010 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -31,5 +31,8 @@ endif
pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust
+# Requires v1.28 or later, enforced by KConfig
+pahole-flags-$(CONFIG_DEBUG_INFO_BTF_GLOBAL_VARS) += --btf_features=global_var
+
export PAHOLE_FLAGS := $(pahole-flags-y)
export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
--
2.43.5
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-07 1:20 ` [PATCH 2/2] btf: Add the option to include global variable types Stephen Brennan
@ 2025-02-07 23:50 ` Alexei Starovoitov
2025-02-11 23:58 ` Stephen Brennan
2025-02-25 10:01 ` Alan Maguire
0 siblings, 2 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2025-02-07 23:50 UTC (permalink / raw)
To: Stephen Brennan
Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf
On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
> When the feature was implemented in pahole, my measurements indicated
> that vmlinux BTF size increased by about 25.8%, and module BTF size
> increased by 53.2%. Due to these increases, the feature is implemented
> behind a new config option, allowing users sensitive to increased memory
> usage to disable it.
>
...
> +config DEBUG_INFO_BTF_GLOBAL_VARS
> + bool "Generate BTF type information for all global variables"
> + default y
> + depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> + help
> + Include type information for all global variables in the BTF. This
> + increases the size of the BTF information, which increases memory
> + usage at runtime. With global variable types available, runtime
> + debugging and tracers may be able to provide more detail.
This is not a solution.
Even if it's changed to 'default n' distros will enable it
like they enable everything and will suffer a regression.
We need to add a new module like vmlinux_btf.ko that will contain
this additional BTF data. For global vars and everything else we might need.
pw-bot: cr
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-07 23:50 ` Alexei Starovoitov
@ 2025-02-11 23:58 ` Stephen Brennan
2025-02-14 1:18 ` Alexei Starovoitov
2025-02-25 10:01 ` Alan Maguire
1 sibling, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-11 23:58 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>> When the feature was implemented in pahole, my measurements indicated
>> that vmlinux BTF size increased by about 25.8%, and module BTF size
>> increased by 53.2%. Due to these increases, the feature is implemented
>> behind a new config option, allowing users sensitive to increased memory
>> usage to disable it.
>>
>
> ...
>> +config DEBUG_INFO_BTF_GLOBAL_VARS
>> + bool "Generate BTF type information for all global variables"
>> + default y
>> + depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
>> + help
>> + Include type information for all global variables in the BTF. This
>> + increases the size of the BTF information, which increases memory
>> + usage at runtime. With global variable types available, runtime
>> + debugging and tracers may be able to provide more detail.
>
> This is not a solution.
> Even if it's changed to 'default n' distros will enable it
> like they enable everything and will suffer a regression.
>
> We need to add a new module like vmlinux_btf.ko that will contain
> this additional BTF data. For global vars and everything else we might need.
Fair enough. I believe I had shared Alan Maguire's proof-of-concept for
that idea a while back for an older version of this feature:
https://lore.kernel.org/all/20221104231103.752040-10-stephen.s.brennan@oracle.com/
We can dust that off and include it for a new version of this series.
I'd be curious of what you'd like to see for kernel modules? A
three-level tree would be too complex, in my opinion.
As a separate note for this patch series, we discovered that variables
declared twice, where one is declared "__weak", will result in two DWARF
variable declarations, and thus two BTF variables. This trips up the BTF
validation code. So this series as it is cannot move forward. I'm
submitting a fix to dwarves today.
Thanks,
Stephen
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-11 23:58 ` Stephen Brennan
@ 2025-02-14 1:18 ` Alexei Starovoitov
2025-02-18 23:09 ` Stephen Brennan
0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2025-02-14 1:18 UTC (permalink / raw)
To: Stephen Brennan
Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf
On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >> When the feature was implemented in pahole, my measurements indicated
> >> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >> increased by 53.2%. Due to these increases, the feature is implemented
> >> behind a new config option, allowing users sensitive to increased memory
> >> usage to disable it.
> >>
> >
> > ...
> >> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >> + bool "Generate BTF type information for all global variables"
> >> + default y
> >> + depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >> + help
> >> + Include type information for all global variables in the BTF. This
> >> + increases the size of the BTF information, which increases memory
> >> + usage at runtime. With global variable types available, runtime
> >> + debugging and tracers may be able to provide more detail.
> >
> > This is not a solution.
> > Even if it's changed to 'default n' distros will enable it
> > like they enable everything and will suffer a regression.
> >
> > We need to add a new module like vmlinux_btf.ko that will contain
> > this additional BTF data. For global vars and everything else we might need.
>
> Fair enough. I believe I had shared Alan Maguire's proof-of-concept for
> that idea a while back for an older version of this feature:
>
> https://lore.kernel.org/all/20221104231103.752040-10-stephen.s.brennan@oracle.com/
Right, vmlinux_extra was discussed in various contexts, so let's make it happen.
> We can dust that off and include it for a new version of this series.
> I'd be curious of what you'd like to see for kernel modules? A
> three-level tree would be too complex, in my opinion.
What is the use case for vars in kernel modules?
> module BTF size increased by 53.2%.
This is the sum of all mods with vars divided by
the sum of all mods without?
Any outliers there?
I would expect modules to have few global variables.
So before we decide on what to do with vars in mods, let's figure out
the need.
* Re: [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
2025-02-07 1:20 ` [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata" Stephen Brennan
@ 2025-02-15 14:21 ` Masahiro Yamada
2025-02-24 18:51 ` Andrii Nakryiko
0 siblings, 1 reply; 17+ messages in thread
From: Masahiro Yamada @ 2025-02-15 14:21 UTC (permalink / raw)
To: Stephen Brennan
Cc: Arnd Bergmann, Andrii Nakryiko, Nicolas Schier, Kees Cook,
KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
Yonghong Song, Hao Luo, Andrew Morton, linux-kbuild,
Daniel Borkmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, linux-kernel, bpf
On Fri, Feb 7, 2025 at 10:21 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
> within the .rodata section. The linking process is repeated several
> times, since the kallsyms data size changes, which shifts symbols,
> requiring re-generating the data and re-linking.
>
> BTF is generated during the first link only. For variables, BTF includes
> a BTF_KIND_DATASEC for each data section that may contain a variable, which
> includes the variable's name, type, and offset within the data section.
> Because the size of kallsyms data changes during later links, the
> offsets of variables placed after it in .rodata will change. This means
> that BTF_KIND_DATASEC information for those variables becomes inaccurate.
>
> This is not currently a problem, because BTF currently only generates
> variable data for percpu variables. However, the next commit will add
> support for generating BTF for all global variables, including for the
> .rodata section.
>
> We could re-generate BTF each time vmlinux is linked, but this is quite
> expensive, and should be avoided at all costs. Further, as each chunk of
> data (BTF and kallsyms) is re-generated, there's no guarantee that
> their sizes will converge anyway.
>
> Instead, we can take advantage of the fact that BTF only cares to store
> the offset of variables from the start of their section. Therefore, so
> long as the kallsyms data is stored last in the .rodata section, no
> offsets will be affected. Adjust kallsyms to output to .kallsyms_rodata,
> and update the linker script to include this at the end of .rodata.
>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
I am fine if this is helpful for BTF.
--
Best Regards
Masahiro Yamada
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-14 1:18 ` Alexei Starovoitov
@ 2025-02-18 23:09 ` Stephen Brennan
2025-02-25 21:47 ` Andrii Nakryiko
0 siblings, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-18 23:09 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
[...]
>> We can dust that off and include it for a new version of this series.
>> I'd be curious of what you'd like to see for kernel modules? A
>> three-level tree would be too complex, in my opinion.
>
> What is the use case for vars in kernel modules?
The use case would be the same as for the core kernel. My primary
motivation is to allow drgn to understand the types of global variables,
and that extends to kernel modules too.
>> module BTF size increased by 53.2%.
>
> This is the sum of all mods with vars divided by
> the sum of all mods without?
That was a poorly done comparison, so let me provide this one that I did
using 6.13 and these patches. It was essentially a localmodconfig for a
VM instance, so I could still do better by picking a popular
distribution config. But I think this is far more representative.
MODULE BASE COMP CHG PCT
drm.ko 115833 123410 7577 6.54%
iscsi_boot_sysfs.ko 2627 5380 2753 104.80%
joydev.ko 1816 2289 473 26.05%
libcxgbi.ko 24556 25266 710 2.89%
drm_vram_helper.ko 22325 22751 426 1.91%
nvme-tcp.ko 25044 25973 929 3.71%
vfat.ko 3448 3953 505 14.65%
btrfs.ko 275139 343686 68547 24.91%
libiscsi.ko 21177 21977 800 3.78%
xt_owner.ko 449 803 354 78.84%
nft_ct.ko 4912 6157 1245 25.35%
iscsi_ibft.ko 3967 4463 496 12.50%
pcspkr.ko 283 682 399 140.99%
crc32-pclmul.ko 390 771 381 97.69%
nf_conntrack.ko 23686 28191 4505 19.02%
iscsi_tcp.ko 16827 17750 923 5.49%
nft_fib.ko 835 1117 282 33.77%
nf_reject_ipv6.ko 699 981 282 40.34%
rfkill.ko 4233 6410 2177 51.43%
dm-region-hash.ko 6214 6496 282 4.54%
cxgb3i.ko 35469 37078 1609 4.54%
dm-mirror.ko 7576 8191 615 8.12%
pvpanic-pci.ko 174 574 400 229.89%
crct10dif-pclmul.ko 146 525 379 259.59%
nvme-fabrics.ko 17341 18124 783 4.52%
kvm-amd.ko 47302 51914 4612 9.75%
crc8.ko 221 405 184 83.26%
ib_iser.ko 27769 29116 1347 4.85%
sg.ko 4234 5656 1422 33.59%
intel_rapl_common.ko 5678 8446 2768 48.75%
bochs.ko 35643 36997 1354 3.80%
sha1-ssse3.ko 790 1305 515 65.19%
kvm-intel.ko 53802 59220 5418 10.07%
nft_chain_nat.ko 279 714 435 155.91%
vmlinux 5484970 7330096 1845126 33.64%
sha256-ssse3.ko 851 1378 527 61.93%
nf_nat.ko 6341 7240 899 14.18%
configs.ko 72 256 184 255.56%
xt_comment.ko 151 507 356 235.76%
ccp.ko 30433 34782 4349 14.29%
cxgb3.ko 44981 47504 2523 5.61%
crypto_simd.ko 1331 1613 282 21.19%
iptable_filter.ko 855 1456 601 70.29%
qedi.ko 70653 72786 2133 3.02%
drm_kms_helper.ko 63238 65000 1762 2.79%
cnic.ko 117074 117790 716 0.61%
failover.ko 780 1216 436 55.90%
nft_redir.ko 874 1529 655 74.94%
serio_raw.ko 708 1234 526 74.29%
nf_defrag_ipv6.ko 1520 2253 733 48.22%
nf_defrag_ipv4.ko 306 770 464 151.63%
nft_reject_ipv4.ko 517 939 422 81.62%
nft_nat.ko 1192 1732 540 45.30%
nft_reject_inet.ko 554 976 422 76.17%
fuse.ko 32181 41859 9678 30.07%
nft_compat.ko 3705 4404 699 18.87%
zstd_compress.ko 42597 43622 1025 2.41%
tls.ko 15140 20683 5543 36.61%
virtio_pci.ko 8456 9193 737 8.72%
blake2b_generic.ko 1364 1699 335 24.56%
cryptd.ko 3697 4297 600 16.23%
xor.ko 1358 1879 521 38.37%
intel_rapl_msr.ko 2851 3440 589 20.66%
kvm.ko 177060 256377 79317 44.80%
cxgb4.ko 215865 220844 4979 2.31%
bnx2i.ko 39524 41477 1953 4.94%
dm-round-robin.ko 1795 2123 328 18.27%
virtio_pci_legacy_dev.ko 909 1191 282 31.02%
qla4xxx.ko 79040 82694 3654 4.62%
nfs.ko 108350 169642 61292 56.57%
libata.ko 47301 66188 18887 39.93%
ghash-clmulni-intel.ko 578 997 419 72.49%
nf_reject_ipv4.ko 706 988 282 39.94%
nft_reject.ko 820 1196 376 45.85%
sunrpc.ko 127496 197841 70345 55.17%
nft_fib_ipv4.ko 803 1257 454 56.54%
scsi_transport_iscsi.ko 40419 57633 17214 42.59%
lockd.ko 36144 42137 5993 16.58%
drm_shmem_helper.ko 32555 33043 488 1.50%
nvme-core.ko 50275 58298 8023 15.96%
iw_cm.ko 13405 14796 1391 10.38%
mdio.ko 857 1041 184 21.47%
bnx2.ko 20354 21611 1257 6.18%
net_failover.ko 1742 2187 445 25.55%
ip_set.ko 11812 13093 1281 10.84%
libcxgb.ko 8698 8980 282 3.24%
dm-multipath.ko 8124 8898 774 9.53%
grace.ko 462 890 428 92.64%
virtio_net.ko 12322 14896 2574 20.89%
qed.ko 228735 232231 3496 1.53%
cdc-acm.ko 2923 3679 756 25.86%
i2c-piix4.ko 1124 2341 1217 108.27%
pvpanic-mmio.ko 177 625 448 253.11%
virtio_scsi.ko 3154 3898 744 23.59%
uio.ko 2602 4295 1693 65.07%
nft_fib_ipv6.ko 956 1410 454 47.49%
cec.ko 28370 29266 896 3.16%
qemu_fw_cfg.ko 1601 3476 1875 117.11%
ttm.ko 23672 25727 2055 8.68%
sd_mod.ko 9976 13030 3054 30.61%
xfs.ko 574594 926637 352043 61.27%
libiscsi_tcp.ko 17444 17911 467 2.68%
ib_cm.ko 32324 62373 30049 92.96%
aesni-intel.ko 3370 4922 1552 46.05%
drm_client_lib.ko 27449 27794 345 1.26%
virtio_pci_modern_dev.ko 2537 2819 282 11.12%
rdma_cm.ko 32504 51823 19319 59.44%
fat.ko 11958 13297 1339 11.20%
dm-log.ko 6529 6986 457 7.00%
pata_acpi.ko 9231 9700 469 5.08%
ata_piix.ko 10998 12598 1600 14.55%
ipt_REJECT.ko 956 1311 355 37.13%
drm_ttm_helper.ko 33160 33544 384 1.16%
be2iscsi.ko 55078 56993 1915 3.48%
i2c-smbus.ko 582 973 391 67.18%
cuse.ko 8435 9241 806 9.56%
nft_fib_inet.ko 579 995 416 71.85%
ib_core.ko 103656 123701 20045 19.34%
pulse8-cec.ko 9153 9890 737 8.05%
pvpanic.ko 494 1087 593 120.04%
dm-mod.ko 31377 35265 3888 12.39%
raid6_pq.ko 2774 4207 1433 51.66%
nft_reject_ipv6.ko 517 939 422 81.62%
cxgb4i.ko 47490 49021 1531 3.22%
ata_generic.ko 9008 9666 658 7.30%
vboxvideo.ko 47622 48844 1222 2.57%
ip_tables.ko 3109 3564 455 14.63%
ALL MODS 9153268 11895301 2742033 29.96%
vmlinux 5484970 7330096 1845126 33.64%
TOTAL 14638238 19225397 4587159 31.34%
So this shows a 1.8 MiB increase in vmlinux size, or 33.6%.
And for these modules in aggregate, an increase of 2.6 MiB or 30.0%.
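For anyone who wants to re-check the table, here is a small sketch of how the
CHG and PCT columns appear to be derived from BASE and COMP (sizes in bytes).
The helper name is made up for illustration; the input numbers are copied
from rows of the table above.

```python
def size_change(base, comp):
    """Return (absolute change in bytes, percentage change rounded to 2 places)."""
    change = comp - base
    return change, round(100.0 * change / base, 2)


# (BASE, COMP) pairs copied from the table above.
rows = {
    "configs.ko": (72, 256),
    "xfs.ko": (574594, 926637),
    "vmlinux": (5484970, 7330096),
    "ALL MODS": (9153268, 11895301),
}

for name, (base, comp) in rows.items():
    change, pct = size_change(base, comp)
    print(f"{name:10} {base:>9} {comp:>9} {change:>9} {pct:>7.2f}%")

# The TOTAL row is the sum of the "ALL MODS" and "vmlinux" rows.
total_base = rows["ALL MODS"][0] + rows["vmlinux"][0]
total_comp = rows["ALL MODS"][1] + rows["vmlinux"][1]
```

Note that PCT is relative to BASE, which is why tiny modules such as
configs.ko show very large percentages for a small absolute change.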
> Any outliers there?
> I would expect modules to have few global variables.
In terms of outliers, there are groups that stand out to me:
1. Large percentage increases are almost always for modules that had
very tiny BTF before. The module system inherently creates a few
global variables for each module, so there's always a slight constant
increase of the BTF size (184 bytes, as far as I can tell), and in those
cases it can be quite a large percentage. Here's an example,
"configs.ko" which comes from the CONFIG_IKCONFIG enablement:
BEFORE:
$ bpftool btf dump file ../build_pahole_novars/kernel/configs.ko -B ../build_pahole_novars/vmlinux
[127877] CONST '(anon)' type_id=11124
[127878] ARRAY '(anon)' type_id=127877 index_type_id=21 nr_elems=1
[127879] CONST '(anon)' type_id=127878
AFTER:
$ bpftool btf dump file ../build_pahole_vars/kernel/configs.ko -B ../build_pahole_vars/vmlinux
[162827] CONST '(anon)' type_id=11124
[162828] ARRAY '(anon)' type_id=162827 index_type_id=21 nr_elems=1
[162829] CONST '(anon)' type_id=162828
[162830] VAR '____versions' type_id=162829, linkage=static
[162831] DATASEC '__versions' size=64 vlen=1
type_id=162830 offset=0 size=64 (VAR '____versions')
[162832] VAR 'orc_header' type_id=8667, linkage=static
[162833] DATASEC '.orc_header' size=20 vlen=1
type_id=162832 offset=0 size=20 (VAR 'orc_header')
[162834] VAR '__this_module' type_id=312, linkage=global
[162835] DATASEC '.gnu.linkonce.this_module' size=1344 vlen=1
type_id=162834 offset=0 size=1344 (VAR '__this_module')
What is interesting, I think, is that the types in that module were
totally useless to begin with, because they were used by a variable
which didn't even get emitted. So while this is a substantial
percentage-wise increase, I think it's a net improvement for this and
other modules.
2. The largest absolute increases come from large, complex modules like
xfs, kvm, sunrpc, btrfs, etc. For example, xfs had 5696 VAR
declarations. What is disappointing is how much of this is due to
automatically-generated "variables" from macros (e.g. tracepoints).
Here is a list of variable prefixes like that:
print_fmt_*
trace_event_fields_*
trace_event_type_funcs_*
event_*
__SCK__tp_func_*
__bpf_trace_tp_map_*
__event_*
event_class_*
TRACE_SYSTEM_*
__TRACE_SYSTEM_*
__tracepoint_*
These are, unfortunately, all valid declarations produced by macros and
they correspond to valid symbols as well. If you look at the kallsyms
for the modules (and core kernel), these variables are present there as
well. It may indeed make sense to have kallsyms entries for them: I
don't know.
These are all, as far as I'm concerned, totally uninteresting types. If
you want to access any of this data, you probably already know its type
and wouldn't need a BTF declaration. Unfortunately, the flip side is
that I don't think we have a good way to automatically detect these,
outside of prefix matching, which quickly goes out of date as the kernel
changes, and can have false positives as well. For kernel modules, many
of these may appear in separate ELF sections, but for vmlinux, they
don't. I'd be happy to eliminate types for these auto-generated kinds of
variables, if we could somehow annotate them so that pahole knows to
ignore them. For instance, maybe we could use
__attribute__((btf_decl_tag("btf_omit")))
as an instruction to pahole to omit declarations for these things?
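To make the prefix-matching drawback concrete, here is a rough sketch of what
such a filter might look like. This is not pahole code; the prefixes are the
ones listed earlier in this message, while the function names and the sample
variable names are hypothetical.

```python
# Prefixes copied from the list earlier in this message.
GENERATED_PREFIXES = (
    "print_fmt_",
    "trace_event_fields_",
    "trace_event_type_funcs_",
    "event_",
    "__SCK__tp_func_",
    "__bpf_trace_tp_map_",
    "__event_",
    "event_class_",  # already covered by "event_", kept to mirror the list
    "TRACE_SYSTEM_",
    "__TRACE_SYSTEM_",
    "__tracepoint_",
)


def looks_generated(name):
    """Guess whether a variable name was produced by a tracing macro."""
    return name.startswith(GENERATED_PREFIXES)


def split_variables(names):
    """Split names into (kept, dropped) according to the prefix heuristic."""
    kept = [n for n in names if not looks_generated(n)]
    dropped = [n for n in names if looks_generated(n)]
    return kept, dropped


# Hypothetical names: two macro-generated, two hand-written.
sample = [
    "__tracepoint_foo_start",
    "print_fmt_foo_start",
    "foo_stats",
    "event_queue_depth",
]
kept, dropped = split_variables(sample)
```

In this sample, the hand-written "event_queue_depth" is dropped only because
it happens to start with "event_", which is the kind of false positive
mentioned above. An explicit annotation would avoid that guesswork.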
Thanks,
Stephen
> So before we decide on what to do with vars in mods, let's figure out
> the need.
* Re: [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
2025-02-15 14:21 ` Masahiro Yamada
@ 2025-02-24 18:51 ` Andrii Nakryiko
2025-02-25 1:24 ` Stephen Brennan
0 siblings, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-24 18:51 UTC (permalink / raw)
To: Masahiro Yamada
Cc: Stephen Brennan, Arnd Bergmann, Andrii Nakryiko, Nicolas Schier,
Kees Cook, KP Singh, Martin KaFai Lau, Sami Tolvanen,
Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
linux-kbuild, Daniel Borkmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, linux-kernel, bpf
On Sat, Feb 15, 2025 at 6:21 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Fri, Feb 7, 2025 at 10:21 AM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
> >
> > When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
> > within the .rodata section. The linking process is repeated several
> > times, since the kallsyms data size changes, which shifts symbols,
> > requiring re-generating the data and re-linking.
> >
> > BTF is generated during the first link only. For variables, BTF includes
> > a BTF_KIND_DATASEC for each data section that may contain a variable, which
> > includes the variable's name, type, and offset within the data section.
> > Because the size of kallsyms data changes during later links, the
> > offsets of variables placed after it in .rodata will change. This means
> > that BTF_KIND_DATASEC information for those variables becomes inaccurate.
> >
> > This is not currently a problem, because BTF currently only generates
> > variable data for percpu variables. However, the next commit will add
> > support for generating BTF for all global variables, including for the
> > .rodata section.
> >
> > We could re-generate BTF each time vmlinux is linked, but this is quite
> > expensive, and should be avoided at all costs. Further as each chunk of
> > data (BTF and kallsyms) are re-generated, there's no guarantee that
> > their sizes will converge anyway.
> >
> > Instead, we can take advantage of the fact that BTF only cares to store
> > the offset of variables from the start of their section. Therefore, so
> > long as the kallsyms data is stored last in the .rodata section, no
> > offsets will be affected. Adjust kallsyms to output to .rodata.kallsyms,
> > and update the linker script to include this at the end of .rodata.
> >
> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> > ---
>
> I am fine if this is helpful for BTF.
This seems like a useful change all by itself even while the main
feature of this patch set is still being developed and reviewed.
Should we land just this .kallsyms_rodata change?
>
>
>
> --
> Best Regards
> Masahiro Yamada
* Re: [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
2025-02-24 18:51 ` Andrii Nakryiko
@ 2025-02-25 1:24 ` Stephen Brennan
2025-02-25 16:59 ` Andrii Nakryiko
0 siblings, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-25 1:24 UTC (permalink / raw)
To: Andrii Nakryiko, Masahiro Yamada
Cc: Arnd Bergmann, Andrii Nakryiko, Nicolas Schier, Kees Cook,
KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
Yonghong Song, Hao Luo, Andrew Morton, linux-kbuild,
Daniel Borkmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, linux-kernel, bpf
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Sat, Feb 15, 2025 at 6:21 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>>
>> On Fri, Feb 7, 2025 at 10:21 AM Stephen Brennan
>> <stephen.s.brennan@oracle.com> wrote:
>> >
>> > When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
>> > within the .rodata section. The linking process is repeated several
>> > times, since the kallsyms data size changes, which shifts symbols,
>> > requiring re-generating the data and re-linking.
>> >
>> > BTF is generated during the first link only. For variables, BTF includes
>> > a BTF_KIND_DATASEC for each data section that may contain a variable, which
>> > includes the variable's name, type, and offset within the data section.
>> > Because the size of kallsyms data changes during later links, the
>> > offsets of variables placed after it in .rodata will change. This means
>> > that BTF_KIND_DATASEC information for those variables becomes inaccurate.
>> >
>> > This is not currently a problem, because BTF currently only generates
>> > variable data for percpu variables. However, the next commit will add
>> > support for generating BTF for all global variables, including for the
>> > .rodata section.
>> >
>> > We could re-generate BTF each time vmlinux is linked, but this is quite
>> > expensive, and should be avoided at all costs. Further, as each chunk of
>> > data (BTF and kallsyms) is re-generated, there's no guarantee that
>> > their sizes will converge anyway.
>> >
>> > Instead, we can take advantage of the fact that BTF only cares to store
>> > the offset of variables from the start of their section. Therefore, so
>> > long as the kallsyms data is stored last in the .rodata section, no
>> > offsets will be affected. Adjust kallsyms to output to .rodata.kallsyms,
>> > and update the linker script to include this at the end of .rodata.
>> >
>> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>> > ---
>>
>> I am fine if this is helpful for BTF.
>
> This seems like a useful change all by itself even while the main
> feature of this patch set is still being developed and reviewed.
> Should we land just this .kallsyms_rodata change?
I would be happy to see it merged now.
I don't think it would help anything other than BTF, because most other
things (e.g. kallsyms) refer to symbols via an absolute address. Using
the section offset seems pretty uncommon.
But it still is a nice cleanup anyway.
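To make the section-offset point concrete, here is a minimal Python sketch. It is only a toy model: the variable names and all sizes are invented, and a section is represented as an ordered list of (symbol, size) pairs. It shows that offsets recorded on the first link pass survive a change in kallsyms size only when the kallsyms data sits last.

```python
# Toy model of a data section: an ordered list of (symbol, size) pairs.
# Names and sizes are hypothetical; only the ordering argument matters.

def section_offsets(layout):
    """Return {symbol: offset from section start} for an ordered layout."""
    offsets, cursor = {}, 0
    for name, size in layout:
        offsets[name] = cursor
        cursor += size
    return offsets


def layout_with_kallsyms(kallsyms_size, kallsyms_last):
    """Build a section layout with the kallsyms blob last or in the middle."""
    variables = [("var_a", 16), ("var_b", 64), ("var_c", 8)]
    kallsyms = ("kallsyms_blob", kallsyms_size)
    if kallsyms_last:
        return variables + [kallsyms]
    # Otherwise the blob lands somewhere in the middle of the section.
    return variables[:1] + [kallsyms] + variables[1:]


def variable_offsets(kallsyms_size, kallsyms_last):
    """Offsets of the ordinary variables only (kallsyms blob excluded)."""
    offsets = section_offsets(layout_with_kallsyms(kallsyms_size, kallsyms_last))
    offsets.pop("kallsyms_blob")
    return offsets


# The kallsyms blob grows between link passes (sizes are invented).
first_pass, final_pass = 1000, 1040

# Blob in the middle: offsets recorded on the first pass go stale.
assert variable_offsets(first_pass, False) != variable_offsets(final_pass, False)
# Blob last: every variable keeps the offset recorded on the first pass.
assert variable_offsets(first_pass, True) == variable_offsets(final_pass, True)
```

A consumer that works from absolute addresses never notices the difference, which is why this mostly matters for BTF.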
Thanks,
Stephen
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-07 23:50 ` Alexei Starovoitov
2025-02-11 23:58 ` Stephen Brennan
@ 2025-02-25 10:01 ` Alan Maguire
2025-02-25 21:52 ` Andrii Nakryiko
2025-05-12 11:15 ` Tony Ambardar
1 sibling, 2 replies; 17+ messages in thread
From: Alan Maguire @ 2025-02-25 10:01 UTC (permalink / raw)
To: Alexei Starovoitov, Stephen Brennan
Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf
On 07/02/2025 23:50, Alexei Starovoitov wrote:
> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>> When the feature was implemented in pahole, my measurements indicated
>> that vmlinux BTF size increased by about 25.8%, and module BTF size
>> increased by 53.2%. Due to these increases, the feature is implemented
>> behind a new config option, allowing users sensitive to increased memory
>> usage to disable it.
>>
>
> ...
>> +config DEBUG_INFO_BTF_GLOBAL_VARS
>> + bool "Generate BTF type information for all global variables"
>> + default y
>> + depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
>> + help
>> + Include type information for all global variables in the BTF. This
>> + increases the size of the BTF information, which increases memory
>> + usage at runtime. With global variable types available, runtime
>> + debugging and tracers may be able to provide more detail.
>
> This is not a solution.
> Even if it's changed to 'default n' distros will enable it
> like they enable everything and will suffer a regression.
>
> We need to add a new module like vmlinux_btf.ko that will contain
> this additional BTF data. For global vars and everything else we might need.
>
In this area, I've been exploring adding support for
CONFIG_DEBUG_INFO_BTF=m, so that the BTF info for vmlinux is delivered
via a module. From the consumer side, everything looks identical
(/sys/kernel/btf/vmlinux is there etc.); it is just that the .BTF section
is delivered via btf_vmlinux.ko instead. The original need for this came
from embedded folks, who noted that because BTF data currently lives in
vmlinux, they cannot enable BTF: such small-footprint systems do not
support a large vmlinux binary. However, they could potentially use
kernel BTF if it was delivered via a module. The other nice thing about
module delivery in the general case is that we can make use of module
compression. In experiments I see a 5.8MB vmlinux BTF reduce to a 1.8MB
btf_vmlinux.ko.gz module on-disk.
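For a rough feel of the compression win on a given system, a minimal Python sketch follows. It assumes /sys/kernel/btf/vmlinux is readable and compresses the raw BTF blob with Python's gzip module, so the ratio it reports will not necessarily match that of a compressed module built by the kernel build system. The function names are invented for this sketch.

```python
import gzip
from pathlib import Path


def gzip_sizes(data: bytes) -> tuple[int, int]:
    """Return (raw size, gzip-compressed size) in bytes."""
    return len(data), len(gzip.compress(data))


def report(path: str = "/sys/kernel/btf/vmlinux") -> str:
    """Describe how well the file at `path` compresses with gzip."""
    raw, packed = gzip_sizes(Path(path).read_bytes())
    if raw == 0:
        return f"{path}: empty"
    return f"{path}: {raw} -> {packed} bytes ({packed / raw:.1%} of original)"
```

Calling `print(report())` on a BTF-enabled kernel prints the raw and compressed sizes of the running kernel's vmlinux BTF.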
The challenge in delivering vmlinux BTF in a module is that on module
load during boot other modules expect vmlinux BTF to be there when
adding their own BTF to /sys/kernel/btf. And kfunc registration from
kernel and modules expects this also. So support for deferred BTF module
load/kfunc registration is required too. I've implemented the former and
now am working on the latter. Hope to have some RFC patches ready soon,
but it looks feasible at this point.
Assuming such an option was available to small-footprint systems, should
we consider adding global variables to core vmlinux BTF along with
per-cpu variables? Then vmlinux BTF extras could be used for some of the
additional optional representations like function site-specific data
(inlines etc.)? Or are there factors other than on-disk footprint
that we need to consider? Thanks!
Alan
> pw-bot: cr
>
* Re: [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
2025-02-25 1:24 ` Stephen Brennan
@ 2025-02-25 16:59 ` Andrii Nakryiko
0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-25 16:59 UTC (permalink / raw)
To: Stephen Brennan
Cc: Masahiro Yamada, Arnd Bergmann, Andrii Nakryiko, Nicolas Schier,
Kees Cook, KP Singh, Martin KaFai Lau, Sami Tolvanen,
Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
linux-kbuild, Daniel Borkmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, linux-kernel, bpf
On Mon, Feb 24, 2025 at 5:24 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> > On Sat, Feb 15, 2025 at 6:21 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> >>
> >> On Fri, Feb 7, 2025 at 10:21 AM Stephen Brennan
> >> <stephen.s.brennan@oracle.com> wrote:
> >> >
> >> > When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
> >> > within the .rodata section. The linking process is repeated several
> >> > times, since the kallsyms data size changes, which shifts symbols,
> >> > requiring re-generating the data and re-linking.
> >> >
> >> > BTF is generated during the first link only. For variables, BTF includes
> >> > a BTF_KIND_DATASEC for each data section that may contain a variable, which
> >> > includes the variable's name, type, and offset within the data section.
> >> > Because the size of kallsyms data changes during later links, the
> >> > offsets of variables placed after it in .rodata will change. This means
> >> > that BTF_KIND_DATASEC information for those variables becomes inaccurate.
> >> >
> >> > This is not currently a problem, because BTF currently only generates
> >> > variable data for percpu variables. However, the next commit will add
> >> > support for generating BTF for all global variables, including for the
> >> > .rodata section.
> >> >
> >> > We could re-generate BTF each time vmlinux is linked, but this is quite
> >> > expensive, and should be avoided at all costs. Further, as each chunk of
> >> > data (BTF and kallsyms) is re-generated, there's no guarantee that
> >> > their sizes will converge anyway.
> >> >
> >> > Instead, we can take advantage of the fact that BTF only cares to store
> >> > the offset of variables from the start of their section. Therefore, so
> >> > long as the kallsyms data is stored last in the .rodata section, no
> >> > offsets will be affected. Adjust kallsyms to output to .rodata.kallsyms,
> >> > and update the linker script to include this at the end of .rodata.
> >> >
> >> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> >> > ---
> >>
> >> I am fine if this is helpful for BTF.
> >
> > This seems like a useful change all by itself even while the main
> > feature of this patch set is still being developed and reviewed.
> > Should we land just this .kallsyms_rodata change?
>
> I would be happy to see it merged now.
>
> I don't think it would help anything other than BTF, because most other
> things (e.g. kallsyms) refer to symbols via an absolute address. Using
> the section offset seems pretty uncommon.
>
> But it still is a nice cleanup anyway.
I was thinking about possible use cases of some tooling wanting to
access kallsyms data from vmlinux (instead of from /proc/kallsyms).
But, frankly, having a separate section doesn't help all that much
even there. Either way, we seem to have ELF symbols pointing to the
relevant pieces of information, so it's not hard to get at it even if
it's part of .rodata. So I guess we don't have to rush landing this
patch separately.
>
> Thanks,
> Stephen
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-18 23:09 ` Stephen Brennan
@ 2025-02-25 21:47 ` Andrii Nakryiko
0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-25 21:47 UTC (permalink / raw)
To: Stephen Brennan
Cc: Alexei Starovoitov, Masahiro Yamada, Andrii Nakryiko,
Nicolas Schier, Kees Cook, KP Singh, Martin KaFai Lau,
Sami Tolvanen, Eduard Zingerman, linux-arch, Stanislav Fomichev,
Kent Overstreet, Pasha Tatashin, Jiri Olsa, John Fastabend,
Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
Linux Kbuild mailing list, Daniel Borkmann, Arnd Bergmann,
Nathan Chancellor, linux-debuggers, Alexei Starovoitov, Song Liu,
LKML, bpf
On Tue, Feb 18, 2025 at 3:10 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
> [...]
> >> We can dust that off and include it for a new version of this series.
> >> I'd be curious of what you'd like to see for kernel modules? A
> >> three-level tree would be too complex, in my opinion.
> >
> > What is the use case for vars in kernel modules?
>
> The use case would be the same as for the core kernel. My primary
> motivation is to allow drgn to understand the types of global variables,
> and that extends to kernel modules too.
>
> >> module BTF size increased by 53.2%.
> >
> > This is the sum of all mods with vars divided by
> > the sum of all mods without?
>
> That was a poorly done comparison, so let me provide this one that I did
> using 6.13 and these patches. It was essentially a localmodconfig for a
> VM instance, so I could still do better by picking a popular
> distribution config. But I think this is far more representative.
>
> MODULE BASE COMP CHG PCT
> drm.ko 115833 123410 7577 6.54%
> iscsi_boot_sysfs.ko 2627 5380 2753 104.80%
> joydev.ko 1816 2289 473 26.05%
> libcxgbi.ko 24556 25266 710 2.89%
> drm_vram_helper.ko 22325 22751 426 1.91%
> nvme-tcp.ko 25044 25973 929 3.71%
> vfat.ko 3448 3953 505 14.65%
> btrfs.ko 275139 343686 68547 24.91%
> libiscsi.ko 21177 21977 800 3.78%
> xt_owner.ko 449 803 354 78.84%
> nft_ct.ko 4912 6157 1245 25.35%
> iscsi_ibft.ko 3967 4463 496 12.50%
> pcspkr.ko 283 682 399 140.99%
> crc32-pclmul.ko 390 771 381 97.69%
> nf_conntrack.ko 23686 28191 4505 19.02%
> iscsi_tcp.ko 16827 17750 923 5.49%
> nft_fib.ko 835 1117 282 33.77%
> nf_reject_ipv6.ko 699 981 282 40.34%
> rfkill.ko 4233 6410 2177 51.43%
> dm-region-hash.ko 6214 6496 282 4.54%
> cxgb3i.ko 35469 37078 1609 4.54%
> dm-mirror.ko 7576 8191 615 8.12%
> pvpanic-pci.ko 174 574 400 229.89%
> crct10dif-pclmul.ko 146 525 379 259.59%
> nvme-fabrics.ko 17341 18124 783 4.52%
> kvm-amd.ko 47302 51914 4612 9.75%
> crc8.ko 221 405 184 83.26%
> ib_iser.ko 27769 29116 1347 4.85%
> sg.ko 4234 5656 1422 33.59%
> intel_rapl_common.ko 5678 8446 2768 48.75%
> bochs.ko 35643 36997 1354 3.80%
> sha1-ssse3.ko 790 1305 515 65.19%
> kvm-intel.ko 53802 59220 5418 10.07%
> nft_chain_nat.ko 279 714 435 155.91%
> vmlinux 5484970 7330096 1845126 33.64%
> sha256-ssse3.ko 851 1378 527 61.93%
> nf_nat.ko 6341 7240 899 14.18%
> configs.ko 72 256 184 255.56%
> xt_comment.ko 151 507 356 235.76%
> ccp.ko 30433 34782 4349 14.29%
> cxgb3.ko 44981 47504 2523 5.61%
> crypto_simd.ko 1331 1613 282 21.19%
> iptable_filter.ko 855 1456 601 70.29%
> qedi.ko 70653 72786 2133 3.02%
> drm_kms_helper.ko 63238 65000 1762 2.79%
> cnic.ko 117074 117790 716 0.61%
> failover.ko 780 1216 436 55.90%
> nft_redir.ko 874 1529 655 74.94%
> serio_raw.ko 708 1234 526 74.29%
> nf_defrag_ipv6.ko 1520 2253 733 48.22%
> nf_defrag_ipv4.ko 306 770 464 151.63%
> nft_reject_ipv4.ko 517 939 422 81.62%
> nft_nat.ko 1192 1732 540 45.30%
> nft_reject_inet.ko 554 976 422 76.17%
> fuse.ko 32181 41859 9678 30.07%
> nft_compat.ko 3705 4404 699 18.87%
> zstd_compress.ko 42597 43622 1025 2.41%
> tls.ko 15140 20683 5543 36.61%
> virtio_pci.ko 8456 9193 737 8.72%
> blake2b_generic.ko 1364 1699 335 24.56%
> cryptd.ko 3697 4297 600 16.23%
> xor.ko 1358 1879 521 38.37%
> intel_rapl_msr.ko 2851 3440 589 20.66%
> kvm.ko 177060 256377 79317 44.80%
> cxgb4.ko 215865 220844 4979 2.31%
> bnx2i.ko 39524 41477 1953 4.94%
> dm-round-robin.ko 1795 2123 328 18.27%
> virtio_pci_legacy_dev.ko 909 1191 282 31.02%
> qla4xxx.ko 79040 82694 3654 4.62%
> nfs.ko 108350 169642 61292 56.57%
> libata.ko 47301 66188 18887 39.93%
> ghash-clmulni-intel.ko 578 997 419 72.49%
> nf_reject_ipv4.ko 706 988 282 39.94%
> nft_reject.ko 820 1196 376 45.85%
> sunrpc.ko 127496 197841 70345 55.17%
> nft_fib_ipv4.ko 803 1257 454 56.54%
> scsi_transport_iscsi.ko 40419 57633 17214 42.59%
> lockd.ko 36144 42137 5993 16.58%
> drm_shmem_helper.ko 32555 33043 488 1.50%
> nvme-core.ko 50275 58298 8023 15.96%
> iw_cm.ko 13405 14796 1391 10.38%
> mdio.ko 857 1041 184 21.47%
> bnx2.ko 20354 21611 1257 6.18%
> net_failover.ko 1742 2187 445 25.55%
> ip_set.ko 11812 13093 1281 10.84%
> libcxgb.ko 8698 8980 282 3.24%
> dm-multipath.ko 8124 8898 774 9.53%
> grace.ko 462 890 428 92.64%
> virtio_net.ko 12322 14896 2574 20.89%
> qed.ko 228735 232231 3496 1.53%
> cdc-acm.ko 2923 3679 756 25.86%
> i2c-piix4.ko 1124 2341 1217 108.27%
> pvpanic-mmio.ko 177 625 448 253.11%
> virtio_scsi.ko 3154 3898 744 23.59%
> uio.ko 2602 4295 1693 65.07%
> nft_fib_ipv6.ko 956 1410 454 47.49%
> cec.ko 28370 29266 896 3.16%
> qemu_fw_cfg.ko 1601 3476 1875 117.11%
> ttm.ko 23672 25727 2055 8.68%
> sd_mod.ko 9976 13030 3054 30.61%
> xfs.ko 574594 926637 352043 61.27%
> libiscsi_tcp.ko 17444 17911 467 2.68%
> ib_cm.ko 32324 62373 30049 92.96%
> aesni-intel.ko 3370 4922 1552 46.05%
> drm_client_lib.ko 27449 27794 345 1.26%
> virtio_pci_modern_dev.ko 2537 2819 282 11.12%
> rdma_cm.ko 32504 51823 19319 59.44%
> fat.ko 11958 13297 1339 11.20%
> dm-log.ko 6529 6986 457 7.00%
> pata_acpi.ko 9231 9700 469 5.08%
> ata_piix.ko 10998 12598 1600 14.55%
> ipt_REJECT.ko 956 1311 355 37.13%
> drm_ttm_helper.ko 33160 33544 384 1.16%
> be2iscsi.ko 55078 56993 1915 3.48%
> i2c-smbus.ko 582 973 391 67.18%
> cuse.ko 8435 9241 806 9.56%
> nft_fib_inet.ko 579 995 416 71.85%
> ib_core.ko 103656 123701 20045 19.34%
> pulse8-cec.ko 9153 9890 737 8.05%
> pvpanic.ko 494 1087 593 120.04%
> dm-mod.ko 31377 35265 3888 12.39%
> raid6_pq.ko 2774 4207 1433 51.66%
> nft_reject_ipv6.ko 517 939 422 81.62%
> cxgb4i.ko 47490 49021 1531 3.22%
> ata_generic.ko 9008 9666 658 7.30%
> vboxvideo.ko 47622 48844 1222 2.57%
> ip_tables.ko 3109 3564 455 14.63%
>
> ALL MODS 9153268 11895301 2742033 29.96%
> vmlinux 5484970 7330096 1845126 33.64%
> TOTAL 14638238 19225397 4587159 31.34%
>
> So this shows a 1.8 MiB increase in vmlinux size, or 33.6%.
> And for these modules in aggregate, an increase of 2.7 MiB or 30.0%.
>
> > Any outliers there?
> > I would expect modules to have few global variables.
>
> In terms of outliers, there are groups that stand out to me:
>
> 1. Large percentage increases are almost always for modules that had
> very tiny BTF before. The module system inherently creates a few
> global variables for each module, so there's always a slight constant
> increase of the BTF size (184 bytes, as far as I can tell), and in those
> cases it can be a quite large percentage. Here's an example,
> "configs.ko" which comes from the CONFIG_IKCONFIG enablement:
>
> BEFORE:
> $ bpftool btf dump file ../build_pahole_novars/kernel/configs.ko -B ../build_pahole_novars/vmlinux
> [127877] CONST '(anon)' type_id=11124
> [127878] ARRAY '(anon)' type_id=127877 index_type_id=21 nr_elems=1
> [127879] CONST '(anon)' type_id=127878
>
> AFTER:
> $ bpftool btf dump file ../build_pahole_vars/kernel/configs.ko -B ../build_pahole_vars/vmlinux
> [162827] CONST '(anon)' type_id=11124
> [162828] ARRAY '(anon)' type_id=162827 index_type_id=21 nr_elems=1
> [162829] CONST '(anon)' type_id=162828
> [162830] VAR '____versions' type_id=162829, linkage=static
> [162831] DATASEC '__versions' size=64 vlen=1
>     type_id=162830 offset=0 size=64 (VAR '____versions')
> [162832] VAR 'orc_header' type_id=8667, linkage=static
> [162833] DATASEC '.orc_header' size=20 vlen=1
>     type_id=162832 offset=0 size=20 (VAR 'orc_header')
> [162834] VAR '__this_module' type_id=312, linkage=global
> [162835] DATASEC '.gnu.linkonce.this_module' size=1344 vlen=1
>     type_id=162834 offset=0 size=1344 (VAR '__this_module')
>
> What is, I think, interesting is that the types in that module were
> totally useless to begin with, because they were used by a variable
> which didn't even get emitted. So while this is a substantial
> percentage-wise increase, I think it's a net improvement for this and
> other modules.
>
> 2. The largest absolute increases come from large, complex modules like
> xfs, kvm, sunrpc, btrfs, etc. For example, xfs had 5696 VAR
> declarations. What is disappointing is how much of this is due to
> automatically-generated "variables" from macros (e.g. tracepoints):
> Here is a list of variable prefixes like that:
>
> print_fmt_*
> trace_event_fields_*
> trace_event_type_funcs_*
> event_*
> __SCK__tp_func_*
> __bpf_trace_tp_map_*
> __event_*
> event_class_*
> TRACE_SYSTEM_*
> __TRACE_SYSTEM_*
> __tracepoint_*
>
> These are, unfortunately, all valid declarations produced by macros and
> they correspond to valid symbols as well. If you look at the kallsyms
> for the modules (and core kernel), these variables are present there as
> well. It may indeed make sense to have kallsyms entries for them: I
> don't know.
>
> These are all, as far as I'm concerned, totally uninteresting types. If
> you want to access any of this data, you probably already know its type
> and wouldn't need a BTF declaration. Unfortunately, the flip side is
> that I don't think we have a good way to automatically detect these,
> outside of prefix matching, which quickly goes out of date as the kernel
> changes, and can have false positives as well. For kernel modules, many
> of these may appear in separate ELF sections, but for vmlinux, they
> don't. I'd be happy to eliminate types for these auto-generated kinds of
> variables, if we could somehow annotate them so that pahole knows to
> ignore them. For instance, maybe we could use
>
> __attribute__((btf_decl_tag("btf_omit")))
>
> as an instruction to pahole to omit declarations for these things?
>
All such tracepoint-related variables, can't we just put them into
some separate ELF section, and teach pahole to ignore global variables
from that section? btf_decl_tag is a similar idea, but (currently)
won't work for GCC-built kernels. So I'd go with the ELF section.
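Roughly, the section-based filtering would look like the sketch below. It is Python rather than pahole's C, and the section name `.data..tracepoint_meta`, the helper name, and the sample symbols other than the prefixes quoted above are all made up for illustration. The point is only that a section check needs no per-prefix list to maintain.

```python
# Hypothetical section that macro-generated tracepoint variables would be
# placed in; the name is invented for this sketch.
IGNORED_SECTIONS = {".data..tracepoint_meta"}


def vars_to_encode(variables, ignored_sections=IGNORED_SECTIONS):
    """Keep only variables whose ELF section is not on the ignore list.

    `variables` is an iterable of (symbol name, section name) pairs.
    """
    return [name for name, section in variables
            if section not in ignored_sections]


sample = [
    ("jiffies", ".data"),
    ("print_fmt_xfs_foo", ".data..tracepoint_meta"),
    ("event_class_xfs_foo", ".data..tracepoint_meta"),
    # A prefix match on "event_" would wrongly drop this ordinary variable.
    ("event_loop_state", ".data"),
]

kept = vars_to_encode(sample)
```

Here `kept` retains `jiffies` and `event_loop_state` and drops the two tracepoint-generated entries, without any knowledge of their names.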
> Thanks,
> Stephen
>
> > So before we decide on what to do with vars in mods lets figure out
> > the need.
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-25 10:01 ` Alan Maguire
@ 2025-02-25 21:52 ` Andrii Nakryiko
2025-02-26 14:20 ` Alan Maguire
2025-05-12 11:15 ` Tony Ambardar
1 sibling, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-25 21:52 UTC (permalink / raw)
To: Alan Maguire
Cc: Alexei Starovoitov, Stephen Brennan, Masahiro Yamada,
Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman, linux-arch,
Stanislav Fomichev, Kent Overstreet, Pasha Tatashin, Jiri Olsa,
John Fastabend, Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo,
Andrew Morton, Linux Kbuild mailing list, Daniel Borkmann,
Arnd Bergmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, LKML, bpf
On Tue, Feb 25, 2025 at 2:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 07/02/2025 23:50, Alexei Starovoitov wrote:
> > On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >> When the feature was implemented in pahole, my measurements indicated
> >> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >> increased by 53.2%. Due to these increases, the feature is implemented
> >> behind a new config option, allowing users sensitive to increased memory
> >> usage to disable it.
> >>
> >
> > ...
> >> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >> + bool "Generate BTF type information for all global variables"
> >> + default y
> >> + depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >> + help
> >> + Include type information for all global variables in the BTF. This
> >> + increases the size of the BTF information, which increases memory
> >> + usage at runtime. With global variable types available, runtime
> >> + debugging and tracers may be able to provide more detail.
> >
> > This is not a solution.
> > Even if it's changed to 'default n' distros will enable it
> > like they enable everything and will suffer a regression.
> >
> > We need to add a new module like vmlinux_btf.ko that will contain
> > this additional BTF data. For global vars and everything else we might need.
> >
>
> In this area, I've been exploring adding support for
> CONFIG_DEBUG_INFO_BTF=m , so that the BTF info for vmlinux is delivered
> via a module. From the consumer side, everything looks identical
> (/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
> is delivered via btf_vmlinux.ko instead. The original need for this was
> that embedded folks noted that because in the current situation BTF data
> is in vmlinux, they cannot enable BTF because such small-footprint
> systems do not support a large vmlinux binary. However they could
> potentially use kernel BTF if it was delivered via a module. The other
> nice thing about module delivery in the general case is we can make use
> of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
> to a 1.8Mb btf_vmlinux.ko.gz module on-disk.
>
> The challenge in delivering vmlinux BTF in a module is that on module
> load during boot other modules expect vmlinux BTF to be there when
> adding their own BTF to /sys/kernel/btf. And kfunc registration from
> kernel and modules expects this also. So support for deferred BTF module
> load/kfunc registration is required too. I've implemented the former and
> now am working on the latter. Hope to have some RFC patches ready soon,
> but it looks feasible at this point.
Lazy btf_vmlinux.ko loading when BTF is actually needed (i.e., when
user reads /sys/kernel/btf/vmlinux for the first time; or when BPF
program is validated and needs kernel BTF) would be great. Curious to
see how all that fits together!
>
> Assuming such an option was available to small-footprint systems, should
> we consider adding global variables to core vmlinux BTF along with
> per-cpu variables? Then vmlinux BTF extras could be used for some of the
> additional optional representations like function site-specific data
> (inlines etc)? Or are there other factors other than on-disk footprint
> that we need to consider? Thanks!
I'd keep BTF for variables separate from "core" vmlinux BTF. We can
have /sys/kernel/btf/vmlinux.vars, which would depend on
/sys/kernel/btf/vmlinux as a base BTF. Separately, we could eventually
have /sys/kernel/btf/vmlinux.inlines which would also have
/sys/kernel/btf/vmlinux as base BTF. If no one needs vmlinux.vars on
the system, we won't need to waste memory on it. Seems more modular
and extensible.
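A minimal sketch of that layering, assuming vmlinux.vars and vmlinux.inlines would use the same split-BTF scheme that module BTF uses today, where type IDs in the split object continue where the base leaves off. The `Btf` class and the sample type names are illustrative only and are not libbpf's API.

```python
class Btf:
    """Toy BTF object: a list of type names, optionally layered on a base."""

    def __init__(self, types, base=None):
        self.types = list(types)
        self.base = base
        # ID 0 is reserved for void, so a standalone BTF starts at 1;
        # a split BTF continues after the last ID of its base.
        self.start_id = 1 if base is None else base.end_id

    @property
    def end_id(self):
        """One past the largest type ID visible through this object."""
        return self.start_id + len(self.types)

    def type_by_id(self, type_id):
        """Resolve an ID locally, or fall through to the base BTF."""
        if type_id == 0:
            return "void"
        if type_id < 0 or type_id >= self.end_id:
            raise KeyError(type_id)
        if type_id >= self.start_id:
            return self.types[type_id - self.start_id]
        return self.base.type_by_id(type_id)


# Sample contents are invented; real BTF holds encoded type records.
vmlinux = Btf(["int", "struct list_head"])
vmlinux_vars = Btf(["VAR 'init_task'", "DATASEC '.data'"], base=vmlinux)
vmlinux_inlines = Btf(["inline site record"], base=vmlinux)
```

Both extras resolve base IDs through `vmlinux` but are independent of each other, so either one can be left unloaded without affecting the other.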
>
> Alan
>
> > pw-bot: cr
> >
>
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-25 21:52 ` Andrii Nakryiko
@ 2025-02-26 14:20 ` Alan Maguire
2025-02-26 16:57 ` Andrii Nakryiko
0 siblings, 1 reply; 17+ messages in thread
From: Alan Maguire @ 2025-02-26 14:20 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Stephen Brennan, Masahiro Yamada,
Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman, linux-arch,
Stanislav Fomichev, Kent Overstreet, Pasha Tatashin, Jiri Olsa,
John Fastabend, Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo,
Andrew Morton, Linux Kbuild mailing list, Daniel Borkmann,
Arnd Bergmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, LKML, bpf
On 25/02/2025 21:52, Andrii Nakryiko wrote:
> On Tue, Feb 25, 2025 at 2:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 07/02/2025 23:50, Alexei Starovoitov wrote:
>>> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
>>> <stephen.s.brennan@oracle.com> wrote:
>>>> When the feature was implemented in pahole, my measurements indicated
>>>> that vmlinux BTF size increased by about 25.8%, and module BTF size
>>>> increased by 53.2%. Due to these increases, the feature is implemented
>>>> behind a new config option, allowing users sensitive to increased memory
>>>> usage to disable it.
>>>>
>>>
>>> ...
>>>> +config DEBUG_INFO_BTF_GLOBAL_VARS
>>>> + bool "Generate BTF type information for all global variables"
>>>> + default y
>>>> + depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
>>>> + help
>>>> + Include type information for all global variables in the BTF. This
>>>> + increases the size of the BTF information, which increases memory
>>>> + usage at runtime. With global variable types available, runtime
>>>> + debugging and tracers may be able to provide more detail.
>>>
>>> This is not a solution.
>>> Even if it's changed to 'default n' distros will enable it
>>> like they enable everything and will suffer a regression.
>>>
>>> We need to add a new module like vmlinux_btf.ko that will contain
>>> this additional BTF data. For global vars and everything else we might need.
>>>
>>
>> In this area, I've been exploring adding support for
>> CONFIG_DEBUG_INFO_BTF=m , so that the BTF info for vmlinux is delivered
>> via a module. From the consumer side, everything looks identical
>> (/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
>> is delivered via btf_vmlinux.ko instead. The original need for this was
>> that embedded folks noted that because in the current situation BTF data
>> is in vmlinux, they cannot enable BTF because such small-footprint
>> systems do not support a large vmlinux binary. However they could
>> potentially use kernel BTF if it was delivered via a module. The other
>> nice thing about module delivery in the general case is we can make use
>> of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
>> to a 1.8Mb btf_vmlinux.ko.gz module on-disk.
>>
>> The challenge in delivering vmlinux BTF in a module is that on module
>> load during boot other modules expect vmlinux BTF to be there when
>> adding their own BTF to /sys/kernel/btf. And kfunc registration from
>> kernel and modules expects this also. So support for deferred BTF module
>> load/kfunc registration is required too. I've implemented the former and
>> now am working on the latter. Hope to have some RFC patches ready soon,
>> but it looks feasible at this point.
>
> Lazy btf_vmlinux.ko loading when BTF is actually needed (i.e., when
> user reads /sys/kernel/btf/vmlinux for the first time; or when BPF
> program is validated and needs kernel BTF) would be great. Curious too
> see how all that fits together!
>
>>
>> Assuming such an option was available to small-footprint systems, should
>> we consider adding global variables to core vmlinux BTF along with
>> per-cpu variables? Then vmlinux BTF extras could be used for some of the
>> additional optional representations like function site-specific data
>> (inlines etc)? Or are there other factors other than on-disk footprint
>> that we need to consider? Thanks!
>
> I'd keep BTF for variables separate from "core" vmlinux BTF. We can
> have /sys/kernel/btf/vmlinux.vars, which would depend on
> /sys/kernel/btf/vmlinux as a base BTF. Separately, we could eventually
> have /sys/kernel/btf/vmlinux.inlines which would also have
> /sys/kernel/btf/vmlinux as base BTF. If no one needs vmlinux.vars on
> the system, we won't need to waste memory on it. Seems more modular
> and extensible.
>
Sounds good. So thinking about how this fits with
CONFIG_DEBUG_INFO_BTF=m, perhaps the approach would be to use
btf_vmlinux.ko for all such extensions: /sys/kernel/btf/vmlinux.vars,
vmlinux.inlines, etc. Each of these would be derived from a matching
.BTF.vars or .BTF.inlines section in btf_vmlinux.ko, and each is
optionally included via a CONFIG_DEBUG_INFO_BTF_EXTRAS list. If
CONFIG_DEBUG_INFO_BTF=y, the core vmlinux .BTF section stays in vmlinux
itself and the extras are delivered via btf_vmlinux.ko; if
CONFIG_DEBUG_INFO_BTF=m, the vmlinux .BTF section is delivered in
btf_vmlinux.ko too.
If this makes sense, I'll try and put together the
CONFIG_DEBUG_INFO_BTF=m support first, and that will give us a
btf_vmlinux.ko to work with for delivery of extras. Thanks!
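
To make the proposed layout concrete, here is a rough Python model of which
binary would carry which section under each setting. It is only a sketch of
the idea described above, not kernel or Kbuild code: the function name
btf_delivery() and the way the extras list is passed in are made up for
illustration, and CONFIG_DEBUG_INFO_BTF_EXTRAS itself is still just a proposal.

```python
# Hypothetical model of the proposed delivery scheme; not kernel code.
def btf_delivery(debug_info_btf, extras):
    """Map each BTF section name to the binary that would carry it.

    debug_info_btf: "y" or "m", the value of CONFIG_DEBUG_INFO_BTF
    extras: names from the proposed CONFIG_DEBUG_INFO_BTF_EXTRAS list,
            for example ["vars", "inlines"]
    """
    if debug_info_btf not in ("y", "m"):
        raise ValueError("CONFIG_DEBUG_INFO_BTF must be 'y' or 'm'")

    layout = {}
    # Core vmlinux BTF: stays in vmlinux for =y, moves to the module for =m.
    layout[".BTF"] = "vmlinux" if debug_info_btf == "y" else "btf_vmlinux.ko"
    # Under this proposal the extras are always delivered via the module.
    for name in extras:
        layout[".BTF." + name] = "btf_vmlinux.ko"
    return layout


if __name__ == "__main__":
    print(btf_delivery("y", ["vars", "inlines"]))
    print(btf_delivery("m", ["vars"]))
```

In both cases the consumer-facing names (/sys/kernel/btf/vmlinux,
vmlinux.vars, ...) would be the same; only the carrier of each section changes.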
Alan
>>
>> Alan
>>
>>> pw-bot: cr
>>>
>>
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-26 14:20 ` Alan Maguire
@ 2025-02-26 16:57 ` Andrii Nakryiko
0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-26 16:57 UTC (permalink / raw)
To: Alan Maguire
Cc: Alexei Starovoitov, Stephen Brennan, Masahiro Yamada,
Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman, linux-arch,
Stanislav Fomichev, Kent Overstreet, Pasha Tatashin, Jiri Olsa,
John Fastabend, Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo,
Andrew Morton, Linux Kbuild mailing list, Daniel Borkmann,
Arnd Bergmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, LKML, bpf
On Wed, Feb 26, 2025 at 6:20 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 25/02/2025 21:52, Andrii Nakryiko wrote:
> > On Tue, Feb 25, 2025 at 2:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> On 07/02/2025 23:50, Alexei Starovoitov wrote:
> >>> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> >>> <stephen.s.brennan@oracle.com> wrote:
> >>>> When the feature was implemented in pahole, my measurements indicated
> >>>> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >>>> increased by 53.2%. Due to these increases, the feature is implemented
> >>>> behind a new config option, allowing users sensitive to increased memory
> >>>> usage to disable it.
> >>>>
> >>>
> >>> ...
> >>>> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >>>> + bool "Generate BTF type information for all global variables"
> >>>> + default y
> >>>> + depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >>>> + help
> >>>> + Include type information for all global variables in the BTF. This
> >>>> + increases the size of the BTF information, which increases memory
> >>>> + usage at runtime. With global variable types available, runtime
> >>>> + debugging and tracers may be able to provide more detail.
> >>>
> >>> This is not a solution.
> >>> Even if it's changed to 'default n' distros will enable it
> >>> like they enable everything and will suffer a regression.
> >>>
> >>> We need to add a new module like vmlinux_btf.ko that will contain
> >>> this additional BTF data. For global vars and everything else we might need.
> >>>
> >>
> >> In this area, I've been exploring adding support for
> >> CONFIG_DEBUG_INFO_BTF=m , so that the BTF info for vmlinux is delivered
> >> via a module. From the consumer side, everything looks identical
> >> (/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
> >> is delivered via btf_vmlinux.ko instead. The original need for this was
> >> that embedded folks noted that because in the current situation BTF data
> >> is in vmlinux, they cannot enable BTF because such small-footprint
> >> systems do not support a large vmlinux binary. However they could
> >> potentially use kernel BTF if it was delivered via a module. The other
> >> nice thing about module delivery in the general case is we can make use
> >> of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
> >> to a 1.8Mb btf_vmlinux.ko.gz module on-disk.
> >>
> >> The challenge in delivering vmlinux BTF in a module is that on module
> >> load during boot other modules expect vmlinux BTF to be there when
> >> adding their own BTF to /sys/kernel/btf. And kfunc registration from
> >> kernel and modules expects this also. So support for deferred BTF module
> >> load/kfunc registration is required too. I've implemented the former and
> >> now am working on the latter. Hope to have some RFC patches ready soon,
> >> but it looks feasible at this point.
> >
> > Lazy btf_vmlinux.ko loading when BTF is actually needed (i.e., when
> > user reads /sys/kernel/btf/vmlinux for the first time; or when BPF
> > program is validated and needs kernel BTF) would be great. Curious to
> > see how all that fits together!
> >
> >>
> >> Assuming such an option was available to small-footprint systems, should
> >> we consider adding global variables to core vmlinux BTF along with
> >> per-cpu variables? Then vmlinux BTF extras could be used for some of the
> >> additional optional representations like function site-specific data
> >> (inlines etc)? Or are there other factors other than on-disk footprint
> >> that we need to consider? Thanks!
> >
> > I'd keep BTF for variables separate from "core" vmlinux BTF. We can
> > have /sys/kernel/btf/vmlinux.vars, which would depend on
> > /sys/kernel/btf/vmlinux as a base BTF. Separately, we could eventually
> > have /sys/kernel/btf/vmlinux.inlines which would also have
> > /sys/kernel/btf/vmlinux as base BTF. If no one needs vmlinux.vars on
> > the system, we won't need to waste memory on it. Seems more modular
> > and extensible.
> >
>
> Sounds good. So thinking about how this fits with
> CONFIG_DEBUG_INFO_BTF=m, perhaps the approach would be to use
> btf_vmlinux.ko for all such extensible /sys/kernel/btf/vmlinux.vars,
> vmlinux.inlines etc. Each of these is derived from .BTF.vars ,
> .BTF.inlines sections in btf_vmlinux.ko. These are optionally included
> via CONFIG_DEBUG_INFO_BTF_EXTRAS list. If CONFIG_DEBUG_INFO_BTF=y the
> core vmlinux section stays in vmlinux itself and the extras are
> delivered via btf_vmlinux.ko, but if CONFIG_DEBUG_INFO_BTF=m, the
> vmlinux .BTF section is delivered in btf_vmlinux.ko too.
>
> If this makes sense, I'll try and put together the
> CONFIG_DEBUG_INFO_BTF=m support first, and that will give us a
> btf_vmlinux.ko to work with for delivery of extras. Thanks!

I'd keep our options open as to whether btf_vmlinux.ko contains all
vmlinux BTFs (core BTF, inlines, variables) or we have a separate
module for some subsets. E.g., variables, while useful, probably
won't be used all that frequently (i.e., only while debugging with
drgn), so co-locating them with vmlinux BTF itself might be a waste
in most cases.

But other than that, this makes sense.
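
For anyone following along, the reason a separate vmlinux.vars can be loaded
(or not) independently is the base/split relationship mentioned earlier in
the thread: split BTF continues type ID numbering where its base BTF stops,
so lookups for lower IDs fall through to the base. Below is a toy Python
model of that numbering. The Btf class and its methods are invented for
illustration and plain strings stand in for real type records; it is not the
libbpf API.

```python
# Illustrative model only: Python lists stand in for BTF type tables.
class Btf:
    def __init__(self, types, base=None):
        self.base = base
        self.types = list(types)
        # ID 0 is reserved for "void"; a base BTF therefore starts at 1,
        # and a split BTF starts right after the last ID of its base.
        self.start_id = 1 if base is None else base.type_cnt()

    def type_cnt(self):
        # One past the highest valid type ID visible through this object.
        return self.start_id + len(self.types)

    def type_by_id(self, type_id):
        if type_id < 0 or type_id >= self.type_cnt():
            raise KeyError(type_id)
        if type_id == 0:
            return "void"
        if type_id < self.start_id:
            # The ID belongs to the base BTF (only possible for split BTF).
            return self.base.type_by_id(type_id)
        return self.types[type_id - self.start_id]


if __name__ == "__main__":
    vmlinux = Btf(["int", "struct task_struct"])            # IDs 1 and 2
    vmlinux_vars = Btf(["var init_task"], base=vmlinux)     # ID 3
    print(vmlinux_vars.type_by_id(2))
    print(vmlinux_vars.type_by_id(3))
```

In this picture the base object never needs to know about the split one, which
is why the variables could live in their own module without touching core BTF.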
>
> Alan
>
> >>
> >> Alan
> >>
> >>> pw-bot: cr
> >>>
> >>
>
* Re: [PATCH 2/2] btf: Add the option to include global variable types
2025-02-25 10:01 ` Alan Maguire
2025-02-25 21:52 ` Andrii Nakryiko
@ 2025-05-12 11:15 ` Tony Ambardar
1 sibling, 0 replies; 17+ messages in thread
From: Tony Ambardar @ 2025-05-12 11:15 UTC (permalink / raw)
To: Alan Maguire
Cc: Alexei Starovoitov, Stephen Brennan, Masahiro Yamada,
Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman, linux-arch,
Stanislav Fomichev, Kent Overstreet, Pasha Tatashin, Jiri Olsa,
John Fastabend, Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo,
Andrew Morton, Linux Kbuild mailing list, Daniel Borkmann,
Arnd Bergmann, Nathan Chancellor, linux-debuggers,
Alexei Starovoitov, Song Liu, LKML, bpf
On Tue, Feb 25, 2025 at 10:01:27AM +0000, Alan Maguire wrote:
> On 07/02/2025 23:50, Alexei Starovoitov wrote:
> > On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >> When the feature was implemented in pahole, my measurements indicated
> >> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >> increased by 53.2%. Due to these increases, the feature is implemented
> >> behind a new config option, allowing users sensitive to increased memory
> >> usage to disable it.
> >>
> >
> > ...
> >> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >> + bool "Generate BTF type information for all global variables"
> >> + default y
> >> + depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >> + help
> >> + Include type information for all global variables in the BTF. This
> >> + increases the size of the BTF information, which increases memory
> >> + usage at runtime. With global variable types available, runtime
> >> + debugging and tracers may be able to provide more detail.
> >
> > This is not a solution.
> > Even if it's changed to 'default n' distros will enable it
> > like they enable everything and will suffer a regression.
> >
> > We need to add a new module like vmlinux_btf.ko that will contain
> > this additional BTF data. For global vars and everything else we might need.
> >
>
Hi Alan,
> In this area, I've been exploring adding support for
> CONFIG_DEBUG_INFO_BTF=m , so that the BTF info for vmlinux is delivered
> via a module. From the consumer side, everything looks identical
> (/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
> is delivered via btf_vmlinux.ko instead. The original need for this was
> that embedded folks noted that because in the current situation BTF data
> is in vmlinux, they cannot enable BTF because such small-footprint
> systems do not support a large vmlinux binary. However they could
> potentially use kernel BTF if it was delivered via a module. The other
> nice thing about module delivery in the general case is we can make use
> of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
> to a 1.8Mb btf_vmlinux.ko.gz module on-disk.
>

Thank you very much for working on this. I was keen to see this since you
first mentioned it a few years back [1], and have been meaning to ping
you on where things stand. Your summary of motivations above is spot on,
and I can add some context w.r.t. OpenWrt, which is often used on small
consumer Linux routers to improve security after support ends, expand
functionality, and increase lifetime (reducing e-waste).

This lifetime is already constrained by the limited kernel binary storage
of some devices and ever-increasing kernel sizes. The biggest mitigation
is heavy use of loadable modules, which avoids using kernel storage and
also reduces the kernel BTF. Even so, the (compressed) kernel BTF is
~400 KB, and over the years I've seen kernel sizes grow by ~200 KB per
annual LTS release.

At these rates, enabling BTF costs roughly _two years of reduced
lifetime_, which is a key obstacle to enabling BTF by default on such
small systems IMO. Having a module-based kernel BTF would be a huge
improvement!

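Spelling out the arithmetic behind that estimate, as a back-of-the-envelope
sketch: it uses only the two approximate figures quoted above and assumes
kernel growth is roughly linear per annual LTS release. The helper name is
made up for illustration.

```python
# Approximate figures from the message above.
KERNEL_BTF_KB = 400       # compressed kernel BTF size
GROWTH_KB_PER_LTS = 200   # observed kernel growth per annual LTS release


def releases_of_headroom(btf_kb, growth_kb_per_release):
    """Annual LTS releases' worth of kernel growth that the BTF occupies."""
    return btf_kb / growth_kb_per_release


if __name__ == "__main__":
    years = releases_of_headroom(KERNEL_BTF_KB, GROWTH_KB_PER_LTS)
    print(f"BTF uses about {years:.0f} years of storage headroom")
```
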
> The challenge in delivering vmlinux BTF in a module is that on module
> load during boot other modules expect vmlinux BTF to be there when
> adding their own BTF to /sys/kernel/btf. And kfunc registration from
> kernel and modules expects this also. So support for deferred BTF module
> load/kfunc registration is required too. I've implemented the former and
> now am working on the latter. Hope to have some RFC patches ready soon,
> but it looks feasible at this point.
>

That sounds great. I'm looking forward to seeing and trying this out. If
there's anything you can share at this time, please let me know.

Thanks,
Tony

[1]: https://lore.kernel.org/bpf/43fd3775-e796-6802-17f0-5c9fdbf368f5@oracle.com/

> Assuming such an option was available to small-footprint systems, should
> we consider adding global variables to core vmlinux BTF along with
> per-cpu variables? Then vmlinux BTF extras could be used for some of the
> additional optional representations like function site-specific data
> (inlines etc)? Or are there other factors other than on-disk footprint
> that we need to consider? Thanks!
>
> Alan
>
> > pw-bot: cr
> >
>
Thread overview: 17+ messages
2025-02-07 1:20 [PATCH 0/2] Add option for generating BTF types of global variables Stephen Brennan
2025-02-07 1:20 ` [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata" Stephen Brennan
2025-02-15 14:21 ` Masahiro Yamada
2025-02-24 18:51 ` Andrii Nakryiko
2025-02-25 1:24 ` Stephen Brennan
2025-02-25 16:59 ` Andrii Nakryiko
2025-02-07 1:20 ` [PATCH 2/2] btf: Add the option to include global variable types Stephen Brennan
2025-02-07 23:50 ` Alexei Starovoitov
2025-02-11 23:58 ` Stephen Brennan
2025-02-14 1:18 ` Alexei Starovoitov
2025-02-18 23:09 ` Stephen Brennan
2025-02-25 21:47 ` Andrii Nakryiko
2025-02-25 10:01 ` Alan Maguire
2025-02-25 21:52 ` Andrii Nakryiko
2025-02-26 14:20 ` Alan Maguire
2025-02-26 16:57 ` Andrii Nakryiko
2025-05-12 11:15 ` Tony Ambardar