linux-arch.vger.kernel.org archive mirror
* [PATCH 0/2] Add option for generating BTF types of global variables
@ 2025-02-07  1:20 Stephen Brennan
  2025-02-07  1:20 ` [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata" Stephen Brennan
  2025-02-07  1:20 ` [PATCH 2/2] btf: Add the option to include global variable types Stephen Brennan
  0 siblings, 2 replies; 17+ messages in thread
From: Stephen Brennan @ 2025-02-07  1:20 UTC (permalink / raw)
  To: Masahiro Yamada, Arnd Bergmann
  Cc: Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
	Stephen Brennan, Martin KaFai Lau, Sami Tolvanen,
	Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
	Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
	Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
	linux-kbuild, Daniel Borkmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, linux-kernel, bpf

Hello everyone,

These patches add the CONFIG_DEBUG_INFO_BTF_GLOBAL_VARS option, which instructs
pahole to include types of global variables. Pahole >= 1.28 is required. More
context for what this feature enables can be seen in patch 2, as well as the
series which introduced this feature to pahole [1].

To demonstrate the functionality, my "btf_2024" branch of drgn (the current
development branch for the BTF debugging feature, despite the name) can be used
as below to debug a running kernel with these patches enabled.

    git clone https://github.com/brenns10/drgn -b btf_2024
    cd drgn
    python setup.py build_ext -i
    sudo python -m drgn --no-default-symbols --btf -k

The "--no-default-symbols" ensures that drgn doesn't accidentally find & use
your DWARF debuginfo :)

The resulting debugging session supports a similar level of capability as drgn
with DWARF debuginfo: variable & function types are available, stack traces may
be unwound (using ORC), and the kallsyms symbol table is available. You can also
try various drgn "contrib" scripts which implement useful utilities. All of the
ones I could readily test are working with BTF, for example:

    sudo python -m drgn --no-default-symbols --btf -k contrib/slabinfo.py

[1] https://lore.kernel.org/all/20241002235253.487251-1-stephen.s.brennan@oracle.com/#t

Stephen Brennan (2):
  kallsyms: output rodata to ".kallsyms_rodata"
  btf: Add the option to include global variable types

 include/asm-generic/vmlinux.lds.h |  1 +
 lib/Kconfig.debug                 | 10 ++++++++++
 scripts/Makefile.btf              |  3 +++
 scripts/kallsyms.c                |  2 +-
 4 files changed, 15 insertions(+), 1 deletion(-)

-- 
2.43.5



* [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
  2025-02-07  1:20 [PATCH 0/2] Add option for generating BTF types of global variables Stephen Brennan
@ 2025-02-07  1:20 ` Stephen Brennan
  2025-02-15 14:21   ` Masahiro Yamada
  2025-02-07  1:20 ` [PATCH 2/2] btf: Add the option to include global variable types Stephen Brennan
  1 sibling, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-07  1:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
	KP Singh, Stephen Brennan, Martin KaFai Lau, Sami Tolvanen,
	Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
	Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
	Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
	linux-kbuild, Daniel Borkmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, linux-kernel, bpf

When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
within the .rodata section. The linking process is repeated several
times, since the kallsyms data size changes, which shifts symbols,
requiring re-generating the data and re-linking.

BTF is generated during the first link only. For variables, BTF includes
a BTF_KIND_DATASEC for each data section that may contain a variable,
which includes the variable's name, type, and offset within the data
section. Because the size of kallsyms data changes during later links,
the offsets of variables placed after it in .rodata will change. This
means that BTF_KIND_DATASEC information for those variables becomes
inaccurate.

This is not currently a problem, because BTF currently only generates
variable data for percpu variables. However, the next commit will add
support for generating BTF for all global variables, including for the
.rodata section.

We could re-generate BTF each time vmlinux is linked, but this is quite
expensive, and should be avoided at all costs. Further, as each chunk of
data (BTF and kallsyms) is re-generated, there's no guarantee that
their sizes will converge anyway.

Instead, we can take advantage of the fact that BTF only cares to store
the offset of variables from the start of their section. Therefore, so
long as the kallsyms data is stored last in the .rodata section, no
offsets will be affected. Adjust kallsyms to output to .kallsyms_rodata,
and update the linker script to include this at the end of .rodata.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 include/asm-generic/vmlinux.lds.h | 1 +
 scripts/kallsyms.c                | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 54504013c7491..9284f0e502e27 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -463,6 +463,7 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 		. = ALIGN(8);						\
 		BOUNDED_SECTION_BY(__tracepoints_ptrs, ___tracepoints_ptrs) \
 		*(__tracepoints_strings)/* Tracepoints: strings */	\
+		*(.kallsyms_rodata)					\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 03852da3d2490..743d3dd453599 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -365,7 +365,7 @@ static void write_src(void)
 	printf("#define ALGN .balign 4\n");
 	printf("#endif\n");
 
-	printf("\t.section .rodata, \"a\"\n");
+	printf("\t.section .kallsyms_rodata, \"a\"\n");
 
 	output_label("kallsyms_num_syms");
 	printf("\t.long\t%u\n", table_cnt);
-- 
2.43.5



* [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-07  1:20 [PATCH 0/2] Add option for generating BTF types of global variables Stephen Brennan
  2025-02-07  1:20 ` [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata" Stephen Brennan
@ 2025-02-07  1:20 ` Stephen Brennan
  2025-02-07 23:50   ` Alexei Starovoitov
  1 sibling, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-07  1:20 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
	Stephen Brennan, Martin KaFai Lau, Sami Tolvanen,
	Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
	Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
	Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
	linux-kbuild, Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
	linux-debuggers, Alexei Starovoitov, Song Liu, linux-kernel, bpf

Since pahole 1.28, BTF can now include types for all global variables.
Previously, BTF has only included types for functions, as well as percpu
variables.

There are a few applications for this type information. For one, runtime
debuggers like drgn[1] can consume it in the absence of DWARF debuginfo.
The support in drgn is currently implemented and moving through the
review process, see [2]. For distributions which don't distribute DWARF
debuginfo, or for situations where it can't be made available, the
compact BTF, combined with ORC for stack unwinding, and the kallsyms
symbol table, can be used for simple runtime debugging and
introspection.

Another application is verifying types of ksyms in BPF programs. libbpf
already supports resolving global variables with "__ksym", but they must
be declared as void. For example, in
tools/bpf/bpftool/skeleton/pid_iter.bpf.c we have:

    extern const void bpf_map_fops __ksym;

With global variable information, declarations like these would be able
to use the actual variable types, for example:

    extern const struct file_operations bpf_map_fops __ksym;

When the feature was implemented in pahole, my measurements indicated
that vmlinux BTF size increased by about 25.8%, and module BTF size
increased by 53.2%. Due to these increases, the feature is implemented
behind a new config option, allowing users sensitive to increased memory
usage to disable it.

[1]: https://github.com/osandov/drgn
[2]: https://github.com/osandov/drgn/issues/176

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 lib/Kconfig.debug    | 10 ++++++++++
 scripts/Makefile.btf |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af972a92d06f..3fbdc5ba2d017 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -409,6 +409,16 @@ config PAHOLE_HAS_LANG_EXCLUDE
 	  otherwise it would emit malformed kernel and module binaries when
 	  using DEBUG_INFO_BTF_MODULES.
 
+config DEBUG_INFO_BTF_GLOBAL_VARS
+	bool "Generate BTF type information for all global variables"
+	default y
+	depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
+	help
+	  Include type information for all global variables in the BTF. This
+	  increases the size of the BTF information, which increases memory
+	  usage at runtime. With global variable types available, runtime
+	  debugging and tracers may be able to provide more detail.
+
 config DEBUG_INFO_BTF_MODULES
 	bool "Generate BTF type information for kernel modules"
 	default y
diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index c3cbeb13de503..ad3c05a96a010 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -31,5 +31,8 @@ endif
 
 pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)		+= --lang_exclude=rust
 
+# Requires v1.28 or later, enforced by KConfig
+pahole-flags-$(CONFIG_DEBUG_INFO_BTF_GLOBAL_VARS)	+= --btf_features=global_var
+
 export PAHOLE_FLAGS := $(pahole-flags-y)
 export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
-- 
2.43.5



* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-07  1:20 ` [PATCH 2/2] btf: Add the option to include global variable types Stephen Brennan
@ 2025-02-07 23:50   ` Alexei Starovoitov
  2025-02-11 23:58     ` Stephen Brennan
  2025-02-25 10:01     ` Alan Maguire
  0 siblings, 2 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2025-02-07 23:50 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
	KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
	linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
	Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
	Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
	Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
	linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf

On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
> When the feature was implemented in pahole, my measurements indicated
> that vmlinux BTF size increased by about 25.8%, and module BTF size
> increased by 53.2%. Due to these increases, the feature is implemented
> behind a new config option, allowing users sensitive to increased memory
> usage to disable it.
>

...
> +config DEBUG_INFO_BTF_GLOBAL_VARS
> +       bool "Generate BTF type information for all global variables"
> +       default y
> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> +       help
> +         Include type information for all global variables in the BTF. This
> +         increases the size of the BTF information, which increases memory
> +         usage at runtime. With global variable types available, runtime
> +         debugging and tracers may be able to provide more detail.

This is not a solution.
Even if it's changed to 'default n' distros will enable it
like they enable everything and will suffer a regression.

We need to add a new module like vmlinux_btf.ko that will contain
this additional BTF data. For global vars and everything else we might need.

pw-bot: cr


* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-07 23:50   ` Alexei Starovoitov
@ 2025-02-11 23:58     ` Stephen Brennan
  2025-02-14  1:18       ` Alexei Starovoitov
  2025-02-25 10:01     ` Alan Maguire
  1 sibling, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-11 23:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
	KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
	linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
	Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
	Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
	Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
	linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>> When the feature was implemented in pahole, my measurements indicated
>> that vmlinux BTF size increased by about 25.8%, and module BTF size
>> increased by 53.2%. Due to these increases, the feature is implemented
>> behind a new config option, allowing users sensitive to increased memory
>> usage to disable it.
>>
>
> ...
>> +config DEBUG_INFO_BTF_GLOBAL_VARS
>> +       bool "Generate BTF type information for all global variables"
>> +       default y
>> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
>> +       help
>> +         Include type information for all global variables in the BTF. This
>> +         increases the size of the BTF information, which increases memory
>> +         usage at runtime. With global variable types available, runtime
>> +         debugging and tracers may be able to provide more detail.
>
> This is not a solution.
> Even if it's changed to 'default n' distros will enable it
> like they enable everything and will suffer a regression.
>
> We need to add a new module like vmlinux_btf.ko that will contain
> this additional BTF data. For global vars and everything else we might need.

Fair enough. I believe I had shared Alan Maguire's proof-of-concept for
that idea a while back for an older version of this feature:

https://lore.kernel.org/all/20221104231103.752040-10-stephen.s.brennan@oracle.com/

We can dust that off and include it for a new version of this series.
I'd be curious what you'd like to see for kernel modules. A
three-level tree would be too complex, in my opinion.

As a separate note for this patch series, we discovered that variables
declared twice, where one is declared "__weak", will result in two DWARF
variable declarations, and thus two BTF variables. This trips up the BTF
validation code. So this series as it is cannot move forward. I'm
submitting a fix to dwarves today.

Thanks,
Stephen


* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-11 23:58     ` Stephen Brennan
@ 2025-02-14  1:18       ` Alexei Starovoitov
  2025-02-18 23:09         ` Stephen Brennan
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2025-02-14  1:18 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
	KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
	linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
	Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
	Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
	Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
	linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf

On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >> When the feature was implemented in pahole, my measurements indicated
> >> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >> increased by 53.2%. Due to these increases, the feature is implemented
> >> behind a new config option, allowing users sensitive to increased memory
> >> usage to disable it.
> >>
> >
> > ...
> >> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >> +       bool "Generate BTF type information for all global variables"
> >> +       default y
> >> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >> +       help
> >> +         Include type information for all global variables in the BTF. This
> >> +         increases the size of the BTF information, which increases memory
> >> +         usage at runtime. With global variable types available, runtime
> >> +         debugging and tracers may be able to provide more detail.
> >
> > This is not a solution.
> > Even if it's changed to 'default n' distros will enable it
> > like they enable everything and will suffer a regression.
> >
> > We need to add a new module like vmlinux_btf.ko that will contain
> > this additional BTF data. For global vars and everything else we might need.
>
> Fair enough. I believe I had shared Alan Maguire's proof-of-concept for
> that idea a while back for an older version of this feature:
>
> https://lore.kernel.org/all/20221104231103.752040-10-stephen.s.brennan@oracle.com/

Right, vmlinux_extra was discussed in various contexts, so let's make it happen.

> We can dust that off and include it for a new version of this series.
> I'd be curious of what you'd like to see for kernel modules? A
> three-level tree would be too complex, in my opinion.

What is the use case for vars in kernel modules?

> module BTF size increased by 53.2%.

This is the sum of all mods with vars divided by
the sum of all mods without?
Any outliers there?
I would expect modules to have few global variables.

So before we decide on what to do with vars in mods, let's figure out
the need.


* Re: [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
  2025-02-07  1:20 ` [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata" Stephen Brennan
@ 2025-02-15 14:21   ` Masahiro Yamada
  2025-02-24 18:51     ` Andrii Nakryiko
  0 siblings, 1 reply; 17+ messages in thread
From: Masahiro Yamada @ 2025-02-15 14:21 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Arnd Bergmann, Andrii Nakryiko, Nicolas Schier, Kees Cook,
	KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
	linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
	Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
	Yonghong Song, Hao Luo, Andrew Morton, linux-kbuild,
	Daniel Borkmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, linux-kernel, bpf

On Fri, Feb 7, 2025 at 10:21 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
> within the .rodata section. The linking process is repeated several
> times, since the kallsyms data size changes, which shifts symbols,
> requiring re-generating the data and re-linking.
>
> BTF is generated during the first link only. For variables, BTF includes
> a BTF_K_DATASEC for each data section that may contain a variable, which
> includes the variable's name, type, and offset within the data section.
> Because the size of kallsyms data changes during later links, the
> offsets of variables placed after it in .rodata will change. This means
> that BTF_K_DATASEC information for those variables becomes inaccurate.
>
> This is not currently a problem, because BTF currently only generates
> variable data for percpu variables. However, the next commit will add
> support for generating BTF for all global variables, including for the
> .rodata section.
>
> We could re-generate BTF each time vmlinux is linked, but this is quite
> expensive, and should be avoided at all costs. Further as each chunk of
> data (BTF and kallsyms) are re-generated, there's no guarantee that
> their sizes will converge anyway.
>
> Instead, we can take advantage of the fact that BTF only cares to store
> the offset of variables from the start of their section. Therefore, so
> long as the kallsyms data is stored last in the .rodata section, no
> offsets will be affected. Adjust kallsyms to output to .rodata.kallsyms,
> and update the linker script to include this at the end of .rodata.
>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---

I am fine if this is helpful for BTF.



-- 
Best Regards
Masahiro Yamada


* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-14  1:18       ` Alexei Starovoitov
@ 2025-02-18 23:09         ` Stephen Brennan
  2025-02-25 21:47           ` Andrii Nakryiko
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-18 23:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
	KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
	linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
	Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
	Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
	Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
	linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
[...]
>> We can dust that off and include it for a new version of this series.
>> I'd be curious of what you'd like to see for kernel modules? A
>> three-level tree would be too complex, in my opinion.
>
> What is the use case for vars in kernel modules?

The use case would be the same as for the core kernel. My primary
motivation is to allow drgn to understand the types of global variables,
and that extends to kernel modules too.

>> module BTF size increased by 53.2%.
>
> This is the sum of all mods with vars divided by
> the sum of all mods without?

That was a poorly done comparison, so let me provide this one that I did
using 6.13 and these patches. It was essentially a localmodconfig for a
VM instance, so I could still do better by picking a popular
distribution config. But I think this is far more representative.

MODULE                   BASE   COMP    CHG     PCT
drm.ko                   115833 123410  7577    6.54%
iscsi_boot_sysfs.ko      2627   5380    2753    104.80%
joydev.ko                1816   2289    473     26.05%
libcxgbi.ko              24556  25266   710     2.89%
drm_vram_helper.ko       22325  22751   426     1.91%
nvme-tcp.ko              25044  25973   929     3.71%
vfat.ko                  3448   3953    505     14.65%
btrfs.ko                 275139 343686  68547   24.91%
libiscsi.ko              21177  21977   800     3.78%
xt_owner.ko              449    803     354     78.84%
nft_ct.ko                4912   6157    1245    25.35%
iscsi_ibft.ko            3967   4463    496     12.50%
pcspkr.ko                283    682     399     140.99%
crc32-pclmul.ko          390    771     381     97.69%
nf_conntrack.ko          23686  28191   4505    19.02%
iscsi_tcp.ko             16827  17750   923     5.49%
nft_fib.ko               835    1117    282     33.77%
nf_reject_ipv6.ko        699    981     282     40.34%
rfkill.ko                4233   6410    2177    51.43%
dm-region-hash.ko        6214   6496    282     4.54%
cxgb3i.ko                35469  37078   1609    4.54%
dm-mirror.ko             7576   8191    615     8.12%
pvpanic-pci.ko           174    574     400     229.89%
crct10dif-pclmul.ko      146    525     379     259.59%
nvme-fabrics.ko          17341  18124   783     4.52%
kvm-amd.ko               47302  51914   4612    9.75%
crc8.ko                  221    405     184     83.26%
ib_iser.ko               27769  29116   1347    4.85%
sg.ko                    4234   5656    1422    33.59%
intel_rapl_common.ko     5678   8446    2768    48.75%
bochs.ko                 35643  36997   1354    3.80%
sha1-ssse3.ko            790    1305    515     65.19%
kvm-intel.ko             53802  59220   5418    10.07%
nft_chain_nat.ko         279    714     435     155.91%
vmlinux                  5484970 7330096 1845126 33.64%
sha256-ssse3.ko          851    1378    527     61.93%
nf_nat.ko                6341   7240    899     14.18%
configs.ko               72     256     184     255.56%
xt_comment.ko            151    507     356     235.76%
ccp.ko                   30433  34782   4349    14.29%
cxgb3.ko                 44981  47504   2523    5.61%
crypto_simd.ko           1331   1613    282     21.19%
iptable_filter.ko        855    1456    601     70.29%
qedi.ko                  70653  72786   2133    3.02%
drm_kms_helper.ko        63238  65000   1762    2.79%
cnic.ko                  117074 117790  716     0.61%
failover.ko              780    1216    436     55.90%
nft_redir.ko             874    1529    655     74.94%
serio_raw.ko             708    1234    526     74.29%
nf_defrag_ipv6.ko        1520   2253    733     48.22%
nf_defrag_ipv4.ko        306    770     464     151.63%
nft_reject_ipv4.ko       517    939     422     81.62%
nft_nat.ko               1192   1732    540     45.30%
nft_reject_inet.ko       554    976     422     76.17%
fuse.ko                  32181  41859   9678    30.07%
nft_compat.ko            3705   4404    699     18.87%
zstd_compress.ko         42597  43622   1025    2.41%
tls.ko                   15140  20683   5543    36.61%
virtio_pci.ko            8456   9193    737     8.72%
blake2b_generic.ko       1364   1699    335     24.56%
cryptd.ko                3697   4297    600     16.23%
xor.ko                   1358   1879    521     38.37%
intel_rapl_msr.ko        2851   3440    589     20.66%
kvm.ko                   177060 256377  79317   44.80%
cxgb4.ko                 215865 220844  4979    2.31%
bnx2i.ko                 39524  41477   1953    4.94%
dm-round-robin.ko        1795   2123    328     18.27%
virtio_pci_legacy_dev.ko 909    1191    282     31.02%
qla4xxx.ko               79040  82694   3654    4.62%
nfs.ko                   108350 169642  61292   56.57%
libata.ko                47301  66188   18887   39.93%
ghash-clmulni-intel.ko   578    997     419     72.49%
nf_reject_ipv4.ko        706    988     282     39.94%
nft_reject.ko            820    1196    376     45.85%
sunrpc.ko                127496 197841  70345   55.17%
nft_fib_ipv4.ko          803    1257    454     56.54%
scsi_transport_iscsi.ko  40419  57633   17214   42.59%
lockd.ko                 36144  42137   5993    16.58%
drm_shmem_helper.ko      32555  33043   488     1.50%
nvme-core.ko             50275  58298   8023    15.96%
iw_cm.ko                 13405  14796   1391    10.38%
mdio.ko                  857    1041    184     21.47%
bnx2.ko                  20354  21611   1257    6.18%
net_failover.ko          1742   2187    445     25.55%
ip_set.ko                11812  13093   1281    10.84%
libcxgb.ko               8698   8980    282     3.24%
dm-multipath.ko          8124   8898    774     9.53%
grace.ko                 462    890     428     92.64%
virtio_net.ko            12322  14896   2574    20.89%
qed.ko                   228735 232231  3496    1.53%
cdc-acm.ko               2923   3679    756     25.86%
i2c-piix4.ko             1124   2341    1217    108.27%
pvpanic-mmio.ko          177    625     448     253.11%
virtio_scsi.ko           3154   3898    744     23.59%
uio.ko                   2602   4295    1693    65.07%
nft_fib_ipv6.ko          956    1410    454     47.49%
cec.ko                   28370  29266   896     3.16%
qemu_fw_cfg.ko           1601   3476    1875    117.11%
ttm.ko                   23672  25727   2055    8.68%
sd_mod.ko                9976   13030   3054    30.61%
xfs.ko                   574594 926637  352043  61.27%
libiscsi_tcp.ko          17444  17911   467     2.68%
ib_cm.ko                 32324  62373   30049   92.96%
aesni-intel.ko           3370   4922    1552    46.05%
drm_client_lib.ko        27449  27794   345     1.26%
virtio_pci_modern_dev.ko 2537   2819    282     11.12%
rdma_cm.ko               32504  51823   19319   59.44%
fat.ko                   11958  13297   1339    11.20%
dm-log.ko                6529   6986    457     7.00%
pata_acpi.ko             9231   9700    469     5.08%
ata_piix.ko              10998  12598   1600    14.55%
ipt_REJECT.ko            956    1311    355     37.13%
drm_ttm_helper.ko        33160  33544   384     1.16%
be2iscsi.ko              55078  56993   1915    3.48%
i2c-smbus.ko             582    973     391     67.18%
cuse.ko                  8435   9241    806     9.56%
nft_fib_inet.ko          579    995     416     71.85%
ib_core.ko               103656 123701  20045   19.34%
pulse8-cec.ko            9153   9890    737     8.05%
pvpanic.ko               494    1087    593     120.04%
dm-mod.ko                31377  35265   3888    12.39%
raid6_pq.ko              2774   4207    1433    51.66%
nft_reject_ipv6.ko       517    939     422     81.62%
cxgb4i.ko                47490  49021   1531    3.22%
ata_generic.ko           9008   9666    658     7.30%
vboxvideo.ko             47622  48844   1222    2.57%
ip_tables.ko             3109   3564    455     14.63%

ALL MODS                 9153268  11895301 2742033 29.96%
vmlinux                  5484970  7330096  1845126 33.64%
TOTAL                    14638238 19225397 4587159 31.34%

So this shows a 1.8 MB increase in vmlinux BTF size, or 33.6%.
And for these modules in aggregate, an increase of 2.7 MB or 30.0%.
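
For anyone wanting to double-check the summary rows, the CHG and PCT
columns follow directly from BASE and COMP. A minimal sketch, assuming
the sizes are in bytes as in the table (the helper name btf_growth is
made up for this illustration):

```python
def btf_growth(base, comp):
    """Return (absolute change in bytes, percentage change) for one row."""
    chg = comp - base
    pct = 100.0 * chg / base
    return chg, pct


# (BASE, COMP) pairs copied from the summary rows of the table above.
rows = {
    "ALL MODS": (9153268, 11895301),
    "vmlinux": (5484970, 7330096),
    "TOTAL": (14638238, 19225397),
}

for name, (base, comp) in rows.items():
    chg, pct = btf_growth(base, comp)
    print(f"{name:<10} {base:>9} {comp:>9} {chg:>8} {pct:6.2f}%")
```

The same function can be applied per module to reproduce any other row.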

> Any outliers there?
> I would expect modules to have few global variables.

In terms of outliers, there are two groups that stand out to me:

1. Large percentage increases are almost always for modules that had
very tiny BTF before. The module system inherently creates a few
global variables for each module, so there's always a slight constant
increase of the BTF size (184 bytes, as far as I can tell), and in those
cases it can be a quite large percentage. Here's an example,
"configs.ko", which comes from the CONFIG_IKCONFIG enablement:

BEFORE:
    $ bpftool btf dump file ../build_pahole_novars/kernel/configs.ko -B ../build_pahole_novars/vmlinux
    [127877] CONST '(anon)' type_id=11124
    [127878] ARRAY '(anon)' type_id=127877 index_type_id=21 nr_elems=1
    [127879] CONST '(anon)' type_id=127878

AFTER:
    $ bpftool btf dump file ../build_pahole_vars/kernel/configs.ko -B ../build_pahole_vars/vmlinux
    [162827] CONST '(anon)' type_id=11124
    [162828] ARRAY '(anon)' type_id=162827 index_type_id=21 nr_elems=1
    [162829] CONST '(anon)' type_id=162828
    [162830] VAR '____versions' type_id=162829, linkage=static
    [162831] DATASEC '__versions' size=64 vlen=1
            type_id=162830 offset=0 size=64 (VAR '____versions')
    [162832] VAR 'orc_header' type_id=8667, linkage=static
    [162833] DATASEC '.orc_header' size=20 vlen=1
            type_id=162832 offset=0 size=20 (VAR 'orc_header')
    [162834] VAR '__this_module' type_id=312, linkage=global
    [162835] DATASEC '.gnu.linkonce.this_module' size=1344 vlen=1
            type_id=162834 offset=0 size=1344 (VAR '__this_module')

What is, I think, interesting is that the types in that module were
totally useless to begin with, because they were used by a variable
which didn't even get emitted. So while this is a substantial
percentage-wise increase, I think it's a net improvement for this and
other modules.
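
To see at a glance what the new build adds, a small Python sketch can
tally the BTF kinds in the "AFTER" dump. The regular expression and the
function name count_kinds() are my own assumptions about the bpftool
text format as it appears in this message, not a bpftool API.

```python
import re
from collections import Counter

# The "AFTER" dump for configs.ko, copied from this message.
DUMP = """\
[162827] CONST '(anon)' type_id=11124
[162828] ARRAY '(anon)' type_id=162827 index_type_id=21 nr_elems=1
[162829] CONST '(anon)' type_id=162828
[162830] VAR '____versions' type_id=162829, linkage=static
[162831] DATASEC '__versions' size=64 vlen=1
        type_id=162830 offset=0 size=64 (VAR '____versions')
[162832] VAR 'orc_header' type_id=8667, linkage=static
[162833] DATASEC '.orc_header' size=20 vlen=1
        type_id=162832 offset=0 size=20 (VAR 'orc_header')
[162834] VAR '__this_module' type_id=312, linkage=global
[162835] DATASEC '.gnu.linkonce.this_module' size=1344 vlen=1
        type_id=162834 offset=0 size=1344 (VAR '__this_module')
"""

# A top-level entry looks like: [id] KIND 'name' ...
ENTRY = re.compile(r"^\[(\d+)\] (\w+) '([^']*)'")


def count_kinds(dump_text):
    """Count top-level BTF entries per kind; member lines are skipped."""
    kinds = Counter()
    for line in dump_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            kinds[match.group(2)] += 1
    return kinds


for kind, count in sorted(count_kinds(DUMP).items()):
    print(kind, count)
```

The three CONST/ARRAY entries are the ones that already existed; the
VAR and DATASEC entries are what the new option contributes.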

2. The largest absolute increases come from large, complex modules like
xfs, kvm, sunrpc, btrfs, etc. For example, xfs had 5696 VAR
declarations. What is disappointing is how much of this is due to
automatically-generated "variables" from macros (e.g. tracepoints).
Here is a list of variable prefixes like that:

  print_fmt_*
  trace_event_fields_*
  trace_event_type_funcs_*
  event_*
  __SCK__tp_func_*
  __bpf_trace_tp_map_*
  __event_*
  event_class_*
  TRACE_SYSTEM_*
  __TRACE_SYSTEM_*
  __tracepoint_*

These are, unfortunately, all valid declarations produced by macros and
they correspond to valid symbols as well. If you look at the kallsyms
for the modules (and core kernel), these variables are present there as
well. It may indeed make sense to have kallsyms entries for them: I
don't know.

These are all, as far as I'm concerned, totally uninteresting types. If
you want to access any of this data, you probably already know its type
and wouldn't need a BTF declaration. Unfortunately, the flip side is
that I don't think we have a good way to automatically detect these,
outside of prefix matching, which quickly goes out of date as the kernel
changes, and can have false positives as well. For kernel modules, many
of these may appear in separate ELF sections, but for vmlinux, they
don't. I'd be happy to eliminate types for these auto-generated kinds of
variables, if we could somehow annotate them so that pahole knows to
ignore them. For instance, maybe we could use

__attribute__((btf_decl_tag("btf_omit")))

as an instruction to pahole to omit declarations for these things?
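
To make the prefix-matching concern concrete, here is a minimal Python
sketch of such a filter. The prefix list is the one given above; the
function name and all of the example variable names are made up for
illustration, and this is not how pahole works today.

```python
# Prefixes of macro-generated variables, as listed in this message.
AUTOGEN_PREFIXES = (
    "print_fmt_",
    "trace_event_fields_",
    "trace_event_type_funcs_",
    "event_",
    "__SCK__tp_func_",
    "__bpf_trace_tp_map_",
    "__event_",
    "event_class_",
    "TRACE_SYSTEM_",
    "__TRACE_SYSTEM_",
    "__tracepoint_",
)


def looks_autogenerated(name):
    """Guess, by prefix alone, whether a variable came from a macro."""
    # str.startswith accepts a tuple and matches any of its members.
    return name.startswith(AUTOGEN_PREFIXES)


# Hypothetical names, for illustration only.
for name in ("__tracepoint_example_op", "example_stats", "event_log_size"):
    print(name, looks_autogenerated(name))
```

The last name shows the weakness: a hand-written variable that happens
to start with "event_" would be dropped too, which is the kind of false
positive mentioned above.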

Thanks,
Stephen

> So before we decide on what to do with vars in mods lets figure out
> the need.


* Re: [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
  2025-02-15 14:21   ` Masahiro Yamada
@ 2025-02-24 18:51     ` Andrii Nakryiko
  2025-02-25  1:24       ` Stephen Brennan
  0 siblings, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-24 18:51 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Stephen Brennan, Arnd Bergmann, Andrii Nakryiko, Nicolas Schier,
	Kees Cook, KP Singh, Martin KaFai Lau, Sami Tolvanen,
	Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
	Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
	Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
	linux-kbuild, Daniel Borkmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, linux-kernel, bpf

On Sat, Feb 15, 2025 at 6:21 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Fri, Feb 7, 2025 at 10:21 AM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
> >
> > When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
> > within the .rodata section. The linking process is repeated several
> > times, since the kallsyms data size changes, which shifts symbols,
> > requiring re-generating the data and re-linking.
> >
> > BTF is generated during the first link only. For variables, BTF includes
> > a BTF_K_DATASEC for each data section that may contain a variable, which
> > includes the variable's name, type, and offset within the data section.
> > Because the size of kallsyms data changes during later links, the
> > offsets of variables placed after it in .rodata will change. This means
> > that BTF_K_DATASEC information for those variables becomes inaccurate.
> >
> > This is not currently a problem, because BTF currently only generates
> > variable data for percpu variables. However, the next commit will add
> > support for generating BTF for all global variables, including for the
> > .rodata section.
> >
> > We could re-generate BTF each time vmlinux is linked, but this is quite
> > expensive, and should be avoided at all costs. Further as each chunk of
> > data (BTF and kallsyms) are re-generated, there's no guarantee that
> > their sizes will converge anyway.
> >
> > Instead, we can take advantage of the fact that BTF only cares to store
> > the offset of variables from the start of their section. Therefore, so
> > long as the kallsyms data is stored last in the .rodata section, no
> > offsets will be affected. Adjust kallsyms to output to .rodata.kallsyms,
> > and update the linker script to include this at the end of .rodata.
> >
> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> > ---
>
> I am fine if this is helpful for BTF.

This seems like a useful change all by itself even while the main
feature of this patch set is still being developed and reviewed.
Should we land just this .kallsyms_rodata change?

>
>
>
> --
> Best Regards
> Masahiro Yamada


* Re: [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
  2025-02-24 18:51     ` Andrii Nakryiko
@ 2025-02-25  1:24       ` Stephen Brennan
  2025-02-25 16:59         ` Andrii Nakryiko
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Brennan @ 2025-02-25  1:24 UTC (permalink / raw)
  To: Andrii Nakryiko, Masahiro Yamada
  Cc: Arnd Bergmann, Andrii Nakryiko, Nicolas Schier, Kees Cook,
	KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
	linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
	Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
	Yonghong Song, Hao Luo, Andrew Morton, linux-kbuild,
	Daniel Borkmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, linux-kernel, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Sat, Feb 15, 2025 at 6:21 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>>
>> On Fri, Feb 7, 2025 at 10:21 AM Stephen Brennan
>> <stephen.s.brennan@oracle.com> wrote:
>> >
>> > When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
>> > within the .rodata section. The linking process is repeated several
>> > times, since the kallsyms data size changes, which shifts symbols,
>> > requiring re-generating the data and re-linking.
>> >
>> > BTF is generated during the first link only. For variables, BTF includes
>> > a BTF_K_DATASEC for each data section that may contain a variable, which
>> > includes the variable's name, type, and offset within the data section.
>> > Because the size of kallsyms data changes during later links, the
>> > offsets of variables placed after it in .rodata will change. This means
>> > that BTF_K_DATASEC information for those variables becomes inaccurate.
>> >
>> > This is not currently a problem, because BTF currently only generates
>> > variable data for percpu variables. However, the next commit will add
>> > support for generating BTF for all global variables, including for the
>> > .rodata section.
>> >
>> > We could re-generate BTF each time vmlinux is linked, but this is quite
>> > expensive, and should be avoided at all costs. Further as each chunk of
>> > data (BTF and kallsyms) are re-generated, there's no guarantee that
>> > their sizes will converge anyway.
>> >
>> > Instead, we can take advantage of the fact that BTF only cares to store
>> > the offset of variables from the start of their section. Therefore, so
>> > long as the kallsyms data is stored last in the .rodata section, no
>> > offsets will be affected. Adjust kallsyms to output to .rodata.kallsyms,
>> > and update the linker script to include this at the end of .rodata.
>> >
>> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>> > ---
>>
>> I am fine if this is helpful for BTF.
>
> This seems like a useful change all by itself even while the main
> feature of this patch set is still being developed and reviewed.
> Should we land just this .kallsyms_rodata change?

I would be happy to see it merged now.

I don't think it would help anything other than BTF, because most other
things (e.g. kallsyms) refer to symbols via an absolute address. Using
the section offset seems pretty uncommon.

But it still is a nice cleanup anyway.
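
For anyone following along, the section-offset argument from the commit
message can be sketched in a few lines of Python. This is a toy model
under stated assumptions: object names and sizes are invented, and
alignment padding is ignored.

```python
def section_offsets(layout):
    """Map each object name to its offset from the start of the section."""
    offsets, pos = {}, 0
    for name, size in layout:
        offsets[name] = pos
        pos += size
    return offsets


def moved(layout, blob, new_size):
    """Names (other than blob) whose offset changes when blob is resized."""
    before = section_offsets(layout)
    resized = [(n, new_size if n == blob else s) for n, s in layout]
    after = section_offsets(resized)
    return sorted(n for n in before if n != blob and before[n] != after[n])


# kallsyms data somewhere in the middle of .rodata vs. placed last.
middle = [("var_a", 8), ("kallsyms", 1000), ("var_b", 16)]
last = [("var_a", 8), ("var_b", 16), ("kallsyms", 1000)]

print(moved(middle, "kallsyms", 1200))
print(moved(last, "kallsyms", 1200))
```

When the blob sits in the middle, everything after it shifts as it
grows; when it is last, no other section-relative offset changes, which
is all that the BTF DATASEC entries depend on.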

Thanks,
Stephen


* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-07 23:50   ` Alexei Starovoitov
  2025-02-11 23:58     ` Stephen Brennan
@ 2025-02-25 10:01     ` Alan Maguire
  2025-02-25 21:52       ` Andrii Nakryiko
  2025-05-12 11:15       ` Tony Ambardar
  1 sibling, 2 replies; 17+ messages in thread
From: Alan Maguire @ 2025-02-25 10:01 UTC (permalink / raw)
  To: Alexei Starovoitov, Stephen Brennan
  Cc: Masahiro Yamada, Andrii Nakryiko, Nicolas Schier, Kees Cook,
	KP Singh, Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman,
	linux-arch, Stanislav Fomichev, Kent Overstreet, Pasha Tatashin,
	Jiri Olsa, John Fastabend, Jann Horn, Ard Biesheuvel,
	Yonghong Song, Hao Luo, Andrew Morton, Linux Kbuild mailing list,
	Daniel Borkmann, Arnd Bergmann, Nathan Chancellor,
	linux-debuggers, Alexei Starovoitov, Song Liu, LKML, bpf

On 07/02/2025 23:50, Alexei Starovoitov wrote:
> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>> When the feature was implemented in pahole, my measurements indicated
>> that vmlinux BTF size increased by about 25.8%, and module BTF size
>> increased by 53.2%. Due to these increases, the feature is implemented
>> behind a new config option, allowing users sensitive to increased memory
>> usage to disable it.
>>
> 
> ...
>> +config DEBUG_INFO_BTF_GLOBAL_VARS
>> +       bool "Generate BTF type information for all global variables"
>> +       default y
>> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
>> +       help
>> +         Include type information for all global variables in the BTF. This
>> +         increases the size of the BTF information, which increases memory
>> +         usage at runtime. With global variable types available, runtime
>> +         debugging and tracers may be able to provide more detail.
> 
> This is not a solution.
> Even if it's changed to 'default n' distros will enable it
> like they enable everything and will suffer a regression.
> 
> We need to add a new module like vmlinux_btf.ko that will contain
> this additional BTF data. For global vars and everything else we might need.
> 

In this area, I've been exploring adding support for
CONFIG_DEBUG_INFO_BTF=m, so that the BTF info for vmlinux is delivered
via a module. From the consumer side, everything looks identical
(/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
is delivered via btf_vmlinux.ko instead. The original need for this was
that embedded folks noted that because in the current situation BTF data
is in vmlinux, they cannot enable BTF because such small-footprint
systems do not support a large vmlinux binary. However they could
potentially use kernel BTF if it was delivered via a module. The other
nice thing about module delivery in the general case is we can make use
of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
to a 1.8Mb btf_vmlinux.ko.gz module on-disk.

The challenge in delivering vmlinux BTF in a module is that on module
load during boot other modules expect vmlinux BTF to be there when
adding their own BTF to /sys/kernel/btf. And kfunc registration from
kernel and modules expects this also. So support for deferred BTF module
load/kfunc registration is required too. I've implemented the former and
now am working on the latter. Hope to have some RFC patches ready soon,
but it looks feasible at this point.
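
The ordering problem can be illustrated with a toy Python model. To be
clear, this is only a sketch of the general "queue until the base is
there" idea; the class and method names are invented and it says
nothing about how the actual patches are structured.

```python
class BtfRegistry:
    """Toy model: module BTF registration waits for the vmlinux BTF."""

    def __init__(self):
        self.base_loaded = False
        self.registered = []  # modules whose BTF is visible
        self.pending = []     # modules that arrived before the base

    def register_module(self, name):
        # Module BTF refers to types in the base, so defer if it is absent.
        if self.base_loaded:
            self.registered.append(name)
        else:
            self.pending.append(name)

    def load_base(self):
        # Once the base BTF arrives, replay deferred registrations in order.
        self.base_loaded = True
        self.registered.extend(self.pending)
        self.pending.clear()


reg = BtfRegistry()
reg.register_module("mod_a")  # loaded during boot, before the base BTF
reg.load_base()               # the BTF module is loaded later
reg.register_module("mod_b")
print(reg.registered)
```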

Assuming such an option was available to small-footprint systems, should
we consider adding global variables to core vmlinux BTF along with
per-cpu variables? Then vmlinux BTF extras could be used for some of the
additional optional representations like function site-specific data
(inlines etc)? Or are there factors other than on-disk footprint
that we need to consider? Thanks!

Alan

> pw-bot: cr
> 



* Re: [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata"
  2025-02-25  1:24       ` Stephen Brennan
@ 2025-02-25 16:59         ` Andrii Nakryiko
  0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-25 16:59 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Masahiro Yamada, Arnd Bergmann, Andrii Nakryiko, Nicolas Schier,
	Kees Cook, KP Singh, Martin KaFai Lau, Sami Tolvanen,
	Eduard Zingerman, linux-arch, Stanislav Fomichev, Kent Overstreet,
	Pasha Tatashin, Jiri Olsa, John Fastabend, Jann Horn,
	Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
	linux-kbuild, Daniel Borkmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, linux-kernel, bpf

On Mon, Feb 24, 2025 at 5:24 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> > On Sat, Feb 15, 2025 at 6:21 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> >>
> >> On Fri, Feb 7, 2025 at 10:21 AM Stephen Brennan
> >> <stephen.s.brennan@oracle.com> wrote:
> >> >
> >> > When vmlinux is linked, the rodata from kallsyms is placed arbitrarily
> >> > within the .rodata section. The linking process is repeated several
> >> > times, since the kallsyms data size changes, which shifts symbols,
> >> > requiring re-generating the data and re-linking.
> >> >
> >> > BTF is generated during the first link only. For variables, BTF includes
> >> > a BTF_K_DATASEC for each data section that may contain a variable, which
> >> > includes the variable's name, type, and offset within the data section.
> >> > Because the size of kallsyms data changes during later links, the
> >> > offsets of variables placed after it in .rodata will change. This means
> >> > that BTF_K_DATASEC information for those variables becomes inaccurate.
> >> >
> >> > This is not currently a problem, because BTF currently only generates
> >> > variable data for percpu variables. However, the next commit will add
> >> > support for generating BTF for all global variables, including for the
> >> > .rodata section.
> >> >
> >> > We could re-generate BTF each time vmlinux is linked, but this is quite
> >> > expensive, and should be avoided at all costs. Further as each chunk of
> >> > data (BTF and kallsyms) are re-generated, there's no guarantee that
> >> > their sizes will converge anyway.
> >> >
> >> > Instead, we can take advantage of the fact that BTF only cares to store
> >> > the offset of variables from the start of their section. Therefore, so
> >> > long as the kallsyms data is stored last in the .rodata section, no
> >> > offsets will be affected. Adjust kallsyms to output to .rodata.kallsyms,
> >> > and update the linker script to include this at the end of .rodata.
> >> >
> >> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> >> > ---
> >>
> >> I am fine if this is helpful for BTF.
> >
> > This seems like a useful change all by itself even while the main
> > feature of this patch set is still being developed and reviewed.
> > Should we land just this .kallsyms_rodata change?
>
> I would be happy to see it merged now.
>
> I don't think it would help anything other than BTF, because most other
> things (e.g. kallsyms) refer to symbols via an absolute address. Using
> the section offset seems pretty uncommon.
>
> But it still is a nice cleanup anyway.

I was thinking about possible use cases of some tooling wanting to
access kallsyms data from vmlinux (instead of from /proc/kallsyms).
But, frankly, having a separate section doesn't help all that much
even there. We either way seem to have ELF symbols pointing to
relevant pieces of information, so it's not hard to get it even if
it's part of .rodata. So I guess we don't have to rush landing this
patch separately.

>
> Thanks,
> Stephen


* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-18 23:09         ` Stephen Brennan
@ 2025-02-25 21:47           ` Andrii Nakryiko
  0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-25 21:47 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Alexei Starovoitov, Masahiro Yamada, Andrii Nakryiko,
	Nicolas Schier, Kees Cook, KP Singh, Martin KaFai Lau,
	Sami Tolvanen, Eduard Zingerman, linux-arch, Stanislav Fomichev,
	Kent Overstreet, Pasha Tatashin, Jiri Olsa, John Fastabend,
	Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo, Andrew Morton,
	Linux Kbuild mailing list, Daniel Borkmann, Arnd Bergmann,
	Nathan Chancellor, linux-debuggers, Alexei Starovoitov, Song Liu,
	LKML, bpf

On Tue, Feb 18, 2025 at 3:10 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Tue, Feb 11, 2025 at 3:59 PM Stephen Brennan
> [...]
> >> We can dust that off and include it for a new version of this series.
> >> I'd be curious of what you'd like to see for kernel modules? A
> >> three-level tree would be too complex, in my opinion.
> >
> > What is the use case for vars in kernel modules?
>
> The use case would be the same as for the core kernel. My primary
> motivation is to allow drgn to understand the types of global variables,
> and that extends to kernel modules too.
>
> >> module BTF size increased by 53.2%.
> >
> > This is the sum of all mods with vars divided by
> > the sum of all mods without?
>
> That was a poorly done comparison, so let me provide this one that I did
> using 6.13 and these patches. It was essentially a localmodconfig for a
> VM instance, so I could still do better by picking a popular
> distribution config. But I think this is far more representative.
>
> MODULE                   BASE   COMP    CHG     PCT
> drm.ko                   115833 123410  7577    6.54%
> iscsi_boot_sysfs.ko      2627   5380    2753    104.80%
> joydev.ko                1816   2289    473     26.05%
> libcxgbi.ko              24556  25266   710     2.89%
> drm_vram_helper.ko       22325  22751   426     1.91%
> nvme-tcp.ko              25044  25973   929     3.71%
> vfat.ko                  3448   3953    505     14.65%
> btrfs.ko                 275139 343686  68547   24.91%
> libiscsi.ko              21177  21977   800     3.78%
> xt_owner.ko              449    803     354     78.84%
> nft_ct.ko                4912   6157    1245    25.35%
> iscsi_ibft.ko            3967   4463    496     12.50%
> pcspkr.ko                283    682     399     140.99%
> crc32-pclmul.ko          390    771     381     97.69%
> nf_conntrack.ko          23686  28191   4505    19.02%
> iscsi_tcp.ko             16827  17750   923     5.49%
> nft_fib.ko               835    1117    282     33.77%
> nf_reject_ipv6.ko        699    981     282     40.34%
> rfkill.ko                4233   6410    2177    51.43%
> dm-region-hash.ko        6214   6496    282     4.54%
> cxgb3i.ko                35469  37078   1609    4.54%
> dm-mirror.ko             7576   8191    615     8.12%
> pvpanic-pci.ko           174    574     400     229.89%
> crct10dif-pclmul.ko      146    525     379     259.59%
> nvme-fabrics.ko          17341  18124   783     4.52%
> kvm-amd.ko               47302  51914   4612    9.75%
> crc8.ko                  221    405     184     83.26%
> ib_iser.ko               27769  29116   1347    4.85%
> sg.ko                    4234   5656    1422    33.59%
> intel_rapl_common.ko     5678   8446    2768    48.75%
> bochs.ko                 35643  36997   1354    3.80%
> sha1-ssse3.ko            790    1305    515     65.19%
> kvm-intel.ko             53802  59220   5418    10.07%
> nft_chain_nat.ko         279    714     435     155.91%
> vmlinux                  5484970 7330096 1845126 33.64%
> sha256-ssse3.ko          851    1378    527     61.93%
> nf_nat.ko                6341   7240    899     14.18%
> configs.ko               72     256     184     255.56%
> xt_comment.ko            151    507     356     235.76%
> ccp.ko                   30433  34782   4349    14.29%
> cxgb3.ko                 44981  47504   2523    5.61%
> crypto_simd.ko           1331   1613    282     21.19%
> iptable_filter.ko        855    1456    601     70.29%
> qedi.ko                  70653  72786   2133    3.02%
> drm_kms_helper.ko        63238  65000   1762    2.79%
> cnic.ko                  117074 117790  716     0.61%
> failover.ko              780    1216    436     55.90%
> nft_redir.ko             874    1529    655     74.94%
> serio_raw.ko             708    1234    526     74.29%
> nf_defrag_ipv6.ko        1520   2253    733     48.22%
> nf_defrag_ipv4.ko        306    770     464     151.63%
> nft_reject_ipv4.ko       517    939     422     81.62%
> nft_nat.ko               1192   1732    540     45.30%
> nft_reject_inet.ko       554    976     422     76.17%
> fuse.ko                  32181  41859   9678    30.07%
> nft_compat.ko            3705   4404    699     18.87%
> zstd_compress.ko         42597  43622   1025    2.41%
> tls.ko                   15140  20683   5543    36.61%
> virtio_pci.ko            8456   9193    737     8.72%
> blake2b_generic.ko       1364   1699    335     24.56%
> cryptd.ko                3697   4297    600     16.23%
> xor.ko                   1358   1879    521     38.37%
> intel_rapl_msr.ko        2851   3440    589     20.66%
> kvm.ko                   177060 256377  79317   44.80%
> cxgb4.ko                 215865 220844  4979    2.31%
> bnx2i.ko                 39524  41477   1953    4.94%
> dm-round-robin.ko        1795   2123    328     18.27%
> virtio_pci_legacy_dev.ko 909    1191    282     31.02%
> qla4xxx.ko               79040  82694   3654    4.62%
> nfs.ko                   108350 169642  61292   56.57%
> libata.ko                47301  66188   18887   39.93%
> ghash-clmulni-intel.ko   578    997     419     72.49%
> nf_reject_ipv4.ko        706    988     282     39.94%
> nft_reject.ko            820    1196    376     45.85%
> sunrpc.ko                127496 197841  70345   55.17%
> nft_fib_ipv4.ko          803    1257    454     56.54%
> scsi_transport_iscsi.ko  40419  57633   17214   42.59%
> lockd.ko                 36144  42137   5993    16.58%
> drm_shmem_helper.ko      32555  33043   488     1.50%
> nvme-core.ko             50275  58298   8023    15.96%
> iw_cm.ko                 13405  14796   1391    10.38%
> mdio.ko                  857    1041    184     21.47%
> bnx2.ko                  20354  21611   1257    6.18%
> net_failover.ko          1742   2187    445     25.55%
> ip_set.ko                11812  13093   1281    10.84%
> libcxgb.ko               8698   8980    282     3.24%
> dm-multipath.ko          8124   8898    774     9.53%
> grace.ko                 462    890     428     92.64%
> virtio_net.ko            12322  14896   2574    20.89%
> qed.ko                   228735 232231  3496    1.53%
> cdc-acm.ko               2923   3679    756     25.86%
> i2c-piix4.ko             1124   2341    1217    108.27%
> pvpanic-mmio.ko          177    625     448     253.11%
> virtio_scsi.ko           3154   3898    744     23.59%
> uio.ko                   2602   4295    1693    65.07%
> nft_fib_ipv6.ko          956    1410    454     47.49%
> cec.ko                   28370  29266   896     3.16%
> qemu_fw_cfg.ko           1601   3476    1875    117.11%
> ttm.ko                   23672  25727   2055    8.68%
> sd_mod.ko                9976   13030   3054    30.61%
> xfs.ko                   574594 926637  352043  61.27%
> libiscsi_tcp.ko          17444  17911   467     2.68%
> ib_cm.ko                 32324  62373   30049   92.96%
> aesni-intel.ko           3370   4922    1552    46.05%
> drm_client_lib.ko        27449  27794   345     1.26%
> virtio_pci_modern_dev.ko 2537   2819    282     11.12%
> rdma_cm.ko               32504  51823   19319   59.44%
> fat.ko                   11958  13297   1339    11.20%
> dm-log.ko                6529   6986    457     7.00%
> pata_acpi.ko             9231   9700    469     5.08%
> ata_piix.ko              10998  12598   1600    14.55%
> ipt_REJECT.ko            956    1311    355     37.13%
> drm_ttm_helper.ko        33160  33544   384     1.16%
> be2iscsi.ko              55078  56993   1915    3.48%
> i2c-smbus.ko             582    973     391     67.18%
> cuse.ko                  8435   9241    806     9.56%
> nft_fib_inet.ko          579    995     416     71.85%
> ib_core.ko               103656 123701  20045   19.34%
> pulse8-cec.ko            9153   9890    737     8.05%
> pvpanic.ko               494    1087    593     120.04%
> dm-mod.ko                31377  35265   3888    12.39%
> raid6_pq.ko              2774   4207    1433    51.66%
> nft_reject_ipv6.ko       517    939     422     81.62%
> cxgb4i.ko                47490  49021   1531    3.22%
> ata_generic.ko           9008   9666    658     7.30%
> vboxvideo.ko             47622  48844   1222    2.57%
> ip_tables.ko             3109   3564    455     14.63%
>
> ALL MODS                 9153268   11895301  2742033  29.96%
> vmlinux                  5484970   7330096   1845126  33.64%
> TOTAL                    14638238  19225397  4587159  31.34%
>
> So this shows a 1.8 MB increase in vmlinux BTF size, or 33.6%.
> And for these modules in aggregate, an increase of 2.7 MB or 30.0%.
>
> > Any outliers there?
> > I would expect modules to have few global variables.
>
> In terms of outliers, there are two groups that stand out to me:
>
> 1. Large percentage increases are almost always for modules that had
> very tiny BTF before. The module system inherently creates a few
> global variables for each module, so there's always a small constant
> increase in BTF size (184 bytes, as far as I can tell), and in those
> cases it can be quite a large percentage. Here's an example,
> "configs.ko", which comes from enabling CONFIG_IKCONFIG:
>
> BEFORE:
>     $ bpftool btf dump file ../build_pahole_novars/kernel/configs.ko -B ../build_pahole_novars/vmlinux
>     [127877] CONST '(anon)' type_id=11124
>     [127878] ARRAY '(anon)' type_id=127877 index_type_id=21 nr_elems=1
>     [127879] CONST '(anon)' type_id=127878
>
> AFTER:
>     $ bpftool btf dump file ../build_pahole_vars/kernel/configs.ko -B ../build_pahole_vars/vmlinux
>     [162827] CONST '(anon)' type_id=11124
>     [162828] ARRAY '(anon)' type_id=162827 index_type_id=21 nr_elems=1
>     [162829] CONST '(anon)' type_id=162828
>     [162830] VAR '____versions' type_id=162829, linkage=static
>     [162831] DATASEC '__versions' size=64 vlen=1
>             type_id=162830 offset=0 size=64 (VAR '____versions')
>     [162832] VAR 'orc_header' type_id=8667, linkage=static
>     [162833] DATASEC '.orc_header' size=20 vlen=1
>             type_id=162832 offset=0 size=20 (VAR 'orc_header')
>     [162834] VAR '__this_module' type_id=312, linkage=global
>     [162835] DATASEC '.gnu.linkonce.this_module' size=1344 vlen=1
>             type_id=162834 offset=0 size=1344 (VAR '__this_module')
>
> What is, I think, interesting is that the types in that module were
> totally useless to begin with, because they were used by a variable
> which didn't even get emitted. So while this is a substantial
> percentage increase, I think it's a net improvement for this and
> other modules.
>
> 2. The largest absolute increases come from large, complex modules like
> xfs, kvm, sunrpc, btrfs, etc. For example, xfs had 5696 VAR
> declarations. What is disappointing is how much of this is due to
> automatically-generated "variables" from macros (e.g. tracepoints).
> Here is a list of variable prefixes like that:
>
>   print_fmt_*
>   trace_event_fields_*
>   trace_event_type_funcs_*
>   event_*
>   __SCK__tp_func_*
>   __bpf_trace_tp_map_*
>   __event_*
>   event_class_*
>   TRACE_SYSTEM_*
>   __TRACE_SYSTEM_*
>   __tracepoint_*
>
> These are, unfortunately, all valid declarations produced by macros and
> they correspond to valid symbols as well. If you look at the kallsyms
> for the modules (and core kernel), these variables are present there as
> well. It may indeed make sense to have kallsyms entries for them: I
> don't know.
>
> These are all, as far as I'm concerned, totally uninteresting types. If
> you want to access any of this data, you probably already know its type
> and wouldn't need a BTF declaration. Unfortunately, the flip side is
> that I don't think we have a good way to automatically detect these,
> outside of prefix matching, which quickly goes out of date as the kernel
> changes, and can have false positives as well. For kernel modules, many
> of these may appear in separate ELF sections, but for vmlinux, they
> don't. I'd be happy to eliminate types for these auto-generated kinds of
> variables, if we could somehow annotate them so that pahole knows to
> ignore them. For instance, maybe we could use
>
> __attribute__((btf_decl_tag("btf_omit")))
>
> as an instruction to pahole to omit declarations for these things?
>

All such tracepoint-related variables, can't we just put them into
some separate ELF section, and teach pahole to ignore global variables
from that section? btf_decl_tag is a similar idea, but (currently)
won't work for GCC-built kernels. So I'd go with the ELF section.
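
Roughly what I have in mind, as a minimal Python sketch rather than
actual pahole code: the encoder sees each variable together with the
ELF section it lives in and skips anything from an ignore list. The
section name "__tracepoint_vars" and the variable names below are
hypothetical.

```python
# Hypothetical section that would collect macro-generated tracepoint data.
IGNORED_SECTIONS = frozenset({"__tracepoint_vars"})


def vars_to_encode(variables, ignored=IGNORED_SECTIONS):
    """Keep only variables whose ELF section is not on the ignore list."""
    return [name for name, section in variables if section not in ignored]


# (variable name, section name) pairs, invented for illustration.
variables = [
    ("example_stats", ".data"),
    ("__tracepoint_example_op", "__tracepoint_vars"),
    ("example_table", ".rodata"),
]
print(vars_to_encode(variables))
```

Unlike prefix matching, this needs no list of names to maintain, and it
does not depend on compiler support for btf_decl_tag.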

> Thanks,
> Stephen
>
> > So before we decide on what to do with vars in mods lets figure out
> > the need.


* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-25 10:01     ` Alan Maguire
@ 2025-02-25 21:52       ` Andrii Nakryiko
  2025-02-26 14:20         ` Alan Maguire
  2025-05-12 11:15       ` Tony Ambardar
  1 sibling, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-25 21:52 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Alexei Starovoitov, Stephen Brennan, Masahiro Yamada,
	Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
	Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman, linux-arch,
	Stanislav Fomichev, Kent Overstreet, Pasha Tatashin, Jiri Olsa,
	John Fastabend, Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo,
	Andrew Morton, Linux Kbuild mailing list, Daniel Borkmann,
	Arnd Bergmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, LKML, bpf

On Tue, Feb 25, 2025 at 2:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 07/02/2025 23:50, Alexei Starovoitov wrote:
> > On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >> When the feature was implemented in pahole, my measurements indicated
> >> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >> increased by 53.2%. Due to these increases, the feature is implemented
> >> behind a new config option, allowing users sensitive to increased memory
> >> usage to disable it.
> >>
> >
> > ...
> >> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >> +       bool "Generate BTF type information for all global variables"
> >> +       default y
> >> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >> +       help
> >> +         Include type information for all global variables in the BTF. This
> >> +         increases the size of the BTF information, which increases memory
> >> +         usage at runtime. With global variable types available, runtime
> >> +         debugging and tracers may be able to provide more detail.
> >
> > This is not a solution.
> > Even if it's changed to 'default n' distros will enable it
> > like they enable everything and will suffer a regression.
> >
> > We need to add a new module like vmlinux_btf.ko that will contain
> > this additional BTF data. For global vars and everything else we might need.
> >
>
> In this area, I've been exploring adding support for
> CONFIG_DEBUG_INFO_BTF=m, so that the BTF info for vmlinux is delivered
> via a module. From the consumer side, everything looks identical
> (/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
> is delivered via btf_vmlinux.ko instead. The original need for this was
> that embedded folks noted that because in the current situation BTF data
> is in vmlinux, they cannot enable BTF because such small-footprint
> systems do not support a large vmlinux binary. However they could
> potentially use kernel BTF if it was delivered via a module. The other
> nice thing about module delivery in the general case is we can make use
> of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
> to a 1.8Mb btf_vmlinux.ko.gz module on-disk.
>
> The challenge in delivering vmlinux BTF in a module is that on module
> load during boot other modules expect vmlinux BTF to be there when
> adding their own BTF to /sys/kernel/btf. And kfunc registration from
> kernel and modules expects this also. So support for deferred BTF module
> load/kfunc registration is required too. I've implemented the former and
> now am working on the latter. Hope to have some RFC patches ready soon,
> but it looks feasible at this point.

Lazy btf_vmlinux.ko loading when BTF is actually needed (i.e., when
user reads /sys/kernel/btf/vmlinux for the first time; or when BPF
program is validated and needs kernel BTF) would be great. Curious to
see how all that fits together!
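
To make the "load on first use" idea concrete, here is a minimal
user-space sketch in Python. It is only a model of the pattern, not
kernel code: LazyBlob, its loader callback and load_count are
hypothetical names invented for illustration, and a real implementation
would also need locking around the first load.

```python
from typing import Callable, Optional


class LazyBlob:
    """Hypothetical model: fetch a blob only when it is first requested."""

    def __init__(self, loader: Callable[[], bytes]) -> None:
        self._loader = loader              # stands in for "load btf_vmlinux.ko"
        self._data: Optional[bytes] = None
        self.load_count = 0                # how many times the loader actually ran

    def get(self) -> bytes:
        # First reader (sysfs read, BPF program validation, ...) triggers the load;
        # later readers reuse the cached data.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data
```

Nothing is loaded until the first get() call, and repeated calls do not
reload.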

>
> Assuming such an option was available to small-footprint systems, should
> we consider adding global variables to core vmlinux BTF along with
> per-cpu variables? Then vmlinux BTF extras could be used for some of the
> additional optional representations like function site-specific data
> (inlines etc)? Or are there other factors other than on-disk footprint
> that we need to consider? Thanks!

I'd keep BTF for variables separate from "core" vmlinux BTF. We can
have /sys/kernel/btf/vmlinux.vars, which would depend on
/sys/kernel/btf/vmlinux as a base BTF. Separately, we could eventually
have /sys/kernel/btf/vmlinux.inlines which would also have
/sys/kernel/btf/vmlinux as base BTF. If no one needs vmlinux.vars on
the system, we won't need to waste memory on it. Seems more modular
and extensible.
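
As a rough way to see what each such blob would cost, the sketch below
parses the fixed 24-byte struct btf_header (magic 0xEB9F) and reports
the blob size. It assumes little-endian BTF; parse_btf_header() and
total_size() are illustrative helper names, and vmlinux.vars is only the
file proposed above, so today this is only useful on
/sys/kernel/btf/vmlinux and the per-module files.

```python
import struct
from typing import NamedTuple

BTF_MAGIC = 0xEB9F
# struct btf_header: u16 magic, u8 version, u8 flags, then five u32 fields.
_HDR = struct.Struct("<HBBIIIII")


class BtfHeader(NamedTuple):
    magic: int
    version: int
    flags: int
    hdr_len: int
    type_off: int   # offset of the type section, relative to end of header
    type_len: int
    str_off: int    # offset of the string section, relative to end of header
    str_len: int


def parse_btf_header(blob: bytes) -> BtfHeader:
    """Decode the header of a little-endian BTF blob."""
    hdr = BtfHeader(*_HDR.unpack_from(blob))
    if hdr.magic != BTF_MAGIC:
        raise ValueError("not a little-endian BTF blob")
    return hdr


def total_size(hdr: BtfHeader) -> int:
    """Bytes covered by the header plus the type and string sections."""
    end_of_data = max(hdr.type_off + hdr.type_len, hdr.str_off + hdr.str_len)
    return hdr.hdr_len + end_of_data
```

For example, total_size(parse_btf_header(open("/sys/kernel/btf/vmlinux",
"rb").read())) gives the size of the core BTF on a running system.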

>
> Alan
>
> > pw-bot: cr
> >
>


* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-25 21:52       ` Andrii Nakryiko
@ 2025-02-26 14:20         ` Alan Maguire
  2025-02-26 16:57           ` Andrii Nakryiko
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Maguire @ 2025-02-26 14:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Stephen Brennan, Masahiro Yamada,
	Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
	Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman, linux-arch,
	Stanislav Fomichev, Kent Overstreet, Pasha Tatashin, Jiri Olsa,
	John Fastabend, Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo,
	Andrew Morton, Linux Kbuild mailing list, Daniel Borkmann,
	Arnd Bergmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, LKML, bpf

On 25/02/2025 21:52, Andrii Nakryiko wrote:
> On Tue, Feb 25, 2025 at 2:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 07/02/2025 23:50, Alexei Starovoitov wrote:
>>> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
>>> <stephen.s.brennan@oracle.com> wrote:
>>>> When the feature was implemented in pahole, my measurements indicated
>>>> that vmlinux BTF size increased by about 25.8%, and module BTF size
>>>> increased by 53.2%. Due to these increases, the feature is implemented
>>>> behind a new config option, allowing users sensitive to increased memory
>>>> usage to disable it.
>>>>
>>>
>>> ...
>>>> +config DEBUG_INFO_BTF_GLOBAL_VARS
>>>> +       bool "Generate BTF type information for all global variables"
>>>> +       default y
>>>> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
>>>> +       help
>>>> +         Include type information for all global variables in the BTF. This
>>>> +         increases the size of the BTF information, which increases memory
>>>> +         usage at runtime. With global variable types available, runtime
>>>> +         debugging and tracers may be able to provide more detail.
>>>
>>> This is not a solution.
>>> Even if it's changed to 'default n' distros will enable it
>>> like they enable everything and will suffer a regression.
>>>
>>> We need to add a new module like vmlinux_btf.ko that will contain
>>> this additional BTF data. For global vars and everything else we might need.
>>>
>>
>> In this area, I've been exploring adding support for
>> CONFIG_DEBUG_INFO_BTF=m , so that the BTF info for vmlinux is delivered
>> via a module. From the consumer side, everything looks identical
>> (/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
>> is delivered via btf_vmlinux.ko instead. The original need for this was
>> that embedded folks noted that because in the current situation BTF data
>> is in vmlinux, they cannot enable BTF because such small-footprint
>> systems do not support a large vmlinux binary. However they could
>> potentially use kernel BTF if it was delivered via a module. The other
>> nice thing about module delivery in the general case is we can make use
>> of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
>> to a 1.8Mb btf_vmlinux.ko.gz module on-disk.
>>
>> The challenge in delivering vmlinux BTF in a module is that on module
>> load during boot other modules expect vmlinux BTF to be there when
>> adding their own BTF to /sys/kernel/btf. And kfunc registration from
>> kernel and modules expects this also. So support for deferred BTF module
>> load/kfunc registration is required too. I've implemented the former and
>> now am working on the latter. Hope to have some RFC patches ready soon,
>> but it looks feasible at this point.
> 
> Lazy btf_vmlinux.ko loading when BTF is actually needed (i.e., when
> user reads /sys/kernel/btf/vmlinux for the first time; or when BPF
> program is validated and needs kernel BTF) would be great. Curious to
> see how all that fits together!
> 
>>
>> Assuming such an option was available to small-footprint systems, should
>> we consider adding global variables to core vmlinux BTF along with
>> per-cpu variables? Then vmlinux BTF extras could be used for some of the
>> additional optional representations like function site-specific data
>> (inlines etc)? Or are there other factors other than on-disk footprint
>> that we need to consider? Thanks!
> 
> I'd keep BTF for variables separate from "core" vmlinux BTF. We can
> have /sys/kernel/btf/vmlinux.vars, which would depend on
> /sys/kernel/btf/vmlinux as a base BTF. Separately, we could eventually
> have /sys/kernel/btf/vmlinux.inlines which would also have
> /sys/kernel/btf/vmlinux as base BTF. If no one needs vmlinux.vars on
> the system, we won't need to waste memory on it. Seems more modular
> and extensible.
>

Sounds good. So thinking about how this fits with
CONFIG_DEBUG_INFO_BTF=m, perhaps the approach would be to use
btf_vmlinux.ko for all such extensible /sys/kernel/btf/vmlinux.vars,
vmlinux.inlines etc. Each of these would be derived from the .BTF.vars,
.BTF.inlines sections in btf_vmlinux.ko, which are optionally included
via a CONFIG_DEBUG_INFO_BTF_EXTRAS list. If CONFIG_DEBUG_INFO_BTF=y, the
core vmlinux section stays in vmlinux itself and the extras are
delivered via btf_vmlinux.ko, but if CONFIG_DEBUG_INFO_BTF=m, the
vmlinux .BTF section is delivered in btf_vmlinux.ko too.

If this makes sense, I'll try and put together the
CONFIG_DEBUG_INFO_BTF=m support first, and that will give us a
btf_vmlinux.ko to work with for delivery of extras. Thanks!
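
To spell out the placement rules described above, here is a small
Python sketch. It only models the proposal: btf_delivery() is a
hypothetical helper, and CONFIG_DEBUG_INFO_BTF=m, the EXTRAS list and
the .BTF.vars/.BTF.inlines sections do not exist yet.

```python
from typing import Dict, List


def btf_delivery(debug_info_btf: str, extras: List[str]) -> Dict[str, str]:
    """Map each BTF section to the binary that would carry it (proposal only)."""
    if debug_info_btf not in ("y", "m"):
        return {}  # BTF disabled: nothing is delivered anywhere
    # Core .BTF stays in vmlinux for =y and moves to the module for =m.
    core_home = "vmlinux" if debug_info_btf == "y" else "btf_vmlinux.ko"
    placement = {".BTF": core_home}
    # Extras always live in the module, whichever way the core is built.
    for name in extras:
        placement[".BTF." + name] = "btf_vmlinux.ko"
    return placement
```

With "y" and ["vars", "inlines"], only the extras land in
btf_vmlinux.ko; with "m", the core .BTF section joins them there.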

Alan

>>
>> Alan
>>
>>> pw-bot: cr
>>>
>>



* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-26 14:20         ` Alan Maguire
@ 2025-02-26 16:57           ` Andrii Nakryiko
  0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2025-02-26 16:57 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Alexei Starovoitov, Stephen Brennan, Masahiro Yamada,
	Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
	Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman, linux-arch,
	Stanislav Fomichev, Kent Overstreet, Pasha Tatashin, Jiri Olsa,
	John Fastabend, Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo,
	Andrew Morton, Linux Kbuild mailing list, Daniel Borkmann,
	Arnd Bergmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, LKML, bpf

On Wed, Feb 26, 2025 at 6:20 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 25/02/2025 21:52, Andrii Nakryiko wrote:
> > On Tue, Feb 25, 2025 at 2:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> On 07/02/2025 23:50, Alexei Starovoitov wrote:
> >>> On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> >>> <stephen.s.brennan@oracle.com> wrote:
> >>>> When the feature was implemented in pahole, my measurements indicated
> >>>> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >>>> increased by 53.2%. Due to these increases, the feature is implemented
> >>>> behind a new config option, allowing users sensitive to increased memory
> >>>> usage to disable it.
> >>>>
> >>>
> >>> ...
> >>>> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >>>> +       bool "Generate BTF type information for all global variables"
> >>>> +       default y
> >>>> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >>>> +       help
> >>>> +         Include type information for all global variables in the BTF. This
> >>>> +         increases the size of the BTF information, which increases memory
> >>>> +         usage at runtime. With global variable types available, runtime
> >>>> +         debugging and tracers may be able to provide more detail.
> >>>
> >>> This is not a solution.
> >>> Even if it's changed to 'default n' distros will enable it
> >>> like they enable everything and will suffer a regression.
> >>>
> >>> We need to add a new module like vmlinux_btf.ko that will contain
> >>> this additional BTF data. For global vars and everything else we might need.
> >>>
> >>
> >> In this area, I've been exploring adding support for
> >> CONFIG_DEBUG_INFO_BTF=m , so that the BTF info for vmlinux is delivered
> >> via a module. From the consumer side, everything looks identical
> >> (/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
> >> is delivered via btf_vmlinux.ko instead. The original need for this was
> >> that embedded folks noted that because in the current situation BTF data
> >> is in vmlinux, they cannot enable BTF because such small-footprint
> >> systems do not support a large vmlinux binary. However they could
> >> potentially use kernel BTF if it was delivered via a module. The other
> >> nice thing about module delivery in the general case is we can make use
> >> of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
> >> to a 1.8Mb btf_vmlinux.ko.gz module on-disk.
> >>
> >> The challenge in delivering vmlinux BTF in a module is that on module
> >> load during boot other modules expect vmlinux BTF to be there when
> >> adding their own BTF to /sys/kernel/btf. And kfunc registration from
> >> kernel and modules expects this also. So support for deferred BTF module
> >> load/kfunc registration is required too. I've implemented the former and
> >> now am working on the latter. Hope to have some RFC patches ready soon,
> >> but it looks feasible at this point.
> >
> > Lazy btf_vmlinux.ko loading when BTF is actually needed (i.e., when
> > user reads /sys/kernel/btf/vmlinux for the first time; or when BPF
> > program is validated and needs kernel BTF) would be great. Curious to
> > see how all that fits together!
> >
> >>
> >> Assuming such an option was available to small-footprint systems, should
> >> we consider adding global variables to core vmlinux BTF along with
> >> per-cpu variables? Then vmlinux BTF extras could be used for some of the
> >> additional optional representations like function site-specific data
> >> (inlines etc)? Or are there other factors other than on-disk footprint
> >> that we need to consider? Thanks!
> >
> > I'd keep BTF for variables separate from "core" vmlinux BTF. We can
> > have /sys/kernel/btf/vmlinux.vars, which would depend on
> > /sys/kernel/btf/vmlinux as a base BTF. Separately, we could eventually
> > have /sys/kernel/btf/vmlinux.inlines which would also have
> > /sys/kernel/btf/vmlinux as base BTF. If no one needs vmlinux.vars on
> > the system, we won't need to waste memory on it. Seems more modular
> > and extensible.
> >
>
> Sounds good. So thinking about how this fits with
> CONFIG_DEBUG_INFO_BTF=m, perhaps the approach would be to use
> btf_vmlinux.ko for all such extensible /sys/kernel/btf/vmlinux.vars,
> vmlinux.inlines etc. Each of these is derived from .BTF.vars ,
> .BTF.inlines sections in btf_vmlinux.ko. These are optionally included
> via CONFIG_DEBUG_INFO_BTF_EXTRAS list. If CONFIG_DEBUG_INFO_BTF=y the
> core vmlinux section stays in vmlinux itself and the extras are
> delivered via btf_vmlinux.ko, but if CONFIG_DEBUG_INFO_BTF=m, the
> vmlinux .BTF section is delivered in btf_vmlinux.ko too.
>
> If this makes sense, I'll try and put together the
> CONFIG_DEBUG_INFO_BTF=m support first, and that will give us a
> btf_vmlinux.ko to work with for delivery of extras. Thanks!

I'd keep our options open as to whether btf_vmlinux.ko contains all
vmlinux BTFs (core BTF, inlines, variables) or we have a separate
module for some subsets. E.g., variables, while a useful thing,
probably won't be used all that frequently (i.e., only while debugging
with drgn), so co-locating it with vmlinux BTF itself might be a waste
in most cases.

But other than that makes sense.

>
> Alan
>
> >>
> >> Alan
> >>
> >>> pw-bot: cr
> >>>
> >>
>


* Re: [PATCH 2/2] btf: Add the option to include global variable types
  2025-02-25 10:01     ` Alan Maguire
  2025-02-25 21:52       ` Andrii Nakryiko
@ 2025-05-12 11:15       ` Tony Ambardar
  1 sibling, 0 replies; 17+ messages in thread
From: Tony Ambardar @ 2025-05-12 11:15 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Alexei Starovoitov, Stephen Brennan, Masahiro Yamada,
	Andrii Nakryiko, Nicolas Schier, Kees Cook, KP Singh,
	Martin KaFai Lau, Sami Tolvanen, Eduard Zingerman, linux-arch,
	Stanislav Fomichev, Kent Overstreet, Pasha Tatashin, Jiri Olsa,
	John Fastabend, Jann Horn, Ard Biesheuvel, Yonghong Song, Hao Luo,
	Andrew Morton, Linux Kbuild mailing list, Daniel Borkmann,
	Arnd Bergmann, Nathan Chancellor, linux-debuggers,
	Alexei Starovoitov, Song Liu, LKML, bpf

On Tue, Feb 25, 2025 at 10:01:27AM +0000, Alan Maguire wrote:
> On 07/02/2025 23:50, Alexei Starovoitov wrote:
> > On Thu, Feb 6, 2025 at 5:21 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >> When the feature was implemented in pahole, my measurements indicated
> >> that vmlinux BTF size increased by about 25.8%, and module BTF size
> >> increased by 53.2%. Due to these increases, the feature is implemented
> >> behind a new config option, allowing users sensitive to increased memory
> >> usage to disable it.
> >>
> > 
> > ...
> >> +config DEBUG_INFO_BTF_GLOBAL_VARS
> >> +       bool "Generate BTF type information for all global variables"
> >> +       default y
> >> +       depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 128
> >> +       help
> >> +         Include type information for all global variables in the BTF. This
> >> +         increases the size of the BTF information, which increases memory
> >> +         usage at runtime. With global variable types available, runtime
> >> +         debugging and tracers may be able to provide more detail.
> > 
> > This is not a solution.
> > Even if it's changed to 'default n' distros will enable it
> > like they enable everything and will suffer a regression.
> > 
> > We need to add a new module like vmlinux_btf.ko that will contain
> > this additional BTF data. For global vars and everything else we might need.
> > 
>

Hi Alan,

> In this area, I've been exploring adding support for
> CONFIG_DEBUG_INFO_BTF=m , so that the BTF info for vmlinux is delivered
> via a module. From the consumer side, everything looks identical
> (/sys/kernel/btf/vmlinux is there etc), it is just that the .BTF section
> is delivered via btf_vmlinux.ko instead. The original need for this was
> that embedded folks noted that because in the current situation BTF data
> is in vmlinux, they cannot enable BTF because such small-footprint
> systems do not support a large vmlinux binary. However they could
> potentially use kernel BTF if it was delivered via a module. The other
> nice thing about module delivery in the general case is we can make use
> of module compression. In experiments I see a 5.8Mb vmlinux BTF reduce
> to a 1.8Mb btf_vmlinux.ko.gz module on-disk.
> 

Thank you very much for working on this. I was keen to see this since you
first mentioned it a few years back [1], and have been meaning to ping
you on where things stand. Your summary of motivations above is spot on,
and I can add some context w.r.t. OpenWrt, often used on small consumer
Linux routers to: improve security after support ends, expand
functionality, and increase lifetime/reduce e-waste.

This lifetime is already constrained by the limited kernel binary storage
of some devices and ever-increasing kernel sizes. The biggest mitigation
is heavy use of loadable modules to avoid using kernel storage and also
reduce the kernel BTF. Even so, the (compressed) kernel BTF is ~400 KB,
and over the years I've seen kernel sizes grow by ~200 KB per annual LTS
release.

These rates can amount to penalizing BTF usage with _two years of reduced
lifetime_, which is a key obstacle to enabling BTF by default on such
small systems IMO. Having a module-based kernel BTF would be a huge
improvement!
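
The two-year figure is simply the ratio of the two rough numbers above.
As a tiny Python sketch (years_of_growth_consumed() is just an
illustrative name, and both inputs are the approximate values quoted in
this message, not new measurements):

```python
def years_of_growth_consumed(btf_kb: float, growth_kb_per_year: float) -> float:
    """Years of typical kernel growth that an in-kernel BTF blob uses up."""
    return btf_kb / growth_kb_per_year


# ~400 KB of compressed BTF vs. ~200 KB of growth per annual LTS release.
penalty_years = years_of_growth_consumed(400, 200)
```

Moving that ~400 KB out of the kernel image and into a module would hand
those releases' worth of headroom back to the device.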

> The challenge in delivering vmlinux BTF in a module is that on module
> load during boot other modules expect vmlinux BTF to be there when
> adding their own BTF to /sys/kernel/btf. And kfunc registration from
> kernel and modules expects this also. So support for deferred BTF module
> load/kfunc registration is required too. I've implemented the former and
> now am working on the latter. Hope to have some RFC patches ready soon,
> but it looks feasible at this point.
> 

That sounds great. I'm looking forward to seeing and trying this out. If
there's anything you can share at this time please let me know.

Thanks,
Tony
 
1: https://lore.kernel.org/bpf/43fd3775-e796-6802-17f0-5c9fdbf368f5@oracle.com/ 

> Assuming such an option was available to small-footprint systems, should
> we consider adding global variables to core vmlinux BTF along with
> per-cpu variables? Then vmlinux BTF extras could be used for some of the
> additional optional representations like function site-specific data
> (inlines etc)? Or are there other factors other than on-disk footprint
> that we need to consider? Thanks!
> 
> Alan
> 
> > pw-bot: cr
> > 
> 



Thread overview: 17+ messages
2025-02-07  1:20 [PATCH 0/2] Add option for generating BTF types of global variables Stephen Brennan
2025-02-07  1:20 ` [PATCH 1/2] kallsyms: output rodata to ".kallsyms_rodata" Stephen Brennan
2025-02-15 14:21   ` Masahiro Yamada
2025-02-24 18:51     ` Andrii Nakryiko
2025-02-25  1:24       ` Stephen Brennan
2025-02-25 16:59         ` Andrii Nakryiko
2025-02-07  1:20 ` [PATCH 2/2] btf: Add the option to include global variable types Stephen Brennan
2025-02-07 23:50   ` Alexei Starovoitov
2025-02-11 23:58     ` Stephen Brennan
2025-02-14  1:18       ` Alexei Starovoitov
2025-02-18 23:09         ` Stephen Brennan
2025-02-25 21:47           ` Andrii Nakryiko
2025-02-25 10:01     ` Alan Maguire
2025-02-25 21:52       ` Andrii Nakryiko
2025-02-26 14:20         ` Alan Maguire
2025-02-26 16:57           ` Andrii Nakryiko
2025-05-12 11:15       ` Tony Ambardar
