public inbox for dwarves@vger.kernel.org
* [RFC 0/4] BTF archive with unmodified pahole+toolchain
@ 2025-08-07 18:25 Arnaldo Carvalho de Melo
  2025-08-07 18:25 ` [PATCH 1/4] libbpf: Simplify error handling removing needless repeated err checks Arnaldo Carvalho de Melo
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-07 18:25 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Andrii Nakryiko, Alexei Starovoitov,
	Yonghong Song, Jose E. Marchesi, Nick Alcock, Namhyung Kim, bpf

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Hi,

	I've finally managed to act on some idea I shared with a few
folks while in Montreal, namely using unmodified pahole to generate BTF
for each .o right after it is produced, i.e. with this patch:

  acme@number:~/git/linux$ git diff
  diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
  index 1d581ba5df66f2b5..ad9e788910636715 100644
  --- a/scripts/Makefile.lib
  +++ b/scripts/Makefile.lib
  @@ -240,7 +240,7 @@ cmd_ld_single = $(if $(objtool-enabled)$(is-single-obj-m), ; $(LD) $(ld_flags) -
   endif
   
   quiet_cmd_cc_o_c = CC $(quiet_modtag)  $@
  -      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< \
  +      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< && ${PAHOLE} --btf_encode ${PAHOLE_FLAGS} $@ \
                  $(cmd_ld_single) \
                  $(cmd_objtool)
   
  acme@number:~/git/linux$

A kernel built with this ends up with a vmlinux with a .BTF section that
has all the .o .BTF sections concatenated.

This (the series of .BTF concatenated by the unmodified linker) somehow
survives the pre-existing pahole call to generate BTF from DWARF and we
end up with this "BTF archive".

With the minimal set of changes in this series:

 tools/lib/bpf/btf.c | 91 ++++++++++++++++++++++++++++++++++++++-------
 tools/lib/bpf/btf.h |  3 ++
 2 files changed, 81 insertions(+), 13 deletions(-)

With the first patch being just a trivial error handling simplification,
we end up being able to get the same vmlinux.h result from bpftool built
with this libbpf:

  acme@number:~/git/bpf-next$ tools/bpf/bpftool/bpftool btf dump file ~/vmlinux-v6.16.0+.btf_archive format c > from_archive_combined+dedup_in_libbpf
  acme@number:~/git/bpf-next$ tools/bpf/bpftool/bpftool btf dump file ../build/v6.16.0+/vmlinux format c > from_unmodified_pahole_DWARF2BTF+dedup_in_libbpf
  acme@number:~/git/bpf-next$ diff -u from_archive_combined+dedup_in_libbpf from_unmodified_pahole_DWARF2BTF+dedup_in_libbpf | head
  acme@number:~/git/bpf-next$ wc -l from_archive_combined+dedup_in_libbpf from_unmodified_pahole_DWARF2BTF+dedup_in_libbpf
   161588 from_archive_combined+dedup_in_libbpf
   161588 from_unmodified_pahole_DWARF2BTF+dedup_in_libbpf
   323176 total
  acme@number:~/git/bpf-next$

If we use completely unmodified libbpf, bpftool, etc, the "BTF archive"
in the resulting vmlinux .BTF ELF section is still consumable, but just
the first "CU" (the first .o .BTF ELF section) is visible, the one for
init/main.o:

acme@number:~/git/linux$ bpftool version
bpftool v7.5.0
using libbpf v1.5
features: llvm, skeletons
acme@number:~/git/linux$

acme@number:~/git/bpf-next$ bpftool btf dump file ~/vmlinux-v6.16.0+.btf_archive format c | wc -l
11361
acme@number:~/git/linux$ bpftool btf dump file ../build/v6.16.0+/init/main.o format c | wc -l
11361
acme@number:~/git/linux$

Furthermore:

acme@number:~/git/linux$ bpftool btf dump file ../build/v6.16.0+/init/main.o format c > a
acme@number:~/git/linux$ bpftool btf dump file ~/vmlinux-v6.16.0+.btf_archive format c > b
acme@number:~/git/linux$

Each patch has extra explanations of the process.

This is complementary to today's series from Alan Maguire, as we can use
the one liner for the kernel build process to test his series without
requiring installing a toolchain that generates BTF for each .o file
that will result in vmlinux.

Next steps on my side are to:

1. change pahole for when it receives --format_path=btf check if
btf__is_archive(btf) is true, then just replace the current vmlinux .BTF
contents with the raw data in this just loaded BTF, short circuiting
the whole process.

2. the kernel build process should be changed to allow one to ask for
just BTF, not DWARF, and if so, using the above method, strip the DWARF
info after using it to generate BTF.

Then when compilers are producing BTF, we switch to that, falling back
to the above method when a compiler is known to generate buggy BTF.

And also to use in CIs, to compare the output generated by the various
methods in the various components.

3. In 2 we can even use the same scheme we use for parallelizing DWARF
loading when loading all the BTF archive members concatenated in vmlinux
to dedup them.

BTW, this is the size of a vmlinux ELF .BTF section with a BTF archive:

acme@number:~/git/linux$ readelf -SW ../build/v6.16.0+/vmlinux | grep BTF
  [15] .BTF        PROGBITS   ffffffff82ff2000  21f2000 16db5976 00   A  0   0  1
  [16] .BTF_ids    PROGBITS   ffffffff99da8000 18fa8000   001238 00   A  0   0  1
acme@number:~/git/linux$

~365 MiB

While the DWARF for that file is at:

  [44] .debug_aranges    PROGBITS  0000000000000000 1aa00000   03bfb0 00      0   0 16
  [45] .debug_info       PROGBITS  0000000000000000 1aa3bfb0 1154b512 00      0   0  1
  [46] .debug_abbrev     PROGBITS  0000000000000000 2bf874c2   81492f 00      0   0  1
  [47] .debug_line       PROGBITS  0000000000000000 2c79bdf1  1ec4abd 00      0   0  1
  [48] .debug_frame      PROGBITS  0000000000000000 2e6608b0   3fd470 00      0   0  8
  [49] .debug_str        PROGBITS  0000000000000000 2ea5dd20   59bbe8 01  MS  0   0  1
  [50] .debug_line_str   PROGBITS  0000000000000000 2eff9908   02de43 01  MS  0   0  1
  [51] .debug_loclists   PROGBITS  0000000000000000 2f02774b  2683cbb 00      0   0  1
  [52] .debug_rnglists   PROGBITS  0000000000000000 316ab406   4f875b 00      0   0  1

>>> 0x1154b512 + 0x81492f + 0x1ec4abd + 0x3fd470 + 0x59bbe8
341563734

~325 MiB

But then BTF, when dedup'ed gets down to:

acme@number:~/git/linux$ readelf -SW ../build/v6.16.0+.no-btf_archive/vmlinux | grep BTF
  [15] .BTF              PROGBITS        ffffffff82fef000 21ef000 64fb32 00   A  0   0  1
  [16] .BTF_ids          PROGBITS        ffffffff8363f000 283f000 001238 00   A  0   0  1
acme@number:~/git/linux$ 

~6.3 MiB

And also BTF has some info generated from other sources besides DWARF,
like kfuncs, per cpu, etc.

Also an observation: for distros the optimal way to produce BTF _and_
DWARF seems to be the one we have now: don't bother generating .BTF
for all .o, just generate DWARF and at the end generate BTF from it 8-)

For developers not needing DWARF and not caring about reproducible
builds there are other clever tricks to use, like adding each generated
BTF as we go using the technique in this patchset, i.e. using
btf__add_btf() and throwing away the just generated BTF, to then at the
end do the btf__dedup_archive() (also introduced in this patchset) and
have the end result dropped to disk. But I'm getting carried away, sry.

There are many other details that need to be double checked but I think
the current status is good enough for experimentation.

Cheers,

- Arnaldo

Arnaldo Carvalho de Melo (4):
  libbpf: Simplify error handling removing needless repeated err checks
  libbpf: Check if there is extra data at the end of a BTF
  libbpf: Add support for detecting and dedup'ing a BTF archive
  libbpf: Check if an ELF .BTF section is an archive and combine/dedup

 tools/lib/bpf/btf.c | 91 ++++++++++++++++++++++++++++++++++++++-------
 tools/lib/bpf/btf.h |  3 ++
 2 files changed, 81 insertions(+), 13 deletions(-)

-- 
2.50.1



* [PATCH 1/4] libbpf: Simplify error handling removing needless repeated err checks
  2025-08-07 18:25 [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
@ 2025-08-07 18:25 ` Arnaldo Carvalho de Melo
  2025-08-07 18:25 ` [PATCH 2/4] libbpf: Check if there is extra data at the end of a BTF Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-07 18:25 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Alexei Starovoitov, Andrii Nakryiko,
	Jose E. Marchesi, Namhyung Kim, Nick Alcock, Yonghong Song

From: Arnaldo Carvalho de Melo <acme@redhat.com>

The 'done' label can be inside the last test for 'err', as all jumps
there are immediately preceded by setting 'err' to a non-zero value, so
no need to check it.

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Alcock <nick.alcock@oracle.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/bpf/btf.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 37682908cb0f3bd4..9bacd4dddff366bf 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1090,11 +1090,9 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf, b
 	err = btf_parse_str_sec(btf);
 	err = err ?: btf_parse_type_sec(btf);
 	err = err ?: btf_sanity_check(btf);
-	if (err)
-		goto done;
 
-done:
 	if (err) {
+done:
 		btf__free(btf);
 		return ERR_PTR(err);
 	}
-- 
2.50.1



* [PATCH 2/4] libbpf: Check if there is extra data at the end of a BTF
  2025-08-07 18:25 [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
  2025-08-07 18:25 ` [PATCH 1/4] libbpf: Simplify error handling removing needless repeated err checks Arnaldo Carvalho de Melo
@ 2025-08-07 18:25 ` Arnaldo Carvalho de Melo
  2025-08-07 18:25 ` [PATCH 3/4] libbpf: Add support for detecting and dedup'ing a BTF archive Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-07 18:25 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Alexei Starovoitov, Andrii Nakryiko,
	Jose E. Marchesi, Namhyung Kim, Nick Alcock, Yonghong Song

From: Arnaldo Carvalho de Melo <acme@redhat.com>

We get btf->raw_size from the size of the ELF section holding the BTF or
from the size of a detached BTF file. Check if there is extra data past
the end of the strings and, when not using mmap, trim it to avoid
wasting space.

This can happen if we generate BTF for all .o files and it then gets
combined by the linker, which is the default action when finding
unhandled sections with the same name in multiple .o files.

For instance:

  root@x1:~/bla# bpftool -d btf dump file ~acme/btf2btf/vmlinux | wc -l
  libbpf: BTF raw_size chopped from 380507209 to 238545
  12084
  root@x1:~/bla#

The above is one such file, where that vmlinux .BTF section has lots of
.BTF files combined by the linker.

A deduplicated one, generated by pahole + libbpf when converting from
DWARF has way more stuff in it and the raw_size matches the ELF size:

  root@x1:~/bla# bpftool -d btf dump file /sys/kernel/btf/vmlinux |wc -l
  355927
  root@x1:~/bla#

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Alcock <nick.alcock@oracle.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/bpf/btf.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 9bacd4dddff366bf..ee45d461d53bea9a 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -124,6 +124,9 @@ struct btf {
 	/* whether raw_data is a (read-only) mmap */
 	bool raw_data_is_mmap;
 
+	/* Whether there was more data after the end of strings */
+	bool extra_raw_data;
+
 	/* BTF object FD, if loaded into kernel */
 	int fd;
 
@@ -225,10 +228,9 @@ static void btf_bswap_hdr(struct btf_header *h)
 	h->str_len = bswap_32(h->str_len);
 }
 
-static int btf_parse_hdr(struct btf *btf)
+static int btf_parse_hdr(struct btf *btf, struct btf_header *hdr)
 {
-	struct btf_header *hdr = btf->hdr;
-	__u32 meta_left;
+	__u32 meta_left, raw_size;
 
 	if (btf->raw_size < sizeof(struct btf_header)) {
 		pr_debug("BTF header not found\n");
@@ -271,6 +273,18 @@ static int btf_parse_hdr(struct btf *btf)
 		return -EINVAL;
 	}
 
+	/* If there is more data after the strings, it will not be used,
+	 * so we might as well trim here and don't waste memory.
+	 * This paves the way for a BTF archive, created by default
+	 * by the linker when finding .BTF in multiple .o files.
+	 */
+	raw_size = sizeof(*hdr) + hdr->str_off + hdr->str_len;
+	if (raw_size != btf->raw_size) {
+		pr_debug("BTF raw_size chopped from %u to %u\n", btf->raw_size, raw_size);
+		btf->raw_size = raw_size;
+		btf->extra_raw_data = true;
+	}
+
 	return 0;
 }
 
@@ -1047,6 +1061,7 @@ struct btf *btf__new_empty_split(struct btf *base_btf)
 
 static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf, bool is_mmap)
 {
+	struct btf_header hdr;
 	struct btf *btf;
 	int err;
 
@@ -1065,24 +1080,31 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf, b
 		btf->start_str_off = base_btf->hdr->str_len;
 	}
 
+	/* We still don't know if this is an archive, i.e. if 'size' is
+	 * the raw_size of a BTF or the sum of all BTFs in an archive,
+	 * it'll be adjusted when we parse the header.
+	 */
+	btf->raw_size = size;
+	memcpy(&hdr, data, sizeof(hdr));
+
+	err = btf_parse_hdr(btf, &hdr);
+	if (err)
+		goto done;
+
 	if (is_mmap) {
 		btf->raw_data = (void *)data;
 		btf->raw_data_is_mmap = true;
 	} else {
-		btf->raw_data = malloc(size);
+		btf->raw_data = malloc(btf->raw_size);
 		if (!btf->raw_data) {
 			err = -ENOMEM;
 			goto done;
 		}
-		memcpy(btf->raw_data, data, size);
+		memcpy(btf->raw_data, data, btf->raw_size);
 	}
 
-	btf->raw_size = size;
-
 	btf->hdr = btf->raw_data;
-	err = btf_parse_hdr(btf);
-	if (err)
-		goto done;
+	memcpy(btf->hdr, &hdr, sizeof(hdr));
 
 	btf->strs_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->str_off;
 	btf->types_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->type_off;
-- 
2.50.1



* [PATCH 3/4] libbpf: Add support for detecting and dedup'ing a BTF archive
  2025-08-07 18:25 [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
  2025-08-07 18:25 ` [PATCH 1/4] libbpf: Simplify error handling removing needless repeated err checks Arnaldo Carvalho de Melo
  2025-08-07 18:25 ` [PATCH 2/4] libbpf: Check if there is extra data at the end of a BTF Arnaldo Carvalho de Melo
@ 2025-08-07 18:25 ` Arnaldo Carvalho de Melo
  2025-08-07 18:25 ` [PATCH 4/4] libbpf: Check if an ELF .BTF section is an archive and combine/dedup Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-07 18:25 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Alexei Starovoitov, Andrii Nakryiko,
	Jose E. Marchesi, Namhyung Kim, Nick Alcock, Yonghong Song

From: Arnaldo Carvalho de Melo <acme@redhat.com>

A BTF archive is defined as a series of BTF raw_data blobs concatenated
one after the other. Its members are combined using btf__add_btf() and
the result is then passed to btf__dedup().

This is the simpler interface. A more involved one, maybe for pahole,
would take the same approach used for encoding BTF from DWARF: create a
series of threads that load the BTF archive members in parallel to then
dedup them.

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Alcock <nick.alcock@oracle.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/bpf/btf.c | 38 ++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h |  3 +++
 2 files changed, 41 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index ee45d461d53bea9a..73a6d94eeda125e1 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1127,6 +1127,44 @@ struct btf *btf__new(const void *data, __u32 size)
 	return libbpf_ptr(btf_new(data, size, NULL, false));
 }
 
+bool btf__is_archive(const struct btf *btf)
+{
+	return btf->extra_raw_data;
+}
+
+int btf__dedup_archive(struct btf *btf, const void *data, __u32 size, const struct btf_dedup_opts *opts)
+{
+	__u32 raw_size = btf->raw_size;
+	struct btf *brother;
+	int err = 0;
+
+	while (size > raw_size) {
+		data += raw_size;
+		size -= raw_size;
+		brother = btf_new(data, size, btf->base_btf, btf->raw_data_is_mmap);
+
+		if (IS_ERR(brother)) {
+			err = PTR_ERR(brother);
+			pr_debug("%s: __btf_new() failed! %d\n", __func__, err);
+			goto out;
+		}
+
+		if (btf__add_btf(btf, brother) < 0) {
+			err = -errno;
+			pr_debug("%s: btf__add_btf() failed: %d(%s)!\n", __func__, errno, strerror(errno));
+			btf__free(brother);
+			goto out;
+		}
+
+		raw_size = brother->raw_size;
+		btf__free(brother);
+	}
+
+	err = btf__dedup(btf, opts);
+out:
+	return libbpf_err(err);
+}
+
 struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf)
 {
 	return libbpf_ptr(btf_new(data, size, base_btf, false));
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index ccfd905f03dfe7b6..71a6b8e037f5c98b 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -258,6 +258,9 @@ struct btf_dedup_opts {
 #define btf_dedup_opts__last_field force_collisions
 
 LIBBPF_API int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts);
+LIBBPF_API int btf__dedup_archive(struct btf *btf, const void *data, __u32 size,
+				  const struct btf_dedup_opts *opts);
+LIBBPF_API bool btf__is_archive(const struct btf *btf);
 
 /**
  * @brief **btf__relocate()** will check the split BTF *btf* for references
-- 
2.50.1



* [PATCH 4/4] libbpf: Check if an ELF .BTF section is an archive and combine/dedup
  2025-08-07 18:25 [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
  2025-08-07 18:25 ` [PATCH 3/4] libbpf: Add support for detecting and dedup'ing a BTF archive Arnaldo Carvalho de Melo
@ 2025-08-07 18:25 ` Arnaldo Carvalho de Melo
  2025-08-07 18:46 ` [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-07 18:25 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Alexei Starovoitov, Andrii Nakryiko,
	Jose E. Marchesi, Namhyung Kim, Nick Alcock, Yonghong Song

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Since we don't have some sort of btf_opts to influence that, and at this
point a BTF archive is most likely to be found in an ELF section,
vmlinux's, let's do it in btf_parse_elf() so that we can demonstrate the
concept.

So, if we use an unmodified bpftool with a vmlinux generated with an
unmodified pahole and toolchain (compiler and linker):

  $ cat cmd_pahole_btf_o.patch
  diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
  index 4d543054f72356a4..02a595b82b299151 100644
  --- a/scripts/Makefile.lib
  +++ b/scripts/Makefile.lib
  @@ -313,7 +313,7 @@ cmd_ld_single = $(if $(objtool-enabled)$(is-single-obj-m), ; $(LD) $(ld_flags) -
   endif

   quiet_cmd_cc_o_c = CC $(quiet_modtag)  $@
  -      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< \
  +      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< && ${PAHOLE} --btf_encode ${PAHOLE_FLAGS} $@ \
                $(cmd_ld_single) \
                $(cmd_objtool)

  $

We get this:

  $ bpftool btf dump file ~/vmlinux.btf_archive > dedup_combined_btf_archive
  $ wc -l dedup_combined_btf_archive
  12084 dedup_combined_btf_archive
  $ head dedup_combined_btf_archive
  [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
  [2] CONST '(anon)' type_id=1
  [3] VOLATILE '(anon)' type_id=2
  [4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
  [5] PTR '(anon)' type_id=8
  [6] CONST '(anon)' type_id=5
  [7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
  [8] CONST '(anon)' type_id=7
  [9] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
  [10] CONST '(anon)' type_id=9
  $

While with one that detects it is a BTF archive (multiple .o .BTF ELF
sections concatenated into the .BTF ELF section for vmlinux):

  $ tools/bpf/bpftool/bpftool btf dump file ~/vmlinux.btf_archive > dedup_combined_btf_archive
  $ wc -l dedup_combined_btf_archive
  358141 dedup_combined_btf_archive
  $ head dedup_combined_btf_archive
  [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
  [2] CONST '(anon)' type_id=1
  [3] VOLATILE '(anon)' type_id=2
  [4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
  [5] PTR '(anon)' type_id=8
  [6] CONST '(anon)' type_id=5
  [7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
  [8] CONST '(anon)' type_id=7
  [9] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
  [10] CONST '(anon)' type_id=9
  $

Which is in the same ballpark as the number of lines for the BTF in a
distro kernel:

  $ tools/bpf/bpftool/bpftool btf dump file /sys/kernel/btf/vmlinux | wc -l
  355944
  $

Doing a fresh build with the above cmd_cc_o_c that generates BTF from
DWARF for every .o file, still not stripping the DWARF after that:

  $ bpftool btf dump file ../build/v6.16.0+/vmlinux | wc -l
  11927
  $ ../bpf-next/tools/bpf/bpftool/bpftool btf dump file ../build/v6.16.0+/vmlinux | wc -l
  360016
  $ ../bpf-next/tools/bpf/bpftool/bpftool btf dump file ../build/v6.16.0+/init/main.o | wc -l
  11927
  $ #bpftool btf dump file ../build/v6.16.0+/vmlinux | wc -l
  $ bpftool btf dump file ../build/v6.16.0+/vmlinux > just_first_entry_in_the_archive_using_old_non_btf_archive_aware_bpftool
  $ bpftool btf dump file ../build/v6.16.0+/init/main.o > first_CU_BTF_using_old_non_btf_archive_aware_bpftool
  $ diff -u just_first_entry_in_the_archive_using_old_non_btf_archive_aware_bpftool first_CU_BTF_using_old_non_btf_archive_aware_bpftool

Ok, now let's save that vmlinux with .BTF in all its .o files:

  $ cp ../build/v6.16.0+/vmlinux ~/vmlinux-v6.16.0+.btf_archive

And remove that per .o BTF encoding so that the end result isn't a BTF
archive:

  $ patch -p1 -R < cmd_cc_encode_btf_per_o.patch
  patching file scripts/Makefile.lib
  $

Let's rebuild it with that and make sure the end result doesn't have any
.BTF per .o:

  $ readelf -SW ../build/v6.16.0+/init/main.o  | grep BTF
  $ bpftool btf dump file ../build/v6.16.0+/init/main.o
  libbpf: failed to find '.BTF' ELF section in ../build/v6.16.0+/init/main.o
  Error: failed to load BTF from ../build/v6.16.0+/init/main.o: No data available
  $

So with a vmlinux whose .BTF is generated from DWARF only at the last
minute, we should get the same number of lines when dumping with the old
bpftool and with the new one:

  $ bpftool btf dump file ../build/v6.16.0+/vmlinux | wc -l
  357654
  $
  $ ../bpf-next/tools/bpf/bpftool/bpftool btf dump file ../build/v6.16.0+/vmlinux | wc -l
  357654
  $

So there is a difference from the 360016 lines obtained from the BTF
archive; which one?

  $ bpftool btf dump file ../build/v6.16.0+/vmlinux > DWARF-to-BTF-after+vmlinux
  $ ../bpf-next/tools/bpf/bpftool/bpftool btf dump file ~/vmlinux-v6.16.0+.btf_archive > DWARF-to-BTF-from-btf_archive

It starts with anon types like:

  --- DWARF-to-BTF-after+vmlinux  2025-08-06 13:31:02.814268740 -0300
  +++ DWARF-to-BTF-from-btf_archive       2025-08-06 13:31:27.818597644 -0300
  @@ -499,7 +499,7 @@
          'target' type_id=34 bits_offset=32
          'key' type_id=44 bits_offset=64
   [155] PTR '(anon)' type_id=154
  -[156] PTR '(anon)' type_id=16561
  +[156] PTR '(anon)' type_id=41426
   [157] STRUCT 'static_key' size=16 vlen=2
          'enabled' type_id=91 bits_offset=0
          '(anon)' type_id=153 bits_offset=64
  ...
  +[16561] FUNC 'alloc_rmp_segment_table' type_id=858 linkage=static
  -[16561] STRUCT 'static_key_mod' size=24 vlen=3
  ...
  +[41426] STRUCT 'static_key_mod' size=24 vlen=3
          'next' type_id=156 bits_offset=0
          'entries' type_id=155 bits_offset=64
          'mod' type_id=166 bits_offset=128
  ...
  -[41426] STRUCT 'ohci_hcd' size=1160 vlen=34

So there is some drift, is it coming from btf__add_btf()? This one isn't
used in pahole... Maybe this is something Alan addressed in his series
he pointed to me? Time to relook...

But, as explained in the cover letter of this series, the vmlinux.h
produced by 'bpftool btf dump file vmlinux format c' with/without this
series matches, it's just something that btf__add_btf() does that is
slightly different from what is done by pahole when converting from
DWARF to BTF, which doesn't use btf__add_btf() but converts each of the
tags: DWARF -> internal pahole representation -> libbpf -> BTF.

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Alcock <nick.alcock@oracle.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/bpf/btf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 73a6d94eeda125e1..df6810ad83ecff85 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1302,6 +1302,13 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
 		err = PTR_ERR(btf);
 		goto done;
 	}
+
+	if (btf__is_archive(btf)) {
+		err = btf__dedup_archive(btf, secs.btf_data->d_buf, secs.btf_data->d_size, NULL);
+		if (err)
+			goto done;
+	}
+
 	if (dist_base_btf && base_btf) {
 		err = btf__relocate(btf, base_btf);
 		if (err)
-- 
2.50.1



* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-07 18:25 [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
  2025-08-07 18:25 ` [PATCH 4/4] libbpf: Check if an ELF .BTF section is an archive and combine/dedup Arnaldo Carvalho de Melo
@ 2025-08-07 18:46 ` Arnaldo Carvalho de Melo
  2025-08-07 20:23 ` Arnaldo Carvalho de Melo
  2025-08-08  2:09 ` Alexei Starovoitov
  6 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-07 18:46 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Andrii Nakryiko, Alexei Starovoitov,
	Yonghong Song, Jose E. Marchesi, Nick Alcock, Namhyung Kim, bpf

On Thu, Aug 07, 2025 at 03:25:34PM -0300, Arnaldo Carvalho de Melo wrote:
> 	I've finally managed to act on some idea I shared with a few
> folks while in Montreal, namely using unmodified pahole to generate BTF
> for each .o right after it is produced, i.e. with this patch:

The patches ended up not flowing to bpf@vger.kernel.org, sry, but the
series is available at:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=btf_archive

Cheers,

- Arnaldo


* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-07 18:25 [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
  2025-08-07 18:46 ` [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
@ 2025-08-07 20:23 ` Arnaldo Carvalho de Melo
  2025-08-08  2:09 ` Alexei Starovoitov
  6 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-07 20:23 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, Alexei Starovoitov, Yonghong Song,
	Jose E. Marchesi, Nick Alcock, Jiri Olsa, Namhyung Kim,
	bpf@vger.kernel.org, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo

On Thu, Aug 07, 2025 at 03:25:34PM -0300, Arnaldo Carvalho de Melo wrote:
> If we use completely unmodified libbpf, bpftool, etc, the "BTF archive"
> in the resulting vmlinux .BTF ELF section is still consumable, but just
> the first "CU" (the first .o .BTF ELF section) is visible, the one for
> init/main.o:
 
> acme@number:~/git/linux$ bpftool version
> bpftool v7.5.0
> using libbpf v1.5
> features: llvm, skeletons
> acme@number:~/git/linux$
 
> acme@number:~/git/bpf-next$ bpftool btf dump file ~/vmlinux-v6.16.0+.btf_archive format c | wc -l
> 11361
> acme@number:~/git/linux$ bpftool btf dump file ../build/v6.16.0+/init/main.o format c | wc -l
> 11361
> acme@number:~/git/linux$
 
> Furthermore:

> acme@number:~/git/linux$ bpftool btf dump file ../build/v6.16.0+/init/main.o format c > a
> acme@number:~/git/linux$ bpftool btf dump file ~/vmlinux-v6.16.0+.btf_archive format c > b

Oops, the expected:

acme@number:~/git/linux$ diff a b
acme@number:~/git/linux$

> acme@number:~/git/linux$

- Arnaldo


* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-07 18:25 [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
  2025-08-07 20:23 ` Arnaldo Carvalho de Melo
@ 2025-08-08  2:09 ` Alexei Starovoitov
       [not found]   ` <CA+JHD92DODDESCfwiiCs_ZQ5bGesK5NC+xe5EvONF5g+-Bg+9Q@mail.gmail.com>
  2025-08-08 18:28   ` Eduard Zingerman
  6 siblings, 2 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2025-08-08  2:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alan Maguire, Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Andrii Nakryiko, Yonghong Song,
	Jose E. Marchesi, Nick Alcock, Namhyung Kim, bpf

On Thu, Aug 7, 2025 at 11:25 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
>
> This is complementary to today's series from Alan Maguire, as we can use
> the one liner for the kernel build process to test his series without
> requiring installing a toolchain that generates BTF for each .o file
> that will result in vmlinux.
>
> Next steps on my side are to:
>
> 1. change pahole for when it receives --format_path=btf check if
> btf__is_archive(btf) is true, then just replace the current vmlinux .BTF
> contents with the raw data in this just loaded BTF, short circuiting
> the whole process.
>
> 2. the kernel build process should be changed to allow one to ask for
> just BTF, not DWARF, and if so, using the above method, strip the DWARF
> info after using it to generate BTF.
>
> Then when compilers are producing BTF, we switch to that, falling back
> to the above method when a compiler is known to generate buggy BTF.
>
> And also to use in CIs, to compare the output generated by the various
> methods in the various components.
>
> 3. In 2 we can even use the same scheme we use for parallelizing DWARF
> loading when loading all the BTF archive members concatenated in vmlinux
> to dedup them.

Before you jump into 1,2,3 let's discuss the end goal.
I think the assumption here is that this btf-for-each-.o approach
is supposed to speed up the build, right ?
pahole step on vmlinux is noticeable, but it's still a fraction
of three vmlinux linking steps.
How much are we realistically thinking to shave off of that pahole dedup time?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
       [not found]   ` <CA+JHD92DODDESCfwiiCs_ZQ5bGesK5NC+xe5EvONF5g+-Bg+9Q@mail.gmail.com>
@ 2025-08-08  2:52     ` Alexei Starovoitov
  2025-08-08  3:25       ` Arnaldo Carvalho de Melo
  2025-08-08 15:15       ` Nick Alcock
  0 siblings, 2 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2025-08-08  2:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Arnaldo Carvalho de Melo, Alan Maguire, Jiri Olsa, Clark Williams,
	Kate Carcia, dwarves, Arnaldo Carvalho de Melo, Andrii Nakryiko,
	Yonghong Song, Jose E. Marchesi, Nick Alcock, Namhyung Kim, bpf

On Thu, Aug 7, 2025 at 7:36 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> On Thu, Aug 7, 2025, 11:09 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>
>> On Thu, Aug 7, 2025 at 11:25 AM Arnaldo Carvalho de Melo
>> <acme@kernel.org> wrote:
>> >
>> >
>> > This is complementary to today's series from Alan Maguire, as we can use
>> > the one liner for the kernel build process to test his series without
>> > requiring installing a toolchain that generates BTF for each .o file
>> > that will result in vmlinux.
>> >
>> > Next steps on my side are to:
>> >
>> > 1. change pahole for when it receives --format_path=btf check if
>> > btf__is_archive(btf) is true, then just replace the current vmlinux .BTF
>> > contents with the raw data in this just loaded BTF, short circuiting
>> > the whole process.
>> >
>> > 2. the kernel build process should be changed to allow one to ask for
>> > just BTF, not DWARF, and if so, using the above method, strip the DWARF
>> > info after using it to generate BTF.
>> >
>> > Then when compilers are producing BTF, we switch to that, falling back
>> > to the above method when a compiler is known to generate buggy BTF.
>> >
>> > And also to use in CIs, to compare the output generated by the various
>> > methods in the various components.
>> >
>> > 3. In 2 we can even use the same scheme we use for parallelizing DWARF
>> > loading when loading all the BTF archive members concatenated in vmlinux
>> > to dedup them.
>>
>> Before you jump into 1,2,3 let's discuss the end goal.
>> I think the assumption here is that this btf-for-each-.o approach
>> is supposed to speed up the build, right ?
>>
>> pahole step on vmlinux is noticeable, but it's still a fraction
>> of three vmlinux linking steps.
>
>
> I'll need to try thunderbird on the smartphone to send from the smartphone, having said that:
>
>
> I never looked at why we have those three linking steps, will try to educate myself about that.
>
>> How much are we realistically thinking to shave off of that pahole dedup time?
>
>
> Difficult to say, but given this comment I made:
>
> "Also an observation: for distros the optimal way to produce BTF _and_ DWARF seems to be the one we have now, don't bother generating .BTF for all .o, just generate DWARF and at the end generate BTF from it 8-)"
>
> I fear that most approaches to generate BTF for vmlinux by generating BTF by the compiler or pahole for every .o will only make the total vmlinux generation for the common case (distros) slower, not faster.

Yes. My gut feel is the same.

> Be it the compiler or pahole from DWARF, generating BTF _in addition to DWARF_ for each .o will double the space for the things being represented, as the major benefit from BTF is dedup, not per .o (it's more compact, but not by orders of magnitude as with dedup for the whole vmlinux).
>
> Option 3 may end up to be the best, i.e. generate BTF directly (compiler) or from DWARF (pahole) and immediately add it using btf__add_btf() via some BTF thread, _stripping_ it right away from the .o, to avoid doubling the disk space needed (DWARF+BTF per .o), and then, in the end, just dedup, having DWARF (if asked, which 99% of distros will do) and BTF, again, most distros will want (except things like raspberry pi distros, sigh).
>
> The same technique, BTW, could be used to reduce the build disk space needed for DWARF, if we can live with completely stripped .o files (no BTF, no DWARF) having it only (dedup'ed: BTF, or not: DWARF) after we harvest it for use in the final vmlinux.

I see where you're going, but disk space is cheap and modern
build systems have fast drives. Spinning rust is a thing of the past.
The total size of intermediate objects doesn't matter much.
Stripping DWARF won't reduce .o by a sizable amount, so I/O throughput
won't budge.

> But the changes in my series are so small that I think they merit consideration even so.

Agree with that as well, but I'm just not easy about "BTF archives" :)
The name is too ambitious. Concatenated BTF sections is fine,
but let's not make a big deal out of it.

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08  2:52     ` Alexei Starovoitov
@ 2025-08-08  3:25       ` Arnaldo Carvalho de Melo
  2025-08-08  3:33         ` Sam James
  2025-08-08 14:45         ` Nick Alcock
  2025-08-08 15:15       ` Nick Alcock
  1 sibling, 2 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-08  3:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, Andrii Nakryiko, bpf, Jiri Olsa, Namhyung Kim,
	Clark Williams, Yonghong Song, dwarves, Nick Alcock, Kate Carcia,
	Jose E. Marchesi

On August 7, 2025 11:52:51 PM GMT-03:00, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>On Thu, Aug 7, 2025 at 7:36 PM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:

>> On Thu, Aug 7, 2025, 11:09 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

>>> On Thu, Aug 7, 2025 at 11:25 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

>>> > This is complementary to today's series from Alan Maguire, as we can use
>>> > the one liner for the kernel build process to test his series without
>>> > requiring installing a toolchain that generates BTF for each .o file
>>> > that will result in vmlinux.

>>> > Next steps on my side are to:

>>> > 1. change pahole for when it receives --format_path=btf check if
>>> > btf__is_archive(btf) is true, then just replace the current vmlinux .BTF
>>> > contents with the raw data in this just loaded BTF, short circuiting
>>> > the whole process.

>>> > 2. the kernel build process should be changed to allow one to ask for
>>> > just BTF, not DWARF, and if so, using the above method, strip the DWARF
>>> > info after using it to generate BTF.

>>> > Then when compilers are producing BTF, we switch to that, falling back
>>> > to the above method when a compiler is known to generate buggy BTF.

>>> > And also to use in CIs, to compare the output generated by the various
>>> > methods in the various components.

>>> > 3. In 2 we can even use the same scheme we use for parallelizing DWARF
>>> > loading when loading all the BTF archive members concatenated in vmlinux
>>> > to dedup them.

>>> Before you jump into 1,2,3 let's discuss the end goal.
>>> I think the assumption here is that this btf-for-each-.o approach
>>> is supposed to speed up the build, right ?
>>>
>>> pahole step on vmlinux is noticeable, but it's still a fraction
>>> of three vmlinux linking steps.

>> I'll need to try thunderbird on the smartphone to send from the smartphone, having said that:

Done, easier than expected, let's see if this gets thru vger...

>> I never looked at why we have those three linking steps, will try to educate myself about that.

>>> How much are we realistically thinking to shave off of that pahole dedup time?

>> Difficult to say, but given this comment I made:

>> "Also an observation: for distros the optimal way to produce BTF _and_ DWARF seems to be the one we have now, don't bother generating .BTF for all .o, just generate DWARF and at the end generate BTF from it 8-)"

>> I fear that most approaches to generate BTF for vmlinux by generating BTF by the compiler or pahole for every .o will only make the total vmlinux generation for the common case (distros) slower, not faster.

>Yes. My gut feel is the same.

:-)

>> Be it the compiler or pahole from DWARF, generating BTF _in addition to DWARF_ for each .o will double the space for the things being represented, as the major benefit from BTF is dedup, not per .o (it's more compact, but not by orders of magnitude as with dedup for the whole vmlinux).

>> Option 3 may end up to be the best, i.e. generate BTF directly (compiler) or from DWARF (pahole) and immediately add it using btf__add_btf() via some BTF thread, _stripping_ it right away from the .o, to avoid doubling the disk space needed (DWARF+BTF per .o), and then, in the end, just dedup, having DWARF (if asked, which 99% of distros will do) and BTF, again, most distros will want (except things like raspberry pi distros, sigh).

>> The same technique, BTW, could be used to reduce the build disk space needed for DWARF, if we can live with completely stripped .o files (no BTF, no DWARF) having it only (dedup'ed: BTF, or not: DWARF) after we harvest it for use in the final vmlinux.

>I see where you're going, but disk space is cheap and modern
>build systems have fast drives. Spinning rust is a thing of the past.
>The total size of intermediate objects doesn't matter much.
>Stripping dwarf won't reduce .o by sizable amount, so I/O throughput
>won't budge.

I think this is worth measuring, to settle the question with numbers. I'll try to do it; I was already planning to.

>> But the changes in my series are so small that I think they merit consideration even so.

>Agree with that as well, but I'm just not easy about "BTF archives" :)
>The name is too ambitious. Concatenated BTF sections is fine,
>but let's not make a big deal out of it.

Well, other proposals being discussed would add more metadata to traverse these archives, I was just tagging along on the jargon being created :-)

It was just convenient that an unmodified linker was concatenating everything and that, from the existing BTF headers, I could use a preexisting libbpf API, btf__add_btf(), to merge everything and then use another preexisting API, btf__dedup(), to get to the same end result.
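
As a rough illustration of why the existing headers are enough to walk such a section: each member starts with a struct btf_header, whose hdr_len, type_off/type_len and str_off/str_len fields give its total size. The Python sketch below is not the code in this series. It assumes little-endian members laid out back to back with no padding, and make_member() and split_members() are hypothetical helper names. Each blob it returns is what would then be loaded with btf__new() and merged with btf__add_btf() before the final btf__dedup().

```python
import struct

BTF_MAGIC = 0xEB9F
# struct btf_header: magic(u16) version(u8) flags(u8) hdr_len(u32)
#                    type_off(u32) type_len(u32) str_off(u32) str_len(u32)
# Assumption: little-endian; real BTF can be either and is detected via magic.
BTF_HEADER = struct.Struct("<HBBIIIII")


def make_member(types, strings):
    """Build a toy BTF blob (hypothetical helper): header + types + strings."""
    hdr = BTF_HEADER.pack(BTF_MAGIC, 1, 0, BTF_HEADER.size,
                          0, len(types), len(types), len(strings))
    return hdr + types + strings


def split_members(section):
    """Split concatenated BTF blobs using only the per-member headers.

    Assumption: members are back to back, with no alignment padding.
    """
    members = []
    off = 0
    while off < len(section):
        if len(section) - off < BTF_HEADER.size:
            raise ValueError("truncated BTF header at offset %d" % off)
        (magic, _version, _flags, hdr_len,
         type_off, type_len, str_off, str_len) = BTF_HEADER.unpack_from(section, off)
        if magic != BTF_MAGIC:
            raise ValueError("bad BTF magic at offset %d" % off)
        # Offsets in the header are relative to the end of the header.
        size = hdr_len + max(type_off + type_len, str_off + str_len)
        if off + size > len(section):
            raise ValueError("BTF member at offset %d overruns the section" % off)
        members.append(section[off:off + size])
        off += size
    return members


if __name__ == "__main__":
    a = make_member(b"\x00" * 12, b"\x00int\x00")
    b = make_member(b"\x00" * 24, b"\x00foo\x00bar\x00")
    print(len(split_members(a + b)), "members found")
```

The walk stops only when the section is exhausted, which is exactly the property that makes a plain concatenation easy to produce with an unmodified linker.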

I don't see, so far, any other use for a "BTF archive", only as a happy intermediate step from a one line change to the kernel to get the linker to have the BTF "Compile Units" put together in the same order as the DWARF ones for the final merge+dedup.

- Arnaldo

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08  3:25       ` Arnaldo Carvalho de Melo
@ 2025-08-08  3:33         ` Sam James
  2025-08-08  3:54           ` Arnaldo Carvalho de Melo
  2025-08-08 14:45         ` Nick Alcock
  1 sibling, 1 reply; 21+ messages in thread
From: Sam James @ 2025-08-08  3:33 UTC (permalink / raw)
  To: arnaldo.melo
  Cc: alan.maguire, alexei.starovoitov, andrii, bpf, dwarves, jolsa,
	jose.marchesi, kcarcia, namhyung, nick.alcock, williams,
	yonghong.song

FWIW, as a source-based distro, we'd love to have BTF-only be quite
cheap, because right now, having DWARF makes it challenging for us to
enable it by default as users build on a range of different hardware and
the increased size of the unstripped vmlinux binary plus build-time
requirements doesn't make it worth it.

(Not every distro is building once and shipping to many and has the
luxury of stripping out components ;))

Thanks for working on this.

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08  3:33         ` Sam James
@ 2025-08-08  3:54           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-08  3:54 UTC (permalink / raw)
  To: Sam James
  Cc: alan.maguire, alexei.starovoitov, andrii, bpf, dwarves, jolsa,
	jose.marchesi, kcarcia, namhyung, nick.alcock, williams,
	yonghong.song, Guilherme Amadio



On August 8, 2025 12:33:14 AM GMT-03:00, Sam James <sam@gentoo.org> wrote:
>FWIW, as a source-based distro, we'd love to have BTF-only be quite
>cheap, because right now, having DWARF makes it challenging for us to
>enable it by default as users build on a range of different hardware and
>the increased size of the unstripped vmlinux binary plus build-time
>requirements doesn't make it worth it.
>
>(Not every distro is building once and shipping to many and has the
>luxury of stripping out components ;))

That is so cool, to have feedback from distros at this stage! 

Personally, I think this is interesting as a stepping stone towards allowing the selection of features that affect build time and the disk space used both at build time and in the resulting deliverables.

I think that a DWARF-less system is something desirable in some cases, so worth supporting.

With sframe, ORC, BTF and hardware alternatives, when available and usable, such as the various things called LBR, and Intel PT subsets, ditto for arm's coresight, etc, DWARF support can in some cases be disabled, and sometimes this may be wanted or useful, so should be an option.

>Thanks for working on this.

Glad you find it useful,

- Arnaldo

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08  3:25       ` Arnaldo Carvalho de Melo
  2025-08-08  3:33         ` Sam James
@ 2025-08-08 14:45         ` Nick Alcock
  1 sibling, 0 replies; 21+ messages in thread
From: Nick Alcock @ 2025-08-08 14:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexei Starovoitov, Alan Maguire, Andrii Nakryiko, bpf, Jiri Olsa,
	Namhyung Kim, Clark Williams, Yonghong Song, dwarves, Nick Alcock,
	Kate Carcia, Jose E. Marchesi

On 8 Aug 2025, Arnaldo Carvalho de Melo said:

> On August 7, 2025 11:52:51 PM GMT-03:00, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>On Thu, Aug 7, 2025 at 7:36 PM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
>
>>> On Thu, Aug 7, 2025, 11:09 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
>>Agree with that as well, but I'm just not easy about "BTF archives" :)
>>The name is too ambitious. Concatenated BTF sections is fine,
>>but let's not make a big deal out of it.
>
> Well, other proposals being discussed would add more metadata to
> traverse these archives, I was just tagging along on the jargon being
> created :-)

We don't actually need *much* more. I think concatenation is less than
ideal simply because it's hard to tell when to stop looking for more
archive members in a concatenated stream.

In the model Jose proposed (more or less the model split BTF is
basically already using), the first member is special, being the parent
and holding most of the shared types, vmlinux etc. Because it's special,
I think we want to be able to identify it even if you, say, take two
sections full of concatenated members and concatenate *them*. Just
relying on straight concat would have all the tools treating the second
vmlinux in that concatenated stream as if it were a module! If there was
a link field (or just a "stop here" bit), it could say "there are no
more members" reliably, or you could ask tools to hunt through
concatenated streams of BTF and tell you which ones in that stream look
like they're vmlinuxes (and all the non-vmlinuxes after those are
modules).

e.g. if you accidentally concatenated

vmlinux -> a -> b -> c

and

vmlinux -> d -> e -> f

You would get

vmlinux -> a -> b -> c -> vmlinux -> d -> e -> f

and it would be nice if the format could at least *tell* that the second
vmlinux *was* a vmlinux without relying on awful hacks like "oh it
contains basic integer or mm types, it must be vmlinux". We can do that
with a link field, or with one single bit saying "stop here", or with a
bit saying "this is the parent, start here". I don't mind which.

We could also do with a single field (long-existing in CTF, which calls
it "cuname") which lets you tell the source of types in different BTF
members. The first, the vmlinux/shared one, is easily identifiable, it's
first: but all the others need to be told apart somehow. Since each
corresponds to a module (in vmlinux) or a compilation unit containing
conflicted types (in userspace CTF), giving it *some* sort of optional
name field in the header seems necessary. I don't really mind what we
call the field: cuname, btf_name, member_name, file_name, anything.
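
To make the difference concrete, here is a small Python sketch of the two-archive example above. Member, is_parent and name are hypothetical stand-ins for whatever header bit and name field end up being chosen; nothing here is an existing BTF or libbpf interface.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str        # hypothetical optional name field ("cuname"-like)
    is_parent: bool  # hypothetical "this is the parent, start here" bit


def group_positional(stream):
    """Plain concatenation: the first member is the parent, all others are children."""
    return [[m.name for m in stream]] if stream else []


def group_by_parent_bit(stream):
    """With a parent bit, every flagged member starts a new group."""
    groups = []
    for m in stream:
        if m.is_parent or not groups:
            groups.append([])
        groups[-1].append(m.name)
    return groups


def archive(parent, *children):
    """Build one parent followed by its children (hypothetical helper)."""
    return [Member(parent, True)] + [Member(c, False) for c in children]


# vmlinux -> a -> b -> c, accidentally followed by vmlinux -> d -> e -> f
stream = archive("vmlinux", "a", "b", "c") + archive("vmlinux", "d", "e", "f")

if __name__ == "__main__":
    print(group_positional(stream))
    print(group_by_parent_bit(stream))
```

With the positional rule the second vmlinux lands in the same group as a, b and c, i.e. it is treated like a module. With the bit, the stream splits back into the two original archives.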

> It was just convenient that an unmodified linker was concatenating
> everything and that from the existing BTF headers I could use a
> preexisting libbpf API, btf__add_btf() merge everything to then use
> another preexisting API, btf__dedup() to get to the same end result.

Yeah.

> I don't see, so far, any other use for a "BTF archive", only as a
> happy intermediate step from a one line change to the kernel to get
> the linker to have the BTF "Compile Units" put together in the same
> order as the DWARF ones for the final merge+dedup.

We use them in userspace. (I think I can converge enough, without BTF
format changes beyond this one, to completely eliminate .ctf in
userspace and just let us use BTF everywhere: but as described in
https://lore.kernel.org/dwarves/87bjpkmak2.fsf@esperi.org.uk/, the BTF
we're using will usually be archives of BTF stuck into a single ELF
section, whether we use a link field or concatenation or some weird
archive format like I used to, it's going to be multiple BTFs-full in a
great many programs).

-- 
NULL && (void)

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08  2:52     ` Alexei Starovoitov
  2025-08-08  3:25       ` Arnaldo Carvalho de Melo
@ 2025-08-08 15:15       ` Nick Alcock
  1 sibling, 0 replies; 21+ messages in thread
From: Nick Alcock @ 2025-08-08 15:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo, Alan Maguire,
	Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Andrii Nakryiko, Yonghong Song,
	Jose E. Marchesi, Nick Alcock, Namhyung Kim, bpf

On 8 Aug 2025, Alexei Starovoitov stated:

> On Thu, Aug 7, 2025 at 7:36 PM Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
>> But the changes in my series are so small that I think they merit consideration even so.
>
> Agree with that as well, but I'm just not easy about "BTF archives" :)
> The name is too ambitious. Concatenated BTF sections is fine,
> but let's not make a big deal out of it.

Just a note about the name -- it's ultimately derived from a thing I
wrote a decade ago to make it easier to package up CTF in kernels
without people losing half of it. It was rather more complex (its
descendant can still be seen at the tip of
https://sourceware.org/git/binutils-gdb.git users/nalcock/road-to-ctfv4
but I expect to remove support for writing that format and move to
something simpler: read support will be kept).

So the name "archive" is already embedded in libctf type names, source
file names, and its public API, and there is code using the term out
there in the wild. It seems like a reasonable term to me -- I mean,
obviously it does, I coined it, but a bunch of concatenated things with
minimal further structure is called an archive when tar does it.

Fundamentally, just as pahole's deduplicator imposes meaning on the BTF
sections in vmlinux and modules, so too libctf's deduplicator imposes
meaning on a concatenated stream of archives ("the first is shared
stuff, the rest is not"), so we do need a way to talk about this entity
in some fashion, for those occasions when it is in use (internally in
the kernel build process, as the content of ELF sections in userspace).

We have to call it *something*, and if you do end up calling it
something other than an archive the existing uses that do call it an
archive aren't going to instantly go away, so now we have to deal with
*two* terms.

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08  2:09 ` Alexei Starovoitov
       [not found]   ` <CA+JHD92DODDESCfwiiCs_ZQ5bGesK5NC+xe5EvONF5g+-Bg+9Q@mail.gmail.com>
@ 2025-08-08 18:28   ` Eduard Zingerman
  2025-08-08 19:10     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 21+ messages in thread
From: Eduard Zingerman @ 2025-08-08 18:28 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, nick.alcock,
	Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Andrii Nakryiko, Yonghong Song,
	Jose E. Marchesi, Nick Alcock, Namhyung Kim, bpf

On Thu, 2025-08-07 at 19:09 -0700, Alexei Starovoitov wrote:

[...]

> Before you jump into 1,2,3 let's discuss the end goal.
> I think the assumption here is that this btf-for-each-.o approach
> is supposed to speed up the build, right ?
> pahole step on vmlinux is noticeable, but it's still a fraction
> of three vmlinux linking steps.
> How much are we realistically thinking to shave off of that pahole dedup time?

Hi Alan, Arnaldo, Nick,

I'd like to second Alexei's question.
In the cover letter Arnaldo points out that un-deduplicated BTF
amounts to 325Mb, while total DWARF size is 365Mb.
I tried measuring total amount of DWARF in my kernel building directory:

  for f in $(find . -name "*.o" | grep -Ev '(scripts|vmlinux|tools|module-common)'); do \
    readelf -SW $f | grep "\.debug";
  done \
  | awk 'BEGIN {val=0} {val += strtonum("0x"$6)} END {printf("%d", val)}' \
  | numfmt --to=si

And it says 845M.
The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
The total size of the generated binaries is 905Mb.
So, unless the above calculations are messed up, the total gain here is:
- save ~500Mb generated during build
- save some time on pahole not needing to parse/convert DWARF
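
For anyone who wants to cross-check the measurement, the readelf/awk pipeline above can be sketched in Python as below. This is only an illustration on a made-up sample, not a re-measurement; debug_bytes() and whitespace_field() are hypothetical helper names. One thing to watch: `readelf -SW` prints the section index as "[ 6]" for single-digit indices, so with plain whitespace splitting field 6 is the Size column only when the index has two or more digits, and the Off column otherwise. The sketch parses the index explicitly to avoid depending on that.

```python
import re

# Matches one section line of `readelf -SW`: [Nr] Name Type Address Off Size ...
SECTION_RE = re.compile(
    r"^\s*\[\s*(\d+)\]\s+(\S+)\s+(\S+)"
    r"\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s"
)

# Made-up sample in the layout printed by `readelf -SW`.
SAMPLE = """\
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000015 00  AX  0   0  1
  [ 6] .debug_info       PROGBITS        0000000000000000 000060 000100 00      0   0  1
  [12] .debug_line       PROGBITS        0000000000000000 000160 000040 00      0   0  1
"""


def debug_bytes(readelf_output):
    """Sum the Size column of every section whose name contains '.debug'.

    Like the grep in the pipeline, this also counts .rela.debug_* sections.
    """
    total = 0
    for line in readelf_output.splitlines():
        m = SECTION_RE.match(line)
        if m and ".debug" in m.group(2):
            total += int(m.group(6), 16)
    return total


def whitespace_field(line, n):
    """Return awk-style field $n (1-based) after whitespace splitting."""
    return line.split()[n - 1]


if __name__ == "__main__":
    print(debug_bytes(SAMPLE), "bytes of .debug* sections in the sample")
```

On the sample, field 6 of the "[ 6]" line is the offset while field 6 of the "[12]" line is the size, so totals from the awk version may be somewhat off for objects whose debug sections have single-digit indices.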

Is this what you are trying to achieve?

In theory, having BTF handled completely by compiler and linker makes
sense to me.  However, pahole is already here and it does the job.
So, I see several drawbacks:
- As you note, there would be two avenues to generate BTF now:
  - DWARF + pahole
  - BTF + pahole (replaced by BTF + ld at some point?)
  This is a potential source of bugs.
  Is the goal to forgo DWARF+pahole at some point in the future?
- I assume that it is much faster to land changes in pahole compared
  to changes in gcc, so future btf modifications/features might be a
  bit harder to execute. Wdyt?

Thanks,
Eduard

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08 18:28   ` Eduard Zingerman
@ 2025-08-08 19:10     ` Arnaldo Carvalho de Melo
  2025-08-08 20:15       ` Eduard Zingerman
  2025-08-21 21:35       ` Nick Alcock
  0 siblings, 2 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-08 19:10 UTC (permalink / raw)
  To: Eduard Zingerman, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	nick.alcock, Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Andrii Nakryiko, Yonghong Song,
	Jose E. Marchesi, Nick Alcock, Namhyung Kim, bpf



On August 8, 2025 3:28:13 PM GMT-03:00, Eduard Zingerman <eddyz87@gmail.com> wrote:
>On Thu, 2025-08-07 at 19:09 -0700, Alexei Starovoitov wrote:
>
>[...]
>
>> Before you jump into 1,2,3 let's discuss the end goal.
>> I think the assumption here is that this btf-for-each-.o approach
>> is supposed to speed up the build, right ?
>> pahole step on vmlinux is noticeable, but it's still a fraction
>> of three vmlinux linking steps.
>> How much are we realistically thinking to shave off of that pahole dedup time?
>
>Hi Alan, Arnaldo, Nick,
>
>I'd like to second Alexei's question.
>In the cover letter Arnaldo points out that un-deduplicated BTF
>amounts for 325Mb, while total DWARF size is 365Mb.
>I tried measuring total amount of DWARF in my kernel building directory:
>
>  for f in $(find . -name "*.o" | grep -Ev '(scripts|vmlinux|tools|module-common)'); do \
>    readelf -SW $f | grep "\.debug";
>  done \
>  | awk 'BEGIN {val=0} {val += strtonum("0x"$6)} END {printf("%d", val)}' \
>  | numfmt --to=si
>
>And it says 845M.
>The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
>The total size of the generated binaries is 905Mb.
>So, unless the above calculations are messed up, the total gain here is:
>- save ~500Mb generated during build
>- save some time on pahole not needing to parse/convert DWARF


Well, this 845M number includes modules, which I didn't take into account in my quick calculation for both DWARF and BTF.

>Is this is what you are trying to achieve?

>In theory, having BTF handled completely by compiler and linker makes
>sense to me.  

It looks right, no? But it's not efficient: as you point out in your next paragraph, BTF can be generated from DWARF, so it is better to do that as a final step if we want to have DWARF _and_ BTF.

> However, pahole is already here and it does the job.
>So, I see several drawbacks:
>- As you note, there would be two avenues to generate BTF now:
>  - DWARF + pahole
>  - BTF + pahole (replaced by BTF + ld at some point?)
>  This is a potential source of bugs.
>  Is the goal to forgo DWARF+pahole at some point in the future?

I think the goal is to allow DWARF-less builds, which can probably save time even if we do use pahole to convert DWARF generated by the compiler into BTF and strip the DWARF right away.

This is for use cases where DWARF isn't needed and we want to for example have CI systems running faster.

My initial interest was to make minimal changes to pave the way for BTF generated for vmlinux directly by the compiler, but the realization that DWARF still has a lot of mileage, meaning distros will continue to enable it for the foreseeable future, makes me think that maybe doing nothing and continuing to use the current method is the sensible thing to do.

>- I assume that it is much faster to land changes in pahole compared
>  to changes in gcc, so future btf modifications/features might be a
>  bit harder to execute. Wdyt?

Right, that too. Even if we enable generation of BTF for native .o files by the compiler, we would still want to use pahole to augment it with new features or to fix up compiler BTF generation bugs. And maybe for generating tags for which the necessary info is only available at the last moment.

So something that looked like a hack seems not to really be one.

Then there's Gentoo, the one that likes the idea of a DWARF-less build... I like that too, so will continue working on this 8-)

Now if we could have hooks in the linker associated with a given ELF section name (.BTF) to use instead of just concatenating, and then at the end have another hook that would finish the process by doing the dedup, just like I do in this series, that would save one of those linker calls.

I did some quick research and couldn't find such infrastructure in the linkers. I think this is a sensible path: use the minimal changes in my patch series to build a .so plugin for a linker that supports this. But then this, again, would make sense only for a BTF-only build.


- Arnaldo

* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08 19:10     ` Arnaldo Carvalho de Melo
@ 2025-08-08 20:15       ` Eduard Zingerman
  2025-08-08 20:59         ` Arnaldo Carvalho de Melo
  2025-08-21 21:35       ` Nick Alcock
  1 sibling, 1 reply; 21+ messages in thread
From: Eduard Zingerman @ 2025-08-08 20:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, nick.alcock, Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Andrii Nakryiko, Yonghong Song,
	Jose E. Marchesi, Namhyung Kim, bpf

On Fri, 2025-08-08 at 16:10 -0300, Arnaldo Carvalho de Melo wrote:

[...]

> > I'd like to second Alexei's question.
> > In the cover letter Arnaldo points out that un-deduplicated BTF
> > amounts for 325Mb, while total DWARF size is 365Mb.
> > I tried measuring total amount of DWARF in my kernel building directory:

[...]

> > And it says 845M.
> > The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
> > The total size of the generated binaries is 905Mb.
> > So, unless the above calculations are messed up, the total gain here is:
> > - save ~500Mb generated during build
> > - save some time on pahole not needing to parse/convert DWARF
> 
> Well, this 845M number includes modules, that I didn't take into
> account in my quick calculation for both DWARF and BTF.

Sorry about that. I have just a few in my config, for those about 6Mb
of DWARF is generated.

> > Is this is what you are trying to achieve?
> 
> > In theory, having BTF handled completely by compiler and linker makes
> > sense to me.  
> 
> It looks right, no? But it's not efficient as BTF, as you point out
> in your next paragraph, can be generated from DWARF, so better do it
> as a final step if we want to have DWARF _and_ BTF.

Idk, I'd stick to a single way of generating BTF, either using an old
scheme or a new scheme. Allowing both will add one more variable when
debugging BPF/BTF related issues reported from distros.

> > However, pahole is already here and it does the job.
> > So, I see several drawbacks:
> > - As you note, there would be two avenues to generate BTF now:
> >  - DWARF + pahole
> >  - BTF + pahole (replaced by BTF + ld at some point?)
> >  This is a potential source of bugs.
> >  Is the goal to forgo DWARF+pahole at some point in the future?
> 
> I think the goal is to allow DWARF less builds, which can probably
> save time even if we do use pahole to convert DWARF generated from
> the compiler into BTF and right away strip DWARF.
> 
> This is for use cases where DWARF isn't needed and we want to for
> example have CI systems running faster.

Ack, thank you for the clarification.

> My initial interest was to do minimal changes to pave the way for
> BTF generated for vmlinux directly from the compiler, but the
> realization that DWARF still has a lot of mileage, meaning distros
> will continue to enable it for the foreseeable future makes me think
> that maybe doing nothing and continue to use the current method is
> the sensible thing to do.
> 
> > - I assume that it is much faster to land changes in pahole compared
> >  to changes in gcc, so future btf modifications/features might be a
> >  bit harder to execute. Wdyt?
> 
> Right, that too, even if we enable generation of BTF for native .o
> files by the compiler we would still want to use pahole to augment
> it with new features or to fixup compiler BTF generation bugs. And
> maybe for generating tags that are only possible to have the
> necessary info at the last moment.
> 
> So something that looked like a hack seems not to really be one.

Agree.

> Then there's Gentoo, the one that likes the idea of a DWARF less
> build... I like that too, so will continue working on this 8-)

Out of curiosity, w/o DWARF how do you debug issues when something
goes wrong?

> Now if we could have hooks in the linker associated with a given ELF
> section name (.BTF) to use instead of just concatenating, and then
> at the end have another hook that would finish the process by doing
> the dedup, just like I do in this series, that would save one of
> those linker calls.
> 
> I did some quick research and couldn't find such infrastructure in
> the linkers, I think this is a sensible path, use the minimal
> changes in my patch series to have a .so plugin to use with a linker
> that supports this, but then this, again, would make sense only for
> a BTF only build.

LD documentation page mentions existence of plugins [1],
but after a cursory look at the source code I'm unable to tell how
easy/hard/possible is BTF modification from such a plugin.

[1] https://sourceware.org/binutils/docs/ld.html#Plugins


* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08 20:15       ` Eduard Zingerman
@ 2025-08-08 20:59         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-08-08 20:59 UTC (permalink / raw)
  To: Eduard Zingerman, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	nick.alcock, Alan Maguire
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, dwarves,
	Arnaldo Carvalho de Melo, Andrii Nakryiko, Yonghong Song,
	Jose E. Marchesi, Namhyung Kim, bpf



On August 8, 2025 5:15:34 PM GMT-03:00, Eduard Zingerman <eddyz87@gmail.com> wrote:
>On Fri, 2025-08-08 at 16:10 -0300, Arnaldo Carvalho de Melo wrote:
>
>[...]
>
>> > I'd like to second Alexei's question.
>> > In the cover letter Arnaldo points out that un-deduplicated BTF
>> > amounts for 325Mb, while total DWARF size is 365Mb.
>> > I tried measuring total amount of DWARF in my kernel building directory:
>
>[...]
>
>> > And it says 845M.
>> > The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
>> > The total size of the generated binaries is 905Mb.
>> > So, unless the above calculations are messed up, the total gain here is:
>> > - save ~500Mb generated during build
>> > - save some time on pahole not needing to parse/convert DWARF
>> 
>> Well, this 845M number includes modules, that I didn't take into
>> account in my quick calculation for both DWARF and BTF.
>
>Sorry about that. I have just a few in my config, for those about 6Mb
>of DWARF is generated.

Initial numbers, I'll try and have some more comprehensive way to collect the relevant numbers and be able to compare approaches.

>> > Is this what you are trying to achieve?
>> 
>> > In theory, having BTF handled completely by compiler and linker makes
>> > sense to me.  
>> 
>> It looks right, no? But it's not efficient as BTF, as you point out
>> in your next paragraph, can be generated from DWARF, so better do it
>> as a final step if we want to have DWARF _and_ BTF.

>Idk, I'd stick to a single way of generating BTF, either using an old
>scheme or a new scheme. Allowing both will add one more variable when
>debugging BPF/BTF related issues reported from distros.

Well, I understand the push to pool scarce developer resources to get one way of doing things, be it using pahole or the tool chain (compiler + linker).

But at the same time having multiple ways to do the same thing, like we have with multiple compilers and linkers is a good thing (tm).

With multiple ways, developed mostly independently in ways some camps think hackish, we can give the people working in CI systems more job sekurity, build in many ways and see if differences are bugs, i.e. we want reliable info for co-re, etc, so having multiple producers and continuously comparing their results seems desirable.

Sure, we should see what's the fastest, most reliable by track record, cheapest way to produce both DWARF and BTF and use it.

Right now, among the schemes being discussed, it's what we have in place. Good.

>> > However, pahole is already here and it does the job.
>> > So, I see several drawbacks:
>> > - As you note, there would be two avenues to generate BTF now:
>> >  - DWARF + pahole
>> >  - BTF + pahole (replaced by BTF + ld at some point?)
>> >  This is a potential source of bugs.
>> >  Is the goal to forgo DWARF+pahole at some point in the future?

>> I think the goal is to allow DWARF less builds, which can probably
>> save time even if we do use pahole to convert DWARF generated from
>> the compiler into BTF and right away strip DWARF.

>> This is for use cases where DWARF isn't needed and we want to for
>> example have CI systems running faster.

>Ack, thank you for clarification.

>> My initial interest was to do minimal changes to pave the way for
>> BTF generated for vmlinux directly from the compiler, but the
>> realization that DWARF still has a lot of mileage, meaning distros
>> will continue to enable it for the foreseeable future makes me think
>> that maybe doing nothing and continue to use the current method is
>> the sensible thing to do.

>> > - I assume that it is much faster to land changes in pahole compared
>> >  to changes in gcc, so future btf modifications/features might be a
>> >  bit harder to execute. Wdyt?

>> Right, that too, even if we enable generation of BTF for native .o
>> files by the compiler we would still want to use pahole to augment
>> it with new features or to fixup compiler BTF generation bugs. And
>> maybe for generating tags that are only possible to have the
>> necessary info at the last moment.

>> So something that looked like a hack seems not to really be one.

>Agree.

>> Then there's Gentoo, the one that likes the idea of a DWARF less
>> build... I like that too, so will continue working on this 8-)
>
>Out of curiosity, w/o DWARF how do you debug issues when something
>goes wrong?

Well, modern tooling supports BTF when debugging/tracing/etc the _kernel_, see drgn, perf, and now even ftrace. Look ma, no DWARF :-)

>> Now if we could have hooks in the linker associated with a given ELF
>> section name (.BTF) to use instead of just concatenating, and then
>> at the end have another hook that would finish the process by doing
>> the dedup, just like I do in this series, that would save one of
>> those linker calls.
>> 
>> I did some quick research and couldn't find such infrastructure in
>> the linkers, I think this is a sensible path, use the minimal
>> changes in my patch series to have a .so plugin to use with a linker
>> that supports this, but then this, again, would make sense only for
>> a BTF only build.

>LD documentation page mentions existence of plugins [1],
>but after a cursory look at the source code I'm unable to tell how
>easy/hard/possible is BTF modification from such a plugin.

Yeah, linking looked like something we were done with, no need for this kind of extensibility. Nope, we need it now.

>
>[1] https://sourceware.org/binutils/docs/ld.html#Plugins

- Arnaldo


* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-08 19:10     ` Arnaldo Carvalho de Melo
  2025-08-08 20:15       ` Eduard Zingerman
@ 2025-08-21 21:35       ` Nick Alcock
  2025-08-27  0:14         ` Alexei Starovoitov
  1 sibling, 1 reply; 21+ messages in thread
From: Nick Alcock @ 2025-08-21 21:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Eduard Zingerman, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	nick.alcock, Alan Maguire, Jiri Olsa, Clark Williams, Kate Carcia,
	dwarves, Arnaldo Carvalho de Melo, Andrii Nakryiko, Yonghong Song,
	Jose E. Marchesi, Namhyung Kim, bpf

On 8 Aug 2025, Arnaldo Carvalho de Melo told this:

> On August 8, 2025 3:28:13 PM GMT-03:00, Eduard Zingerman <eddyz87@gmail.com> wrote:
>>On Thu, 2025-08-07 at 19:09 -0700, Alexei Starovoitov wrote:
>>
>>> Before you jump into 1,2,3 let's discuss the end goal.
>>> I think the assumption here is that this btf-for-each-.o approach
>>> is supposed to speed up the build, right ?

Generating BTF directly in the compiler certainly does, in situations
where we can avoid DWARF. We reduce the amount of data written out by
something like 11GiB (!) in my tests.

>>I'd like to second Alexei's question.
>>In the cover letter Arnaldo points out that un-deduplicated BTF
>>amounts for 325Mb, while total DWARF size is 365Mb.

That very much depends on the kernels you build. In my tests of
enterprise kernels (including modules) with the GCC+btfarchive toolchain
(not feeding it to pahole yet), I found total DWARF of 11.2GiB,
undeduplicated BTF of 550MiB (counting raw .o compiler output alone),
and a final deduplicated BTF size (including all modules) of about 38MiB
(which I'm sure I can reduce).

>>The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
>>The total size of the generated binaries is 905Mb.
>>So, unless the above calculations are messed up, the total gain here is:
>>- save ~500Mb generated during build

For me, 11GiB :)

>>- save some time on pahole not needing to parse/convert DWARF

In my tests, a *lot*. I think Arnaldo has recently improved this, but
back in April when I was comparing things, I had to kill pahole when it
was dedupping an allmodconfig kernel-plus-modules because it ate more
than 70GiB of RAM and was still chewing on all 20 cores of my machine
after two hours. btfdedup (which uses the libctf deduplicator used by
GNU ld), despite being single-threaded and doing things like ambiguous
type detection as well, used 12GiB and took 19 minutes. (Multithreading
it is in progress, too). allyesconfig is faster. Anything sane is faster
yet. Enterprise kernels take about four minutes, which is not too
different from pahole.

I was shocked by this: I thought libctf would be slower than pahole, and
instead it turned out to be faster, sometimes much faster. I suspect
much of this frankly ridiculous difference was DWARF conversion, and so
would be improved by doing it in parallel (as here), but... still. Not
having to generate and consume all that DWARF is bound to help! It's
like 95% less work...

>>So, I see several drawbacks:
>>- As you note, there would be two avenues to generate BTF now:
>>  - DWARF + pahole
>>  - BTF + pahole (replaced by BTF + ld at some point?)

The code exists... BTF + ld + dedupping the resulting ld-dedupped output
together.

Note that the code used to deduplicate BTF with libctf (as used by ld)
is not large. Look:
https://github.com/nickalcock/linux/blob/nix/btfa/scripts/btf/btfarchive.c
(and of those functions, you don't need transform_module_names(),
suck_in_modules(), or suck_in_lines(): it's really no more code than is
needed to tell it which inputs map to which modules, then a couple of
lines to trigger dedup and emit the resulting BTF archive).

It's entirely reasonable for pahole in future to simply call libctf's
deduplicator to dedup BTF if it sees that the linker hasn't done it, or
to do what btfarchive does here itself to dedup the linker-deduplicated
per-module output and the vmlinux BTF against each other (and then we
don't need btfarchive at all, which means fewer build system changes).

This would let pahole dedup BTF if needed while not wasting time on it
if the linker already did it, *and* let you ditch the pahole
deduplicator so you don't need to maintain it any more, even when clang
et al are being used. (Obviously, you'd only do this once libctf's dedup
is up to scratch and once it's in a release binutils, since I'm sure
there will be bugs I need to fix!)

>>  This is a potential source of bugs.

That's not a very good argument. *Everything* is a potential source of
bugs. I will of course prioritize fixing any bugs in libctf that affect
pahole's operation: not breaking pahole matters!

>>  Is the goal to forgo DWARF+pahole at some point in the future?
>
> I think the goal is to allow DWARF less builds, which can probably save time even if we do use pahole to convert DWARF generated from the compiler into BTF and right away strip DWARF.
>
> This is for use cases where DWARF isn't needed and we want to for example have CI systems running faster.

Yep! Also this means that you can get new features like type and decl
tags into BTF faster, because it's much quicker to get them into GCC and
libctf (at least for recent compiler releases) than it is to get them
into DWARF just so you can get them out of DWARF again and translate
them into BTF. DWARF simply has many more consumers to think about,
while the kernel is obviously a critical consumer of GCC's and libctf's
generated BTF (we do need to consider userspace, but we don't need to be
as conservative as a giant behemoth like DWARF must be. I'm confident
enough in my testing to be willing to backport things to binutils
release branches as needed, though probably not to points before the
first release where BTF support is added to libctf because that change
is pretty massive.)

> My initial interest was to do minimal changes to pave the way for BTF
> generated for vmlinux directly from the compiler, but the realization
> that DWARF still has a lot of mileage, meaning distros will continue
> to enable it for the foreseeable future makes me think that maybe
> doing nothing and continue to use the current method is the sensible
> thing to do.

Speaking purely selfishly, I would be... unhappy to find that I'd spent
all this effort on a BTF-capable deduplicator only to find you didn't
want to use it no matter how good it ended up being :( this seems like a
rather sudden change of heart...

>>- I assume that it is much faster to land changes in pahole compared
>>  to changes in gcc, so future btf modifications/features might be a
>>  bit harder to execute. Wdyt?

As noted, I think this is not really true, at least once the core BTF
dedup stuff has landed: I can backport stuff on top of them without
doing releases, and distros usually pick it up within a few days. The
principal delay is testing...

> Right, that too, even if we enable generation of BTF for native .o
> files by the compiler we would still want to use pahole to augment it
> with new features or to fixup compiler BTF generation bugs. And maybe
> for generating tags that are only possible to have the necessary info
> at the last moment.

Well, yes. I thought it was always the plan for pahole to keep consuming
and augmenting BTF! Among other things, the kernel uses a bunch of
additional sections that reference BTF types that GNU ld has no idea how
to generate, and which nobody is planning to use outside the kernel.
That's also where a lot of the innovation is happening, and GCC and GNU
ld don't need to get involved in that at all (unless and until you want
them to).

I can say that changing libctf to support *every difference from CTF
that BTF has got* and teaching GNU ld to handle that took about two
months, so implementing single changes in future doesn't seem like an
insurmountable burden (and much of that two months was spent on
infrastructural adjustments to allow easier changes in future -- the
hardest single BTF feature to support was probably datasecs and vars,
and that took about a week including deduplication). Obviously there
will be bugs, but when they show up I'll fix them.

I am not worried about the maintenance burden of supporting new BTF
stuff in binutils libctf and I don't think Jose is worried about it in
GCC either.

I mean, it's not like it's going to be an extra burden for long: the
medium-term goal is to replace CTF with BTF entirely, even for userspace
consumption. There are surprisingly few new features needed before we
can consign CTF to history and converge on one type format to rule them
all. (I think they're all entirely nondisruptive too.)

> Now if we could have hooks in the linker associated with a given ELF
> section name (.BTF) to use instead of just concatenating, and then at
> the end have another hook that would finish the process by doing the
> dedup, just like I do in this series, that would save one of those
> linker calls.

Yeah, we looked at that, but GNU ld's plugin support is totally focused
on the needs of LTO and can't really handle what dedup needs at all:
fixing that would likely be a substantial and fiddly change. As part of
the CTF and BTF work there *are* internal hooks in ld and libbfd that do
what is needed, but they're not exported outside the linker, and
exporting them looks to be... painful. (But it seems unnecessary for GNU
ld, since it will after all be able to dedup BTF with no plugins at all,
and already can in my proof-of-concept branch on binutils-gdb git.)
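
To make the "just concatenating" behaviour discussed above concrete, here
is a minimal sketch of how a tool could split a concatenated .BTF section
back into its per-object blobs before deduplication. It relies only on the
public BTF header layout (magic 0xeB9F, hdr_len, and the type/string
offsets and lengths, which are relative to the end of the header). The
names split_btf_archive() and make_blob() are hypothetical, and the sketch
assumes little-endian data with no linker padding between blobs. It is not
the code from this series nor from libctf.

```python
import struct

BTF_MAGIC = 0xEB9F
# struct btf_header: magic (u16), version (u8), flags (u8), then
# hdr_len, type_off, type_len, str_off, str_len (u32 each).
# Assumption: little-endian input.
BTF_HEADER = struct.Struct("<HBBIIIII")


def split_btf_archive(section):
    """Split concatenated BTF blobs into a list of individual blobs.

    Hypothetical helper: assumes the blobs sit back to back with no
    padding in between.
    """
    blobs = []
    pos = 0
    while pos < len(section):
        if len(section) - pos < BTF_HEADER.size:
            raise ValueError("trailing bytes too short for a BTF header")
        (magic, _version, _flags, hdr_len,
         type_off, type_len, str_off, str_len) = BTF_HEADER.unpack_from(section, pos)
        if magic != BTF_MAGIC:
            raise ValueError("bad BTF magic at offset %d" % pos)
        if hdr_len < BTF_HEADER.size:
            raise ValueError("BTF header too short at offset %d" % pos)
        # Offsets are relative to the end of the header, so the blob ends
        # where the furthest of the two sections ends.
        size = hdr_len + max(type_off + type_len, str_off + str_len)
        if pos + size > len(section):
            raise ValueError("truncated BTF blob at offset %d" % pos)
        blobs.append(section[pos:pos + size])
        pos += size
    return blobs


def make_blob(types, strs):
    """Build a synthetic blob for demonstration (payload is not valid BTF)."""
    hdr = BTF_HEADER.pack(BTF_MAGIC, 1, 0, BTF_HEADER.size,
                          0, len(types), len(types), len(strs))
    return hdr + types + strs


first = make_blob(b"\x00" * 12, b"\x00int\x00")
second = make_blob(b"", b"\x00")
parts = split_btf_archive(first + second)
print(len(parts), [len(p) for p in parts])
```

Each returned blob could then be handed to whichever deduplicator is in
use. Real input would additionally need endianness detection and handling
of any alignment padding the linker inserts.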

-- 
NULL && (void)


* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-21 21:35       ` Nick Alcock
@ 2025-08-27  0:14         ` Alexei Starovoitov
  2025-09-15 10:11           ` Nick Alcock
  0 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2025-08-27  0:14 UTC (permalink / raw)
  To: Nick Alcock
  Cc: Arnaldo Carvalho de Melo, Eduard Zingerman,
	Arnaldo Carvalho de Melo, Alan Maguire, Jiri Olsa, Clark Williams,
	Kate Carcia, dwarves, Arnaldo Carvalho de Melo, Andrii Nakryiko,
	Yonghong Song, Jose E. Marchesi, Namhyung Kim, bpf

On Thu, Aug 21, 2025 at 2:35 PM Nick Alcock <nick.alcock@oracle.com> wrote:
>
> On 8 Aug 2025, Arnaldo Carvalho de Melo told this:
>
> > On August 8, 2025 3:28:13 PM GMT-03:00, Eduard Zingerman <eddyz87@gmail.com> wrote:
> >>On Thu, 2025-08-07 at 19:09 -0700, Alexei Starovoitov wrote:
> >>
> >>> Before you jump into 1,2,3 let's discuss the end goal.
> >>> I think the assumption here is that this btf-for-each-.o approach
> >>> is supposed to speed up the build, right ?
>
> Generating BTF directly in the compiler certainly does, in situations
> where we can avoid DWARF. We reduce the amount of data written out by
> something like 11GiB (!) in my tests.
>
> >>I'd like to second Alexei's question.
> >>In the cover letter Arnaldo points out that un-deduplicated BTF
> >>amounts for 325Mb, while total DWARF size is 365Mb.
>
> That very much depends on the kernels you build. In my tests of
> enterprise kernels (including modules) with the GCC+btfarchive toolchain
> (not feeding it to pahole yet), I found total DWARF of 11.2GiB,
> undeduplicated BTF of 550MiB (counting raw .o compiler output alone),
> and a final deduplicated BTF size (including all modules) of about 38MiB
> (which I'm sure I can reduce).

11.2G doesn't match Arnaldo's 365Mb.
Frankly I've never seen such huge dwarf objects.
I'm guessing you're using some ultra verbose dwarf compilation
mode. If so, it's not a realistic comparison, since typical
kernel build is what Arnaldo reported.
That's what I observe as well.

> >>The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
> >>The total size of the generated binaries is 905Mb.
> >>So, unless the above calculations are messed up, the total gain here is:
> >>- save ~500Mb generated during build
>
> For me, 11GiB :)
>
> >>- save some time on pahole not needing to parse/convert DWARF
>
> In my tests, a *lot*. I think Arnaldo has recently improved this, but
> back in April when I was comparing things, I had to kill pahole when it
> was dedupping an allmodconfig kernel-plus-modules because it ate more
> than 70GiB of RAM and was still chewing on all 20 cores of my machine
> after two hours. btfdedup (which uses the libctf deduplicator used by
> GNU ld), despite being single-threaded and doing things like ambiguous
> type detection as well, used 12GiB and took 19 minutes. (Multithreading
> it is in progress, too). allyesconfig is faster. Anything sane is faster
> yet. Enterprise kernels take about four minutes, which is not too
> different from pahole.
>
> I was shocked by this: I thought libctf would be slower than pahole, and
> instead it turned out to be faster, sometimes much faster. I suspect
> much of this frankly ridiculous difference was DWARF conversion, and so
> would be improved by doing it in parallel (as here), but... still. Not
> having to generate and consume all that DWARF is bound to help! It's
> like 95% less work...

Something doesn't add up here.
Everyone is using pahole and lots of people are doing allmodconfig builds
with pahole. No one reported that pahole consumes 70G and runs for hours.
Something is really not right in your setup.
I suspect the root cause is your 11G size of dwarf.
Pls use typical kernel build configs, then we can have an apples-to-apples
comparison and reason about libctf pros/cons.


* Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain
  2025-08-27  0:14         ` Alexei Starovoitov
@ 2025-09-15 10:11           ` Nick Alcock
  0 siblings, 0 replies; 21+ messages in thread
From: Nick Alcock @ 2025-09-15 10:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Nick Alcock, Arnaldo Carvalho de Melo, Eduard Zingerman,
	Arnaldo Carvalho de Melo, Alan Maguire, Jiri Olsa, Clark Williams,
	Kate Carcia, dwarves, Arnaldo Carvalho de Melo, Andrii Nakryiko,
	Yonghong Song, Jose E. Marchesi, Namhyung Kim, bpf

On 27 Aug 2025, Alexei Starovoitov stated:

> On Thu, Aug 21, 2025 at 2:35 PM Nick Alcock <nick.alcock@oracle.com> wrote:
>>
>> >>I'd like to second Alexei's question.
>> >>In the cover letter Arnaldo points out that un-deduplicated BTF
>> >>amounts for 325Mb, while total DWARF size is 365Mb.
>>
>> That very much depends on the kernels you build. In my tests of
>> enterprise kernels (including modules) with the GCC+btfarchive toolchain
>> (not feeding it to pahole yet), I found total DWARF of 11.2GiB,
>> undeduplicated BTF of 550MiB (counting raw .o compiler output alone),
>> and a final deduplicated BTF size (including all modules) of about 38MiB
>> (which I'm sure I can reduce).
>
> 11.2G doesn't match Arnaldo's 365Mb.
> Frankly I've never seen such huge dwarf objects.

I have, but... it was a while back. I shouldn't have worked from memory.

Regenerating with a more recent toolchain, summing up all written
section sizes (so, undeduplicated .BTF compiler output *and* all the
deduplicated module intermediate links) I usually see DWARF sizes about
two to three times that of the .BTF (e.g. for the BTF selftest, DWARF is
about 800MiB versus about 400MiB of BTF; the final BTF size from both
btfarchive and pahole is on the order of 2MiB).

Using a random enterprise kernel config (so 2900+ modules, etc), I see
4072236343 bytes of DWARF, 2199803264 bytes of undeduplicated .BTF
sections: so, again, about 50% reduction.
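
As a quick sanity check of the two byte counts just quoted, the arithmetic
can be sketched as below. The variable names and the reduction() helper
are illustrative only; the two figures are the ones given in the
paragraph above.

```python
# Byte counts quoted above for a random enterprise kernel config.
dwarf_bytes = 4072236343
btf_bytes = 2199803264

MIB = 1024 * 1024


def reduction(before, after):
    """Fraction of bytes saved when going from `before` to `after`."""
    return 1.0 - after / before


saved = reduction(dwarf_bytes, btf_bytes)
print("DWARF: %.0f MiB" % (dwarf_bytes / MIB))
print("BTF:   %.0f MiB" % (btf_bytes / MIB))
print("saved: %.1f%%" % (saved * 100))
```

The saving works out to a little under half, which is consistent with the
"about 50%" estimate.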

(toolchain-level dedup on this one takes two minutes and peaks at 5GiB
memory usage, producing a 40MiB BTF archive: I know this output can be
greatly reduced by a fix I'm planning shortly. :) )

> I'm guessing you're using some ultra verbose dwarf compilation
> mode. If so, it's not a realistic comparison, since typical
> kernel build is what Arnaldo reported.
> That's what I observe as well.
>
>> >>The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
>> >>The total size of the generated binaries is 905Mb.

Ditto, now. I don't know what weirdo config I was using before (I suspect
it was just an older GCC with a different default DWARF version, and
this is simply DWARF 2/3 versus 5). It's still a nontrivial saving.

>> GNU ld), despite being single-threaded and doing things like ambiguous
>> type detection as well, used 12GiB and took 19 minutes. (Multithreading
>> it is in progress, too). allyesconfig is faster. Anything sane is faster
>> yet. Enterprise kernels take about four minutes, which is not too
>> different from pahole.
>>
>> I was shocked by this: I thought libctf would be slower than pahole, and
>> instead it turned out to be faster, sometimes much faster. I suspect
>> much of this frankly ridiculous difference was DWARF conversion, and so
>> would be improved by doing it in parallel (as here), but... still. Not
>> having to generate and consume all that DWARF is bound to help! It's
>> like 95% less work...
>
> Something doesn't add up here.
> Everyone is using pahole and lots of people are doing allmodconfig builds
> with pahole. No one reported that pahole consumes 70G and runs for hours.
> Something is really not right in your setup.

Well... yeah, that would be the make allmodconfig / allyesconfig
configuration options. pahole takes more reasonable times with more
reasonable configurations, but still ten minutes or more is fairly
routine for me.

> Pls use typical kernel build configs, then we can have an apples-to-apples
> comparison and reason about libctf pros/cons.

I'm not sure there is such a thing as typical, really. I hope random
enterprise configs will do, but they probably have more modules than
"normal" and God knows the BTF test configs have fewer :)

-- 
NULL && (void)


Thread overview: 21+ messages
2025-08-07 18:25 [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
2025-08-07 18:25 ` [PATCH 1/4] libbpf: Simplify error handling removing needless repeated err checks Arnaldo Carvalho de Melo
2025-08-07 18:25 ` [PATCH 2/4] libbpf: Check if there is extra data at the end of a BTF Arnaldo Carvalho de Melo
2025-08-07 18:25 ` [PATCH 3/4] libbpf: Add support for detecting and dedup'ing a BTF archive Arnaldo Carvalho de Melo
2025-08-07 18:25 ` [PATCH 4/4] libbpf: Check if an ELF .BTF section is an archive and combine/dedup Arnaldo Carvalho de Melo
2025-08-07 18:46 ` [RFC 0/4] BTF archive with unmodified pahole+toolchain Arnaldo Carvalho de Melo
2025-08-07 20:23 ` Arnaldo Carvalho de Melo
2025-08-08  2:09 ` Alexei Starovoitov
     [not found]   ` <CA+JHD92DODDESCfwiiCs_ZQ5bGesK5NC+xe5EvONF5g+-Bg+9Q@mail.gmail.com>
2025-08-08  2:52     ` Alexei Starovoitov
2025-08-08  3:25       ` Arnaldo Carvalho de Melo
2025-08-08  3:33         ` Sam James
2025-08-08  3:54           ` Arnaldo Carvalho de Melo
2025-08-08 14:45         ` Nick Alcock
2025-08-08 15:15       ` Nick Alcock
2025-08-08 18:28   ` Eduard Zingerman
2025-08-08 19:10     ` Arnaldo Carvalho de Melo
2025-08-08 20:15       ` Eduard Zingerman
2025-08-08 20:59         ` Arnaldo Carvalho de Melo
2025-08-21 21:35       ` Nick Alcock
2025-08-27  0:14         ` Alexei Starovoitov
2025-09-15 10:11           ` Nick Alcock
