* Re: [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings
[not found] ` <174915723301.3244853.343931856692302765.git-patchwork-notify@kernel.org>
@ 2025-08-30 5:19 ` Ian Rogers
2025-09-01 4:17 ` Yonghong Song
0 siblings, 1 reply; 7+ messages in thread
From: Ian Rogers @ 2025-08-30 5:19 UTC (permalink / raw)
To: Blake Jones, namhyung, Arnaldo Carvalho de Melo
Cc: ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, mykolal, shuah,
ihor.solodrai, bpf, linux-kernel, linux-kselftest,
linux-perf-users, Howard Chu
On Thu, Jun 5, 2025 at 2:00 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
>
> Hello:
>
> This series was applied to bpf/bpf-next.git (master)
> by Andrii Nakryiko <andrii@kernel.org>:
>
> On Tue, 3 Jun 2025 13:37:00 -0700 you wrote:
> > The BTF dumper code currently displays arrays of characters as just that -
> > arrays, with each character formatted individually. Sometimes this is what
> > makes sense, but it's nice to be able to treat that array as a string.
> >
> > This change adds a special case to the btf_dump functionality to allow
> > 0-terminated arrays of single-byte integer values to be printed as
> > character strings. Characters for which isprint() returns false are
> > printed as hex-escaped values. This is enabled when the new ".emit_strings"
> > is set to 1 in the btf_dump_type_data_opts structure.
> >
> > [...]
>
> Here is the summary with links:
> - [v3,1/2] libbpf: add support for printing BTF character arrays as strings
> https://git.kernel.org/bpf/bpf-next/c/87c9c79a02b4
> - [v3,2/2] Tests for the ".emit_strings" functionality in the BTF dumper.
> https://git.kernel.org/bpf/bpf-next/c/a570f386f3d1
>
> You are awesome, thank you!
I believe this patch is responsible for segvs occurring in v6.17 in
various perf tests when the perf tests run in parallel. There's lots
of BPF things happening in parallel in the test but the failures are
happening in a shell and I did get to attach a debugger. I've not seen
this problem earlier as the patches weren't in the perf-tools-next
tree. Through bisection I was able to blame the patch and I came up
with this minimal fix:
```
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index ccfd905f03df..71e198b30c5f 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -326,10 +326,10 @@ struct btf_dump_type_data_opts {
bool compact; /* no newlines/indentation */
bool skip_names; /* skip member/type names */
bool emit_zeroes; /* show 0-valued fields */
- bool emit_strings; /* print char arrays as strings */
+ //bool emit_strings; /* print char arrays as strings */
size_t :0;
};
-#define btf_dump_type_data_opts__last_field emit_strings
+#define btf_dump_type_data_opts__last_field emit_zeroes
LIBBPF_API int
btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index f09f25eccf3c..c7b5a376642f 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -2599,7 +2599,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
d->typed_dump->compact = OPTS_GET(opts, compact, false);
d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false);
d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false);
- d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false);
+ d->typed_dump->emit_strings = true; // OPTS_GET(opts, emit_strings, false);
ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0);
```
So I think the problem relates to modifying struct
btf_dump_type_data_opts. Given I'm statically linking libbpf into perf
I'm not sure on the exact route of the segv, no doubt this report will
be enough for someone else to figure it out.
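For background on why growing that struct is delicate: libbpf-style opts structs gate every field read on a caller-supplied size, so the library must fall back to a default for any field the caller's binary never knew about, instead of reading stale stack bytes. Below is a simplified sketch with made-up names (the real machinery is DECLARE_LIBBPF_OPTS/OPTS_GET and the opts__last_field macros, and real callers set .sz = sizeof(opts)); it is an illustration of the handshake, not libbpf's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mirror of libbpf's opts pattern. The caller
 * records in .sz how big the struct was when *it* was compiled. */
struct demo_opts {
	size_t sz;         /* set by the caller to its view of the size */
	bool emit_zeroes;
	bool emit_strings; /* field added in a newer library version */
};

/* True only if the caller's sz proves the field exists in its view of
 * the struct; mimics the check OPTS_GET() performs before reading. */
#define DEMO_OPTS_HAS(opts, field) \
	((opts)->sz >= offsetof(struct demo_opts, field) + sizeof((opts)->field))

static bool get_emit_strings(const struct demo_opts *opts, bool dflt)
{
	return DEMO_OPTS_HAS(opts, emit_strings) ? opts->emit_strings : dflt;
}
```

If the perf build's feature test and the linked libbpf disagree about which definition of btf_dump_type_data_opts is in effect, this size handshake is exactly where the two sides can diverge.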
Given this is a regression what should the fix be?
Thanks,
Ian
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings
2025-08-30 5:19 ` [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings Ian Rogers
@ 2025-09-01 4:17 ` Yonghong Song
2025-09-01 5:22 ` Ian Rogers
2025-09-01 7:57 ` Namhyung Kim
0 siblings, 2 replies; 7+ messages in thread
From: Yonghong Song @ 2025-09-01 4:17 UTC (permalink / raw)
To: Ian Rogers, Blake Jones, namhyung, Arnaldo Carvalho de Melo
Cc: ast, daniel, andrii, martin.lau, eddyz87, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, mykolal, shuah, ihor.solodrai, bpf,
linux-kernel, linux-kselftest, linux-perf-users, Howard Chu
On 8/29/25 10:19 PM, Ian Rogers wrote:
> On Thu, Jun 5, 2025 at 2:00 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
>> Hello:
>>
>> This series was applied to bpf/bpf-next.git (master)
>> by Andrii Nakryiko <andrii@kernel.org>:
>>
>> On Tue, 3 Jun 2025 13:37:00 -0700 you wrote:
>>> The BTF dumper code currently displays arrays of characters as just that -
>>> arrays, with each character formatted individually. Sometimes this is what
>>> makes sense, but it's nice to be able to treat that array as a string.
>>>
>>> This change adds a special case to the btf_dump functionality to allow
>>> 0-terminated arrays of single-byte integer values to be printed as
>>> character strings. Characters for which isprint() returns false are
>>> printed as hex-escaped values. This is enabled when the new ".emit_strings"
>>> is set to 1 in the btf_dump_type_data_opts structure.
>>>
>>> [...]
>> Here is the summary with links:
>> - [v3,1/2] libbpf: add support for printing BTF character arrays as strings
>> https://git.kernel.org/bpf/bpf-next/c/87c9c79a02b4
>> - [v3,2/2] Tests for the ".emit_strings" functionality in the BTF dumper.
>> https://git.kernel.org/bpf/bpf-next/c/a570f386f3d1
>>
>> You are awesome, thank you!
> I believe this patch is responsible for segvs occurring in v6.17 in
> various perf tests when the perf tests run in parallel. There's lots
Could you share the command line to reproduce this failure?
This will help debugging. Thanks!
> of BPF things happening in parallel in the test but the failures are
> happening in a shell and I did get to attach a debugger. I've not seen
> this problem earlier as the patches weren't in the perf-tools-next
> tree. Through bisection I was able to blame the patch and I came up
> with this minimal fix:
> ```
> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> index ccfd905f03df..71e198b30c5f 100644
> --- a/tools/lib/bpf/btf.h
> +++ b/tools/lib/bpf/btf.h
> @@ -326,10 +326,10 @@ struct btf_dump_type_data_opts {
> bool compact; /* no newlines/indentation */
> bool skip_names; /* skip member/type names */
> bool emit_zeroes; /* show 0-valued fields */
> - bool emit_strings; /* print char arrays as strings */
> + //bool emit_strings; /* print char arrays as strings */
> size_t :0;
> };
> -#define btf_dump_type_data_opts__last_field emit_strings
> +#define btf_dump_type_data_opts__last_field emit_zeroes
>
> LIBBPF_API int
> btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
> diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
> index f09f25eccf3c..c7b5a376642f 100644
> --- a/tools/lib/bpf/btf_dump.c
> +++ b/tools/lib/bpf/btf_dump.c
> @@ -2599,7 +2599,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
> d->typed_dump->compact = OPTS_GET(opts, compact, false);
> d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false);
> d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false);
> - d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false);
> + d->typed_dump->emit_strings = true; // OPTS_GET(opts, emit_strings, false);
>
> ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0);
>
>
> ```
> So I think the problem relates to modifying struct
> btf_dump_type_data_opts. Given I'm statically linking libbpf into perf
> I'm not sure on the exact route of the segv, no doubt this report will
> be enough for someone else to figure it out.
>
> Given this is a regression what should the fix be?
>
> Thanks,
> Ian
* Re: [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings
2025-09-01 4:17 ` Yonghong Song
@ 2025-09-01 5:22 ` Ian Rogers
2025-09-01 7:57 ` Namhyung Kim
1 sibling, 0 replies; 7+ messages in thread
From: Ian Rogers @ 2025-09-01 5:22 UTC (permalink / raw)
To: Yonghong Song
Cc: Blake Jones, namhyung, Arnaldo Carvalho de Melo, ast, daniel,
andrii, martin.lau, eddyz87, song, john.fastabend, kpsingh, sdf,
haoluo, jolsa, mykolal, shuah, ihor.solodrai, bpf, linux-kernel,
linux-kselftest, linux-perf-users, Howard Chu
On Sun, Aug 31, 2025 at 9:17 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/29/25 10:19 PM, Ian Rogers wrote:
> > On Thu, Jun 5, 2025 at 2:00 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> >> Hello:
> >>
> >> This series was applied to bpf/bpf-next.git (master)
> >> by Andrii Nakryiko <andrii@kernel.org>:
> >>
> >> On Tue, 3 Jun 2025 13:37:00 -0700 you wrote:
> >>> The BTF dumper code currently displays arrays of characters as just that -
> >>> arrays, with each character formatted individually. Sometimes this is what
> >>> makes sense, but it's nice to be able to treat that array as a string.
> >>>
> >>> This change adds a special case to the btf_dump functionality to allow
> >>> 0-terminated arrays of single-byte integer values to be printed as
> >>> character strings. Characters for which isprint() returns false are
> >>> printed as hex-escaped values. This is enabled when the new ".emit_strings"
> >>> is set to 1 in the btf_dump_type_data_opts structure.
> >>>
> >>> [...]
> >> Here is the summary with links:
> >> - [v3,1/2] libbpf: add support for printing BTF character arrays as strings
> >> https://git.kernel.org/bpf/bpf-next/c/87c9c79a02b4
> >> - [v3,2/2] Tests for the ".emit_strings" functionality in the BTF dumper.
> >> https://git.kernel.org/bpf/bpf-next/c/a570f386f3d1
> >>
> >> You are awesome, thank you!
> > I believe this patch is responsible for segvs occurring in v6.17 in
> > various perf tests when the perf tests run in parallel. There's lots
>
> Could you share the command line to reproduce this failure?
> This will help debugging. Thanks!
Sure:
```
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
...
$ cd linux
$ mkdir /tmp/perf
$ make -C tools/perf O=/tmp/perf
make: Entering directory 'linux/tools/perf'
BUILD: Doing 'make -j28' parallel build
Warning: Kernel ABI header differences:
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Auto-detecting system features:
... libdw: [ on ]
... glibc: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libcapstone: [ on ]
... llvm-perf: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ on ]
GEN /tmp/perf/common-cmds.h
...
LINK /tmp/perf/perf
GEN /tmp/perf/python/perf.cpython-313-x86_64-linux-gnu.so
make: Leaving directory 'linux/tools/perf'
$ git describe
v6.17-rc4
$ sudo /tmp/perf/perf test -v
1: vmlinux symtab matches kallsyms : Skip
2: Detect openat syscall event : Ok
...
79: build id cache operations : Ok
--- start ---
test child forked, pid 2546565
test daemon list
FAILED: wrong name
FAILED: wrong run
FAILED: wrong base
FAILED: wrong output
FAILED: wrong control
FAILED: wrong ack
test daemon reconfig
FAILED: Timeout waiting for daemon to ping
---- end(-1) ----
80: daemon operations : FAILED!
81: perf diff tests : Ok
82: DRM PMU : Ok
83: perf ftrace tests : Ok
--- start ---
test child forked, pid 2546573
Test perf header file
perf: Segmentation fault
linux/tools/perf/tests/shell/header.sh: line 51: 2546664 Segmentation fault (core dumped) perf record -o "${perfdata}" -- perf test -w noploop
Unexpected signal in test_file
---- end(-1) ----
84: perf header tests : FAILED!
85: perf list tests : Ok
...
92: perf sched tests : Ok
--- start ---
test child forked, pid 2546593
DB test
perf: Segmentation fault
/tmp/x/linux/tools/perf/tests/shell/script.sh: line 35: 2546822 Segmentation fault (core dumped) perf record $cmd_flags -o "${perfdatafile}" true
--- Cleaning up ---
---- end(-1) ----
93: perf script tests : FAILED!
...
$ /tmp/perf/perf version --build-options
perf version 6.17.rc4.gb320789d6883
aio: [ on ] # HAVE_AIO_SUPPORT
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
bpf_skeletons: [ on ] # HAVE_BPF_SKEL
debuginfod: [ on ] # HAVE_DEBUGINFOD_SUPPORT
dwarf: [ on ] # HAVE_LIBDW_SUPPORT
dwarf_getlocations: [ on ] # HAVE_LIBDW_SUPPORT
dwarf-unwind: [ on ] # HAVE_DWARF_UNWIND_SUPPORT
auxtrace: [ on ] # HAVE_AUXTRACE_SUPPORT
libbfd: [ OFF ] # HAVE_LIBBFD_SUPPORT ( tip: Deprecated, license incompatibility, use BUILD_NONDISTRO=1 and install binutils-dev[el] )
libbpf-strings: [ on ] # HAVE_LIBBPF_STRINGS_SUPPORT
libcapstone: [ on ] # HAVE_LIBCAPSTONE_SUPPORT
libdw-dwarf-unwind: [ on ] # HAVE_LIBDW_SUPPORT
libelf: [ on ] # HAVE_LIBELF_SUPPORT
libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
libopencsd: [ OFF ] # HAVE_CSTRACE_SUPPORT
libperl: [ on ] # HAVE_LIBPERL_SUPPORT
libpfm4: [ on ] # HAVE_LIBPFM
libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
libslang: [ on ] # HAVE_SLANG_SUPPORT
libtraceevent: [ on ] # HAVE_LIBTRACEEVENT
libunwind: [ OFF ] # HAVE_LIBUNWIND_SUPPORT ( tip: Deprecated, use LIBUNWIND=1 and install libunwind-dev[el] to build with it )
lzma: [ on ] # HAVE_LZMA_SUPPORT
numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
zlib: [ on ] # HAVE_ZLIB_SUPPORT
zstd: [ on ] # HAVE_ZSTD_SUPPORT
```
Thanks,
Ian
> > of BPF things happening in parallel in the test but the failures are
> > happening in a shell and I did get to attach a debugger. I've not seen
> > this problem earlier as the patches weren't in the perf-tools-next
> > tree. Through bisection I was able to blame the patch and I came up
> > with this minimal fix:
> > ```
> > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > index ccfd905f03df..71e198b30c5f 100644
> > --- a/tools/lib/bpf/btf.h
> > +++ b/tools/lib/bpf/btf.h
> > @@ -326,10 +326,10 @@ struct btf_dump_type_data_opts {
> > bool compact; /* no newlines/indentation */
> > bool skip_names; /* skip member/type names */
> > bool emit_zeroes; /* show 0-valued fields */
> > - bool emit_strings; /* print char arrays as strings */
> > + //bool emit_strings; /* print char arrays as strings */
> > size_t :0;
> > };
> > -#define btf_dump_type_data_opts__last_field emit_strings
> > +#define btf_dump_type_data_opts__last_field emit_zeroes
> >
> > LIBBPF_API int
> > btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
> > diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
> > index f09f25eccf3c..c7b5a376642f 100644
> > --- a/tools/lib/bpf/btf_dump.c
> > +++ b/tools/lib/bpf/btf_dump.c
> > @@ -2599,7 +2599,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
> > d->typed_dump->compact = OPTS_GET(opts, compact, false);
> > d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false);
> > d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false);
> > - d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false);
> > + d->typed_dump->emit_strings = true; // OPTS_GET(opts, emit_strings, false);
> >
> > ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0);
> >
> >
> > ```
> > So I think the problem relates to modifying struct
> > btf_dump_type_data_opts. Given I'm statically linking libbpf into perf
> > I'm not sure on the exact route of the segv, no doubt this report will
> > be enough for someone else to figure it out.
> >
> > Given this is a regression what should the fix be?
> >
> > Thanks,
> > Ian
>
* Re: [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings
2025-09-01 4:17 ` Yonghong Song
2025-09-01 5:22 ` Ian Rogers
@ 2025-09-01 7:57 ` Namhyung Kim
2025-09-01 17:04 ` Ian Rogers
1 sibling, 1 reply; 7+ messages in thread
From: Namhyung Kim @ 2025-09-01 7:57 UTC (permalink / raw)
To: Yonghong Song
Cc: Ian Rogers, Blake Jones, Arnaldo Carvalho de Melo, ast, daniel,
andrii, martin.lau, eddyz87, song, john.fastabend, kpsingh, sdf,
haoluo, jolsa, mykolal, shuah, ihor.solodrai, bpf, linux-kernel,
linux-kselftest, linux-perf-users, Howard Chu
Hello,
On Sun, Aug 31, 2025 at 09:17:34PM -0700, Yonghong Song wrote:
>
>
> On 8/29/25 10:19 PM, Ian Rogers wrote:
> > On Thu, Jun 5, 2025 at 2:00 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > > Hello:
> > >
> > > This series was applied to bpf/bpf-next.git (master)
> > > by Andrii Nakryiko <andrii@kernel.org>:
> > >
> > > On Tue, 3 Jun 2025 13:37:00 -0700 you wrote:
> > > > The BTF dumper code currently displays arrays of characters as just that -
> > > > arrays, with each character formatted individually. Sometimes this is what
> > > > makes sense, but it's nice to be able to treat that array as a string.
> > > >
> > > > This change adds a special case to the btf_dump functionality to allow
> > > > 0-terminated arrays of single-byte integer values to be printed as
> > > > character strings. Characters for which isprint() returns false are
> > > > printed as hex-escaped values. This is enabled when the new ".emit_strings"
> > > > is set to 1 in the btf_dump_type_data_opts structure.
> > > >
> > > > [...]
> > > Here is the summary with links:
> > > - [v3,1/2] libbpf: add support for printing BTF character arrays as strings
> > > https://git.kernel.org/bpf/bpf-next/c/87c9c79a02b4
> > > - [v3,2/2] Tests for the ".emit_strings" functionality in the BTF dumper.
> > > https://git.kernel.org/bpf/bpf-next/c/a570f386f3d1
> > >
> > > You are awesome, thank you!
> > I believe this patch is responsible for segvs occurring in v6.17 in
> > various perf tests when the perf tests run in parallel. There's lots
>
> Could you share the command line to reproduce this failure?
> This will help debugging. Thanks!
My reproducer is below:
terminal 1: run perf trace in a loop.
$ while true; do sudo ./perf trace true; done
terminal 2: run perf record in a loop until hit the segfault.
$ while true; do sudo ./perf record true || break; done
...
perf: Segmentation fault
#0 0x560b2db790e4 in dump_stack debug.c:366
#1 0x560b2db7915a in sighandler_dump_stack debug.c:378
#2 0x560b2d973b1b in sigsegv_handler builtin-record.c:722
#3 0x7f975f249df0 in __restore_rt libc_sigaction.c:0
#4 0x560b2dca1ee6 in snprintf_hex bpf-event.c:39
#5 0x560b2dca2306 in synthesize_bpf_prog_name bpf-event.c:144
#6 0x560b2dca2d92 in bpf_metadata_create bpf-event.c:401
#7 0x560b2dca3838 in perf_event__synthesize_one_bpf_prog bpf-event.c:673
#8 0x560b2dca3dd5 in perf_event__synthesize_bpf_events bpf-event.c:798
#9 0x560b2d977ef5 in record__synthesize builtin-record.c:2131
#10 0x560b2d9797c1 in __cmd_record builtin-record.c:2581
#11 0x560b2d97db30 in cmd_record builtin-record.c:4376
#12 0x560b2da0672e in run_builtin perf.c:349
#13 0x560b2da069c6 in handle_internal_command perf.c:401
#14 0x560b2da06b1f in run_argv perf.c:448
#15 0x560b2da06e68 in main perf.c:555
#16 0x7f975f233ca8 in __libc_start_call_main libc_start_call_main.h:74
#17 0x7f975f233d65 in __libc_start_main_alias_2 libc-start.c:128
#18 0x560b2d959b11 in _start perf[4cb11]
I manually ran it with gdb to get some more hints.
Thread 1 "perf" received signal SIGSEGV, Segmentation fault.
0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, data=0x40 <error: Cannot access memory at address 0x40>, len=8)
at util/bpf-event.c:39
39 ret += snprintf(buf + ret, size - ret, "%02x", data[i]);
The data is bpf_prog_info->prog_tags and it's called from
synthesize_bpf_prog_name().
(gdb) bt
#0 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, data=0x40 <error: Cannot access memory at address 0x40>,
len=8) at util/bpf-event.c:39
#1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555562c1d70 "bpf_prog_", size=512, info=0x55555665e400, btf=0x5555562c5630,
sub_id=0) at util/bpf-event.c:144
#2 0x00005555558e9db5 in bpf_metadata_create (info=0x55555665e400) at util/bpf-event.c:403
#3 0x00005555558ea85b in perf_event__synthesize_one_bpf_prog (session=0x555556178510,
process=0x5555555ba7ab <process_synthesized_event>, machine=0x555556178728, fd=25, event=0x5555561b73a0,
opts=0x5555560d33a8 <record+328>) at util/bpf-event.c:674
#4 0x00005555558eadf8 in perf_event__synthesize_bpf_events (session=0x555556178510,
process=0x5555555ba7ab <process_synthesized_event>, machine=0x555556178728, opts=0x5555560d33a8 <record+328>)
at util/bpf-event.c:799
#5 0x00005555555beef5 in record__synthesize (rec=0x5555560d3260 <record>, tail=false) at builtin-record.c:2131
#6 0x00005555555c07c1 in __cmd_record (rec=0x5555560d3260 <record>, argc=1, argv=0x7fffffffe2e0) at builtin-record.c:2581
#7 0x00005555555c4b30 in cmd_record (argc=1, argv=0x7fffffffe2e0) at builtin-record.c:4376
#8 0x000055555564d72e in run_builtin (p=0x5555560d63c0 <commands+288>, argc=6, argv=0x7fffffffe2e0) at perf.c:349
#9 0x000055555564d9c6 in handle_internal_command (argc=6, argv=0x7fffffffe2e0) at perf.c:401
#10 0x000055555564db1f in run_argv (argcp=0x7fffffffe0dc, argv=0x7fffffffe0d0) at perf.c:445
#11 0x000055555564de68 in main (argc=6, argv=0x7fffffffe2e0) at perf.c:553
It seems bpf_prog_info is broken for some reason.
(gdb) up
#1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555563305b0 "bpf_prog_", size=512, info=0x55555664e1d0, btf=0x55555637ad40,
sub_id=0) at util/bpf-event.c:144
144 name_len += snprintf_hex(buf + name_len, size - name_len,
(gdb) p *info
$1 = {type = 68, id = 80, tag = "\\\000\000\000\214\000\000", jited_prog_len = 152, xlated_prog_len = 164,
jited_prog_insns = 824633721012, xlated_prog_insns = 1185410973912, load_time = 1305670058276, created_by_uid = 352,
nr_map_ids = 364, map_ids = 1975684956608, name = "\330\001\000\000\350\001\000\000$\002\000\0004\002\000", ifindex = 576,
gpl_compatible = 0, netns_dev = 2697239462496, netns_ino = 2834678416000, nr_jited_ksyms = 756, nr_jited_func_lens = 768,
jited_ksyms = 3418793968396, jited_func_lens = 3573412791092, btf_id = 844, func_info_rec_size = 880, func_info = 3934190044028,
nr_func_info = 928, nr_line_info = 952, line_info = 4294967296988, jited_line_info = 4449586119680, nr_jited_line_info = 1060,
line_info_rec_size = 1076, jited_line_info_rec_size = 1092, nr_prog_tags = 1108, prog_tags = 4861902980192,
run_time_ns = 5085241279632, run_cnt = 5257039971512, recursion_misses = 5360119186644, verified_insns = 1264,
attach_btf_obj_id = 1288, attach_btf_id = 1312}
Thanks,
Namhyung
* Re: [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings
2025-09-01 7:57 ` Namhyung Kim
@ 2025-09-01 17:04 ` Ian Rogers
2025-09-02 16:47 ` Ian Rogers
0 siblings, 1 reply; 7+ messages in thread
From: Ian Rogers @ 2025-09-01 17:04 UTC (permalink / raw)
To: Namhyung Kim, song, Yonghong Song, jolsa,
Arnaldo Carvalho de Melo
Cc: Blake Jones, ast, daniel, andrii, martin.lau, eddyz87,
john.fastabend, kpsingh, sdf, haoluo, mykolal, shuah,
ihor.solodrai, bpf, linux-kernel, linux-kselftest,
linux-perf-users, Howard Chu
On Mon, Sep 1, 2025 at 12:57 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Sun, Aug 31, 2025 at 09:17:34PM -0700, Yonghong Song wrote:
> >
> >
> > On 8/29/25 10:19 PM, Ian Rogers wrote:
> > > On Thu, Jun 5, 2025 at 2:00 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > > > Hello:
> > > >
> > > > This series was applied to bpf/bpf-next.git (master)
> > > > by Andrii Nakryiko <andrii@kernel.org>:
> > > >
> > > > On Tue, 3 Jun 2025 13:37:00 -0700 you wrote:
> > > > > The BTF dumper code currently displays arrays of characters as just that -
> > > > > arrays, with each character formatted individually. Sometimes this is what
> > > > > makes sense, but it's nice to be able to treat that array as a string.
> > > > >
> > > > > This change adds a special case to the btf_dump functionality to allow
> > > > > 0-terminated arrays of single-byte integer values to be printed as
> > > > > character strings. Characters for which isprint() returns false are
> > > > > printed as hex-escaped values. This is enabled when the new ".emit_strings"
> > > > > is set to 1 in the btf_dump_type_data_opts structure.
> > > > >
> > > > > [...]
> > > > Here is the summary with links:
> > > > - [v3,1/2] libbpf: add support for printing BTF character arrays as strings
> > > > https://git.kernel.org/bpf/bpf-next/c/87c9c79a02b4
> > > > - [v3,2/2] Tests for the ".emit_strings" functionality in the BTF dumper.
> > > > https://git.kernel.org/bpf/bpf-next/c/a570f386f3d1
> > > >
> > > > You are awesome, thank you!
> > > I believe this patch is responsible for segvs occurring in v6.17 in
> > > various perf tests when the perf tests run in parallel. There's lots
> >
> > Could you share the command line to reproduce this failure?
> > This will help debugging. Thanks!
>
> My reproducer is below:
>
> terminal 1: run perf trace in a loop.
>
> $ while true; do sudo ./perf trace true; done
>
> terminal 2: run perf record in a loop until hit the segfault.
>
> $ while true; do sudo ./perf record true || break; done
> ...
> perf: Segmentation fault
> #0 0x560b2db790e4 in dump_stack debug.c:366
> #1 0x560b2db7915a in sighandler_dump_stack debug.c:378
> #2 0x560b2d973b1b in sigsegv_handler builtin-record.c:722
> #3 0x7f975f249df0 in __restore_rt libc_sigaction.c:0
> #4 0x560b2dca1ee6 in snprintf_hex bpf-event.c:39
> #5 0x560b2dca2306 in synthesize_bpf_prog_name bpf-event.c:144
> #6 0x560b2dca2d92 in bpf_metadata_create bpf-event.c:401
> #7 0x560b2dca3838 in perf_event__synthesize_one_bpf_prog bpf-event.c:673
> #8 0x560b2dca3dd5 in perf_event__synthesize_bpf_events bpf-event.c:798
> #9 0x560b2d977ef5 in record__synthesize builtin-record.c:2131
> #10 0x560b2d9797c1 in __cmd_record builtin-record.c:2581
> #11 0x560b2d97db30 in cmd_record builtin-record.c:4376
> #12 0x560b2da0672e in run_builtin perf.c:349
> #13 0x560b2da069c6 in handle_internal_command perf.c:401
> #14 0x560b2da06b1f in run_argv perf.c:448
> #15 0x560b2da06e68 in main perf.c:555
> #16 0x7f975f233ca8 in __libc_start_call_main libc_start_call_main.h:74
> #17 0x7f975f233d65 in __libc_start_main_alias_2 libc-start.c:128
> #18 0x560b2d959b11 in _start perf[4cb11]
>
>
> I manually ran it with gdb to get some more hints.
>
> Thread 1 "perf" received signal SIGSEGV, Segmentation fault.
> 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, data=0x40 <error: Cannot access memory at address 0x40>, len=8)
> at util/bpf-event.c:39
> 39 ret += snprintf(buf + ret, size - ret, "%02x", data[i]);
>
> The data is bpf_prog_info->prog_tags and it's called from
> synthesize_bpf_prog_name().
>
> (gdb) bt
> #0 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, data=0x40 <error: Cannot access memory at address 0x40>,
> len=8) at util/bpf-event.c:39
> #1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555562c1d70 "bpf_prog_", size=512, info=0x55555665e400, btf=0x5555562c5630,
> sub_id=0) at util/bpf-event.c:144
> #2 0x00005555558e9db5 in bpf_metadata_create (info=0x55555665e400) at util/bpf-event.c:403
> #3 0x00005555558ea85b in perf_event__synthesize_one_bpf_prog (session=0x555556178510,
> process=0x5555555ba7ab <process_synthesized_event>, machine=0x555556178728, fd=25, event=0x5555561b73a0,
> opts=0x5555560d33a8 <record+328>) at util/bpf-event.c:674
> #4 0x00005555558eadf8 in perf_event__synthesize_bpf_events (session=0x555556178510,
> process=0x5555555ba7ab <process_synthesized_event>, machine=0x555556178728, opts=0x5555560d33a8 <record+328>)
> at util/bpf-event.c:799
> #5 0x00005555555beef5 in record__synthesize (rec=0x5555560d3260 <record>, tail=false) at builtin-record.c:2131
> #6 0x00005555555c07c1 in __cmd_record (rec=0x5555560d3260 <record>, argc=1, argv=0x7fffffffe2e0) at builtin-record.c:2581
> #7 0x00005555555c4b30 in cmd_record (argc=1, argv=0x7fffffffe2e0) at builtin-record.c:4376
> #8 0x000055555564d72e in run_builtin (p=0x5555560d63c0 <commands+288>, argc=6, argv=0x7fffffffe2e0) at perf.c:349
> #9 0x000055555564d9c6 in handle_internal_command (argc=6, argv=0x7fffffffe2e0) at perf.c:401
> #10 0x000055555564db1f in run_argv (argcp=0x7fffffffe0dc, argv=0x7fffffffe0d0) at perf.c:445
> #11 0x000055555564de68 in main (argc=6, argv=0x7fffffffe2e0) at perf.c:553
>
> It seems bpf_prog_info is broken for some reason.
>
> (gdb) up
> #1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555563305b0 "bpf_prog_", size=512, info=0x55555664e1d0, btf=0x55555637ad40,
> sub_id=0) at util/bpf-event.c:144
> 144 name_len += snprintf_hex(buf + name_len, size - name_len,
>
> (gdb) p *info
> $1 = {type = 68, id = 80, tag = "\\\000\000\000\214\000\000", jited_prog_len = 152, xlated_prog_len = 164,
> jited_prog_insns = 824633721012, xlated_prog_insns = 1185410973912, load_time = 1305670058276, created_by_uid = 352,
> nr_map_ids = 364, map_ids = 1975684956608, name = "\330\001\000\000\350\001\000\000$\002\000\0004\002\000", ifindex = 576,
> gpl_compatible = 0, netns_dev = 2697239462496, netns_ino = 2834678416000, nr_jited_ksyms = 756, nr_jited_func_lens = 768,
> jited_ksyms = 3418793968396, jited_func_lens = 3573412791092, btf_id = 844, func_info_rec_size = 880, func_info = 3934190044028,
> nr_func_info = 928, nr_line_info = 952, line_info = 4294967296988, jited_line_info = 4449586119680, nr_jited_line_info = 1060,
> line_info_rec_size = 1076, jited_line_info_rec_size = 1092, nr_prog_tags = 1108, prog_tags = 4861902980192,
> run_time_ns = 5085241279632, run_cnt = 5257039971512, recursion_misses = 5360119186644, verified_insns = 1264,
> attach_btf_obj_id = 1288, attach_btf_id = 1312}
Thanks Namhyung!
So it looks like my "fix" was breaking the tools/perf build feature test for btf_dump_type_data_opts opts.emit_strings, and that is why it avoided this code.
Having terminal 1 run perf trace is going to be loading/unloading a
BPF program for system call augmentation. This must be creating the
race condition that is causing perf record to segv when it is
inspecting the bpf_prog_info.
The cast in:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n135
```
static int synthesize_bpf_prog_name(char *buf, int size,
struct bpf_prog_info *info,
struct btf *btf,
u32 sub_id)
{
u8 (*prog_tags)[BPF_TAG_SIZE] = (void *)(uintptr_t)(info->prog_tags);
```
looks concerning given the bad address comes from:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n144
```
name_len += snprintf_hex(buf + name_len, size - name_len,
prog_tags[sub_id], BPF_TAG_SIZE);
```
Checking git blame, this code has existed since 2019, commit 7b612e291a5a ("perf tools: Synthesize PERF_RECORD_* for loaded BPF programs"):
http://lkml.kernel.org/r/20190117161521.1341602-8-songliubraving@fb.com
It was refactored later in 2019 to use a single memory allocation in commit a742258af131 ("perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear()"):
http://lkml.kernel.org/r/20190312053051.2690567-5-songliubraving@fb.com
There seems like a potential race here:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-utils.c#n123
```
/* step 1: get array dimensions */
err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
```
and later:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-utils.c#n185
```
/* step 5: call syscall again to get required arrays */
err = bpf_obj_get_info_by_fd(fd, &info_linear->info, &info_len);
```
There's a verification step that looks like it should cover issues with the race. I thought making those warnings fatal might help, but no:
```
--- a/tools/perf/util/bpf-utils.c
+++ b/tools/perf/util/bpf-utils.c
@@ -202,14 +202,20 @@ get_bpf_prog_info_linear(int fd, __u64 arrays)
v1 = bpf_prog_info_read_offset_u32(&info, desc->count_offset);
v2 = bpf_prog_info_read_offset_u32(&info_linear->info,
desc->count_offset);
- if (v1 != v2)
+ if (v1 != v2) {
pr_warning("%s: mismatch in element count\n", __func__);
+ free(info_linear);
+ return ERR_PTR(-EFAULT);
+ }
v1 = bpf_prog_info_read_offset_u32(&info, desc->size_offset);
v2 = bpf_prog_info_read_offset_u32(&info_linear->info,
desc->size_offset);
- if (v1 != v2)
+ if (v1 != v2) {
pr_warning("%s: mismatch in rec size\n", __func__);
+ free(info_linear);
+ return ERR_PTR(-EFAULT);
+ }
}
/* step 7: update info_len and data_len */
```
Fwiw, the address of "data=0x40" in the stack trace makes it look
like an offset has been applied to NULL. 0x40 is 64, which corresponds
to "name" in a bpf_prog_info, by way of pahole:
```
struct bpf_prog_info {
__u32 type; /* 0 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
...
char name[16]; /* 64 16 */
```
I feel we're relatively close to discovering a proper fix for the
issue; it would help if others could lend a hand, as I'm not overly
familiar with the BPF code. I'm wondering if the second
bpf_obj_get_info_by_fd could be filling in offsets relative to NULL
rather than returning an error, but that would be (I believe) a kernel
issue :-(
Thanks,
Ian
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings
2025-09-01 17:04 ` Ian Rogers
@ 2025-09-02 16:47 ` Ian Rogers
2025-09-02 18:20 ` Ian Rogers
0 siblings, 1 reply; 7+ messages in thread
From: Ian Rogers @ 2025-09-02 16:47 UTC (permalink / raw)
To: Namhyung Kim, song, Yonghong Song, jolsa,
Arnaldo Carvalho de Melo
Cc: Blake Jones, ast, daniel, andrii, martin.lau, eddyz87,
john.fastabend, kpsingh, sdf, haoluo, mykolal, shuah,
ihor.solodrai, bpf, linux-kernel, linux-kselftest,
linux-perf-users, Howard Chu
On Mon, Sep 1, 2025 at 10:04 AM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Sep 1, 2025 at 12:57 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > On Sun, Aug 31, 2025 at 09:17:34PM -0700, Yonghong Song wrote:
> > >
> > >
> > > On 8/29/25 10:19 PM, Ian Rogers wrote:
> > > > On Thu, Jun 5, 2025 at 2:00 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > > > > Hello:
> > > > >
> > > > > This series was applied to bpf/bpf-next.git (master)
> > > > > by Andrii Nakryiko <andrii@kernel.org>:
> > > > >
> > > > > On Tue, 3 Jun 2025 13:37:00 -0700 you wrote:
> > > > > > The BTF dumper code currently displays arrays of characters as just that -
> > > > > > arrays, with each character formatted individually. Sometimes this is what
> > > > > > makes sense, but it's nice to be able to treat that array as a string.
> > > > > >
> > > > > > This change adds a special case to the btf_dump functionality to allow
> > > > > > 0-terminated arrays of single-byte integer values to be printed as
> > > > > > character strings. Characters for which isprint() returns false are
> > > > > > printed as hex-escaped values. This is enabled when the new ".emit_strings"
> > > > > > is set to 1 in the btf_dump_type_data_opts structure.
> > > > > >
> > > > > > [...]
> > > > > Here is the summary with links:
> > > > > - [v3,1/2] libbpf: add support for printing BTF character arrays as strings
> > > > > https://git.kernel.org/bpf/bpf-next/c/87c9c79a02b4
> > > > > - [v3,2/2] Tests for the ".emit_strings" functionality in the BTF dumper.
> > > > > https://git.kernel.org/bpf/bpf-next/c/a570f386f3d1
> > > > >
> > > > > You are awesome, thank you!
> > > > I believe this patch is responsible for segvs occurring in v6.17 in
> > > > various perf tests when the perf tests run in parallel. There's lots
> > >
> > > Could you share the command line to reproduce this failure?
> > > This will help debugging. Thanks!
> >
> > My reproducer is below:
> >
> > terminal 1: run perf trace in a loop.
> >
> > $ while true; do sudo ./perf trace true; done
> >
> > terminal 2: run perf record in a loop until hit the segfault.
> >
> > $ while true; do sudo ./perf record true || break; done
> > ...
> > perf: Segmentation fault
> > #0 0x560b2db790e4 in dump_stack debug.c:366
> > #1 0x560b2db7915a in sighandler_dump_stack debug.c:378
> > #2 0x560b2d973b1b in sigsegv_handler builtin-record.c:722
> > #3 0x7f975f249df0 in __restore_rt libc_sigaction.c:0
> > #4 0x560b2dca1ee6 in snprintf_hex bpf-event.c:39
> > #5 0x560b2dca2306 in synthesize_bpf_prog_name bpf-event.c:144
> > #6 0x560b2dca2d92 in bpf_metadata_create bpf-event.c:401
> > #7 0x560b2dca3838 in perf_event__synthesize_one_bpf_prog bpf-event.c:673
> > #8 0x560b2dca3dd5 in perf_event__synthesize_bpf_events bpf-event.c:798
> > #9 0x560b2d977ef5 in record__synthesize builtin-record.c:2131
> > #10 0x560b2d9797c1 in __cmd_record builtin-record.c:2581
> > #11 0x560b2d97db30 in cmd_record builtin-record.c:4376
> > #12 0x560b2da0672e in run_builtin perf.c:349
> > #13 0x560b2da069c6 in handle_internal_command perf.c:401
> > #14 0x560b2da06b1f in run_argv perf.c:448
> > #15 0x560b2da06e68 in main perf.c:555
> > #16 0x7f975f233ca8 in __libc_start_call_main libc_start_call_main.h:74
> > #17 0x7f975f233d65 in __libc_start_main_alias_2 libc-start.c:128
> > #18 0x560b2d959b11 in _start perf[4cb11]
> >
> >
> > I manually ran it with gdb to get some more hints.
> >
> > Thread 1 "perf" received signal SIGSEGV, Segmentation fault.
> > 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, data=0x40 <error: Cannot access memory at address 0x40>, len=8)
> > at util/bpf-event.c:39
> > 39 ret += snprintf(buf + ret, size - ret, "%02x", data[i]);
> >
> > The data is bpf_prog_info->prog_tags and it's called from
> > synthesize_bpf_prog_name().
> >
> > (gdb) bt
> > #0 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, data=0x40 <error: Cannot access memory at address 0x40>,
> > len=8) at util/bpf-event.c:39
> > #1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555562c1d70 "bpf_prog_", size=512, info=0x55555665e400, btf=0x5555562c5630,
> > sub_id=0) at util/bpf-event.c:144
> > #2 0x00005555558e9db5 in bpf_metadata_create (info=0x55555665e400) at util/bpf-event.c:403
> > #3 0x00005555558ea85b in perf_event__synthesize_one_bpf_prog (session=0x555556178510,
> > process=0x5555555ba7ab <process_synthesized_event>, machine=0x555556178728, fd=25, event=0x5555561b73a0,
> > opts=0x5555560d33a8 <record+328>) at util/bpf-event.c:674
> > #4 0x00005555558eadf8 in perf_event__synthesize_bpf_events (session=0x555556178510,
> > process=0x5555555ba7ab <process_synthesized_event>, machine=0x555556178728, opts=0x5555560d33a8 <record+328>)
> > at util/bpf-event.c:799
> > #5 0x00005555555beef5 in record__synthesize (rec=0x5555560d3260 <record>, tail=false) at builtin-record.c:2131
> > #6 0x00005555555c07c1 in __cmd_record (rec=0x5555560d3260 <record>, argc=1, argv=0x7fffffffe2e0) at builtin-record.c:2581
> > #7 0x00005555555c4b30 in cmd_record (argc=1, argv=0x7fffffffe2e0) at builtin-record.c:4376
> > #8 0x000055555564d72e in run_builtin (p=0x5555560d63c0 <commands+288>, argc=6, argv=0x7fffffffe2e0) at perf.c:349
> > #9 0x000055555564d9c6 in handle_internal_command (argc=6, argv=0x7fffffffe2e0) at perf.c:401
> > #10 0x000055555564db1f in run_argv (argcp=0x7fffffffe0dc, argv=0x7fffffffe0d0) at perf.c:445
> > #11 0x000055555564de68 in main (argc=6, argv=0x7fffffffe2e0) at perf.c:553
> >
> > It seems bpf_prog_info is broken for some reason.
> >
> > (gdb) up
> > #1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555563305b0 "bpf_prog_", size=512, info=0x55555664e1d0, btf=0x55555637ad40,
> > sub_id=0) at util/bpf-event.c:144
> > 144 name_len += snprintf_hex(buf + name_len, size - name_len,
> >
> > (gdb) p *info
> > $1 = {type = 68, id = 80, tag = "\\\000\000\000\214\000\000", jited_prog_len = 152, xlated_prog_len = 164,
> > jited_prog_insns = 824633721012, xlated_prog_insns = 1185410973912, load_time = 1305670058276, created_by_uid = 352,
> > nr_map_ids = 364, map_ids = 1975684956608, name = "\330\001\000\000\350\001\000\000$\002\000\0004\002\000", ifindex = 576,
> > gpl_compatible = 0, netns_dev = 2697239462496, netns_ino = 2834678416000, nr_jited_ksyms = 756, nr_jited_func_lens = 768,
> > jited_ksyms = 3418793968396, jited_func_lens = 3573412791092, btf_id = 844, func_info_rec_size = 880, func_info = 3934190044028,
> > nr_func_info = 928, nr_line_info = 952, line_info = 4294967296988, jited_line_info = 4449586119680, nr_jited_line_info = 1060,
> > line_info_rec_size = 1076, jited_line_info_rec_size = 1092, nr_prog_tags = 1108, prog_tags = 4861902980192,
> > run_time_ns = 5085241279632, run_cnt = 5257039971512, recursion_misses = 5360119186644, verified_insns = 1264,
> > attach_btf_obj_id = 1288, attach_btf_id = 1312}
>
> Thanks Namhyung!
>
> So it looks like my "fix" was breaking the tools/perf build feature
> test for btf_dump_type_data_opts opts.emit_strings and that was
> avoiding this code.
>
> Having terminal 1 run perf trace is going to be loading/unloading a
> BPF program for system call augmentation. This must be creating the
> race condition that is causing perf record to segv when it is
> inspecting the bpf_prog_info.
>
> The cast in:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n135
> ```
> static int synthesize_bpf_prog_name(char *buf, int size,
> struct bpf_prog_info *info,
> struct btf *btf,
> u32 sub_id)
> {
> u8 (*prog_tags)[BPF_TAG_SIZE] = (void *)(uintptr_t)(info->prog_tags);
> ```
> looks concerning given the bad address comes from:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n144
> ```
> name_len += snprintf_hex(buf + name_len, size - name_len,
> prog_tags[sub_id], BPF_TAG_SIZE);
> ```
> Checking git blame, this code has existed since 2019, commit
> 7b612e291a5a ("perf tools: Synthesize PERF_RECORD_* for loaded BPF
> programs"):
> http://lkml.kernel.org/r/20190117161521.1341602-8-songliubraving@fb.com
> It was refactored later in 2019 to use a single memory allocation in
> commit a742258af131 ("perf bpf: Synthesize bpf events with
> bpf_program__get_prog_info_linear()"):
> http://lkml.kernel.org/r/20190312053051.2690567-5-songliubraving@fb.com
>
> There seems to be a potential race here:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-utils.c#n123
> ```
> /* step 1: get array dimensions */
> err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
> ```
> and later:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-utils.c#n185
> ```
> /* step 5: call syscall again to get required arrays */
> err = bpf_obj_get_info_by_fd(fd, &info_linear->info, &info_len);
> ```
> There's a verification step that appears to cover issues arising from
> the race. I thought making those warnings fatal might help, but no:
> ```
> --- a/tools/perf/util/bpf-utils.c
> +++ b/tools/perf/util/bpf-utils.c
> @@ -202,14 +202,20 @@ get_bpf_prog_info_linear(int fd, __u64 arrays)
> v1 = bpf_prog_info_read_offset_u32(&info, desc->count_offset);
> v2 = bpf_prog_info_read_offset_u32(&info_linear->info,
> desc->count_offset);
> - if (v1 != v2)
> + if (v1 != v2) {
> pr_warning("%s: mismatch in element count\n", __func__);
> + free(info_linear);
> + return ERR_PTR(-EFAULT);
> + }
>
> v1 = bpf_prog_info_read_offset_u32(&info, desc->size_offset);
> v2 = bpf_prog_info_read_offset_u32(&info_linear->info,
> desc->size_offset);
> - if (v1 != v2)
> + if (v1 != v2) {
> pr_warning("%s: mismatch in rec size\n", __func__);
> + free(info_linear);
> + return ERR_PTR(-EFAULT);
> + }
> }
>
> /* step 7: update info_len and data_len */
> ```
>
> Fwiw, the address of "data=0x40" in the stack trace makes it look
> like an offset has been applied to NULL. 0x40 is 64, which corresponds
> to "name" in a bpf_prog_info, by way of pahole:
> ```
> struct bpf_prog_info {
> __u32 type; /* 0 4 */
> /* --- cacheline 1 boundary (64 bytes) --- */
> ...
> char name[16]; /* 64 16 */
> ```
>
> I feel we're relatively close to discovering a proper fix for the
> issue; it would help if others could lend a hand, as I'm not overly
> familiar with the BPF code. I'm wondering if the second
> bpf_obj_get_info_by_fd could be filling in offsets relative to NULL
> rather than returning an error, but that would be (I believe) a kernel
> issue :-(
It's a use-after-free perf bug:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/bpf-event.c?h=perf-tools-next#n660
```
/* save bpf_prog_info to env */
info_node = malloc(sizeof(struct bpf_prog_info_node));
if (!info_node) {
err = -1;
goto out;
}
info_node->info_linear = info_linear;
info_node->metadata = NULL;
if (!perf_env__insert_bpf_prog_info(env, info_node)) {
free(info_linear);
free(info_node);
}
info_linear = NULL;
...
/* Synthesize PERF_RECORD_BPF_METADATA */
metadata = bpf_metadata_create(info);
```
The insert can fail due to a duplicated BPF program (the sideband data
thread races with the synthesizing thread in perf), causing info_linear
to be freed while the info pointer still points into it. The later use
of the freed memory leads to the crash.
I hope that's all it is and I'll work on a fix.
Thanks,
Ian
* Re: [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings
2025-09-02 16:47 ` Ian Rogers
@ 2025-09-02 18:20 ` Ian Rogers
0 siblings, 0 replies; 7+ messages in thread
From: Ian Rogers @ 2025-09-02 18:20 UTC (permalink / raw)
To: Namhyung Kim, song, Yonghong Song, jolsa,
Arnaldo Carvalho de Melo
Cc: Blake Jones, ast, daniel, andrii, martin.lau, eddyz87,
john.fastabend, kpsingh, sdf, haoluo, mykolal, shuah,
ihor.solodrai, bpf, linux-kernel, linux-kselftest,
linux-perf-users, Howard Chu
On Tue, Sep 2, 2025 at 9:47 AM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Sep 1, 2025 at 10:04 AM Ian Rogers <irogers@google.com> wrote:
> >
> > On Mon, Sep 1, 2025 at 12:57 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hello,
> > >
> > > On Sun, Aug 31, 2025 at 09:17:34PM -0700, Yonghong Song wrote:
> > > >
> > > >
> > > > On 8/29/25 10:19 PM, Ian Rogers wrote:
> > > > > On Thu, Jun 5, 2025 at 2:00 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > > > > > Hello:
> > > > > >
> > > > > > This series was applied to bpf/bpf-next.git (master)
> > > > > > by Andrii Nakryiko <andrii@kernel.org>:
> > > > > >
> > > > > > On Tue, 3 Jun 2025 13:37:00 -0700 you wrote:
> > > > > > > The BTF dumper code currently displays arrays of characters as just that -
> > > > > > > arrays, with each character formatted individually. Sometimes this is what
> > > > > > > makes sense, but it's nice to be able to treat that array as a string.
> > > > > > >
> > > > > > > This change adds a special case to the btf_dump functionality to allow
> > > > > > > 0-terminated arrays of single-byte integer values to be printed as
> > > > > > > character strings. Characters for which isprint() returns false are
> > > > > > > printed as hex-escaped values. This is enabled when the new ".emit_strings"
> > > > > > > is set to 1 in the btf_dump_type_data_opts structure.
> > > > > > >
> > > > > > > [...]
> > > > > > Here is the summary with links:
> > > > > > - [v3,1/2] libbpf: add support for printing BTF character arrays as strings
> > > > > > https://git.kernel.org/bpf/bpf-next/c/87c9c79a02b4
> > > > > > - [v3,2/2] Tests for the ".emit_strings" functionality in the BTF dumper.
> > > > > > https://git.kernel.org/bpf/bpf-next/c/a570f386f3d1
> > > > > >
> > > > > > You are awesome, thank you!
> > > > > I believe this patch is responsible for segvs occurring in v6.17 in
> > > > > various perf tests when the perf tests run in parallel. There's lots
> > > >
> > > > Could you share the command line to reproduce this failure?
> > > > This will help debugging. Thanks!
> > >
> > > My reproducer is below:
> > >
> > > terminal 1: run perf trace in a loop.
> > >
> > > $ while true; do sudo ./perf trace true; done
> > >
> > > terminal 2: run perf record in a loop until hit the segfault.
> > >
> > > $ while true; do sudo ./perf record true || break; done
> > > ...
> > > perf: Segmentation fault
> > > #0 0x560b2db790e4 in dump_stack debug.c:366
> > > #1 0x560b2db7915a in sighandler_dump_stack debug.c:378
> > > #2 0x560b2d973b1b in sigsegv_handler builtin-record.c:722
> > > #3 0x7f975f249df0 in __restore_rt libc_sigaction.c:0
> > > #4 0x560b2dca1ee6 in snprintf_hex bpf-event.c:39
> > > #5 0x560b2dca2306 in synthesize_bpf_prog_name bpf-event.c:144
> > > #6 0x560b2dca2d92 in bpf_metadata_create bpf-event.c:401
> > > #7 0x560b2dca3838 in perf_event__synthesize_one_bpf_prog bpf-event.c:673
> > > #8 0x560b2dca3dd5 in perf_event__synthesize_bpf_events bpf-event.c:798
> > > #9 0x560b2d977ef5 in record__synthesize builtin-record.c:2131
> > > #10 0x560b2d9797c1 in __cmd_record builtin-record.c:2581
> > > #11 0x560b2d97db30 in cmd_record builtin-record.c:4376
> > > #12 0x560b2da0672e in run_builtin perf.c:349
> > > #13 0x560b2da069c6 in handle_internal_command perf.c:401
> > > #14 0x560b2da06b1f in run_argv perf.c:448
> > > #15 0x560b2da06e68 in main perf.c:555
> > > #16 0x7f975f233ca8 in __libc_start_call_main libc_start_call_main.h:74
> > > #17 0x7f975f233d65 in __libc_start_main_alias_2 libc-start.c:128
> > > #18 0x560b2d959b11 in _start perf[4cb11]
> > >
> > >
> > > I manually ran it with gdb to get some more hints.
> > >
> > > Thread 1 "perf" received signal SIGSEGV, Segmentation fault.
> > > 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, data=0x40 <error: Cannot access memory at address 0x40>, len=8)
> > > at util/bpf-event.c:39
> > > 39 ret += snprintf(buf + ret, size - ret, "%02x", data[i]);
> > >
> > > The data is bpf_prog_info->prog_tags and it's called from
> > > synthesize_bpf_prog_name().
> > >
> > > (gdb) bt
> > > #0 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, data=0x40 <error: Cannot access memory at address 0x40>,
> > > len=8) at util/bpf-event.c:39
> > > #1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555562c1d70 "bpf_prog_", size=512, info=0x55555665e400, btf=0x5555562c5630,
> > > sub_id=0) at util/bpf-event.c:144
> > > #2 0x00005555558e9db5 in bpf_metadata_create (info=0x55555665e400) at util/bpf-event.c:403
> > > #3 0x00005555558ea85b in perf_event__synthesize_one_bpf_prog (session=0x555556178510,
> > > process=0x5555555ba7ab <process_synthesized_event>, machine=0x555556178728, fd=25, event=0x5555561b73a0,
> > > opts=0x5555560d33a8 <record+328>) at util/bpf-event.c:674
> > > #4 0x00005555558eadf8 in perf_event__synthesize_bpf_events (session=0x555556178510,
> > > process=0x5555555ba7ab <process_synthesized_event>, machine=0x555556178728, opts=0x5555560d33a8 <record+328>)
> > > at util/bpf-event.c:799
> > > #5 0x00005555555beef5 in record__synthesize (rec=0x5555560d3260 <record>, tail=false) at builtin-record.c:2131
> > > #6 0x00005555555c07c1 in __cmd_record (rec=0x5555560d3260 <record>, argc=1, argv=0x7fffffffe2e0) at builtin-record.c:2581
> > > #7 0x00005555555c4b30 in cmd_record (argc=1, argv=0x7fffffffe2e0) at builtin-record.c:4376
> > > #8 0x000055555564d72e in run_builtin (p=0x5555560d63c0 <commands+288>, argc=6, argv=0x7fffffffe2e0) at perf.c:349
> > > #9 0x000055555564d9c6 in handle_internal_command (argc=6, argv=0x7fffffffe2e0) at perf.c:401
> > > #10 0x000055555564db1f in run_argv (argcp=0x7fffffffe0dc, argv=0x7fffffffe0d0) at perf.c:445
> > > #11 0x000055555564de68 in main (argc=6, argv=0x7fffffffe2e0) at perf.c:553
> > >
> > > It seems bpf_prog_info is broken for some reason.
> > >
> > > (gdb) up
> > > #1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555563305b0 "bpf_prog_", size=512, info=0x55555664e1d0, btf=0x55555637ad40,
> > > sub_id=0) at util/bpf-event.c:144
> > > 144 name_len += snprintf_hex(buf + name_len, size - name_len,
> > >
> > > (gdb) p *info
> > > $1 = {type = 68, id = 80, tag = "\\\000\000\000\214\000\000", jited_prog_len = 152, xlated_prog_len = 164,
> > > jited_prog_insns = 824633721012, xlated_prog_insns = 1185410973912, load_time = 1305670058276, created_by_uid = 352,
> > > nr_map_ids = 364, map_ids = 1975684956608, name = "\330\001\000\000\350\001\000\000$\002\000\0004\002\000", ifindex = 576,
> > > gpl_compatible = 0, netns_dev = 2697239462496, netns_ino = 2834678416000, nr_jited_ksyms = 756, nr_jited_func_lens = 768,
> > > jited_ksyms = 3418793968396, jited_func_lens = 3573412791092, btf_id = 844, func_info_rec_size = 880, func_info = 3934190044028,
> > > nr_func_info = 928, nr_line_info = 952, line_info = 4294967296988, jited_line_info = 4449586119680, nr_jited_line_info = 1060,
> > > line_info_rec_size = 1076, jited_line_info_rec_size = 1092, nr_prog_tags = 1108, prog_tags = 4861902980192,
> > > run_time_ns = 5085241279632, run_cnt = 5257039971512, recursion_misses = 5360119186644, verified_insns = 1264,
> > > attach_btf_obj_id = 1288, attach_btf_id = 1312}
> >
> > Thanks Namhyung!
> >
> > So it looks like my "fix" was breaking the tools/perf build feature
> > test for btf_dump_type_data_opts opts.emit_strings and that was
> > avoiding this code.
> >
> > Having terminal 1 run perf trace is going to be loading/unloading a
> > BPF program for system call augmentation. This must be creating the
> > race condition that is causing perf record to segv when it is
> > inspecting the bpf_prog_info.
> >
> > The cast in:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n135
> > ```
> > static int synthesize_bpf_prog_name(char *buf, int size,
> > struct bpf_prog_info *info,
> > struct btf *btf,
> > u32 sub_id)
> > {
> > u8 (*prog_tags)[BPF_TAG_SIZE] = (void *)(uintptr_t)(info->prog_tags);
> > ```
> > looks concerning given the bad address comes from:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n144
> > ```
> > name_len += snprintf_hex(buf + name_len, size - name_len,
> > prog_tags[sub_id], BPF_TAG_SIZE);
> > ```
> > Checking git blame, this code has existed since 2019, commit
> > 7b612e291a5a ("perf tools: Synthesize PERF_RECORD_* for loaded BPF
> > programs"):
> > http://lkml.kernel.org/r/20190117161521.1341602-8-songliubraving@fb.com
> > It was refactored later in 2019 to use a single memory allocation in
> > commit a742258af131 ("perf bpf: Synthesize bpf events with
> > bpf_program__get_prog_info_linear()"):
> > http://lkml.kernel.org/r/20190312053051.2690567-5-songliubraving@fb.com
> >
> > There seems to be a potential race here:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-utils.c#n123
> > ```
> > /* step 1: get array dimensions */
> > err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
> > ```
> > and later:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-utils.c#n185
> > ```
> > /* step 5: call syscall again to get required arrays */
> > err = bpf_obj_get_info_by_fd(fd, &info_linear->info, &info_len);
> > ```
> > There's a verification step that appears to cover issues arising from
> > the race. I thought making those warnings fatal might help, but no:
> > ```
> > --- a/tools/perf/util/bpf-utils.c
> > +++ b/tools/perf/util/bpf-utils.c
> > @@ -202,14 +202,20 @@ get_bpf_prog_info_linear(int fd, __u64 arrays)
> > v1 = bpf_prog_info_read_offset_u32(&info, desc->count_offset);
> > v2 = bpf_prog_info_read_offset_u32(&info_linear->info,
> > desc->count_offset);
> > - if (v1 != v2)
> > + if (v1 != v2) {
> > pr_warning("%s: mismatch in element count\n", __func__);
> > + free(info_linear);
> > + return ERR_PTR(-EFAULT);
> > + }
> >
> > v1 = bpf_prog_info_read_offset_u32(&info, desc->size_offset);
> > v2 = bpf_prog_info_read_offset_u32(&info_linear->info,
> > desc->size_offset);
> > - if (v1 != v2)
> > + if (v1 != v2) {
> > pr_warning("%s: mismatch in rec size\n", __func__);
> > + free(info_linear);
> > + return ERR_PTR(-EFAULT);
> > + }
> > }
> >
> > /* step 7: update info_len and data_len */
> > ```
> >
> > Fwiw, the address of "data=0x40" in the stack trace makes it look
> > like an offset has been applied to NULL. 0x40 is 64, which corresponds
> > to "name" in a bpf_prog_info, by way of pahole:
> > ```
> > struct bpf_prog_info {
> > __u32 type; /* 0 4 */
> > /* --- cacheline 1 boundary (64 bytes) --- */
> > ...
> > char name[16]; /* 64 16 */
> > ```
> >
> > I feel we're relatively close to discovering a proper fix for the
> > issue; it would help if others could lend a hand, as I'm not overly
> > familiar with the BPF code. I'm wondering if the second
> > bpf_obj_get_info_by_fd could be filling in offsets relative to NULL
> > rather than returning an error, but that would be (I believe) a kernel
> > issue :-(
>
> It's a use-after-free perf bug:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/bpf-event.c?h=perf-tools-next#n660
> ```
> /* save bpf_prog_info to env */
> info_node = malloc(sizeof(struct bpf_prog_info_node));
> if (!info_node) {
> err = -1;
> goto out;
> }
>
> info_node->info_linear = info_linear;
> info_node->metadata = NULL;
> if (!perf_env__insert_bpf_prog_info(env, info_node)) {
> free(info_linear);
> free(info_node);
> }
> info_linear = NULL;
> ...
> /* Synthesize PERF_RECORD_BPF_METADATA */
> metadata = bpf_metadata_create(info);
> ```
> The insert can fail due to a duplicated BPF program (the sideband data
> thread races with the synthesizing thread in perf), causing info_linear
> to be freed while the info pointer still points into it. The later use
> of the freed memory leads to the crash.
>
> I hope that's all it is and I'll work on a fix.
That was all. Fixes sent in:
https://lore.kernel.org/all/20250902181713.309797-1-irogers@google.com/
Thanks,
Ian
end of thread, other threads:[~2025-09-02 18:20 UTC | newest]
Thread overview: 7+ messages
-- links below jump to the message on this page --
[not found] <20250603203701.520541-1-blakejones@google.com>
[not found] ` <174915723301.3244853.343931856692302765.git-patchwork-notify@kernel.org>
2025-08-30 5:19 ` [PATCH v3 1/2] libbpf: add support for printing BTF character arrays as strings Ian Rogers
2025-09-01 4:17 ` Yonghong Song
2025-09-01 5:22 ` Ian Rogers
2025-09-01 7:57 ` Namhyung Kim
2025-09-01 17:04 ` Ian Rogers
2025-09-02 16:47 ` Ian Rogers
2025-09-02 18:20 ` Ian Rogers