* Re: [PATCH] perf callchain: Handle multiple address spaces
2026-04-14 12:42 [PATCH] perf callchain: Handle multiple address spaces Thomas Richter
@ 2026-04-14 13:13 ` sashiko-bot
2026-04-21 11:10 ` [PATCH Ping ] " Thomas Richter
2026-04-21 16:30 ` [PATCH] " Namhyung Kim
2 siblings, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-04-14 13:13 UTC (permalink / raw)
To: Thomas Richter; +Cc: linux-perf-users
Sashiko AI review found 2 potential issue(s):
- [High] Bypassing the kernel IP check on architectures with split address spaces (like s390) causes kernel IPs to be duplicated into the user space callchain.
- [High] Unbounded and off-by-one array writes to `inject->raw_callchain->ips` cause heap buffer overflows. This problem wasn't introduced by this patch, but existed previously.
--
commit 26481cc9533f02e4ba0dd0fa26a4b3d91969b29b
Author: Thomas Richter <tmricht@linux.ibm.com>
perf callchain: Handle multiple address spaces
The patch modifies perf_event__convert_sample_callchain to handle
architectures like s390 that use separate address spaces for user and
kernel memory. It prevents incorrectly dropping user-space functions that
happen to share the same instruction pointer values as kernel addresses.
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 5b29f42968611..4f07e90431546 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
[ ... ]
> @@ -439,7 +439,8 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
>
> node = cursor->first;
> for (k = 0; k < cursor->nr && i < PERF_MAX_STACK_DEPTH; k++) {
This wasn't introduced by this patch, but does this loop and the preceding code
write past the end of the inject->raw_callchain->ips array?
The raw_callchain buffer is allocated with
calloc(PERF_MAX_STACK_DEPTH, sizeof(u64)), which allocates 1016 bytes
(127 * 8). The struct ip_callchain begins with an 8-byte nr header,
leaving only 126 elements for the ips flexible array.
The first loop in perf_event__convert_sample_callchain() iterates up to
sample->callchain->nr without any bounds check against PERF_MAX_STACK_DEPTH.
Since sample->callchain->nr comes from the input perf.data file, a
maliciously crafted file might cause a massive unbounded heap buffer overflow
when perf inject processes it.
Additionally, this loop iterates while i < PERF_MAX_STACK_DEPTH
(i.e. up to i = 126), which allows a write to ips[126]. Accessing ips[126]
writes 8 bytes to offset 1016-1023, which is strictly past the 1016-byte
allocation boundary, causing an off-by-one overflow even for well-formed
callchains.
> - if (machine__kernel_ip(machine, node->ip))
> + if (machine->single_address_space &&
> + machine__kernel_ip(machine, node->ip))
Will this bypass of the kernel IP check on architectures with split address
spaces (like s390) cause kernel IPs to be duplicated into the user space
callchain?
The cursor populated by thread__resolve_callchain() contains BOTH the kernel
IPs (resolved from sample->callchain via thread__resolve_callchain_sample())
and the user IPs (unwound from DWARF via thread__resolve_callchain_unwind()).
The first loop in perf_event__convert_sample_callchain() copies the kernel IPs
and the PERF_CONTEXT_USER marker directly from the raw sample->callchain into
the newly injected callchain:
/* copy kernel callchain and context entries */
for (i = 0; i < sample->callchain->nr; i++) {
inject->raw_callchain->ips[i] = sample->callchain->ips[i];
if (sample->callchain->ips[i] == PERF_CONTEXT_USER) {
i++;
break;
}
}
Because the kernel IPs are present at the beginning of the cursor,
the original code relied on machine__kernel_ip() to identify and skip
them so they wouldn't be appended a second time.
By completely bypassing this check when machine->single_address_space is
false, the kernel IPs in the cursor are no longer skipped. Instead, they
might be erroneously appended to the raw callchain after the
PERF_CONTEXT_USER marker.
This duplicates the kernel callchain and incorrectly classifies those kernel
IPs as user-space addresses, corrupting the callchain data in the injected
perf.data file.
> /* kernel IPs were added already */;
> else if (node->ms.sym && node->ms.sym->inlined)
> /* we can't handle inlined callchains */;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260414124241.490185-1-tmricht@linux.ibm.com?part=1
^ permalink raw reply [flat|nested] 4+ messages in thread* [PATCH Ping ] perf callchain: Handle multiple address spaces
2026-04-14 12:42 [PATCH] perf callchain: Handle multiple address spaces Thomas Richter
2026-04-14 13:13 ` sashiko-bot
@ 2026-04-21 11:10 ` Thomas Richter
2026-04-21 16:30 ` [PATCH] " Namhyung Kim
2 siblings, 0 replies; 4+ messages in thread
From: Thomas Richter @ 2026-04-21 11:10 UTC (permalink / raw)
To: linux-kernel, linux-s390, linux-perf-users, acme, namhyung
Cc: agordeev, gor, sumanthk, hca, japo
Friendly ping...
On 4/14/26 14:42, Thomas Richter wrote:
> perf test 'perf inject to convert DWARF callchains to regular ones'
> fails on s390. It was introduced with
> commit 92ea788d2af4 ("perf inject: Add --convert-callchain option")
>
> The failure comes the difference in output. Without the inject script to
> convert DWARF the callchains is:
> # ./perf record -F 999 --call-graph dwarf -- perf test -w noploop
> # ./perf report -i perf.data --stdio --no-children -q \
> --percent-limit=1 > /tmp/111
> # cat /tmp/111
> 99.30% perf-noploop perf [.] noploop
> |
> ---noploop
> run_workload (inlined)
> cmd_test
> run_builtin (inlined)
> handle_internal_command
> run_argv (inlined)
> main
> __libc_start_call_main
> __libc_start_main_impl (inlined)
> _start
> #
>
> With the inject script step the output is:
> # ./perf inject -i perf.data --convert-callchain -o /tmp/perf-inject-1.out
> # ./perf report -i /tmp/perf-inject-1.out --stdio --no-children -q \
> --percent-limit=1 > /tmp/222
> # cat /tmp/222
> 99.40% perf-noploop perf [.] noploop
> |
> ---noploop
> run_workload (inlined)
> cmd_test
> run_builtin (inlined)
> handle_internal_command
> run_argv (inlined)
> main
> _start
> # diff /tmp/111 /tmp/222
> 1c1
> < 99.30% perf-noploop perf [.] noploop
> ---
> > 99.40% perf-noploop perf [.] noploop
> 10,11d9
> < __libc_start_call_main
> < __libc_start_main_impl (inlined)
> #
>
> The difference are the symbols __libc_start_call_main and
> __libc_start_main_impl.
>
> On x86_64, kernel and user space share a single virtual address space,
> with the kernel mapped to the upper end of memory. The instruction
> pointer value alone is sufficient to distinguish between user space
> and kernel space addresses. This is not true for s390, which uses
> separate address spaces for user and kernel. The same virtual address
> can be valid in both address spaces, so the instruction pointer value
> alone cannot determine whether an address belongs to the kernel or
> user space. Instead, perf must rely on the cpumode metadata derived
> from the processor status word (PSW) at sample time.
>
> In function perf_event__convert_sample_callchain() the first part
> copies a kernel callchain and context entries, if any.
> It then appends additional entries ignoring the address space
> architecture. Taking that into account, the symbols at addresses
>
> 0x3ff970348cb __libc_start_call_main
> 0x3ff970349c5 __libc_start_main_impl
>
> (located after the kernel address space on s390) are now included.
>
> Output before:
> # ./perf test 83
> 83: perf inject to convert DWARF callchains to regular ones : FAILED!
>
> Output after:
> # ./perf test 83
> 83: perf inject to convert DWARF callchains to regular ones : Ok
>
> Question to Namhyung:
> In function perf_event__convert_sample_callchain() just before the
> for() loop this patch modifies, the kernel callchain is copied,
> see this comment and the next 5 lines:
> /* copy kernel callchain and context entries */
> Then why is machine__kernel_ip() needed in the for() loop, when
> the kernel entries have been copied just before the loop?
>
> Note: This patch was tested on x86_64 virtual machine and succeeded.
>
> Fixes: 92ea788d2af4 ("perf inject: Add --convert-callchain option")
> Cc: Namhyung Kim <namhyung@kernel.org>
>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> ---
> tools/perf/arch/common.c | 4 +++-
> tools/perf/builtin-inject.c | 3 ++-
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
> index 21836f70f231..ad0cab830a4d 100644
> --- a/tools/perf/arch/common.c
> +++ b/tools/perf/arch/common.c
> @@ -237,5 +237,7 @@ int perf_env__lookup_objdump(struct perf_env *env, char **path)
> */
> bool perf_env__single_address_space(struct perf_env *env)
> {
> - return strcmp(perf_env__arch(env), "sparc");
> + const char *arch = perf_env__arch(env);
> +
> + return strcmp(arch, "s390") && strcmp(arch, "sparc");
> }
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index f174bc69cec4..6ab20df358c4 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -438,7 +438,8 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
>
> node = cursor->first;
> for (k = 0; k < cursor->nr && i < PERF_MAX_STACK_DEPTH; k++) {
> - if (machine__kernel_ip(machine, node->ip))
> + if (machine->single_address_space &&
> + machine__kernel_ip(machine, node->ip))
> /* kernel IPs were added already */;
> else if (node->ms.sym && node->ms.sym->inlined)
> /* we can't handle inlined callchains */;
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] perf callchain: Handle multiple address spaces
2026-04-14 12:42 [PATCH] perf callchain: Handle multiple address spaces Thomas Richter
2026-04-14 13:13 ` sashiko-bot
2026-04-21 11:10 ` [PATCH Ping ] " Thomas Richter
@ 2026-04-21 16:30 ` Namhyung Kim
2 siblings, 0 replies; 4+ messages in thread
From: Namhyung Kim @ 2026-04-21 16:30 UTC (permalink / raw)
To: Thomas Richter
Cc: linux-kernel, linux-s390, linux-perf-users, acme, agordeev, gor,
sumanthk, hca, japo
Hello,
On Tue, Apr 14, 2026 at 02:42:41PM +0200, Thomas Richter wrote:
> perf test 'perf inject to convert DWARF callchains to regular ones'
> fails on s390. It was introduced with
> commit 92ea788d2af4 ("perf inject: Add --convert-callchain option")
>
> The failure comes the difference in output. Without the inject script to
> convert DWARF the callchains is:
> # ./perf record -F 999 --call-graph dwarf -- perf test -w noploop
> # ./perf report -i perf.data --stdio --no-children -q \
> --percent-limit=1 > /tmp/111
> # cat /tmp/111
> 99.30% perf-noploop perf [.] noploop
> |
> ---noploop
> run_workload (inlined)
> cmd_test
> run_builtin (inlined)
> handle_internal_command
> run_argv (inlined)
> main
> __libc_start_call_main
> __libc_start_main_impl (inlined)
> _start
> #
>
> With the inject script step the output is:
> # ./perf inject -i perf.data --convert-callchain -o /tmp/perf-inject-1.out
> # ./perf report -i /tmp/perf-inject-1.out --stdio --no-children -q \
> --percent-limit=1 > /tmp/222
> # cat /tmp/222
> 99.40% perf-noploop perf [.] noploop
> |
> ---noploop
> run_workload (inlined)
> cmd_test
> run_builtin (inlined)
> handle_internal_command
> run_argv (inlined)
> main
> _start
> # diff /tmp/111 /tmp/222
> 1c1
> < 99.30% perf-noploop perf [.] noploop
> ---
> > 99.40% perf-noploop perf [.] noploop
> 10,11d9
> < __libc_start_call_main
> < __libc_start_main_impl (inlined)
> #
>
> The difference are the symbols __libc_start_call_main and
> __libc_start_main_impl.
>
> On x86_64, kernel and user space share a single virtual address space,
> with the kernel mapped to the upper end of memory. The instruction
> pointer value alone is sufficient to distinguish between user space
> and kernel space addresses. This is not true for s390, which uses
> separate address spaces for user and kernel. The same virtual address
> can be valid in both address spaces, so the instruction pointer value
> alone cannot determine whether an address belongs to the kernel or
> user space. Instead, perf must rely on the cpumode metadata derived
> from the processor status word (PSW) at sample time.
>
> In function perf_event__convert_sample_callchain() the first part
> copies a kernel callchain and context entries, if any.
> It then appends additional entries ignoring the address space
> architecture. Taking that into account, the symbols at addresses
>
> 0x3ff970348cb __libc_start_call_main
> 0x3ff970349c5 __libc_start_main_impl
>
> (located after the kernel address space on s390) are now included.
>
> Output before:
> # ./perf test 83
> 83: perf inject to convert DWARF callchains to regular ones : FAILED!
>
> Output after:
> # ./perf test 83
> 83: perf inject to convert DWARF callchains to regular ones : Ok
>
> Question to Namhyung:
> In function perf_event__convert_sample_callchain() just before the
> for() loop this patch modifies, the kernel callchain is copied,
> see this comment and the next 5 lines:
> /* copy kernel callchain and context entries */
> Then why is machine__kernel_ip() needed in the for() loop, when
> the kernel entries have been copied just before the loop?
IIRC I wanted to make sure to have PERF_CONTEXT_* part in the raw
callchains.
>
> Note: This patch was tested on x86_64 virtual machine and succeeded.
>
> Fixes: 92ea788d2af4 ("perf inject: Add --convert-callchain option")
> Cc: Namhyung Kim <namhyung@kernel.org>
>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
> ---
> tools/perf/arch/common.c | 4 +++-
> tools/perf/builtin-inject.c | 3 ++-
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
> index 21836f70f231..ad0cab830a4d 100644
> --- a/tools/perf/arch/common.c
> +++ b/tools/perf/arch/common.c
> @@ -237,5 +237,7 @@ int perf_env__lookup_objdump(struct perf_env *env, char **path)
> */
> bool perf_env__single_address_space(struct perf_env *env)
> {
> - return strcmp(perf_env__arch(env), "sparc");
> + const char *arch = perf_env__arch(env);
> +
> + return strcmp(arch, "s390") && strcmp(arch, "sparc");
> }
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index f174bc69cec4..6ab20df358c4 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -438,7 +438,8 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
>
> node = cursor->first;
> for (k = 0; k < cursor->nr && i < PERF_MAX_STACK_DEPTH; k++) {
> - if (machine__kernel_ip(machine, node->ip))
> + if (machine->single_address_space &&
> + machine__kernel_ip(machine, node->ip))
> /* kernel IPs were added already */;
> else if (node->ms.sym && node->ms.sym->inlined)
> /* we can't handle inlined callchains */;
> --
> 2.53.0
>
^ permalink raw reply [flat|nested] 4+ messages in thread