* [PATCH] perf callchain: Handle multiple address spaces
@ 2026-04-14 12:42 Thomas Richter
  2026-04-21 11:10 ` [PATCH Ping ] " Thomas Richter
  2026-04-21 16:30 ` [PATCH] " Namhyung Kim
  0 siblings, 2 replies; 3+ messages in thread
From: Thomas Richter @ 2026-04-14 12:42 UTC (permalink / raw)
  To: linux-kernel, linux-s390, linux-perf-users, acme, namhyung
  Cc: agordeev, gor, sumanthk, hca, japo, Thomas Richter

perf test 'perf inject to convert DWARF callchains to regular ones'
fails on s390. It was introduced with
commit 92ea788d2af4 ("perf inject: Add --convert-callchain option")

The failure comes from a difference in output. Without the inject step to
convert the DWARF callchains, the callchain is:
 # ./perf record -F 999 --call-graph dwarf -- perf test -w noploop
 # ./perf report -i perf.data --stdio --no-children -q \
					 --percent-limit=1 > /tmp/111
 # cat /tmp/111
    99.30%  perf-noploop  perf               [.] noploop
            |
            ---noploop
               run_workload (inlined)
               cmd_test
               run_builtin (inlined)
               handle_internal_command
               run_argv (inlined)
               main
               __libc_start_call_main
               __libc_start_main_impl (inlined)
               _start
 #

With the inject step, the output is:
 # ./perf inject -i perf.data --convert-callchain -o /tmp/perf-inject-1.out
 # ./perf report -i /tmp/perf-inject-1.out --stdio --no-children -q \
		--percent-limit=1 > /tmp/222
 # cat /tmp/222
    99.40%  perf-noploop  perf               [.] noploop
            |
            ---noploop
               run_workload (inlined)
               cmd_test
               run_builtin (inlined)
               handle_internal_command
               run_argv (inlined)
               main
               _start
 # diff /tmp/111 /tmp/222
 1c1
 <     99.30%  perf-noploop  perf               [.] noploop
 ---
 >     99.40%  perf-noploop  perf               [.] noploop
 10,11d9
 <                __libc_start_call_main
 <                __libc_start_main_impl (inlined)
 #

The difference is the two symbols __libc_start_call_main and
__libc_start_main_impl, which are missing after the conversion.

On x86_64, kernel and user space share a single virtual address space,
with the kernel mapped to the upper end of memory. The instruction
pointer value alone is sufficient to distinguish between user space
and kernel space addresses. This is not true for s390, which uses
separate address spaces for user and kernel. The same virtual address
can be valid in both address spaces, so the instruction pointer value
alone cannot determine whether an address belongs to the kernel or
user space. Instead, perf must rely on the cpumode metadata derived
from the processor status word (PSW) at sample time.
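
To illustrate the paragraph above, here is a minimal sketch (not code from
perf) that contrasts an address-only check with a cpumode-based check. The
PERF_RECORD_MISC_* values mirror the perf_event UAPI header; the helper
names and the kernel_start parameter are hypothetical and exist only for
this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the perf_event UAPI header. */
#define PERF_RECORD_MISC_CPUMODE_MASK	(7 << 0)
#define PERF_RECORD_MISC_KERNEL		(1 << 0)
#define PERF_RECORD_MISC_USER		(2 << 0)

/*
 * Hypothetical helper: address-only classification. This is only
 * meaningful when kernel and user space share one address space.
 */
static bool ip_looks_like_kernel(uint64_t ip, uint64_t kernel_start)
{
	return ip >= kernel_start;
}

/*
 * Hypothetical helper: classification from the cpumode bits of the
 * sample header's misc field, independent of the address value.
 */
static bool cpumode_is_kernel(uint16_t misc)
{
	return (misc & PERF_RECORD_MISC_CPUMODE_MASK) == PERF_RECORD_MISC_KERNEL;
}
```

With a made-up low kernel start such as 0x100000, the address-only helper
reports the user space address 0x3ff970348cb as a kernel address, while the
cpumode helper still classifies a PERF_RECORD_MISC_USER sample as user space.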

In function perf_event__convert_sample_callchain() the first part
copies the kernel callchain and context entries, if any.
It then appends the remaining entries, skipping every address that
machine__kernel_ip() reports as a kernel address. That check ignores
the address space architecture. With this patch the check is applied
only on machines with a single address space, so the symbols at addresses

   0x3ff970348cb __libc_start_call_main
   0x3ff970349c5 __libc_start_main_impl

(located after the kernel address space on s390) are now included.
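
The effect of the changed condition can be modelled with a small standalone
sketch. It is a simplified model under stated assumptions, not the perf
implementation: single_address_space() follows the strcmp() logic of the
patched perf_env__single_address_space(), while kernel_ip(),
filter_user_ips() and the kernel_start values are hypothetical stand-ins.
The handling of inlined symbols and the stack depth limit are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same logic as the patched helper: false for s390 and sparc. */
static bool single_address_space(const char *arch)
{
	return strcmp(arch, "s390") && strcmp(arch, "sparc");
}

/* Hypothetical stand-in for machine__kernel_ip(): plain address compare. */
static bool kernel_ip(uint64_t ip, uint64_t kernel_start)
{
	return ip >= kernel_start;
}

/*
 * Copy callchain entries to out[]. Entries that look like kernel
 * addresses are skipped only when the address alone is a reliable
 * indicator, i.e. on single address space architectures.
 * Returns the number of entries copied.
 */
static size_t filter_user_ips(const char *arch, uint64_t kernel_start,
			      const uint64_t *ips, size_t nr, uint64_t *out)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++) {
		if (single_address_space(arch) &&
		    kernel_ip(ips[i], kernel_start))
			continue;	/* kernel IPs were added already */
		out[n++] = ips[i];
	}
	return n;
}
```

Called with arch "s390" and the two addresses above, the model keeps both
entries regardless of kernel_start. With a single address space architecture
and the same made-up low kernel_start, both entries would be dropped, which
corresponds to the behaviour seen on s390 before the patch.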

Output before:
 # ./perf test 83
 83: perf inject to convert DWARF callchains to regular ones : FAILED!

Output after:
 # ./perf test 83
 83: perf inject to convert DWARF callchains to regular ones : Ok

Question to Namhyung:
In function perf_event__convert_sample_callchain() just before the
for() loop this patch modifies, the kernel callchain is copied,
see this comment and the next 5 lines:
   /* copy kernel callchain and context entries */ 
Then why is machine__kernel_ip() needed in the for() loop, when
the kernel entries have been copied just before the loop?

Note: This patch was also tested on an x86_64 virtual machine, where the
test still succeeds.

Fixes: 92ea788d2af4 ("perf inject: Add --convert-callchain option")
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
---
 tools/perf/arch/common.c    | 4 +++-
 tools/perf/builtin-inject.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index 21836f70f231..ad0cab830a4d 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -237,5 +237,7 @@ int perf_env__lookup_objdump(struct perf_env *env, char **path)
  */
 bool perf_env__single_address_space(struct perf_env *env)
 {
-	return strcmp(perf_env__arch(env), "sparc");
+	const char *arch = perf_env__arch(env);
+
+	return strcmp(arch, "s390") && strcmp(arch, "sparc");
 }
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f174bc69cec4..6ab20df358c4 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -438,7 +438,8 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
 
 	node = cursor->first;
 	for (k = 0; k < cursor->nr && i < PERF_MAX_STACK_DEPTH; k++) {
-		if (machine__kernel_ip(machine, node->ip))
+		if (machine->single_address_space &&
+		    machine__kernel_ip(machine, node->ip))
 			/* kernel IPs were added already */;
 		else if (node->ms.sym && node->ms.sym->inlined)
 			/* we can't handle inlined callchains */;
-- 
2.53.0



* [PATCH Ping ] perf callchain: Handle multiple address spaces
  2026-04-14 12:42 [PATCH] perf callchain: Handle multiple address spaces Thomas Richter
@ 2026-04-21 11:10 ` Thomas Richter
  2026-04-21 16:30 ` [PATCH] " Namhyung Kim
  1 sibling, 0 replies; 3+ messages in thread
From: Thomas Richter @ 2026-04-21 11:10 UTC (permalink / raw)
  To: linux-kernel, linux-s390, linux-perf-users, acme, namhyung
  Cc: agordeev, gor, sumanthk, hca, japo

Friendly ping...

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany


* Re: [PATCH] perf callchain: Handle multiple address spaces
  2026-04-14 12:42 [PATCH] perf callchain: Handle multiple address spaces Thomas Richter
  2026-04-21 11:10 ` [PATCH Ping ] " Thomas Richter
@ 2026-04-21 16:30 ` Namhyung Kim
  1 sibling, 0 replies; 3+ messages in thread
From: Namhyung Kim @ 2026-04-21 16:30 UTC (permalink / raw)
  To: Thomas Richter
  Cc: linux-kernel, linux-s390, linux-perf-users, acme, agordeev, gor,
	sumanthk, hca, japo

Hello,

On Tue, Apr 14, 2026 at 02:42:41PM +0200, Thomas Richter wrote:
> Question to Namhyung:
> In function perf_event__convert_sample_callchain() just before the
> for() loop this patch modifies, the kernel callchain is copied,
> see this comment and the next 5 lines:
>    /* copy kernel callchain and context entries */ 
> Then why is machine__kernel_ip() needed in the for() loop, when
> the kernel entries have been copied just before the loop?

IIRC I wanted to make sure to have PERF_CONTEXT_* part in the raw
callchains.

> 
> Note: This patch was tested on x86_64 virtual machine and succeeded.
> 
> Fixes: 92ea788d2af4 ("perf inject: Add --convert-callchain option")
> Cc: Namhyung Kim <namhyung@kernel.org>
> 
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

