* Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
[not found] <20260429114355.6c712e6a@gandalf.local.home>
@ 2026-05-07 12:37 ` Jose E. Marchesi
2026-05-08 1:57 ` Steven Rostedt
2026-05-08 7:46 ` Jens Remus
1 sibling, 1 reply; 4+ messages in thread
From: Jose E. Marchesi @ 2026-05-07 12:37 UTC
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, mhiramat, mathieu.desnoyers,
jremus, jpoimboe, peterz, mingo, jolsa, acme, namhyung, tglx,
andrii, indu.bhagat, beaub, torvalds, akpm, fweimer, kees,
codonell, sam, dylanbhatch, bp, dave.hansen, david, hpa,
Liam.Howlett, lorenzo.stoakes, mhocko, rppt, surenb, vbabka, hca,
gor
> +/**
> + * sys_stacktrace_setup - register an address for user space stacktrace walking.
> + * @op: Type of operation to perform
> + * @addr_start: The virtual address of the stacktrace information
> + * @addr_length: The length of the stacktrace information
> + * @text_start: The virtual address of the text that @addr_start represents
> + * @text_length: The length of the text
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Currently only sframes are supported, but in the future, this may be used
> + * to tell the kernel about JIT code which will most likely have a different
> + * format.
> + *
> + * The type command may be extended and parameters may be used for other
> + * purposes.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, addr_start,
> + unsigned long, addr_length, unsigned long, text_start,
> + unsigned long, text_length)
> +{
> + switch (op) {
> + case STACKTRACE_REGISTER_SFRAME:
> + return sframe_add_section(addr_start, addr_start + addr_length,
> + text_start, text_start+text_length);
> + case STACKTRACE_UNREGISTER_SFRAME:
> + return sframe_remove_section(addr_start);
> + }
> + return -EINVAL;
> +}
FWIW passing start and end of both the tracing data and the text segment
it covers seems reasonable to me. This covers the case in which the
same tracing data describes several text segments, which can happen with
SFrame and other similar formats.
> diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
> index 7a42b32b6577..54a99cffeec4 100644
> --- a/scripts/syscall.tbl
> +++ b/scripts/syscall.tbl
> @@ -412,3 +412,4 @@
> 469 common file_setattr sys_file_setattr
> 470 common listns sys_listns
> 471 common rseq_slice_yield sys_rseq_slice_yield
> +472 common stacktrace_setup sys_stacktrace_setup
* Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
2026-05-07 12:37 ` [RFC][PATCH] unwind: Add stacktrace_setup system call Jose E. Marchesi
@ 2026-05-08 1:57 ` Steven Rostedt
2026-05-08 7:32 ` Jose E. Marchesi
0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2026-05-08 1:57 UTC
To: Jose E. Marchesi
Cc: linux-kernel, linux-trace-kernel, mhiramat, mathieu.desnoyers,
jremus, jpoimboe, peterz, mingo, jolsa, acme, namhyung, tglx,
andrii, indu.bhagat, beaub, torvalds, akpm, fweimer, kees,
codonell, sam, dylanbhatch, bp, dave.hansen, david, hpa,
Liam.Howlett, lorenzo.stoakes, mhocko, rppt, surenb, vbabka, hca,
gor
On Thu, 07 May 2026 14:37:36 +0200
"Jose E. Marchesi" <jemarch@gnu.org> wrote:
> FWIW passing start and end of both the tracing data and the text segment
> it covers seems reasonable to me. This covers the case in which the
> same tracing data describes several text segments, which can happen with
> SFrame and other similar formats.
Just so I understand you. You are suggesting to pass in the end address
instead of the length?
-- Steve
* Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
2026-05-08 1:57 ` Steven Rostedt
@ 2026-05-08 7:32 ` Jose E. Marchesi
0 siblings, 0 replies; 4+ messages in thread
From: Jose E. Marchesi @ 2026-05-08 7:32 UTC
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, mhiramat, mathieu.desnoyers,
jremus, jpoimboe, peterz, mingo, jolsa, acme, namhyung, tglx,
andrii, indu.bhagat, beaub, torvalds, akpm, fweimer, kees,
codonell, sam, dylanbhatch, bp, dave.hansen, david, hpa,
Liam.Howlett, lorenzo.stoakes, mhocko, rppt, surenb, vbabka, hca,
gor
> On Thu, 07 May 2026 14:37:36 +0200
> "Jose E. Marchesi" <jemarch@gnu.org> wrote:
>
>> FWIW passing start and end of both the tracing data and the text segment
>> it covers seems reasonable to me. This covers the case in which the
>> same tracing data describes several text segments, which can happen with
>> SFrame and other similar formats.
>
> Just so I understand you. You are suggesting to pass in the end address
> instead of the length?
No no, I was talking conceptually. Denoting the end of the region by its
length is better. I see no reason to do otherwise in this case.
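To make the start-plus-length convention concrete, here is a minimal userspace sketch of turning each (start, length) pair into the exclusive end address that the patch passes to sframe_add_section(). The helper names range_end() and check_register_args() are hypothetical and not part of the patch, and whether sframe_add_section() already performs equivalent checks is not shown in this thread. The sketch uses the GCC/Clang builtin __builtin_add_overflow(); in-kernel code would more likely use check_add_overflow() from linux/overflow.h.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: compute the exclusive end of a region given as
 * (start, length). Returns false for an empty region or one that would
 * wrap around the end of the address space.
 */
static bool range_end(unsigned long start, unsigned long length,
		      unsigned long *end)
{
	if (length == 0)
		return false;
	/* The builtin returns true when start + length overflows. */
	return !__builtin_add_overflow(start, length, end);
}

/*
 * Hypothetical userspace model of argument validation for a
 * STACKTRACE_REGISTER_SFRAME request. It only checks the ranges and
 * does not register anything.
 */
static int check_register_args(unsigned long addr_start,
			       unsigned long addr_length,
			       unsigned long text_start,
			       unsigned long text_length)
{
	unsigned long addr_end, text_end;

	if (!range_end(addr_start, addr_length, &addr_end) ||
	    !range_end(text_start, text_length, &text_end))
		return -EINVAL;

	return 0;
}
```

Passing lengths keeps the caller's view simple (it matches what tools such as readelf report for a section), while the conversion to end addresses and any wrap-around check can stay in one place on the receiving side.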
* Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
[not found] <20260429114355.6c712e6a@gandalf.local.home>
2026-05-07 12:37 ` [RFC][PATCH] unwind: Add stacktrace_setup system call Jose E. Marchesi
@ 2026-05-08 7:46 ` Jens Remus
1 sibling, 0 replies; 4+ messages in thread
From: Jens Remus @ 2026-05-08 7:46 UTC
To: Steven Rostedt, LKML, Linux Trace Kernel
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
On 4/29/2026 5:43 PM, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> [
> This is an RFC that adds a system call for dynamic linkers to use to
> tell the kernel where the sframe sections are when it loads dynamic
> libraries.
>
> It is built on top of Jens's sframe implementation for v3:
>
> https://lore.kernel.org/linux-trace-kernel/20260127150554.2760964-1-jremus@linux.ibm.com/
>
> I have a repo with that code that this applies on top of here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git sframe/core
>
>
> The name of the system call is "stacktrace_setup", but I'm not attached
> to this name. If anyone can think of a better name I'm happy to take
> suggestions.
>
> This patch is just to get the conversation going and the final result
> may be much different. I tested this with the attached program which is a
> major hack. I built glibc with sframe v3 support and I used readelf to
> find the sframe size and location of glibc.
>
> readelf -e /work/usr/lib/libc.so.6 | grep sframe
> [19] .sframe GNU_SFRAME 00000000001d3fc0 001d3fc0
>
> Then I wrote a program that takes the above location and size of the
> .sframe section in libc as parameters, scans /proc/self/maps to find
> where it loaded libc and then calls this new system call with a pointer
> to the location of the sframe along with its size, as well as where the
> libc text is located.
>
> It then spins for 2 seconds, calls the system call again to remove the
> sframe section it loaded, and spins for another 2 seconds.
>
> I ran perf record --call-graph fp,defer on the program and looked for
> the do_spin() function.
>
> With sframe loaded:
>
> sframe-test 1350 1396.333593: 202366 cpu/cycles/P:
> 7fdf0ec38a44 [unknown] ([vdso])
> 5621a6b97243 get_time+0x19 (/work/c/sframe-test)
> 5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
> 5621a6b975cd main+0xd4 (/work/c/sframe-test)
> 7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
> 7fdf0ea26d05 __libc_start_main@@GLIBC_2.34+0x85 (/work/usr/lib/libc.so.6)
> 5621a6b97131 _start+0x21 (/work/c/sframe-test)
>
> After it unloads the sframe:
>
> sframe-test 1350 1400.332902: 657582 cpu/cycles/P:
> 7fdf0ec38a5e [unknown] ([vdso])
> 5621a6b97243 get_time+0x19 (/work/c/sframe-test)
> 5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
> 5621a6b97602 main+0x109 (/work/c/sframe-test)
> 7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
>
> As you can see, with the sframe loaded, it was able to walk further up
> the libc library.
>
> Again, this is just an RFC, but I would like to get agreement on the
> system call so that we can then update the dynamic linker to do this
> instead of using my hack ;-)
> ]
>
> Add a system call that can be used by dynamic linkers to tell the kernel
> where the sframe section is in memory for libraries it loads.
>
> The system call stacktrace_setup takes 5 parameters:
>
> op - the type of operation to perform
> addr_start - The virtual address of the sframe section
> addr_length - The length of the sframe section
> text_start - the text section the sframe represents
> text_length - the length of the text section
>
> The current op values are:
>
> STACKTRACE_REGISTER_SFRAME - This registers the sframe
> STACKTRACE_UNREGISTER_SFRAME - This removes the sframe
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
LGTM. Some comments/questions below.
> diff --git a/include/uapi/linux/stacktrace.h b/include/uapi/linux/stacktrace.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_STACKTRACE_H
> +#define _UAPI_LINUX_STACKTRACE_H
> +
> +enum stacktrace_setup_types {
> + STACKTRACE_REGISTER_SFRAME = 1,
> + STACKTRACE_UNREGISTER_SFRAME = 2,
> +};
> +
> +#endif /* _UAPI_LINUX_STACKTRACE_H */
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
Having the syscall live in kernel/unwind/sframe.c means it is only
available if config option HAVE_UNWIND_USER_SFRAME is selected (which
causes sframe.o to be built and linked into the kernel). That makes
sense as long as it only implements sframe-specific functionality.
I suppose it could be moved elsewhere if non-sframe use cases arise
in the future?
Would Dylan need to guard it when introducing HAVE_UNWIND_KERNEL_SFRAME?
Provided the syscall fails with -ENOSYS when it is not implemented (e.g.
when HAVE_UNWIND_USER_SFRAME is not enabled), it would not matter: the
dummy implementations of sframe_add_section() and sframe_remove_section()
in linux/sframe.h also return -ENOSYS, so the user-observable behavior
would be the same. Do you agree?
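The equivalence argued above can be modelled in userspace. The sketch below is an assumption-laden illustration, not kernel code: the two stubs mimic the dummy implementations mentioned for linux/sframe.h (their exact signatures are inferred from the calls in the patch), and model_stacktrace_setup() is a hypothetical stand-in for the syscall body when it is built without sframe support.

```c
#include <errno.h>

enum stacktrace_setup_types {
	STACKTRACE_REGISTER_SFRAME	= 1,
	STACKTRACE_UNREGISTER_SFRAME	= 2,
};

/*
 * Models of the dummy implementations used when sframe support is not
 * configured: both report "not implemented". Signatures are assumed.
 */
static inline int sframe_add_section(unsigned long sframe_start,
				     unsigned long sframe_end,
				     unsigned long text_start,
				     unsigned long text_end)
{
	(void)sframe_start; (void)sframe_end;
	(void)text_start; (void)text_end;
	return -ENOSYS;
}

static inline int sframe_remove_section(unsigned long sframe_start)
{
	(void)sframe_start;
	return -ENOSYS;
}

/* Hypothetical model of the syscall body compiled against the stubs. */
static int model_stacktrace_setup(int op, unsigned long addr_start,
				  unsigned long addr_length,
				  unsigned long text_start,
				  unsigned long text_length)
{
	switch (op) {
	case STACKTRACE_REGISTER_SFRAME:
		return sframe_add_section(addr_start, addr_start + addr_length,
					  text_start, text_start + text_length);
	case STACKTRACE_UNREGISTER_SFRAME:
		return sframe_remove_section(addr_start);
	}
	return -EINVAL;
}
```

In this model both defined operations return -ENOSYS, matching a kernel that lacks the syscall entirely. One difference the model makes visible: an unrecognized op yields -EINVAL from the switch, whereas a kernel without the syscall would return -ENOSYS for every op.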
> @@ -12,8 +12,10 @@
> #include <linux/mm.h>
> #include <linux/string_helpers.h>
> #include <linux/sframe.h>
> +#include <linux/syscalls.h>
> #include <asm/unwind_user_sframe.h>
> #include <linux/unwind_user_types.h>
> +#include <uapi/linux/stacktrace.h>
>
> #include "sframe.h"
> #include "sframe_debug.h"
> @@ -838,3 +840,38 @@ void sframe_free_mm(struct mm_struct *mm)
>
> mtree_destroy(&mm->sframe_mt);
> }
> +
> +/**
> + * sys_stacktrace_setup - register an address for user space stacktrace walking.
> + * @op: Type of operation to perform
> + * @addr_start: The virtual address of the stacktrace information
> + * @addr_length: The length of the stacktrace information
> + * @text_start: The virtual address of the text that @addr_start represents
> + * @text_length: The length of the text
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Currently only sframes are supported, but in the future, this may be used
> + * to tell the kernel about JIT code which will most likely have a different
> + * format.
> + *
> + * The type command may be extended and parameters may be used for other
> + * purposes.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, addr_start,
> + unsigned long, addr_length, unsigned long, text_start,
> + unsigned long, text_length)
Would it make sense to keep the parameters generic from the start, similar
to how it is done in prctl()? Or can this be changed later, if the need
arises?
SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, arg2,
unsigned long, arg3, unsigned long, arg4, unsigned long, arg5)
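As a rough illustration of what a prctl()-style signature could look like in practice, the following userspace sketch dispatches on op and gives the generic arguments their meaning per operation. Everything here is hypothetical: model_add_section() and model_remove_section() merely stand in for sframe_add_section() and sframe_remove_section(), and rejecting non-zero unused arguments is an assumption borrowed from how some prctl() options behave, not something proposed in this thread.

```c
#include <errno.h>

enum stacktrace_setup_types {
	STACKTRACE_REGISTER_SFRAME	= 1,
	STACKTRACE_UNREGISTER_SFRAME	= 2,
};

/* Hypothetical stand-in: accepts any two non-empty ranges. */
static int model_add_section(unsigned long sframe_start,
			     unsigned long sframe_end,
			     unsigned long text_start,
			     unsigned long text_end)
{
	if (sframe_start >= sframe_end || text_start >= text_end)
		return -EINVAL;
	return 0;
}

/* Hypothetical stand-in: accepts any non-zero start address. */
static int model_remove_section(unsigned long sframe_start)
{
	return sframe_start ? 0 : -EINVAL;
}

/* Hypothetical model of a syscall body with generic arguments. */
static int model_stacktrace_setup(int op, unsigned long arg2,
				  unsigned long arg3, unsigned long arg4,
				  unsigned long arg5)
{
	switch (op) {
	case STACKTRACE_REGISTER_SFRAME:
		/* arg2/arg3: sframe start and length; arg4/arg5: text start and length. */
		return model_add_section(arg2, arg2 + arg3, arg4, arg4 + arg5);
	case STACKTRACE_UNREGISTER_SFRAME:
		/*
		 * Assumption: unused arguments must be zero so that they
		 * can be given a meaning later without breaking callers.
		 */
		if (arg3 || arg4 || arg5)
			return -EINVAL;
		return model_remove_section(arg2);
	}
	return -EINVAL;
}
```

With named parameters the kernel-doc stays self-explanatory for the sframe case; with generic ones the per-op meaning has to be documented separately, but a future op (for example one describing JIT code) would not inherit parameter names that do not fit it.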
> +{
> + switch (op) {
> + case STACKTRACE_REGISTER_SFRAME:
> + return sframe_add_section(addr_start, addr_start + addr_length,
> + text_start, text_start+text_length);
Nit:
text_start, text_start + text_length);
> + case STACKTRACE_UNREGISTER_SFRAME:
> + return sframe_remove_section(addr_start);
> + }
> + return -EINVAL;
> +}
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/