* [PATCH] unwind: Add sframe_(un)register() system calls
@ 2026-05-21 22:35 Steven Rostedt
2026-05-22 9:43 ` Jens Remus
2026-05-22 14:36 ` Thomas Weißschuh
0 siblings, 2 replies; 5+ messages in thread
From: Steven Rostedt @ 2026-05-21 22:35 UTC
To: LKML, Linux Trace Kernel, bpf
Cc: Masami Hiramatsu, Mathieu Desnoyers, Jens Remus, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
From: Steven Rostedt <rostedt@goodmis.org>
Add system calls to register and unregister sframes that can be used by
dynamic linkers to tell the kernel where the sframe section is in memory
for libraries it loads.
Both system calls take a pointer to a new structure:
struct sframe_setup {
unsigned long sframe_start;
unsigned long sframe_size;
unsigned long text_start;
unsigned long text_size;
};
and a size of the passed in structure. If the system call needs to be
extended, then the structure could be changed and the size of that
structure will tell the kernel that it is the new version. If the kernel
does not recognize the structure size, it will return -EINVAL.
sframe_start - The virtual address of the sframe section
sframe_size - The length of the sframe section
text_start - the text section the sframe represents
text_size - the length of the text section
If other stack tracing functionality is added, it will require a new
system call.
The unregister only needs the sframe_start and requires all the rest of
the fields to be 0. In the future, if more can be done, then user space
can update the other values and check the return code to see if the kernel
supports it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
Based on top of Jens' patches here:
https://lore.kernel.org/linux-trace-kernel/20260520154004.3845823-1-jremus@linux.ibm.com/
[ Note, I tested this with the same program from the RFC patch ]
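[ For readers without that program at hand, here is a minimal user-space
sketch of how a caller might fill struct sframe_setup for the two calls.
It rests on assumptions: the syscall numbers are the ones proposed in this
patch (472/473 in most tables, 582/583 on alpha) and are not in any
released header, and the sframe_fill_*() and sframe_*_raw() helper names
are hypothetical. ]

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumption: numbers proposed by this patch; not in released headers. */
#ifndef __NR_sframe_register
#define __NR_sframe_register	472
#endif
#ifndef __NR_sframe_unregister
#define __NR_sframe_unregister	473
#endif

/* Local copy of the structure proposed in include/uapi/linux/sframe.h */
struct sframe_setup {
	unsigned long sframe_start;
	unsigned long sframe_size;
	unsigned long text_start;
	unsigned long text_size;
};

/* The kernel compares the passed size against its own sizeof() */
static_assert(sizeof(struct sframe_setup) == 4 * sizeof(unsigned long),
	      "no padding expected in struct sframe_setup");

/* Hypothetical helper: describe one sframe section and the text it covers */
static inline void sframe_fill_register(struct sframe_setup *s,
					unsigned long sframe_start,
					unsigned long sframe_size,
					unsigned long text_start,
					unsigned long text_size)
{
	memset(s, 0, sizeof(*s));
	s->sframe_start = sframe_start;
	s->sframe_size = sframe_size;
	s->text_start = text_start;
	s->text_size = text_size;
}

/* Hypothetical helper: unregister takes sframe_start only, the rest zero */
static inline void sframe_fill_unregister(struct sframe_setup *s,
					  unsigned long sframe_start)
{
	memset(s, 0, sizeof(*s));
	s->sframe_start = sframe_start;
}

/* Both return 0 on success, or -1 with errno set (typically ENOSYS on
 * kernels that do not carry this patch). */
static inline long sframe_register_raw(const struct sframe_setup *s)
{
	return syscall(__NR_sframe_register, s, sizeof(*s));
}

static inline long sframe_unregister_raw(const struct sframe_setup *s)
{
	return syscall(__NR_sframe_unregister, s, sizeof(*s));
}
```

[ A loader would presumably call sframe_fill_register() followed by
sframe_register_raw() after mapping a library, and the unregister pair
before unmapping it. ]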
Changes from RFC: https://patch.msgid.link/20260429114355.6c712e6a@gandalf.local.home
- Remove the ioctl() like system call for a unique system call for each
functionality. Right now there's two functionalities:
1. register sframe section
2. unregister sframe sections
- Added taking a lock around the mtree logic in __sframe_remove_section()
as Sashiko mentioned that there could be races from user space
registering and unregistering sframe sections at the same time.
- Removed [RFC] from subject as I believe this is more likely the way
this system call will be done.
arch/alpha/kernel/syscalls/syscall.tbl | 2 +
arch/arm/tools/syscall.tbl | 2 +
arch/arm64/tools/syscall_32.tbl | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 2 +
arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +
arch/parisc/kernel/syscalls/syscall.tbl | 2 +
arch/powerpc/kernel/syscalls/syscall.tbl | 2 +
arch/s390/kernel/syscalls/syscall.tbl | 3 +
arch/sh/kernel/syscalls/syscall.tbl | 2 +
arch/sparc/kernel/syscalls/syscall.tbl | 2 +
arch/x86/entry/syscalls/syscall_32.tbl | 2 +
arch/x86/entry/syscalls/syscall_64.tbl | 2 +
arch/xtensa/kernel/syscalls/syscall.tbl | 2 +
include/linux/syscalls.h | 2 +
include/uapi/asm-generic/unistd.h | 7 ++-
include/uapi/linux/sframe.h | 12 ++++
kernel/sys_ni.c | 3 +
kernel/unwind/sframe.c | 63 ++++++++++++++++++++-
scripts/syscall.tbl | 2 +
22 files changed, 118 insertions(+), 4 deletions(-)
create mode 100644 include/uapi/linux/sframe.h
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index f31b7afffc34..f0639b831f2a 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -511,3 +511,5 @@
579 common file_setattr sys_file_setattr
580 common listns sys_listns
581 common rseq_slice_yield sys_rseq_slice_yield
+582 common sframe_register sys_sframe_register
+583 common sframe_unregister sys_sframe_unregister
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 94351e22bfcf..887b242ffb25 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -486,3 +486,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/arch/arm64/tools/syscall_32.tbl b/arch/arm64/tools/syscall_32.tbl
index 62d93d88e0fe..c820f1ff718c 100644
--- a/arch/arm64/tools/syscall_32.tbl
+++ b/arch/arm64/tools/syscall_32.tbl
@@ -483,3 +483,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 248934257101..4c7f17f0364b 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -471,3 +471,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 223d26303627..e8dc2cc149f4 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -477,3 +477,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 7430714e2b8f..d0bae05d16af 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -410,3 +410,5 @@
469 n32 file_setattr sys_file_setattr
470 n32 listns sys_listns
471 n32 rseq_slice_yield sys_rseq_slice_yield
+472 n32 sframe_register sys_sframe_register
+473 n32 sframe_unregister sys_sframe_unregister
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 630aab9e5425..2e200de6a58c 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -386,3 +386,5 @@
469 n64 file_setattr sys_file_setattr
470 n64 listns sys_listns
471 n64 rseq_slice_yield sys_rseq_slice_yield
+472 n64 sframe_register sys_sframe_register
+473 n64 sframe_unregister sys_sframe_unregister
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 128653112284..0e3b82011ae2 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -459,3 +459,5 @@
469 o32 file_setattr sys_file_setattr
470 o32 listns sys_listns
471 o32 rseq_slice_yield sys_rseq_slice_yield
+472 o32 sframe_register sys_sframe_register
+473 o32 sframe_unregister sys_sframe_unregister
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index c6331dad9461..e0758ef8667d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -470,3 +470,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 4fcc7c58a105..eda40c4f4f2f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -562,3 +562,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 nospu rseq_slice_yield sys_rseq_slice_yield
+472 nospu sframe_register sys_sframe_register
+473 nospu sframe_unregister sys_sframe_unregister
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 09a7ef04d979..52519e2acdc8 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -398,3 +398,6 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common stacktrace_setup sys_stacktrace_setup
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 70b315cbe710..62ac7b1b4dd4 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -475,3 +475,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 7e71bf7fcd14..f92273ae608a 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -517,3 +517,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index f832ebd2d79b..409a50df3b21 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -477,3 +477,5 @@
469 i386 file_setattr sys_file_setattr
470 i386 listns sys_listns
471 i386 rseq_slice_yield sys_rseq_slice_yield
+472 i386 sframe_register sys_sframe_register
+473 i386 sframe_unregister sys_sframe_unregister
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 524155d655da..9b7c5a449751 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -396,6 +396,8 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
#
# Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index a9bca4e484de..037b8040f69d 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -442,3 +442,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f5639d5ac331..992ccc401c5e 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
u32 size, u32 flags);
asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
+asmlinkage long sys_sframe_register(void *data, unsigned int size);
+asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a627acc8fb5f..17042d7e5e87 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -863,8 +863,13 @@ __SYSCALL(__NR_listns, sys_listns)
#define __NR_rseq_slice_yield 471
__SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
+#define __NR_sframe_register 472
+__SYSCALL(__NR_sframe_register, sys_sframe_register)
+#define __NR_sframe_unregister 473
+__SYSCALL(__NR_sframe_unregister, sys_sframe_unregister)
+
#undef __NR_syscalls
-#define __NR_syscalls 472
+#define __NR_syscalls 474
/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
new file mode 100644
index 000000000000..137a2ebf91f4
--- /dev/null
+++ b/include/uapi/linux/sframe.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_SFRAME_H
+#define _UAPI_LINUX_SFRAME_H
+
+struct sframe_setup {
+ unsigned long sframe_start;
+ unsigned long sframe_size;
+ unsigned long text_start;
+ unsigned long text_size;
+};
+
+#endif /* _UAPI_LINUX_SFRAME_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index add3032da16f..eca5293f5d40 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -394,3 +394,6 @@ COND_SYSCALL(rseq_slice_yield);
COND_SYSCALL(uretprobe);
COND_SYSCALL(uprobe);
+
+COND_SYSCALL(sframe_register);
+COND_SYSCALL(sframe_unregister);
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index db88d993dff1..9956f1e3aba1 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -12,8 +12,10 @@
#include <linux/mm.h>
#include <linux/string_helpers.h>
#include <linux/sframe.h>
+#include <linux/syscalls.h>
#include <asm/unwind_user_sframe.h>
#include <linux/unwind_user_types.h>
+#include <uapi/linux/sframe.h>
#include "sframe.h"
#include "sframe_debug.h"
@@ -842,9 +844,11 @@ static void sframe_free_srcu(struct rcu_head *rcu)
static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
- if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
- dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
- return -EINVAL;
+ scoped_guard(mmap_read_lock, mm) {
+ if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
+ dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
+ return -EINVAL;
+ }
}
call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
@@ -936,3 +940,56 @@ void sframe_free_mm(struct mm_struct *mm)
mtree_destroy(&mm->sframe_mt);
}
+
+/**
+ * sys_sframe_register - register an address for user space stacktrace walking.
+ * @data: Structure of sframe data used to register the sframe section
+ * @size: The size of the given structure.
+ *
+ * This system call is used by dynamic library utilities to inform the kernel
+ * of meta data that it loaded that can be used by the kernel to know how
+ * to stack walk the given text locations.
+ *
+ * Return: 0 if successful, otherwise a negative error.
+ */
+SYSCALL_DEFINE2(sframe_register, __user struct sframe_setup *, data, unsigned int, size)
+{
+ struct sframe_setup sframe;
+
+ if (sizeof(sframe) != size)
+ return -EINVAL;
+
+ if (copy_from_user(&sframe, data, size))
+ return -EFAULT;
+
+ return sframe_add_section(sframe.sframe_start,
+ sframe.sframe_start + sframe.sframe_size,
+ sframe.text_start,
+ sframe.text_start + sframe.text_size);
+}
+
+/**
+ * sys_sframe_unregister - unregister an sframe address
+ * @data: Structure of sframe data used to register the sframe section
+ * @size: The size of the given structure.
+ *
+ * The data->sframe_start is the only value that is used. The rest must
+ * be zero.
+ *
+ * Return: 0 if successful, otherwise a negative error.
+ */
+SYSCALL_DEFINE2(sframe_unregister, __user struct sframe_setup *, data, unsigned int, size)
+{
+ struct sframe_setup sframe;
+
+ if (sizeof(sframe) != size)
+ return -EINVAL;
+
+ if (copy_from_user(&sframe, data, size))
+ return -EFAULT;
+
+ if (sframe.sframe_size || sframe.text_start || sframe.text_size)
+ return -EINVAL;
+
+ return sframe_remove_section(sframe.sframe_start);
+}
diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
index 7a42b32b6577..46ec22b50042 100644
--- a/scripts/syscall.tbl
+++ b/scripts/syscall.tbl
@@ -412,3 +412,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common sframe_register sys_sframe_register
+473 common sframe_unregister sys_sframe_unregister
--
2.53.0
* Re: [PATCH] unwind: Add sframe_(un)register() system calls
2026-05-21 22:35 [PATCH] unwind: Add sframe_(un)register() system calls Steven Rostedt
@ 2026-05-22 9:43 ` Jens Remus
2026-05-22 11:18 ` Steven Rostedt
2026-05-22 14:36 ` Thomas Weißschuh
1 sibling, 1 reply; 5+ messages in thread
From: Jens Remus @ 2026-05-22 9:43 UTC
To: Steven Rostedt
Cc: LKML, Linux Trace Kernel, bpf, Masami Hiramatsu,
Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
Kees Cook, Carlos O'Donell, Sam James, Dylan Hatch,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
On 5/22/2026 12:35 AM, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Add system calls to register and unregister sframes that can be used by
> dynamic linkers to tell the kernel where the sframe section is in memory
> for libraries it loads.
Why two separate system calls? Can't that be one single stacktracectl?
Could they at least be non-sframe specific, e.g. stacktrace_register
and stacktrace_unregister, so that if one were to implement e.g. unwind
user dwarf/eh_frame in the future, one could pass ehframe_start and
ehframe_end in addition to sframe_start and sframe_end?
>
> Both system calls take a pointer to a new structure:
>
> struct sframe_setup {
> unsigned long sframe_start;
> unsigned long sframe_size;
> unsigned long text_start;
> unsigned long text_size;
> };
>
> and a size of the passed in structure. If the system call needs to be
> extended, then the structure could be changed and the size of that
> structure will tell the kernel that it is the new version. If the kernel
> does not recognize the structure size, it will return -EINVAL.
>
> sframe_start - The virtual address of the sframe section
> sframe_size - The length of the sframe section
> text_start - the text section the sframe represents
> text_size - the length of the text section
>
> If other stack tracing functionality is added, it will require a new
> system call.
>
> The unregister only needs the sframe_start and requires all the rest of
> the fields to be 0. In the future, if more can be done, then user space
> can update the other values and check the return code to see if the kernel
> supports it.
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>
> Based on top of Jens' patches here:
>
> https://lore.kernel.org/linux-trace-kernel/20260520154004.3845823-1-jremus@linux.ibm.com/
>
> [ Note, I tested this with the same program from the RFC patch ]
>
> Changes from RFC: https://patch.msgid.link/20260429114355.6c712e6a@gandalf.local.home
>
> - Remove the ioctl() like system call for a unique system call for each
> functionality. Right now there's two functionalities:
> 1. register sframe section
> 2. unregister sframe sections
>
> - Added taking a lock around the mtree logic in __sframe_remove_section()
> as Sashiko mentioned that there could be races from user space
> registering and unregistering sframe sections at the same time.
Doesn't sframe_add_section() then need the same locking?
>
> - Removed [RFC] from subject as I believe this is more likely the way
> this system call will be done.
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
> asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
> u32 size, u32 flags);
> asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
> +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
>
> /*
> * Architecture-specific system calls
> diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_SFRAME_H
> +#define _UAPI_LINUX_SFRAME_H
> +
> +struct sframe_setup {
> + unsigned long sframe_start;
> + unsigned long sframe_size;
> + unsigned long text_start;
> + unsigned long text_size;
> +};
> +
> +#endif /* _UAPI_LINUX_SFRAME_H */
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> @@ -842,9 +844,11 @@ static void sframe_free_srcu(struct rcu_head *rcu)
> static int __sframe_remove_section(struct mm_struct *mm,
> struct sframe_section *sec)
> {
> - if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> - dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> - return -EINVAL;
> + scoped_guard(mmap_read_lock, mm) {
Why is a read lock sufficient? Doesn't that allow multiple readers?
How does that prevent a concurrent modification of the mm->sframe_mt?
> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> + dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> + return -EINVAL;
> + }
Is the same locking required in sframe_add_section() around the
mtree_insert_range()? If not, why not?
Wasn't the reported issue that while mt_for_each() in
sframe_remove_section() there could be concurrent mtree_erase() in
__sframe_remove_section() followed by mtree_insert_range() in
sframe_add_section(), so that the mt_for_each() could get confused?
> }
>
> call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
> @@ -936,3 +940,56 @@ void sframe_free_mm(struct mm_struct *mm)
>
> mtree_destroy(&mm->sframe_mt);
> }
> +
> +/**
> + * sys_sframe_register - register an address for user space stacktrace walking.
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_register, __user struct sframe_setup *, data, unsigned int, size)
> +{
> + struct sframe_setup sframe;
> +
> + if (sizeof(sframe) != size)
> + return -EINVAL;
> +
> + if (copy_from_user(&sframe, data, size))
> + return -EFAULT;
> +
> + return sframe_add_section(sframe.sframe_start,
> + sframe.sframe_start + sframe.sframe_size,
> + sframe.text_start,
> + sframe.text_start + sframe.text_size);
> +}
> +
> +/**
> + * sys_sframe_unregister - unregister an sframe address
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * The data->sframe_start is the only value that is used. The rest must
> + * be zero.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_unregister, __user struct sframe_setup *, data, unsigned int, size)
> +{
> + struct sframe_setup sframe;
> +
> + if (sizeof(sframe) != size)
> + return -EINVAL;
> +
> + if (copy_from_user(&sframe, data, size))
> + return -EFAULT;
> +
> + if (sframe.sframe_size || sframe.text_start || sframe.text_size)
> + return -EINVAL;
> +
> + return sframe_remove_section(sframe.sframe_start);
> +}
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
* Re: [PATCH] unwind: Add sframe_(un)register() system calls
2026-05-22 9:43 ` Jens Remus
@ 2026-05-22 11:18 ` Steven Rostedt
0 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2026-05-22 11:18 UTC
To: Jens Remus
Cc: LKML, Linux Trace Kernel, bpf, Masami Hiramatsu,
Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
Kees Cook, Carlos O'Donell, Sam James, Dylan Hatch,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
On Fri, 22 May 2026 11:43:06 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> On 5/22/2026 12:35 AM, Steven Rostedt wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> >
> > Add system calls to register and unregister sframes that can be used by
> > dynamic linkers to tell the kernel where the sframe section is in memory
> > for libraries it loads.
>
> Why two separate system calls? Can't that be one single stacktracectl?
> Could they at least be non-sframe specific, e.g. stacktrace_register
> and stacktrace_unregister, so that if one were to implement e.g. unwind
> user dwarf/eh_frame in the future, one could pass ehframe_start and
> ehframe_end in addition to sframe_start and sframe_end?
Talking with everyone at LSF/MM/BPF the consensus was to avoid an ioctl
like system call. Everyone hates them. They told me that a system call
should do one thing. They wanted a separate system call to register and to
unregister.
Note this also makes it easier to see what the user is doing when
monitoring with ftrace or strace, and for security purposes with LSMs
and seccomp.
>
> >
> > Both system calls take a pointer to a new structure:
> >
> > struct sframe_setup {
> > unsigned long sframe_start;
> > unsigned long sframe_size;
> > unsigned long text_start;
> > unsigned long text_size;
> > };
> >
> > and a size of the passed in structure. If the system call needs to be
> > extended, then the structure could be changed and the size of that
> > structure will tell the kernel that it is the new version. If the kernel
> > does not recognize the structure size, it will return -EINVAL.
> >
> > sframe_start - The virtual address of the sframe section
> > sframe_size - The length of the sframe section
> > text_start - the text section the sframe represents
> > text_size - the length of the text section
> >
> > If other stack tracing functionality is added, it will require a new
> > system call.
> >
> > The unregister only needs the sframe_start and requires all the rest of
> > the fields to be 0. In the future, if more can be done, then user space
> > can update the other values and check the return code to see if the kernel
> > supports it.
> >
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >
> > Based on top of Jens' patches here:
> >
> > https://lore.kernel.org/linux-trace-kernel/20260520154004.3845823-1-jremus@linux.ibm.com/
> >
> > [ Note, I tested this with the same program from the RFC patch ]
> >
> > Changes from RFC: https://patch.msgid.link/20260429114355.6c712e6a@gandalf.local.home
> >
> > - Remove the ioctl() like system call for a unique system call for each
> > functionality. Right now there's two functionalities:
> > 1. register sframe section
> > 2. unregister sframe sections
> >
> > - Added taking a lock around the mtree logic in __sframe_remove_section()
> > as Sashiko mentioned that there could be races from user space
> > registering and unregistering sframe sections at the same time.
>
> Doesn't sframe_add_section() then need the same locking?
Ah, I saw the lock grabbed on the vma lookup. It should also be done for the
mtree_insert_range(). Thanks, will fix.
>
> >
> > - Removed [RFC] from subject as I believe this is more likely the way
> > this system call will be done.
>
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>
> > @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
> > asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
> > u32 size, u32 flags);
> > asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
> > +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> > +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
> >
> > /*
> > * Architecture-specific system calls
>
>
> > diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
>
> > @@ -0,0 +1,12 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> > +#ifndef _UAPI_LINUX_SFRAME_H
> > +#define _UAPI_LINUX_SFRAME_H
> > +
> > +struct sframe_setup {
> > + unsigned long sframe_start;
> > + unsigned long sframe_size;
> > + unsigned long text_start;
> > + unsigned long text_size;
> > +};
> > +
> > +#endif /* _UAPI_LINUX_SFRAME_H */
>
> > diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>
> > @@ -842,9 +844,11 @@ static void sframe_free_srcu(struct rcu_head *rcu)
> > static int __sframe_remove_section(struct mm_struct *mm,
> > struct sframe_section *sec)
> > {
> > - if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> > - dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> > - return -EINVAL;
> > + scoped_guard(mmap_read_lock, mm) {
>
> Why is a read lock sufficient? Doesn't that allow multiple readers?
> How does that prevent a concurrent modification of the mm->sframe_mt?
That was a cut and paste error. I meant to change it to a write lock, but
got distracted :-p Thanks, will fix.
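To spell out why shared mode cannot serialize the erase, here is a toy,
single-threaded model of reader/writer lock admission. It is only a sketch:
it is not the kernel's mmap_lock or rwsem implementation, and every toy_*
name is made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy admission state: any number of readers, or exactly one writer */
struct toy_rwlock {
	int readers;
	bool writer;
};

/* Shared mode: admitted whenever no writer holds the lock */
static inline bool toy_try_read(struct toy_rwlock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

/* Exclusive mode: admitted only when nobody else holds the lock */
static inline bool toy_try_write(struct toy_rwlock *l)
{
	if (l->writer || l->readers)
		return false;
	l->writer = true;
	return true;
}

static inline void toy_read_unlock(struct toy_rwlock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

static inline void toy_write_unlock(struct toy_rwlock *l)
{
	assert(l->writer);
	l->writer = false;
}
```

In this model two callers that both take the lock in shared mode are
admitted together, so two erase paths guarded only by a read lock could
still run side by side; only exclusive mode keeps the second one out.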
>
> > + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> > + dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> > + return -EINVAL;
> > + }
>
> Is the same locking required in sframe_add_section() around the
> mtree_insert_range()? If not, why not?
>
> Wasn't the reported issue that while mt_for_each() in
> sframe_remove_section() there could be concurrent mtree_erase() in
> __sframe_remove_section() followed by mtree_insert_range() in
> sframe_add_section(), so that the mt_for_each() could get confused?
I'll take a closer look. But let me fix the obvious bugs first.
-- Steve
>
> > }
> >
> > call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
> > @@ -936,3 +940,56 @@ void sframe_free_mm(struct mm_struct *mm)
> >
> > mtree_destroy(&mm->sframe_mt);
> > }
* Re: [PATCH] unwind: Add sframe_(un)register() system calls
2026-05-21 22:35 [PATCH] unwind: Add sframe_(un)register() system calls Steven Rostedt
2026-05-22 9:43 ` Jens Remus
@ 2026-05-22 14:36 ` Thomas Weißschuh
2026-05-22 15:01 ` Steven Rostedt
1 sibling, 1 reply; 5+ messages in thread
From: Thomas Weißschuh @ 2026-05-22 14:36 UTC
To: Steven Rostedt
Cc: LKML, Linux Trace Kernel, bpf, Masami Hiramatsu,
Mathieu Desnoyers, Jens Remus, Josh Poimboeuf, Peter Zijlstra,
Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
Kees Cook, Carlos O'Donell, Sam James, Dylan Hatch,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
On 2026-05-21 18:35:32-0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Add system calls to register and unregister sframes that can be used by
> dynamic linkers to tell the kernel where the sframe section is in memory
> for libraries it loads.
How is this system call related to the prctl() with the same
functionality from Jens' series? I guess it will replace it,
but some explanation would be nice.
(...)
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index f5639d5ac331..992ccc401c5e 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
> asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
> u32 size, u32 flags);
> asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
> +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
Why not use the actual structure here?
> /*
> * Architecture-specific system calls
(...)
> diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
> new file mode 100644
> index 000000000000..137a2ebf91f4
> --- /dev/null
> +++ b/include/uapi/linux/sframe.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_SFRAME_H
> +#define _UAPI_LINUX_SFRAME_H
> +
> +struct sframe_setup {
> + unsigned long sframe_start;
> + unsigned long sframe_size;
> + unsigned long text_start;
> + unsigned long text_size;
> +};
This will break for compat processes, as they use a different 'unsigned
long' than the host kernel. Maybe just use __u64.
> +
> +#endif /* _UAPI_LINUX_SFRAME_H */
(...)
> +/**
> + * sys_sframe_register - register an address for user space stacktrace walking.
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_register, __user struct sframe_setup *, data, unsigned int, size)
AFAIK the normal place for the '__user' is right before '*':
struct sframe_setup __user *, data,
Use __kernel_size_t for 'size'?
> +{
> + struct sframe_setup sframe;
(...)
* Re: [PATCH] unwind: Add sframe_(un)register() system calls
2026-05-22 14:36 ` Thomas Weißschuh
@ 2026-05-22 15:01 ` Steven Rostedt
0 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2026-05-22 15:01 UTC (permalink / raw)
To: Thomas Weißschuh
On Fri, 22 May 2026 16:36:56 +0200
Thomas Weißschuh <thomas@t-8ch.de> wrote:
> On 2026-05-21 18:35:32-0400, Steven Rostedt wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> >
> > Add system calls to register and unregister sframes that can be used by
> > dynamic linkers to tell the kernel where the sframe section is in memory
> > for libraries it loads.
>
> How is this system call related to the prctl() with the same
> functionality from Jens' series? I guess it will replace it,
> but some explanation would be nice.
I thought the patch with the prctl() stated it was for debug purposes only.
From the change log:
[
This adds an interface for prctl() for testing loading of sframes for
libraries. But this interface should really be a system call. This patch
is for testing purposes only and should not be applied to mainline.
]
Hence I didn't think any explanation was needed. The prctl() patch
should never be applied upstream.
>
> (...)
>
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index f5639d5ac331..992ccc401c5e 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
> > asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
> > u32 size, u32 flags);
> > asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
> > +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> > +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
>
> Why not use the actual structure here?
Yeah, I was somewhat lazy here, as I first wanted to make sure that this is
the direction we want to go. I just need to add a forward declaration of the
structure at the top of that file.
Will update in v2.
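
A minimal userspace sketch of what such a prototype could look like, assuming a bare forward declaration of the structure is enough for syscalls.h. The `_proto` function names and the stub body are hypothetical, `__user` is defined away because it only has meaning to the kernel's sparse checker, and `size_t` stands in for `__kernel_size_t`:

```c
#include <errno.h>
#include <stddef.h>

/* Outside the kernel __user means nothing; define it away so this compiles. */
#ifndef __user
#define __user
#endif

/* Forward declaration: enough to declare pointers to the type. */
struct sframe_setup;

/* Typed prototypes, with __user placed right before the '*'. */
long sframe_register_proto(struct sframe_setup __user *data, size_t size);
long sframe_unregister_proto(struct sframe_setup __user *data, size_t size);

/* Hypothetical stub so the sketch links; it never dereferences 'data'. */
long sframe_register_proto(struct sframe_setup __user *data, size_t size)
{
	(void)size;
	return data ? 0 : -EFAULT;
}
```

The point is only that the header does not need the full structure definition to replace `void *` with a typed pointer.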
>
> > /*
> > * Architecture-specific system calls
>
> (...)
>
> > diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
> > new file mode 100644
> > index 000000000000..137a2ebf91f4
> > --- /dev/null
> > +++ b/include/uapi/linux/sframe.h
> > @@ -0,0 +1,12 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> > +#ifndef _UAPI_LINUX_SFRAME_H
> > +#define _UAPI_LINUX_SFRAME_H
> > +
> > +struct sframe_setup {
> > + unsigned long sframe_start;
> > + unsigned long sframe_size;
> > + unsigned long text_start;
> > + unsigned long text_size;
> > +};
>
> This will break for compat processes, as they use a different 'unsigned
> long' than the host kernel. Maybe just use __u64.
I'll update it. I was thinking we wouldn't support compat, but in case we
decide we should, forcing the size is better than being architecture
specific.
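
A small userspace sketch of the difference, assuming the fields simply become 64-bit. The struct names `sframe_setup_long` and `sframe_setup_u64` are hypothetical, and `uint64_t` stands in for the kernel's `__u64`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as posted: field width follows the ABI of whoever compiles it. */
struct sframe_setup_long {
	unsigned long sframe_start;
	unsigned long sframe_size;
	unsigned long text_start;
	unsigned long text_size;
};

/* Hypothetical fixed-width layout (uint64_t standing in for __u64). */
struct sframe_setup_u64 {
	uint64_t sframe_start;
	uint64_t sframe_size;
	uint64_t text_start;
	uint64_t text_size;
};

/* Same size and offsets regardless of the word size of the process. */
static_assert(sizeof(struct sframe_setup_u64) == 32,
	      "fixed layout must be 32 bytes");
static_assert(offsetof(struct sframe_setup_u64, text_size) == 24,
	      "no padding expected between fields");

/* Size of the as-posted layout: four times the width of 'unsigned long'. */
size_t sframe_setup_long_size(void)
{
	return sizeof(struct sframe_setup_long);
}
```

With typical ABIs the as-posted layout is 16 bytes for a 32-bit process and 32 bytes for a 64-bit kernel, so a size check against the kernel's own sizeof() would reject the compat caller. The fixed-width layout is 32 bytes in both cases.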
>
> > +
> > +#endif /* _UAPI_LINUX_SFRAME_H */
>
> (...)
>
> > +/**
> > + * sys_sframe_register - register an address for user space stacktrace walking.
> > + * @data: Structure of sframe data used to register the sframe section
> > + * @size: The size of the given structure.
> > + *
> > + * This system call is used by dynamic library utilities to inform the kernel
> > + * of meta data that it loaded that can be used by the kernel to know how
> > + * to stack walk the given text locations.
> > + *
> > + * Return: 0 if successful, otherwise a negative error.
> > + */
> > +SYSCALL_DEFINE2(sframe_register, __user struct sframe_setup *, data, unsigned int, size)
>
> AFAIK the normal place for the '__user' is right before '*':
>
> struct sframe_setup __user *, data,
Will update.
>
> Use __kernel_size_t for 'size'?
Looking at the history of the accept() system call that started with int
and then wanted size_t, then changed to socklen_t, I guess there's
precedent for using __kernel_size_t.
Will update.
Thanks!
-- Steve
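
For reference, a userspace sketch of the size-based versioning that the change log describes: an unrecognized structure size returns -EINVAL, and unregister needs only sframe_start with every other field 0. This is not the patch's code. The helper names are hypothetical, memcpy() stands in for copy_from_user(), the fields are assumed to be 64-bit, and returning -EINVAL for non-zero unregister fields is an assumption, since the change log does not name the error:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed first version of the structure, with fixed-width fields. */
struct sframe_setup_v1 {
	uint64_t sframe_start;
	uint64_t sframe_size;
	uint64_t text_start;
	uint64_t text_size;
};

/* Hypothetical register-side check: the size selects the version. */
long sframe_check_register(const void *udata, size_t size,
			   struct sframe_setup_v1 *out)
{
	if (size != sizeof(*out))
		return -EINVAL;		/* unknown structure version */
	memcpy(out, udata, sizeof(*out)); /* stand-in for copy_from_user() */
	return 0;
}

/* Hypothetical unregister-side check: only sframe_start may be set. */
long sframe_check_unregister(const void *udata, size_t size,
			     uint64_t *sframe_start)
{
	struct sframe_setup_v1 s;

	if (size != sizeof(s))
		return -EINVAL;
	memcpy(&s, udata, sizeof(s));
	/* Assumption: non-zero extra fields are rejected with -EINVAL. */
	if (s.sframe_size || s.text_start || s.text_size)
		return -EINVAL;
	*sframe_start = s.sframe_start;
	return 0;
}
```

A later structure version would add another accepted size to the check, which is how user space could probe for support by looking at the return code.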