From: Jens Remus <jremus@linux.ibm.com>
To: Steven Rostedt <rostedt@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>,
bpf@vger.kernel.org, Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Andrii Nakryiko <andrii@kernel.org>,
Indu Bhagat <indu.bhagat@oracle.com>,
"Jose E. Marchesi" <jemarch@gnu.org>,
Beau Belgrave <beaub@linux.microsoft.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Florian Weimer <fweimer@redhat.com>, Kees Cook <kees@kernel.org>,
"Carlos O'Donell" <codonell@redhat.com>,
Sam James <sam@gentoo.org>, Dylan Hatch <dylanbhatch@google.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>
Subject: Re: [PATCH] unwind: Add sframe_(un)register() system calls
Date: Fri, 22 May 2026 11:43:06 +0200 [thread overview]
Message-ID: <d82ea328-758e-4e41-a58e-46f3137feca7@linux.ibm.com> (raw)
In-Reply-To: <20260521183532.7a145c8a@gandalf.local.home>
On 5/22/2026 12:35 AM, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Add system calls to register and unregister sframes that can be used by
> dynamic linkers to tell the kernel where the sframe section is in memory
> for libraries it loads.
Why two separate system calls? Can't that be one single stacktracectl?
Could they at least be non-sframe specific, e.g. stracktrace_register
and stracktrace_unregister, so that if one would implement e.g. unwind
user dwarf/eh_frame in the future one could pass ehframe_start and
ehframe_end in addition to sframe_start and sframe_end?
>
> Both system calls take a pointer to a new structure:
>
> struct sframe_setup {
> unsigned long sframe_start;
> unsigned long sframe_size;
> unsigned long text_start;
> unsigned long text_size;
> };
>
> and a size of the passed in structure. If the system call needs to be
> extended, then the structure could be changed and the size of that
> structure will tell the kernel that it is the new version. If the kernel
> does not recognize the structure size, it will return -EINVAL.
>
> sframe_start - The virtual address of the sframe section
> sframe_size - The length of the sframe section
> text_start - the text section the sframe represents
> test_size - the length of the section
>
> If other stack tracing functionality is added, it will require a new
> system call.
>
> The unregister only needs the sframe_start and requires all the rest of
> the fields to be 0. In the future, if more can be done, then user space
> can update the other values and check the return code to see if the kernel
> supports it.
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>
> Based on top of Jens patches here:
>
> https://lore.kernel.org/linux-trace-kernel/20260520154004.3845823-1-jremus@linux.ibm.com/
>
> [ Note, I tested this with the same program from the RFC patch ]
>
> Changes from RFC: https://patch.msgid.link/20260429114355.6c712e6a@gandalf.local.home
>
> - Remove the ioctl() like system call for a unique system call for each
> functionality. Right now there's two functionalities:
> 1. register sframe section
> 2. unregister sframe sections
>
> - Added taking a lock around the mtree logic in __sframe_remove_section()
> as Sashiko mentioned that there could be races from user space
> registering and unregistering sframe sections at the same time.
Doesn't sframe_add_section() then also need likewise?
>
> - Removed [RFC] from subject as I believe this is more likely the way
> this system call will be done.
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
> asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
> u32 size, u32 flags);
> asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
> +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
>
> /*
> * Architecture-specific system calls
> diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_SFRAME_H
> +#define _UAPI_LINUX_SFRAME_H
> +
> +struct sframe_setup {
> + unsigned long sframe_start;
> + unsigned long sframe_size;
> + unsigned long text_start;
> + unsigned long text_size;
> +};
> +
> +#endif /* _UAPI_LINUX_SFRAME_H */
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> @@ -842,9 +844,11 @@ static void sframe_free_srcu(struct rcu_head *rcu)
> static int __sframe_remove_section(struct mm_struct *mm,
> struct sframe_section *sec)
> {
> - if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> - dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> - return -EINVAL;
> + scoped_guard(mmap_read_lock, mm) {
Why is a read lock sufficient? Doesn't that allow multiple readers?
How does that prevent a concurrent modification of the mm->sframe_mt?
> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> + dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> + return -EINVAL;
> + }
Is (or why not) likewise required in sframe_add_section() for the
mtree_insert_range()?
Wasn't the reported issue that while mt_for_each() in
sframe_remove_section() there could be concurrent mtree_erase() in
__sframe_remove_section() followed by mtree_insert_range() in
sframe_add_section(), so that the mt_for_each() could get confused?
> }
>
> call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
> @@ -936,3 +940,56 @@ void sframe_free_mm(struct mm_struct *mm)
>
> mtree_destroy(&mm->sframe_mt);
> }
> +
> +/**
> + * sys_sframe_register - register an address for user space stacktrace walking.
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_register, __user struct sframe_setup *, data, unsigned int, size)
> +{
> + struct sframe_setup sframe;
> +
> + if (sizeof(sframe) != size)
> + return -EINVAL;
> +
> + if (copy_from_user(&sframe, data, size))
> + return -EFAULT;
> +
> + return sframe_add_section(sframe.sframe_start,
> + sframe.sframe_start + sframe.sframe_size,
> + sframe.text_start,
> + sframe.text_start + sframe.text_size);
> +}
> +
> +/**
> + * sys_sframe_unregister - unregister an sframe address
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * The data->sframe_start is the only value that is used. The rest must
> + * be zero.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_unregister, __user struct sframe_setup *, data, unsigned int, size)
> +{
> + struct sframe_setup sframe;
> +
> + if (sizeof(sframe) != size)
> + return -EINVAL;
> +
> + if (copy_from_user(&sframe, data, size))
> + return -EFAULT;
> +
> + if (sframe.sframe_size || sframe.text_start || sframe.text_size)
> + return -EINVAL;
> +
> + return sframe_remove_section(sframe.sframe_start);
> +}
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
next prev parent reply other threads:[~2026-05-22 9:45 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-21 22:35 [PATCH] unwind: Add sframe_(un)register() system calls Steven Rostedt
2026-05-22 9:43 ` Jens Remus [this message]
2026-05-22 11:18 ` Steven Rostedt
2026-05-22 14:36 ` Thomas Weißschuh
2026-05-22 15:01 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d82ea328-758e-4e41-a58e-46f3137feca7@linux.ibm.com \
--to=jremus@linux.ibm.com \
--cc=Liam.Howlett@oracle.com \
--cc=acme@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=beaub@linux.microsoft.com \
--cc=bp@alien8.de \
--cc=bpf@vger.kernel.org \
--cc=codonell@redhat.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=dylanbhatch@google.com \
--cc=fweimer@redhat.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=hpa@zytor.com \
--cc=indu.bhagat@oracle.com \
--cc=jemarch@gnu.org \
--cc=jolsa@kernel.org \
--cc=jpoimboe@kernel.org \
--cc=kees@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mhocko@suse.com \
--cc=mingo@kernel.org \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@kernel.org \
--cc=rppt@kernel.org \
--cc=sam@gentoo.org \
--cc=surenb@google.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox