Linux Trace Kernel
 help / color / mirror / Atom feed
From: Jens Remus <jremus@linux.ibm.com>
To: Steven Rostedt <rostedt@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>,
	bpf@vger.kernel.org, Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrii Nakryiko <andrii@kernel.org>,
	Indu Bhagat <indu.bhagat@oracle.com>,
	"Jose E. Marchesi" <jemarch@gnu.org>,
	Beau Belgrave <beaub@linux.microsoft.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Florian Weimer <fweimer@redhat.com>, Kees Cook <kees@kernel.org>,
	"Carlos O'Donell" <codonell@redhat.com>,
	Sam James <sam@gentoo.org>, Dylan Hatch <dylanbhatch@google.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>
Subject: Re: [PATCH] unwind: Add sframe_(un)register() system calls
Date: Fri, 22 May 2026 11:43:06 +0200	[thread overview]
Message-ID: <d82ea328-758e-4e41-a58e-46f3137feca7@linux.ibm.com> (raw)
In-Reply-To: <20260521183532.7a145c8a@gandalf.local.home>

On 5/22/2026 12:35 AM, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> Add system calls to register and unregister sframes that can be used by
> dynamic linkers to tell the kernel where the sframe section is in memory
> for libraries it loads.

Why two separate system calls?  Can't that be one single stacktracectl?
Could they at least be non-sframe specific, e.g. stracktrace_register
and stracktrace_unregister, so that if one would implement e.g. unwind
user dwarf/eh_frame in the future one could pass ehframe_start and
ehframe_end in addition to sframe_start and sframe_end?

> 
> Both system calls take a pointer to a new structure:
> 
>   struct sframe_setup {
> 	unsigned long		sframe_start;
> 	unsigned long		sframe_size;
> 	unsigned long		text_start;
> 	unsigned long		text_size;
>   };
> 
> and a size of the passed in structure. If the system call needs to be
> extended, then the structure could be changed and the size of that
> structure will tell the kernel that it is the new version. If the kernel
> does not recognize the structure size, it will return -EINVAL.
> 
>   sframe_start - The virtual address of the sframe section
>   sframe_size  - The length of the sframe section
>   text_start   - the text section the sframe represents
>   test_size    - the length of the section
> 
> If other stack tracing functionality is added, it will require a new
> system call.
> 
> The unregister only needs the sframe_start and requires all the rest of
> the fields to be 0. In the future, if more can be done, then user space
> can update the other values and check the return code to see if the kernel
> supports it.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> 
> Based on top of Jens patches here:
> 
>   https://lore.kernel.org/linux-trace-kernel/20260520154004.3845823-1-jremus@linux.ibm.com/
> 
> [ Note, I tested this with the same program from the RFC patch ]
> 
> Changes from RFC: https://patch.msgid.link/20260429114355.6c712e6a@gandalf.local.home
> 
> - Remove the ioctl() like system call for a unique system call for each
>   functionality. Right now there's two functionalities:
>    1. register sframe section
>    2. unregister sframe sections
> 
> - Added taking a lock around the mtree logic in __sframe_remove_section()
>   as Sashiko mentioned that there could be races from user space
>   registering and unregistering sframe sections at the same time.

Doesn't sframe_add_section() then also need likewise?

> 
> - Removed [RFC] from subject as I believe this is more likely the way
>   this system call will be done.

> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h

> @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
>  asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
>  				      u32 size, u32 flags);
>  asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
> +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
>  
>  /*
>   * Architecture-specific system calls


> diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h

> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_SFRAME_H
> +#define _UAPI_LINUX_SFRAME_H
> +
> +struct sframe_setup {
> +	unsigned long		sframe_start;
> +	unsigned long		sframe_size;
> +	unsigned long		text_start;
> +	unsigned long		text_size;
> +};
> +
> +#endif /* _UAPI_LINUX_SFRAME_H */

> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c

> @@ -842,9 +844,11 @@ static void sframe_free_srcu(struct rcu_head *rcu)
>  static int __sframe_remove_section(struct mm_struct *mm,
>  				   struct sframe_section *sec)
>  {
> -	if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> -		dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> -		return -EINVAL;
> +	scoped_guard(mmap_read_lock, mm) {

Why is a read lock sufficient? Doesn't that allow multiple readers?
How does that prevent a concurrent modification of the mm->sframe_mt?

> +		if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> +			dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> +			return -EINVAL;
> +		}

Is (or why not) likewise required in sframe_add_section() for the
mtree_insert_range()?

Wasn't the reported issue that while mt_for_each() in
sframe_remove_section() there could be concurrent mtree_erase() in
__sframe_remove_section() followed by mtree_insert_range() in
sframe_add_section(), so that the mt_for_each() could get confused?

>  	}
>  
>  	call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
> @@ -936,3 +940,56 @@ void sframe_free_mm(struct mm_struct *mm)
>  
>  	mtree_destroy(&mm->sframe_mt);
>  }
> +
> +/**
> + * sys_sframe_register - register an address for user space stacktrace walking.
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_register, __user struct sframe_setup *, data, unsigned int, size)
> +{
> +	struct sframe_setup sframe;
> +
> +	if (sizeof(sframe) != size)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&sframe, data, size))
> +		return -EFAULT;
> +
> +	return sframe_add_section(sframe.sframe_start,
> +				  sframe.sframe_start + sframe.sframe_size,
> +				  sframe.text_start,
> +				  sframe.text_start + sframe.text_size);
> +}
> +
> +/**
> + * sys_sframe_unregister - unregister an sframe address
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * The data->sframe_start is the only value that is used. The rest must
> + * be zero.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_unregister, __user struct sframe_setup *, data, unsigned int, size)
> +{
> +	struct sframe_setup sframe;
> +
> +	if (sizeof(sframe) != size)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&sframe, data, size))
> +		return -EFAULT;
> +
> +	if (sframe.sframe_size || sframe.text_start || sframe.text_size)
> +		return -EINVAL;
> +
> +	return sframe_remove_section(sframe.sframe_start);
> +}
Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


  reply	other threads:[~2026-05-22  9:45 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-21 22:35 [PATCH] unwind: Add sframe_(un)register() system calls Steven Rostedt
2026-05-22  9:43 ` Jens Remus [this message]
2026-05-22 11:18   ` Steven Rostedt
2026-05-22 14:36 ` Thomas Weißschuh
2026-05-22 15:01   ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d82ea328-758e-4e41-a58e-46f3137feca7@linux.ibm.com \
    --to=jremus@linux.ibm.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=acme@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=beaub@linux.microsoft.com \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=codonell@redhat.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=dylanbhatch@google.com \
    --cc=fweimer@redhat.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hpa@zytor.com \
    --cc=indu.bhagat@oracle.com \
    --cc=jemarch@gnu.org \
    --cc=jolsa@kernel.org \
    --cc=jpoimboe@kernel.org \
    --cc=kees@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@kernel.org \
    --cc=rppt@kernel.org \
    --cc=sam@gentoo.org \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox