The Linux Kernel Mailing List
From: Jens Remus <jremus@linux.ibm.com>
To: Steven Rostedt <rostedt@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrii Nakryiko <andrii@kernel.org>,
	Indu Bhagat <indu.bhagat@oracle.com>,
	"Jose E. Marchesi" <jemarch@gnu.org>,
	Beau Belgrave <beaub@linux.microsoft.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Florian Weimer <fweimer@redhat.com>, Kees Cook <kees@kernel.org>,
	"Carlos O'Donell" <codonell@redhat.com>,
	Sam James <sam@gentoo.org>, Dylan Hatch <dylanbhatch@google.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>
Subject: Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
Date: Fri, 8 May 2026 09:46:30 +0200
Message-ID: <43158d95-b4c2-44d2-a244-eb546fb2bfaa@linux.ibm.com>
In-Reply-To: <20260429114355.6c712e6a@gandalf.local.home>

On 4/29/2026 5:43 PM, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> [
>    This is an RFC that adds a system call for dynamic linkers to use to
>    tell the kernel where the sframe sections are when it loads dynamic
>    libraries.
> 
>    It is built on top of Jens's sframe implementation for v3:
> 
>       https://lore.kernel.org/linux-trace-kernel/20260127150554.2760964-1-jremus@linux.ibm.com/
> 
>    I have a repo with that code that this applies on top of here:
> 
>       git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git sframe/core
>        
> 
>    The name of the system call is "stacktrace_setup", but I'm not attached
>    to this name. If anyone can think of a better name I'm happy to take
>    suggestions.
> 
>    This patch is just to get the conversation going and the final result
>    may be much different. I tested this with the attached program which is a
>    major hack. I built glibc with sframe v3 support and I used readelf to
>    find the sframe size and location of glibc.
> 
>    readelf -e /work/usr/lib/libc.so.6 | grep sframe
>      [19] .sframe           GNU_SFRAME       00000000001d3fc0  001d3fc0
> 
>    Then I wrote a program that takes the above location and size of the
>    .sframe section in libc as parameters, scans /proc/self/maps to find
>    where it loaded libc and then calls this new system call with a pointer
>    to the location of the sframe along with its size, as well as where the
>    libc text is located.
> 
>    It then spins for 2 seconds, calls the system call again to remove the
>    sframe section it loaded, and spins for another 2 seconds.
> 
>    I ran perf record --call-graph fp,defer on the program and looked for
>    the do_spin() function.
> 
>    With sframe loaded:
> 
> sframe-test    1350  1396.333593:     202366 cpu/cycles/P: 
>             7fdf0ec38a44 [unknown] ([vdso])
>             5621a6b97243 get_time+0x19 (/work/c/sframe-test)
>             5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
>             5621a6b975cd main+0xd4 (/work/c/sframe-test)
>             7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
>             7fdf0ea26d05 __libc_start_main@@GLIBC_2.34+0x85 (/work/usr/lib/libc.so.6)
>             5621a6b97131 _start+0x21 (/work/c/sframe-test)
> 
>    After it unloads the sframe:
> 
> sframe-test    1350  1400.332902:     657582 cpu/cycles/P: 
>             7fdf0ec38a5e [unknown] ([vdso])
>             5621a6b97243 get_time+0x19 (/work/c/sframe-test)
>             5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
>             5621a6b97602 main+0x109 (/work/c/sframe-test)
>             7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
> 
>    As you can see, with the sframe loaded, it was able to walk further up
>    the libc library.
> 
>    Again, this is just an RFC, but I would like to get agreement on the
>    system call so that we can then update the dynamic linker to do this
>    instead of using my hack ;-)
> ]
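
For readers who want to reproduce the test setup described above, the
"scan /proc/self/maps to find where libc is loaded" step could look
roughly like the sketch below.  This is not Steven's test program (it is
not part of this mail); the function name find_text_mapping() and the
"first executable mapping whose path matches" rule are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the first executable mapping whose path
 * contains @needle.  @maps is an open stream in /proc/<pid>/maps format.
 * Returns 0 and fills @start/@end on success, -1 if nothing matched.
 */
int find_text_mapping(FILE *maps, const char *needle,
		      unsigned long *start, unsigned long *end)
{
	char line[4096];

	assert(maps && needle && start && end);

	while (fgets(line, sizeof(line), maps)) {
		unsigned long lo, hi;
		char perms[5];
		int path_off = 0;

		/* Line format: start-end perms offset dev inode path */
		if (sscanf(line, "%lx-%lx %4s %*x %*s %*u %n",
			   &lo, &hi, perms, &path_off) < 3 || path_off == 0)
			continue;

		/* Only executable mappings can hold the text segment. */
		if (perms[2] != 'x')
			continue;

		if (strstr(line + path_off, needle)) {
			*start = lo;
			*end = hi;
			return 0;
		}
	}
	return -1;
}
```

A caller would open /proc/self/maps with fopen(), pass a substring such
as "libc.so", and then combine the returned range with the .sframe
offset and size reported by readelf before invoking the system call.
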
> 
> Add a system call that can be used by dynamic linkers to tell the kernel
> where the sframe section is in memory for libraries it loads.
> 
> The system call stacktrace_setup takes 5 parameters:
> 
>   op - the type of operation to perform
>   addr_start - The virtual address of the sframe section
>   addr_length - The length of the sframe section
>   text_start - the text section the sframe represents
>   text_length - the length of the text section
> 
> The current op values are:
> 
>   STACKTRACE_REGISTER_SFRAME - This registers the sframe
>   STACKTRACE_UNREGISTER_SFRAME - This removes the sframe
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

LGTM.  Some comments/questions below.

> diff --git a/include/uapi/linux/stacktrace.h b/include/uapi/linux/stacktrace.h

> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_STACKTRACE_H
> +#define _UAPI_LINUX_STACKTRACE_H
> +
> +enum stacktrace_setup_types {
> +	STACKTRACE_REGISTER_SFRAME	= 1,
> +	STACKTRACE_UNREGISTER_SFRAME	= 2,
> +};
> +
> +#endif /* _UAPI_LINUX_STACKTRACE_H */

> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c

Having the syscall live in kernel/unwind/sframe.c means it is only
available if the config option HAVE_UNWIND_USER_SFRAME is selected (which
causes sframe.o to be built and linked into the kernel).  That makes
sense as long as it only implements sframe-specific functionality.
I suppose it could be moved elsewhere if non-sframe use cases arise in
the future?

Would Dylan need to guard it when introducing HAVE_UNWIND_KERNEL_SFRAME?
Provided the syscall fails with -ENOSYS when it is not implemented (e.g.
when HAVE_UNWIND_USER_SFRAME is not enabled), it would not matter: the
dummy implementations of sframe_add_section() and
sframe_remove_section() in linux/sframe.h also return -ENOSYS, so the
user-observable behavior would be the same.  Do you agree?
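
To illustrate why the two cases look identical from user space, here is
a minimal sketch of how a dynamic linker might treat -ENOSYS.  Nothing
in it is from the patch except the op values: struct sframe_registrar,
register_sframe() and the stub backend are hypothetical names, and the
raw system call is hidden behind a function pointer because no syscall
number has been assigned yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Op values as proposed in include/uapi/linux/stacktrace.h. */
enum stacktrace_setup_types {
	STACKTRACE_REGISTER_SFRAME	= 1,
	STACKTRACE_UNREGISTER_SFRAME	= 2,
};

/* Hypothetical raw-syscall hook: returns 0, or -1 with errno set. */
typedef long (*stacktrace_setup_fn)(int op, unsigned long addr_start,
				    unsigned long addr_length,
				    unsigned long text_start,
				    unsigned long text_length);

struct sframe_registrar {
	stacktrace_setup_fn call;
	bool unsupported;	/* kernel answered ENOSYS once */
	unsigned int attempts;	/* number of raw calls issued */
};

/*
 * Register one .sframe section.  ENOSYS is not treated as an error: the
 * caller cannot tell "syscall not wired up" from "sframe support stubbed
 * out", and in both cases unwinding simply falls back to other methods.
 */
int register_sframe(struct sframe_registrar *r,
		    unsigned long sframe, unsigned long sframe_len,
		    unsigned long text, unsigned long text_len)
{
	assert(r != NULL && r->call != NULL);

	if (r->unsupported)
		return 0;	/* do not retry for every library */

	r->attempts++;
	if (r->call(STACKTRACE_REGISTER_SFRAME, sframe, sframe_len,
		    text, text_len) == 0)
		return 0;

	if (errno == ENOSYS) {
		r->unsupported = true;
		return 0;
	}
	return -errno;
}

/* Stub backend for experiments: fails with stub_errno unless it is 0. */
int stub_errno;

long stub_backend(int op, unsigned long a, unsigned long b,
		  unsigned long c, unsigned long d)
{
	(void)op; (void)a; (void)b; (void)c; (void)d;
	if (stub_errno == 0)
		return 0;
	errno = stub_errno;
	return -1;
}
```

With stub_errno set to ENOSYS, the first register_sframe() call marks
the registrar as unsupported and later calls return without entering the
backend again.  Any other error is passed back as a negative errno
value.
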

> @@ -12,8 +12,10 @@
>  #include <linux/mm.h>
>  #include <linux/string_helpers.h>
>  #include <linux/sframe.h>
> +#include <linux/syscalls.h>
>  #include <asm/unwind_user_sframe.h>
>  #include <linux/unwind_user_types.h>
> +#include <uapi/linux/stacktrace.h>
>  
>  #include "sframe.h"
>  #include "sframe_debug.h"
> @@ -838,3 +840,38 @@ void sframe_free_mm(struct mm_struct *mm)
>  
>  	mtree_destroy(&mm->sframe_mt);
>  }
> +
> +/**
> + * sys_stacktrace_setup - register an address for user space stacktrace walking.
> + * @op: Type of operation to perform
> + * @addr_start: The virtual address of the stacktrace information
> + * @addr_length: The length of the stacktrace information
> + * @text_start: The virtual address of the text that @addr_start represents
> + * @text_length: The length of the text
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Currently only sframes are supported, but in the future, this may be used
> + * to tell the kernel about JIT code which will most likely have a different
> + * format.
> + *
> + * The type command may be extended and parameters may be used for other
> + * purposes.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, addr_start,
> +		unsigned long, addr_length, unsigned long, text_start,
> +		unsigned long, text_length)

Would it make sense to keep the parameters generic from the start,
similar to how prctl() does it?  Or can this be changed later if the
need arises?

SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, arg2,
		unsigned long, arg3, unsigned long, arg4, unsigned long, arg5)
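
As a rough user-space model of what that could look like (not kernel
code, and only a sketch): each op gives the generic arguments their
meaning, and an op that does not use an argument rejects non-zero
values so the argument can be given a meaning later.  That zero check
mirrors what several prctl() options do and is my assumption, not
something the RFC does.  The two sframe_*() functions below are
stand-ins that only record what they were called with.

```c
#include <assert.h>
#include <errno.h>

/* Op values as proposed in include/uapi/linux/stacktrace.h. */
enum stacktrace_setup_types {
	STACKTRACE_REGISTER_SFRAME	= 1,
	STACKTRACE_UNREGISTER_SFRAME	= 2,
};

/* Records the last call so the dispatch can be observed (stand-in). */
struct sframe_call {
	unsigned long sframe_start, sframe_end;
	unsigned long text_start, text_end;
	int adds, removes;
} last;

/* Stand-in for the kernel helper of the same name. */
long sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
			unsigned long text_start, unsigned long text_end)
{
	last.sframe_start = sframe_start;
	last.sframe_end = sframe_end;
	last.text_start = text_start;
	last.text_end = text_end;
	last.adds++;
	return 0;
}

/* Stand-in for the kernel helper of the same name. */
long sframe_remove_section(unsigned long sframe_start)
{
	last.sframe_start = sframe_start;
	last.removes++;
	return 0;
}

/* Model of the syscall body with prctl()-style generic arguments. */
long stacktrace_setup(int op, unsigned long arg2, unsigned long arg3,
		      unsigned long arg4, unsigned long arg5)
{
	switch (op) {
	case STACKTRACE_REGISTER_SFRAME:
		/* arg2/arg3: sframe start/length, arg4/arg5: text start/length */
		return sframe_add_section(arg2, arg2 + arg3,
					  arg4, arg4 + arg5);
	case STACKTRACE_UNREGISTER_SFRAME:
		/* Assumption: unused arguments must be zero, as in prctl(). */
		if (arg3 || arg4 || arg5)
			return -EINVAL;
		return sframe_remove_section(arg2);
	}
	return -EINVAL;
}
```

The trade-off is the usual one with prctl(): generic names keep the ABI
open for future ops, at the cost of a less self-documenting signature.
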

> +{
> +	switch (op) {
> +	case STACKTRACE_REGISTER_SFRAME:
> +		return sframe_add_section(addr_start, addr_start + addr_length,
> +					  text_start, text_start+text_length);

Nit:
					  text_start, text_start + text_length);

> +	case STACKTRACE_UNREGISTER_SFRAME:
> +		return sframe_remove_section(addr_start);
> +	}
> +	return -EINVAL;
> +}

Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


