From: Jens Remus <jremus@linux.ibm.com>
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
x86@kernel.org, Steven Rostedt <rostedt@kernel.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Indu Bhagat <ibhagatgnu@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Dylan Hatch <dylanbhatch@google.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Kees Cook <kees@kernel.org>, Sam James <sam@gentoo.org>
Cc: Jens Remus <jremus@linux.ibm.com>,
bpf@vger.kernel.org, linux-mm@kvack.org,
Namhyung Kim <namhyung@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
"Jose E. Marchesi" <jemarch@gnu.org>,
Beau Belgrave <beaub@linux.microsoft.com>,
Florian Weimer <fweimer@redhat.com>,
"Carlos O'Donell" <codonell@redhat.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Jiri Olsa <jolsa@kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Ilya Leoshkevich <iii@linux.ibm.com>
Subject: [PATCH v14 00/19] unwind_deferred: Implement sframe handling
Date: Tue, 5 May 2026 14:16:59 +0200 [thread overview]
Message-ID: <20260505121718.3572346-1-jremus@linux.ibm.com> (raw)
This is the implementation of parsing the SFrame V3 stack trace information
from an .sframe section in an ELF file. It's a continuation of Josh's and
Steve's work that can be found here:
https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
Currently the only way to get a user space stack trace from a stack
walk (and not just copying large amount of user stack into the kernel
ring buffer) is to use frame pointers. This has a few issues. The biggest
one is that compiling frame pointers into every application and library
has been shown to cause performance overhead.
Another issue is that the format of the frames may not always be consistent
between different compilers and some architectures (s390) has no defined
format to do a reliable stack walk. The only way to perform user space
profiling on these architectures is to copy the user stack into the kernel
buffer.
SFrame [1] is now supported in binutils (x86-64, ARM64, and s390). There is
discussions going on about supporting SFrame in LLVM. SFrame acts more like
ORC, and lives in the ELF executable file as its own section. Like ORC it
has two tables where the first table is sorted by instruction pointers (IP)
and using the current IP and finding it's entry in the first table, it will
take you to the second table which will tell you where the return address
of the current function is located and then you can use that address to
look it up in the first table to find the return address of that function,
and so on. This performs a user space stack walk.
Now because the .sframe section lives in the ELF file it needs to be faulted
into memory when it is used. This means that walking the user space stack
requires being in a faultable context. As profilers like perf request a stack
trace in interrupt or NMI context, it cannot do the walking when it is
requested. Instead it must be deferred until it is safe to fault in user
space. One place this is known to be safe is when the task is about to return
back to user space.
This series makes the deferred unwind user code implement SFrame format V3
and enables it on x86-64.
[1]: https://sourceware.org/binutils/wiki/sframe
This series applies on top of v7.1-rc2 tag:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git v7.1-rc2
The to be stack-traced user space programs (and libraries) need to be
built with the recent SFrame stack trace information format V3, as
generated by binutils 2.46+ with assembler option --gsframe-3.
Namhyung Kim's related perf tools deferred callchain support can be used
for testing ("perf record --call-graph fp,defer" and "perf report/script").
Changes since v13 (see patch notes for details):
- Rebase on v7.1-rc2.
- Correct SFRAME_V3_FDE_TYPE_MASK value.
- Fix FDE function start address check in __read_fde().
- Rename SFrame V3 definitions accoring to final specification. (Indu)
- Improve comments on why UNWIND_USER_RULE_CFA_OFFSET is not
implemented. (Mark Rutland)
- Add/update/improve sframe debug messages.
- Add generic and arch-specific unwind_user.h to MAINTAINERS.
- Add arch-specific unwind_user_sframe.h to MAINTAINERS.
Changes since v12 (see patch notes for details):
- Add support for SFrame V3, including its new flexible FDEs. SFrame V2
is not supported.
Changes since v11 (see patch notes for details):
- Adjust to Peter's latest undwind user enhancements.
- Simplify logic by using an internal SFrame FDE representation, whose
FDE function start address field is an address instead of a PC-relative
offset (from FDE).
- Rename struct sframe_fre to sframe_fre_internal to align with
struct sframe_fde_internal.
- Remove unused pt_regs from unwind_user_next_common() and its
callers. (Peter)
- Simplify unwind_user_next_sframe(). (Peter)
- Fix a few checkpatch errors and warnings.
- Minor cleanups (e.g. move includes, fix indentation).
Changes since v10:
- Support for SFrame V2 PC-relative FDE function start address.
- Support for SFrame V2 representing RA undefined as indication for
outermost frames.
Patch 1 (new), as a preparatory cleanup, adds the generic and arch-specific
unwind_user.h to MAINTAINERS.
Patches 2, 5, 12, and 18 have been updated to exclusively support the
latest SFrame V3 stack trace information format, that is generated by
binutils 2.46+. Old SFrame V2 sections get rejected with dynamic debug
message "bad/unsupported sframe header".
Patches 8 and 9 add support to unwind user (sframe) for outermost frames.
Patches 13-16 add support to unwind user (sframe) for the new SFrame V3
flexible FDEs.
Patch 17 improves the performance of searching the SFrame FRE for an IP.
Regards,
Jens
Jens Remus (8):
unwind_user: Add generic and arch-specific headers to MAINTAINERS
unwind_user: Stop when reaching an outermost frame
unwind_user/sframe: Add support for outermost frame indication
unwind_user: Enable archs that pass RA in a register
unwind_user: Flexible FP/RA recovery rules
unwind_user: Flexible CFA recovery rules
unwind_user/sframe: Add support for SFrame V3 flexible FDEs
unwind_user/sframe: Separate reading of FRE from reading of FRE data
words
Josh Poimboeuf (11):
unwind_user/sframe: Add support for reading .sframe headers
unwind_user/sframe: Store .sframe section data in per-mm maple tree
x86/uaccess: Add unsafe_copy_from_user() implementation
unwind_user/sframe: Add support for reading .sframe contents
unwind_user/sframe: Detect .sframe sections in executables
unwind_user/sframe: Wire up unwind_user to sframe
unwind_user/sframe: Remove .sframe section on detected corruption
unwind_user/sframe: Show file name in debug output
unwind_user/sframe: Add .sframe validation option
unwind_user/sframe/x86: Enable sframe unwinding on x86
unwind_user/sframe: Add prctl() interface for registering .sframe
sections
MAINTAINERS | 4 +
arch/Kconfig | 23 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mmu.h | 2 +-
arch/x86/include/asm/uaccess.h | 39 +-
arch/x86/include/asm/unwind_user.h | 68 +-
arch/x86/include/asm/unwind_user_sframe.h | 12 +
fs/binfmt_elf.c | 48 +-
include/linux/mm_types.h | 3 +
include/linux/sframe.h | 60 ++
include/linux/unwind_user.h | 18 +
include/linux/unwind_user_types.h | 46 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 6 +-
kernel/fork.c | 10 +
kernel/sys.c | 8 +
kernel/unwind/Makefile | 3 +-
kernel/unwind/sframe.c | 842 ++++++++++++++++++++++
kernel/unwind/sframe.h | 87 +++
kernel/unwind/sframe_debug.h | 68 ++
kernel/unwind/user.c | 111 ++-
mm/init-mm.c | 2 +
22 files changed, 1423 insertions(+), 39 deletions(-)
create mode 100644 arch/x86/include/asm/unwind_user_sframe.h
create mode 100644 include/linux/sframe.h
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
create mode 100644 kernel/unwind/sframe_debug.h
--
2.51.0
next reply other threads:[~2026-05-05 12:18 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 12:16 Jens Remus [this message]
2026-05-05 12:17 ` [PATCH v14 01/19] unwind_user: Add generic and arch-specific headers to MAINTAINERS Jens Remus
2026-05-05 12:17 ` [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
2026-05-05 12:17 ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Jens Remus
2026-05-05 12:17 ` [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
2026-05-05 12:17 ` [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
2026-05-05 12:17 ` [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
2026-05-05 12:17 ` [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
2026-05-05 12:17 ` [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame Jens Remus
2026-05-05 12:17 ` [PATCH v14 09/19] unwind_user/sframe: Add support for outermost frame indication Jens Remus
2026-05-05 12:17 ` [PATCH v14 10/19] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
2026-05-05 12:17 ` [PATCH v14 11/19] unwind_user/sframe: Show file name in debug output Jens Remus
2026-05-05 12:17 ` [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option Jens Remus
2026-05-05 12:17 ` [PATCH v14 13/19] unwind_user: Enable archs that pass RA in a register Jens Remus
2026-05-05 12:17 ` [PATCH v14 14/19] unwind_user: Flexible FP/RA recovery rules Jens Remus
2026-05-05 12:17 ` [PATCH v14 15/19] unwind_user: Flexible CFA " Jens Remus
2026-05-05 12:17 ` [PATCH v14 16/19] unwind_user/sframe: Add support for SFrame V3 flexible FDEs Jens Remus
2026-05-05 12:17 ` [PATCH v14 17/19] unwind_user/sframe: Separate reading of FRE from reading of FRE data words Jens Remus
2026-05-05 12:17 ` [PATCH v14 18/19] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
2026-05-05 12:17 ` [PATCH v14 19/19] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
2026-05-05 12:25 ` [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260505121718.3572346-1-jremus@linux.ibm.com \
--to=jremus@linux.ibm.com \
--cc=acme@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=beaub@linux.microsoft.com \
--cc=bp@alien8.de \
--cc=bpf@vger.kernel.org \
--cc=codonell@redhat.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@kernel.org \
--cc=dylanbhatch@google.com \
--cc=fweimer@redhat.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=hpa@zytor.com \
--cc=ibhagatgnu@gmail.com \
--cc=iii@linux.ibm.com \
--cc=jemarch@gnu.org \
--cc=jolsa@kernel.org \
--cc=jpoimboe@kernel.org \
--cc=kees@kernel.org \
--cc=liam@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=ljs@kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mhocko@suse.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@kernel.org \
--cc=rppt@kernel.org \
--cc=sam@gentoo.org \
--cc=surenb@google.com \
--cc=tglx@kernel.org \
--cc=vbabka@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox