* [PATCH v14 00/19] unwind_deferred: Implement sframe handling
@ 2026-05-05 12:16 Jens Remus
2026-05-05 12:17 ` [PATCH v14 01/19] unwind_user: Add generic and arch-specific headers to MAINTAINERS Jens Remus
` (19 more replies)
0 siblings, 20 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:16 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
This is the implementation of parsing the SFrame V3 stack trace information
from an .sframe section in an ELF file. It's a continuation of Josh's and
Steve's work that can be found here:
https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
Currently the only way to get a user space stack trace from a stack
walk (rather than copying a large amount of the user stack into the kernel
ring buffer) is to use frame pointers. This has a few issues. The biggest
one is that compiling frame pointers into every application and library
has been shown to cause performance overhead.
Another issue is that the format of the frames may not always be consistent
between different compilers, and some architectures (s390) have no defined
format to do a reliable stack walk. The only way to perform user space
profiling on these architectures is to copy the user stack into the kernel
buffer.
SFrame [1] is now supported in binutils (x86-64, ARM64, and s390). There are
discussions going on about supporting SFrame in LLVM. SFrame acts more like
ORC, and lives in the ELF executable file as its own section. Like ORC it
has two tables, where the first table is sorted by instruction pointer (IP).
Looking up the current IP in the first table leads to an entry in the second
table, which tells you where the return address of the current function is
located. That return address can then be looked up in the first table to
find the return address of the calling function, and so on. This performs a
user space stack walk.
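The two-table walk described above can be sketched in plain user space C.
This is only a toy model under stated assumptions: the toy_* structures and
functions are hypothetical and do not reflect the real SFrame encoding or
the kernel code in this series, the stack is simulated as an array of
8-byte slots, and return addresses are looked up as-is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the two tables. */
struct toy_fde {		/* first table: one entry per function, sorted by start */
	uint64_t start;		/* function start address */
	uint64_t size;		/* function size in bytes */
	size_t fre_idx;		/* index of the function's first row in the second table */
	size_t fre_num;		/* number of rows belonging to the function */
};

struct toy_fre {		/* second table: one row per IP range within a function */
	uint64_t ip_off;	/* row applies from start + ip_off onwards */
	int64_t cfa_off;	/* CFA = SP + cfa_off */
	int64_t ra_off;		/* return address is stored at CFA + ra_off */
};

/* Binary search the sorted first table for the function containing ip. */
static const struct toy_fde *toy_find_fde(const struct toy_fde *fdes, size_t n,
					  uint64_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < fdes[mid].start)
			hi = mid;
		else if (ip - fdes[mid].start >= fdes[mid].size)
			lo = mid + 1;
		else
			return &fdes[mid];
	}
	return NULL;
}

/* Pick the last row whose ip_off is <= the offset of ip into the function. */
static const struct toy_fre *toy_find_fre(const struct toy_fde *fde,
					  const struct toy_fre *fres, uint64_t ip)
{
	const struct toy_fre *found = NULL;

	for (size_t i = 0; i < fde->fre_num; i++) {
		const struct toy_fre *fre = &fres[fde->fre_idx + i];

		if (fre->ip_off > ip - fde->start)
			break;
		found = fre;
	}
	return found;
}

/* Walk a simulated stack: mem[] holds 8-byte slots starting at mem_base. */
static size_t toy_walk(const struct toy_fde *fdes, size_t nfdes,
		       const struct toy_fre *fres, const uint64_t *mem,
		       size_t nslots, uint64_t mem_base, uint64_t ip,
		       uint64_t sp, uint64_t *trace, size_t max)
{
	size_t depth = 0;

	while (depth < max) {
		const struct toy_fde *fde = toy_find_fde(fdes, nfdes, ip);
		const struct toy_fre *fre = fde ? toy_find_fre(fde, fres, ip) : NULL;
		uint64_t cfa, ra_addr;

		if (!fre)
			break;		/* IP not covered: stop the walk */
		trace[depth++] = ip;
		cfa = sp + (uint64_t)fre->cfa_off;
		ra_addr = cfa + (uint64_t)fre->ra_off;
		if (ra_addr < mem_base || (ra_addr - mem_base) / 8 >= nslots)
			break;		/* return address slot is off the stack */
		ip = mem[(ra_addr - mem_base) / 8];
		sp = cfa;		/* caller's SP is the callee's CFA */
	}
	return depth;
}
```

Each iteration does one lookup in the first table, one in the second, and
one stack read, which is the loop the paragraph above describes in words.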
Now because the .sframe section lives in the ELF file it needs to be faulted
into memory when it is used. This means that walking the user space stack
requires being in a faultable context. As profilers like perf request a stack
trace in interrupt or NMI context, it cannot do the walking when it is
requested. Instead it must be deferred until it is safe to fault in user
space. One place this is known to be safe is when the task is about to return
back to user space.
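The deferral idea can be illustrated with a small toy model. This is a
sketch only: struct toy_task and the toy_* functions are hypothetical names
invented for illustration and are not the unwind_deferred API. The request
made from "interrupt context" merely records that a trace is wanted, and
the actual walk runs later at a point where faulting is allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state for the toy model. */
struct toy_task {
	bool unwind_pending;	/* a stack trace was requested */
	int unwinds_done;	/* number of walks actually performed */
};

/* Called from "interrupt/NMI context": must not fault, so only set a flag. */
static void toy_request_unwind(struct toy_task *t)
{
	t->unwind_pending = true;
}

/* Called on "return to user space": faulting is safe, do the walk now. */
static void toy_return_to_user(struct toy_task *t)
{
	if (!t->unwind_pending)
		return;
	t->unwind_pending = false;
	t->unwinds_done++;	/* stands in for the real stack walk */
}
```

In this toy model, several requests made before the task returns to user
space collapse into a single walk.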
This series makes the deferred unwind user code implement SFrame format V3
and enables it on x86-64.
[1]: https://sourceware.org/binutils/wiki/sframe
This series applies on top of v7.1-rc2 tag:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git v7.1-rc2
User space programs (and libraries) that are to be stack-traced need to be
built with the recent SFrame stack trace information format V3, as
generated by binutils 2.46+ with assembler option --gsframe-3.
Namhyung Kim's related perf tools deferred callchain support can be used
for testing ("perf record --call-graph fp,defer" and "perf report/script").
Changes since v13 (see patch notes for details):
- Rebase on v7.1-rc2.
- Correct SFRAME_V3_FDE_TYPE_MASK value.
- Fix FDE function start address check in __read_fde().
- Rename SFrame V3 definitions according to final specification. (Indu)
- Improve comments on why UNWIND_USER_RULE_CFA_OFFSET is not
implemented. (Mark Rutland)
- Add/update/improve sframe debug messages.
- Add generic and arch-specific unwind_user.h to MAINTAINERS.
- Add arch-specific unwind_user_sframe.h to MAINTAINERS.
Changes since v12 (see patch notes for details):
- Add support for SFrame V3, including its new flexible FDEs. SFrame V2
is not supported.
Changes since v11 (see patch notes for details):
- Adjust to Peter's latest unwind user enhancements.
- Simplify logic by using an internal SFrame FDE representation, whose
FDE function start address field is an address instead of a PC-relative
offset (from FDE).
- Rename struct sframe_fre to sframe_fre_internal to align with
struct sframe_fde_internal.
- Remove unused pt_regs from unwind_user_next_common() and its
callers. (Peter)
- Simplify unwind_user_next_sframe(). (Peter)
- Fix a few checkpatch errors and warnings.
- Minor cleanups (e.g. move includes, fix indentation).
Changes since v10:
- Support for SFrame V2 PC-relative FDE function start address.
- Support for SFrame V2 representing RA undefined as indication for
outermost frames.
Patch 1 (new), as a preparatory cleanup, adds the generic and arch-specific
unwind_user.h to MAINTAINERS.
Patches 2, 5, 12, and 18 have been updated to exclusively support the
latest SFrame V3 stack trace information format, which is generated by
binutils 2.46+. Old SFrame V2 sections get rejected with dynamic debug
message "bad/unsupported sframe header".
Patches 8 and 9 add support to unwind user (sframe) for outermost frames.
Patches 13-16 add support to unwind user (sframe) for the new SFrame V3
flexible FDEs.
Patch 17 improves the performance of searching the SFrame FRE for an IP.
Regards,
Jens
Jens Remus (8):
unwind_user: Add generic and arch-specific headers to MAINTAINERS
unwind_user: Stop when reaching an outermost frame
unwind_user/sframe: Add support for outermost frame indication
unwind_user: Enable archs that pass RA in a register
unwind_user: Flexible FP/RA recovery rules
unwind_user: Flexible CFA recovery rules
unwind_user/sframe: Add support for SFrame V3 flexible FDEs
unwind_user/sframe: Separate reading of FRE from reading of FRE data
words
Josh Poimboeuf (11):
unwind_user/sframe: Add support for reading .sframe headers
unwind_user/sframe: Store .sframe section data in per-mm maple tree
x86/uaccess: Add unsafe_copy_from_user() implementation
unwind_user/sframe: Add support for reading .sframe contents
unwind_user/sframe: Detect .sframe sections in executables
unwind_user/sframe: Wire up unwind_user to sframe
unwind_user/sframe: Remove .sframe section on detected corruption
unwind_user/sframe: Show file name in debug output
unwind_user/sframe: Add .sframe validation option
unwind_user/sframe/x86: Enable sframe unwinding on x86
unwind_user/sframe: Add prctl() interface for registering .sframe
sections
MAINTAINERS | 4 +
arch/Kconfig | 23 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mmu.h | 2 +-
arch/x86/include/asm/uaccess.h | 39 +-
arch/x86/include/asm/unwind_user.h | 68 +-
arch/x86/include/asm/unwind_user_sframe.h | 12 +
fs/binfmt_elf.c | 48 +-
include/linux/mm_types.h | 3 +
include/linux/sframe.h | 60 ++
include/linux/unwind_user.h | 18 +
include/linux/unwind_user_types.h | 46 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 6 +-
kernel/fork.c | 10 +
kernel/sys.c | 8 +
kernel/unwind/Makefile | 3 +-
kernel/unwind/sframe.c | 842 ++++++++++++++++++++++
kernel/unwind/sframe.h | 87 +++
kernel/unwind/sframe_debug.h | 68 ++
kernel/unwind/user.c | 111 ++-
mm/init-mm.c | 2 +
22 files changed, 1423 insertions(+), 39 deletions(-)
create mode 100644 arch/x86/include/asm/unwind_user_sframe.h
create mode 100644 include/linux/sframe.h
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
create mode 100644 kernel/unwind/sframe_debug.h
--
2.51.0
* [PATCH v14 01/19] unwind_user: Add generic and arch-specific headers to MAINTAINERS
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
` (18 subsequent siblings)
19 siblings, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
Commit 71753c6ed2bf ("unwind_user: Add user space unwinding API with
frame pointer support") introduced include/asm-generic/unwind_user.h
without adding it to MAINTAINERS. The same applies to arch-specific
versions, such as arch/x86/include/asm/unwind_user.h, which was
introduced by commit 49cf34c0815f ("unwind_user/x86: Enable frame
pointer unwinding on x86"). Add both to MAINTAINERS.
Suggested-by: Dylan Hatch <dylanbhatch@google.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v14:
- New patch.
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 882214b0e7db..8e71c6e7a68a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27862,6 +27862,8 @@ USERSPACE STACK UNWINDING
M: Josh Poimboeuf <jpoimboe@kernel.org>
M: Steven Rostedt <rostedt@goodmis.org>
S: Maintained
+F: arch/*/include/asm/unwind_user.h
+F: include/asm-generic/unwind_user.h
F: include/linux/unwind*.h
F: kernel/unwind/
--
2.51.0
* [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
2026-05-05 12:17 ` [PATCH v14 01/19] unwind_user: Add generic and arch-specific headers to MAINTAINERS Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 12:49 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Jens Remus
` (17 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
In preparation for unwinding user space stacks with sframe, add basic
sframe compile infrastructure and support for reading the .sframe
section header.
sframe_add_section() reads the header and unconditionally returns an
error, so it's not very useful yet. A subsequent patch will improve
that.
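As a rough illustration of the kind of header checks involved, here is a
user space sketch that validates a header held in a byte buffer. It is an
assumption-laden model, not the kernel code: all toy_* and TOY_* names are
hypothetical, the field layout and constant values are copied from the
definitions in this patch, offsets are relative to the buffer rather than
user addresses, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SFRAME_MAGIC		0xdee2
#define TOY_SFRAME_VERSION_3		3
#define TOY_F_FDE_SORTED		0x1
#define TOY_F_FDE_FUNC_START_PCREL	0x4
#define TOY_FDE_V3_SIZE			16	/* s64 + u32 + u32 */

/* Same field layout as struct sframe_header in this patch. */
struct toy_sframe_header {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
	uint8_t abi_arch;
	int8_t cfa_fixed_fp_offset;
	int8_t cfa_fixed_ra_offset;
	uint8_t auxhdr_len;
	uint32_t num_fdes;
	uint32_t num_fres;
	uint32_t fre_len;
	uint32_t fdes_off;
	uint32_t fres_off;
} __attribute__((packed));

/* Return 0 if the section in buf[0..len) has a usable header, else -1. */
static int toy_check_header(const uint8_t *buf, size_t len)
{
	struct toy_sframe_header h;
	uint64_t header_end, fdes_end, fres_start, fres_end;

	if (len < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));	/* stands in for copy_from_user() */

	/* Only sorted, PC-relative V3 sections without aux header. */
	if (h.magic != TOY_SFRAME_MAGIC ||
	    h.version != TOY_SFRAME_VERSION_3 ||
	    !(h.flags & TOY_F_FDE_SORTED) ||
	    !(h.flags & TOY_F_FDE_FUNC_START_PCREL) ||
	    h.auxhdr_len)
		return -1;

	if (!h.num_fdes || !h.num_fres)
		return -1;

	header_end = sizeof(h) + h.auxhdr_len;
	if (header_end >= len)
		return -1;

	/* The FDE array must end before the FREs, which must fit the section. */
	fdes_end = header_end + h.fdes_off + (uint64_t)h.num_fdes * TOY_FDE_V3_SIZE;
	fres_start = header_end + h.fres_off;
	fres_end = fres_start + h.fre_len;
	if (fres_start < fdes_end || fres_end > len)
		return -1;

	return 0;
}
```

The sketch does its offset arithmetic in 64 bits so that large header
values cannot wrap around; whether the kernel code needs the same care
depends on the width of unsigned long on the target.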
Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
[ Jens Remus: Add support for SFrame V3. Add support for PC-relative
FDE function start offset. Cleanup includes and indentation. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v14:
- Rename SFRAME_FDE_TYPE_REGULAR to SFRAME_FDE_TYPE_DEFAULT to match
SFrame V3 specification. (Indu)
- Correct SFRAME_V3_FDE_TYPE_MASK value.
Changes in v13:
- Update to SFrame V3:
- Add and use SFRAME_VERSION_3 definition.
- Add helper macros to access SFrame V3 FDE type.
- Rename SFRAME_FUNC_*() macros to SFRAME_FDE_*().
- Rename SFRAME_FDE_TYPE_PC* defines to SFRAME_FDE_PCTYPE_* and
SFRAME_FUNC_FDE_TYPE() macro to SFRAME_V3_FDE_PCTYPE().
- Reword OFFSET to DATAWORD in SFRAME_FRE_OFFSET_{COUNT|SIZE}()
macros.
- Rename version-specific SFRAME_*() macros to SFRAME_V3_*().
- Update struct sframe_fde and rename to sframe_fde_v3:
- Change field start_addr from s32 to s64 and rename to
func_start_off.
- Change field fres_num from u32 to u16.
- New field u8 info2.
- Remove u16 padding field.
- Split FDE into function descriptor entry (struct sframe_fde_v3) and
attributes (struct sframe_fda_v3).
- Rename macro parameter "data" to "info" to hint at fde/fre info
word and wrap it in parentheses.
- Group SFRAME_* definitions so that related ones are together.
- Reword commit message (my changes).
MAINTAINERS | 1 +
arch/Kconfig | 3 +
include/linux/sframe.h | 37 +++++++++++
kernel/unwind/Makefile | 3 +-
kernel/unwind/sframe.c | 136 +++++++++++++++++++++++++++++++++++++++++
kernel/unwind/sframe.h | 81 ++++++++++++++++++++++++
6 files changed, 260 insertions(+), 1 deletion(-)
create mode 100644 include/linux/sframe.h
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 8e71c6e7a68a..24d1fe93ff4a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27864,6 +27864,7 @@ M: Steven Rostedt <rostedt@goodmis.org>
S: Maintained
F: arch/*/include/asm/unwind_user.h
F: include/asm-generic/unwind_user.h
+F: include/linux/sframe.h
F: include/linux/unwind*.h
F: kernel/unwind/
diff --git a/arch/Kconfig b/arch/Kconfig
index e86880045158..94b2d5e8e529 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -486,6 +486,9 @@ config HAVE_UNWIND_USER_FP
bool
select UNWIND_USER
+config HAVE_UNWIND_USER_SFRAME
+ bool
+
config HAVE_PERF_REGS
bool
help
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
new file mode 100644
index 000000000000..0642595534f9
--- /dev/null
+++ b/include/linux/sframe.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SFRAME_H
+#define _LINUX_SFRAME_H
+
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+
+struct sframe_section {
+ unsigned long sframe_start;
+ unsigned long sframe_end;
+ unsigned long text_start;
+ unsigned long text_end;
+
+ unsigned long fdes_start;
+ unsigned long fres_start;
+ unsigned long fres_end;
+ unsigned int num_fdes;
+
+ signed char ra_off;
+ signed char fp_off;
+};
+
+extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end);
+extern int sframe_remove_section(unsigned long sframe_addr);
+
+#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end)
+{
+ return -ENOSYS;
+}
+static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+
+#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+#endif /* _LINUX_SFRAME_H */
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index eae37bea54fd..146038165865 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1,2 @@
- obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
+ obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
+ obj-$(CONFIG_HAVE_UNWIND_USER_SFRAME) += sframe.o
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
new file mode 100644
index 000000000000..c128c1eeaca2
--- /dev/null
+++ b/kernel/unwind/sframe.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Userspace sframe access functions
+ */
+
+#define pr_fmt(fmt) "sframe: " fmt
+
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/srcu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <linux/string_helpers.h>
+#include <linux/sframe.h>
+#include <linux/unwind_user_types.h>
+
+#include "sframe.h"
+
+#define dbg(fmt, ...) \
+ pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static void free_section(struct sframe_section *sec)
+{
+ kfree(sec);
+}
+
+static int sframe_read_header(struct sframe_section *sec)
+{
+ unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
+ struct sframe_header shdr;
+ unsigned int num_fdes;
+
+ if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
+ dbg("header usercopy failed\n");
+ return -EFAULT;
+ }
+
+ if (shdr.preamble.magic != SFRAME_MAGIC ||
+ shdr.preamble.version != SFRAME_VERSION_3 ||
+ !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
+ !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
+ shdr.auxhdr_len) {
+ dbg("bad/unsupported sframe header\n");
+ return -EINVAL;
+ }
+
+ if (!shdr.num_fdes || !shdr.num_fres) {
+ dbg("no fde/fre entries\n");
+ return -EINVAL;
+ }
+
+ header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
+ if (header_end >= sec->sframe_end) {
+ dbg("header doesn't fit in section\n");
+ return -EINVAL;
+ }
+
+ num_fdes = shdr.num_fdes;
+ fdes_start = header_end + shdr.fdes_off;
+ fdes_end = fdes_start + (num_fdes * sizeof(struct sframe_fde_v3));
+
+ fres_start = header_end + shdr.fres_off;
+ fres_end = fres_start + shdr.fre_len;
+
+ if (fres_start < fdes_end || fres_end > sec->sframe_end) {
+ dbg("inconsistent fde/fre offsets\n");
+ return -EINVAL;
+ }
+
+ sec->num_fdes = num_fdes;
+ sec->fdes_start = fdes_start;
+ sec->fres_start = fres_start;
+ sec->fres_end = fres_end;
+
+ sec->ra_off = shdr.cfa_fixed_ra_offset;
+ sec->fp_off = shdr.cfa_fixed_fp_offset;
+
+ return 0;
+}
+
+int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end)
+{
+ struct vm_area_struct *sframe_vma, *text_vma;
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ int ret;
+
+ if (!sframe_start || !sframe_end || !text_start || !text_end) {
+ dbg("zero-length sframe/text address\n");
+ return -EINVAL;
+ }
+
+ scoped_guard(mmap_read_lock, mm) {
+ sframe_vma = vma_lookup(mm, sframe_start);
+ if (!sframe_vma || sframe_end > sframe_vma->vm_end) {
+ dbg("bad sframe address (0x%lx - 0x%lx)\n",
+ sframe_start, sframe_end);
+ return -EINVAL;
+ }
+
+ text_vma = vma_lookup(mm, text_start);
+ if (!text_vma ||
+ !(text_vma->vm_flags & VM_EXEC) ||
+ text_end > text_vma->vm_end) {
+ dbg("bad text address (0x%lx - 0x%lx)\n",
+ text_start, text_end);
+ return -EINVAL;
+ }
+ }
+
+ sec = kzalloc(sizeof(*sec), GFP_KERNEL);
+ if (!sec)
+ return -ENOMEM;
+
+ sec->sframe_start = sframe_start;
+ sec->sframe_end = sframe_end;
+ sec->text_start = text_start;
+ sec->text_end = text_end;
+
+ ret = sframe_read_header(sec);
+ if (ret)
+ goto err_free;
+
+ /* TODO nowhere to store it yet - just free it and return an error */
+ ret = -ENOSYS;
+
+err_free:
+ free_section(sec);
+ return ret;
+}
+
+int sframe_remove_section(unsigned long sframe_start)
+{
+ return -ENOSYS;
+}
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
new file mode 100644
index 000000000000..fc2908e92c7b
--- /dev/null
+++ b/kernel/unwind/sframe.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * From https://www.sourceware.org/binutils/docs/sframe-spec.html
+ */
+#ifndef _SFRAME_H
+#define _SFRAME_H
+
+#include <linux/types.h>
+
+#define SFRAME_VERSION_1 1
+#define SFRAME_VERSION_2 2
+#define SFRAME_VERSION_3 3
+#define SFRAME_MAGIC 0xdee2
+
+#define SFRAME_F_FDE_SORTED 0x1
+#define SFRAME_F_FRAME_POINTER 0x2
+#define SFRAME_F_FDE_FUNC_START_PCREL 0x4
+
+#define SFRAME_ABI_AARCH64_ENDIAN_BIG 1
+#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE 2
+#define SFRAME_ABI_AMD64_ENDIAN_LITTLE 3
+
+struct sframe_preamble {
+ u16 magic;
+ u8 version;
+ u8 flags;
+} __packed;
+
+struct sframe_header {
+ struct sframe_preamble preamble;
+ u8 abi_arch;
+ s8 cfa_fixed_fp_offset;
+ s8 cfa_fixed_ra_offset;
+ u8 auxhdr_len;
+ u32 num_fdes;
+ u32 num_fres;
+ u32 fre_len;
+ u32 fdes_off;
+ u32 fres_off;
+} __packed;
+
+#define SFRAME_HEADER_SIZE(header) \
+ ((sizeof(struct sframe_header) + (header).auxhdr_len))
+
+struct sframe_fde_v3 {
+ s64 func_start_off;
+ u32 func_size;
+ u32 fres_off;
+} __packed;
+
+struct sframe_fda_v3 {
+ u16 fres_num;
+ u8 info;
+ u8 info2;
+ u8 rep_size;
+} __packed;
+
+#define SFRAME_FDE_PCTYPE_INC 0
+#define SFRAME_FDE_PCTYPE_MASK 1
+
+#define SFRAME_AARCH64_PAUTH_KEY_A 0
+#define SFRAME_AARCH64_PAUTH_KEY_B 1
+
+#define SFRAME_V3_FDE_FRE_TYPE(info) ((info) & 0xf)
+#define SFRAME_V3_FDE_PCTYPE(info) (((info) >> 4) & 0x1)
+#define SFRAME_V3_AARCH64_FDE_PAUTH_KEY(info) (((info) >> 5) & 0x1)
+
+#define SFRAME_FDE_TYPE_DEFAULT 0
+
+#define SFRAME_V3_FDE_TYPE_MASK 0x1f
+#define SFRAME_V3_FDE_TYPE(info2) ((info2) & SFRAME_V3_FDE_TYPE_MASK)
+
+#define SFRAME_BASE_REG_FP 0
+#define SFRAME_BASE_REG_SP 1
+
+#define SFRAME_V3_FRE_CFA_BASE_REG_ID(info) ((info) & 0x1)
+#define SFRAME_V3_FRE_DATAWORD_COUNT(info) (((info) >> 1) & 0xf)
+#define SFRAME_V3_FRE_DATAWORD_SIZE(info) (((info) >> 5) & 0x3)
+#define SFRAME_V3_AARCH64_FRE_MANGLED_RA_P(info) (((info) >> 7) & 0x1)
+
+#endif /* _SFRAME_H */
--
2.51.0
* [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
2026-05-05 12:17 ` [PATCH v14 01/19] unwind_user: Add generic and arch-specific headers to MAINTAINERS Jens Remus
2026-05-05 12:17 ` [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:51 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
` (16 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Associate an .sframe section with its mm by adding it to a per-mm maple
tree which is indexed by the corresponding text address range. A single
.sframe section can be associated with multiple text ranges.
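The range-indexed association can be modelled in user space with a small
array in place of the maple tree. This is a sketch under assumptions: the
toy_* names are hypothetical, text ranges are treated as half-open
[text_start, text_end), and a linear scan replaces the tree lookup.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RANGES 8

/* Hypothetical stand-in for struct sframe_section. */
struct toy_section {
	unsigned long sframe_start;
};

struct toy_range {
	unsigned long text_start;	/* inclusive */
	unsigned long text_end;		/* exclusive (assumption) */
	struct toy_section *sec;
};

/* Hypothetical stand-in for the per-mm container. */
struct toy_mm {
	struct toy_range ranges[TOY_MAX_RANGES];
	size_t n;
};

/* Associate a text range with a section; reject empty or overlapping ranges. */
static int toy_add_range(struct toy_mm *mm, unsigned long start,
			 unsigned long end, struct toy_section *sec)
{
	if (start >= end || mm->n == TOY_MAX_RANGES)
		return -1;

	for (size_t i = 0; i < mm->n; i++) {
		const struct toy_range *r = &mm->ranges[i];

		if (start < r->text_end && r->text_start < end)
			return -1;
	}

	mm->ranges[mm->n].text_start = start;
	mm->ranges[mm->n].text_end = end;
	mm->ranges[mm->n].sec = sec;
	mm->n++;
	return 0;
}

/* Find the section describing the code at ip, if any. */
static struct toy_section *toy_find_section(const struct toy_mm *mm,
					    unsigned long ip)
{
	for (size_t i = 0; i < mm->n; i++) {
		const struct toy_range *r = &mm->ranges[i];

		if (ip >= r->text_start && ip < r->text_end)
			return r->sec;
	}
	return NULL;
}
```

Keying the lookup on the text range rather than on the .sframe address
means an unwinder can go straight from an IP to the section describing it.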
[ Jens Remus: Minor cleanups. Reword commit subject/message. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
arch/x86/include/asm/mmu.h | 2 +-
include/linux/mm_types.h | 3 ++
include/linux/sframe.h | 15 ++++++++++
kernel/fork.c | 10 +++++++
kernel/unwind/sframe.c | 56 ++++++++++++++++++++++++++++++++++++--
mm/init-mm.c | 2 ++
6 files changed, 84 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 0fe9c569d171..227a32899a59 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -87,7 +87,7 @@ typedef struct {
.context = { \
.ctx_id = 1, \
.lock = __MUTEX_INITIALIZER(mm.context.lock), \
- }
+ },
void leave_mm(void);
#define leave_mm leave_mm
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a308e2c23b82..c1505356b6fc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1424,6 +1424,9 @@ struct mm_struct {
#ifdef CONFIG_MM_ID
mm_id_t mm_id;
#endif /* CONFIG_MM_ID */
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+ struct maple_tree sframe_mt;
+#endif
} __randomize_layout;
/*
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 0642595534f9..7ea6a97ed8af 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -2,6 +2,8 @@
#ifndef _LINUX_SFRAME_H
#define _LINUX_SFRAME_H
+#include <linux/mm_types.h>
+
#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
struct sframe_section {
@@ -19,18 +21,31 @@ struct sframe_section {
signed char fp_off;
};
+#define INIT_MM_SFRAME .sframe_mt = MTREE_INIT(sframe_mt, 0),
+extern void sframe_free_mm(struct mm_struct *mm);
+
extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end);
extern int sframe_remove_section(unsigned long sframe_addr);
+static inline bool current_has_sframe(void)
+{
+ struct mm_struct *mm = current->mm;
+
+ return mm && !mtree_empty(&mm->sframe_mt);
+}
+
#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+#define INIT_MM_SFRAME
+static inline void sframe_free_mm(struct mm_struct *mm) {}
static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end)
{
return -ENOSYS;
}
static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline bool current_has_sframe(void) { return false; }
#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/fork.c b/kernel/fork.c
index 5f3fdfdb14c7..8d8195561c95 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -110,6 +110,7 @@
#include <linux/tick.h>
#include <linux/unwind_deferred.h>
#include <linux/pgalloc.h>
+#include <linux/sframe.h>
#include <linux/uaccess.h>
#include <asm/mmu_context.h>
@@ -735,6 +736,7 @@ void __mmdrop(struct mm_struct *mm)
mm_pasid_drop(mm);
mm_destroy_cid(mm);
percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
+ sframe_free_mm(mm);
free_mm(mm);
}
@@ -1072,6 +1074,13 @@ static void mmap_init_lock(struct mm_struct *mm)
#endif
}
+static void mm_init_sframe(struct mm_struct *mm)
+{
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+ mt_init(&mm->sframe_mt);
+#endif
+}
+
static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
struct user_namespace *user_ns)
{
@@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->pmd_huge_pte = NULL;
#endif
mm_init_uprobes_state(mm);
+ mm_init_sframe(mm);
hugetlb_count_init(mm);
mm_flags_clear_all(mm);
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index c128c1eeaca2..fdb2a4908563 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -81,6 +81,7 @@ static int sframe_read_header(struct sframe_section *sec)
int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end)
{
+ struct maple_tree *sframe_mt = &current->mm->sframe_mt;
struct vm_area_struct *sframe_vma, *text_vma;
struct mm_struct *mm = current->mm;
struct sframe_section *sec;
@@ -122,15 +123,64 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
if (ret)
goto err_free;
- /* TODO nowhere to store it yet - just free it and return an error */
- ret = -ENOSYS;
+ ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
+ if (ret) {
+ dbg("mtree_insert_range failed: text=%lx-%lx\n",
+ sec->text_start, sec->text_end);
+ goto err_free;
+ }
+
+ return 0;
err_free:
free_section(sec);
return ret;
}
+static int __sframe_remove_section(struct mm_struct *mm,
+ struct sframe_section *sec)
+{
+ if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
+ dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+ return -EINVAL;
+ }
+
+ free_section(sec);
+
+ return 0;
+}
+
int sframe_remove_section(unsigned long sframe_start)
{
- return -ENOSYS;
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ unsigned long index = 0;
+ bool found = false;
+ int ret = 0;
+
+ mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
+ if (sec->sframe_start == sframe_start) {
+ found = true;
+ ret |= __sframe_remove_section(mm, sec);
+ }
+ }
+
+ if (!found || ret)
+ return -EINVAL;
+
+ return 0;
+}
+
+void sframe_free_mm(struct mm_struct *mm)
+{
+ struct sframe_section *sec;
+ unsigned long index = 0;
+
+ if (!mm)
+ return;
+
+ mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
+ free_section(sec);
+
+ mtree_destroy(&mm->sframe_mt);
}
diff --git a/mm/init-mm.c b/mm/init-mm.c
index c5556bb9d5f0..77909139162e 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -11,6 +11,7 @@
#include <linux/atomic.h>
#include <linux/user_namespace.h>
#include <linux/iommu.h>
+#include <linux/sframe.h>
#include <asm/mmu.h>
#ifndef INIT_MM_CONTEXT
@@ -49,6 +50,7 @@ struct mm_struct init_mm = {
#endif
.flexible_array = MM_STRUCT_FLEXIBLE_ARRAY_INIT,
INIT_MM_CONTEXT(init_mm)
+ INIT_MM_SFRAME
};
void setup_initial_init_mm(void *start_code, void *end_code,
--
2.51.0
* [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (2 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:22 ` sashiko-bot
2026-05-06 14:09 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
` (15 subsequent siblings)
19 siblings, 2 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Add an x86 implementation of unsafe_copy_from_user() similar to the
existing unsafe_copy_to_user().
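The copy strategy, moving the data in chunks of decreasing size (u64, u32,
u16, then u8), can be tried out in user space. The following is a sketch
only: TOY_COPY_LOOP and toy_chunked_copy() are hypothetical names, and
memcpy() of one chunk stands in for unsafe_get_user(), so there is no
fault handling or error label here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy as many whole chunks of the given type as still fit in len. */
#define TOY_COPY_LOOP(dst, src, len, type, ops)		\
	while ((len) >= sizeof(type)) {			\
		type tmp_;				\
		memcpy(&tmp_, (src), sizeof(type));	\
		memcpy((dst), &tmp_, sizeof(type));	\
		(dst) += sizeof(type);			\
		(src) += sizeof(type);			\
		(len) -= sizeof(type);			\
		(ops)++;				\
	}

/* Copy len bytes and return the number of chunk accesses that were used. */
static size_t toy_chunked_copy(void *dst_, const void *src_, size_t len)
{
	unsigned char *dst = dst_;
	const unsigned char *src = src_;
	size_t ops = 0;

	TOY_COPY_LOOP(dst, src, len, uint64_t, ops);
	TOY_COPY_LOOP(dst, src, len, uint32_t, ops);
	TOY_COPY_LOOP(dst, src, len, uint16_t, ops);
	TOY_COPY_LOOP(dst, src, len, uint8_t, ops);

	return ops;
}
```

For example, 15 bytes are copied as one 8-byte, one 4-byte, one 2-byte,
and one 1-byte access, rather than 15 single-byte accesses.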
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
arch/x86/include/asm/uaccess.h | 39 +++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 3a0dd3c2b233..79679779f1ea 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -598,7 +598,7 @@ _label: \
* We want the unsafe accessors to always be inlined and use
* the error labels - thus the macro games.
*/
-#define unsafe_copy_loop(dst, src, len, type, label) \
+#define unsafe_copy_to_user_loop(dst, src, len, type, label) \
while (len >= sizeof(type)) { \
unsafe_put_user(*(type *)(src),(type __user *)(dst),label); \
dst += sizeof(type); \
@@ -606,15 +606,34 @@ _label: \
len -= sizeof(type); \
}
-#define unsafe_copy_to_user(_dst,_src,_len,label) \
-do { \
- char __user *__ucu_dst = (_dst); \
- const char *__ucu_src = (_src); \
- size_t __ucu_len = (_len); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \
+#define unsafe_copy_to_user(_dst, _src, _len, label) \
+do { \
+ void __user *__dst = (_dst); \
+ const void *__src = (_src); \
+ size_t __len = (_len); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u64, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u32, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u16, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u8, label); \
+} while (0)
+
+#define unsafe_copy_from_user_loop(dst, src, len, type, label) \
+ while (len >= sizeof(type)) { \
+ unsafe_get_user(*(type *)(dst), (type __user *)(src), label); \
+ dst += sizeof(type); \
+ src += sizeof(type); \
+ len -= sizeof(type); \
+ }
+
+#define unsafe_copy_from_user(_dst, _src, _len, label) \
+do { \
+ void *__dst = (_dst); \
+ void __user *__src = (_src); \
+ size_t __len = (_len); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u64, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u32, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u16, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u8, label); \
} while (0)
#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
--
2.51.0
* [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (3 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:59 ` sashiko-bot
2026-05-08 10:50 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
` (14 subsequent siblings)
19 siblings, 2 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
In preparation for using sframe to unwind user space stacks, add an
sframe_find() interface for finding the sframe information associated
with a given text address.
For performance, use user_read_access_begin() and the corresponding
unsafe_*() accessors. Note that use of pr_debug() in uaccess-enabled
regions would break noinstr validation, so there aren't any debug
messages yet. Those will be added in a subsequent commit.
Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
[ Jens Remus: Add initial support for SFrame V3 (limited to regular
FDEs). Add support for PC-relative FDE function start offset. Simplify
logic by using an internal FDE representation. Rename struct sframe_fre
to sframe_fre_internal to align with struct sframe_fde_internal.
Cleanup includes. Fix checkpatch errors "spaces required around that
':'". ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
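Illustration only, not part of the patch: a user-space sketch of the FDE
lookup that sframe_find() relies on, assuming FDEs sorted by function start
address with the start stored relative to the FDE's own address. struct
model_fde and model_find_fde() are hypothetical and do not match the real
SFrame V3 layout. The sketch leaves out the uaccess handling and the
sortedness checks done by the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an SFrame V3 FDE; not the real on-disk layout. */
struct model_fde {
	int64_t  func_start_off;	/* function start, relative to this entry */
	uint32_t func_size;
};

/*
 * 'fdes' is the table and 'fdes_start' the (virtual) address at which
 * entry 0 is mapped. Returns the index of the FDE covering 'ip', or -1.
 */
static long model_find_fde(const struct model_fde *fdes, size_t num,
			   uint64_t fdes_start, uint64_t ip)
{
	size_t low = 0, high = num;	/* search window is [low, high) */
	uint64_t entry_addr, start;
	long found = -1;

	while (low < high) {
		size_t mid = low + (high - low) / 2;

		entry_addr = fdes_start + mid * sizeof(*fdes);
		start = entry_addr + (uint64_t)fdes[mid].func_start_off;

		if (ip >= start) {
			found = (long)mid;	/* last start <= ip so far */
			low = mid + 1;
		} else {
			high = mid;
		}
	}

	if (found < 0)
		return -1;

	/* Reject addresses that fall into a gap between functions. */
	entry_addr = fdes_start + (size_t)found * sizeof(*fdes);
	start = entry_addr + (uint64_t)fdes[found].func_start_off;
	if (ip - start >= fdes[found].func_size)
		return -1;

	return found;
}
```

The final range check matters because the binary search only finds the last
function starting at or before the address; the address may still lie past
that function's end.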
Notes (jremus):
Changes in v14:
- Fix FDE function start address check in __read_fde().
- Adjust to rename of SFRAME_FDE_TYPE_*.
Changes in v13:
- Update to SFrame V3:
- Adjust to SFRAME_V3_*() macros and macro/define renames.
- Adjust to struct sframe_fde_v3 rename.
- Adjust to s64 FDE function start offset.
- Rename local variables fde_type to fde_pctype.
- Add and maintain struct sframe_fde_internal field u8 info2.
- Adjust to FDE split into function descriptor entry
(struct sframe_fde_v3) and attributes (struct sframe_fda_v3).
- Rename offset_count/offset_size to dataword_count/dataword_size.
- Limit __read_fre() to SFrame V3 regular FDEs (FDE_TYPE_REGULAR). A
subsequent patch will add support for flexible FDEs (FDE_TYPE_FLEX).
- Rename struct sframe_fde_internal field func_start_addr to func_addr.
- Add support for u64/s64 in UNSAFE_GET_USER_INC() for s64 FDE function
start offset.
- Reduce indentation of assignments to fre.
- Reword commit message (my changes).
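Illustration only, not part of the patch: a user-space sketch of what the
UNSAFE_GET_USER_*_INC() helpers do for signed values, namely read a 1-, 2-
or 4-byte value, sign-extend it, and advance the cursor. The function
model_get_signed_inc() is hypothetical, memcpy() stands in for
unsafe_get_user(), and the explicit 'end' bound replaces the separate range
checks done by the caller in the kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a signed value of 'size' bytes (1, 2 or 4) at *cur, sign-extend it
 * to 32 bits and advance the cursor. Returns 0 on success, -1 for an
 * unsupported size or a read past 'end'.
 */
static int model_get_signed_inc(int32_t *to, const unsigned char **cur,
				const unsigned char *end, unsigned int size)
{
	int8_t v8;
	int16_t v16;
	int32_t v32;

	if (size > (size_t)(end - *cur))
		return -1;

	switch (size) {
	case 1:
		memcpy(&v8, *cur, sizeof(v8));
		*to = v8;
		break;
	case 2:
		memcpy(&v16, *cur, sizeof(v16));
		*to = v16;
		break;
	case 4:
		memcpy(&v32, *cur, sizeof(v32));
		*to = v32;
		break;
	default:
		return -1;
	}

	*cur += size;
	return 0;
}
```

Reading through a typed temporary is what gives the sign extension: the
narrow value is converted to the wider destination on assignment.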
include/linux/sframe.h | 6 +
kernel/unwind/sframe.c | 346 ++++++++++++++++++++++++++++++++++-
kernel/unwind/sframe_debug.h | 35 ++++
3 files changed, 383 insertions(+), 4 deletions(-)
create mode 100644 kernel/unwind/sframe_debug.h
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 7ea6a97ed8af..9a72209696f9 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -3,10 +3,14 @@
#define _LINUX_SFRAME_H
#include <linux/mm_types.h>
+#include <linux/srcu.h>
+#include <linux/unwind_user_types.h>
#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
struct sframe_section {
+ struct rcu_head rcu;
+
unsigned long sframe_start;
unsigned long sframe_end;
unsigned long text_start;
@@ -27,6 +31,7 @@ extern void sframe_free_mm(struct mm_struct *mm);
extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end);
extern int sframe_remove_section(unsigned long sframe_addr);
+extern int sframe_find(unsigned long ip, struct unwind_user_frame *frame);
static inline bool current_has_sframe(void)
{
@@ -45,6 +50,7 @@ static inline int sframe_add_section(unsigned long sframe_start, unsigned long s
return -ENOSYS;
}
static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline int sframe_find(unsigned long ip, struct unwind_user_frame *frame) { return -ENOSYS; }
static inline bool current_has_sframe(void) { return false; }
#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index fdb2a4908563..c60aa527984a 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -15,9 +15,338 @@
#include <linux/unwind_user_types.h>
#include "sframe.h"
+#include "sframe_debug.h"
+
+struct sframe_fde_internal {
+ unsigned long func_addr;
+ u32 func_size;
+ u32 fda_off;
+ u32 fres_off;
+ u32 fres_num;
+ u8 info;
+ u8 info2;
+ u8 rep_size;
+};
+
+struct sframe_fre_internal {
+ unsigned int size;
+ u32 ip_off;
+ s32 cfa_off;
+ s32 ra_off;
+ s32 fp_off;
+ u8 info;
+};
+
+DEFINE_STATIC_SRCU(sframe_srcu);
+
+static __always_inline unsigned char fre_type_to_size(unsigned char fre_type)
+{
+ if (fre_type > 2)
+ return 0;
+ return 1 << fre_type;
+}
+
+static __always_inline unsigned char dataword_size_enum_to_size(unsigned char dataword_size)
+{
+ if (dataword_size > 2)
+ return 0;
+ return 1 << dataword_size;
+}
+
+static __always_inline int __read_fde(struct sframe_section *sec,
+ unsigned int fde_num,
+ struct sframe_fde_internal *fde)
+{
+ unsigned long fde_addr, fda_addr, func_addr;
+ struct sframe_fde_v3 _fde;
+ struct sframe_fda_v3 _fda;
+
+ fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde_v3));
+ unsafe_copy_from_user(&_fde, (void __user *)fde_addr,
+ sizeof(struct sframe_fde_v3), Efault);
+
+ func_addr = fde_addr + _fde.func_start_off;
+ if (func_addr < sec->text_start || func_addr >= sec->text_end)
+ return -EINVAL;
+
+ fda_addr = sec->fres_start + _fde.fres_off;
+ if (fda_addr + sizeof(struct sframe_fda_v3) > sec->fres_end)
+ return -EINVAL;
+ unsafe_copy_from_user(&_fda, (void __user *)fda_addr,
+ sizeof(struct sframe_fda_v3), Efault);
+
+ fde->func_addr = func_addr;
+ fde->func_size = _fde.func_size;
+ fde->fda_off = _fde.fres_off;
+ fde->fres_off = _fde.fres_off + sizeof(struct sframe_fda_v3);
+ fde->fres_num = _fda.fres_num;
+ fde->info = _fda.info;
+ fde->info2 = _fda.info2;
+ fde->rep_size = _fda.rep_size;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int __find_fde(struct sframe_section *sec,
+ unsigned long ip,
+ struct sframe_fde_internal *fde)
+{
+ unsigned long func_addr_low = 0, func_addr_high = ULONG_MAX;
+ struct sframe_fde_v3 __user *first, *low, *high, *found = NULL;
+ int ret;
+
+ first = (void __user *)sec->fdes_start;
+ low = first;
+ high = first + sec->num_fdes - 1;
+
+ while (low <= high) {
+ struct sframe_fde_v3 __user *mid;
+ s64 func_off;
+ unsigned long func_addr;
+
+ mid = low + ((high - low) / 2);
+
+ unsafe_get_user(func_off, (s64 __user *)mid, Efault);
+ func_addr = (unsigned long)mid + func_off;
+
+ if (ip >= func_addr) {
+ if (func_addr < func_addr_low)
+ return -EFAULT;
+
+ func_addr_low = func_addr;
+
+ found = mid;
+ low = mid + 1;
+ } else {
+ if (func_addr > func_addr_high)
+ return -EFAULT;
+
+ func_addr_high = func_addr;
+
+ high = mid - 1;
+ }
+ }
+
+ if (!found)
+ return -EINVAL;
+
+ ret = __read_fde(sec, found - first, fde);
+ if (ret)
+ return ret;
+
+ /* make sure it's not in a gap */
+ if (ip < fde->func_addr || ip >= fde->func_addr + fde->func_size)
+ return -EINVAL;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+#define ____UNSAFE_GET_USER_INC(to, from, type, label) \
+({ \
+ type __to; \
+ unsafe_get_user(__to, (type __user *)from, label); \
+ from += sizeof(__to); \
+ to = __to; \
+})
+
+#define __UNSAFE_GET_USER_INC(to, from, size, label, u_or_s) \
+({ \
+ switch (size) { \
+ case 1: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##8, label); \
+ break; \
+ case 2: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##16, label); \
+ break; \
+ case 4: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##32, label); \
+ break; \
+ default: \
+ return -EFAULT; \
+ } \
+})
+
+#define UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label) \
+ __UNSAFE_GET_USER_INC(to, from, size, label, u)
+
+#define UNSAFE_GET_USER_SIGNED_INC(to, from, size, label) \
+ __UNSAFE_GET_USER_INC(to, from, size, label, s)
+
+#define UNSAFE_GET_USER_INC(to, from, size, label) \
+ _Generic(to, \
+ u8 : UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u16 : UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u32 : UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u64 : UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ s8 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s16 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s32 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s64 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
+
+static __always_inline int __read_fre(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long fre_addr,
+ struct sframe_fre_internal *fre)
+{
+ unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
+ unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
+ unsigned char fre_type = SFRAME_V3_FDE_FRE_TYPE(fde->info);
+ unsigned char dataword_count, dataword_size;
+ s32 cfa_off, ra_off, fp_off;
+ unsigned long cur = fre_addr;
+ unsigned char addr_size;
+ u32 ip_off;
+ u8 info;
+
+ addr_size = fre_type_to_size(fre_type);
+ if (!addr_size)
+ return -EFAULT;
+
+ if (fre_addr + addr_size + 1 > sec->fres_end)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
+ if (fde_pctype == SFRAME_FDE_PCTYPE_INC && ip_off > fde->func_size)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(info, cur, 1, Efault);
+ dataword_count = SFRAME_V3_FRE_DATAWORD_COUNT(info);
+ dataword_size = dataword_size_enum_to_size(SFRAME_V3_FRE_DATAWORD_SIZE(info));
+ if (!dataword_count || !dataword_size)
+ return -EFAULT;
+
+ if (cur + (dataword_count * dataword_size) > sec->fres_end)
+ return -EFAULT;
+
+ /* TODO: Support for flexible FDEs not implemented yet. */
+ if (fde_type != SFRAME_FDE_TYPE_DEFAULT)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
+ dataword_count--;
+
+ ra_off = sec->ra_off;
+ if (!ra_off) {
+ if (!dataword_count--)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
+ }
+
+ fp_off = sec->fp_off;
+ if (!fp_off && dataword_count) {
+ dataword_count--;
+ UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
+ }
+
+ if (dataword_count)
+ return -EFAULT;
+
+ fre->size = addr_size + 1 + (dataword_count * dataword_size);
+ fre->ip_off = ip_off;
+ fre->cfa_off = cfa_off;
+ fre->ra_off = ra_off;
+ fre->fp_off = fp_off;
+ fre->info = info;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int __find_fre(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long ip,
+ struct unwind_user_frame *frame)
+{
+ unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
+ struct sframe_fre_internal *fre, *prev_fre = NULL;
+ struct sframe_fre_internal fres[2];
+ unsigned long fre_addr;
+ bool which = false;
+ unsigned int i;
+ u32 ip_off;
+
+ ip_off = ip - fde->func_addr;
+
+ if (fde_pctype == SFRAME_FDE_PCTYPE_MASK)
+ ip_off %= fde->rep_size;
+
+ fre_addr = sec->fres_start + fde->fres_off;
+
+ for (i = 0; i < fde->fres_num; i++) {
+ int ret;
+
+ /*
+ * Alternate between the two fres[] entries for 'fre' and
+ * 'prev_fre'.
+ */
+ fre = which ? fres : fres + 1;
+ which = !which;
+
+ ret = __read_fre(sec, fde, fre_addr, fre);
+ if (ret)
+ return ret;
+
+ fre_addr += fre->size;
+
+ if (prev_fre && fre->ip_off <= prev_fre->ip_off)
+ return -EFAULT;
+
+ if (fre->ip_off > ip_off)
+ break;
+
+ prev_fre = fre;
+ }
+
+ if (!prev_fre)
+ return -EINVAL;
+ fre = prev_fre;
+
+ frame->cfa_off = fre->cfa_off;
+ frame->ra_off = fre->ra_off;
+ frame->fp_off = fre->fp_off;
+ frame->use_fp = SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+
+ return 0;
+}
+
+int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
+{
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ struct sframe_fde_internal fde;
+ int ret;
+
+ if (!mm)
+ return -EINVAL;
+
+ guard(srcu)(&sframe_srcu);
+
+ sec = mtree_load(&mm->sframe_mt, ip);
+ if (!sec)
+ return -EINVAL;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
-#define dbg(fmt, ...) \
- pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+ ret = __find_fde(sec, ip, &fde);
+ if (ret)
+ goto end;
+
+ ret = __find_fre(sec, &fde, ip, frame);
+end:
+ user_read_access_end();
+ return ret;
+}
static void free_section(struct sframe_section *sec)
{
@@ -120,8 +449,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
sec->text_end = text_end;
ret = sframe_read_header(sec);
- if (ret)
+ if (ret) {
+ dbg_print_header(sec);
goto err_free;
+ }
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
@@ -137,6 +468,13 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
return ret;
}
+static void sframe_free_srcu(struct rcu_head *rcu)
+{
+ struct sframe_section *sec = container_of(rcu, struct sframe_section, rcu);
+
+ free_section(sec);
+}
+
static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
@@ -145,7 +483,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
return -EINVAL;
}
- free_section(sec);
+ call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
return 0;
}
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
new file mode 100644
index 000000000000..36352124cde8
--- /dev/null
+++ b/kernel/unwind/sframe_debug.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SFRAME_DEBUG_H
+#define _SFRAME_DEBUG_H
+
+#include <linux/sframe.h>
+#include "sframe.h"
+
+#ifdef CONFIG_DYNAMIC_DEBUG
+
+#define dbg(fmt, ...) \
+ pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static __always_inline void dbg_print_header(struct sframe_section *sec)
+{
+ unsigned long fdes_end;
+
+ fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde_v3));
+
+ dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+ "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+ "ra_off:%d fp_off:%d\n",
+ sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+ sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+ sec->ra_off, sec->fp_off);
+}
+
+#else /* !CONFIG_DYNAMIC_DEBUG */
+
+#define dbg(args...) no_printk(args)
+
+static inline void dbg_print_header(struct sframe_section *sec) {}
+
+#endif /* !CONFIG_DYNAMIC_DEBUG */
+
+#endif /* _SFRAME_DEBUG_H */
--
2.51.0
* [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (4 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 12:53 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
` (13 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
When loading an ELF executable, automatically detect an .sframe section
and associate it with the mm_struct.
[ Jens Remus: Fix checkpatch warning "braces {} are not necessary for
single statement blocks". ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
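Illustration only, not part of the patch: a user-space sketch of the
detection step, assuming a reduced program header with only the fields used
here. struct model_phdr, struct model_range and model_collect_sframe() are
hypothetical. The constants mirror PT_LOAD, PF_X and the PT_GNU_SFRAME
value added by this patch (PT_LOOS + 0x474e554).

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PT_LOAD		1u
#define MODEL_PT_GNU_SFRAME	0x6474e554u	/* PT_LOOS (0x60000000) + 0x474e554 */
#define MODEL_PF_X		0x1u

/* Reduced program header; only the fields used in this sketch. */
struct model_phdr {
	uint32_t p_type;
	uint32_t p_flags;
	uint64_t p_vaddr;
	uint64_t p_memsz;
};

struct model_range {
	uint64_t sframe_start, sframe_end;
	uint64_t text_start, text_end;
};

/*
 * Find the PT_GNU_SFRAME header, then emit one (sframe, text) range pair
 * per executable segment, relocated by 'bias'. Returns the number of
 * pairs written to 'out' (at most 'max').
 */
static size_t model_collect_sframe(const struct model_phdr *ph, size_t n,
				   uint64_t bias, struct model_range *out,
				   size_t max)
{
	const struct model_phdr *sframe = NULL;
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (ph[i].p_type == MODEL_PT_GNU_SFRAME)
			sframe = &ph[i];	/* the last one found is used */
	}

	if (!sframe)
		return 0;

	for (i = 0; i < n && count < max; i++) {
		if (!(ph[i].p_flags & MODEL_PF_X))
			continue;

		out[count].sframe_start = bias + sframe->p_vaddr;
		out[count].sframe_end = out[count].sframe_start + sframe->p_memsz;
		out[count].text_start = bias + ph[i].p_vaddr;
		out[count].text_end = out[count].text_start + ph[i].p_memsz;
		count++;
	}

	return count;
}
```

Two passes are needed because the PT_GNU_SFRAME header may appear after the
executable segments in the program header table.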
fs/binfmt_elf.c | 48 +++++++++++++++++++++++++++++++++++++---
include/uapi/linux/elf.h | 1 +
2 files changed, 46 insertions(+), 3 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 16a56b6b3f6c..55047659a3cf 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -48,6 +48,7 @@
#include <linux/uaccess.h>
#include <uapi/linux/rseq.h>
#include <linux/rseq.h>
+#include <linux/sframe.h>
#include <asm/param.h>
#include <asm/page.h>
@@ -637,6 +638,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
}
+static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
+ unsigned long base_addr)
+{
+ unsigned long sframe_start, sframe_end, text_start, text_end;
+
+ sframe_start = base_addr + sframe->p_vaddr;
+ sframe_end = sframe_start + sframe->p_memsz;
+
+ text_start = base_addr + text->p_vaddr;
+ text_end = text_start + text->p_memsz;
+
+ /* Ignore return value, sframe section isn't critical */
+ sframe_add_section(sframe_start, sframe_end, text_start, text_end);
+}
+
/* This is much more generalized than the library routine read function,
so we keep this separate. Technically the library read function
is only provided so that we can read a.out libraries that have
@@ -647,7 +663,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
unsigned long no_base, struct elf_phdr *interp_elf_phdata,
struct arch_elf_state *arch_state)
{
- struct elf_phdr *eppnt;
+ struct elf_phdr *eppnt, *sframe_phdr = NULL;
unsigned long load_addr = 0;
int load_addr_set = 0;
unsigned long error = ~0UL;
@@ -673,7 +689,8 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
eppnt = interp_elf_phdata;
for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
- if (eppnt->p_type == PT_LOAD) {
+ switch (eppnt->p_type) {
+ case PT_LOAD: {
int elf_type = MAP_PRIVATE;
int elf_prot = make_prot(eppnt->p_flags, arch_state,
true, true);
@@ -712,6 +729,19 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
error = -ENOMEM;
goto out;
}
+ break;
+ }
+ case PT_GNU_SFRAME:
+ sframe_phdr = eppnt;
+ break;
+ }
+ }
+
+ if (sframe_phdr) {
+ eppnt = interp_elf_phdata;
+ for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
+ if (eppnt->p_flags & PF_X)
+ elf_add_sframe(eppnt, sframe_phdr, load_addr);
}
}
@@ -836,7 +866,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
int first_pt_load = 1;
unsigned long error;
struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
- struct elf_phdr *elf_property_phdata = NULL;
+ struct elf_phdr *elf_property_phdata = NULL, *sframe_phdr = NULL;
unsigned long elf_brk;
bool brk_moved = false;
int retval, i;
@@ -945,6 +975,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
executable_stack = EXSTACK_DISABLE_X;
break;
+ case PT_GNU_SFRAME:
+ sframe_phdr = elf_ppnt;
+ break;
+
case PT_LOPROC ... PT_HIPROC:
retval = arch_elf_pt_proc(elf_ex, elf_ppnt,
bprm->file, false,
@@ -1242,6 +1276,14 @@ static int load_elf_binary(struct linux_binprm *bprm)
elf_brk = k;
}
+ if (sframe_phdr) {
+ for (i = 0, elf_ppnt = elf_phdata;
+ i < elf_ex->e_phnum; i++, elf_ppnt++) {
+ if ((elf_ppnt->p_flags & PF_X))
+ elf_add_sframe(elf_ppnt, sframe_phdr, load_bias);
+ }
+ }
+
e_entry = elf_ex->e_entry + load_bias;
phdr_addr += load_bias;
elf_brk += load_bias;
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index ee30dcd80901..e2a7dbed2e80 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -41,6 +41,7 @@ typedef __u16 Elf64_Versym;
#define PT_GNU_STACK (PT_LOOS + 0x474e551)
#define PT_GNU_RELRO (PT_LOOS + 0x474e552)
#define PT_GNU_PROPERTY (PT_LOOS + 0x474e553)
+#define PT_GNU_SFRAME (PT_LOOS + 0x474e554)
/* ARM MTE memory tag segment type */
--
2.51.0
* [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (5 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:55 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame Jens Remus
` (12 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Now that the sframe infrastructure is fully in place, make it work by
hooking it up to the unwind_user interface.
[ Jens Remus: Remove unused pt_regs from unwind_user_next_common() and
its callers. Simplify unwind_user_next_sframe(). ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
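Illustration only, not part of the patch: a user-space sketch of the method
fallback, assuming sframe occupies the lowest type bit so that it is tried
before frame pointers. struct model_state and model_unwind_next() are
hypothetical. The two *_result fields simulate what each unwind method
would return for the current frame.

```c
#include <errno.h>
#include <stdbool.h>

enum {
	MODEL_TYPE_SFRAME	= 1u << 0,
	MODEL_TYPE_FP		= 1u << 1,
	MODEL_NR_TYPE_BITS	= 2,
};

struct model_state {
	unsigned int available_types;
	unsigned int current_type;
	bool done;
	int sframe_result;	/* simulated result of the sframe step */
	int fp_result;		/* simulated result of the frame pointer step */
};

/* Try each available method in bit order until one succeeds. */
static int model_unwind_next(struct model_state *st)
{
	unsigned int bit;

	for (bit = 0; bit < MODEL_NR_TYPE_BITS; bit++) {
		unsigned int type = 1u << bit;

		if (!(st->available_types & type))
			continue;

		st->current_type = type;

		if (type == MODEL_TYPE_SFRAME) {
			if (st->sframe_result == 0)
				return 0;
			if (st->sframe_result == -ENOENT)
				continue;	/* no sframe entry: try next method */
			break;			/* any other error: give up */
		}

		if (st->fp_result == 0)
			return 0;
	}

	/* No method produced a frame. */
	st->current_type = 0;
	st->done = true;
	return -EINVAL;
}
```

Separating "no entry for this address" (-ENOENT) from other errors lets the
walk fall back to frame pointers for code without .sframe data, while a
failed unwind step still ends the walk.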
arch/Kconfig | 1 +
include/linux/unwind_user_types.h | 4 +++-
kernel/unwind/user.c | 23 +++++++++++++++++++++++
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 94b2d5e8e529..37549832bd1f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -488,6 +488,7 @@ config HAVE_UNWIND_USER_FP
config HAVE_UNWIND_USER_SFRAME
bool
+ select UNWIND_USER
config HAVE_PERF_REGS
bool
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 412729a269bc..43e4b160883f 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -9,7 +9,8 @@
* available.
*/
enum unwind_user_type_bits {
- UNWIND_USER_TYPE_FP_BIT = 0,
+ UNWIND_USER_TYPE_SFRAME_BIT = 0,
+ UNWIND_USER_TYPE_FP_BIT = 1,
NR_UNWIND_USER_TYPE_BITS,
};
@@ -17,6 +18,7 @@ enum unwind_user_type_bits {
enum unwind_user_type {
/* Type "none" for the start of stack walk iteration. */
UNWIND_USER_TYPE_NONE = 0,
+ UNWIND_USER_TYPE_SFRAME = BIT(UNWIND_USER_TYPE_SFRAME_BIT),
UNWIND_USER_TYPE_FP = BIT(UNWIND_USER_TYPE_FP_BIT),
};
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 90ab3c1a205e..1fb272419733 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -7,6 +7,7 @@
#include <linux/sched/task_stack.h>
#include <linux/unwind_user.h>
#include <linux/uaccess.h>
+#include <linux/sframe.h>
#define for_each_user_frame(state) \
for (unwind_user_start(state); !(state)->done; unwind_user_next(state))
@@ -82,6 +83,16 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
return unwind_user_next_common(state, &fp_frame);
}
+static int unwind_user_next_sframe(struct unwind_user_state *state)
+{
+ struct unwind_user_frame frame;
+
+ /* sframe expects the frame to be local storage */
+ if (sframe_find(state->ip, &frame))
+ return -ENOENT;
+ return unwind_user_next_common(state, &frame);
+}
+
static int unwind_user_next(struct unwind_user_state *state)
{
unsigned long iter_mask = state->available_types;
@@ -95,6 +106,16 @@ static int unwind_user_next(struct unwind_user_state *state)
state->current_type = type;
switch (type) {
+ case UNWIND_USER_TYPE_SFRAME:
+ switch (unwind_user_next_sframe(state)) {
+ case 0:
+ return 0;
+ case -ENOENT:
+ continue; /* Try next method. */
+ default:
+ state->done = true;
+ }
+ break;
case UNWIND_USER_TYPE_FP:
if (!unwind_user_next_fp(state))
return 0;
@@ -123,6 +144,8 @@ static int unwind_user_start(struct unwind_user_state *state)
return -EINVAL;
}
+ if (current_has_sframe())
+ state->available_types |= UNWIND_USER_TYPE_SFRAME;
if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
state->available_types |= UNWIND_USER_TYPE_FP;
--
2.51.0
* [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (6 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 12:40 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 09/19] unwind_user/sframe: Add support for outermost frame indication Jens Remus
` (11 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
Add an indication for an outermost frame to the unwind user frame
structure and stop unwinding when reaching an outermost frame.
This will be used by unwind user sframe, as SFrame may represent an
undefined return address as an indication of an outermost frame.
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
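Illustration only, not part of the patch: a user-space sketch of a single
unwind step with the new outermost check, run against a simulated stack.
All model_* names are hypothetical. The layout of 8-byte slots and the
"CFA must lie above SP" check are assumptions of this model, not a copy of
unwind_user_next_common().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_frame {
	int32_t cfa_off, ra_off, fp_off;
	bool use_fp;
	bool outermost;
};

struct model_state {
	uint64_t ip, sp, fp;
	bool done;
};

/* Simulated user stack: 'words' 8-byte slots starting at address 'base'. */
struct model_stack {
	const uint64_t *mem;
	size_t words;
	uint64_t base;
};

static int model_read(const struct model_stack *stk, uint64_t addr, uint64_t *val)
{
	uint64_t off = addr - stk->base;

	if (addr < stk->base || (off % 8) || off / 8 >= stk->words)
		return -1;
	*val = stk->mem[off / 8];
	return 0;
}

static int model_unwind_step(struct model_state *st,
			     const struct model_frame *f,
			     const struct model_stack *stk)
{
	uint64_t cfa, ra, fp = st->fp;

	/* Stop unwinding when reaching an outermost frame. */
	if (f->outermost) {
		st->done = true;
		return 0;
	}

	cfa = (f->use_fp ? st->fp : st->sp) + (uint64_t)(int64_t)f->cfa_off;
	if (cfa <= st->sp)
		return -1;	/* stack would grow in the wrong direction */

	if (model_read(stk, cfa + (uint64_t)(int64_t)f->ra_off, &ra))
		return -1;
	if (f->fp_off && model_read(stk, cfa + (uint64_t)(int64_t)f->fp_off, &fp))
		return -1;

	st->ip = ra;
	st->sp = cfa;
	st->fp = fp;
	return 0;
}
```

Checking the flag before anything else means an outermost frame ends the
walk successfully without reading the stack, so the offsets in such a frame
are never interpreted.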
arch/x86/include/asm/unwind_user.h | 6 ++++--
include/linux/unwind_user_types.h | 1 +
kernel/unwind/user.c | 6 ++++++
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 6e469044e4de..2dfb5ef11e36 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -23,13 +23,15 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
.cfa_off = 2*(ws), \
.ra_off = -1*(ws), \
.fp_off = -2*(ws), \
- .use_fp = true,
+ .use_fp = true, \
+ .outermost = false,
#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws) \
.cfa_off = 1*(ws), \
.ra_off = -1*(ws), \
.fp_off = 0, \
- .use_fp = false,
+ .use_fp = false, \
+ .outermost = false,
static inline bool unwind_user_at_function_start(struct pt_regs *regs)
{
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 43e4b160883f..616cc5ee4586 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -32,6 +32,7 @@ struct unwind_user_frame {
s32 ra_off;
s32 fp_off;
bool use_fp;
+ bool outermost;
};
struct unwind_user_state {
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 1fb272419733..fdb1001e3750 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -32,6 +32,12 @@ static int unwind_user_next_common(struct unwind_user_state *state,
{
unsigned long cfa, fp, ra;
+ /* Stop unwinding when reaching an outermost frame. */
+ if (frame->outermost) {
+ state->done = true;
+ return 0;
+ }
+
/* Get the Canonical Frame Address (CFA) */
if (frame->use_fp) {
if (state->fp < state->sp)
--
2.51.0
* [PATCH v14 09/19] unwind_user/sframe: Add support for outermost frame indication
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (7 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 10/19] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
` (10 subsequent siblings)
19 siblings, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
SFrame may represent an undefined return address (RA) as an SFrame FRE
without any offsets (data words), which indicates an outermost frame.
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
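Illustration only, not part of the patch: a user-space sketch of decoding
the FRE info byte, using the bit layout of the SFRAME_V3_FRE_DATAWORD_COUNT()
and SFRAME_V3_FRE_DATAWORD_SIZE() macros in kernel/unwind/sframe.h. The
MODEL_* macros and model_* functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the SFRAME_V3_FRE_DATAWORD_*() macros. */
#define MODEL_FRE_DATAWORD_COUNT(info)	(((info) >> 1) & 0xf)
#define MODEL_FRE_DATAWORD_SIZE(info)	(((info) >> 5) & 0x3)

/* Size enum 0/1/2 maps to 1/2/4 bytes; anything else is invalid (0). */
static unsigned int model_dataword_bytes(uint8_t info)
{
	unsigned int size_enum = MODEL_FRE_DATAWORD_SIZE(info);

	return size_enum > 2 ? 0 : 1u << size_enum;
}

/* A FRE without data words indicates RA undefined / outermost frame. */
static bool model_fre_is_outermost(uint8_t info)
{
	return MODEL_FRE_DATAWORD_COUNT(info) == 0;
}

/* Bytes occupied by the data words that follow the info byte. */
static unsigned int model_fre_payload_bytes(uint8_t info)
{
	return MODEL_FRE_DATAWORD_COUNT(info) * model_dataword_bytes(info);
}
```

For example, an info byte of 0x24 encodes two data words of two bytes each,
while 0x01 encodes no data words at all and therefore marks an outermost
frame.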
kernel/unwind/sframe.c | 15 ++++++++++++++-
kernel/unwind/sframe.h | 1 +
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index c60aa527984a..9c755965ab83 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -218,7 +218,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
UNSAFE_GET_USER_INC(info, cur, 1, Efault);
dataword_count = SFRAME_V3_FRE_DATAWORD_COUNT(info);
dataword_size = dataword_size_enum_to_size(SFRAME_V3_FRE_DATAWORD_SIZE(info));
- if (!dataword_count || !dataword_size)
+ if (!dataword_size)
return -EFAULT;
if (cur + (dataword_count * dataword_size) > sec->fres_end)
@@ -228,6 +228,17 @@ static __always_inline int __read_fre(struct sframe_section *sec,
if (fde_type != SFRAME_FDE_TYPE_DEFAULT)
return -EFAULT;
+ if (!dataword_count) {
+ /*
+ * A FRE without data words indicates RA undefined /
+ * outermost frame.
+ */
+ cfa_off = 0;
+ ra_off = 0;
+ fp_off = 0;
+ goto done;
+ }
+
UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
dataword_count--;
@@ -248,6 +259,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
if (dataword_count)
return -EFAULT;
+done:
fre->size = addr_size + 1 + (dataword_count * dataword_size);
fre->ip_off = ip_off;
fre->cfa_off = cfa_off;
@@ -314,6 +326,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
frame->ra_off = fre->ra_off;
frame->fp_off = fre->fp_off;
frame->use_fp = SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+ frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
return 0;
}
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
index fc2908e92c7b..ed111fd0d702 100644
--- a/kernel/unwind/sframe.h
+++ b/kernel/unwind/sframe.h
@@ -77,5 +77,6 @@ struct sframe_fda_v3 {
#define SFRAME_V3_FRE_DATAWORD_COUNT(info) (((info) >> 1) & 0xf)
#define SFRAME_V3_FRE_DATAWORD_SIZE(info) (((info) >> 5) & 0x3)
#define SFRAME_V3_AARCH64_FRE_MANGLED_RA_P(info) (((info) >> 7) & 0x1)
+#define SFRAME_V3_FRE_RA_UNDEFINED_P(info) (SFRAME_V3_FRE_DATAWORD_COUNT(info) == 0)
#endif /* _SFRAME_H */
--
2.51.0
* [PATCH v14 10/19] unwind_user/sframe: Remove .sframe section on detected corruption
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (8 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 09/19] unwind_user/sframe: Add support for outermost frame indication Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 20:39 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 11/19] unwind_user/sframe: Show file name in debug output Jens Remus
` (9 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
To avoid continued attempted use of a bad .sframe section, remove it
on demand when the first sign of corruption is detected.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
kernel/unwind/sframe.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 9c755965ab83..8eed6a7d9625 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -358,6 +358,10 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
ret = __find_fre(sec, &fde, ip, frame);
end:
user_read_access_end();
+
+ if (ret == -EFAULT)
+ WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+
return ret;
}
--
2.51.0
* [PATCH v14 11/19] unwind_user/sframe: Show file name in debug output
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (9 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 10/19] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:46 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option Jens Remus
` (8 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
When debugging sframe issues, the error messages aren't all that helpful
without knowing what file a corresponding .sframe section belongs to.
Prefix debug output strings with the file name.
[ Jens Remus: Fix checkpatch error "space prohibited before that close
parenthesis ')'". ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
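Illustration only (not part of the patch): a userspace sketch of the
resulting message format, assuming snprintf() into a buffer in place of
pr_debug(). toy_comm, toy_pid, toy_log and struct toy_section are
hypothetical stand-ins for current->comm, current->pid, the kernel log
and struct sframe_section. Like dbg_sec() in this patch, the macro
expects a local variable named sec to be in scope. It relies on the GNU
##__VA_ARGS__ extension, as the kernel macros do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct sframe_section. */
struct toy_section {
	const char *filename;
};

/* Hypothetical stand-ins for current->comm and current->pid. */
static const char *toy_comm = "demo";
static const int toy_pid = 42;

/* Last formatted message; replaces the kernel log in this sketch. */
static char toy_log[256];

#define toy_dbg(fmt, ...) \
	snprintf(toy_log, sizeof(toy_log), "%s (%d): " fmt, \
		 toy_comm, toy_pid, ##__VA_ARGS__)

/* Prefix the message with the file name of the section in scope. */
#define toy_dbg_sec(fmt, ...) \
	toy_dbg("%s: " fmt, sec->filename, ##__VA_ARGS__)

static void toy_report_erase_failure(const struct toy_section *sec,
				     unsigned long text_start)
{
	assert(sec && sec->filename);
	toy_dbg_sec("mtree_erase failed: text=%lx\n", text_start);
}
```

The file name lands between the task prefix and the message text, so
messages from different mapped files can be told apart at a glance.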
Notes (jremus):
Changes in v14:
- Uppercase terms FDE and FRE in debug messages.
include/linux/sframe.h | 4 +++-
kernel/unwind/sframe.c | 23 ++++++++++--------
kernel/unwind/sframe_debug.h | 45 +++++++++++++++++++++++++++++++-----
3 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 9a72209696f9..b79c5ec09229 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -10,7 +10,9 @@
struct sframe_section {
struct rcu_head rcu;
-
+#ifdef CONFIG_DYNAMIC_DEBUG
+ const char *filename;
+#endif
unsigned long sframe_start;
unsigned long sframe_end;
unsigned long text_start;
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 8eed6a7d9625..4419626f0173 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -359,14 +359,17 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
end:
user_read_access_end();
- if (ret == -EFAULT)
+ if (ret == -EFAULT) {
+ dbg_sec("removing bad .sframe section\n");
WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+ }
return ret;
}
static void free_section(struct sframe_section *sec)
{
+ dbg_free(sec);
kfree(sec);
}
@@ -377,7 +380,7 @@ static int sframe_read_header(struct sframe_section *sec)
unsigned int num_fdes;
if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
- dbg("header usercopy failed\n");
+ dbg_sec("header usercopy failed\n");
return -EFAULT;
}
@@ -386,18 +389,18 @@ static int sframe_read_header(struct sframe_section *sec)
!(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
!(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
shdr.auxhdr_len) {
- dbg("bad/unsupported sframe header\n");
+ dbg_sec("bad/unsupported sframe header\n");
return -EINVAL;
}
if (!shdr.num_fdes || !shdr.num_fres) {
- dbg("no fde/fre entries\n");
+ dbg_sec("no FDE/FRE entries\n");
return -EINVAL;
}
header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
if (header_end >= sec->sframe_end) {
- dbg("header doesn't fit in section\n");
+ dbg_sec("header doesn't fit in section\n");
return -EINVAL;
}
@@ -409,7 +412,7 @@ static int sframe_read_header(struct sframe_section *sec)
fres_end = fres_start + shdr.fre_len;
if (fres_start < fdes_end || fres_end > sec->sframe_end) {
- dbg("inconsistent fde/fre offsets\n");
+ dbg_sec("inconsistent FDE/FRE offsets\n");
return -EINVAL;
}
@@ -465,6 +468,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
sec->text_start = text_start;
sec->text_end = text_end;
+ dbg_init(sec);
+
ret = sframe_read_header(sec);
if (ret) {
dbg_print_header(sec);
@@ -473,8 +478,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
- dbg("mtree_insert_range failed: text=%lx-%lx\n",
- sec->text_start, sec->text_end);
+ dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
+ sec->text_start, sec->text_end);
goto err_free;
}
@@ -496,7 +501,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
- dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+ dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
return -EINVAL;
}
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
index 36352124cde8..e568be4172b1 100644
--- a/kernel/unwind/sframe_debug.h
+++ b/kernel/unwind/sframe_debug.h
@@ -10,26 +10,59 @@
#define dbg(fmt, ...) \
pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+#define dbg_sec(fmt, ...) \
+ dbg("%s: " fmt, sec->filename, ##__VA_ARGS__)
+
static __always_inline void dbg_print_header(struct sframe_section *sec)
{
unsigned long fdes_end;
fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde_v3));
- dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
- "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
- "ra_off:%d fp_off:%d\n",
- sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
- sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
- sec->ra_off, sec->fp_off);
+ dbg_sec("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+ "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+ "ra_off:%d fp_off:%d\n",
+ sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+ sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+ sec->ra_off, sec->fp_off);
+}
+
+static inline void dbg_init(struct sframe_section *sec)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+
+ guard(mmap_read_lock)(mm);
+ vma = vma_lookup(mm, sec->sframe_start);
+ if (!vma)
+ sec->filename = kstrdup("(vma gone???)", GFP_KERNEL);
+ else if (vma->vm_file)
+ sec->filename = kstrdup_quotable_file(vma->vm_file, GFP_KERNEL);
+ else if (vma->vm_ops && vma->vm_ops->name)
+ sec->filename = kstrdup(vma->vm_ops->name(vma), GFP_KERNEL);
+ else if (arch_vma_name(vma))
+ sec->filename = kstrdup(arch_vma_name(vma), GFP_KERNEL);
+ else if (!vma->vm_mm)
+ sec->filename = kstrdup("(vdso)", GFP_KERNEL);
+ else
+ sec->filename = kstrdup("(anonymous)", GFP_KERNEL);
+}
+
+static inline void dbg_free(struct sframe_section *sec)
+{
+ kfree(sec->filename);
}
#else /* !CONFIG_DYNAMIC_DEBUG */
#define dbg(args...) no_printk(args)
+#define dbg_sec(args...) no_printk(args)
static inline void dbg_print_header(struct sframe_section *sec) {}
+static inline void dbg_init(struct sframe_section *sec) {}
+static inline void dbg_free(struct sframe_section *sec) {}
+
#endif /* !CONFIG_DYNAMIC_DEBUG */
#endif /* _SFRAME_DEBUG_H */
--
2.51.0
* [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (10 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 11/19] unwind_user/sframe: Show file name in debug output Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:32 ` sashiko-bot
2026-05-08 10:51 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 13/19] unwind_user: Enable archs that pass RA in a register Jens Remus
` (7 subsequent siblings)
19 siblings, 2 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Add a debug feature to validate all .sframe sections when first loading
the file rather than on demand.
[ Jens Remus: Add support for SFrame V3. Add support for PC-relative
FDE function start offset. Adjust to rename of struct sframe_fre to
sframe_fre_internal. Use %#x/%#lx format specifiers. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
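Illustration only (not part of the patch): the ordering checks done by
sframe_validate_section(), reduced to a userspace sketch. struct toy_fde
and toy_validate_section() are hypothetical simplifications; the sketch
works on already-decoded arrays instead of reading FDEs and FREs from
user memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct sframe_fde_internal. */
struct toy_fde {
	unsigned long func_addr;	/* function start address */
	const uint32_t *fre_ip_offs;	/* IP offset of each FRE */
	size_t fres_num;		/* number of FREs */
};

static int toy_validate_section(const struct toy_fde *fdes, size_t num_fdes)
{
	unsigned long prev_ip = 0;

	for (size_t i = 0; i < num_fdes; i++) {
		const struct toy_fde *fde = &fdes[i];

		/* FDEs must be strictly sorted by function address. */
		if (fde->func_addr <= prev_ip)
			return -EFAULT;
		prev_ip = fde->func_addr;

		/* FREs of one FDE must be strictly sorted by IP offset. */
		for (size_t j = 1; j < fde->fres_num; j++) {
			if (fde->fre_ip_offs[j] <= fde->fre_ip_offs[j - 1])
				return -EFAULT;
		}
	}

	return 0;
}
```

As in the patch, prev_ip starts at zero, so a first FDE with a function
address of zero is rejected as well.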
Notes (jremus):
Changes in v14:
- Add debug message if safe_read_fde() fails.
- Update function names in debug messages.
- Uppercase terms FDE and FRE in debug messages.
Changes in v13:
- Update to SFrame V3:
- Print struct sframe_fde_internal fields fda_off and info2 in debug
message.
- Adjust to rename of struct sframe_fde_internal field func_start_addr
to func_addr.
- Use format strings "%#x" and "%#lx" instead of "0x%x" and "0x%lx".
- Reword commit message (my changes).
arch/Kconfig | 19 ++++++++
kernel/unwind/sframe.c | 101 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 120 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 37549832bd1f..132249d342a3 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -490,6 +490,25 @@ config HAVE_UNWIND_USER_SFRAME
bool
select UNWIND_USER
+config SFRAME_VALIDATION
+ bool "Enable .sframe section debugging"
+ depends on HAVE_UNWIND_USER_SFRAME
+ depends on DYNAMIC_DEBUG
+ help
+ When adding an .sframe section for a task, validate the entire
+ section immediately rather than on demand.
+
+ This is a debug feature which is helpful for rooting out .sframe
+ section issues. If the .sframe section is corrupt, it will fail to
+ load immediately, with more information provided in dynamic printks.
+
+ This has a significant page cache footprint due to its reading of the
+ entire .sframe section for every loaded executable and shared
+ library. Also, it's done for all processes, even those which don't
+ get stack traced by the kernel. Not recommended for general use.
+
+ If unsure, say N.
+
config HAVE_PERF_REGS
bool
help
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 4419626f0173..45988cdc5c37 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -367,6 +367,103 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
return ret;
}
+#ifdef CONFIG_SFRAME_VALIDATION
+
+static int safe_read_fde(struct sframe_section *sec,
+ unsigned int fde_num, struct sframe_fde_internal *fde)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fde(sec, fde_num, fde);
+ user_read_access_end();
+ return ret;
+}
+
+static int safe_read_fre(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long fre_addr,
+ struct sframe_fre_internal *fre)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fre(sec, fde, fre_addr, fre);
+ user_read_access_end();
+ return ret;
+}
+
+static int sframe_validate_section(struct sframe_section *sec)
+{
+ unsigned long prev_ip = 0;
+ unsigned int i;
+
+ for (i = 0; i < sec->num_fdes; i++) {
+ struct sframe_fre_internal *fre, *prev_fre = NULL;
+ unsigned long ip, fre_addr;
+ struct sframe_fde_internal fde;
+ struct sframe_fre_internal fres[2];
+ bool which = false;
+ unsigned int j;
+ int ret;
+
+ ret = safe_read_fde(sec, i, &fde);
+ if (ret) {
+ dbg_sec("safe_read_fde(%u) failed\n", i);
+ return ret;
+ }
+
+ ip = fde.func_addr;
+ if (ip <= prev_ip) {
+ dbg_sec("FDE %u not sorted\n", i);
+ return -EFAULT;
+ }
+ prev_ip = ip;
+
+ fre_addr = sec->fres_start + fde.fres_off;
+ for (j = 0; j < fde.fres_num; j++) {
+ int ret;
+
+ fre = which ? fres : fres + 1;
+ which = !which;
+
+ ret = safe_read_fre(sec, &fde, fre_addr, fre);
+ if (ret) {
+ dbg_sec("FDE %u: safe_read_fre(%u) failed\n", i, j);
+ dbg_sec("FDE: func_addr:%#lx func_size:%#x fda_off:%#x fres_off:%#x fres_num:%d info:%u info2:%u rep_size:%u\n",
+ fde.func_addr, fde.func_size,
+ fde.fda_off,
+ fde.fres_off, fde.fres_num,
+ fde.info, fde.info2,
+ fde.rep_size);
+ return ret;
+ }
+
+ fre_addr += fre->size;
+
+ if (prev_fre && fre->ip_off <= prev_fre->ip_off) {
+ dbg_sec("FDE %u: FRE %u not sorted\n", i, j);
+ return -EFAULT;
+ }
+
+ prev_fre = fre;
+ }
+ }
+
+ return 0;
+}
+
+#else /* !CONFIG_SFRAME_VALIDATION */
+
+static int sframe_validate_section(struct sframe_section *sec) { return 0; }
+
+#endif /* !CONFIG_SFRAME_VALIDATION */
+
+
static void free_section(struct sframe_section *sec)
{
dbg_free(sec);
@@ -476,6 +573,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
goto err_free;
}
+ ret = sframe_validate_section(sec);
+ if (ret)
+ goto err_free;
+
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
--
2.51.0
* [PATCH v14 13/19] unwind_user: Enable archs that pass RA in a register
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (11 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:35 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 14/19] unwind_user: Flexible FP/RA recovery rules Jens Remus
` (6 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
Not all architectures save the return address (RA) on the user space
stack at function entry, as x86-64 does through its CALL instruction
pushing the RA onto the stack. Architectures/ABIs such as s390 also do
not necessarily require the RA to be saved on the user space stack in
the function prologue, or at all, for instance in leaf functions.
Treat an RA offset from CFA of zero as an indication that the RA is not
saved (on the stack). For the topmost frame, treat it as an indication
that the RA is in the link/RA register, as on arm64 and s390, and obtain
it from there. For non-topmost frames, treat it as an error, as the RA
must be saved.
Additionally allow the SP to be unchanged in the topmost frame, for
architectures where SP at function entry == SP at call site, such as
arm64 and s390.
Note that treating an RA offset from CFA of zero as an indication that
the RA is not saved on the stack additionally supports architectures,
such as s390, where the frame pointer (FP) may be saved without the RA
being saved as well, provided that such architectures represent this
in SFrame by encoding the "missing" RA offset as a padding RA offset
with a value of zero.
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
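Illustration only (not part of the patch): a userspace sketch of the two
decisions described in the commit message, modeled on the changes to
unwind_user_next_common(). struct toy_state, toy_check_cfa() and
toy_get_ra() are hypothetical; stack_ra and link_reg stand for the
values that a stack read and unwind_user_get_ra_reg() would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct unwind_user_state. */
struct toy_state {
	unsigned long sp;
	bool topmost;
};

/*
 * The stack must not go in the wrong direction. Subtracting topmost
 * (0 or 1) lets CFA == SP pass for the topmost frame only.
 */
static int toy_check_cfa(const struct toy_state *state, unsigned long cfa)
{
	if (cfa <= state->sp - state->topmost)
		return -EINVAL;
	return 0;
}

/*
 * An RA offset of zero means the RA is not saved on the stack. That is
 * only acceptable in the topmost frame, where the link register holds it.
 */
static int toy_get_ra(const struct toy_state *state, long ra_off,
		      unsigned long stack_ra, unsigned long link_reg,
		      unsigned long *ra)
{
	assert(ra);

	if (ra_off) {
		*ra = stack_ra;
		return 0;
	}
	if (!state->topmost)
		return -EINVAL;
	*ra = link_reg;
	return 0;
}
```

For a topmost frame with sp == cfa the check passes, while the same
values in a non-topmost frame are rejected.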
include/linux/unwind_user.h | 9 +++++++++
kernel/unwind/sframe.c | 6 ++----
kernel/unwind/user.c | 17 +++++++++++++----
3 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index 64618618febd..bc2edae39955 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -23,6 +23,15 @@ static inline bool unwind_user_at_function_start(struct pt_regs *regs)
#define unwind_user_at_function_start unwind_user_at_function_start
#endif
+#ifndef unwind_user_get_ra_reg
+static inline int unwind_user_get_ra_reg(unsigned long *val)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+#define unwind_user_get_ra_reg unwind_user_get_ra_reg
+#endif
+
int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
#endif /* _LINUX_UNWIND_USER_H */
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 45988cdc5c37..d5e0fa04b99a 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -243,10 +243,8 @@ static __always_inline int __read_fre(struct sframe_section *sec,
dataword_count--;
ra_off = sec->ra_off;
- if (!ra_off) {
- if (!dataword_count--)
- return -EFAULT;
-
+ if (!ra_off && dataword_count) {
+ dataword_count--;
UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
}
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index fdb1001e3750..9ceef9b2b8db 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -48,8 +48,12 @@ static int unwind_user_next_common(struct unwind_user_state *state,
}
cfa += frame->cfa_off;
- /* Make sure that stack is not going in wrong direction */
- if (cfa <= state->sp)
+ /*
+ * Make sure that stack is not going in wrong direction. Allow SP
+ * to be unchanged for the topmost frame, by subtracting topmost,
+ * which is either 0 or 1.
+ */
+ if (cfa <= state->sp - state->topmost)
return -EINVAL;
/* Make sure that the address is word aligned */
@@ -57,8 +61,13 @@ static int unwind_user_next_common(struct unwind_user_state *state,
return -EINVAL;
/* Get the Return Address (RA) */
- if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
- return -EINVAL;
+ if (frame->ra_off) {
+ if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
+ return -EINVAL;
+ } else {
+ if (!state->topmost || unwind_user_get_ra_reg(&ra))
+ return -EINVAL;
+ }
/* Get the Frame Pointer (FP) */
if (frame->fp_off && get_user_word(&fp, cfa, frame->fp_off, state->ws))
--
2.51.0
* [PATCH v14 14/19] unwind_user: Flexible FP/RA recovery rules
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (12 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 13/19] unwind_user: Enable archs that pass RA in a register Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:34 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 15/19] unwind_user: Flexible CFA " Jens Remus
` (5 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
To enable support for SFrame V3 flexible FDEs with a subsequent patch,
add support for the following flexible frame pointer (FP) and return
address (RA) recovery rules:
FP/RA = *(CFA + offset)
FP/RA = register + offset
FP/RA = *(register + offset)
Note that FP/RA recovery rules that use arbitrary register contents are
only valid in the topmost frame, as the register contents are otherwise
unknown.
This also enables unwinding of user space for architectures, such as
s390, that may save the frame pointer (FP) and/or return address (RA) in
other registers, for instance when in a leaf function.
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
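Illustration only (not part of the patch): a userspace sketch of how a
rule with an optional DEREF bit can be evaluated, modeled on the FP
switch in unwind_user_next_common(). All toy_* names and the RULE_*
constants are hypothetical; plain unsigned constants replace the kernel
enum, and a small word array stands in for user memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RULE_DEREF		(1u << 31)
#define RULE_RETAIN		0u	/* entity = entity */
#define RULE_CFA_OFFSET		1u	/* entity = CFA + offset */
#define RULE_REG_OFFSET		2u	/* entity = register + offset */
#define RULE_CFA_OFFSET_DEREF	(RULE_CFA_OFFSET | RULE_DEREF)
#define RULE_REG_OFFSET_DEREF	(RULE_REG_OFFSET | RULE_DEREF)

struct toy_rule {
	unsigned int rule;
	int32_t offset;
	unsigned int regnum;
};

struct toy_ctx {
	bool topmost;
	unsigned long cfa;
	unsigned long fp;		/* current FP, kept by RULE_RETAIN */
	const unsigned long *regs;	/* toy register file */
	size_t nregs;
	unsigned long stack_base;	/* address of stack[0] */
	const unsigned long *stack;	/* toy user memory */
	size_t nwords;
};

/* Read one word of toy memory; fails for unmapped or unaligned addresses. */
static int toy_read_word(const struct toy_ctx *c, unsigned long addr,
			 unsigned long *val)
{
	unsigned long off = addr - c->stack_base;

	if (addr < c->stack_base || off % sizeof(unsigned long) ||
	    off / sizeof(unsigned long) >= c->nwords)
		return -EFAULT;
	*val = c->stack[off / sizeof(unsigned long)];
	return 0;
}

static int toy_recover_fp(const struct toy_ctx *c, const struct toy_rule *r,
			  unsigned long *out)
{
	unsigned long val;

	switch (r->rule) {
	case RULE_RETAIN:
		val = c->fp;
		break;
	case RULE_CFA_OFFSET_DEREF:
		val = c->cfa + r->offset;
		break;
	case RULE_REG_OFFSET:
	case RULE_REG_OFFSET_DEREF:
		/* Arbitrary registers are only known in the topmost frame. */
		if (!c->topmost || r->regnum >= c->nregs)
			return -EINVAL;
		val = c->regs[r->regnum] + r->offset;
		break;
	default:
		return -EINVAL;
	}

	/* DEREF variants load the entity from the computed address. */
	if ((r->rule & RULE_DEREF) && toy_read_word(c, val, &val))
		return -EINVAL;

	*out = val;
	return 0;
}
```

Computing the address first and applying the dereference as a common
final step keeps the DEREF variants from duplicating each base case.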
Notes (jremus):
Changes in v14:
- Improve comment on why UNWIND_USER_RULE_CFA_OFFSET is not implemented.
(Mark Rutland)
arch/x86/include/asm/unwind_user.h | 21 +++++++++---
include/linux/unwind_user.h | 9 +++++
include/linux/unwind_user_types.h | 23 +++++++++++--
kernel/unwind/sframe.c | 16 +++++++--
kernel/unwind/user.c | 53 ++++++++++++++++++++++++++----
5 files changed, 107 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 2dfb5ef11e36..9c3417be4283 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -21,15 +21,26 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
#define ARCH_INIT_USER_FP_FRAME(ws) \
.cfa_off = 2*(ws), \
- .ra_off = -1*(ws), \
- .fp_off = -2*(ws), \
+ .ra = { \
+ .rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
+ .offset = -1*(ws), \
+ }, \
+ .fp = { \
+ .rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
+ .offset = -2*(ws), \
+ }, \
.use_fp = true, \
.outermost = false,
#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws) \
.cfa_off = 1*(ws), \
- .ra_off = -1*(ws), \
- .fp_off = 0, \
+ .ra = { \
+ .rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
+ .offset = -1*(ws), \
+ }, \
+ .fp = { \
+ .rule = UNWIND_USER_RULE_RETAIN,\
+ }, \
.use_fp = false, \
.outermost = false,
@@ -41,4 +52,6 @@ static inline bool unwind_user_at_function_start(struct pt_regs *regs)
#endif /* CONFIG_HAVE_UNWIND_USER_FP */
+#include <asm-generic/unwind_user.h>
+
#endif /* _ASM_X86_UNWIND_USER_H */
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index bc2edae39955..92cdf38c8ade 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -32,6 +32,15 @@ static inline int unwind_user_get_ra_reg(unsigned long *val)
#define unwind_user_get_ra_reg unwind_user_get_ra_reg
#endif
+#ifndef unwind_user_get_reg
+static inline int unwind_user_get_reg(unsigned long *val, unsigned int regnum)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+#define unwind_user_get_reg unwind_user_get_reg
+#endif
+
int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
#endif /* _LINUX_UNWIND_USER_H */
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 616cc5ee4586..0d02714a1b5d 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -27,10 +27,29 @@ struct unwind_stacktrace {
unsigned long *entries;
};
+#define UNWIND_USER_RULE_DEREF BIT(31)
+
+enum unwind_user_rule {
+ UNWIND_USER_RULE_RETAIN, /* entity = entity */
+ UNWIND_USER_RULE_CFA_OFFSET, /* entity = CFA + offset */
+ UNWIND_USER_RULE_REG_OFFSET, /* entity = register + offset */
+ /* DEREF variants */
+ UNWIND_USER_RULE_CFA_OFFSET_DEREF = /* entity = *(CFA + offset) */
+ UNWIND_USER_RULE_CFA_OFFSET | UNWIND_USER_RULE_DEREF,
+ UNWIND_USER_RULE_REG_OFFSET_DEREF = /* entity = *(register + offset) */
+ UNWIND_USER_RULE_REG_OFFSET | UNWIND_USER_RULE_DEREF,
+};
+
+struct unwind_user_rule_data {
+ enum unwind_user_rule rule;
+ s32 offset;
+ unsigned int regnum;
+};
+
struct unwind_user_frame {
s32 cfa_off;
- s32 ra_off;
- s32 fp_off;
+ struct unwind_user_rule_data ra;
+ struct unwind_user_rule_data fp;
bool use_fp;
bool outermost;
};
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index d5e0fa04b99a..cc57804aa3b1 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -271,6 +271,18 @@ static __always_inline int __read_fre(struct sframe_section *sec,
return -EFAULT;
}
+static __always_inline void
+sframe_init_rule_data(struct unwind_user_rule_data *rule_data,
+ s32 offset)
+{
+ if (offset) {
+ rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF;
+ rule_data->offset = offset;
+ } else {
+ rule_data->rule = UNWIND_USER_RULE_RETAIN;
+ }
+}
+
static __always_inline int __find_fre(struct sframe_section *sec,
struct sframe_fde_internal *fde,
unsigned long ip,
@@ -321,8 +333,8 @@ static __always_inline int __find_fre(struct sframe_section *sec,
fre = prev_fre;
frame->cfa_off = fre->cfa_off;
- frame->ra_off = fre->ra_off;
- frame->fp_off = fre->fp_off;
+ sframe_init_rule_data(&frame->ra, fre->ra_off);
+ sframe_init_rule_data(&frame->fp, fre->fp_off);
frame->use_fp = SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 9ceef9b2b8db..89aecfbe3e84 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -61,22 +61,61 @@ static int unwind_user_next_common(struct unwind_user_state *state,
return -EINVAL;
/* Get the Return Address (RA) */
- if (frame->ra_off) {
- if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
- return -EINVAL;
- } else {
+ switch (frame->ra.rule) {
+ case UNWIND_USER_RULE_RETAIN:
if (!state->topmost || unwind_user_get_ra_reg(&ra))
return -EINVAL;
+ break;
+ /*
+ * UNWIND_USER_RULE_CFA_OFFSET doesn't make sense for RA.
+ * A return address cannot legitimately be a stack address.
+ */
+ case UNWIND_USER_RULE_CFA_OFFSET_DEREF:
+ ra = cfa + frame->ra.offset;
+ break;
+ case UNWIND_USER_RULE_REG_OFFSET:
+ case UNWIND_USER_RULE_REG_OFFSET_DEREF:
+ if (!state->topmost || unwind_user_get_reg(&ra, frame->ra.regnum))
+ return -EINVAL;
+ ra += frame->ra.offset;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return -EINVAL;
}
+ if (frame->ra.rule & UNWIND_USER_RULE_DEREF &&
+ get_user_word(&ra, ra, 0, state->ws))
+ return -EINVAL;
/* Get the Frame Pointer (FP) */
- if (frame->fp_off && get_user_word(&fp, cfa, frame->fp_off, state->ws))
+ switch (frame->fp.rule) {
+ case UNWIND_USER_RULE_RETAIN:
+ fp = state->fp;
+ break;
+ /*
+ * UNWIND_USER_RULE_CFA_OFFSET is currently not used for FP
+ * (e.g. SFrame cannot represent this rule).
+ */
+ case UNWIND_USER_RULE_CFA_OFFSET_DEREF:
+ fp = cfa + frame->fp.offset;
+ break;
+ case UNWIND_USER_RULE_REG_OFFSET:
+ case UNWIND_USER_RULE_REG_OFFSET_DEREF:
+ if (!state->topmost || unwind_user_get_reg(&fp, frame->fp.regnum))
+ return -EINVAL;
+ fp += frame->fp.offset;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+ if (frame->fp.rule & UNWIND_USER_RULE_DEREF &&
+ get_user_word(&fp, fp, 0, state->ws))
return -EINVAL;
state->ip = ra;
state->sp = cfa;
- if (frame->fp_off)
- state->fp = fp;
+ state->fp = fp;
state->topmost = false;
return 0;
}
--
2.51.0
* [PATCH v14 15/19] unwind_user: Flexible CFA recovery rules
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (13 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 14/19] unwind_user: Flexible FP/RA recovery rules Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 16/19] unwind_user/sframe: Add support for SFrame V3 flexible FDEs Jens Remus
` (4 subsequent siblings)
19 siblings, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
To enable support for SFrame V3 flexible FDEs with a subsequent patch,
add support for the following flexible Canonical Frame Address (CFA)
recovery rules:
CFA = SP + offset
CFA = FP + offset
CFA = register + offset
CFA = *(register + offset)
Note that CFA recovery rules that use arbitrary register contents are
only valid in the topmost frame, as the register contents are otherwise
unknown.
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
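Illustration only (not part of the patch): a userspace sketch that
follows the rule list in the commit message. It does not claim to match
the user.c changes of this patch line for line. The CFA_RULE_* constants,
toy_mem, toy_read_word(), struct toy_cfa_rule and toy_recover_cfa() are
hypothetical, and reg_val stands for the contents of the arbitrary
register named by the rule.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CFA_RULE_DEREF			(1u << 31)
#define CFA_RULE_SP_OFFSET		0u	/* CFA = SP + offset */
#define CFA_RULE_FP_OFFSET		1u	/* CFA = FP + offset */
#define CFA_RULE_REG_OFFSET		2u	/* CFA = reg + offset */
#define CFA_RULE_REG_OFFSET_DEREF	(CFA_RULE_REG_OFFSET | CFA_RULE_DEREF)

/* Two words of toy user memory starting at TOY_MEM_BASE. */
#define TOY_MEM_BASE	0x4000UL
static const unsigned long toy_mem[2] = { 0x7000UL, 0x8000UL };

static int toy_read_word(unsigned long addr, unsigned long *val)
{
	unsigned long off = addr - TOY_MEM_BASE;

	if (addr < TOY_MEM_BASE || off % sizeof(toy_mem[0]) ||
	    off / sizeof(toy_mem[0]) >= sizeof(toy_mem) / sizeof(toy_mem[0]))
		return -EFAULT;
	*val = toy_mem[off / sizeof(toy_mem[0])];
	return 0;
}

struct toy_cfa_rule {
	unsigned int rule;
	int32_t offset;
	unsigned long reg_val;	/* contents of the rule's register */
};

static int toy_recover_cfa(unsigned long sp, unsigned long fp, bool topmost,
			   const struct toy_cfa_rule *r, unsigned long *cfa)
{
	unsigned long val;

	switch (r->rule) {
	case CFA_RULE_SP_OFFSET:
		val = sp;
		break;
	case CFA_RULE_FP_OFFSET:
		val = fp;
		break;
	case CFA_RULE_REG_OFFSET:
	case CFA_RULE_REG_OFFSET_DEREF:
		/* Arbitrary registers are only known in the topmost frame. */
		if (!topmost)
			return -EINVAL;
		val = r->reg_val;
		break;
	default:
		return -EINVAL;
	}

	val += r->offset;
	if ((r->rule & CFA_RULE_DEREF) && toy_read_word(val, &val))
		return -EINVAL;

	*cfa = val;
	return 0;
}
```

SP and FP are tracked across frames, so the first two rules work in any
frame. The register-based rules are rejected once the unwinder has left
the topmost frame.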
arch/x86/include/asm/unwind_user.h | 12 ++++++++----
include/linux/unwind_user_types.h | 18 ++++++++++++++++--
kernel/unwind/sframe.c | 15 +++++++++++++--
kernel/unwind/user.c | 22 ++++++++++++++++++----
4 files changed, 55 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 9c3417be4283..f38f7c5ff1de 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -20,7 +20,10 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
#ifdef CONFIG_HAVE_UNWIND_USER_FP
#define ARCH_INIT_USER_FP_FRAME(ws) \
- .cfa_off = 2*(ws), \
+ .cfa = { \
+ .rule = UNWIND_USER_CFA_RULE_FP_OFFSET,\
+ .offset = 2*(ws), \
+ }, \
.ra = { \
.rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
.offset = -1*(ws), \
@@ -29,11 +32,13 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
.rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
.offset = -2*(ws), \
}, \
- .use_fp = true, \
.outermost = false,
#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws) \
- .cfa_off = 1*(ws), \
+ .cfa = { \
+ .rule = UNWIND_USER_CFA_RULE_SP_OFFSET,\
+ .offset = 1*(ws), \
+ }, \
.ra = { \
.rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
.offset = -1*(ws), \
@@ -41,7 +46,6 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
.fp = { \
.rule = UNWIND_USER_RULE_RETAIN,\
}, \
- .use_fp = false, \
.outermost = false,
static inline bool unwind_user_at_function_start(struct pt_regs *regs)
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 0d02714a1b5d..059e5c76f2f3 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -29,6 +29,21 @@ struct unwind_stacktrace {
#define UNWIND_USER_RULE_DEREF BIT(31)
+enum unwind_user_cfa_rule {
+ UNWIND_USER_CFA_RULE_SP_OFFSET, /* CFA = SP + offset */
+ UNWIND_USER_CFA_RULE_FP_OFFSET, /* CFA = FP + offset */
+ UNWIND_USER_CFA_RULE_REG_OFFSET, /* CFA = reg + offset */
+ /* DEREF variants */
+ UNWIND_USER_CFA_RULE_REG_OFFSET_DEREF = /* CFA = *(reg + offset) */
+ UNWIND_USER_CFA_RULE_REG_OFFSET | UNWIND_USER_RULE_DEREF,
+};
+
+struct unwind_user_cfa_rule_data {
+ enum unwind_user_cfa_rule rule;
+ s32 offset;
+ unsigned int regnum;
+};
+
enum unwind_user_rule {
UNWIND_USER_RULE_RETAIN, /* entity = entity */
UNWIND_USER_RULE_CFA_OFFSET, /* entity = CFA + offset */
@@ -47,10 +62,9 @@ struct unwind_user_rule_data {
};
struct unwind_user_frame {
- s32 cfa_off;
+ struct unwind_user_cfa_rule_data cfa;
struct unwind_user_rule_data ra;
struct unwind_user_rule_data fp;
- bool use_fp;
bool outermost;
};
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index cc57804aa3b1..2721f4af53fd 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -271,6 +271,18 @@ static __always_inline int __read_fre(struct sframe_section *sec,
return -EFAULT;
}
+static __always_inline void
+sframe_init_cfa_rule_data(struct unwind_user_cfa_rule_data *cfa_rule_data,
+ unsigned char fre_info,
+ s32 offset)
+{
+ if (SFRAME_V3_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP)
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
+ else
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
+ cfa_rule_data->offset = offset;
+}
+
static __always_inline void
sframe_init_rule_data(struct unwind_user_rule_data *rule_data,
s32 offset)
@@ -332,10 +344,9 @@ static __always_inline int __find_fre(struct sframe_section *sec,
return -EINVAL;
fre = prev_fre;
- frame->cfa_off = fre->cfa_off;
+ sframe_init_cfa_rule_data(&frame->cfa, fre->info, fre->cfa_off);
sframe_init_rule_data(&frame->ra, fre->ra_off);
sframe_init_rule_data(&frame->fp, fre->fp_off);
- frame->use_fp = SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
return 0;
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 89aecfbe3e84..3d596da588d0 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -39,14 +39,28 @@ static int unwind_user_next_common(struct unwind_user_state *state,
}
/* Get the Canonical Frame Address (CFA) */
- if (frame->use_fp) {
+ switch (frame->cfa.rule) {
+ case UNWIND_USER_CFA_RULE_SP_OFFSET:
+ cfa = state->sp;
+ break;
+ case UNWIND_USER_CFA_RULE_FP_OFFSET:
if (state->fp < state->sp)
return -EINVAL;
cfa = state->fp;
- } else {
- cfa = state->sp;
+ break;
+ case UNWIND_USER_CFA_RULE_REG_OFFSET:
+ case UNWIND_USER_CFA_RULE_REG_OFFSET_DEREF:
+ if (!state->topmost || unwind_user_get_reg(&cfa, frame->cfa.regnum))
+ return -EINVAL;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return -EINVAL;
}
- cfa += frame->cfa_off;
+ cfa += frame->cfa.offset;
+ if (frame->cfa.rule & UNWIND_USER_RULE_DEREF &&
+ get_user_word(&cfa, cfa, 0, state->ws))
+ return -EINVAL;
/*
* Make sure that stack is not going in wrong direction. Allow SP
--
2.51.0
* [PATCH v14 16/19] unwind_user/sframe: Add support for SFrame V3 flexible FDEs
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (14 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 15/19] unwind_user: Flexible CFA " Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:55 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 17/19] unwind_user/sframe: Separate reading of FRE from reading of FRE data words Jens Remus
` (3 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
SFrame V3 introduces flexible FDEs in addition to the regular FDEs.
The key difference is that flexible FDEs encode the CFA, RA, and FP
tracking information using two FRE data words, a control word and an
offset, or a single padding data word of zero (e.g. to represent FP
without RA tracking information).
The control word contains the following information:
- reg_p: Whether to use the register contents (reg_p=1) specified
by regnum or the CFA (reg_p=0) as base.
- deref_p: Whether to dereference.
- regnum: A DWARF register number.
The offset is added to the base (i.e. CFA or register contents). Then
the resulting address may optionally be dereferenced.
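
A minimal sketch of decoding such a control word, assuming the bit layout
of the SFRAME_V3_FLEX_FDE_CTRLWORD_*() macros added by this patch (reg_p in
bit 0, deref_p in bit 1, regnum in bits 3..7). The names ctrlword_decode(),
ctrlword_encode(), and rule_address() are hypothetical and only serve the
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the SFRAME_V3_FLEX_FDE_CTRLWORD_*() macros in this patch. */
#define CTRLWORD_REGNUM(w)	(((w) >> 3) & 0x1f)
#define CTRLWORD_DEREF_P(w)	(((w) >> 1) & 0x1)
#define CTRLWORD_REG_P(w)	((w) & 0x1)

struct ctrlword {
	bool reg_p;
	bool deref_p;
	unsigned int regnum;
};

static struct ctrlword ctrlword_decode(uint32_t w)
{
	struct ctrlword cw = {
		.reg_p = CTRLWORD_REG_P(w),
		.deref_p = CTRLWORD_DEREF_P(w),
		.regnum = CTRLWORD_REGNUM(w),
	};

	return cw;
}

/* Hypothetical inverse of the decode macros, handy for building inputs. */
static uint32_t ctrlword_encode(unsigned int regnum, bool deref_p, bool reg_p)
{
	return ((uint32_t)(regnum & 0x1f) << 3) |
	       ((uint32_t)deref_p << 1) | (uint32_t)reg_p;
}

/*
 * Address described by a rule: base + offset. The caller dereferences it
 * afterwards if cw.deref_p is set. regs[] must hold at least 32 entries,
 * indexed by DWARF register number.
 */
static unsigned long rule_address(struct ctrlword cw, unsigned long cfa,
				  const unsigned long *regs, int32_t offset)
{
	unsigned long base = cw.reg_p ? regs[cw.regnum] : cfa;

	return base + (unsigned long)(long)offset;
}
```

For example, ctrlword_encode(7, false, true) yields 57, which decodes to
"register 7 plus offset, no dereference".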
This enables the following flexible CFA and FP/RA recovery rules:
- CFA = register + offset // reg_p=1, deref_p=0
- CFA = *(register + offset) // reg_p=1, deref_p=1
- FP/RA = *(CFA + offset) // reg_p=0, deref_p=1
- FP/RA = register + offset // reg_p=1, deref_p=0
- FP/RA = *(register + offset) // reg_p=1, deref_p=1
Note that for the CFA a rule with reg_p=0 is invalid, as the value of
the CFA cannot be described using itself as base. For FP/RA a rule with
reg_p=0, deref_p=0, and regnum=0 is invalid, as that is equal to the
padding data word of zero.
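
The data word layout can be sketched as follows. This is a simplified
userspace model, not the kernel code: parse_flex_datawords() and struct
flex_fre are hypothetical names, the data words are assumed to be already
widened to 32 bit, the section-wide fixed RA/FP offsets are ignored, and
the CFA reg_p check is folded into the parser for brevity.

```c
#include <stdint.h>

struct flex_fre {
	uint32_t cfa_ctl, ra_ctl, fp_ctl;
	int32_t cfa_off, ra_off, fp_off;
};

/*
 * Parse the data words of one FRE of a flexible FDE.
 * Returns 0 on success and -1 if the FRE is malformed.
 */
static int parse_flex_datawords(const int32_t *dw, unsigned int count,
				struct flex_fre *fre)
{
	unsigned int i = 0;

	/* CFA: always a control word followed by an offset. */
	if (count < 2)
		return -1;
	fre->cfa_ctl = (uint32_t)dw[i++];
	fre->cfa_off = dw[i++];
	if (!(fre->cfa_ctl & 0x1))	/* reg_p=0 is invalid for the CFA */
		return -1;

	/* RA: control word + offset, or a single padding word of zero. */
	fre->ra_ctl = 0;
	fre->ra_off = 0;
	if (count - i >= 2) {
		fre->ra_ctl = (uint32_t)dw[i++];
		if (fre->ra_ctl)
			fre->ra_off = dw[i++];
	}

	/* FP: same encoding as RA. */
	fre->fp_ctl = 0;
	fre->fp_off = 0;
	if (count - i >= 2) {
		fre->fp_ctl = (uint32_t)dw[i++];
		if (fre->fp_ctl)
			fre->fp_off = dw[i++];
	}

	/* Anything left over means the FRE is malformed. */
	return i == count ? 0 : -1;
}
```

With the input { 51, -8, 0, 2, -16 } the parser reads a CFA rule
(regnum=6, deref_p=1, reg_p=1, offset -8), skips the RA via the padding
word, and reads an FP rule (reg_p=0, deref_p=1, offset -16).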
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v14:
- Rename __read_regular_fre_datawords() to
__read_default_fre_datawords() to align to SFrame V3 specification
(default FRE).
- Rename SFRAME_FDE_TYPE_FLEXIBLE to SFRAME_FDE_TYPE_FLEX to match
SFrame V3 specification and adjust to rename of SFRAME_FDE_TYPE_*.
- Rename SFRAME_V3_FLEX_FDE_CTLWORD_*() to
SFRAME_V3_FLEX_FDE_CTRLWORD_*() to match SFrame V3 reference
implementation.
- Add arch/*/include/asm/unwind_user_sframe.h to MAINTAINERS.
MAINTAINERS | 1 +
kernel/unwind/sframe.c | 249 ++++++++++++++++++++++++++++++++---------
kernel/unwind/sframe.h | 5 +
3 files changed, 205 insertions(+), 50 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 24d1fe93ff4a..6812f581d44b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27863,6 +27863,7 @@ M: Josh Poimboeuf <jpoimboe@kernel.org>
M: Steven Rostedt <rostedt@goodmis.org>
S: Maintained
F: arch/*/include/asm/unwind_user.h
+F: arch/*/include/asm/unwind_user_sframe.h
F: include/asm-generic/unwind_user.h
F: include/linux/sframe.h
F: include/linux/unwind*.h
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 2721f4af53fd..2ba2c8b385f9 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -12,6 +12,7 @@
#include <linux/mm.h>
#include <linux/string_helpers.h>
#include <linux/sframe.h>
+#include <asm/unwind_user_sframe.h>
#include <linux/unwind_user_types.h>
#include "sframe.h"
@@ -31,8 +32,11 @@ struct sframe_fde_internal {
struct sframe_fre_internal {
unsigned int size;
u32 ip_off;
+ u32 cfa_ctl;
s32 cfa_off;
+ u32 ra_ctl;
s32 ra_off;
+ u32 fp_ctl;
s32 fp_off;
u8 info;
};
@@ -189,16 +193,147 @@ static __always_inline int __find_fde(struct sframe_section *sec,
s32 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
s64 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
+static __always_inline int
+__read_default_fre_datawords(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long cur,
+ unsigned char dataword_count,
+ unsigned char dataword_size,
+ struct sframe_fre_internal *fre)
+{
+ s32 cfa_off, ra_off, fp_off;
+ unsigned int cfa_regnum;
+
+ UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
+ dataword_count--;
+
+ ra_off = sec->ra_off;
+ if (!ra_off && dataword_count) {
+ dataword_count--;
+ UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
+ }
+
+ fp_off = sec->fp_off;
+ if (!fp_off && dataword_count) {
+ dataword_count--;
+ UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
+ }
+
+ if (dataword_count)
+ return -EFAULT;
+
+ cfa_regnum =
+ (SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP) ?
+ SFRAME_REG_FP : SFRAME_REG_SP;
+
+ fre->cfa_ctl = (cfa_regnum << 3) | 1; /* regnum, deref_p=0, reg_p=1 */
+ fre->cfa_off = cfa_off;
+ fre->ra_ctl = ra_off ? 2 : 0; /* regnum=0, deref_p=(ra_off != 0), reg_p=0 */
+ fre->ra_off = ra_off;
+ fre->fp_ctl = fp_off ? 2 : 0; /* regnum=0, deref_p=(fp_off != 0), reg_p=0 */
+ fre->fp_off = fp_off;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int
+__read_flex_fde_fre_datawords(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long cur,
+ unsigned char dataword_count,
+ unsigned char dataword_size,
+ struct sframe_fre_internal *fre)
+{
+ u32 cfa_ctl, ra_ctl, fp_ctl;
+ s32 cfa_off, ra_off, fp_off;
+
+ if (dataword_count < 2)
+ return -EFAULT;
+ UNSAFE_GET_USER_INC(cfa_ctl, cur, dataword_size, Efault);
+ UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
+ dataword_count -= 2;
+
+ ra_off = sec->ra_off;
+ ra_ctl = ra_off ? 2 : 0; /* regnum=0, deref_p=(ra_off != 0), reg_p=0 */
+ if (dataword_count >= 2) {
+ UNSAFE_GET_USER_INC(ra_ctl, cur, dataword_size, Efault);
+ dataword_count--;
+ if (ra_ctl) {
+ UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
+ dataword_count--;
+ } else {
+ /* Padding RA location info */
+ ra_ctl = ra_off ? 2 : 0; /* re-deduce (see above) */
+ }
+ }
+
+ fp_off = sec->fp_off;
+ fp_ctl = fp_off ? 2 : 0; /* regnum=0, deref_p=(fp_off != 0), reg_p=0 */
+ if (dataword_count >= 2) {
+ UNSAFE_GET_USER_INC(fp_ctl, cur, dataword_size, Efault);
+ dataword_count--;
+ if (fp_ctl) {
+ UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
+ dataword_count--;
+ } else {
+ /* Padding FP location info */
+ fp_ctl = fp_off ? 2 : 0; /* re-deduce (see above) */
+ }
+ }
+
+ if (dataword_count)
+ return -EFAULT;
+
+ fre->cfa_ctl = cfa_ctl;
+ fre->cfa_off = cfa_off;
+ fre->ra_ctl = ra_ctl;
+ fre->ra_off = ra_off;
+ fre->fp_ctl = fp_ctl;
+ fre->fp_off = fp_off;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int
+__read_fre_datawords(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long cur,
+ unsigned char dataword_count,
+ unsigned char dataword_size,
+ struct sframe_fre_internal *fre)
+{
+ unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
+
+ switch (fde_type) {
+ case SFRAME_FDE_TYPE_DEFAULT:
+ return __read_default_fre_datawords(sec, fde, cur,
+ dataword_count,
+ dataword_size,
+ fre);
+ case SFRAME_FDE_TYPE_FLEX:
+ return __read_flex_fde_fre_datawords(sec, fde, cur,
+ dataword_count,
+ dataword_size,
+ fre);
+ default:
+ return -EFAULT;
+ }
+}
+
static __always_inline int __read_fre(struct sframe_section *sec,
struct sframe_fde_internal *fde,
unsigned long fre_addr,
struct sframe_fre_internal *fre)
{
- unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
unsigned char fre_type = SFRAME_V3_FDE_FRE_TYPE(fde->info);
unsigned char dataword_count, dataword_size;
- s32 cfa_off, ra_off, fp_off;
unsigned long cur = fre_addr;
unsigned char addr_size;
u32 ip_off;
@@ -224,75 +359,88 @@ static __always_inline int __read_fre(struct sframe_section *sec,
if (cur + (dataword_count * dataword_size) > sec->fres_end)
return -EFAULT;
- /* TODO: Support for flexible FDEs not implemented yet. */
- if (fde_type != SFRAME_FDE_TYPE_DEFAULT)
- return -EFAULT;
+ fre->size = addr_size + 1 + (dataword_count * dataword_size);
+ fre->ip_off = ip_off;
+ fre->info = info;
if (!dataword_count) {
/*
* A FRE without data words indicates RA undefined /
* outermost frame.
*/
- cfa_off = 0;
- ra_off = 0;
- fp_off = 0;
- goto done;
- }
+ fre->cfa_ctl = 0;
+ fre->cfa_off = 0;
+ fre->ra_ctl = 0;
+ fre->ra_off = 0;
+ fre->fp_ctl = 0;
+ fre->fp_off = 0;
- UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
- dataword_count--;
-
- ra_off = sec->ra_off;
- if (!ra_off && dataword_count) {
- dataword_count--;
- UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
- }
-
- fp_off = sec->fp_off;
- if (!fp_off && dataword_count) {
- dataword_count--;
- UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
+ return 0;
}
- if (dataword_count)
- return -EFAULT;
-
-done:
- fre->size = addr_size + 1 + (dataword_count * dataword_size);
- fre->ip_off = ip_off;
- fre->cfa_off = cfa_off;
- fre->ra_off = ra_off;
- fre->fp_off = fp_off;
- fre->info = info;
-
- return 0;
+ return __read_fre_datawords(sec, fde, cur, dataword_count, dataword_size, fre);
Efault:
return -EFAULT;
}
-static __always_inline void
+static __always_inline int
sframe_init_cfa_rule_data(struct unwind_user_cfa_rule_data *cfa_rule_data,
- unsigned char fre_info,
- s32 offset)
+ u32 ctlword, s32 offset)
{
- if (SFRAME_V3_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP)
- cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
- else
- cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
+ bool deref_p = SFRAME_V3_FLEX_FDE_CTRLWORD_DEREF_P(ctlword);
+ bool reg_p = SFRAME_V3_FLEX_FDE_CTRLWORD_REG_P(ctlword);
+
+ if (reg_p) {
+ unsigned int regnum = SFRAME_V3_FLEX_FDE_CTRLWORD_REGNUM(ctlword);
+
+ switch (regnum) {
+ case SFRAME_REG_SP:
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
+ break;
+ case SFRAME_REG_FP:
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
+ break;
+ default:
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_REG_OFFSET;
+ cfa_rule_data->regnum = regnum;
+ }
+ } else {
+ return -EINVAL;
+ }
+
+ if (deref_p)
+ cfa_rule_data->rule |= UNWIND_USER_RULE_DEREF;
+
cfa_rule_data->offset = offset;
+
+ return 0;
}
static __always_inline void
sframe_init_rule_data(struct unwind_user_rule_data *rule_data,
- s32 offset)
+ u32 ctlword, s32 offset)
{
- if (offset) {
- rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF;
- rule_data->offset = offset;
- } else {
+ bool deref_p = SFRAME_V3_FLEX_FDE_CTRLWORD_DEREF_P(ctlword);
+ bool reg_p = SFRAME_V3_FLEX_FDE_CTRLWORD_REG_P(ctlword);
+
+ if (!ctlword && !offset) {
rule_data->rule = UNWIND_USER_RULE_RETAIN;
+ return;
+ }
+ if (reg_p) {
+ unsigned int regnum = SFRAME_V3_FLEX_FDE_CTRLWORD_REGNUM(ctlword);
+
+ rule_data->rule = UNWIND_USER_RULE_REG_OFFSET;
+ rule_data->regnum = regnum;
+ } else {
+ rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET;
}
+
+ if (deref_p)
+ rule_data->rule |= UNWIND_USER_RULE_DEREF;
+
+ rule_data->offset = offset;
}
static __always_inline int __find_fre(struct sframe_section *sec,
@@ -344,9 +492,10 @@ static __always_inline int __find_fre(struct sframe_section *sec,
return -EINVAL;
fre = prev_fre;
- sframe_init_cfa_rule_data(&frame->cfa, fre->info, fre->cfa_off);
- sframe_init_rule_data(&frame->ra, fre->ra_off);
- sframe_init_rule_data(&frame->fp, fre->fp_off);
+ if (sframe_init_cfa_rule_data(&frame->cfa, fre->cfa_ctl, fre->cfa_off))
+ return -EINVAL;
+ sframe_init_rule_data(&frame->ra, fre->ra_ctl, fre->ra_off);
+ sframe_init_rule_data(&frame->fp, fre->fp_ctl, fre->fp_off);
frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
return 0;
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
index ed111fd0d702..add62ba2c0a6 100644
--- a/kernel/unwind/sframe.h
+++ b/kernel/unwind/sframe.h
@@ -66,6 +66,7 @@ struct sframe_fda_v3 {
#define SFRAME_V3_AARCH64_FDE_PAUTH_KEY(info) (((info) >> 5) & 0x1)
#define SFRAME_FDE_TYPE_DEFAULT 0
+#define SFRAME_FDE_TYPE_FLEX 1
#define SFRAME_V3_FDE_TYPE_MASK 0x1f
#define SFRAME_V3_FDE_TYPE(info2) ((info2) & SFRAME_V3_FDE_TYPE_MASK)
@@ -79,4 +80,8 @@ struct sframe_fda_v3 {
#define SFRAME_V3_AARCH64_FRE_MANGLED_RA_P(info) (((info) >> 7) & 0x1)
#define SFRAME_V3_FRE_RA_UNDEFINED_P(info) (SFRAME_V3_FRE_DATAWORD_COUNT(info) == 0)
+#define SFRAME_V3_FLEX_FDE_CTRLWORD_REGNUM(data) (((data) >> 3) & 0x1f)
+#define SFRAME_V3_FLEX_FDE_CTRLWORD_DEREF_P(data) (((data) >> 1) & 0x1)
+#define SFRAME_V3_FLEX_FDE_CTRLWORD_REG_P(data) ((data) & 0x1)
+
#endif /* _SFRAME_H */
--
2.51.0
* [PATCH v14 17/19] unwind_user/sframe: Separate reading of FRE from reading of FRE data words
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (15 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 16/19] unwind_user/sframe: Add support for SFrame V3 flexible FDEs Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 19:05 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 18/19] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
` (2 subsequent siblings)
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
__find_fre() performs a linear search for the SFrame FRE matching a
given IP. For that purpose it uses __read_fre(), which reads the whole
FRE, that is, the variable-size FRE structure as well as the trailing
variable-length array of variable-size data words. For the search logic
to skip over an FRE it is sufficient to read only the variable-size
FRE structure, which includes the count and size of the data words.
Add fields to struct sframe_fre_internal to store the address, count,
and size of the FRE data words. Change __read_fre() to read the
variable-size FRE structure only and populate those new fields. Change
__read_fre_datawords() to use those new fields. Change __find_fre()
to use __read_fre_datawords() to read the FRE data words only after a
matching FRE has been found. Introduce safe_read_fre_datawords() and
use it in sframe_validate_section() to validate the FRE data words.
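
The idea can be sketched with a toy model in which only the matching
entry has its data words decoded. struct toy_fre, decode_cfa_off(), and
find_fre() are hypothetical; the sketch assumes the FREs are sorted by
ip_off and that the last FRE with ip_off <= target is the match.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_fre {
	uint32_t ip_off;	/* start offset of the range this FRE covers */
	uint8_t dw_count;	/* number of trailing data words */
	const int32_t *dw;	/* where the data words live */
};

static unsigned int dw_decodes;	/* counts how often data words were decoded */

/* Stand-in for the data word decoding; only the CFA offset is modelled. */
static int32_t decode_cfa_off(const struct toy_fre *fre)
{
	dw_decodes++;
	return fre->dw_count ? fre->dw[0] : 0;
}

static int find_fre(const struct toy_fre *fres, size_t n, uint32_t ip_off,
		    int32_t *cfa_off)
{
	const struct toy_fre *match = NULL;

	/* Phase 1: walk the headers only; the data words are not touched. */
	for (size_t i = 0; i < n; i++) {
		if (fres[i].ip_off > ip_off)
			break;
		match = &fres[i];
	}
	if (!match)
		return -1;

	/* Phase 2: decode the data words of the single matching FRE. */
	*cfa_off = decode_cfa_off(match);
	return 0;
}
```

However many FREs the search skips over, the decode counter increases by
one per successful lookup.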
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v14:
- Adjust to rename of SFRAME_FDE_TYPE_* and
__read_default_fre_datawords().
- Update function name in debug message.
kernel/unwind/sframe.c | 91 +++++++++++++++++++++++++++---------------
1 file changed, 58 insertions(+), 33 deletions(-)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 2ba2c8b385f9..98346412078b 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -39,6 +39,9 @@ struct sframe_fre_internal {
u32 fp_ctl;
s32 fp_off;
u8 info;
+ unsigned long dw_addr;
+ unsigned char dw_count;
+ unsigned char dw_size;
};
DEFINE_STATIC_SRCU(sframe_srcu);
@@ -196,11 +199,11 @@ static __always_inline int __find_fde(struct sframe_section *sec,
static __always_inline int
__read_default_fre_datawords(struct sframe_section *sec,
struct sframe_fde_internal *fde,
- unsigned long cur,
- unsigned char dataword_count,
- unsigned char dataword_size,
struct sframe_fre_internal *fre)
{
+ unsigned char dataword_count = fre->dw_count;
+ unsigned char dataword_size = fre->dw_size;
+ unsigned long cur = fre->dw_addr;
s32 cfa_off, ra_off, fp_off;
unsigned int cfa_regnum;
@@ -242,11 +245,11 @@ __read_default_fre_datawords(struct sframe_section *sec,
static __always_inline int
__read_flex_fde_fre_datawords(struct sframe_section *sec,
struct sframe_fde_internal *fde,
- unsigned long cur,
- unsigned char dataword_count,
- unsigned char dataword_size,
struct sframe_fre_internal *fre)
{
+ unsigned char dataword_count = fre->dw_count;
+ unsigned char dataword_size = fre->dw_size;
+ unsigned long cur = fre->dw_addr;
u32 cfa_ctl, ra_ctl, fp_ctl;
s32 cfa_off, ra_off, fp_off;
@@ -303,24 +306,28 @@ __read_flex_fde_fre_datawords(struct sframe_section *sec,
static __always_inline int
__read_fre_datawords(struct sframe_section *sec,
struct sframe_fde_internal *fde,
- unsigned long cur,
- unsigned char dataword_count,
- unsigned char dataword_size,
struct sframe_fre_internal *fre)
{
unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
+ unsigned char dataword_count = fre->dw_count;
+
+ if (!dataword_count) {
+ /* A FRE without data words indicates an outermost frame. */
+ fre->cfa_ctl = 0;
+ fre->cfa_off = 0;
+ fre->ra_ctl = 0;
+ fre->ra_off = 0;
+ fre->fp_ctl = 0;
+ fre->fp_off = 0;
+
+ return 0;
+ }
switch (fde_type) {
case SFRAME_FDE_TYPE_DEFAULT:
- return __read_default_fre_datawords(sec, fde, cur,
- dataword_count,
- dataword_size,
- fre);
+ return __read_default_fre_datawords(sec, fde, fre);
case SFRAME_FDE_TYPE_FLEX:
- return __read_flex_fde_fre_datawords(sec, fde, cur,
- dataword_count,
- dataword_size,
- fre);
+ return __read_flex_fde_fre_datawords(sec, fde, fre);
default:
return -EFAULT;
}
@@ -362,23 +369,11 @@ static __always_inline int __read_fre(struct sframe_section *sec,
fre->size = addr_size + 1 + (dataword_count * dataword_size);
fre->ip_off = ip_off;
fre->info = info;
+ fre->dw_addr = cur;
+ fre->dw_count = dataword_count;
+ fre->dw_size = dataword_size;
- if (!dataword_count) {
- /*
- * A FRE without data words indicates RA undefined /
- * outermost frame.
- */
- fre->cfa_ctl = 0;
- fre->cfa_off = 0;
- fre->ra_ctl = 0;
- fre->ra_off = 0;
- fre->fp_ctl = 0;
- fre->fp_off = 0;
-
- return 0;
- }
-
- return __read_fre_datawords(sec, fde, cur, dataword_count, dataword_size, fre);
+ return 0;
Efault:
return -EFAULT;
@@ -455,6 +450,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
bool which = false;
unsigned int i;
u32 ip_off;
+ int ret;
ip_off = ip - fde->func_addr;
@@ -492,6 +488,10 @@ static __always_inline int __find_fre(struct sframe_section *sec,
return -EINVAL;
fre = prev_fre;
+ ret = __read_fre_datawords(sec, fde, fre);
+ if (ret)
+ return ret;
+
if (sframe_init_cfa_rule_data(&frame->cfa, fre->cfa_ctl, fre->cfa_off))
return -EINVAL;
sframe_init_rule_data(&frame->ra, fre->ra_ctl, fre->ra_off);
@@ -567,6 +567,20 @@ static int safe_read_fre(struct sframe_section *sec,
return ret;
}
+static int safe_read_fre_datawords(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ struct sframe_fre_internal *fre)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fre_datawords(sec, fde, fre);
+ user_read_access_end();
+ return ret;
+}
+
static int sframe_validate_section(struct sframe_section *sec)
{
unsigned long prev_ip = 0;
@@ -612,6 +626,17 @@ static int sframe_validate_section(struct sframe_section *sec)
fde.rep_size);
return ret;
}
+ ret = safe_read_fre_datawords(sec, &fde, fre);
+ if (ret) {
+ dbg_sec("FDE %u: safe_read_fre_datawords(%u) failed\n", i, j);
+ dbg_sec("FDE: func_addr:%#lx func_size:%#x fda_off:%#x fres_off:%#x fres_num:%d info:%u info2:%u rep_size:%u\n",
+ fde.func_addr, fde.func_size,
+ fde.fda_off,
+ fde.fres_off, fde.fres_num,
+ fde.info, fde.info2,
+ fde.rep_size);
+ return ret;
+ }
fre_addr += fre->size;
--
2.51.0
* [PATCH v14 18/19] unwind_user/sframe/x86: Enable sframe unwinding on x86
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (16 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 17/19] unwind_user/sframe: Separate reading of FRE from reading of FRE data words Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 19:07 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 19/19] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
2026-05-05 12:25 ` [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
The x86 SFrame V3 implementation works fairly well, starting with
binutils 2.46. Enable it.
[ Jens Remus: Reword commit message for SFrame V3, starting with
binutils 2.46. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v14:
- Drop superfluous empty line in unwind_user_get_reg().
Changes in v13:
- Naive implementation of unwind_user_get_reg() to support SFrame V3
flexible FDEs (e.g. used to represent DRAP pattern).
- Define SFRAME_REG_SP and SFRAME_REG_FP to the respective x86-64
DWARF register numbers.
- Reword commit message for SFrame V3 and (upcoming) binutils 2.46.
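
For illustration only, the DWARF register number lookup could also be
written table-driven, as in the userspace sketch below. It uses the same
x86-64 DWARF numbering as the switch statement in this patch (0=rax,
1=rdx, 2=rcx, 3=rbx, 4=rsi, 5=rdi, 6=rbp, 7=rsp, 8..15=r8..r15).
struct toy_regs, toy_get_reg(), and the TOY_REG_* defines are hypothetical
and not part of the patch.

```c
#include <stddef.h>

/* Hypothetical register file; the field order is arbitrary on purpose. */
struct toy_regs {
	unsigned long r15, r14, r13, r12, bp, bx, r11, r10;
	unsigned long r9, r8, ax, cx, dx, si, di, sp;
};

#define TOY_REG_FP 6	/* rbp; same value as SFRAME_REG_FP in the patch */
#define TOY_REG_SP 7	/* rsp; same value as SFRAME_REG_SP in the patch */

#define REG_OFF(f) offsetof(struct toy_regs, f)

/* Indexed by x86-64 DWARF register number 0..15. */
static const size_t dwarf_reg_off[16] = {
	REG_OFF(ax),  REG_OFF(dx),  REG_OFF(cx),  REG_OFF(bx),
	REG_OFF(si),  REG_OFF(di),  REG_OFF(bp),  REG_OFF(sp),
	REG_OFF(r8),  REG_OFF(r9),  REG_OFF(r10), REG_OFF(r11),
	REG_OFF(r12), REG_OFF(r13), REG_OFF(r14), REG_OFF(r15),
};

static int toy_get_reg(const struct toy_regs *regs, unsigned int regnum,
		       unsigned long *val)
{
	if (regnum >= sizeof(dwarf_reg_off) / sizeof(dwarf_reg_off[0]))
		return -1;
	*val = *(const unsigned long *)((const char *)regs +
					dwarf_reg_off[regnum]);
	return 0;
}
```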
arch/x86/Kconfig | 1 +
arch/x86/include/asm/unwind_user.h | 33 +++++++++++++++++++++++
arch/x86/include/asm/unwind_user_sframe.h | 12 +++++++++
3 files changed, 46 insertions(+)
create mode 100644 arch/x86/include/asm/unwind_user_sframe.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..51286dfdb5f4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -302,6 +302,7 @@ config X86
select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_UNWIND_USER_FP if X86_64
+ select HAVE_UNWIND_USER_SFRAME if X86_64
select HAVE_USER_RETURN_NOTIFIER
select HAVE_GENERIC_VDSO
select VDSO_GETRANDOM if X86_64
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index f38f7c5ff1de..b80f0ec0f7a7 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -15,6 +15,39 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
return user_64bit_mode(regs) ? 8 : 4;
}
+static inline int unwind_user_get_reg(unsigned long *val, unsigned int regnum)
+{
+#ifdef CONFIG_X86_64
+ const struct pt_regs *regs = task_pt_regs(current);
+
+ switch (regnum) {
+ /* DWARF register numbers 0..15 */
+ case 0: *val = regs->ax; break;
+ case 1: *val = regs->dx; break;
+ case 2: *val = regs->cx; break;
+ case 3: *val = regs->bx; break;
+ case 4: *val = regs->si; break;
+ case 5: *val = regs->di; break;
+ case 6: *val = regs->bp; break;
+ case 7: *val = regs->sp; break;
+ case 8: *val = regs->r8; break;
+ case 9: *val = regs->r9; break;
+ case 10: *val = regs->r10; break;
+ case 11: *val = regs->r11; break;
+ case 12: *val = regs->r12; break;
+ case 13: *val = regs->r13; break;
+ case 14: *val = regs->r14; break;
+ case 15: *val = regs->r15; break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+#else /* !CONFIG_X86_64 */
+ return -EINVAL;
+#endif /* !CONFIG_X86_64 */
+}
+#define unwind_user_get_reg unwind_user_get_reg
+
#endif /* CONFIG_UNWIND_USER */
#ifdef CONFIG_HAVE_UNWIND_USER_FP
diff --git a/arch/x86/include/asm/unwind_user_sframe.h b/arch/x86/include/asm/unwind_user_sframe.h
new file mode 100644
index 000000000000..d828ae1a4aac
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user_sframe.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_UNWIND_USER_SFRAME_H
+#define _ASM_X86_UNWIND_USER_SFRAME_H
+
+#ifdef CONFIG_X86_64
+
+#define SFRAME_REG_SP 7
+#define SFRAME_REG_FP 6
+
+#endif
+
+#endif /* _ASM_X86_UNWIND_USER_SFRAME_H */
--
2.51.0
* [PATCH v14 19/19] unwind_user/sframe: Add prctl() interface for registering .sframe sections
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (17 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 18/19] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
@ 2026-05-05 12:17 ` Jens Remus
2026-05-05 18:45 ` sashiko-bot
2026-05-05 12:25 ` [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
19 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:17 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: Jens Remus, bpf, linux-mm, Namhyung Kim, Andrii Nakryiko,
Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
The kernel doesn't have direct visibility into the ELF contents of shared
libraries. Add prctl() interfaces which allow glibc to tell the kernel
where to find .sframe sections.
[
This adds an interface for prctl() for testing loading of sframes for
libraries. But this interface should really be a system call. This patch
is for testing purposes only and should not be applied to mainline.
]
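
A hedged sketch of how a dynamic loader might call this test interface
from user space. The option values are the ones defined by this patch and
are not part of any released UAPI header. The argument order (.sframe
start, .sframe end, text start, text end) is an assumption, since the
prototype of sframe_add_section() is not part of this patch. struct
sframe_range, sframe_range_ok(), register_sframe(), and
unregister_sframe() are hypothetical helpers.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Values from this patch; they are not in any released UAPI header. */
#ifndef PR_ADD_SFRAME
#define PR_ADD_SFRAME		82
#endif
#ifndef PR_REMOVE_SFRAME
#define PR_REMOVE_SFRAME	83
#endif

/* Assumed argument layout: arg2..arg5 of PR_ADD_SFRAME, in this order. */
struct sframe_range {
	unsigned long sframe_start, sframe_end;
	unsigned long text_start, text_end;
};

/* Cheap sanity check before bothering the kernel. */
static int sframe_range_ok(const struct sframe_range *r)
{
	return r->sframe_start < r->sframe_end && r->text_start < r->text_end;
}

static int register_sframe(const struct sframe_range *r)
{
	if (!sframe_range_ok(r)) {
		errno = EINVAL;
		return -1;
	}
	return prctl(PR_ADD_SFRAME, r->sframe_start, r->sframe_end,
		     r->text_start, r->text_end);
}

/* The patch rejects PR_REMOVE_SFRAME unless arg3..arg5 are zero. */
static inline int unregister_sframe(unsigned long sframe_start)
{
	return prctl(PR_REMOVE_SFRAME, sframe_start, 0UL, 0UL, 0UL);
}
```

On a kernel without this patch the prctl() calls are expected to fail, so
a caller would treat an error as "no SFrame support" and carry on.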
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v14:
- Bump PR_ADD_SFRAME and PR_REMOVE_SFRAME.
include/uapi/linux/prctl.h | 6 +++++-
kernel/sys.c | 8 ++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index b6ec6f693719..0aa0ec971843 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -368,7 +368,7 @@ struct prctl_mm_map {
* configuration. All bits may be locked via this call, including
* undefined bits.
*/
-#define PR_LOCK_SHADOW_STACK_STATUS 76
+#define PR_LOCK_SHADOW_STACK_STATUS 76
/*
* Controls the mode of timer_create() for CRIU restore operations.
@@ -416,4 +416,8 @@ struct prctl_mm_map {
# define PR_CFI_DISABLE _BITUL(1)
# define PR_CFI_LOCK _BITUL(2)
+/* SFRAME management */
+#define PR_ADD_SFRAME 82
+#define PR_REMOVE_SFRAME 83
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 62e842055cc9..6e81e82bc991 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -65,6 +65,7 @@
#include <linux/rcupdate.h>
#include <linux/uidgid.h>
#include <linux/cred.h>
+#include <linux/sframe.h>
#include <linux/nospec.h>
@@ -2906,6 +2907,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
break;
if (arg3 & PR_CFI_LOCK && !(arg3 & PR_CFI_DISABLE))
error = arch_prctl_lock_branch_landing_pad_state(me);
+ case PR_ADD_SFRAME:
+ error = sframe_add_section(arg2, arg3, arg4, arg5);
+ break;
+ case PR_REMOVE_SFRAME:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = sframe_remove_section(arg2);
break;
default:
trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5);
--
2.51.0
* Re: [PATCH v14 00/19] unwind_deferred: Implement sframe handling
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
` (18 preceding siblings ...)
2026-05-05 12:17 ` [PATCH v14 19/19] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
@ 2026-05-05 12:25 ` Jens Remus
19 siblings, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-05 12:25 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, x86, Steven Rostedt,
Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Dylan Hatch,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
Cc: bpf, linux-mm, Namhyung Kim, Andrii Nakryiko, Jose E. Marchesi,
Beau Belgrave, Florian Weimer, Carlos O'Donell,
Masami Hiramatsu, Jiri Olsa, Arnaldo Carvalho de Melo,
Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich
On 5/5/2026 2:16 PM, Jens Remus wrote:
> This is the implementation of parsing the SFrame V3 stack trace information
> from an .sframe section in an ELF file. It's a continuation of Josh's and
> Steve's work that can be found here:
>
> https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
> https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
I forgot to mention that Steven is working on a system call that lets
dynamic linkers register the .sframe for the .text of dynamic libraries
(DSOs). It is supposed to replace the last (test-only) patch of this
series. Feedback on that is much appreciated as well:
[RFC][PATCH] unwind: Add stacktrace_setup system call
https://lore.kernel.org/all/20260429114355.6c712e6a@gandalf.local.home/
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Chairman of the Supervisory Board: Wolfgang Wendt; Management: David Faller; Registered office: Ehningen; Court of registration: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
* Re: [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame
2026-05-05 12:17 ` [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame Jens Remus
@ 2026-05-05 12:40 ` sashiko-bot
2026-05-06 15:01 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 12:40 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The newly added `outermost` field in `struct unwind_user_frame` is read uninitialized during SFrame unwinding, which can randomly truncate stack traces.
--
commit 84ee374fd6a62072e8d61545c4ceae42822e4e37
Author: Jens Remus <jremus@linux.ibm.com>
unwind_user: Stop when reaching an outermost frame
This commit adds an outermost frame indicator to the unwind user frame
structure, and modifies the unwinder to stop when it is reached.
> diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
> index 1fb2724197332..fdb1001e37505 100644
> --- a/kernel/unwind/user.c
> +++ b/kernel/unwind/user.c
> @@ -32,6 +32,12 @@ static int unwind_user_next_common(struct unwind_user_state *state,
> {
> unsigned long cfa, fp, ra;
>
> + /* Stop unwinding when reaching an outermost frame. */
> + if (frame->outermost) {
Could this evaluate uninitialized stack memory during SFrame unwinding?
In kernel/unwind/user.c, unwind_user_next_sframe() declares the frame
variable on the stack without zero-initializing it:
unwind_user_next_sframe() {
struct unwind_user_frame frame;
...
err = sframe_find(..., &frame);
...
return unwind_user_next_common(state, &frame);
}
Since sframe_find() does not populate the newly added outermost field at this
point in the patch series, frame->outermost contains uninitialized memory
when checked here.
If this evaluates to true, does it erroneously set state->done and silently
truncate stack traces?
I note that this is fixed later in the series by the commit
"unwind_user/sframe: Add support for outermost frame indication", but
introducing the uninitialized read in this commit could break kernel
bisectability.
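One way to close the gap described above, within this commit, would be to zero-initialize the frame before the lookup fills it in. The following is only a user-space sketch under stated assumptions: `struct demo_frame`, `demo_find()` and `demo_next()` are hypothetical stand-ins, not the kernel's types or functions, and `demo_find()` merely imitates a lookup that never writes `outermost`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the unwinder's frame description. */
struct demo_frame {
	int cfa_off;
	int ra_off;
	bool outermost;
};

/* Hypothetical lookup: fills in the offsets but never writes 'outermost'. */
static int demo_find(struct demo_frame *frame)
{
	frame->cfa_off = 16;
	frame->ra_off = -8;
	return 0;
}

/* Returns 1 if unwinding would stop at this frame, 0 if not, -1 on error. */
static int demo_next(void)
{
	/* Zero-initialize so fields the lookup skips read as 0/false. */
	struct demo_frame frame = { 0 };

	if (demo_find(&frame))
		return -1;

	return frame.outermost ? 1 : 0;
}
```

With the initializer in place, `demo_next()` reports that unwinding continues, independent of whatever happened to be on the stack before the call.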
> + state->done = true;
> + return 0;
> + }
> +
> /* Get the Canonical Frame Address (CFA) */
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=8
* Re: [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers
2026-05-05 12:17 ` [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
@ 2026-05-05 12:49 ` sashiko-bot
2026-05-06 13:42 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 12:49 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] Missing validation of shdr.abi_arch allows parsing SFrame data from mismatched architectures.
- [Medium] VMA addresses are incorrectly evaluated as boolean flags to check for zero-length inputs.
- [Low] Mixing goto error handling with scoped_guard scope-based cleanup.
--
commit afe7c9eb6f0e6fc3f73d0b6a555f8d3926cf6ccf
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Add support for reading .sframe headers
This commit introduces basic infrastructure for unwinding user space stacks
with SFrame. It adds support for parsing the .sframe section header.
Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> new file mode 100644
> index 0000000000000..c128c1eeaca20
> --- /dev/null
> +++ b/kernel/unwind/sframe.c
[ ... ]
> +static int sframe_read_header(struct sframe_section *sec)
> +{
> + unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
> + struct sframe_header shdr;
> + unsigned int num_fdes;
> +
> + if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
> + dbg("header usercopy failed\n");
> + return -EFAULT;
> + }
> +
> + if (shdr.preamble.magic != SFRAME_MAGIC ||
> + shdr.preamble.version != SFRAME_VERSION_3 ||
> + !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
> + !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
> + shdr.auxhdr_len) {
> + dbg("bad/unsupported sframe header\n");
> + return -EINVAL;
> + }
Should we also validate shdr.abi_arch to ensure it matches the expected
architecture?
If malformed or foreign-architecture SFrame data is passed in (for instance,
with an unexpected endianness or register mapping), the kernel might interpret
it incorrectly and generate invalid stack traces.
Furthermore, accepting arbitrary values here could risk ABI calcification if
userspace starts supplying invalid values that we later want to reject.
[ ... ]
> +int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> + unsigned long text_start, unsigned long text_end)
> +{
> + struct vm_area_struct *sframe_vma, *text_vma;
> + struct mm_struct *mm = current->mm;
> + struct sframe_section *sec;
> + int ret;
> +
> + if (!sframe_start || !sframe_end || !text_start || !text_end) {
Is it safe to use VMA start addresses as boolean flags?
Since vm_start can legitimately be zero in certain configurations (if
mmap_min_addr permits), evaluating !sframe_start or !text_start could
incorrectly reject perfectly valid mappings at address 0.
Also, to check for zero-length sections, would it be better to explicitly use
comparisons like sframe_start >= sframe_end instead?
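The comparison suggested above can be sketched as follows. This is a minimal illustration, assuming half-open [start, end) ranges; `demo_check_range()` is a hypothetical helper and not part of the patch.

```c
#include <errno.h>

/* Hypothetical helper: validate a half-open [start, end) address range. */
static int demo_check_range(unsigned long start, unsigned long end)
{
	/* Rejects empty and inverted ranges; a start address of 0 is allowed. */
	if (start >= end)
		return -EINVAL;

	return 0;
}
```

A range starting at address 0 passes as long as it is non-empty, while `start == end` and `start > end` are both rejected.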
> + dbg("zero-length sframe/text address\n");
> + return -EINVAL;
> + }
> +
> + scoped_guard(mmap_read_lock, mm) {
[ ... ]
> + sec = kzalloc(sizeof(*sec), GFP_KERNEL);
> + if (!sec)
> + return -ENOMEM;
> +
> + sec->sframe_start = sframe_start;
> + sec->sframe_end = sframe_end;
> + sec->text_start = text_start;
> + sec->text_end = text_end;
> +
> + ret = sframe_read_header(sec);
> + if (ret)
> + goto err_free;
Is it acceptable to mix goto error handling with scoped_guard() cleanup
helpers in the same function?
The kernel's cleanup subsystem guidelines generally suggest that usage of
goto and cleanup helpers shouldn't be mixed in the same routine.
Could sec be allocated using __free(kfree) to avoid the goto entirely?
> +
> + /* TODO nowhere to store it yet - just free it and return an error */
> + ret = -ENOSYS;
> +
> +err_free:
> + free_section(sec);
> + return ret;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=2
* Re: [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables
2026-05-05 12:17 ` [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
@ 2026-05-05 12:53 ` sashiko-bot
2026-05-06 14:56 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 12:53 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Missing `PT_LOAD` segment type check when registering executable sections for sframe unwinding.
- [High] Off-by-one error in maple tree insertion bounds causes adjacent executable segments to fail registration.
--
commit 610ebc2ab2a7fd5ae4cd4a346076c9d5c2acc195
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Detect .sframe sections in executables
When loading an ELF executable, automatically detect an .sframe section
and associate it with the mm_struct.
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 16a56b6b3f6ca..55047659a3cf4 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
[ ... ]
> @@ -637,6 +638,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
> return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
> }
>
> +static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
> + unsigned long base_addr)
> +{
> + unsigned long sframe_start, sframe_end, text_start, text_end;
> +
> + sframe_start = base_addr + sframe->p_vaddr;
> + sframe_end = sframe_start + sframe->p_memsz;
> +
> + text_start = base_addr + text->p_vaddr;
> + text_end = text_start + text->p_memsz;
Should text_end be calculated as text_start + text->p_memsz - 1?
Since the maple tree API strictly expects inclusive ranges [first, last] for
insertions, passing the exclusive text_end would cause the inserted range
to cover one byte beyond the actual segment.
If an ELF binary has adjacent executable PT_LOAD segments, could the first
segment's inclusive end overlap with the next segment's start and fail
subsequent insertions?
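To illustrate the inclusive-versus-exclusive concern raised above, here is a small user-space sketch. It assumes the store keeps inclusive [first, last] ranges, as the review states; `struct demo_range`, `demo_to_inclusive()` and `demo_overlaps()` are hypothetical names and do not model the maple tree itself.

```c
#include <stdbool.h>

/* Hypothetical inclusive range, as a [first, last] keyed store would see it. */
struct demo_range {
	unsigned long first;
	unsigned long last;
};

/*
 * Convert a half-open [start, end) range to an inclusive [first, last] one.
 * The caller must guarantee start < end.
 */
static struct demo_range demo_to_inclusive(unsigned long start,
					   unsigned long end)
{
	struct demo_range r = { .first = start, .last = end - 1 };

	return r;
}

/* Two inclusive ranges overlap if neither ends before the other begins. */
static bool demo_overlaps(struct demo_range a, struct demo_range b)
{
	return a.first <= b.last && b.first <= a.last;
}
```

With the conversion applied, two adjacent segments such as [0x1000, 0x2000) and [0x2000, 0x3000) no longer share the byte at 0x2000.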
> +
> + /* Ignore return value, sframe section isn't critical */
> + sframe_add_section(sframe_start, sframe_end, text_start, text_end);
> +}
> +
[ ... ]
> @@ -712,6 +729,19 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
> error = -ENOMEM;
> goto out;
> }
> + break;
> + }
> + case PT_GNU_SFRAME:
> + sframe_phdr = eppnt;
> + break;
> + }
> + }
> +
> + if (sframe_phdr) {
> + eppnt = interp_elf_phdata;
> + for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
> + if (eppnt->p_flags & PF_X)
> + elf_add_sframe(eppnt, sframe_phdr, load_addr);
Does this need to verify the segment is loadable (eppnt->p_type == PT_LOAD)?
If non-loadable segments like PT_GNU_STACK have the PF_X flag set, they
typically have p_vaddr and p_memsz as 0. Processing these segments would
compute bogus text boundaries and bypass VMA validation. If a legitimate
PT_LOAD segment overlaps with these bogus ranges, could its insertion fail
silently and break stack unwinding for the executable?
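A sketch of the combined check asked about above, for illustration only. The constants carry the values from the ELF specification but are redefined locally under `DEMO_` names to keep the example self-contained; `demo_is_text_segment()` is a hypothetical helper, not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in the ELF specification, redefined locally for the sketch. */
#define DEMO_PT_LOAD		1u
#define DEMO_PT_GNU_STACK	0x6474e551u
#define DEMO_PF_X		0x1u

/* Only segments that are both loadable and executable describe text. */
static bool demo_is_text_segment(uint32_t p_type, uint32_t p_flags)
{
	return p_type == DEMO_PT_LOAD && (p_flags & DEMO_PF_X) != 0;
}
```

An executable PT_GNU_STACK header is skipped by this predicate, so its zero p_vaddr and p_memsz would never be turned into text boundaries.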
> }
> }
>
[ ... ]
> @@ -1242,6 +1276,14 @@ static int load_elf_binary(struct linux_binprm *bprm)
> elf_brk = k;
> }
>
> + if (sframe_phdr) {
> + for (i = 0, elf_ppnt = elf_phdata;
> + i < elf_ex->e_phnum; i++, elf_ppnt++) {
> + if ((elf_ppnt->p_flags & PF_X))
> + elf_add_sframe(elf_ppnt, sframe_phdr, load_bias);
Similarly, should this also check if the segment is a PT_LOAD segment before
adding it to the sframe sections?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=6
* Re: [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation
2026-05-05 12:17 ` [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
@ 2026-05-05 18:22 ` sashiko-bot
2026-05-06 14:13 ` Jens Remus
2026-05-06 14:09 ` Jens Remus
1 sibling, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:22 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Variable shadowing hazard in unsafe_copy_to_user and unsafe_copy_from_user macros.
- [Low] Missing const qualifier on the source pointer in unsafe_copy_from_user.
--
Note: The format of this report is altered due to recitation restrictions.
The original patch code is not quoted directly, and a free-form summary
of the review findings is provided instead.
commit b652061f302d029392e67f46699e5a2db0abfaca
Author: Josh Poimboeuf <jpoimboe@kernel.org>
x86/uaccess: Add unsafe_copy_from_user() implementation
This commit introduces the unsafe_copy_from_user() macro for x86 architectures.
It also refactors the existing unsafe_copy_to_user() macro to use updated
internal variable names.
Regarding the variable naming in the updated unsafe_copy_to_user() and the
new unsafe_copy_from_user() macros, is there a risk of variable shadowing?
The macros declare local variables with very common names, specifically __dst,
__src, and __len. If a caller passes an argument that happens to share one of
these names, the local variable declaration will shadow the caller's argument.
For instance, if a caller passes __len, the initialization size_t __len =
(_len); will evaluate the right-hand side as the newly declared uninitialized
local variable itself. Could this lead to the loop copying a random length of
memory based on stack garbage instead of the intended length? The previous
implementation seemed to avoid this by using uniquely prefixed names like
__ucu_len.
Additionally, looking at the unsafe_copy_from_user() macro, the user-space
source pointer is initialized as a void pointer without a const qualifier.
Since memory read operations typically expect the source buffer to be
read-only, callers might pass a const void pointer. Does this implicit
dropping of the const qualifier trigger a discarded-qualifiers compiler
warning? Could the source pointer be declared as a const void pointer
to prevent potential build failures when warnings are treated as errors?
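Both points above can be illustrated with a user-space sketch. It is not the x86 macro: `demo_copy()` is a hypothetical byte-copy stand-in, the `__demo_cp_` prefix is an assumed naming scheme, and the caller deliberately uses the names `__dst`, `__src` and `__len` (reserved identifiers in ordinary user-space C, used here only to mirror the kernel scenario).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical copy macro. Its locals carry a unique prefix so that they
 * cannot shadow a caller's argument, and the source pointer is const.
 */
#define demo_copy(_dst, _src, _len)					\
do {									\
	char *__demo_cp_dst = (char *)(_dst);				\
	const char *__demo_cp_src = (const char *)(_src);		\
	size_t __demo_cp_len = (_len);					\
	while (__demo_cp_len--)						\
		*__demo_cp_dst++ = *__demo_cp_src++;			\
} while (0)

/* Caller using the "common" names; returns 0 if the copy is correct. */
static int demo_copy_check(const char *in, size_t n)
{
	char buf[16] = { 0 };
	char *__dst = buf;
	const char *__src = in;
	size_t __len = n;

	if (n > sizeof(buf))
		return -1;

	demo_copy(__dst, __src, __len);
	return memcmp(buf, in, n);
}
```

Had the macro declared its own `__len`, the statement `size_t __len = (__len);` would initialize the new variable from itself rather than from the caller's value.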
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=4
* Re: [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option
2026-05-05 12:17 ` [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option Jens Remus
@ 2026-05-05 18:32 ` sashiko-bot
2026-05-08 10:51 ` Jens Remus
1 sibling, 0 replies; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:32 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Denial of Service (Soft Lockup) due to unyielding loops in .sframe section validation.
- [High] Integer overflows on 32-bit architectures bypass bounds checks, allowing arbitrary kernel/MMIO reads.
- [Low] Incorrect format specifier '%d' used for unsigned 32-bit integer.
--
commit 1f173c4dc22a264948bc63183c96747aa7022119
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Add .sframe validation option
This commit adds a debug feature to validate all .sframe sections when
first loading the file rather than on demand.
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> index 4419626f01734..45988cdc5c37a 100644
> --- a/kernel/unwind/sframe.c
> +++ b/kernel/unwind/sframe.c
[ ... ]
> +static int sframe_validate_section(struct sframe_section *sec)
> +{
> + unsigned long prev_ip = 0;
> + unsigned int i;
> +
> + for (i = 0; i < sec->num_fdes; i++) {
Can a maliciously crafted .sframe mapping with extremely large values for
sec->num_fdes and fde.fres_num cause a soft lockup?
The outer and inner loops iterate over values parsed directly from the
user-provided .sframe section header, which could be up to 4 billion.
Since there is no cond_resched() called within these loops, processing a
large section might monopolize the CPU for an extended period and trigger
the soft lockup watchdog.
> + struct sframe_fre_internal *fre, *prev_fre = NULL;
> + unsigned long ip, fre_addr;
> + struct sframe_fde_internal fde;
> + struct sframe_fre_internal fres[2];
> + bool which = false;
> + unsigned int j;
> + int ret;
> +
> + ret = safe_read_fde(sec, i, &fde);
[ ... ]
> + fre_addr = sec->fres_start + fde.fres_off;
> + for (j = 0; j < fde.fres_num; j++) {
> + int ret;
> +
> + fre = which ? fres : fres + 1;
> + which = !which;
> +
> + ret = safe_read_fre(sec, &fde, fre_addr, fre);
> + if (ret) {
> + dbg_sec("FDE %u: safe_read_fre(%u) failed\n", i, j);
> + dbg_sec("FDE: func_addr:%#lx func_size:%#x fda_off:%#x fres_off:%#x fres_num:%d info:%u info2:%u rep_size:%u\n",
This isn't a bug, but does fres_num use the wrong format specifier?
The fres_num field is defined as a u32, so %u should probably be used
instead of %d. If a user provides a fres_num exceeding 0x7FFFFFFF, it will
incorrectly print as a negative number in the debug output.
> + fde.func_addr, fde.func_size,
> + fde.fda_off,
> + fde.fres_off, fde.fres_num,
> + fde.info, fde.info2,
> + fde.rep_size);
> + return ret;
> + }
> +
> + fre_addr += fre->size;
Could this addition overflow on 32-bit architectures?
Calculations like fre_addr += fre->size (and similarly
fda_addr = sec->fres_start + _fde.fres_off in __read_fde() called by
safe_read_fde()) use 32-bit arithmetic on 32-bit architectures.
A maliciously crafted .sframe section could cause these calculations to wrap
around to a small value that still passes the bounds checks.
Because user_read_access_begin() validates the whole section range upfront
and UNSAFE_GET_USER_INC omits per-access access_ok() checks, this
wraparound could allow arbitrary kernel or MMIO reads.
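One common way to make such a bounds check immune to wraparound is to compare the size against the remaining room instead of computing addr + size. The sketch below models a 32-bit unsigned long with uint32_t; `demo_in_bounds()` is a hypothetical helper and not taken from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: does [addr, addr + size) lie within [start, end)?
 * uint32_t stands in for unsigned long on a 32-bit architecture.
 */
static bool demo_in_bounds(uint32_t start, uint32_t end,
			   uint32_t addr, uint32_t size)
{
	if (addr < start || addr > end)
		return false;

	/* end - addr cannot wrap here, unlike addr + size. */
	return size <= end - addr;
}
```

For addr = 0xffffffe0 and size = 0x40, the sum wraps to 0x20 and would pass a naive `addr + size <= end` test, whereas the subtraction form rejects it.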
> +
> + if (prev_fre && fre->ip_off <= prev_fre->ip_off) {
> + dbg_sec("FDE %u: FRE %u not sorted\n", i, j);
> + return -EFAULT;
> + }
> +
> + prev_fre = fre;
> + }
> + }
> +
> + return 0;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=12
* Re: [PATCH v14 14/19] unwind_user: Flexible FP/RA recovery rules
2026-05-05 12:17 ` [PATCH v14 14/19] unwind_user: Flexible FP/RA recovery rules Jens Remus
@ 2026-05-05 18:34 ` sashiko-bot
0 siblings, 0 replies; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:34 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unprivileged users can trigger a kernel warning (and panic if panic_on_warn is set) by providing crafted SFrame data that forces the use of unimplemented register recovery rules.
--
commit 32f67c3538981cd61e598218f28891cb4051e567
Author: Jens Remus <jremus@linux.ibm.com>
unwind_user: Flexible FP/RA recovery rules
This commit adds support for flexible frame pointer and return address
recovery rules to the user unwinder. It introduces rules to retain values
or read them from registers.
> diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
> index bc2edae399556..92cdf38c8aded 100644
> --- a/include/linux/unwind_user.h
> +++ b/include/linux/unwind_user.h
> @@ -32,6 +32,15 @@ static inline int unwind_user_get_ra_reg(unsigned long *val)
> #define unwind_user_get_ra_reg unwind_user_get_ra_reg
> #endif
>
> +#ifndef unwind_user_get_reg
> +static inline int unwind_user_get_reg(unsigned long *val, unsigned int regnum)
> +{
> + WARN_ON_ONCE(1);
> + return -EINVAL;
> +}
> +#define unwind_user_get_reg unwind_user_get_reg
> +#endif
Could an unprivileged user trigger these WARN_ON_ONCE macros by providing
crafted SFrame data during profiling?
If an architecture does not implement unwind_user_get_reg or
unwind_user_get_ra_reg, falling back to these stubs will execute a kernel
warning. On systems with panic_on_warn enabled, this could lead to a
kernel panic.
[ ... ]
> diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
> index 9ceef9b2b8db0..89aecfbe3e849 100644
> --- a/kernel/unwind/user.c
> +++ b/kernel/unwind/user.c
> @@ -61,22 +61,61 @@ static int unwind_user_next_common(struct unwind_user_state *state,
[ ... ]
> + switch (frame->ra.rule) {
> + case UNWIND_USER_RULE_RETAIN:
> if (!state->topmost || unwind_user_get_ra_reg(&ra))
> return -EINVAL;
> + break;
[ ... ]
> + default:
> + WARN_ON_ONCE(1);
> + return -EINVAL;
> }
A similar issue seems to exist in the default case here, as well as the
default case for frame->fp.rule below it.
Since sframe_init_rule_data() sets the rule based on untrusted user memory
(for example, setting UNWIND_USER_RULE_RETAIN when ra_off is 0), can a
crafted binary intentionally hit these paths?
Would it be safer to silently fail and return -EINVAL when an unsupported
or unimplemented rule is encountered, rather than warning?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=14
* Re: [PATCH v14 13/19] unwind_user: Enable archs that pass RA in a register
2026-05-05 12:17 ` [PATCH v14 13/19] unwind_user: Enable archs that pass RA in a register Jens Remus
@ 2026-05-05 18:35 ` sashiko-bot
0 siblings, 0 replies; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:35 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unprivileged user space can trigger a kernel warning (`WARN_ON_ONCE`) and potential panic by providing a maliciously crafted `.sframe` section.
--
commit 20f03d41baa46baef137fb9a88081c8f92b5eb5f
Author: Jens Remus <jremus@linux.ibm.com>
unwind_user: Enable archs that pass RA in a register
This commit updates the user space stack unwinder to support architectures
that pass the return address in a register instead of saving it on the
stack. It also handles architectures where the stack pointer may be
unchanged in the topmost frame.
> diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
> index 64618618febde..bc2edae399556 100644
> --- a/include/linux/unwind_user.h
> +++ b/include/linux/unwind_user.h
> @@ -23,6 +23,15 @@ static inline bool unwind_user_at_function_start(struct pt_regs *regs)
> #define unwind_user_at_function_start unwind_user_at_function_start
> #endif
>
> +#ifndef unwind_user_get_ra_reg
> +static inline int unwind_user_get_ra_reg(unsigned long *val)
> +{
> + WARN_ON_ONCE(1);
Can a maliciously crafted .sframe section trigger this warning and panic
the kernel if panic_on_warn is enabled?
The SFrame unwind data is parsed from user-provided ELF binaries, so the
.sframe sections are entirely user-controlled.
If an unprivileged user on an architecture using the generic fallback
provides an SFrame Frame Row Entry with dataword_count > 0 but an
explicit ra_off dataword of 0, it appears this would bypass the
frame->outermost check while keeping frame->ra_off == 0.
When the kernel unwinds the stack, unwind_user_next_common() would see
frame->ra_off == 0 and call unwind_user_get_ra_reg():
kernel/unwind/user.c:unwind_user_next_common() {
...
if (frame->ra_off) {
if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
return -EINVAL;
} else {
if (!state->topmost || unwind_user_get_ra_reg(&ra))
return -EINVAL;
}
...
}
Would it be better to gracefully return an error code like -EINVAL here
without raising a warning?
> + return -EINVAL;
> +}
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=13
* Re: [PATCH v14 19/19] unwind_user/sframe: Add prctl() interface for registering .sframe sections
2026-05-05 12:17 ` [PATCH v14 19/19] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
@ 2026-05-05 18:45 ` sashiko-bot
2026-05-07 14:14 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:45 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Missing `break;` statement causes `PR_SET_CFI` to fall through to `PR_ADD_SFRAME`.
- [Medium] Patch introduces a test-only interface into the permanent UAPI.
--
commit 9171aa16917963dd1070d9ca3467d9086cd66393
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Add prctl() interface for registering .sframe sections
The commit adds prctl() interfaces which allow glibc to tell the kernel
where to find .sframe sections. It mentions that the patch is for testing
purposes only.
> [
> This adds an interface for prctl() for testing loading of sframes for
> libraries. But this interface should really be a system call. This patch
> is for testing purposes only and should not be applied to mainline.
> ]
Is it intended to include this in a mainline patch series? Adding temporary
or test-only interfaces into the UAPI represents a permanent ABI commitment
once released.
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 62e842055cc9c..6e81e82bc9911 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
[ ... ]
> @@ -2906,6 +2907,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> break;
> if (arg3 & PR_CFI_LOCK && !(arg3 & PR_CFI_DISABLE))
> error = arch_prctl_lock_branch_landing_pad_state(me);
> + case PR_ADD_SFRAME:
> + error = sframe_add_section(arg2, arg3, arg4, arg5);
> + break;
Does this cause the previous case to fall through into PR_ADD_SFRAME?
It appears the new case was inserted before the break statement of the
previous case. This would cause execution to unintentionally continue into
sframe_add_section and overwrite the original error status.
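The effect of the break placement can be seen in a stripped-down dispatcher. This is a user-space sketch only: the option numbers, `demo_add_section()` and `demo_dispatch()` are hypothetical and do not correspond to the real prctl() values or handlers.

```c
/* Hypothetical option numbers, for illustration only. */
enum { DEMO_OPT_CFI = 1, DEMO_OPT_ADD_SFRAME = 2 };

/* Counts how often the "add section" handler ran. */
static int demo_add_calls;

static int demo_add_section(void)
{
	demo_add_calls++;
	return 0;
}

static int demo_dispatch(int option, int arg_ok)
{
	int error = 0;

	switch (option) {
	case DEMO_OPT_CFI:
		if (!arg_ok)
			error = -1;
		break;	/* ends this case before the next label */
	case DEMO_OPT_ADD_SFRAME:
		error = demo_add_section();
		break;
	default:
		error = -2;
	}

	return error;
}
```

With the break ahead of the new label, a failing first case keeps its error and the second handler is never entered.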
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=19
* Re: [PATCH v14 11/19] unwind_user/sframe: Show file name in debug output
2026-05-05 12:17 ` [PATCH v14 11/19] unwind_user/sframe: Show file name in debug output Jens Remus
@ 2026-05-05 18:46 ` sashiko-bot
0 siblings, 0 replies; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:46 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [High] Potential use-after-free of `sec` and `sec->filename` in the error path of `__sframe_remove_section`.
- [High] A race condition in the `-EFAULT` error path of `sframe_find()` allows an unprivileged local attacker to trigger a `WARN_ON_ONCE` and potentially panic the kernel.
- [High] Unbounded kernel memory allocations controlled by user space do not use `__GFP_ACCOUNT`, allowing memory cgroup limit bypass.
- [Medium] dbg_init() fails to fall back to other VMA naming methods if vma->vm_ops->name(vma) returns NULL.
- [Medium] Directly checking `vma->vm_file` in a conditional to classify the VMA violates VMA architectural rules.
- [Medium] The functions `sframe_find()` and `sframe_add_section()` mix scope-based cleanup helpers with manual `goto`-based cleanup labels, violating kernel refactoring guidelines.
--
commit d6a437647b4507fcb6d70115c0d08b1d68cafa8a
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Show file name in debug output
This commit adds the file name to debug output messages for sframe sections
to make debugging easier. It records the filename when the section is added
by inspecting the associated VMA.
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> index 8eed6a7d96256..4419626f01734 100644
> --- a/kernel/unwind/sframe.c
> +++ b/kernel/unwind/sframe.c
> @@ -359,14 +359,17 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
> end:
> user_read_access_end();
>
> - if (ret == -EFAULT)
> + if (ret == -EFAULT) {
> + dbg_sec("removing bad .sframe section\n");
> WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
> + }
Can concurrent executions of this error path trigger the WARN_ON_ONCE()? If
two threads hit the EFAULT path concurrently, the first successfully removes
the section, but the second fails to find it and returns EINVAL, which
triggers the warning.
Also, does this code mix goto-based error handling with scope-based cleanup
helpers? The sframe_find() function uses guard(srcu)(&sframe_srcu) while
also using the goto end label. The sframe_add_section() function similarly
mixes scoped_guard with goto. The guidelines specify that functions should
convert all resources to scope-based cleanup or convert none of them.
[ ... ]
> @@ -496,7 +501,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
> struct sframe_section *sec)
> {
> if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> - dbg("mtree_erase failed: text=%lx\n", sec->text_start);
> + dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
Does accessing sec->filename in dbg_sec() risk a use-after-free here? If
mtree_erase() fails, it indicates the section might have been concurrently
removed. The SRCU grace period could expire, causing sec to be freed before
dbg_sec() is called.
[ ... ]
> diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
> index 36352124cde81..e568be4172b1b 100644
> --- a/kernel/unwind/sframe_debug.h
> +++ b/kernel/unwind/sframe_debug.h
[ ... ]
> +static inline void dbg_init(struct sframe_section *sec)
> +{
> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> +
> + guard(mmap_read_lock)(mm);
> + vma = vma_lookup(mm, sec->sframe_start);
> + if (!vma)
> + sec->filename = kstrdup("(vma gone???)", GFP_KERNEL);
> + else if (vma->vm_file)
> + sec->filename = kstrdup_quotable_file(vma->vm_file, GFP_KERNEL);
Does checking vma->vm_file directly skip proper classification for some
anonymous VMAs? According to the VMA guidelines, certain anonymous VMAs like
private mappings of /dev/zero have vma->vm_file set. Should this use
!vma_is_anonymous(vma) instead?
Do these string allocations need __GFP_ACCOUNT? Since they are triggered by
user space and persist until the section is removed, could a user allocate
large amounts of uncharged memory and bypass memory cgroup limits?
> + else if (vma->vm_ops && vma->vm_ops->name)
> + sec->filename = kstrdup(vma->vm_ops->name(vma), GFP_KERNEL);
If vma->vm_ops->name(vma) returns NULL, will kstrdup() return NULL and skip
the subsequent fallback naming methods? Does this leave sec->filename as
NULL, causing debug logs to print (null)?
> + else if (arch_vma_name(vma))
> + sec->filename = kstrdup(arch_vma_name(vma), GFP_KERNEL);
> + else if (!vma->vm_mm)
> + sec->filename = kstrdup("(vdso)", GFP_KERNEL);
> + else
> + sec->filename = kstrdup("(anonymous)", GFP_KERNEL);
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=11
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
2026-05-05 12:17 ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Jens Remus
@ 2026-05-05 18:51 ` sashiko-bot
2026-05-06 13:50 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:51 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [Critical] Concurrency Use-After-Free and Double Free in sframe_remove_section
- [High] Unbounded memory leak of sframe_section allocations leading to Denial of Service (OOM)
- [High] Missing MT_FLAGS_USE_RCU initialization breaks lockless NMI unwinding
- [Medium] Child processes silently lose .sframe unwind data on fork()
- [Medium] Inclusive bound semantics in maple tree cause 1-byte overlap rejections for contiguous sections
- [Low] Zero-address validation falsely rejects valid memory mappings
--
commit 3b41045a6b38a262cb32acf9ed118e4d667448dd
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Store .sframe section data in per-mm maple tree
Associate an .sframe section with its mm by adding it to a per-mm maple
tree which is indexed by the corresponding text address range. A single
.sframe section can be associated with multiple text ranges.
> diff --git a/kernel/fork.c b/kernel/fork.c
> index f1ad69c6dc2d4..0d92fb1c61a99 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
[ ... ]
> @@ -1072,6 +1074,13 @@ static void mmap_init_lock(struct mm_struct *mm)
> #endif
> }
>
> +static void mm_init_sframe(struct mm_struct *mm)
> +{
> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
> + mt_init(&mm->sframe_mt);
> +#endif
> +}
> +
> static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mt_init() initializes the maple tree with flags set to 0. Since stack
unwinding typically occurs in profiling interrupts (NMIs) which must
traverse the maple tree locklessly via RCU, does this missing
MT_FLAGS_USE_RCU flag allow concurrent modifications to perform in-place
updates instead of RCU-safe copy-on-write allocations? This could result in
the NMI unwinder observing corrupted tree states.
[ ... ]
> @@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> mm->pmd_huge_pte = NULL;
> #endif
> mm_init_uprobes_state(mm);
> + mm_init_sframe(mm);
> hugetlb_count_init(mm);
>
> mm_flags_clear_all(mm);
When a process forks, this initializes an empty sframe_mt for the child's
mm_struct. While dup_mmap() duplicates the parent's VMAs, it doesn't appear
that the registered .sframe entries are propagated to the child's maple tree.
Will child processes silently lose their .sframe unwind data for shared
libraries registered prior to the fork?
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> index c128c1eeaca20..fdb2a49085638 100644
> --- a/kernel/unwind/sframe.c
> +++ b/kernel/unwind/sframe.c
[ ... ]
> @@ -81,6 +81,7 @@ static int sframe_read_header(struct sframe_section *sec)
> int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> unsigned long text_start, unsigned long text_end)
> {
> + struct maple_tree *sframe_mt = &current->mm->sframe_mt;
> struct vm_area_struct *sframe_vma, *text_vma;
> struct mm_struct *mm = current->mm;
> struct sframe_section *sec;
> @@ -88,10 +89,12 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> if (!sframe_start || !sframe_end || !text_start || !text_end) {
Virtual addresses can legitimately be zero, such as when mmap_min_addr
permits it. Does checking !text_start falsely reject valid zero-address
VMAs? Checking the range bounds explicitly might be safer here.
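As a rough illustration of checking the range bounds rather than the
individual addresses, a minimal user-space sketch follows. The helper name
range_is_valid() is hypothetical and not part of the patch; it only models
the idea that a [start, end) range is rejected when it is empty or inverted,
not when it starts at zero.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): accept a [start, end) range
 * when it is non-empty, regardless of whether start happens to be zero.
 */
static bool range_is_valid(unsigned long start, unsigned long end)
{
	return start < end;
}
```

With this shape, a range starting at address zero passes as long as its end
lies above it, while zero-length and inverted ranges are still rejected.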
[ ... ]
> }
> }
>
> sec = kzalloc(sizeof(*sec), GFP_KERNEL);
This allocates the section with GFP_KERNEL instead of GFP_KERNEL_ACCOUNT,
which bypasses memory cgroup limits.
Furthermore, is there a mechanism to clean up these allocated sections when
the corresponding VMA is torn down (e.g., via munmap)? Without cleanup hooks
in the VMA teardown paths, it looks like an application could repeatedly mmap
an executable region, register it here, and munmap it, leaking kernel memory
and maple tree nodes indefinitely.
[ ... ]
> - /* TODO nowhere to store it yet - just free it and return an error */
> - ret = -ENOSYS;
> + ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
The Maple Tree API stores ranges inclusively as [first, last]. Since VMA
boundaries are exclusive, does passing sec->text_end directly cause the
maple tree range to incorrectly cover one extra byte?
If an application registers two perfectly contiguous .sframe sections, the
insertion of the second section could fail with -EEXIST because the first
section erroneously claims the first byte of the second section. Using
sec->text_end - 1 might prevent this overlap.
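The overlap described above can be modelled in a few lines of user-space C.
This is only a sketch under the assumption stated in this review, namely that
the tree stores inclusive [first, last] ranges; struct incl_range,
from_exclusive() and incl_overlap() are hypothetical names used for
illustration.

```c
#include <stdbool.h>

/* Inclusive range [first, last], as the review describes the tree's ranges. */
struct incl_range {
	unsigned long first;
	unsigned long last;
};

/* Convert an exclusive-end range [start, end) to inclusive form; assumes start < end. */
static struct incl_range from_exclusive(unsigned long start, unsigned long end)
{
	struct incl_range r = { .first = start, .last = end - 1 };

	return r;
}

/* Two inclusive ranges overlap when neither lies entirely before the other. */
static bool incl_overlap(struct incl_range a, struct incl_range b)
{
	return a.first <= b.last && b.first <= a.last;
}
```

Storing 0x1000-0x2000 inclusively collides with a neighbour that begins at
0x2000, whereas the converted range 0x1000-0x1fff does not.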
> + if (ret) {
> + dbg("mtree_insert_range failed: text=%lx-%lx\n",
> + sec->text_start, sec->text_end);
> + goto err_free;
> + }
> +
> + return 0;
>
> err_free:
> free_section(sec);
> return ret;
> }
>
> +static int __sframe_remove_section(struct mm_struct *mm,
> + struct sframe_section *sec)
> +{
> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> + dbg("mtree_erase failed: text=%lx\n", sec->text_start);
> + return -EINVAL;
> + }
> +
> + free_section(sec);
> +
> + return 0;
> +}
> +
> int sframe_remove_section(unsigned long sframe_start)
> {
> - return -ENOSYS;
> + struct mm_struct *mm = current->mm;
> + struct sframe_section *sec;
> + unsigned long index = 0;
> + bool found = false;
> + int ret = 0;
> +
> + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
> + if (sec->sframe_start == sframe_start) {
Can concurrent calls to sframe_remove_section() cause a use-after-free and
subsequent double free?
mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
drops the RCU read lock, meaning the returned sec pointer has no lifetime
protection in the loop body.
If two threads concurrently invoke sframe_remove_section(), both could
receive the exact same sec pointer from the tree. Thread A could then call
__sframe_remove_section(), erasing the entry and freeing sec via
free_section(). Thread B would then evaluate the if statement using the
freed sec pointer, causing a use-after-free read, and potentially proceeding
to free it again.
Additionally, does free_section() need to use kfree_rcu() instead of
kfree() to prevent lockless NMI unwinders from hitting a use-after-free
when reading payloads?
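To make the double-free part of this concern concrete, here is a minimal
user-space model using C11 atomics. It is not kernel code: the single global
slot stands in for one tree entry, and struct section and remove_section()
are hypothetical. It only shows the "whoever detaches the entry owns it"
pattern; it does not model readers, which in the kernel would additionally
need deferred freeing along the lines raised above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct section {
	unsigned long sframe_start;
};

/* One-slot stand-in for a tree entry; hypothetical, for illustration only. */
static _Atomic(struct section *) slot;

/*
 * Atomically detach whatever is in the slot. Only the caller that gets a
 * non-NULL pointer back owns the object and may free it, so two racing
 * removers cannot both free the same section.
 */
static bool remove_section(void)
{
	struct section *sec = atomic_exchange(&slot, NULL);

	if (!sec)
		return false;	/* someone else already removed it */
	free(sec);
	return true;
}
```

Note that the loser of the race never dereferences the pointer in this model,
which is the property the loop body quoted above appears to lack.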
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=3
* Re: [PATCH v14 16/19] unwind_user/sframe: Add support for SFrame V3 flexible FDEs
2026-05-05 12:17 ` [PATCH v14 16/19] unwind_user/sframe: Add support for SFrame V3 flexible FDEs Jens Remus
@ 2026-05-05 18:55 ` sashiko-bot
2026-05-07 15:30 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:55 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Valid flexible FREs ending in a single padding data word are incorrectly rejected.
- [High] Valid flexible FRE rules for CFA and FP/RA produce unhandled enums, triggering user-controlled WARN_ON_ONCE().
- [High] Outermost frame unwinding is unconditionally broken, causing unintended fallback to frame pointers.
--
commit 7d3289b2e756d08c38877458400ce276ff3f5994
Author: Jens Remus <jremus@linux.ibm.com>
unwind_user/sframe: Add support for SFrame V3 flexible FDEs
SFrame V3 introduces flexible FDEs in addition to the regular FDEs.
The key difference is that flexible FDEs encode the CFA, RA, and FP
tracking information using two FRE data words, a control word and an
offset, or a single padding data word of zero.
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> index 2721f4af53fd3..2ba2c8b385f96 100644
> --- a/kernel/unwind/sframe.c
> +++ b/kernel/unwind/sframe.c
[ ... ]
> @@ -189,16 +193,147 @@ static __always_inline int __find_fde(struct sframe_section *sec,
[ ... ]
> +static __always_inline int
> +__read_flex_fde_fre_datawords(struct sframe_section *sec,
> + struct sframe_fde_internal *fde,
> + unsigned long cur,
> + unsigned char dataword_count,
> + unsigned char dataword_size,
> + struct sframe_fre_internal *fre)
> +{
> + u32 cfa_ctl, ra_ctl, fp_ctl;
> + s32 cfa_off, ra_off, fp_off;
> +
> + if (dataword_count < 2)
> + return -EFAULT;
> + UNSAFE_GET_USER_INC(cfa_ctl, cur, dataword_size, Efault);
> + UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
> + dataword_count -= 2;
> +
> + ra_off = sec->ra_off;
> + ra_ctl = ra_off ? 2 : 0; /* regnum=0, deref_p=(ra_off != 0), reg_p=0 */
> + if (dataword_count >= 2) {
The commit message mentions that flexible FDEs can use a single padding
data word of zero. However, this code checks dataword_count >= 2 for both
the RA and FP blocks.
If an FRE ends with a single padding data word, it will bypass both of these
blocks and reach the check below:
> + UNSAFE_GET_USER_INC(ra_ctl, cur, dataword_size, Efault);
> + dataword_count--;
> + if (ra_ctl) {
> + UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
> + dataword_count--;
> + } else {
> + /* Padding RA location info */
> + ra_ctl = ra_off ? 2 : 0; /* re-deduce (see above) */
> + }
> + }
> +
> + fp_off = sec->fp_off;
> + fp_ctl = fp_off ? 2 : 0; /* regnum=0, deref_p=(fp_off != 0), reg_p=0 */
> + if (dataword_count >= 2) {
> + UNSAFE_GET_USER_INC(fp_ctl, cur, dataword_size, Efault);
> + dataword_count--;
> + if (fp_ctl) {
> + UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
> + dataword_count--;
> + } else {
> + /* Padding FP location info */
> + fp_ctl = fp_off ? 2 : 0; /* re-deduce (see above) */
> + }
> + }
> +
> + if (dataword_count)
> + return -EFAULT;
Could this incorrectly reject valid FREs ending in a single padding data word
since dataword_count would be 1 here?
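For illustration, a user-space sketch of a parser that would accept a
trailing single padding word is shown below. Whether such FREs are actually
valid is the open question here, so this only shows what accepting them could
look like. The data words are modelled as a plain int32_t array; struct loc,
read_loc() and parse_flex() are hypothetical names, and callers are assumed
to preset the RA/FP defaults before parsing.

```c
#include <stddef.h>
#include <stdint.h>

struct loc {
	uint32_t ctl;
	int32_t off;
};

/*
 * Read one optional location (RA or FP) starting at words[*pos]. A zero
 * control word is treated as a single padding word with no offset after it.
 * Returns 0 on success, -1 if a non-zero control word lacks its offset.
 */
static int read_loc(const int32_t *words, size_t n, size_t *pos,
		    struct loc *out)
{
	if (*pos == n)
		return 0;		/* nothing encoded, keep defaults */
	out->ctl = (uint32_t)words[(*pos)++];
	if (!out->ctl)
		return 0;		/* padding word */
	if (*pos == n)
		return -1;		/* offset missing */
	out->off = words[(*pos)++];
	return 0;
}

/* Parse CFA (mandatory pair), then optional RA and FP locations. */
static int parse_flex(const int32_t *words, size_t n,
		      struct loc *cfa, struct loc *ra, struct loc *fp)
{
	size_t pos = 0;

	if (n < 2)
		return -1;
	cfa->ctl = (uint32_t)words[pos++];
	cfa->off = words[pos++];
	if (read_loc(words, n, &pos, ra) || read_loc(words, n, &pos, fp))
		return -1;
	return pos == n ? 0 : -1;	/* reject leftover words */
}
```

The difference from the quoted code is that a location is attempted whenever
at least one word remains, and the second word is required only after a
non-zero control word has been seen.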
> +
> + fre->cfa_ctl = cfa_ctl;
> + fre->cfa_off = cfa_off;
> + fre->ra_ctl = ra_ctl;
> + fre->ra_off = ra_off;
> + fre->fp_ctl = fp_ctl;
> + fre->fp_off = fp_off;
> +
> + return 0;
> +
> +Efault:
> + return -EFAULT;
> +}
[ ... ]
> static __always_inline int __read_fre(struct sframe_section *sec,
> struct sframe_fde_internal *fde,
> unsigned long fre_addr,
> struct sframe_fre_internal *fre)
> {
[ ... ]
> if (!dataword_count) {
> /*
> * A FRE without data words indicates RA undefined /
> * outermost frame.
> */
> - cfa_off = 0;
> - ra_off = 0;
> - fp_off = 0;
> - goto done;
> - }
> + fre->cfa_ctl = 0;
> + fre->cfa_off = 0;
> + fre->ra_ctl = 0;
> + fre->ra_off = 0;
> + fre->fp_ctl = 0;
> + fre->fp_off = 0;
>
> - UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
When __read_fre() processes an outermost frame, it zeroes out fre->cfa_ctl.
In __find_fre(), this is later passed to sframe_init_cfa_rule_data():
> -static __always_inline void
> +static __always_inline int
> sframe_init_cfa_rule_data(struct unwind_user_cfa_rule_data *cfa_rule_data,
> - unsigned char fre_info,
> - s32 offset)
> + u32 ctlword, s32 offset)
> {
> - if (SFRAME_V3_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP)
> - cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
> - else
> - cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
> + bool deref_p = SFRAME_V3_FLEX_FDE_CTRLWORD_DEREF_P(ctlword);
> + bool reg_p = SFRAME_V3_FLEX_FDE_CTRLWORD_REG_P(ctlword);
> +
> + if (reg_p) {
> + unsigned int regnum = SFRAME_V3_FLEX_FDE_CTRLWORD_REGNUM(ctlword);
> +
> + switch (regnum) {
> + case SFRAME_REG_SP:
> + cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
> + break;
> + case SFRAME_REG_FP:
> + cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
> + break;
> + default:
> + cfa_rule_data->rule = UNWIND_USER_CFA_RULE_REG_OFFSET;
> + cfa_rule_data->regnum = regnum;
> + }
> + } else {
> + return -EINVAL;
> + }
Because ctlword is 0 for an outermost frame, reg_p will evaluate to false,
and this unconditionally returns -EINVAL.
Does this unintentionally break outermost frame unwinding? Returning an
error here will cause __find_fre() to fail before it can reach the
frame->outermost check, resulting in a fallback to frame pointer unwinding
instead of stopping gracefully.
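One possible ordering, sketched as a user-space model: decide whether the
frame is outermost first, and only set up the CFA rule when it is not.
struct frame_model, init_cfa_rule() and build_frame() are hypothetical, and
treating bit 0 of the control word as reg_p is an assumption of this model,
not a statement about the SFrame encoding.

```c
#include <stdbool.h>

struct frame_model {
	bool outermost;
	bool cfa_valid;
};

/* Hypothetical stand-in for the CFA rule setup: fails when reg_p is clear. */
static int init_cfa_rule(unsigned int ctlword, struct frame_model *f)
{
	bool reg_p = ctlword & 1;	/* bit position assumed for this model */

	if (!reg_p)
		return -1;
	f->cfa_valid = true;
	return 0;
}

/* Check the outermost marker before attempting any rule setup. */
static int build_frame(bool ra_undefined, unsigned int cfa_ctl,
		       struct frame_model *f)
{
	f->outermost = ra_undefined;
	f->cfa_valid = false;
	if (f->outermost)
		return 0;	/* nothing further to unwind; skip rule setup */
	return init_cfa_rule(cfa_ctl, f);
}
```

In this model an outermost frame with a zero control word succeeds, while a
non-outermost frame with the same control word still fails as before.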
> +
> + if (deref_p)
> + cfa_rule_data->rule |= UNWIND_USER_RULE_DEREF;
> +
Also, does this append UNWIND_USER_RULE_DEREF to CFA rules in a way that is
not supported by the unwinder?
If a CFA rule uses SP/FP with deref_p=1, this creates a combined enum like
UNWIND_USER_CFA_RULE_SP_OFFSET | UNWIND_USER_RULE_DEREF. Because .sframe
sections are user-supplied, could an attacker use this to trigger an
unhandled enum path and hit a WARN_ON_ONCE() during unwinding?
> cfa_rule_data->offset = offset;
> +
> + return 0;
> }
>
> static __always_inline void
> sframe_init_rule_data(struct unwind_user_rule_data *rule_data,
> - s32 offset)
> + u32 ctlword, s32 offset)
> {
> - if (offset) {
> - rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF;
> - rule_data->offset = offset;
> - } else {
> + bool deref_p = SFRAME_V3_FLEX_FDE_CTRLWORD_DEREF_P(ctlword);
> + bool reg_p = SFRAME_V3_FLEX_FDE_CTRLWORD_REG_P(ctlword);
> +
> + if (!ctlword && !offset) {
> rule_data->rule = UNWIND_USER_RULE_RETAIN;
> + return;
> + }
> + if (reg_p) {
> + unsigned int regnum = SFRAME_V3_FLEX_FDE_CTRLWORD_REGNUM(ctlword);
> +
> + rule_data->rule = UNWIND_USER_RULE_REG_OFFSET;
> + rule_data->regnum = regnum;
> + } else {
> + rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET;
> }
> +
> + if (deref_p)
> + rule_data->rule |= UNWIND_USER_RULE_DEREF;
> +
> + rule_data->offset = offset;
> }
Similarly, if a rule for FP/RA is set to CFA + offset with reg_p=0
and deref_p=0 (no dereference), this assigns UNWIND_USER_RULE_CFA_OFFSET
without the UNWIND_USER_RULE_DEREF flag.
Is this combination handled by the rest of the unwinding logic, or will
it fall through to an unsupported state and abort unwinding?
>
> static __always_inline int __find_fre(struct sframe_section *sec,
> @@ -344,9 +492,10 @@ static __always_inline int __find_fre(struct sframe_section *sec,
> return -EINVAL;
> fre = prev_fre;
>
> - sframe_init_cfa_rule_data(&frame->cfa, fre->info, fre->cfa_off);
> - sframe_init_rule_data(&frame->ra, fre->ra_off);
> - sframe_init_rule_data(&frame->fp, fre->fp_off);
> + if (sframe_init_cfa_rule_data(&frame->cfa, fre->cfa_ctl, fre->cfa_off))
> + return -EINVAL;
> + sframe_init_rule_data(&frame->ra, fre->ra_ctl, fre->ra_off);
> + sframe_init_rule_data(&frame->fp, fre->fp_ctl, fre->fp_off);
> frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
>
> return 0;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=16
* Re: [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe
2026-05-05 12:17 ` [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
@ 2026-05-05 18:55 ` sashiko-bot
2026-05-07 16:18 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:55 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unbounded linear scan of user-controlled sframe entries can cause a CPU hard lockup in NMI context.
- [Medium] The commit message claims to refactor `unwind_user_next_common()` by removing an unused `pt_regs` parameter, but this change is entirely absent from the code modifications.
--
commit 2e1e17bbaa1c0426d41d66ce7930a944f3e86ae7
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Wire up unwind_user to sframe
This commit hooks up the sframe infrastructure into the unwind_user interface
to support user-space stack unwinding via sframe. It adds the
UNWIND_USER_TYPE_SFRAME type and updates the unwind_user_next() logic to
attempt sframe unwinding when available.
> [ Jens Remus: Remove unused pt_regs from unwind_user_next_common() and
> its callers. Simplify unwind_user_next_sframe(). ]
This isn't a bug, but does this commit message match the changes?
Looking at the diff and the code, there are no modifications to
unwind_user_next_common() or its callers to remove a pt_regs parameter.
> diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
> index 90ab3c1a205ef..1fb2724197332 100644
> --- a/kernel/unwind/user.c
> +++ b/kernel/unwind/user.c
[ ... ]
> @@ -82,6 +83,16 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
> return unwind_user_next_common(state, &fp_frame);
> }
>
> +static int unwind_user_next_sframe(struct unwind_user_state *state)
> +{
> + struct unwind_user_frame frame;
> +
> + /* sframe expects the frame to be local storage */
> + if (sframe_find(state->ip, &frame))
Can this sequence cause a hard lockup in NMI context?
Since this unwinder is often called from NMI context (like perf), sframe_find()
eventually calls __find_fre() to search for Frame Row Entries. If __find_fre()
performs a linear scan over an unbounded, user-controlled number of entries
(fde->fres_num), it could spin for a long time without yielding.
If a malicious userspace program maps a large VMA and provides hundreds of
millions of FREs, would this loop calling unsafe_get_user() on every
iteration tie up the CPU long enough in NMI context to trigger the hard
lockup detector?
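One way to bound such a scan is a hard cap on the number of entries examined.
The sketch below is a user-space model over a sorted array of start offsets;
MAX_SCAN and find_entry_bounded() are hypothetical, and the cap value is
arbitrary rather than a recommendation.

```c
#include <stddef.h>

#define MAX_SCAN 4096	/* hypothetical cap, value chosen arbitrarily */

/*
 * Find the last entry whose start offset is <= ip_off in a sorted array,
 * giving up after MAX_SCAN entries. Returns the index, -1 if no entry
 * matches, or -2 if the cap was reached first.
 */
static long find_entry_bounded(const unsigned int *starts, size_t n,
			       unsigned int ip_off)
{
	long found = -1;

	for (size_t i = 0; i < n; i++) {
		if (i >= MAX_SCAN)
			return -2;
		if (starts[i] > ip_off)
			break;
		found = (long)i;
	}
	return found;
}
```

A cap like this trades completeness for bounded latency: in this model a
function with more entries than the cap simply fails to unwind. Validating
the entry count against the section size at registration time would be a
different way to approach the same concern.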
> + return -ENOENT;
> + return unwind_user_next_common(state, &frame);
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=7
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-05 12:17 ` [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2026-05-05 18:59 ` sashiko-bot
2026-05-06 14:34 ` Jens Remus
2026-05-08 10:50 ` Jens Remus
1 sibling, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 18:59 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [Critical] Division by zero in `__find_fre` when `fde->rep_size` is 0.
- [High] `mtree_insert_range` is called with an exclusive end boundary, preventing contiguous SFrame sections.
- [High] Unbounded linear search over user-controlled memory in `__find_fre` can cause hard lockups in atomic context.
- [High] Integer overflows in SFrame header parsing allow bypass of bounds checks on 32-bit architectures.
- [Medium] Generic SFrame unwinding code uses `unsafe_get_user()` for unaligned accesses, breaking strict-alignment architectures.
--
commit 159c92acf3c961aecc6383f6c3f35a955591abf3
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Add support for reading .sframe contents
This commit adds support for reading .sframe section contents to facilitate
user space stack unwinding. It introduces the sframe_find() interface which
uses user_read_access_begin() and unsafe_*() accessors for performance.
Support is included for SFrame V3, internal FDE representations, and
PC-relative offsets.
Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> index fdb2a49085638..c60aa527984a1 100644
> --- a/kernel/unwind/sframe.c
> +++ b/kernel/unwind/sframe.c
> @@ -15,9 +15,338 @@
> #include <linux/unwind_user_types.h>
>
> #include "sframe.h"
> +#include "sframe_debug.h"
[ ... ]
> +static __always_inline int __read_fre(struct sframe_section *sec,
> + struct sframe_fde_internal *fde,
> + unsigned long fre_addr,
> + struct sframe_fre_internal *fre)
> +{
[ ... ]
> + if (fre_addr + addr_size + 1 > sec->fres_end)
> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
Will this cause alignment faults on architectures with strict alignment
requirements?
The .sframe format uses packed structures and variable-length datawords. The
cur pointer might be unaligned here, and UNSAFE_GET_USER_INC() performs
16-bit or 32-bit reads via unsafe_get_user().
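For reference, the usual portable way to read a multi-byte value from a
possibly unaligned address in plain C is to copy it bytewise. The sketch
below is user-space code with hypothetical helper names; it says nothing
about how unsafe_get_user() behaves on any particular architecture and is
not a drop-in for user-memory accessors.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned address via a byte copy. */
static uint32_t read_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Same idea for 16-bit values. */
static uint16_t read_u16_unaligned(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Compilers typically lower these copies to a single load where the target
permits unaligned access and to byte loads where it does not.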
[ ... ]
> +static __always_inline int __find_fre(struct sframe_section *sec,
> + struct sframe_fde_internal *fde,
> + unsigned long ip,
> + struct unwind_user_frame *frame)
> +{
> + unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
> + struct sframe_fre_internal *fre, *prev_fre = NULL;
> + struct sframe_fre_internal fres[2];
> + unsigned long fre_addr;
> + bool which = false;
> + unsigned int i;
> + u32 ip_off;
> +
> + ip_off = ip - fde->func_addr;
> +
> + if (fde_pctype == SFRAME_FDE_PCTYPE_MASK)
> + ip_off %= fde->rep_size;
Can a malicious user provide an SFrame section with a rep_size of 0?
Since fde->rep_size is populated directly from the user-provided
_fda.rep_size without validation, a 0 value here would trigger a
divide-by-zero exception and a kernel panic.
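A guard of the following shape would avoid the division. This is a minimal
user-space sketch; reduce_ip_off() is a hypothetical helper, and where such a
check would best live (at FDE read time or at the point of use) is left open.

```c
#include <stdint.h>

/*
 * Reduce ip_off modulo rep_size. Returns 0 and stores the result in *out,
 * or -1 when rep_size is zero and the operation would be undefined.
 */
static int reduce_ip_off(uint32_t ip_off, uint32_t rep_size, uint32_t *out)
{
	if (!rep_size)
		return -1;
	*out = ip_off % rep_size;
	return 0;
}
```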
> +
> + fre_addr = sec->fres_start + fde->fres_off;
> +
> + for (i = 0; i < fde->fres_num; i++) {
Can this loop cause a hard lockup in atomic context?
fde->fres_num is a 32-bit value copied from user space without validation.
Since sframe_find() is designed to be called by unwinders in NMI context, an
attacker could provide a very large number of valid entries. Executing
billions of iterations and unsafe_get_user() calls could stall the CPU
and trigger the hard lockup detector.
> + int ret;
> +
[ ... ]
> @@ -120,8 +449,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> sec->text_end = text_end;
>
> ret = sframe_read_header(sec);
> - if (ret)
> + if (ret) {
> + dbg_print_header(sec);
> goto err_free;
> + }
Can shdr.fre_len cause an integer overflow on 32-bit architectures during
header parsing?
If a malicious user provides a large fre_len in the header, fres_end
(calculated as fres_start + shdr.fre_len) could wrap around the 32-bit
address space. This would bypass the bounds check in sframe_read_header(),
allowing fres_start and fdes_start to point into kernel memory. Later, when
__read_fde() and __find_fre() use unsafe_get_user(), this could lead to
arbitrary kernel memory disclosure.
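The wrap-around can be modelled in user space by doing the arithmetic in
32 bits and using the GCC/Clang overflow builtin. end_in_bounds() is a
hypothetical helper; kernel code would more likely use check_add_overflow()
from <linux/overflow.h>, which this sketch does not attempt to reproduce.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute start + len in 32-bit arithmetic to model a 32-bit address space.
 * Returns false if the sum wraps around or ends beyond limit; otherwise
 * stores the end address in *end and returns true.
 */
static bool end_in_bounds(uint32_t start, uint32_t len, uint32_t limit,
			  uint32_t *end)
{
	if (__builtin_add_overflow(start, len, end))
		return false;
	return *end <= limit;
}
```

Without the overflow check, a start near the top of the address space plus a
large length would produce a small end value that passes a plain
"end <= limit" comparison.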
>
> ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
Does passing sec->text_end directly as the last parameter to
mtree_insert_range() break contiguous mappings?
mtree_insert_range() expects the last boundary to be inclusive, but
sec->text_end represents the exclusive end address of the executable segment.
If user space maps seamlessly contiguous text segments, the insertion for the
second segment might overlap with the claimed end of the first, causing it to
fail with -EEXIST.
> if (ret) {
> dbg("mtree_insert_range failed: text=%lx-%lx\n",
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=5
* Re: [PATCH v14 17/19] unwind_user/sframe: Separate reading of FRE from reading of FRE data words
2026-05-05 12:17 ` [PATCH v14 17/19] unwind_user/sframe: Separate reading of FRE from reading of FRE data words Jens Remus
@ 2026-05-05 19:05 ` sashiko-bot
2026-05-07 16:01 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 19:05 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] A malicious user can trigger a kernel panic by providing a crafted `.sframe` section with `rep_size` set to 0, which causes a division-by-zero when the kernel attempts to unwind the stack.
--
commit 1aeb36f953ce26b15e30e2b228bfb48239924c2b
Author: Jens Remus <jremus@linux.ibm.com>
unwind_user/sframe: Separate reading of FRE from reading of FRE data words
This commit separates the reading of SFrame FRE structures from their
data words, which allows linear searches to skip data words until a
matching FRE is found. New fields are added to struct sframe_fre_internal
to cache the necessary information to read data words later.
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> index 2ba2c8b385f96..98346412078b8 100644
> --- a/kernel/unwind/sframe.c
> +++ b/kernel/unwind/sframe.c
[ ... ]
> @@ -455,6 +450,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
> bool which = false;
> unsigned int i;
> u32 ip_off;
> + int ret;
>
> ip_off = ip - fde->func_addr;
>
This isn't a regression introduced by this patch, but while looking at this
area, can a crafted sframe section cause a division-by-zero just after
this line?
In __find_fre(), the code does:
if (fde_pctype == SFRAME_FDE_PCTYPE_MASK)
ip_off %= fde->rep_size;
Since fde->rep_size and fde->info are populated directly from the user-provided
sframe section, does anything ensure rep_size is non-zero before reaching
this code?
If a user provides an sframe section with rep_size set to 0 and the
SFRAME_FDE_PCTYPE_MASK bit set in the info field, will this lead to an
unhandled division-by-zero exception and crash the kernel during stack
unwinding?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=17
* Re: [PATCH v14 18/19] unwind_user/sframe/x86: Enable sframe unwinding on x86
2026-05-05 12:17 ` [PATCH v14 18/19] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
@ 2026-05-05 19:07 ` sashiko-bot
0 siblings, 0 replies; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 19:07 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] SFrame unwinding uses incorrect DWARF register mappings for 32-bit (compat) user processes on x86_64.
--
commit 014ce7a38c83420c1017dd7293ec85cc9b4e45d6
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe/x86: Enable sframe unwinding on x86
This commit enables SFrame V3 unwinding on x86, which requires binutils 2.46.
It introduces DWARF register mappings and sets the SFrame register constants.
> diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
> index f38f7c5ff1de1..b80f0ec0f7a74 100644
> --- a/arch/x86/include/asm/unwind_user.h
> +++ b/arch/x86/include/asm/unwind_user.h
> @@ -15,6 +15,39 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
> return user_64bit_mode(regs) ? 8 : 4;
> }
>
> +static inline int unwind_user_get_reg(unsigned long *val, unsigned int regnum)
> +{
> +#ifdef CONFIG_X86_64
> + const struct pt_regs *regs = task_pt_regs(current);
> +
> + switch (regnum) {
> + /* DWARF register numbers 0..15 */
> + case 0: *val = regs->ax; break;
> + case 1: *val = regs->dx; break;
> + case 2: *val = regs->cx; break;
> + case 3: *val = regs->bx; break;
> + case 4: *val = regs->si; break;
> + case 5: *val = regs->di; break;
> + case 6: *val = regs->bp; break;
> + case 7: *val = regs->sp; break;
Does this code handle 32-bit (compat) user processes correctly?
32-bit (i386) processes use a different DWARF mapping where ESP is 4 and
EBP is 5, but this code unconditionally uses the x86_64 DWARF register
mapping (where register 4 is RSI and 7 is RSP).
When an i386 application uses an SFrame V3 flexible FDE, the DWARF register
number is explicitly encoded in the FRE data words. The generic unwinder
sframe_init_cfa_rule_data() compares the register against the hardcoded
SFRAME_REG_SP (which is 7 on x86_64).
Since an i386 SFrame specifies 4 for SP, the check would fail and fall
back to UNWIND_USER_CFA_RULE_REG_OFFSET with regnum = 4. Then
unwind_user_next_common() would call unwind_user_get_reg(&cfa, 4), which
would return regs->si instead of regs->sp.
Could this break unwinding for 32-bit tasks by using the wrong registers
(e.g., RSI acting as the stack pointer), leading to corrupted stack traces?
Would the unwinder need to be explicitly disabled for compat tasks or
dynamically adjust the DWARF mappings using user_64bit_mode(regs)?
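As a sketch of the second option, the SP/FP register numbers could be
selected per task mode instead of being compile-time constants. The helpers
below are hypothetical user-space code; the numbers are the ones cited in
this review (x86-64: RBP 6, RSP 7; i386: ESP 4, EBP 5).

```c
#include <stdbool.h>

/* DWARF register number of the stack pointer for the given task mode. */
static unsigned int dwarf_sp_regnum(bool is_64bit)
{
	return is_64bit ? 7 : 4;	/* RSP on x86-64, ESP on i386 */
}

/* DWARF register number of the frame pointer for the given task mode. */
static unsigned int dwarf_fp_regnum(bool is_64bit)
{
	return is_64bit ? 6 : 5;	/* RBP on x86-64, EBP on i386 */
}
```

The remaining general-purpose registers would need a similar mode-dependent
lookup for the register-offset rules to resolve correctly.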
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=18
* Re: [PATCH v14 10/19] unwind_user/sframe: Remove .sframe section on detected corruption
2026-05-05 12:17 ` [PATCH v14 10/19] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
@ 2026-05-05 20:39 ` sashiko-bot
2026-05-07 16:23 ` Jens Remus
0 siblings, 1 reply; 67+ messages in thread
From: sashiko-bot @ 2026-05-05 20:39 UTC (permalink / raw)
To: Jens Remus; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Concurrent stack unwinding on a corrupted `.sframe` section triggers a user-controllable `WARN_ON_ONCE()`, leading to a potential Denial of Service (DoS).
--
commit a00e313eeb121415a3ca3fe4ed3b15126f46c76a
Author: Josh Poimboeuf <jpoimboe@kernel.org>
unwind_user/sframe: Remove .sframe section on detected corruption
This commit removes the .sframe section on demand when corruption is
detected during unwinding, to avoid continued attempted use of a bad
section.
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> index 9c755965ab837..8eed6a7d96256 100644
> --- a/kernel/unwind/sframe.c
> +++ b/kernel/unwind/sframe.c
> @@ -358,6 +358,10 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
> ret = __find_fre(sec, &fde, ip, frame);
> end:
> user_read_access_end();
> +
> + if (ret == -EFAULT)
> + WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
> +
> return ret;
> }
Can this WARN_ON_ONCE() be triggered concurrently by userspace?
If two threads within the same process are being unwound simultaneously,
they could both encounter the same bad .sframe section and hit the
-EFAULT path.
Because sframe_find() is only protected by the sframe_srcu read lock,
both threads might call sframe_remove_section() at the same time. The first
thread will successfully remove the section and return 0.
The second thread will fail to find the section in mm->sframe_mt:
kernel/unwind/sframe.c:sframe_remove_section() {
...
mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
if (sec->sframe_start == sframe_start) {
found = true;
ret |= __sframe_remove_section(mm, sec);
}
}
if (!found || ret)
return -EINVAL;
...
}
Since found is false, this returns -EINVAL. Because -EINVAL is non-zero,
WARN_ON_ONCE(-EINVAL) evaluates to true.
Could this allow an unprivileged user to intentionally trigger kernel
warnings (and panic the system if panic_on_warn is set) by unmapping a
registered .sframe section and triggering concurrent unwinds?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260505121718.3572346-1-jremus@linux.ibm.com?part=10
* Re: [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers
2026-05-05 12:49 ` sashiko-bot
@ 2026-05-06 13:42 ` Jens Remus
2026-05-07 14:55 ` Jens Remus
2026-05-08 23:02 ` Indu Bhagat
0 siblings, 2 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-06 13:42 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf, Indu Bhagat; +Cc: bpf, sashiko
On 5/5/2026 2:49 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [Medium] Missing validation of shdr.abi_arch allows parsing SFrame data from mismatched architectures.
> - [Medium] VMA addresses are incorrectly evaluated as boolean flags to check for zero-length inputs.
> - [Low] Mixing goto error handling with scoped_guard scope-based cleanup.
> --
>
> commit afe7c9eb6f0e6fc3f73d0b6a555f8d3926cf6ccf
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Add support for reading .sframe headers
>
> This commit introduces basic infrastructure for unwinding user space stacks
> with SFrame. It adds support for parsing the .sframe section header.
>
> Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
>
>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> new file mode 100644
>> index 0000000000000..c128c1eeaca20
>> --- /dev/null
>> +++ b/kernel/unwind/sframe.c
> [ ... ]
>> +static int sframe_read_header(struct sframe_section *sec)
>> +{
>> + unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
>> + struct sframe_header shdr;
>> + unsigned int num_fdes;
>> +
>> + if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
>> + dbg("header usercopy failed\n");
>> + return -EFAULT;
>> + }
>> +
>> + if (shdr.preamble.magic != SFRAME_MAGIC ||
>> + shdr.preamble.version != SFRAME_VERSION_3 ||
>> + !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
>> + !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
shdr.abi_arch != sframe_abi_arch() ||
>> + shdr.auxhdr_len) {
>> + dbg("bad/unsupported sframe header\n");
>> + return -EINVAL;
>> + }
>
> Should we also validate shdr.abi_arch to ensure it matches the expected
> architecture?
>
> If malformed or foreign-architecture SFrame data is passed in (for instance,
> with an unexpected endianness or register mapping), the kernel might interpret
> it incorrectly and generate invalid stack traces.
>
> Furthermore, accepting arbitrary values here could risk ABI calcification if
> userspace starts supplying invalid values that we later want to reject.
I could introduce an architecture-specific sframe_abi_arch() and update
the SFrame header validation as above. For x86 the implementation in
unwind_user_sframe.h could look as follows:
#ifdef CONFIG_X86_64
#define sframe_abi_arch() SFRAME_ABI_AMD64_ENDIAN_LITTLE
#endif
Given that sframe.o is only linked in if CONFIG_HAVE_UNWIND_USER_SFRAME is
set, and that is only selected on X86_64, this should be fine.
Do you see any value?
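A minimal user-space sketch of the proposed check, for discussion only. The SFRAME_ABI_* values are assumed to match the SFrame format, and demo_sframe_abi_arch() and demo_abi_arch_ok() are hypothetical names that are not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* ABI/arch identifiers; values assumed to mirror the SFrame format. */
#define SFRAME_ABI_AARCH64_ENDIAN_BIG		1
#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE	2
#define SFRAME_ABI_AMD64_ENDIAN_LITTLE		3
#define SFRAME_ABI_S390X_ENDIAN_BIG		4

/*
 * Hypothetical stand-in for an arch-provided sframe_abi_arch(): pick the
 * identifier for the build target, or 0 if the target is not covered here.
 */
static inline uint8_t demo_sframe_abi_arch(void)
{
#if defined(__x86_64__)
	return SFRAME_ABI_AMD64_ENDIAN_LITTLE;
#elif defined(__s390x__)
	return SFRAME_ABI_S390X_ENDIAN_BIG;
#elif defined(__aarch64__) && defined(__AARCH64EB__)
	return SFRAME_ABI_AARCH64_ENDIAN_BIG;
#elif defined(__aarch64__)
	return SFRAME_ABI_AARCH64_ENDIAN_LITTLE;
#else
	return 0;
#endif
}

/* Accept a header only if its abi_arch matches the build target. */
static inline bool demo_abi_arch_ok(uint8_t hdr_abi_arch)
{
	uint8_t expected = demo_sframe_abi_arch();

	return expected != 0 && hdr_abi_arch == expected;
}
```

On a target without a known identifier the sketch rejects every header, which matches the idea that SFrame support is only built where an architecture opts in.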
> [ ... ]
>> +int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>> + unsigned long text_start, unsigned long text_end)
>> +{
>> + struct vm_area_struct *sframe_vma, *text_vma;
>> + struct mm_struct *mm = current->mm;
>> + struct sframe_section *sec;
>> + int ret;
>> +
>> + if (!sframe_start || !sframe_end || !text_start || !text_end) {
if (sframe_start >= sframe_end || text_start >= text_end) {
dbg("invalid sframe/text address\n");
Do you agree?
> Is it safe to use VMA start addresses as boolean flags?
>
> Since vm_start can legitimately be zero in certain configurations (if
> mmap_min_addr permits), evaluating !sframe_start or !text_start could
> incorrectly reject perfectly valid mappings at address 0.
>
> Also, to check for zero-length sections, would it be better to explicitly use
> comparisons like sframe_start >= sframe_end instead?
>
>> + dbg("zero-length sframe/text address\n");
>> + return -EINVAL;
>> + }
>> +
>> + scoped_guard(mmap_read_lock, mm) {
> [ ... ]
>> + sec = kzalloc(sizeof(*sec), GFP_KERNEL);
>> + if (!sec)
>> + return -ENOMEM;
>> +
>> + sec->sframe_start = sframe_start;
>> + sec->sframe_end = sframe_end;
>> + sec->text_start = text_start;
>> + sec->text_end = text_end;
>> +
>> + ret = sframe_read_header(sec);
>> + if (ret)
>> + goto err_free;
>
> Is it acceptable to mix goto error handling with scoped_guard() cleanup
> helpers in the same function?
>
> The kernel's cleanup subsystem guidelines generally suggest that usage of
> goto and cleanup helpers shouldn't be mixed in the same routine.
This refers to the following comment in include/linux/cleanup.h:
Lastly, given that the benefit of cleanup helpers is removal of
"goto", and that the "goto" statement can jump between scopes, the
expectation is that usage of "goto" and cleanup helpers is never
mixed in the same function. I.e. for a given routine, convert all
resources that need a "goto" cleanup to scope-based cleanup, or
convert none of them.
>
> Could sec be allocated using __free(kfree) to avoid the goto entirely?
The goto error handling is used outside the scoped_guard(). So I think
it is fine. Do you see any value in converting it as follows (my naive
attempt):
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
@@ -722,13 +722,14 @@ static int sframe_read_header(struct sframe_section *sec)
return 0;
}
+DEFINE_FREE(free_section, struct sframe_section *, if (_T) free_section(_T))
int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end)
{
struct maple_tree *sframe_mt = &current->mm->sframe_mt;
struct vm_area_struct *sframe_vma, *text_vma;
struct mm_struct *mm = current->mm;
- struct sframe_section *sec;
+ struct sframe_section *sec __free(free_section) = NULL;
int ret;
if (sframe_start >= sframe_end || text_start >= text_end) {
@@ -768,7 +769,7 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
ret = sframe_read_header(sec);
if (ret) {
dbg_print_header(sec);
- goto err_free;
+ return ret;
}
ret = sframe_validate_section(sec);
@@ -784,10 +785,6 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
}
return 0;
-
-err_free:
- free_section(sec);
- return ret;
}
>
>> +
>> + /* TODO nowhere to store it yet - just free it and return an error */
>> + ret = -ENOSYS;
>> +
>> +err_free:
>> + free_section(sec);
>> + return ret;
>> +}
>
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
* Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
2026-05-05 18:51 ` sashiko-bot
@ 2026-05-06 13:50 ` Jens Remus
2026-05-06 15:21 ` Steven Rostedt
0 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-06 13:50 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat
On 5/5/2026 8:51 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> - [Critical] Concurrency Use-After-Free and Double Free in sframe_remove_section
> - [High] Unbounded memory leak of sframe_section allocations leading to Denial of Service (OOM)
> - [High] Missing MT_FLAGS_USE_RCU initialization breaks lockless NMI unwinding
> - [Medium] Child processes silently lose .sframe unwind data on fork()
> - [Medium] Inclusive bound semantics in maple tree cause 1-byte overlap rejections for contiguous sections
> - [Low] Zero-address validation falsely rejects valid memory mappings
> --
>
> commit 3b41045a6b38a262cb32acf9ed118e4d667448dd
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
>
> Associate an .sframe section with its mm by adding it to a per-mm maple
> tree which is indexed by the corresponding text address range. A single
> .sframe section can be associated with multiple text ranges.
>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index f1ad69c6dc2d4..0d92fb1c61a99 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
> [ ... ]
>> @@ -1072,6 +1074,13 @@ static void mmap_init_lock(struct mm_struct *mm)
>> #endif
>> }
>>
>> +static void mm_init_sframe(struct mm_struct *mm)
>> +{
>> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
>> + mt_init(&mm->sframe_mt);
>> +#endif
>> +}
>> +
>> static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>
> mt_init() initializes the maple tree with flags set to 0. Since stack
> unwinding typically occurs in profiling interrupts (NMIs) which must
> traverse the maple tree locklessly via RCU, does this missing
> MT_FLAGS_USE_RCU flag allow concurrent modifications to perform in-place
> updates instead of RCU-safe copy-on-write allocations? This could result in
> the NMI unwinder observing corrupted tree states.
Please advise.
> [ ... ]
>> @@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>> mm->pmd_huge_pte = NULL;
>> #endif
>> mm_init_uprobes_state(mm);
>> + mm_init_sframe(mm);
>> hugetlb_count_init(mm);
>>
>> mm_flags_clear_all(mm);
>
> When a process forks, this initializes an empty sframe_mt for the child's
> mm_struct. While dup_mmap() duplicates the parent's VMAs, it doesn't appear
> that the registered .sframe entries are propagated to the child's maple tree.
> Will child processes silently lose their .sframe unwind data for shared
> libraries registered prior to the fork?
Does this ring a bell, Steven? I ran into it with my signal frame
stacktracing test on s390. I think we discussed this during one of the
SFrame calls and it got lost.
>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index c128c1eeaca20..fdb2a49085638 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
> [ ... ]
>> @@ -81,6 +81,7 @@ static int sframe_read_header(struct sframe_section *sec)
>> int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>> unsigned long text_start, unsigned long text_end)
>> {
>> + struct maple_tree *sframe_mt = &current->mm->sframe_mt;
>> struct vm_area_struct *sframe_vma, *text_vma;
>> struct mm_struct *mm = current->mm;
>> struct sframe_section *sec;
>> @@ -88,10 +89,12 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>> if (!sframe_start || !sframe_end || !text_start || !text_end) {
>
> Virtual addresses can legitimately be zero, such as when mmap_min_addr
> permits it. Does checking !text_start falsely reject valid zero-address
> VMAs? Checking the range bounds explicitly might be safer here.
Addressed in previous patch.
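For reference, a small user-space sketch of the range-based validation discussed for the previous patch. The helper names are hypothetical; the point is only that address 0 is no longer treated as "unset", while empty and inverted ranges are rejected:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: a [start, end) range is usable when it is non-empty
 * and not inverted. Address 0 is not treated specially.
 */
static inline bool demo_range_valid(unsigned long start, unsigned long end)
{
	return start < end;
}

/* Mirrors the proposed check on both the .sframe and the text range. */
static inline bool demo_sframe_args_valid(unsigned long sframe_start,
					  unsigned long sframe_end,
					  unsigned long text_start,
					  unsigned long text_end)
{
	return demo_range_valid(sframe_start, sframe_end) &&
	       demo_range_valid(text_start, text_end);
}
```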
> [ ... ]
>> }
>> }
>>
>> sec = kzalloc(sizeof(*sec), GFP_KERNEL);
>
> This allocates the section with GFP_KERNEL instead of GFP_KERNEL_ACCOUNT,
> which bypasses memory cgroup limits.
>
> Furthermore, is there a mechanism to clean up these allocated sections when
> the corresponding VMA is torn down (e.g., via munmap)? Without cleanup hooks
> in the VMA teardown paths, it looks like an application could repeatedly mmap
> an executable region, register it here, and munmap it, leaking kernel memory
> and maple tree nodes indefinitely.
Isn't that taken care of by the change in __mmdrop() to call
sframe_free_mm(mm), which is part of this patch?
> [ ... ]
>> - /* TODO nowhere to store it yet - just free it and return an error */
>> - ret = -ENOSYS;
>> + ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end - 1,
sec, GFP_KERNEL);
> The Maple Tree API stores ranges inclusively as [first, last]. Since VMA
> boundaries are exclusive, does passing sec->text_end directly cause the
> maple tree range to incorrectly cover one extra byte?
>
> If an application registers two perfectly contiguous .sframe sections, the
> insertion of the second section could fail with -EEXIST because the first
> section erroneously claims the first byte of the second section. Using
> sec->text_end - 1 might prevent this overlap.
>
>> + if (ret) {
>> + dbg("mtree_insert_range failed: text=%lx-%lx\n",
>> + sec->text_start, sec->text_end);
>> + goto err_free;
>> + }
>> +
>> + return 0;
>>
>> err_free:
>> free_section(sec);
>> return ret;
>> }
>>
>> +static int __sframe_remove_section(struct mm_struct *mm,
>> + struct sframe_section *sec)
>> +{
>> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
>> + dbg("mtree_erase failed: text=%lx\n", sec->text_start);
>> + return -EINVAL;
>> + }
>> +
>> + free_section(sec);
>> +
>> + return 0;
>> +}
>> +
>> int sframe_remove_section(unsigned long sframe_start)
>> {
>> - return -ENOSYS;
>> + struct mm_struct *mm = current->mm;
>> + struct sframe_section *sec;
>> + unsigned long index = 0;
>> + bool found = false;
>> + int ret = 0;
>> +
>> + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
>> + if (sec->sframe_start == sframe_start) {
>
> Can concurrent calls to sframe_remove_section() cause a use-after-free and
> subsequent double free?
>
> mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
> drops the RCU read lock, meaning the returned sec pointer has no lifetime
> protection in the loop body.
>
> If two threads concurrently invoke sframe_remove_section(), both could
> receive the exact same sec pointer from the tree. Thread A could then call
> __sframe_remove_section(), erasing the entry and freeing sec via
> free_section(). Thread B would then evaluate the if statement using the
> freed sec pointer, causing a use-after-free read, and potentially proceeding
> to free it again.
Please advise.
> Additionally, does free_section() need to use kfree_rcu() instead of
> kfree() to prevent lockless NMI unwinders from hitting a use-after-free
> when reading payloads?
Please advise.
Thanks and regards,
Jens
* Re: [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation
2026-05-05 12:17 ` [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
2026-05-05 18:22 ` sashiko-bot
@ 2026-05-06 14:09 ` Jens Remus
2026-05-06 15:03 ` Steven Rostedt
2026-05-06 21:13 ` David Laight
1 sibling, 2 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-06 14:09 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf, Dave Hansen, H. Peter Anvin
Cc: linux-kernel, linux-trace-kernel, x86, bpf, linux-mm,
Namhyung Kim, Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave,
Florian Weimer, Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google), Indu Bhagat,
Peter Zijlstra, Dylan Hatch, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Mathieu Desnoyers, Kees Cook, Sam James
On 5/5/2026 2:17 PM, Jens Remus wrote:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
>
> Add an x86 implementation of unsafe_copy_from_user() similar to the
> existing unsafe_copy_to_user().
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> -#define unsafe_copy_to_user(_dst,_src,_len,label) \
> -do { \
> - char __user *__ucu_dst = (_dst); \
> - const char *__ucu_src = (_src); \
> - size_t __ucu_len = (_len); \
> - unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \
> - unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \
> - unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \
> - unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \
> +#define unsafe_copy_to_user(_dst, _src, _len, label) \
> +do { \
> + void __user *__dst = (_dst); \
> + const void *__src = (_src); \
> + size_t __len = (_len); \
> + unsafe_copy_to_user_loop(__dst, __src, __len, u64, label); \
> + unsafe_copy_to_user_loop(__dst, __src, __len, u32, label); \
> + unsafe_copy_to_user_loop(__dst, __src, __len, u16, label); \
> + unsafe_copy_to_user_loop(__dst, __src, __len, u8, label); \
> +} while (0)
> +
> +#define unsafe_copy_from_user_loop(dst, src, len, type, label) \
> + while (len >= sizeof(type)) { \
> + unsafe_get_user(*(type *)(dst), (type __user *)(src), label); \
> + dst += sizeof(type); \
> + src += sizeof(type); \
> + len -= sizeof(type); \
> + }
> +
> +#define unsafe_copy_from_user(_dst, _src, _len, label) \
> +do { \
> + void *__dst = (_dst); \
> + void __user *__src = (_src); \
const void __user *__src = (_src); \
This was suggested by the Sashiko AI review. Any objections?
> + size_t __len = (_len); \
> + unsafe_copy_from_user_loop(__dst, __src, __len, u64, label); \
> + unsafe_copy_from_user_loop(__dst, __src, __len, u32, label); \
> + unsafe_copy_from_user_loop(__dst, __src, __len, u16, label); \
> + unsafe_copy_from_user_loop(__dst, __src, __len, u8, label); \
> } while (0)
>
> #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
Thanks and regards,
Jens
* Re: [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation
2026-05-05 18:22 ` sashiko-bot
@ 2026-05-06 14:13 ` Jens Remus
2026-05-06 15:05 ` Steven Rostedt
0 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-06 14:13 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat
On 5/5/2026 8:22 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] Variable shadowing hazard in unsafe_copy_to_user and unsafe_copy_from_user macros.
> - [Low] Missing const qualifier on the source pointer in unsafe_copy_from_user.
> --
>
> Note: The format of this report is altered due to recitation restrictions.
> The original patch code is not quoted directly, and a free-form summary
> of the review findings is provided instead.
>
> commit b652061f302d029392e67f46699e5a2db0abfaca
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> x86/uaccess: Add unsafe_copy_from_user() implementation
>
> This commit introduces the unsafe_copy_from_user() macro for x86 architectures.
> It also refactors the existing unsafe_copy_to_user() macro to use updated
> internal variable names.
>
> Regarding the variable naming in the updated unsafe_copy_to_user() and the
> new unsafe_copy_from_user() macros, is there a risk of variable shadowing?
>
> The macros declare local variables with very common names, specifically __dst,
> __src, and __len. If a caller passes an argument that happens to share one of
> these names, the local variable declaration will shadow the caller's argument.
> For instance, if a caller passes __len, the initialization size_t __len =
> (_len); will evaluate the right-hand side as the newly declared uninitialized
> local variable itself. Could this lead to the loop copying a random length of
> memory based on stack garbage instead of the intended length? The previous
> implementation seemed to avoid this by using uniquely prefixed names like
> __ucu_len.
Please advise.
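To make the concern concrete: in C the scope of a declared identifier starts right after its declarator, so in "size_t __len = (_len);" with a caller argument named __len the initializer reads the new, uninitialized variable. One possible way around it, sketched in user space with hypothetical names (demo_copy() and the __demo_cp_ prefix are illustrations, not a proposal for the final naming), is to keep distinctive prefixes on the macro-local variables:

```c
#include <stddef.h>

/*
 * Copy loop with macro-local names that carry a distinctive prefix, so a
 * caller variable called __dst, __src or __len cannot be captured.
 */
#define demo_copy(_dst, _src, _len)					\
do {									\
	unsigned char *__demo_cp_dst = (unsigned char *)(_dst);	\
	const unsigned char *__demo_cp_src =				\
		(const unsigned char *)(_src);				\
	size_t __demo_cp_len = (_len);					\
	while (__demo_cp_len--)						\
		*__demo_cp_dst++ = *__demo_cp_src++;			\
} while (0)

/*
 * Caller that deliberately uses the names the v14 macros declare
 * internally (double-underscore names are reserved in user space and
 * appear here only to mirror the thread).
 */
static inline size_t demo_caller(void *out, const void *in, size_t n)
{
	void *__dst = out;
	const void *__src = in;
	size_t __len = n;

	demo_copy(__dst, __src, __len);
	return __len;
}
```

Because the macro never declares __dst, __src or __len itself, the caller's variables reach the initializers unchanged.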
> Additionally, looking at the unsafe_copy_from_user() macro, the user-space
> source pointer is initialized as a void pointer without a const qualifier.
>
> Since memory read operations typically expect the source buffer to be
> read-only, callers might pass a const void pointer. Does this implicit
> dropping of the const qualifier trigger a discarded-qualifiers compiler
> warning? Could the source pointer be declared as a const void pointer
> to prevent potential build failures when warnings are treated as errors?
Makes sense.
Thanks and regards,
Jens
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-05 18:59 ` sashiko-bot
@ 2026-05-06 14:34 ` Jens Remus
2026-05-06 15:01 ` Steven Rostedt
2026-05-08 23:03 ` Indu Bhagat
0 siblings, 2 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-06 14:34 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf, Indu Bhagat; +Cc: bpf, sashiko
On 5/5/2026 8:59 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
> - [Critical] Division by zero in `__find_fre` when `fde->rep_size` is 0.
> - [High] `mtree_insert_range` is called with an exclusive end boundary, preventing contiguous SFrame sections.
> - [High] Unbounded linear search over user-controlled memory in `__find_fre` can cause hard lockups in atomic context.
> - [High] Integer overflows in SFrame header parsing allow bypass of bounds checks on 32-bit architectures.
> - [Medium] Generic SFrame unwinding code uses `unsafe_get_user()` for unaligned accesses, breaking strict-alignment architectures.
> --
>
> commit 159c92acf3c961aecc6383f6c3f35a955591abf3
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Add support for reading .sframe contents
>
> This commit adds support for reading .sframe section contents to facilitate
> user space stack unwinding. It introduces the sframe_find() interface which
> uses user_read_access_begin() and unsafe_*() accessors for performance.
> Support is included for SFrame V3, internal FDE representations, and
> PC-relative offsets.
>
> Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
> Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
>
>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index fdb2a49085638..c60aa527984a1 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
>> @@ -15,9 +15,338 @@
>> #include <linux/unwind_user_types.h>
>>
>> #include "sframe.h"
>> +#include "sframe_debug.h"
>
> [ ... ]
>
>> +static __always_inline int __read_fre(struct sframe_section *sec,
>> + struct sframe_fde_internal *fde,
>> + unsigned long fre_addr,
>> + struct sframe_fre_internal *fre)
>> +{
> [ ... ]
>> + if (fre_addr + addr_size + 1 > sec->fres_end)
>> + return -EFAULT;
>> +
>> + UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
>
> Will this cause alignment faults on architectures with strict alignment
> requirements?
>
> The .sframe format uses packed structures and variable-length datawords. The
> cur pointer might be unaligned here, and UNSAFE_GET_USER_INC() performs
> 16-bit or 32-bit reads via unsafe_get_user().
IIUC this should not be an issue for x86-64, s390, and arm64.
> [ ... ]
>
>> +static __always_inline int __find_fre(struct sframe_section *sec,
>> + struct sframe_fde_internal *fde,
>> + unsigned long ip,
>> + struct unwind_user_frame *frame)
>> +{
>> + unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
>> + struct sframe_fre_internal *fre, *prev_fre = NULL;
>> + struct sframe_fre_internal fres[2];
>> + unsigned long fre_addr;
>> + bool which = false;
>> + unsigned int i;
>> + u32 ip_off;
>> +
>> + ip_off = ip - fde->func_addr;
>> +
>> + if (fde_pctype == SFRAME_FDE_PCTYPE_MASK)
>> + ip_off %= fde->rep_size;
>
> Can a malicious user provide an SFrame section with a rep_size of 0?
>
> Since fde->rep_size is populated directly from the user-provided
> _fda.rep_size without validation, a 0 value here would trigger a
> divide-by-zero exception and a kernel panic.
Good catch! I would add a check in __read_fde():
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
@@ -67,6 +67,7 @@ static __always_inline int __read_fde(struct sframe_section *sec,
unsigned long fde_addr, fda_addr, func_addr;
struct sframe_fde_v3 _fde;
struct sframe_fda_v3 _fda;
+ unsigned char fde_pctype;
fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde_v3));
unsafe_copy_from_user(&_fde, (void __user *)fde_addr,
@@ -82,6 +83,10 @@ static __always_inline int __read_fde(struct sframe_section *sec,
unsafe_copy_from_user(&_fda, (void __user *)fda_addr,
sizeof(struct sframe_fda_v3), Efault);
+ fde_pctype = SFRAME_V3_FDE_PCTYPE(_fda.info);
+ if (fde_pctype == SFRAME_FDE_PCTYPE_MASK && !_fda.rep_size)
+ return -EINVAL;
+
fde->func_addr = func_addr;
fde->func_size = _fde.func_size;
fde->fda_off = _fde.fres_off;
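The effect of the check can be sketched in user space as follows. This folds the validation and the later modulo into one hypothetical function purely for illustration; the DEMO_PCTYPE_* values and demo_lookup_offset() are assumptions and do not exist in the patch:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical PC-type values, for illustration only. */
#define DEMO_PCTYPE_INC		0
#define DEMO_PCTYPE_MASK	1

/*
 * Compute the offset used for the FRE lookup. For the mask type the
 * offset is reduced modulo rep_size, so a zero rep_size is rejected
 * before the division is reached.
 */
static inline int demo_lookup_offset(unsigned int pctype, uint32_t ip_off,
				     uint32_t rep_size, uint32_t *out)
{
	if (pctype == DEMO_PCTYPE_MASK) {
		if (!rep_size)
			return -EINVAL;
		ip_off %= rep_size;
	}

	*out = ip_off;
	return 0;
}
```

A zero rep_size is only an error for the mask type; other types never reach the modulo, so they are left alone.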
>
>> +
>> + fre_addr = sec->fres_start + fde->fres_off;
>> +
>> + for (i = 0; i < fde->fres_num; i++) {
>
> Can this loop cause a hard lockup in atomic context?
>
> fde->fres_num is a 32-bit value copied from user space without validation.
> Since sframe_find() is designed to be called by unwinders in NMI context, an
> attacker could provide a very large number of valid entries. Executing
> billions of iterations and unsafe_get_user() calls could stall the CPU
> and trigger the hard lockup detector.
Please advise.
>
>> + int ret;
>> +
>
> [ ... ]
>
>> @@ -120,8 +449,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>> sec->text_end = text_end;
>>
>> ret = sframe_read_header(sec);
>> - if (ret)
>> + if (ret) {
>> + dbg_print_header(sec);
>> goto err_free;
>> + }
>
> Can shdr.fre_len cause an integer overflow on 32-bit architectures during
> header parsing?
>
> If a malicious user provides a large fre_len in the header, fres_end
> (calculated as fres_start + shdr.fre_len) could wrap around the 32-bit
> address space. This would bypass the bounds check in sframe_read_header(),
> allowing fres_start and fdes_start to point into kernel memory. Later, when
> __read_fde() and __find_fre() use unsafe_get_user(), this could lead to
> arbitrary kernel memory disclosure.
SFrame is currently only supported on 64-bit architectures (i.e. x86-64,
arm64, s390 64-bit). So unsigned long fres_end should always be 64-bit.
Do we need to add the following to the header parsing?
if (fdes_start >= fdes_end || fres_start >= fres_end) {
dbg_sec("inconsistent FDE/FRE start/end address\n");
return -EINVAL;
}
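If wrap-around should be excluded independently of the word size, the start/end computation itself could be made overflow-aware. A user-space sketch under that assumption follows; demo_sub_range() is a hypothetical helper, and __builtin_add_overflow() is the GCC/Clang builtin (in the kernel, check_add_overflow() would presumably be the counterpart):

```c
#include <errno.h>

/*
 * Hypothetical helper: derive [*start, *end) from a base address, an
 * offset and a length taken from an untrusted header, and confirm that
 * the range is non-empty and ends at or before limit. Any wrap-around
 * is rejected.
 */
static inline int demo_sub_range(unsigned long base, unsigned long off,
				 unsigned long len, unsigned long limit,
				 unsigned long *start, unsigned long *end)
{
	if (__builtin_add_overflow(base, off, start) ||
	    __builtin_add_overflow(*start, len, end))
		return -EINVAL;

	/* Empty or out-of-section ranges are inconsistent. */
	if (*start >= *end || *end > limit)
		return -EINVAL;

	return 0;
}
```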
>
>>
>> ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
>
> Does passing sec->text_end directly as the last parameter to
> mtree_insert_range() break contiguous mappings?
>
> mtree_insert_range() expects the last boundary to be inclusive, but
> sec->text_end represents the exclusive end address of the executable segment.
> If user space maps seamlessly contiguous text segments, the insertion for the
> second segment might overlap with the claimed end of the first, causing it to
> fail with -EEXIST.
Addressed in previous patch.
>
>> if (ret) {
>> dbg("mtree_insert_range failed: text=%lx-%lx\n",
>
Thanks and regards,
Jens
* Re: [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables
2026-05-05 12:53 ` sashiko-bot
@ 2026-05-06 14:56 ` Jens Remus
2026-05-06 15:36 ` Steven Rostedt
0 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-06 14:56 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf, Indu Bhagat; +Cc: bpf, sashiko
On 5/5/2026 2:53 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] Missing `PT_LOAD` segment type check when registering executable sections for sframe unwinding.
> - [High] Off-by-one error in maple tree insertion bounds causes adjacent executable segments to fail registration.
> --
>
> commit 610ebc2ab2a7fd5ae4cd4a346076c9d5c2acc195
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Detect .sframe sections in executables
>
> When loading an ELF executable, automatically detect an .sframe section
> and associate it with the mm_struct.
>
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index 16a56b6b3f6ca..55047659a3cf4 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
> [ ... ]
>> @@ -637,6 +638,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
>> return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
>> }
>>
>> +static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
>> + unsigned long base_addr)
>> +{
>> + unsigned long sframe_start, sframe_end, text_start, text_end;
>> +
>> + sframe_start = base_addr + sframe->p_vaddr;
>> + sframe_end = sframe_start + sframe->p_memsz;
>> +
>> + text_start = base_addr + text->p_vaddr;
>> + text_end = text_start + text->p_memsz;
>
> Should text_end be calculated as text_start + text->p_memsz - 1?
>
> Since the maple tree API strictly expects inclusive ranges [first, last] for
> insertions, passing the exclusive text_end would cause the inserted range
> to cover one byte beyond the actual segment.
Shall we keep the semantic that text_end and sframe_end point past the
respective section end, and just adjust the value passed in the call to
mtree_insert_range()?
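A user-space sketch of that approach, assuming exclusive ends are kept everywhere and converted only at the point of insertion. struct demo_range and both helpers are hypothetical and merely model the inclusive [first, last] keys of a range tree:

```c
#include <stdbool.h>

/* Inclusive range [first, last], as used for range-keyed tree entries. */
struct demo_range {
	unsigned long first;
	unsigned long last;
};

/*
 * Convert an exclusive-end [start, end) pair into an inclusive range.
 * The caller is assumed to have rejected start >= end already.
 */
static inline struct demo_range demo_to_inclusive(unsigned long start,
						  unsigned long end)
{
	struct demo_range r = { .first = start, .last = end - 1 };

	return r;
}

/* Two inclusive ranges overlap if neither lies entirely before the other. */
static inline bool demo_overlaps(struct demo_range a, struct demo_range b)
{
	return a.first <= b.last && b.first <= a.last;
}
```

With the conversion, text ranges 0x1000-0x2000 and 0x2000-0x3000 become [0x1000, 0x1fff] and [0x2000, 0x2fff] and no longer collide; passing the exclusive end unchanged would make the first range claim 0x2000 as well.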
>
> If an ELF binary has adjacent executable PT_LOAD segments, could the first
> segment's inclusive end overlap with the next segment's start and fail
> subsequent insertions?
>
>> +
>> + /* Ignore return value, sframe section isn't critical */
>> + sframe_add_section(sframe_start, sframe_end, text_start, text_end);
>> +}
>> +
> [ ... ]
>> @@ -712,6 +729,19 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
>> error = -ENOMEM;
>> goto out;
>> }
>> + break;
>> + }
>> + case PT_GNU_SFRAME:
>> + sframe_phdr = eppnt;
>> + break;
>> + }
>> + }
>> +
>> + if (sframe_phdr) {
>> + eppnt = interp_elf_phdata;
>> + for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
>> + if (eppnt->p_flags & PF_X)
if (eppnt->p_flags & PF_X && eppnt->p_type == PT_LOAD)
>> + elf_add_sframe(eppnt, sframe_phdr, load_addr);
>
> Does this need to verify the segment is loadable (eppnt->p_type == PT_LOAD)?
>
> If non-loadable segments like PT_GNU_STACK have the PF_X flag set, they
> typically have p_vaddr and p_memsz as 0. Processing these segments would
> compute bogus text boundaries and bypass VMA validation. If a legitimate
> PT_LOAD segment overlaps with these bogus ranges, could its insertion fail
> silently and break stack unwinding for the executable?
Do you agree?
>
>> }
>> }
>>
> [ ... ]
>> @@ -1242,6 +1276,14 @@ static int load_elf_binary(struct linux_binprm *bprm)
>> elf_brk = k;
>> }
>>
>> + if (sframe_phdr) {
>> + for (i = 0, elf_ppnt = elf_phdata;
>> + i < elf_ex->e_phnum; i++, elf_ppnt++) {
>> + if ((elf_ppnt->p_flags & PF_X))
if (elf_ppnt->p_flags & PF_X && elf_ppnt->p_type == PT_LOAD)
>> + elf_add_sframe(elf_ppnt, sframe_phdr, load_bias);
>
> Similarly, should this also check if the segment is a PT_LOAD segment before
> adding it to the sframe sections?
Likewise.
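The combined condition can be sketched in user space as below. The struct is a reduced, hypothetical program header and the DEMO_* constants are assumed to carry the values of PT_LOAD, PT_GNU_STACK and PF_X from the ELF definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the ELF specification and the GNU extensions. */
#define DEMO_PT_LOAD		1
#define DEMO_PT_GNU_STACK	0x6474e551
#define DEMO_PF_X		0x1

/* Reduced program header with only the fields used here. */
struct demo_phdr {
	uint32_t p_type;
	uint32_t p_flags;
	uint64_t p_vaddr;
	uint64_t p_memsz;
};

/* Only loadable segments mapped executable describe text ranges. */
static inline bool demo_is_text_segment(const struct demo_phdr *ph)
{
	return ph->p_type == DEMO_PT_LOAD && (ph->p_flags & DEMO_PF_X);
}
```

An executable PT_GNU_STACK entry with p_vaddr and p_memsz of 0 is skipped by this test, so it can no longer produce a bogus text range.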
Regards,
Jens
* Re: [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame
2026-05-05 12:40 ` sashiko-bot
@ 2026-05-06 15:01 ` Jens Remus
2026-05-06 15:40 ` Steven Rostedt
0 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-06 15:01 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat
On 5/5/2026 2:40 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] The newly added `outermost` field in `struct unwind_user_frame` is read uninitialized during SFrame unwinding, which can randomly truncate stack traces.
> --
>
> commit 84ee374fd6a62072e8d61545c4ceae42822e4e37
> Author: Jens Remus <jremus@linux.ibm.com>
>
> unwind_user: Stop when reaching an outermost frame
>
> This commit adds an outermost frame indicator to the unwind user frame
> structure, and modifies the unwinder to stop when it is reached.
>
>> diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
>> index 1fb2724197332..fdb1001e37505 100644
>> --- a/kernel/unwind/user.c
>> +++ b/kernel/unwind/user.c
>> @@ -32,6 +32,12 @@ static int unwind_user_next_common(struct unwind_user_state *state,
>> {
>> unsigned long cfa, fp, ra;
>>
>> + /* Stop unwinding when reaching an outermost frame. */
>> + if (frame->outermost) {
>
> Could this evaluate uninitialized stack memory during SFrame unwinding?
>
> In kernel/unwind/user.c, unwind_user_next_sframe() declares the frame
> variable on the stack without zero-initializing it:
>
> unwind_user_next_sframe() {
> struct unwind_user_frame frame;
Should unwind_user_next_sframe() zero-initialize frame to resolve the issue below?
> ...
> err = sframe_find(..., &frame);
> ...
> return unwind_user_next_common(state, &frame);
> }
>
> Since sframe_find() does not populate the newly added outermost field at this
> point in the patch series, frame->outermost contains uninitialized memory
> when checked here.
>
> If this evaluates to true, does it erroneously set state->done and silently
> truncate stack traces?
>
> I note that this is fixed later in the series by the commit
> "unwind_user/sframe: Add support for outermost frame indication", but
> introducing the uninitialized read in this commit could break kernel
> bisectability.
Or should I merge this and the following commit into one? Or disregard?
>
>> + state->done = true;
>> + return 0;
>> + }
>> +
>> /* Get the Canonical Frame Address (CFA) */
>
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-06 14:34 ` Jens Remus
@ 2026-05-06 15:01 ` Steven Rostedt
2026-05-06 15:29 ` Jens Remus
2026-05-08 9:49 ` Jens Remus
2026-05-08 23:03 ` Indu Bhagat
1 sibling, 2 replies; 67+ messages in thread
From: Steven Rostedt @ 2026-05-06 15:01 UTC (permalink / raw)
To: Jens Remus; +Cc: Josh Poimboeuf, Indu Bhagat, bpf, sashiko
On Wed, 6 May 2026 16:34:34 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> >> +static __always_inline int __read_fre(struct sframe_section *sec,
> >> + struct sframe_fde_internal *fde,
> >> + unsigned long fre_addr,
> >> + struct sframe_fre_internal *fre)
> >> +{
> > [ ... ]
> >> + if (fre_addr + addr_size + 1 > sec->fres_end)
> >> + return -EFAULT;
> >> +
> >> + UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
> >
> > Will this cause alignment faults on architectures with strict alignment
> > requirements?
> >
> > The .sframe format uses packed structures and variable-length datawords. The
> > cur pointer might be unaligned here, and UNSAFE_GET_USER_INC() performs
> > 16-bit or 32-bit reads via unsafe_get_user().
>
> IIUC this should not be an issue for x86-64, s390, and arm64.
Do we have a way to make sure that sframe support will always be for
architectures that can handle alignment issues like this? There should
be something to force this via configs or something that will trigger a
warning or bug if this is built for architectures that can't handle
this alignment.
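As a rough illustration of what an alignment-safe read looks like, here is a
user-space sketch. It is not the kernel implementation: load_u16_unaligned()
and load_u32_unaligned() are names made up for this example, and memcpy()
merely stands in for whatever accessor the kernel side would end up using.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a 16-bit value from a possibly unaligned address. */
static inline uint16_t load_u16_unaligned(const void *p)
{
	uint16_t v;

	/* memcpy() makes no alignment assumption about p. */
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper: same idea for a 32-bit value. */
static inline uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

On architectures that handle unaligned loads in hardware, compilers typically
reduce such a fixed-size memcpy() to a plain load.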
> >
> >> +
> >> + fre_addr = sec->fres_start + fde->fres_off;
> >> +
> >> + for (i = 0; i < fde->fres_num; i++) {
> >
> > Can this loop cause a hard lockup in atomic context?
> >
> > fde->fres_num is a 32-bit value copied from user space without validation.
> > Since sframe_find() is designed to be called by unwinders in NMI context, an
What? No. This looks to be a hallucination. sframe_find() will never be
called in NMI context. In fact, it can only be called in task context.
> > attacker could provide a very large number of valid entries. Executing
> > billions of iterations and unsafe_get_user() calls could stall the CPU
> > and trigger the hard lockup detector.
>
> Please advise.
That said, we should verify that fde->fres_num is at least always
smaller than the size of the table.
>
> >
> >> + int ret;
> >> +
> >
> > [ ... ]
> >
> >> @@ -120,8 +449,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> >> sec->text_end = text_end;
> >>
> >> ret = sframe_read_header(sec);
> >> - if (ret)
> >> + if (ret) {
> >> + dbg_print_header(sec);
> >> goto err_free;
> >> + }
> >
> > Can shdr.fre_len cause an integer overflow on 32-bit architectures during
> > header parsing?
> >
> > If a malicious user provides a large fre_len in the header, fres_end
> > (calculated as fres_start + shdr.fre_len) could wrap around the 32-bit
> > address space. This would bypass the bounds check in sframe_read_header(),
> > allowing fres_start and fdes_start to point into kernel memory. Later, when
> > __read_fde() and __find_fre() use unsafe_get_user(), this could lead to
> > arbitrary kernel memory disclosure.
>
> SFrame is currently only supported on 64-bit architectures (i.e. x86-64,
> arm64, s390 64-bit). So unsigned long fres_end should always be 64-bit.
> Do we need to add the following to the header parsing?
>
> if (fdes_start >= fdes_end || fres_start >= fres_end) {
> dbg_sec("inconsistent FDE/FRE start/end address\n");
> return -EINVAL;
> }
I guess this wouldn't hurt.
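For what it's worth, a user-space sketch of how the wrap-around case and the
proposed start/end test could be expressed. The helper names checked_end()
and check_areas() are invented for this example, and __builtin_add_overflow()
is the GCC/Clang builtin, used here as a stand-in for the kernel's
check_add_overflow().

```c
#include <errno.h>

/* Hypothetical helper: compute an exclusive end address, rejecting wrap-around. */
static int checked_end(unsigned long start, unsigned long len,
		       unsigned long *end)
{
	if (__builtin_add_overflow(start, len, end))
		return -EINVAL;
	return 0;
}

/* Hypothetical consistency check mirroring the proposed start >= end test. */
static int check_areas(unsigned long fdes_start, unsigned long fdes_end,
		       unsigned long fres_start, unsigned long fres_end)
{
	if (fdes_start >= fdes_end || fres_start >= fres_end)
		return -EINVAL;
	return 0;
}
```

The first helper catches a length that would wrap the address space regardless
of the word size, and the second rejects empty or inverted areas.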
>
> >
> >>
> >> ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
> >
> > Does passing sec->text_end directly as the last parameter to
> > mtree_insert_range() break contiguous mappings?
> >
> > mtree_insert_range() expects the last boundary to be inclusive, but
> > sec->text_end represents the exclusive end address of the executable segment.
> > If user space maps seamlessly contiguous text segments, the insertion for the
> > second segment might overlap with the claimed end of the first, causing it to
> > fail with -EEXIST.
>
> Addressed in previous patch.
And I just sent a patch to fix the documentation of
mtree_insert_range() to update the kerneldoc to explicitly state it is
inclusive :-p
-- Steve
>
> >
> >> if (ret) {
> >> dbg("mtree_insert_range failed: text=%lx-%lx\n",
> >
>
> Thanks and regards,
> Jens
* Re: [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation
2026-05-06 14:09 ` Jens Remus
@ 2026-05-06 15:03 ` Steven Rostedt
2026-05-06 21:13 ` David Laight
1 sibling, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2026-05-06 15:03 UTC (permalink / raw)
To: Jens Remus
Cc: Josh Poimboeuf, Dave Hansen, H. Peter Anvin, linux-kernel,
linux-trace-kernel, x86, bpf, linux-mm, Namhyung Kim,
Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google), Indu Bhagat,
Peter Zijlstra, Dylan Hatch, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Mathieu Desnoyers, Kees Cook, Sam James
On Wed, 6 May 2026 16:09:16 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> > +#define unsafe_copy_from_user(_dst, _src, _len, label) \
> > +do { \
> > + void *__dst = (_dst); \
> > + void __user *__src = (_src); \
>
> const void __user *__src = (_src); \
>
> This was suggested by Sashiko AI review. Any objections?
I have no problems with that.
-- Steve
* Re: [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation
2026-05-06 14:13 ` Jens Remus
@ 2026-05-06 15:05 ` Steven Rostedt
0 siblings, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2026-05-06 15:05 UTC (permalink / raw)
To: Jens Remus; +Cc: Josh Poimboeuf, bpf, sashiko, Indu Bhagat
On Wed, 6 May 2026 16:13:25 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> > Regarding the variable naming in the updated unsafe_copy_to_user() and the
> > new unsafe_copy_from_user() macros, is there a risk of variable shadowing?
> >
> > The macros declare local variables with very common names, specifically __dst,
> > __src, and __len. If a caller passes an argument that happens to share one of
> > these names, the local variable declaration will shadow the caller's argument.
> > For instance, if a caller passes __len, the initialization size_t __len =
> > (_len); will evaluate the right-hand side as the newly declared uninitialized
> > local variable itself. Could this lead to the loop copying a random length of
> > memory based on stack garbage instead of the intended length? The previous
> > implementation seemed to avoid this by using uniquely prefixed names like
> > __ucu_len.
>
> Please advise.
There is an effort to get rid of common variable names in macros that
start with underscores. But honestly, that can be dealt with later, as
it's a problem with other macros in this header.
Feel free to ignore this suggestion.
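In case someone picks this up later, here is a user-space sketch of the
naming idea. It is not the x86 macro: copy_bytes() and the __cpy_ prefix are
made up for this example, and a plain byte loop replaces the unsafe_*()
accessors.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for a copy macro. The locals carry a
 * macro-specific prefix (__cpy_), so a caller's variable named __dst,
 * __src or __len is not shadowed by the macro's own declarations.
 */
#define copy_bytes(_dst, _src, _len)					\
do {									\
	unsigned char *__cpy_dst = (unsigned char *)(_dst);		\
	const unsigned char *__cpy_src = (const unsigned char *)(_src);\
	size_t __cpy_len = (_len);					\
	while (__cpy_len--)						\
		*__cpy_dst++ = *__cpy_src++;				\
} while (0)
```

A caller can then pass its own variable called __len without the initializer
referring to the macro's local instead.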
-- Steve
* Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
2026-05-06 13:50 ` Jens Remus
@ 2026-05-06 15:21 ` Steven Rostedt
0 siblings, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2026-05-06 15:21 UTC (permalink / raw)
To: Jens Remus
Cc: Josh Poimboeuf, bpf, sashiko, Indu Bhagat, Chris Mason,
Roman Gushchin
On Wed, 6 May 2026 15:50:45 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> >
> > mt_init() initializes the maple tree with flags set to 0. Since stack
> > unwinding typically occurs in profiling interrupts (NMIs) which must
> > traverse the maple tree locklessly via RCU, does this missing
> > MT_FLAGS_USE_RCU flag allow concurrent modifications to perform in-place
> > updates instead of RCU-safe copy-on-write allocations? This could result in
> > the NMI unwinder observing corrupted tree states.
>
> Please advise.
Sashiko is confused thinking that unwinders happen in NMI context. They
used to, but this will always be done from task context. Hopefully
Sashiko will figure this out soon (Cc'ing Chris and Roman about it).
>
> > [ ... ]
> >> @@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> >> mm->pmd_huge_pte = NULL;
> >> #endif
> >> mm_init_uprobes_state(mm);
> >> + mm_init_sframe(mm);
> >> hugetlb_count_init(mm);
> >>
> >> mm_flags_clear_all(mm);
> >
> > When a process forks, this initializes an empty sframe_mt for the child's
> > mm_struct. While dup_mmap() duplicates the parent's VMAs, it doesn't appear
> > that the registered .sframe entries are propagated to the child's maple tree.
> > Will child processes silently lose their .sframe unwind data for shared
> > libraries registered prior to the fork?
>
> Does this ring any bell, Steven? I ran into it with my signal frame
> stacktracing test on s390. I think we discussed this during one of the
> SFrame calls and it got lost.
Oh, this is a good point. I think we need to copy the sframe mt on
fork, not initialize it.
> >> +static int __sframe_remove_section(struct mm_struct *mm,
> >> + struct sframe_section *sec)
> >> +{
> >> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> >> + dbg("mtree_erase failed: text=%lx\n", sec->text_start);
> >> + return -EINVAL;
> >> + }
> >> +
> >> + free_section(sec);
> >> +
> >> + return 0;
> >> +}
> >> +
> >> int sframe_remove_section(unsigned long sframe_start)
> >> {
> >> - return -ENOSYS;
> >> + struct mm_struct *mm = current->mm;
> >> + struct sframe_section *sec;
> >> + unsigned long index = 0;
> >> + bool found = false;
> >> + int ret = 0;
> >> +
> >> + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
> >> + if (sec->sframe_start == sframe_start) {
> >
> > Can concurrent calls to sframe_remove_section() cause a use-after-free and
> > subsequent double free?
> >
> > mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
> > drops the RCU read lock, meaning the returned sec pointer has no lifetime
> > protection in the loop body.
> >
> > If two threads concurrently invoke sframe_remove_section(), both could
> > receive the exact same sec pointer from the tree. Thread A could then call
> > __sframe_remove_section(), erasing the entry and freeing sec via
> > free_section(). Thread B would then evaluate the if statement using the
> > freed sec pointer, causing a use-after-free read, and potentially proceeding
> > to free it again.
>
> Please advise.
I guess it's asking if we should have a srcu_read_lock()?
>
> > Additionally, does free_section() need to use kfree_rcu() instead of
> > kfree() to prevent lockless NMI unwinders from hitting a use-after-free
> > when reading payloads?
Again, this is Sashiko's confusion with thinking this can be called by
NMI (it can't!)
-- Steve
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-06 15:01 ` Steven Rostedt
@ 2026-05-06 15:29 ` Jens Remus
2026-05-08 9:49 ` Jens Remus
1 sibling, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-06 15:29 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Josh Poimboeuf, Indu Bhagat, bpf, sashiko
On 5/6/2026 5:01 PM, Steven Rostedt wrote:
> On Wed, 6 May 2026 16:34:34 +0200
> Jens Remus <jremus@linux.ibm.com> wrote:
>
>>>> +static __always_inline int __read_fre(struct sframe_section *sec,
>>>> + struct sframe_fde_internal *fde,
>>>> + unsigned long fre_addr,
>>>> + struct sframe_fre_internal *fre)
>>>> +{
>>> [ ... ]
>>>> + if (fre_addr + addr_size + 1 > sec->fres_end)
>>>> + return -EFAULT;
>>>> +
>>>> + UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
>>>
>>> Will this cause alignment faults on architectures with strict alignment
>>> requirements?
>>>
>>> The .sframe format uses packed structures and variable-length datawords. The
>>> cur pointer might be unaligned here, and UNSAFE_GET_USER_INC() performs
>>> 16-bit or 32-bit reads via unsafe_get_user().
>>
>> IIUC this should not be an issue for x86-64, s390, and arm64.
>
> Do we have a way to make sure that sframe support will always be for
> architectures that can handle alignment issues like this? There should
> be something to force this via configs or something that will trigger a
> warning or bug if this is built for architectures that can't handle
> this alignment.
Any suggestions are very welcome.
>>>> +
>>>> + fre_addr = sec->fres_start + fde->fres_off;
>>>> +
>>>> + for (i = 0; i < fde->fres_num; i++) {
>>>
>>> Can this loop cause a hard lockup in atomic context?
>>>
>>> fde->fres_num is a 32-bit value copied from user space without validation.
>>> Since sframe_find() is designed to be called by unwinders in NMI context, an
>
> What? No. This looks to be a hallucination. sframe_find() will never be
> called in NMI context. In fact, it can only be called in task context.
>
>>> attacker could provide a very large number of valid entries. Executing
>>> billions of iterations and unsafe_get_user() calls could stall the CPU
>>> and trigger the hard lockup detector.
>>
>> Please advise.
>
> That said, we should verify that fde->fres_num is at least always
> smaller than the size of the table.
If we carry the .sframe header sfh->num_fres over to a new sec->num_fres
then __read_fde() could check:
if (_fda.fres_num > sec->num_fres)
return -EINVAL;
But I am not sure if that check provides much benefit, as fde->fres_num
is relative to fde->fres_off, which points into the
[sec->fres_start, sec->fres_end[ part of the .sframe section.
So even a small fde->fres_num might be out of range. Given that the FREs
are variable-sized, it is not easily possible to check upfront in
__read_fde() whether fde->fres_num is valid.
The validity is checked later in __read_fre() before any fields are read
from the FRE, with fre_addr = sec->fres_start + fde->fres_off being
passed as argument:
if (fre_addr + addr_size + 1 > sec->fres_end)
return -EFAULT;
[...]
if (cur + (dataword_count * dataword_size) > sec->fres_end)
return -EFAULT;
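To illustrate the kind of per-entry check meant here, a user-space sketch. It
is not the actual __read_fre() code: struct fre_area and fre_range_ok() are
names invented for this example, and the comparison is arranged so that
addr + size is never computed.

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for the FRE area of a section. */
struct fre_area {
	unsigned long fres_start;
	unsigned long fres_end;		/* exclusive */
};

/*
 * Return 0 if [addr, addr + size) lies within the FRE area, -EFAULT
 * otherwise. The remaining space is compared against size, so a huge
 * size cannot wrap around.
 */
static int fre_range_ok(const struct fre_area *a, unsigned long addr,
			unsigned long size)
{
	if (addr < a->fres_start || addr > a->fres_end)
		return -EFAULT;
	if (size > a->fres_end - addr)
		return -EFAULT;
	return 0;
}
```

Each iteration of the FRE loop would hit such a check before reading, so a
bogus fres_num runs into -EFAULT as soon as an entry leaves the FRE area.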
>
>>
>>>
>>>> + int ret;
>>>> +
>>>
>>> [ ... ]
>>>
>>>> @@ -120,8 +449,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>>>> sec->text_end = text_end;
>>>>
>>>> ret = sframe_read_header(sec);
>>>> - if (ret)
>>>> + if (ret) {
>>>> + dbg_print_header(sec);
>>>> goto err_free;
>>>> + }
>>>
>>> Can shdr.fre_len cause an integer overflow on 32-bit architectures during
>>> header parsing?
>>>
>>> If a malicious user provides a large fre_len in the header, fres_end
>>> (calculated as fres_start + shdr.fre_len) could wrap around the 32-bit
>>> address space. This would bypass the bounds check in sframe_read_header(),
>>> allowing fres_start and fdes_start to point into kernel memory. Later, when
>>> __read_fde() and __find_fre() use unsafe_get_user(), this could lead to
>>> arbitrary kernel memory disclosure.
>>
>> SFrame is currently only supported on 64-bit architectures (i.e. x86-64,
>> arm64, s390 64-bit). So unsigned long fres_end should always be 64-bit.
>> Do we need to add the following to the header parsing?
>>
>> if (fdes_start >= fdes_end || fres_start >= fres_end) {
>> dbg_sec("inconsistent FDE/FRE start/end address\n");
>> return -EINVAL;
>> }
>
> I guess this wouldn't hurt.
Ok.
>>>>
>>>> ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
>>>
>>> Does passing sec->text_end directly as the last parameter to
>>> mtree_insert_range() break contiguous mappings?
>>>
>>> mtree_insert_range() expects the last boundary to be inclusive, but
>>> sec->text_end represents the exclusive end address of the executable segment.
>>> If user space maps seamlessly contiguous text segments, the insertion for the
>>> second segment might overlap with the claimed end of the first, causing it to
>>> fail with -EEXIST.
>>
>> Addressed in previous patch.
>
> And I just sent a patch to fix the documentation of
> mtree_insert_range() to update the kerneldoc to explicitly state it is
> inclusive :-p
Thanks!
>>
>>>
>>>> if (ret) {
>>>> dbg("mtree_insert_range failed: text=%lx-%lx\n",
>>>
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
* Re: [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables
2026-05-06 14:56 ` Jens Remus
@ 2026-05-06 15:36 ` Steven Rostedt
2026-05-08 23:05 ` Indu Bhagat
0 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2026-05-06 15:36 UTC (permalink / raw)
To: Jens Remus; +Cc: Josh Poimboeuf, Indu Bhagat, bpf, sashiko, Jose E. Marchesi
On Wed, 6 May 2026 16:56:01 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> >> --- a/fs/binfmt_elf.c
> >> +++ b/fs/binfmt_elf.c
> > [ ... ]
> >> @@ -637,6 +638,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
> >> return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
> >> }
> >>
> >> +static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
> >> + unsigned long base_addr)
> >> +{
> >> + unsigned long sframe_start, sframe_end, text_start, text_end;
> >> +
> >> + sframe_start = base_addr + sframe->p_vaddr;
> >> + sframe_end = sframe_start + sframe->p_memsz;
> >> +
> >> + text_start = base_addr + text->p_vaddr;
> >> + text_end = text_start + text->p_memsz;
> >
> > Should text_end be calculated as text_start + text->p_memsz - 1?
> >
> > Since the maple tree API strictly expects inclusive ranges [first, last] for
> > insertions, passing the exclusive text_end would cause the inserted range
> > to cover one byte beyond the actual segment.
>
> Shall we keep the semantic that text_end and sframe_end point past the
> respective section end, and just adjust the value passed on the call to
> mtree_insert_range()?
Let's keep the end exclusive, and add the "- 1" to text_end before
passing it to the mtree_insert_range(). I did that to get it working for
me.
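A user-space sketch of the conversion, for illustration only. struct
incl_range, to_inclusive() and incl_overlap() are names made up for this
example and only model the inclusive [first, last] ranges that the maple tree
expects.

```c
#include <stdbool.h>

/* Hypothetical model of an inclusive [first, last] range. */
struct incl_range {
	unsigned long first;
	unsigned long last;
};

/* Convert an exclusive end address to an inclusive last index (start < end). */
static struct incl_range to_inclusive(unsigned long start, unsigned long end)
{
	struct incl_range r = { .first = start, .last = end - 1 };

	return r;
}

/* Two inclusive ranges overlap if neither lies entirely before the other. */
static bool incl_overlap(struct incl_range a, struct incl_range b)
{
	return a.first <= b.last && b.first <= a.last;
}
```

For two adjacent segments 0x1000-0x2000 and 0x2000-0x3000 (exclusive ends),
the converted ranges end at 0x1fff and start at 0x2000, so they do not
overlap. Passing 0x2000 as "last" for the first one would.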
>
> >
> > If an ELF binary has adjacent executable PT_LOAD segments, could the first
> > segment's inclusive end overlap with the next segment's start and fail
> > subsequent insertions?
> >
> >> +
> >> + /* Ignore return value, sframe section isn't critical */
> >> + sframe_add_section(sframe_start, sframe_end, text_start, text_end);
> >> +}
> >> +
> > [ ... ]
> >> @@ -712,6 +729,19 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
> >> error = -ENOMEM;
> >> goto out;
> >> }
> >> + break;
> >> + }
> >> + case PT_GNU_SFRAME:
> >> + sframe_phdr = eppnt;
> >> + break;
> >> + }
> >> + }
> >> +
> >> + if (sframe_phdr) {
> >> + eppnt = interp_elf_phdata;
> >> + for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
> >> + if (eppnt->p_flags & PF_X)
>
> if (eppnt->p_flags & PF_X && eppnt->p_type == PT_LOAD)
>
> >> + elf_add_sframe(eppnt, sframe_phdr, load_addr);
> >
> > Does this need to verify the segment is loadable (eppnt->p_type == PT_LOAD)?
> >
> > If non-loadable segments like PT_GNU_STACK have the PF_X flag set, they
> > typically have p_vaddr and p_memsz as 0. Processing these segments would
> > compute bogus text boundaries and bypass VMA validation. If a legitimate
> > PT_LOAD segment overlaps with these bogus ranges, could its insertion fail
> > silently and break stack unwinding for the executable?
>
> Do you agree?
Indu or Jose?
-- Steve
>
> >
> >> }
> >> }
> >>
> > [ ... ]
> >> @@ -1242,6 +1276,14 @@ static int load_elf_binary(struct linux_binprm *bprm)
> >> elf_brk = k;
> >> }
> >>
> >> + if (sframe_phdr) {
> >> + for (i = 0, elf_ppnt = elf_phdata;
> >> + i < elf_ex->e_phnum; i++, elf_ppnt++) {
> >> + if ((elf_ppnt->p_flags & PF_X))
>
> if (elf_ppnt->p_flags & PF_X && elf_ppnt->p_type == PT_LOAD)
>
> >> + elf_add_sframe(elf_ppnt, sframe_phdr, load_bias);
> >
> > Similarly, should this also check if the segment is a PT_LOAD segment before
> > adding it to the sframe sections?
>
> Likewise.
>
> Regards,
> Jens
* Re: [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame
2026-05-06 15:01 ` Jens Remus
@ 2026-05-06 15:40 ` Steven Rostedt
0 siblings, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2026-05-06 15:40 UTC (permalink / raw)
To: Jens Remus; +Cc: Josh Poimboeuf, bpf, sashiko, Indu Bhagat
On Wed, 6 May 2026 17:01:17 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> > I note that this is fixed later in the series by the commit
> > "unwind_user/sframe: Add support for outermost frame indication", but
> > introducing the uninitialized read in this commit could break kernel
> > bisectability.
>
> Or should I merge this and the following commit into one? Or disregard?
Sframe cannot even be enabled yet at this point in the series, thus
bisection should not be a problem. The only bisection issues we need to
worry about before sframes are enabled are compile errors.
That is, every patch should be able to be built.
In other words, you can safely disregard it.
-- Steve
* Re: [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation
2026-05-06 14:09 ` Jens Remus
2026-05-06 15:03 ` Steven Rostedt
@ 2026-05-06 21:13 ` David Laight
2026-05-06 21:17 ` David Laight
1 sibling, 1 reply; 67+ messages in thread
From: David Laight @ 2026-05-06 21:13 UTC (permalink / raw)
To: Jens Remus
Cc: Steven Rostedt, Josh Poimboeuf, Dave Hansen, H. Peter Anvin,
linux-kernel, linux-trace-kernel, x86, bpf, linux-mm,
Namhyung Kim, Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave,
Florian Weimer, Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google), Indu Bhagat,
Peter Zijlstra, Dylan Hatch, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Mathieu Desnoyers, Kees Cook, Sam James
On Wed, 6 May 2026 16:09:16 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> On 5/5/2026 2:17 PM, Jens Remus wrote:
> > From: Josh Poimboeuf <jpoimboe@kernel.org>
> >
> > Add an x86 implementation of unsafe_copy_from_user() similar to the
> > existing unsafe_copy_to_user().
>
> > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
>
> > -#define unsafe_copy_to_user(_dst,_src,_len,label) \
> > -do { \
> > - char __user *__ucu_dst = (_dst); \
> > - const char *__ucu_src = (_src); \
> > - size_t __ucu_len = (_len); \
> > - unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \
> > - unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \
> > - unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \
> > - unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \
> > +#define unsafe_copy_to_user(_dst, _src, _len, label) \
> > +do { \
> > + void __user *__dst = (_dst); \
> > + const void *__src = (_src); \
> > + size_t __len = (_len); \
> > + unsafe_copy_to_user_loop(__dst, __src, __len, u64, label); \
> > + unsafe_copy_to_user_loop(__dst, __src, __len, u32, label); \
> > + unsafe_copy_to_user_loop(__dst, __src, __len, u16, label); \
> > + unsafe_copy_to_user_loop(__dst, __src, __len, u8, label); \
> > +} while (0)
> > +
> > +#define unsafe_copy_from_user_loop(dst, src, len, type, label) \
> > + while (len >= sizeof(type)) { \
> > + unsafe_get_user(*(type *)(dst), (type __user *)(src), label); \
> > + dst += sizeof(type); \
> > + src += sizeof(type); \
> > + len -= sizeof(type); \
> > + }
> > +
> > +#define unsafe_copy_from_user(_dst, _src, _len, label) \
> > +do { \
> > + void *__dst = (_dst); \
> > + void __user *__src = (_src); \
>
> const void __user *__src = (_src); \
>
> This was suggested by Sashiko AI review. Any objections?
Try compiling it :-)
-- David
>
> > + size_t __len = (_len); \
> > + unsafe_copy_from_user_loop(__dst, __src, __len, u64, label); \
> > + unsafe_copy_from_user_loop(__dst, __src, __len, u32, label); \
> > + unsafe_copy_from_user_loop(__dst, __src, __len, u16, label); \
> > + unsafe_copy_from_user_loop(__dst, __src, __len, u8, label); \
> > } while (0)
> >
> > #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
>
> Thanks and regards,
> Jens
* Re: [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation
2026-05-06 21:13 ` David Laight
@ 2026-05-06 21:17 ` David Laight
0 siblings, 0 replies; 67+ messages in thread
From: David Laight @ 2026-05-06 21:17 UTC (permalink / raw)
To: Jens Remus
Cc: Steven Rostedt, Josh Poimboeuf, Dave Hansen, H. Peter Anvin,
linux-kernel, linux-trace-kernel, x86, bpf, linux-mm,
Namhyung Kim, Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave,
Florian Weimer, Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google), Indu Bhagat,
Peter Zijlstra, Dylan Hatch, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Mathieu Desnoyers, Kees Cook, Sam James
On Wed, 6 May 2026 22:13:58 +0100
David Laight <david.laight.linux@gmail.com> wrote:
...
> Try compiling it :-)
Ignore me...
* Re: [PATCH v14 19/19] unwind_user/sframe: Add prctl() interface for registering .sframe sections
2026-05-05 18:45 ` sashiko-bot
@ 2026-05-07 14:14 ` Jens Remus
0 siblings, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-07 14:14 UTC (permalink / raw)
To: Steven Rostedt; +Cc: bpf, sashiko, Josh Poimboeuf, Indu Bhagat
On 5/5/2026 8:45 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] Missing `break;` statement causes `PR_SET_CFI` to fall through to `PR_ADD_SFRAME`.
> - [Medium] Patch introduces a test-only interface into the permanent UAPI.
> --
>
> commit 9171aa16917963dd1070d9ca3467d9086cd66393
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Add prctl() interface for registering .sframe sections
>
> The commit adds prctl() interfaces which allow glibc to tell the kernel
> where to find .sframe sections. It mentions that the patch is for testing
> purposes only.
>
>> [
>> This adds an interface for prctl() for testing loading of sframes for
>> libraries. But this interface should really be a system call. This patch
>> is for testing purposes only and should not be applied to mainline.
>> ]
>
> Is it intended to include this in a mainline patch series? Adding temporary
> or test-only interfaces into the UAPI represents a permanent ABI commitment
> once released.
This patch will get replaced by a new syscall that Steven is working on:
[RFC][PATCH] unwind: Add stacktrace_setup system call
https://lore.kernel.org/all/20260429114355.6c712e6a@gandalf.local.home/
>
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index 62e842055cc9c..6e81e82bc9911 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
> [ ... ]
>> @@ -2906,6 +2907,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>> break;
>> if (arg3 & PR_CFI_LOCK && !(arg3 & PR_CFI_DISABLE))
>> error = arch_prctl_lock_branch_landing_pad_state(me);
>> + case PR_ADD_SFRAME:
>> + error = sframe_add_section(arg2, arg3, arg4, arg5);
>> + break;
>
> Does this cause the previous case to fall through into PR_ADD_SFRAME?
>
> It appears the new case was inserted before the break statement of the
> previous case. This would cause execution to unintentionally continue into
> sframe_add_section and overwrite the original error status.
Doh! Stupid mistake during rebase. Fixed in next version.
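For the record, a stripped-down user-space sketch of the intended control
flow. The option numbers and handler functions are hypothetical and exist
only for this example. The point is that each case ends in its own break.

```c
#include <errno.h>

/* Hypothetical option numbers, for illustration only. */
enum { OPT_SET_CFI = 1, OPT_ADD_SFRAME = 2 };

/* Hypothetical handlers with distinguishable return values. */
static int handle_cfi(void)
{
	return -EPERM;
}

static int handle_add_sframe(void)
{
	return 0;
}

static int dispatch(int option)
{
	int error;

	switch (option) {
	case OPT_SET_CFI:
		error = handle_cfi();
		break;	/* without this, control falls into the next case */
	case OPT_ADD_SFRAME:
		error = handle_add_sframe();
		break;
	default:
		error = -EINVAL;
		break;
	}

	return error;
}
```

Without the first break, the error from the first handler would be
overwritten by the result of the second one.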
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
* Re: [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers
2026-05-06 13:42 ` Jens Remus
@ 2026-05-07 14:55 ` Jens Remus
2026-05-08 23:02 ` Indu Bhagat
1 sibling, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-07 14:55 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat
On 5/6/2026 3:42 PM, Jens Remus wrote:
> On 5/5/2026 2:49 PM, sashiko-bot@kernel.org wrote:
>> [ ... ]
>>> + sec = kzalloc(sizeof(*sec), GFP_KERNEL);
>>> + if (!sec)
>>> + return -ENOMEM;
>>> +
>>> + sec->sframe_start = sframe_start;
>>> + sec->sframe_end = sframe_end;
>>> + sec->text_start = text_start;
>>> + sec->text_end = text_end;
>>> +
>>> + ret = sframe_read_header(sec);
>>> + if (ret)
>>> + goto err_free;
>>
>> Is it acceptable to mix goto error handling with scoped_guard() cleanup
>> helpers in the same function?
>>
>> The kernel's cleanup subsystem guidelines generally suggest that usage of
>> goto and cleanup helpers shouldn't be mixed in the same routine.
>
> This refers to the following comment in include/linux/cleanup.h:
>
> Lastly, given that the benefit of cleanup helpers is removal of
> "goto", and that the "goto" statement can jump between scopes, the
> expectation is that usage of "goto" and cleanup helpers is never
> mixed in the same function. I.e. for a given routine, convert all
> resources that need a "goto" cleanup to scope-based cleanup, or
> convert none of them.
>
>>
>> Could sec be allocated using __free(kfree) to avoid the goto entirely?
>
> The goto error handling is used outside the scoped_guard(). So I think
> it is fine.
Let's do that.
> Do you see any value in converting it as follows (my naive attempt):
>
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> @@ -722,13 +722,14 @@ static int sframe_read_header(struct sframe_section *sec)
> return 0;
> }
>
> +DEFINE_FREE(free_section, struct sframe_section *, if (_T) free_section(_T))
> int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> unsigned long text_start, unsigned long text_end)
> {
> struct maple_tree *sframe_mt = &current->mm->sframe_mt;
> struct vm_area_struct *sframe_vma, *text_vma;
> struct mm_struct *mm = current->mm;
> - struct sframe_section *sec;
> + struct sframe_section *sec __free(free_section) = NULL;
Obviously this whole approach does not work, as it causes the section
to get freed once sec goes out of scope. I just learned the hard way
that I had not thought this through well enough...
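The pitfall can be reproduced outside the kernel with the compiler's cleanup attribute, which scope-based helpers such as __free() build on. The following is a minimal userspace sketch under that assumption; all names (struct section, registry, add_section_*) are hypothetical. In the kernel, handing over ownership on the success path is typically done with no_free_ptr() from include/linux/cleanup.h rather than the manual NULL assignment used here.

```c
#include <assert.h>
#include <stdlib.h>

struct section { int id; };

static int freed_count;			/* counts automatic frees */
static struct section *registry;	/* stands in for the per-mm tree */

/* Cleanup handler: runs whenever the annotated variable leaves scope. */
static void section_cleanup(struct section **p)
{
	if (*p) {
		free(*p);
		freed_count++;
	}
}

#define AUTO_FREE_SECTION __attribute__((cleanup(section_cleanup)))

/* Broken: the stored pointer is freed as soon as the function returns. */
static int add_section_broken(int id)
{
	struct section *sec AUTO_FREE_SECTION = malloc(sizeof(*sec));

	if (!sec)
		return -1;
	sec->id = id;
	registry = sec;		/* dangling once sec goes out of scope */
	return 0;
}

/* Fixed: ownership is handed over before the scope ends. */
static int add_section_fixed(int id)
{
	struct section *sec AUTO_FREE_SECTION = malloc(sizeof(*sec));

	if (!sec)
		return -1;
	sec->id = id;
	registry = sec;
	sec = NULL;		/* cleanup handler now sees NULL and skips free */
	return 0;
}
```

After add_section_broken() returns, registry points at freed memory and must not be dereferenced; after add_section_fixed() the caller owns the section and is responsible for freeing it.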
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v14 16/19] unwind_user/sframe: Add support for SFrame V3 flexible FDEs
2026-05-05 18:55 ` sashiko-bot
@ 2026-05-07 15:30 ` Jens Remus
0 siblings, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-07 15:30 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf, Indu Bhagat; +Cc: bpf, sashiko
On 5/5/2026 8:55 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [High] Valid flexible FREs ending in a single padding data word are incorrectly rejected.
Those are invalid as per SFrame spec.
> - [High] Valid flexible FRE rules for CFA and FP/RA produce unhandled enums, triggering user-controlled WARN_ON_ONCE().
Fix below.
> - [High] Outermost frame unwinding is unconditionally broken, causing unintended fallback to frame pointers.
Fix below.
> --
>
> commit 7d3289b2e756d08c38877458400ce276ff3f5994
> Author: Jens Remus <jremus@linux.ibm.com>
>
> unwind_user/sframe: Add support for SFrame V3 flexible FDEs
>
> SFrame V3 introduces flexible FDEs in addition to the regular FDEs.
> The key difference is that flexible FDEs encode the CFA, RA, and FP
> tracking information using two FRE data words, a control word and an
> offset, or a single padding data word of zero.
>
>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index 2721f4af53fd3..2ba2c8b385f96 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
> [ ... ]
>> @@ -189,16 +193,147 @@ static __always_inline int __find_fde(struct sframe_section *sec,
> [ ... ]
>> +static __always_inline int
>> +__read_flex_fde_fre_datawords(struct sframe_section *sec,
>> + struct sframe_fde_internal *fde,
>> + unsigned long cur,
>> + unsigned char dataword_count,
>> + unsigned char dataword_size,
>> + struct sframe_fre_internal *fre)
>> +{
>> + u32 cfa_ctl, ra_ctl, fp_ctl;
>> + s32 cfa_off, ra_off, fp_off;
>> +
>> + if (dataword_count < 2)
>> + return -EFAULT;
>> + UNSAFE_GET_USER_INC(cfa_ctl, cur, dataword_size, Efault);
>> + UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
>> + dataword_count -= 2;
>> +
/*
* Each RA/FP location info consumes either two datawords
* (control word + offset) or one padding word substituting
* for that pair. Padding is only valid as substitution if
* followed by further non-padding location info. Therefore
* decoding only proceeds with at least two datawords. Any
* leftover trailing datawords are invalid and rejected by
* the final check.
*/
>> + ra_off = sec->ra_off;
>> + ra_ctl = ra_off ? 2 : 0; /* regnum=0, deref_p=(ra_off != 0), reg_p=0 */
>> + if (dataword_count >= 2) {
>
> The commit message mentions that flexible FDEs can use a single padding
> data word of zero. However, this code checks dataword_count >= 2 for both
> the RA and FP blocks.
That is ok, as a padding dataword for RA is only valid if followed by
two datawords (control and offset) for FP. In case of a valid padding
dataword for RA only one dataword is consumed. In case of an invalid
trailing padding dataword for RA it is not consumed. The same logic is
then used for FP, as that ensures that a leftover trailing padding
dataword for either RA or FP is not consumed (and then rejected by the
final check) and that any future extension, e.g. to convey SP location
info, can use the same logic. This helps reduce the number of checks at
the cost of a slow error path. Otherwise the checks would be:
/* RA */
if (!dataword_count)
goto done;
if (dataword_count == 1)
return -EFAULT; /* Trailing RA padding */
... consume RA dataword(s) ...
/* FP */
if (!dataword_count)
goto done;
if (dataword_count == 1)
return -EFAULT; /* Trailing FP padding */
... consume FP dataword(s) ...
if (dataword_count)
return -EFAULT; /* Trailing extra dataword(s) */
done:
fre->cfa_ctl = cfa_ctl;
...
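To make the accepted and rejected layouts concrete, here is a minimal userspace sketch of the ">= 2 datawords" decoding scheme. It is a simplification and not the kernel function: it reads from a plain array instead of user memory, uses zero as the default RA/FP location info instead of sec->ra_off and sec->fp_off, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct flex_fre {
	uint32_t cfa_ctl, ra_ctl, fp_ctl;
	int32_t cfa_off, ra_off, fp_off;
};

/* Returns 0 on success, -1 if the dataword sequence is malformed. */
static int decode_flex_datawords(const int32_t *w, size_t count,
				 struct flex_fre *fre)
{
	uint32_t cfa_ctl, ra_ctl = 0, fp_ctl = 0;
	int32_t cfa_off, ra_off = 0, fp_off = 0;
	size_t i = 0;

	/* CFA always needs a control word and an offset. */
	if (count < 2)
		return -1;
	cfa_ctl = (uint32_t)w[i++];
	cfa_off = w[i++];
	count -= 2;

	/* RA: control word + offset, or one padding word (zero). */
	if (count >= 2) {
		ra_ctl = (uint32_t)w[i++];
		count--;
		if (ra_ctl) {
			ra_off = w[i++];
			count--;
		}
	}

	/* FP: same scheme as RA. */
	if (count >= 2) {
		fp_ctl = (uint32_t)w[i++];
		count--;
		if (fp_ctl) {
			fp_off = w[i++];
			count--;
		}
	}

	/* Anything left is trailing padding or an unknown extra word. */
	if (count)
		return -1;

	*fre = (struct flex_fre){ cfa_ctl, ra_ctl, fp_ctl,
				  cfa_off, ra_off, fp_off };
	return 0;
}
```

Under these assumptions a sequence such as { cfa_ctl, cfa_off, 0 } is rejected as trailing RA padding, while { cfa_ctl, cfa_off, 0, fp_ctl, fp_off } is accepted with the padding word standing in for the RA pair.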
>
> If an FRE ends with a single padding data word, it will bypass both of these
> blocks and reach the check below:
>
>> + UNSAFE_GET_USER_INC(ra_ctl, cur, dataword_size, Efault);
>> + dataword_count--;
>> + if (ra_ctl) {
>> + UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
>> + dataword_count--;
>> + } else {
>> + /* Padding RA location info */
>> + ra_ctl = ra_off ? 2 : 0; /* re-deduce (see above) */
>> + }
>> + }
>> +
>> + fp_off = sec->fp_off;
>> + fp_ctl = fp_off ? 2 : 0; /* regnum=0, deref_p=(fp_off != 0), reg_p=0 */
>> + if (dataword_count >= 2) {
>> + UNSAFE_GET_USER_INC(fp_ctl, cur, dataword_size, Efault);
>> + dataword_count--;
>> + if (fp_ctl) {
>> + UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
>> + dataword_count--;
>> + } else {
>> + /* Padding FP location info */
>> + fp_ctl = fp_off ? 2 : 0; /* re-deduce (see above) */
>> + }
>> + }
>> +
/* Reject trailing padding or unknown extra datawords */
>> + if (dataword_count)
>> + return -EFAULT;
>
> Could this incorrectly reject valid FREs ending in a single padding data word
> since dataword_count would be 1 here?
>
>> +
>> + fre->cfa_ctl = cfa_ctl;
>> + fre->cfa_off = cfa_off;
>> + fre->ra_ctl = ra_ctl;
>> + fre->ra_off = ra_off;
>> + fre->fp_ctl = fp_ctl;
>> + fre->fp_off = fp_off;
>> +
>> + return 0;
>> +
>> +Efault:
>> + return -EFAULT;
>> +}
> [ ... ]
>> static __always_inline int __read_fre(struct sframe_section *sec,
>> struct sframe_fde_internal *fde,
>> unsigned long fre_addr,
>> struct sframe_fre_internal *fre)
>> {
> [ ... ]
>> if (!dataword_count) {
>> /*
>> * A FRE without data words indicates RA undefined /
>> * outermost frame.
>> */
>> - cfa_off = 0;
>> - ra_off = 0;
>> - fp_off = 0;
>> - goto done;
>> - }
>> + fre->cfa_ctl = 0;
/*
* A FRE without datawords indicates an outermost
* frame. Zero-initialize CFA, RA, and FP location
* info, except for the CFA control word, which
* must not cause sframe_init_cfa_rule_data() to fail.
*/
fre->cfa_ctl = (SFRAME_REG_SP << 3) | 1; /* regnum=SP, deref_p=0, reg_p=1 */
>> + fre->cfa_off = 0;
>> + fre->ra_ctl = 0;
>> + fre->ra_off = 0;
>> + fre->fp_ctl = 0;
>> + fre->fp_off = 0;
>>
>> - UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
>
> When __read_fre() processes an outermost frame, it zeroes out fre->cfa_ctl.
> In __find_fre(), this is later passed to sframe_init_cfa_rule_data():
>
>> -static __always_inline void
>> +static __always_inline int
>> sframe_init_cfa_rule_data(struct unwind_user_cfa_rule_data *cfa_rule_data,
>> - unsigned char fre_info,
>> - s32 offset)
>> + u32 ctlword, s32 offset)
>> {
>> - if (SFRAME_V3_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP)
>> - cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
>> - else
>> - cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
>> + bool deref_p = SFRAME_V3_FLEX_FDE_CTRLWORD_DEREF_P(ctlword);
>> + bool reg_p = SFRAME_V3_FLEX_FDE_CTRLWORD_REG_P(ctlword);
>> +
>> + if (reg_p) {
>> + unsigned int regnum = SFRAME_V3_FLEX_FDE_CTRLWORD_REGNUM(ctlword);
>> +
>> + switch (regnum) {
>> + case SFRAME_REG_SP:
>> + cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
>> + break;
>> + case SFRAME_REG_FP:
>> + cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
>> + break;
>> + default:
>> + cfa_rule_data->rule = UNWIND_USER_CFA_RULE_REG_OFFSET;
>> + cfa_rule_data->regnum = regnum;
>> + }
>> + } else {
>> + return -EINVAL;
>> + }
>
> Because ctlword is 0 for an outermost frame, reg_p will evaluate to false,
> and this unconditionally returns -EINVAL.
>
> Does this unintentionally break outermost frame unwinding? Returning an
> error here will cause __find_fre() to fail before it can reach the
> frame->outermost check, resulting in a fallback to frame pointer unwinding
> instead of stopping gracefully.
Good catch! That was broken. I did not notice during my testing on
s390, as the fallback to FP-based (back chain on s390) unwinding then
also detected an outermost frame. See the suggested fix above.
>
>> +
>> + if (deref_p)
>> + cfa_rule_data->rule |= UNWIND_USER_RULE_DEREF;
>> +
>
> Also, does this append UNWIND_USER_RULE_DEREF to CFA rules in a way that is
> not supported by the unwinder?
>
> If a CFA rule uses SP/FP with deref_p=1, this creates a combined enum like
> UNWIND_USER_CFA_RULE_SP_OFFSET | UNWIND_USER_RULE_DEREF. Because .sframe
> sections are user-supplied, could an attacker use this to trigger an
> unhandled enum path and hit a WARN_ON_ONCE() during unwinding?
Good catch! The generic unwinding logic in unwind_user_next_common()
actually needs to support both UNWIND_USER_CFA_RULE_SP_OFFSET_DEREF and
UNWIND_USER_CFA_RULE_FP_OFFSET_DEREF, as those are valid to be encoded
in SFrame flexible FDE and sframe_init_cfa_rule_data() converts
deref_p=1, reg_p=1, regnum=SP/FP into those. The reason for the
conversion is that arbitrary registers are only available in the topmost
frame, whereas SP and FP are available in any frame.
>
>> cfa_rule_data->offset = offset;
>> +
>> + return 0;
>> }
>>
>> static __always_inline void
static __always_inline int
>> sframe_init_rule_data(struct unwind_user_rule_data *rule_data,
>> - s32 offset)
>> + u32 ctlword, s32 offset)
>> {
>> - if (offset) {
>> - rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF;
>> - rule_data->offset = offset;
>> - } else {
>> + bool deref_p = SFRAME_V3_FLEX_FDE_CTRLWORD_DEREF_P(ctlword);
>> + bool reg_p = SFRAME_V3_FLEX_FDE_CTRLWORD_REG_P(ctlword);
bool reserved_p = SFRAME_V3_FLEX_FDE_CTRLWORD_RESERVED_P(ctlword);
unsigned int regnum = SFRAME_V3_FLEX_FDE_CTRLWORD_REGNUM(ctlword);
>> +
>> + if (!ctlword && !offset) {
>> rule_data->rule = UNWIND_USER_RULE_RETAIN;
>> + return;
return 0;
>> + }
if (reserved_p)
return -EINVAL;
@Indu: Although the SFrame spec only states "unused Unused bit.", I
think it would be good for the logic to reject any value other than zero,
as the bit could be used in future extensions of the SFrame format. Do you
agree?
>> + if (reg_p) {
>> + unsigned int regnum = SFRAME_V3_FLEX_FDE_CTRLWORD_REGNUM(ctlword);
Drop line above.
>> +
>> + rule_data->rule = UNWIND_USER_RULE_REG_OFFSET;
>> + rule_data->regnum = regnum;
>> + } else {
if (regnum)
return -EINVAL;
@Indu: Is that too strict? The SFrame spec only states that regnum
is "Effective only if reg_p is 1.". Should I rather ignore any non-zero
value if reg_p=0?
>> + rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET;
>> }
>> +
>> + if (deref_p)
>> + rule_data->rule |= UNWIND_USER_RULE_DEREF;
>> +
>> + rule_data->offset = offset;
return 0;
>> }
Likewise checks of reserved_p and regnum in sframe_init_cfa_rule_data().
>
> Similarly, if a rule for FP/RA is set to *(CFA + offset) with reg_p=0
> and deref_p=0, this assigns UNWIND_USER_RULE_CFA_OFFSET without the
> UNWIND_USER_RULE_DEREF flag.
Huh? *(CFA + offset) is reg_p=0 and deref_p=1 and thus with the
UNWIND_USER_RULE_DEREF flag (i.e. *() denotes dereference).
>
> Is this combination handled by the rest of the unwinding logic, or will
> it fall through to an unsupported state and abort unwinding?
The unwinding logic only handles UNWIND_USER_RULE_CFA_OFFSET_DEREF for
RA and FP. It does not handle UNWIND_USER_RULE_CFA_OFFSET (that is
without the UNWIND_USER_RULE_DEREF flag). The reason is that SFrame
cannot represent the rule CFA + offset for RA/FP. Additionally it
would not be valid for RA, as a return address cannot legitimately be
a stack address.
sframe_init_rule_data() can be improved to reject the invalid encodings
reg_p==0 with regnum!=0 as well as reserved_p!=0 (unused control word
bit 2) as shown above.
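The two rejections can be sketched as a standalone control word validator. The bit layout below is inferred from this thread only (reg_p in bit 0, deref_p in bit 1, the unused bit in bit 2, regnum starting at bit 3); the macro names are hypothetical stand-ins for the SFRAME_V3_FLEX_FDE_CTRLWORD_*() accessors, and the width of regnum is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessors mirroring the layout discussed in the thread. */
#define CTL_REG_P(c)		(((c) >> 0) & 0x1u)
#define CTL_DEREF_P(c)		(((c) >> 1) & 0x1u)
#define CTL_RESERVED_P(c)	(((c) >> 2) & 0x1u)
#define CTL_REGNUM(c)		((c) >> 3)	/* assumed: all remaining bits */

/* Returns true if the control word uses no reserved or meaningless bits. */
static bool ctlword_valid(uint32_t ctl)
{
	/* The unused bit must be zero so it stays free for extensions. */
	if (CTL_RESERVED_P(ctl))
		return false;

	/* regnum is only meaningful when reg_p is set. */
	if (!CTL_REG_P(ctl) && CTL_REGNUM(ctl))
		return false;

	return true;
}
```

With this layout, the value 2 (deref_p=1, reg_p=0, regnum=0) used for the deduced RA/FP defaults passes, whereas 4 (unused bit set) and 8 (regnum=1 with reg_p=0) are rejected.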
>
>>
>> static __always_inline int __find_fre(struct sframe_section *sec,
>> @@ -344,9 +492,10 @@ static __always_inline int __find_fre(struct sframe_section *sec,
>> return -EINVAL;
>> fre = prev_fre;
>>
>> - sframe_init_cfa_rule_data(&frame->cfa, fre->info, fre->cfa_off);
>> - sframe_init_rule_data(&frame->ra, fre->ra_off);
>> - sframe_init_rule_data(&frame->fp, fre->fp_off);
>> + if (sframe_init_cfa_rule_data(&frame->cfa, fre->cfa_ctl, fre->cfa_off))
>> + return -EINVAL;
>> + sframe_init_rule_data(&frame->ra, fre->ra_ctl, fre->ra_off);
if (sframe_init_rule_data(&frame->ra, fre->ra_ctl, fre->ra_off))
return -EINVAL;
>> + sframe_init_rule_data(&frame->fp, fre->fp_ctl, fre->fp_off);
Likewise.
>> frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
>>
>> return 0;
>
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v14 17/19] unwind_user/sframe: Separate reading of FRE from reading of FRE data words
2026-05-05 19:05 ` sashiko-bot
@ 2026-05-07 16:01 ` Jens Remus
0 siblings, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-07 16:01 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf, Indu Bhagat; +Cc: bpf, sashiko
On 5/5/2026 9:05 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Critical] A malicious user can trigger a kernel panic by providing a crafted `.sframe` section with `rep_size` set to 0, which causes a division-by-zero when the kernel attempts to unwind the stack.
> --
>
> commit 1aeb36f953ce26b15e30e2b228bfb48239924c2b
> Author: Jens Remus <jremus@linux.ibm.com>
>
> unwind_user/sframe: Separate reading of FRE from reading of FRE data words
>
> This commit separates the reading of SFrame FRE structures from their
> data words, which allows linear searches to skip data words until a
> matching FRE is found. New fields are added to struct sframe_fre_internal
> to cache the necessary information to read data words later.
>
>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index 2ba2c8b385f96..98346412078b8 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
>
> [ ... ]
>
>> @@ -455,6 +450,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
>> bool which = false;
>> unsigned int i;
>> u32 ip_off;
>> + int ret;
>>
>> ip_off = ip - fde->func_addr;
>>
>
> This isn't a regression introduced by this patch, but while looking at this
> area, can a crafted sframe section cause a division-by-zero just after
> this line?
Fixed in [PATCH v14 05/19] unwind_user/sframe: Add support for reading
.sframe contents.
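As an illustration of the kind of guard that closes this hole, the sketch below rejects a zero repetition size before the modulo is evaluated. It is a userspace example with a hypothetical helper name, and the actual fix in patch 05 may look different.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Reduce an IP offset into a repeating block of rep_size bytes.
 * Returns 0 and stores the reduced offset, or -EINVAL if rep_size
 * is zero (which would otherwise divide by zero).
 */
static int wrap_ip_off(uint32_t ip_off, uint32_t rep_size, uint32_t *out)
{
	if (!rep_size)
		return -EINVAL;

	*out = ip_off % rep_size;
	return 0;
}
```

Validating rep_size once, when the FDE is read, would have the same effect and keep the check out of the lookup path.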
>
> In __find_fre(), the code does:
>
> if (fde_pctype == SFRAME_FDE_PCTYPE_MASK)
> ip_off %= fde->rep_size;
>
> Since fde->rep_size and fde->info are populated directly from the user-provided
> sframe section, does anything ensure rep_size is non-zero before reaching
> this code?
>
> If a user provides an sframe section with rep_size set to 0 and the
> SFRAME_FDE_PCTYPE_MASK bit set in the info field, will this lead to an
> unhandled division-by-zero exception and crash the kernel during stack
> unwinding?
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe
2026-05-05 18:55 ` sashiko-bot
@ 2026-05-07 16:18 ` Jens Remus
2026-05-08 23:07 ` Indu Bhagat
0 siblings, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-07 16:18 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat
On 5/5/2026 8:55 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] Unbounded linear scan of user-controlled sframe entries can cause a CPU hard lockup in NMI context.
> - [Medium] The commit message claims to refactor `unwind_user_next_common()` by removing an unused `pt_regs` parameter, but this change is entirely absent from the code modifications.
> --
>
> commit 2e1e17bbaa1c0426d41d66ce7930a944f3e86ae7
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Wire up unwind_user to sframe
>
> This commit hooks up the sframe infrastructure into the unwind_user interface
> to support user-space stack unwinding via sframe. It adds the
> UNWIND_USER_TYPE_SFRAME type and updates the unwind_user_next() logic to
> attempt sframe unwinding when available.
>
>> [ Jens Remus: Remove unused pt_regs from unwind_user_next_common() and
>> its callers. Simplify unwind_user_next_sframe(). ]
>
> This isn't a bug, but does this commit message match the changes?
> Looking at the diff and the code, there are no modifications to
> unwind_user_next_common() or its callers to remove a pt_regs parameter.
Sashiko AI review failed to understand this commit message comment.
>
>> diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
>> index 90ab3c1a205ef..1fb2724197332 100644
>> --- a/kernel/unwind/user.c
>> +++ b/kernel/unwind/user.c
> [ ... ]
>> @@ -82,6 +83,16 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
>> return unwind_user_next_common(state, &fp_frame);
>> }
>>
>> +static int unwind_user_next_sframe(struct unwind_user_state *state)
>> +{
>> + struct unwind_user_frame frame;
>> +
>> + /* sframe expects the frame to be local storage */
>> + if (sframe_find(state->ip, &frame))
>
> Can this sequence cause a hard lockup in NMI context?
>
> Since this unwinder is often called from NMI context (like perf), sframe_find()
> eventually calls __find_fre() to search for Frame Row Entries. If __find_fre()
> performs a linear scan over an unbounded, user-controlled number of entries
> (fde->fres_num), it could spin for a long time without yielding.
>
> If a malicious userspace program maps a large VMA and provides hundreds of
> millions of FREs, would this loop calling unsafe_get_user() on every
> iteration tie up the CPU long enough in NMI context to trigger the hard
> lockup detector?
Please advise.
A user could craft .sframe with a large number of FDEs (up to UINT32_MAX)
and large number of FREs per FDE (up to UINT16_MAX). Each FDE could point
at the same FDA and FREs to save section space. Each FDE would require
16 bytes in the FDE table. All FDEs could share a single FDA of 5 bytes
and share FREs of minimum 5 bytes each (when using the 16-bit start
address offset to have UINT16_MAX FREs) in the FRE table.
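To put numbers on this, the sketch below computes the section space such a crafted layout would need, using only the sizes stated above (16 bytes per FDE, 5 bytes per minimal FRE). The constant and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FDE_ENTRY_SIZE	16u	/* bytes per FDE table entry, per the text above */
#define MIN_FRE_SIZE	5u	/* bytes per minimal FRE, per the text above */

/* Size of an FDE table holding num_fdes entries. */
static uint64_t fde_table_bytes(uint64_t num_fdes)
{
	return num_fdes * FDE_ENTRY_SIZE;
}

/* Size of num_fres minimal FREs shared by all FDEs. */
static uint64_t shared_fre_bytes(uint64_t num_fres)
{
	return num_fres * MIN_FRE_SIZE;
}
```

At the limits this gives just under 64 GiB for UINT32_MAX FDEs, but only about 320 KiB for UINT16_MAX shared FREs, so a long linear FRE scan per lookup is cheap to set up in terms of section space.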
>
>> + return -ENOENT;
>> + return unwind_user_next_common(state, &frame);
>> +}
>
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v14 10/19] unwind_user/sframe: Remove .sframe section on detected corruption
2026-05-05 20:39 ` sashiko-bot
@ 2026-05-07 16:23 ` Jens Remus
0 siblings, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-07 16:23 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat
On 5/5/2026 10:39 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Concurrent stack unwinding on a corrupted `.sframe` section triggers a user-controllable `WARN_ON_ONCE()`, leading to a potential Denial of Service (DoS).
> --
>
> commit a00e313eeb121415a3ca3fe4ed3b15126f46c76a
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Remove .sframe section on detected corruption
>
> This commit removes the .sframe section on demand when corruption is
> detected during unwinding, to avoid continued attempted use of a bad
> section.
>
>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index 9c755965ab837..8eed6a7d96256 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
>> @@ -358,6 +358,10 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
>> ret = __find_fre(sec, &fde, ip, frame);
>> end:
>> user_read_access_end();
>> +
>> + if (ret == -EFAULT)
>> + WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
>> +
>> return ret;
>> }
>
> Can this WARN_ON_ONCE() be triggered concurrently by userspace?
>
> If two threads within the same process are being unwound simultaneously,
> they could both encounter the same bad .sframe section and hit the
> -EFAULT path.
>
> Because sframe_find() is only protected by the sframe_srcu read lock,
> both threads might call sframe_remove_section() at the same time. The first
> thread will successfully remove the section and return 0.
>
> The second thread will fail to find the section in mm->sframe_mt:
>
> kernel/unwind/sframe.c:sframe_remove_section() {
> ...
> mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
> if (sec->sframe_start == sframe_start) {
> found = true;
> ret |= __sframe_remove_section(mm, sec);
> }
> }
>
> if (!found || ret)
> return -EINVAL;
> ...
> }
>
> Since found is false, this returns -EINVAL. Because -EINVAL is non-zero,
> WARN_ON_ONCE(-EINVAL) evaluates to true.
>
> Could this allow an unprivileged user to intentionally trigger kernel
> warnings (and panic the system if panic_on_warn is set) by unmapping a
> registered .sframe section and triggering concurrent unwinds?
Please advise.
Should all WARN_ON_ONCE() in user unwind (sframe) be replaced with e.g.
dbg_once() instead, defining it as follows in sframe_debug.h:
#define dbg_once(fmt, ...) \
pr_debug_once("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
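The reported sequence can be modeled in a few lines. This is a sequential userspace sketch of the race outcome, not the maple tree code; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* One registered section, standing in for an entry in mm->sframe_mt. */
struct section_slot {
	unsigned long sframe_start;
	bool registered;
};

/* Returns 0 if the section was removed, -EINVAL if it was not found. */
static int remove_section(struct section_slot *slot,
			  unsigned long sframe_start)
{
	if (!slot->registered || slot->sframe_start != sframe_start)
		return -EINVAL;

	slot->registered = false;
	return 0;
}
```

Two unwinders that both hit -EFAULT on the same section call remove_section() one after the other: the first gets 0, the second gets -EINVAL. Wrapping the call in WARN_ON_ONCE() therefore turns a benign lost race into a warning, which is why a debug message looks like the more fitting reaction.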
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-06 15:01 ` Steven Rostedt
2026-05-06 15:29 ` Jens Remus
@ 2026-05-08 9:49 ` Jens Remus
2026-05-08 23:04 ` Indu Bhagat
1 sibling, 1 reply; 67+ messages in thread
From: Jens Remus @ 2026-05-08 9:49 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf; +Cc: Indu Bhagat, bpf, sashiko
On 5/6/2026 5:01 PM, Steven Rostedt wrote:
> On Wed, 6 May 2026 16:34:34 +0200
> Jens Remus <jremus@linux.ibm.com> wrote:
>>> If a malicious user provides a large fre_len in the header, fres_end
>>> (calculated as fres_start + shdr.fre_len) could wrap around the 32-bit
>>> address space. This would bypass the bounds check in sframe_read_header(),
>>> allowing fres_start and fdes_start to point into kernel memory. Later, when
>>> __read_fde() and __find_fre() use unsafe_get_user(), this could lead to
>>> arbitrary kernel memory disclosure.
>>
>> SFrame is currently only supported on 64-bit architectures (i.e. x86-64,
>> arm64, s390 64-bit). So unsigned long fres_end should always be 64-bit.
>> Do we need to add the following to the header parsing?
>>
>> if (fdes_start >= fdes_end || fres_start >= fres_end) {
>> dbg_sec("inconsistent FDE/FRE start/end address\n");
>> return -EINVAL;
>> }
>
> I guess this wouldn't hurt.
Reviewing my suggestion again, I realize that this check would be
superfluous. The existing computation and check already ensure that
the FDE table is within the sframe section, the FRE table is within the
sframe section, and both tables do not overlap:
num_fdes = shdr.num_fdes;
fdes_start = header_end + shdr.fdes_off;
fdes_end = fdes_start + (num_fdes * sizeof(struct sframe_fde_v3));
fres_start = header_end + shdr.fres_off;
fres_end = fres_start + shdr.fre_len;
if (fres_start < fdes_end || fres_end > sec->sframe_end) {
dbg_sec("inconsistent FDE/FRE offsets\n");
return -EINVAL;
}
- fdes_start and fres_start are computed from header_end and thus must
be larger than sframe_start
- fdes_end and fres_end are computed from fdes_start and fres_start
respectively and thus must be larger than sframe_start
- rejecting fres_start < fdes_end ensures that the FDE table and FRE
table do not overlap
- rejecting fres_end > sec->sframe_end ensures that fres_end (and
fdes_end and both fdes_start and fres_start) are less than or equal to
sframe_end
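The argument above can be checked with a small userspace model of the computation. It assumes 64-bit addresses, so that the additions cannot wrap for 32-bit header fields and realistic section addresses, and it takes the 16-byte FDE entry size mentioned elsewhere in this thread; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FDE_ENTRY_SIZE	16u	/* bytes per FDE table entry (assumed) */

/* Returns true if the FDE and FRE tables are ordered and in bounds. */
static bool sframe_tables_ok(uint64_t header_end, uint64_t sframe_end,
			     uint32_t fdes_off, uint32_t num_fdes,
			     uint32_t fres_off, uint32_t fre_len)
{
	uint64_t fdes_start = header_end + fdes_off;
	uint64_t fdes_end = fdes_start + (uint64_t)num_fdes * FDE_ENTRY_SIZE;
	uint64_t fres_start = header_end + fres_off;
	uint64_t fres_end = fres_start + fre_len;

	/* Same condition as in the header parsing quoted above. */
	if (fres_start < fdes_end || fres_end > sframe_end)
		return false;

	return true;
}
```

Since fdes_start <= fdes_end <= fres_start <= fres_end <= sframe_end follows from the single condition, separate start/end comparisons per table add nothing as long as the arithmetic does not wrap.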
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-05 12:17 ` [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
2026-05-05 18:59 ` sashiko-bot
@ 2026-05-08 10:50 ` Jens Remus
1 sibling, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-08 10:50 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Dylan Hatch
Cc: bpf, linux-mm, linux-trace-kernel, x86, Namhyung Kim,
Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google), Peter Zijlstra,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
On 5/5/2026 2:17 PM, Jens Remus wrote:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
>
> In preparation for using sframe to unwind user space stacks, add an
> sframe_find() interface for finding the sframe information associated
> with a given text address.
>
> For performance, use user_read_access_begin() and the corresponding
> unsafe_*() accessors. Note that use of pr_debug() in uaccess-enabled
> regions would break noinstr validation, so there aren't any debug
> messages yet. That will be added in a subsequent commit.
>
> Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
> Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
>
> [ Jens Remus: Add initial support for SFrame V3 (limited to regular
> FDEs). Add support for PC-relative FDE function start offset. Simplify
> logic by using an internal FDE representation. Rename struct sframe_fre
> to sframe_fre_internal to align with struct sframe_fde_internal.
> Cleanup includes. Fix checkpatch errors "spaces required around that
> ':'". ]
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> +static __always_inline int __read_fde(struct sframe_section *sec,
> + unsigned int fde_num,
> + struct sframe_fde_internal *fde)
> +{
> + unsigned long fde_addr, fda_addr, func_addr;
unsigned long fde_addr, fda_addr, func_start, func_end;
> + struct sframe_fde_v3 _fde;
> + struct sframe_fda_v3 _fda;
> +
> + fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde_v3));
> + unsafe_copy_from_user(&_fde, (void __user *)fde_addr,
> + sizeof(struct sframe_fde_v3), Efault);
> +
> + func_addr = fde_addr + _fde.func_start_off;
> + if (func_addr < sec->text_start || func_addr >= sec->text_end)
> + return -EINVAL;
func_start = fde_addr + _fde.func_start_off;
func_end = func_start + _fde.func_size;
if (func_start < sec->text_start || func_end > sec->text_end)
return -EINVAL;
This would validate that the whole function described by the FDE is
within the text section and not only the function start.
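The difference between the two checks shows up for a function that starts inside the text section but extends past its end. The sketch below is a userspace illustration with hypothetical helper names and treats the text section as the half-open range [text_start, text_end).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original check: only the function start must lie in the text section. */
static bool func_start_in_text(uint64_t func_start,
			       uint64_t text_start, uint64_t text_end)
{
	return func_start >= text_start && func_start < text_end;
}

/* Proposed check: the whole function must lie in the text section. */
static bool func_range_in_text(uint64_t func_start, uint32_t func_size,
			       uint64_t text_start, uint64_t text_end)
{
	uint64_t func_end = func_start + func_size;

	return func_start >= text_start && func_end <= text_end;
}
```

For text [0x1000, 0x2000) and a function at 0x1ff0 with size 0x20, the first helper accepts the FDE while the second rejects it.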
Note that, unrelated to the above change, this check in general causes
sframe_validate_section() to fail if one sframe section covers more
than one text section (regardless of whether it is actually registered
for multiple text sections), for instance in the case of Dylan's sframe
kernel stacktracer on arm64. Should the check therefore be made
conditional on whether __read_fde() is called from __find_fde() or
sframe_validate_section()? Or shall we drop this check, as it does not
provide that much benefit during normal stacktracing use:
- sframe_find() obtains the struct sframe_section *sec from the
mm->sframe_mt based on IP. So IP must be within sec->text_start and
sec->text_end.
- __find_fde() only returns a FDE, if the IP is within
[fde->func_addr, fde->func_addr + fde->func_size[.
Dropping the check would allow the function start/end to be outside the
text section.
> +
> + fda_addr = sec->fres_start + _fde.fres_off;
> + if (fda_addr + sizeof(struct sframe_fda_v3) > sec->fres_end)
> + return -EINVAL;
> + unsafe_copy_from_user(&_fda, (void __user *)fda_addr,
> + sizeof(struct sframe_fda_v3), Efault);
Can unsafe_copy_from_user() be used for unaligned fda_addr, at least
on x86-64, s390 64-bit, and arm64?
Do the FDE type, FDE PC type, and FRE type values need to be validated
here as well?
unsigned char fde_type = SFRAME_V3_FDE_TYPE(_fda.info2);
unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(_fda.info);
unsigned char fre_type = SFRAME_V3_FDE_FRE_TYPE(_fda.info);
The FDE type would get validatd by __read_fre_datawords(), which is
called after __read_fde(), if the read FDE is the one of interest.
So that does not neccessarily need to be checked here. Do you agree?
The FDE PC type is currently not checked for supported values anywhere.
That one would make sense to be checked here:
if (fde_pctype != SFRAME_FDE_PCTYPE_INC &&
fde_pctype != SFRAME_FDE_PCTYPE_MASK)
return -EINVAL;
The FRE type would get validated by __read_fre(), which is called
somewhere down the line after __read_fde(). So that does not
neccesarily need to be checked here. Do you agree?
> +
> + fde->func_addr = func_addr;
fde->func_addr = func_start;
> + fde->func_size = _fde.func_size;
> + fde->fda_off = _fde.fres_off;
> + fde->fres_off = _fde.fres_off + sizeof(struct sframe_fda_v3);
> + fde->fres_num = _fda.fres_num;
> + fde->info = _fda.info;
> + fde->info2 = _fda.info2;
> + fde->rep_size = _fda.rep_size;
> +
> + return 0;
> +
> +Efault:
> + return -EFAULT;
> +}
> +static __always_inline int __read_fre(struct sframe_section *sec,
> + struct sframe_fde_internal *fde,
> + unsigned long fre_addr,
> + struct sframe_fre_internal *fre)
> +{
> + unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
> + unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
> + unsigned char fre_type = SFRAME_V3_FDE_FRE_TYPE(fde->info);
> + unsigned char dataword_count, dataword_size;
> + s32 cfa_off, ra_off, fp_off;
> + unsigned long cur = fre_addr;
> + unsigned char addr_size;
> + u32 ip_off;
> + u8 info;
> +
> + addr_size = fre_type_to_size(fre_type);
> + if (!addr_size)
> + return -EFAULT;
> +
> + if (fre_addr + addr_size + 1 > sec->fres_end)
> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
> + if (fde_pctype == SFRAME_FDE_PCTYPE_INC && ip_off > fde->func_size)

	if ((fde_pctype == SFRAME_FDE_PCTYPE_INC && ip_off >= fde->func_size) ||
	    (fde_pctype == SFRAME_FDE_PCTYPE_MASK && ip_off >= fde->rep_size))

For PCTYPE_INC the FRE IP offset must be less than the FDE function size.
For PCTYPE_MASK the FRE IP offset must be less than the FDE repetition size.

> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(info, cur, 1, Efault);
> + dataword_count = SFRAME_V3_FRE_DATAWORD_COUNT(info);
> + dataword_size = dataword_size_enum_to_size(SFRAME_V3_FRE_DATAWORD_SIZE(info));
> + if (!dataword_count || !dataword_size)
> + return -EFAULT;
> +
> + if (cur + (dataword_count * dataword_size) > sec->fres_end)
> + return -EFAULT;
> +
> + /* TODO: Support for flexible FDEs not implemented yet. */
> + if (fde_type != SFRAME_FDE_TYPE_DEFAULT)
> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
> + dataword_count--;
> +
> + ra_off = sec->ra_off;
> + if (!ra_off) {
> + if (!dataword_count--)
> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
> + }
> +
> + fp_off = sec->fp_off;
> + if (!fp_off && dataword_count) {
> + dataword_count--;
> + UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
> + }
> +
> + if (dataword_count)
> + return -EFAULT;
> +
> + fre->size = addr_size + 1 + (dataword_count * dataword_size);
> + fre->ip_off = ip_off;
> + fre->cfa_off = cfa_off;
> + fre->ra_off = ra_off;
> + fre->fp_off = fp_off;
> + fre->info = info;
> +
> + return 0;
> +
> +Efault:
> + return -EFAULT;
> +}
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
* Re: [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option
2026-05-05 12:17 ` [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option Jens Remus
2026-05-05 18:32 ` sashiko-bot
@ 2026-05-08 10:51 ` Jens Remus
1 sibling, 0 replies; 67+ messages in thread
From: Jens Remus @ 2026-05-08 10:51 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Dylan Hatch
Cc: bpf, linux-kernel, linux-mm, linux-trace-kernel, x86,
Namhyung Kim, Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave,
Florian Weimer, Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google), Peter Zijlstra,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
On 5/5/2026 2:17 PM, Jens Remus wrote:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
>
> Add a debug feature to validate all .sframe sections when first loading
> the file rather than on demand.
>
> [ Jens Remus: Add support for SFrame V3. Add support for PC-relative
> FDE function start offset. Adjust to rename of struct sframe_fre to
> sframe_fre_internal. Use %#x/%#lx format specifiers. ]
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> +static int safe_read_fde(struct sframe_section *sec,
> + unsigned int fde_num, struct sframe_fde_internal *fde)
> +{
> + int ret;
> +
> + if (!user_read_access_begin((void __user *)sec->sframe_start,
> + sec->sframe_end - sec->sframe_start))
> + return -EFAULT;
> + ret = __read_fde(sec, fde_num, fde);
> + user_read_access_end();
> + return ret;
> +}
> +static int sframe_validate_section(struct sframe_section *sec)
> +{
> + unsigned long prev_ip = 0;
> + unsigned int i;
> +
> + for (i = 0; i < sec->num_fdes; i++) {
> + struct sframe_fre_internal *fre, *prev_fre = NULL;
> + unsigned long ip, fre_addr;
> + struct sframe_fde_internal fde;
> + struct sframe_fre_internal fres[2];
> + bool which = false;
> + unsigned int j;
> + int ret;
> +
> + ret = safe_read_fde(sec, i, &fde);
Iterating over all FDEs may cause __read_fde() and thus safe_read_fde()
to fail if one sframe section covers multiple text sections (regardless
of whether it is also registered for multiple text sections), as
__read_fde() checks whether the read FDE function start address is
within [sec->text_start, sec->text_end[.
See my related comments in my reply to
[PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents.
> + if (ret) {
> + dbg_sec("safe_read_fde(%u) failed\n", i);
> + return ret;
> + }
> +
Regards,
Jens
* Re: [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers
2026-05-06 13:42 ` Jens Remus
2026-05-07 14:55 ` Jens Remus
@ 2026-05-08 23:02 ` Indu Bhagat
1 sibling, 0 replies; 67+ messages in thread
From: Indu Bhagat @ 2026-05-08 23:02 UTC (permalink / raw)
To: Jens Remus, Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko
On 2026-05-06 06:42, Jens Remus wrote:
> On 5/5/2026 2:49 PM,sashiko-bot@kernel.org wrote:
>> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
>> - [Medium] Missing validation of shdr.abi_arch allows parsing SFrame data from mismatched architectures.
>> - [Medium] VMA addresses are incorrectly evaluated as boolean flags to check for zero-length inputs.
>> - [Low] Mixing goto error handling with scoped_guard scope-based cleanup.
>> --
>>
>> commit afe7c9eb6f0e6fc3f73d0b6a555f8d3926cf6ccf
>> Author: Josh Poimboeuf<jpoimboe@kernel.org>
>>
>> unwind_user/sframe: Add support for reading .sframe headers
>>
>> This commit introduces basic infrastructure for unwinding user space stacks
>> with SFrame. It adds support for parsing the .sframe section header.
>>
>> Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
>>
>>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>>> new file mode 100644
>>> index 0000000000000..c128c1eeaca20
>>> --- /dev/null
>>> +++ b/kernel/unwind/sframe.c
>> [ ... ]
>>> +static int sframe_read_header(struct sframe_section *sec)
>>> +{
>>> + unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
>>> + struct sframe_header shdr;
>>> + unsigned int num_fdes;
>>> +
>>> + if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
>>> + dbg("header usercopy failed\n");
>>> + return -EFAULT;
>>> + }
>>> +
>>> + if (shdr.preamble.magic != SFRAME_MAGIC ||
>>> + shdr.preamble.version != SFRAME_VERSION_3 ||
>>> + !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
>>> + !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
> shdr.abi_arch != sframe_abi_arch() ||
>
>>> + shdr.auxhdr_len) {
>>> + dbg("bad/unsupported sframe header\n");
>>> + return -EINVAL;
>>> + }
>> Should we also validate shdr.abi_arch to ensure it matches the expected
>> architecture?
>>
>> If malformed or foreign-architecture SFrame data is passed in (for instance,
>> with an unexpected endianness or register mapping), the kernel might interpret
>> it incorrectly and generate invalid stack traces.
>>
>> Furthermore, accepting arbitrary values here could risk ABI calcification if
>> userspace starts supplying invalid values that we later want to reject.
> I could introduce an architecture-specific sframe_abi_arch() and update
> the SFrame header validation as above. For x86 the implementation in
> unwind_user_sframe.h could look as follows:
>
> #ifdef CONFIG_X86_64
> #define sframe_abi_arch() SFRAME_ABI_AMD64_ENDIAN_LITTLE
> #endif
>
> Given sframe.o is only linked in if CONFIG_HAVE_UNWIND_USER_SFRAME and
> that is only selected if X86_64 this should be fine.
>
> Do you see any value?
I don't see much value.  fs/binfmt_elf.c has already done some admission
control for the user binary.  For bi-endian systems, this doesn't add
value (with respect to checking for the correct endianness).

To address the concern of a compromised SFrame section, we need to
ensure checks at SFrame information read/validation time (which the code
is doing already; additional checks that could be added are currently
being discussed).  Checking the abi_arch value does not necessarily
safeguard much.
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-06 14:34 ` Jens Remus
2026-05-06 15:01 ` Steven Rostedt
@ 2026-05-08 23:03 ` Indu Bhagat
1 sibling, 0 replies; 67+ messages in thread
From: Indu Bhagat @ 2026-05-08 23:03 UTC (permalink / raw)
To: Jens Remus, Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko
On 2026-05-06 07:34, Jens Remus wrote:
> On 5/5/2026 8:59 PM, sashiko-bot@kernel.org wrote:
>> Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
>> - [Critical] Division by zero in `__find_fre` when `fde->rep_size` is 0.
>> - [High] `mtree_insert_range` is called with an exclusive end boundary, preventing contiguous SFrame sections.
>> - [High] Unbounded linear search over user-controlled memory in `__find_fre` can cause hard lockups in atomic context.
>> - [High] Integer overflows in SFrame header parsing allow bypass of bounds checks on 32-bit architectures.
>> - [Medium] Generic SFrame unwinding code uses `unsafe_get_user()` for unaligned accesses, breaking strict-alignment architectures.
>> --
>>
>> commit 159c92acf3c961aecc6383f6c3f35a955591abf3
>> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>>
>> unwind_user/sframe: Add support for reading .sframe contents
>>
>> This commit adds support for reading .sframe section contents to facilitate
>> user space stack unwinding. It introduces the sframe_find() interface which
>> uses user_read_access_begin() and unsafe_*() accessors for performance.
>> Support is included for SFrame V3, internal FDE representations, and
>> PC-relative offsets.
>>
>> Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
>> Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
>>
>>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>>> index fdb2a49085638..c60aa527984a1 100644
>>> --- a/kernel/unwind/sframe.c
>>> +++ b/kernel/unwind/sframe.c
>>> @@ -15,9 +15,338 @@
>>> #include <linux/unwind_user_types.h>
>>>
>>> #include "sframe.h"
>>> +#include "sframe_debug.h"
>>
>> [ ... ]
>>
>>> +static __always_inline int __read_fre(struct sframe_section *sec,
>>> + struct sframe_fde_internal *fde,
>>> + unsigned long fre_addr,
>>> + struct sframe_fre_internal *fre)
>>> +{
>> [ ... ]
>>> + if (fre_addr + addr_size + 1 > sec->fres_end)
>>> + return -EFAULT;
>>> +
>>> + UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
>>
>> Will this cause alignment faults on architectures with strict alignment
>> requirements?
>>
>> The .sframe format uses packed structures and variable-length datawords. The
>> cur pointer might be unaligned here, and UNSAFE_GET_USER_INC() performs
>> 16-bit or 32-bit reads via unsafe_get_user().
>
> IIUC this should not be an issue for x86-64, s390, and arm64.
>
>> [ ... ]
>>
>>> +static __always_inline int __find_fre(struct sframe_section *sec,
>>> + struct sframe_fde_internal *fde,
>>> + unsigned long ip,
>>> + struct unwind_user_frame *frame)
>>> +{
>>> + unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
>>> + struct sframe_fre_internal *fre, *prev_fre = NULL;
>>> + struct sframe_fre_internal fres[2];
>>> + unsigned long fre_addr;
>>> + bool which = false;
>>> + unsigned int i;
>>> + u32 ip_off;
>>> +
>>> + ip_off = ip - fde->func_addr;
>>> +
>>> + if (fde_pctype == SFRAME_FDE_PCTYPE_MASK)
>>> + ip_off %= fde->rep_size;
>>
>> Can a malicious user provide an SFrame section with a rep_size of 0?
>>
>> Since fde->rep_size is populated directly from the user-provided
>> _fda.rep_size without validation, a 0 value here would trigger a
>> divide-by-zero exception and a kernel panic.
>
> Good catch! I would add a check in __read_fde():
>
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> @@ -67,6 +67,7 @@ static __always_inline int __read_fde(struct sframe_section *sec,
> unsigned long fde_addr, fda_addr, func_addr;
> struct sframe_fde_v3 _fde;
> struct sframe_fda_v3 _fda;
> + unsigned char fde_pctype;
>
> fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde_v3));
> unsafe_copy_from_user(&_fde, (void __user *)fde_addr,
> @@ -82,6 +83,10 @@ static __always_inline int __read_fde(struct sframe_section *sec,
> unsafe_copy_from_user(&_fda, (void __user *)fda_addr,
> sizeof(struct sframe_fda_v3), Efault);
>
> + fde_pctype = SFRAME_V3_FDE_PCTYPE(_fda.info);
> + if (fde_pctype == SFRAME_FDE_PCTYPE_MASK && !_fda.rep_size)
> + return -EINVAL;
> +
> fde->func_addr = func_addr;
> fde->func_size = _fde.func_size;
> fde->fda_off = _fde.fres_off;
>
Yes, I think this is good to add.
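
As a rough user-space illustration of why the guard matters: the modulo
in __find_fre() is only defined for a non-zero repetition size.  The
sketch below is not the patch's code.  The enum values and the helper
demo_reduce_ip_off() are hypothetical stand-ins, and it folds the guard
into the reduction step (whereas the proposal above rejects the FDE
earlier, in __read_fde()):

```c
#include <stdint.h>

/* Hypothetical stand-ins for the patch's PC type constants. */
enum demo_pctype { DEMO_PCTYPE_INC = 0, DEMO_PCTYPE_MASK = 1 };

/*
 * Reduce an IP offset for FRE lookup.  Returns 0 on success, or -1 if
 * the FDE is malformed (MASK type with a zero repetition size).
 */
static int demo_reduce_ip_off(enum demo_pctype pctype, uint32_t rep_size,
			      uint32_t *ip_off)
{
	if (pctype == DEMO_PCTYPE_MASK) {
		if (!rep_size)
			return -1;	/* would otherwise divide by zero */
		*ip_off %= rep_size;
	}
	return 0;
}
```

For the INC type the repetition size is never used, so a zero value is
harmless there and is not rejected.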
>>
>>> +
>>> + fre_addr = sec->fres_start + fde->fres_off;
>>> +
>>> + for (i = 0; i < fde->fres_num; i++) {
>>
>> Can this loop cause a hard lockup in atomic context?
>>
>> fde->fres_num is a 32-bit value copied from user space without validation.
>> Since sframe_find() is designed to be called by unwinders in NMI context, an
>> attacker could provide a very large number of valid entries. Executing
>> billions of iterations and unsafe_get_user() calls could stall the CPU
>> and trigger the hard lockup detector.
>
> Please advise.
>
>>
>>> + int ret;
>>> +
>>
>> [ ... ]
>>
>>> @@ -120,8 +449,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>>> sec->text_end = text_end;
>>>
>>> ret = sframe_read_header(sec);
>>> - if (ret)
>>> + if (ret) {
>>> + dbg_print_header(sec);
>>> goto err_free;
>>> + }
>>
>> Can shdr.fre_len cause an integer overflow on 32-bit architectures during
>> header parsing?
>>
>> If a malicious user provides a large fre_len in the header, fres_end
>> (calculated as fres_start + shdr.fre_len) could wrap around the 32-bit
>> address space. This would bypass the bounds check in sframe_read_header(),
>> allowing fres_start and fdes_start to point into kernel memory. Later, when
>> __read_fde() and __find_fre() use unsafe_get_user(), this could lead to
>> arbitrary kernel memory disclosure.
>
> SFrame is currently only supported on 64-bit architectures (i.e. x86-64,
> arm64, s390 64-bit). So unsigned long fres_end should always be 64-bit.
> Do we need to add the following to the header parsing?
>
> if (fdes_start >= fdes_end || fres_start >= fres_end) {
> dbg_sec("inconsistent FDE/FRE start/end address\n");
> return -EINVAL;
> }
>
>>
>>>
>>> ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
>>
>> Does passing sec->text_end directly as the last parameter to
>> mtree_insert_range() break contiguous mappings?
>>
>> mtree_insert_range() expects the last boundary to be inclusive, but
>> sec->text_end represents the exclusive end address of the executable segment.
>> If user space maps seamlessly contiguous text segments, the insertion for the
>> second segment might overlap with the claimed end of the first, causing it to
>> fail with -EEXIST.
>
> Addressed in previous patch.
>
>>
>>> if (ret) {
>>> dbg("mtree_insert_range failed: text=%lx-%lx\n",
>>
>
> Thanks and regards,
> Jens
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
2026-05-08 9:49 ` Jens Remus
@ 2026-05-08 23:04 ` Indu Bhagat
0 siblings, 0 replies; 67+ messages in thread
From: Indu Bhagat @ 2026-05-08 23:04 UTC (permalink / raw)
To: Jens Remus, Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko
On 2026-05-08 02:49, Jens Remus wrote:
> On 5/6/2026 5:01 PM, Steven Rostedt wrote:
>> On Wed, 6 May 2026 16:34:34 +0200
>> Jens Remus <jremus@linux.ibm.com> wrote:
>
>>>> If a malicious user provides a large fre_len in the header, fres_end
>>>> (calculated as fres_start + shdr.fre_len) could wrap around the 32-bit
>>>> address space. This would bypass the bounds check in sframe_read_header(),
>>>> allowing fres_start and fdes_start to point into kernel memory. Later, when
>>>> __read_fde() and __find_fre() use unsafe_get_user(), this could lead to
>>>> arbitrary kernel memory disclosure.
>>>
>>> SFrame is currently only supported on 64-bit architectures (i.e. x86-64,
>>> arm64, s390 64-bit). So unsigned long fres_end should always be 64-bit.
>>> Do we need to add the following to the header parsing?
>>>
>>> if (fdes_start >= fdes_end || fres_start >= fres_end) {
>>> dbg_sec("inconsistent FDE/FRE start/end address\n");
>>> return -EINVAL;
>>> }
>>
>> I guess this wouldn't hurt.
>
> Reviewing my suggestion again I realize that this check would be
> superfluous. The existing computation and check already ensures that
> the FDE table is within sframe section, the FRE table is within sframe
> section, and both tables do not overlap:
>
> num_fdes = shdr.num_fdes;
> fdes_start = header_end + shdr.fdes_off;
> fdes_end = fdes_start + (num_fdes * sizeof(struct sframe_fde_v3));
>
> fres_start = header_end + shdr.fres_off;
> fres_end = fres_start + shdr.fre_len;
>
> if (fres_start < fdes_end || fres_end > sec->sframe_end) {
> dbg_sec("inconsistent FDE/FRE offsets\n");
> return -EINVAL;
> }
>
> - fdes_start and fres_start are computed from header_end and thus must
>   be larger than sframe_start
> - fdes_end and fres_end are computed from fdes_start and fres_start
>   and thus must be larger than sframe_start
> - rejecting fres_start < fdes_end ensures that the FDE table and FRE
>   table do not overlap
> - rejecting fres_end > sec->sframe_end ensures that fres_end (and
>   fdes_end and both fdes_start and fres_start) are smaller than or
>   equal to sframe_end
>
Yes, I too think the existing check you note above suffices.
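
For reference, a minimal user-space sketch of that layout check.  The
struct demo_layout and the helper demo_layout_ok() are hypothetical
simplifications, not the kernel's types.  It assumes 64-bit addresses
and 32-bit header fields, so the sums below cannot wrap for realistic
user addresses, and it assumes the 16-byte FDE index entry size
mentioned elsewhere in this thread:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields used by the check. */
struct demo_layout {
	uint64_t sframe_end;	/* exclusive end of the .sframe section */
	uint64_t header_end;	/* first byte after the header */
	uint32_t num_fdes;
	uint32_t fdes_off;
	uint32_t fres_off;
	uint32_t fre_len;
};

#define DEMO_FDE_SIZE 16u	/* assumed size of one FDE index entry */

static bool demo_layout_ok(const struct demo_layout *l)
{
	uint64_t fdes_start = l->header_end + l->fdes_off;
	uint64_t fdes_end = fdes_start + (uint64_t)l->num_fdes * DEMO_FDE_SIZE;
	uint64_t fres_start = l->header_end + l->fres_off;
	uint64_t fres_end = fres_start + l->fre_len;

	/* FRE table must not start before the FDE table ends, and must end inside the section. */
	return !(fres_start < fdes_end || fres_end > l->sframe_end);
}
```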
* Re: [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables
2026-05-06 15:36 ` Steven Rostedt
@ 2026-05-08 23:05 ` Indu Bhagat
0 siblings, 0 replies; 67+ messages in thread
From: Indu Bhagat @ 2026-05-08 23:05 UTC (permalink / raw)
To: Steven Rostedt, Jens Remus; +Cc: Josh Poimboeuf, bpf, sashiko, Jose E. Marchesi
On 2026-05-06 08:36, Steven Rostedt wrote:
> On Wed, 6 May 2026 16:56:01 +0200
> Jens Remus<jremus@linux.ibm.com> wrote:
>
>>>> --- a/fs/binfmt_elf.c
>>>> +++ b/fs/binfmt_elf.c
>>> [ ... ]
>>>> @@ -637,6 +638,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
>>>> return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
>>>> }
>>>>
>>>> +static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
>>>> + unsigned long base_addr)
>>>> +{
>>>> + unsigned long sframe_start, sframe_end, text_start, text_end;
>>>> +
>>>> + sframe_start = base_addr + sframe->p_vaddr;
>>>> + sframe_end = sframe_start + sframe->p_memsz;
>>>> +
>>>> + text_start = base_addr + text->p_vaddr;
>>>> + text_end = text_start + text->p_memsz;
>>> Should text_end be calculated as text_start + text->p_memsz - 1?
>>>
>>> Since the maple tree API strictly expects inclusive ranges [first, last] for
>>> insertions, passing the exclusive text_end would cause the inserted range
>>> to cover one byte beyond the actual segment.
>> Shall we keep the semantic that text_end and sframe_end point past the
>> respective section end, and just adjust the value passed in the
>> call to mtree_insert_range()?
> Let's keep the end exclusive, and add the "- 1" to text_end before
> passing it to the mtree_insert_range(). I did that to get it working for
> me.
>
>>> If an ELF binary has adjacent executable PT_LOAD segments, could the first
>>> segment's inclusive end overlap with the next segment's start and fail
>>> subsequent insertions?
>>>
>>>> +
>>>> + /* Ignore return value, sframe section isn't critical */
>>>> + sframe_add_section(sframe_start, sframe_end, text_start, text_end);
>>>> +}
>>>> +
>>> [ ... ]
>>>> @@ -712,6 +729,19 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
>>>> error = -ENOMEM;
>>>> goto out;
>>>> }
>>>> + break;
>>>> + }
>>>> + case PT_GNU_SFRAME:
>>>> + sframe_phdr = eppnt;
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> + if (sframe_phdr) {
>>>> + eppnt = interp_elf_phdata;
>>>> + for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
>>>> + if (eppnt->p_flags & PF_X)
>> if (eppnt->p_flags & PF_X && eppnt->p_type == PT_LOAD)
>>
>>>> + elf_add_sframe(eppnt, sframe_phdr, load_addr);
>>> Does this need to verify the segment is loadable (eppnt->p_type == PT_LOAD)?
>>>
>>> If non-loadable segments like PT_GNU_STACK have the PF_X flag set, they
>>> typically have p_vaddr and p_memsz as 0. Processing these segments would
>>> compute bogus text boundaries and bypass VMA validation. If a legitimate
>>> PT_LOAD segment overlaps with these bogus ranges, could its insertion fail
>>> silently and break stack unwinding for the executable?
>> Do you agree?
> Indu or Jose?
I think checking for PT_LOAD makes sense as we want to detect the
executable text segments.
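
A small user-space sketch of the combined filter, together with the
exclusive-to-inclusive end conversion discussed above.  The struct
demo_phdr and the helper names are hypothetical reductions and not
kernel code.  The constant values mirror the ELF definitions of PT_LOAD,
PT_GNU_STACK and PF_X:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined by the ELF specification and GNU extensions. */
#define DEMO_PT_LOAD		1u
#define DEMO_PT_GNU_STACK	0x6474e551u
#define DEMO_PF_X		0x1u

/* Hypothetical, reduced program header. */
struct demo_phdr {
	uint32_t p_type;
	uint32_t p_flags;
	uint64_t p_vaddr;
	uint64_t p_memsz;
};

/* Only loadable segments with the execute flag count as text. */
static bool demo_is_text_segment(const struct demo_phdr *ph)
{
	return ph->p_type == DEMO_PT_LOAD && (ph->p_flags & DEMO_PF_X) != 0;
}

/*
 * Inclusive last address, for APIs that take [first, last] ranges.
 * Assumes p_memsz > 0.
 */
static uint64_t demo_text_last(const struct demo_phdr *ph, uint64_t base)
{
	return base + ph->p_vaddr + ph->p_memsz - 1;
}
```

With this filter an executable PT_GNU_STACK entry, whose p_vaddr and
p_memsz are typically 0, is skipped instead of producing a bogus text
range.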
* Re: [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe
2026-05-07 16:18 ` Jens Remus
@ 2026-05-08 23:07 ` Indu Bhagat
0 siblings, 0 replies; 67+ messages in thread
From: Indu Bhagat @ 2026-05-08 23:07 UTC (permalink / raw)
To: Jens Remus, Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko
On 2026-05-07 09:18, Jens Remus wrote:
> On 5/5/2026 8:55 PM,sashiko-bot@kernel.org wrote:
>> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
>> - [High] Unbounded linear scan of user-controlled sframe entries can cause a CPU hard lockup in NMI context.
>> - [Medium] The commit message claims to refactor `unwind_user_next_common()` by removing an unused `pt_regs` parameter, but this change is entirely absent from the code modifications.
>> --
>>
>> commit 2e1e17bbaa1c0426d41d66ce7930a944f3e86ae7
>> Author: Josh Poimboeuf<jpoimboe@kernel.org>
>>
>> unwind_user/sframe: Wire up unwind_user to sframe
>>
>> This commit hooks up the sframe infrastructure into the unwind_user interface
>> to support user-space stack unwinding via sframe. It adds the
>> UNWIND_USER_TYPE_SFRAME type and updates the unwind_user_next() logic to
>> attempt sframe unwinding when available.
>>
>>> [ Jens Remus: Remove unused pt_regs from unwind_user_next_common() and
>>> its callers. Simplify unwind_user_next_sframe(). ]
>> This isn't a bug, but does this commit message match the changes?
>> Looking at the diff and the code, there are no modifications to
>> unwind_user_next_common() or its callers to remove a pt_regs parameter.
> Sashiko AI review failed to understand this commit message comment.
>
>>> diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
>>> index 90ab3c1a205ef..1fb2724197332 100644
>>> --- a/kernel/unwind/user.c
>>> +++ b/kernel/unwind/user.c
>> [ ... ]
>>> @@ -82,6 +83,16 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
>>> return unwind_user_next_common(state, &fp_frame);
>>> }
>>>
>>> +static int unwind_user_next_sframe(struct unwind_user_state *state)
>>> +{
>>> + struct unwind_user_frame frame;
>>> +
>>> + /* sframe expects the frame to be local storage */
>>> + if (sframe_find(state->ip, &frame))
>> Can this sequence cause a hard lockup in NMI context?
>>
>> Since this unwinder is often called from NMI context (like perf), sframe_find()
>> eventually calls __find_fre() to search for Frame Row Entries. If __find_fre()
>> performs a linear scan over an unbounded, user-controlled number of entries
>> (fde->fres_num), it could spin for a long time without yielding.
>>
>> If a malicious userspace program maps a large VMA and provides hundreds of
>> millions of FREs, would this loop calling unsafe_get_user() on every
>> iteration tie up the CPU long enough in NMI context to trigger the hard
>> lockup detector?
> Please advise.
>
> A user could craft .sframe with a large number of FDEs (up to UINT32_MAX)
> and large number of FREs per FDE (up to UINT16_MAX). Each FDE could point
> at the same FDA and FREs to save section space. Each FDE would require
> 16 bytes in the FDE table. All FDEs could share a single FDA of 5 bytes
> and share FREs of minimum 5 bytes each (when using the 16-bit start
> address offset to have UINT16_MAX FREs) in the FRE table.
For the case of a malicious .sframe with a large number of FDEs, wouldn't
the existing checks that the various offsets are less than the sframe
section size, together with the following check in sframe_read_header(),
suffice:

	if (fres_start < fdes_end || fres_end > sec->sframe_end)

For a malicious number of FREs, should we add a check that a function's
FRE count is less than or equal to its function size (available in the
FDE index entry)?  It is an approximate check, but should provide some
bound.
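
A sketch of what such an approximate bound could look like.  The helper
demo_fres_num_plausible() is hypothetical.  The reasoning assumed here
is that each FRE of a function covers a distinct start offset below the
function size, so a function of N bytes cannot meaningfully have more
than N FREs:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical plausibility check: each FRE is assumed to start at a
 * distinct byte offset within the function, so the FRE count cannot
 * exceed the function size in bytes.
 */
static bool demo_fres_num_plausible(uint32_t fres_num, uint32_t func_size)
{
	return fres_num <= func_size;
}
```

This bounds the linear FRE scan per function by the function size, but
it does not by itself bound the total work if many FDEs share the same
FREs.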
Thread overview: 67+ messages
2026-05-05 12:16 [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus
2026-05-05 12:17 ` [PATCH v14 01/19] unwind_user: Add generic and arch-specific headers to MAINTAINERS Jens Remus
2026-05-05 12:17 ` [PATCH v14 02/19] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
2026-05-05 12:49 ` sashiko-bot
2026-05-06 13:42 ` Jens Remus
2026-05-07 14:55 ` Jens Remus
2026-05-08 23:02 ` Indu Bhagat
2026-05-05 12:17 ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Jens Remus
2026-05-05 18:51 ` sashiko-bot
2026-05-06 13:50 ` Jens Remus
2026-05-06 15:21 ` Steven Rostedt
2026-05-05 12:17 ` [PATCH v14 04/19] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
2026-05-05 18:22 ` sashiko-bot
2026-05-06 14:13 ` Jens Remus
2026-05-06 15:05 ` Steven Rostedt
2026-05-06 14:09 ` Jens Remus
2026-05-06 15:03 ` Steven Rostedt
2026-05-06 21:13 ` David Laight
2026-05-06 21:17 ` David Laight
2026-05-05 12:17 ` [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
2026-05-05 18:59 ` sashiko-bot
2026-05-06 14:34 ` Jens Remus
2026-05-06 15:01 ` Steven Rostedt
2026-05-06 15:29 ` Jens Remus
2026-05-08 9:49 ` Jens Remus
2026-05-08 23:04 ` Indu Bhagat
2026-05-08 23:03 ` Indu Bhagat
2026-05-08 10:50 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 06/19] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
2026-05-05 12:53 ` sashiko-bot
2026-05-06 14:56 ` Jens Remus
2026-05-06 15:36 ` Steven Rostedt
2026-05-08 23:05 ` Indu Bhagat
2026-05-05 12:17 ` [PATCH v14 07/19] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
2026-05-05 18:55 ` sashiko-bot
2026-05-07 16:18 ` Jens Remus
2026-05-08 23:07 ` Indu Bhagat
2026-05-05 12:17 ` [PATCH v14 08/19] unwind_user: Stop when reaching an outermost frame Jens Remus
2026-05-05 12:40 ` sashiko-bot
2026-05-06 15:01 ` Jens Remus
2026-05-06 15:40 ` Steven Rostedt
2026-05-05 12:17 ` [PATCH v14 09/19] unwind_user/sframe: Add support for outermost frame indication Jens Remus
2026-05-05 12:17 ` [PATCH v14 10/19] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
2026-05-05 20:39 ` sashiko-bot
2026-05-07 16:23 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 11/19] unwind_user/sframe: Show file name in debug output Jens Remus
2026-05-05 18:46 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option Jens Remus
2026-05-05 18:32 ` sashiko-bot
2026-05-08 10:51 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 13/19] unwind_user: Enable archs that pass RA in a register Jens Remus
2026-05-05 18:35 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 14/19] unwind_user: Flexible FP/RA recovery rules Jens Remus
2026-05-05 18:34 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 15/19] unwind_user: Flexible CFA " Jens Remus
2026-05-05 12:17 ` [PATCH v14 16/19] unwind_user/sframe: Add support for SFrame V3 flexible FDEs Jens Remus
2026-05-05 18:55 ` sashiko-bot
2026-05-07 15:30 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 17/19] unwind_user/sframe: Separate reading of FRE from reading of FRE data words Jens Remus
2026-05-05 19:05 ` sashiko-bot
2026-05-07 16:01 ` Jens Remus
2026-05-05 12:17 ` [PATCH v14 18/19] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
2026-05-05 19:07 ` sashiko-bot
2026-05-05 12:17 ` [PATCH v14 19/19] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
2026-05-05 18:45 ` sashiko-bot
2026-05-07 14:14 ` Jens Remus
2026-05-05 12:25 ` [PATCH v14 00/19] unwind_deferred: Implement sframe handling Jens Remus