* [PATCH v10 00/11] unwind_deferred: Implement sframe handling
@ 2025-08-27 20:15 Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 01/11] unwind_user/sframe: Add support for reading .sframe headers Steven Rostedt
` (10 more replies)
0 siblings, 11 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
[
This version is simply a rebase of v9 on top of v6.17-rc3.
It needs to be updated to work with the latest SFrame specification.
Indu said she'll be able to make those changes, but I needed to
forward port the latest code.
You can test this code with the x86 and perf changes applied at:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
unwind/sframe-test
]
This is the implementation of parsing the SFrame section in an ELF file.
It's a continuation of Josh's last work that can be found here:
https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
Currently the only way to get a user space stack trace from a stack
walk (as opposed to copying a large amount of the user stack into the
kernel ring buffer) is to use frame pointers. This has a few issues. The
biggest one is that compiling frame pointers into every application and
library has been shown to cause performance overhead.
Another issue is that the format of the frames may not always be consistent
between different compilers, and some architectures (such as s390) have no
defined frame format that allows a reliable stack walk. The only way to
perform user space profiling on these architectures is to copy the user
stack into the kernel buffer.
SFrame[1] is now supported in gcc/binutils and will soon also be supported
by LLVM. SFrame works much like ORC and lives in the ELF executable file as
its own section. Like ORC, it has two tables. The first table is sorted by
instruction pointer (IP). Looking up the current IP in it leads to an entry
in the second table, which tells you where the return address of the
current function is located. That return address can then be looked up in
the first table to find the return address of the caller, and so on. This
performs a user space stack walk.
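The two-table walk described above can be illustrated with a small userspace toy model. This is a sketch under stated assumptions, not the kernel code: the toy_* structs and functions are hypothetical, fixed-size and unpacked (real SFrame FDEs and FREs are packed and variable-sized), the "user stack" is a plain array, and it assumes the x86-64 convention that the return address sits at CFA - 8 and the caller's SP equals the callee's CFA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Second table (toy): one row per IP range inside a function. */
struct toy_fre {
	uint32_t ip_off;	/* row applies from fde->start + ip_off onward */
	int32_t cfa_off;	/* CFA = SP + cfa_off */
	int32_t ra_off;		/* return address is saved at CFA + ra_off */
};

/* First table (toy): one entry per function, sorted by start address. */
struct toy_fde {
	uint64_t start;		/* function start address */
	uint64_t size;		/* function size in bytes */
	size_t fre_idx;		/* index of this function's first row */
	size_t fre_num;		/* number of rows for this function */
};

/* Simulated user stack: nwords 64-bit words starting at address base. */
struct toy_stack {
	const uint64_t *words;
	uint64_t base;
	size_t nwords;
};

/* Binary search for the last function starting at or below ip. */
static const struct toy_fde *toy_find_fde(const struct toy_fde *fdes, size_t n,
					  uint64_t ip)
{
	const struct toy_fde *found = NULL;
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (fdes[mid].start <= ip) {
			found = &fdes[mid];
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	/* Reject IPs that fall in a gap after the function's end. */
	if (found && ip >= found->start + found->size)
		return NULL;
	return found;
}

/* Pick the last row whose ip_off is at or below the IP's offset. */
static const struct toy_fre *toy_find_fre(const struct toy_fde *fde,
					  const struct toy_fre *fres, uint64_t ip)
{
	const struct toy_fre *row = NULL;
	uint64_t off = ip - fde->start;

	for (size_t i = 0; i < fde->fre_num; i++) {
		const struct toy_fre *r = &fres[fde->fre_idx + i];

		if (r->ip_off > off)
			break;
		row = r;
	}
	return row;
}

/* Record up to max IPs; stop on a zero RA, an unknown IP or an off-stack RA slot. */
static size_t toy_unwind(const struct toy_fde *fdes, size_t n,
			 const struct toy_fre *fres, const struct toy_stack *stk,
			 uint64_t ip, uint64_t sp, uint64_t *out, size_t max)
{
	size_t depth = 0;

	while (ip && depth < max) {
		const struct toy_fde *fde;
		const struct toy_fre *row;
		uint64_t cfa, ra_addr;

		out[depth++] = ip;
		fde = toy_find_fde(fdes, n, ip);
		if (!fde)
			break;
		row = toy_find_fre(fde, fres, ip);
		if (!row)
			break;
		cfa = sp + (int64_t)row->cfa_off;
		ra_addr = cfa + (int64_t)row->ra_off;
		if (ra_addr < stk->base || ra_addr >= stk->base + stk->nwords * 8)
			break;
		ip = stk->words[(ra_addr - stk->base) / 8];	/* caller's IP */
		sp = cfa;	/* assumed x86-64: caller's SP == this frame's CFA */
	}
	return depth;
}
```

Each iteration does exactly what the paragraph describes: find the function entry for the IP, pick the row for that IP, compute where the return address lives, and repeat with the caller's IP.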
Now, because the SFrame section lives in the ELF file, it needs to be faulted
into memory when it is used. This means that walking the user space stack
requires being in a faultable context. As profilers like perf request a stack
trace in interrupt or NMI context, they cannot do the walk when the trace is
requested. Instead, the walk must be deferred until it is safe to fault in
user space memory. One place this is known to be safe is when the task is
about to return to user space.
This series implements SFrame support in the deferred unwind code.
[1] https://sourceware.org/binutils/wiki/sframe
Changes since v9: https://lore.kernel.org/linux-trace-kernel/20250717012848.927473176@kernel.org/
- Rebased on v6.17-rc3
- Update the changes to unwind/user.c to handle passing a const
unwind_user_frame pointer.
Josh Poimboeuf (11):
unwind_user/sframe: Add support for reading .sframe headers
unwind_user/sframe: Store sframe section data in per-mm maple tree
x86/uaccess: Add unsafe_copy_from_user() implementation
unwind_user/sframe: Add support for reading .sframe contents
unwind_user/sframe: Detect .sframe sections in executables
unwind_user/sframe: Wire up unwind_user to sframe
unwind_user/sframe/x86: Enable sframe unwinding on x86
unwind_user/sframe: Remove .sframe section on detected corruption
unwind_user/sframe: Show file name in debug output
unwind_user/sframe: Add .sframe validation option
unwind_user/sframe: Add prctl() interface for registering .sframe sections
----
MAINTAINERS | 1 +
arch/Kconfig | 23 ++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mmu.h | 2 +-
arch/x86/include/asm/uaccess.h | 39 ++-
fs/binfmt_elf.c | 49 +++-
include/linux/mm_types.h | 3 +
include/linux/sframe.h | 60 ++++
include/linux/unwind_user_types.h | 4 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 6 +-
kernel/fork.c | 10 +
kernel/sys.c | 9 +
kernel/unwind/Makefile | 3 +-
kernel/unwind/sframe.c | 593 ++++++++++++++++++++++++++++++++++++++
kernel/unwind/sframe.h | 71 +++++
kernel/unwind/sframe_debug.h | 68 +++++
kernel/unwind/user.c | 41 ++-
mm/init-mm.c | 2 +
19 files changed, 967 insertions(+), 19 deletions(-)
create mode 100644 include/linux/sframe.h
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
create mode 100644 kernel/unwind/sframe_debug.h
* [PATCH v10 01/11] unwind_user/sframe: Add support for reading .sframe headers
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree Steven Rostedt
` (9 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
In preparation for unwinding user space stacks with sframe, add basic
sframe compile infrastructure and support for reading the .sframe
section header.
sframe_add_section() reads the header and unconditionally returns an
error, so it's not very useful yet. A subsequent patch will improve
that.
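The header checks performed by sframe_read_header() can be sketched in userspace C. This is an illustration under assumptions, not the patch's code: it parses from an in-memory buffer instead of copy_from_user(), parse_sframe_header() and struct parsed_sframe are hypothetical names, it adds an explicit "buffer holds a full header" check that the kernel gets from the fault handling of copy_from_user(), and, like the kernel code, it reads fields in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SFRAME_MAGIC		0xdee2
#define SFRAME_VERSION_2	2
#define SFRAME_F_FDE_SORTED	0x1
#define SFRAME_FDE_SIZE		20	/* sizeof(struct sframe_fde), packed */

/* Same field layout as struct sframe_header in kernel/unwind/sframe.h. */
struct sframe_header {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
	uint8_t abi_arch;
	int8_t cfa_fixed_fp_offset;
	int8_t cfa_fixed_ra_offset;
	uint8_t auxhdr_len;
	uint32_t num_fdes;
	uint32_t num_fres;
	uint32_t fre_len;
	uint32_t fdes_off;
	uint32_t fres_off;
} __attribute__((packed));

_Static_assert(sizeof(struct sframe_header) == 28, "SFrame v2 header is 28 bytes");

/* Hypothetical: the values sframe_read_header() stores in the section. */
struct parsed_sframe {
	uint64_t fdes_start, fres_start, fres_end;
	uint32_t num_fdes;
};

/* Section occupies [start, end); buf holds its bytes. Returns 0 or -1. */
static int parse_sframe_header(const void *buf, uint64_t start, uint64_t end,
			       struct parsed_sframe *out)
{
	struct sframe_header shdr;
	uint64_t header_end, fdes_end;

	if (end - start < sizeof(shdr))
		return -1;
	memcpy(&shdr, buf, sizeof(shdr));

	/* Only sorted SFrame v2 tables without an auxiliary header are accepted. */
	if (shdr.magic != SFRAME_MAGIC || shdr.version != SFRAME_VERSION_2 ||
	    !(shdr.flags & SFRAME_F_FDE_SORTED) || shdr.auxhdr_len)
		return -1;
	if (!shdr.num_fdes || !shdr.num_fres)
		return -1;

	header_end = start + sizeof(shdr) + shdr.auxhdr_len;
	if (header_end >= end)
		return -1;

	/* Both offsets are relative to the end of the header. */
	out->num_fdes = shdr.num_fdes;
	out->fdes_start = header_end + shdr.fdes_off;
	fdes_end = out->fdes_start + (uint64_t)shdr.num_fdes * SFRAME_FDE_SIZE;
	out->fres_start = header_end + shdr.fres_off;
	out->fres_end = out->fres_start + shdr.fre_len;

	/* The FRE table must start after the FDEs and end inside the section. */
	if (out->fres_start < fdes_end || out->fres_end > end)
		return -1;
	return 0;
}
```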
Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
MAINTAINERS | 1 +
arch/Kconfig | 3 +
include/linux/sframe.h | 40 ++++++++++++
kernel/unwind/Makefile | 3 +-
kernel/unwind/sframe.c | 136 +++++++++++++++++++++++++++++++++++++++++
kernel/unwind/sframe.h | 71 +++++++++++++++++++++
6 files changed, 253 insertions(+), 1 deletion(-)
create mode 100644 include/linux/sframe.h
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
diff --git a/MAINTAINERS b/MAINTAINERS
index fed6cd812d79..42b0fb83516a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26333,6 +26333,7 @@ USERSPACE STACK UNWINDING
M: Josh Poimboeuf <jpoimboe@kernel.org>
M: Steven Rostedt <rostedt@goodmis.org>
S: Maintained
+F: include/linux/sframe.h
F: include/linux/unwind*.h
F: kernel/unwind/
diff --git a/arch/Kconfig b/arch/Kconfig
index d1b4ffd6e085..69fcabf53088 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -451,6 +451,9 @@ config HAVE_UNWIND_USER_FP
bool
select UNWIND_USER
+config HAVE_UNWIND_USER_SFRAME
+ bool
+
config HAVE_PERF_REGS
bool
help
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
new file mode 100644
index 000000000000..0584f661f698
--- /dev/null
+++ b/include/linux/sframe.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SFRAME_H
+#define _LINUX_SFRAME_H
+
+#include <linux/mm_types.h>
+#include <linux/unwind_user_types.h>
+
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+
+struct sframe_section {
+ unsigned long sframe_start;
+ unsigned long sframe_end;
+ unsigned long text_start;
+ unsigned long text_end;
+
+ unsigned long fdes_start;
+ unsigned long fres_start;
+ unsigned long fres_end;
+ unsigned int num_fdes;
+
+ signed char ra_off;
+ signed char fp_off;
+};
+
+extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end);
+extern int sframe_remove_section(unsigned long sframe_addr);
+
+#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end)
+{
+ return -ENOSYS;
+}
+static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+
+#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+#endif /* _LINUX_SFRAME_H */
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index eae37bea54fd..146038165865 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1,2 @@
- obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
+ obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
+ obj-$(CONFIG_HAVE_UNWIND_USER_SFRAME) += sframe.o
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
new file mode 100644
index 000000000000..20287f795b36
--- /dev/null
+++ b/kernel/unwind/sframe.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Userspace sframe access functions
+ */
+
+#define pr_fmt(fmt) "sframe: " fmt
+
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/srcu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <linux/string_helpers.h>
+#include <linux/sframe.h>
+#include <linux/unwind_user_types.h>
+
+#include "sframe.h"
+
+#define dbg(fmt, ...) \
+ pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static void free_section(struct sframe_section *sec)
+{
+ kfree(sec);
+}
+
+static int sframe_read_header(struct sframe_section *sec)
+{
+ unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
+ struct sframe_header shdr;
+ unsigned int num_fdes;
+
+ if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
+ dbg("header usercopy failed\n");
+ return -EFAULT;
+ }
+
+ if (shdr.preamble.magic != SFRAME_MAGIC ||
+ shdr.preamble.version != SFRAME_VERSION_2 ||
+ !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
+ shdr.auxhdr_len) {
+ dbg("bad/unsupported sframe header\n");
+ return -EINVAL;
+ }
+
+ if (!shdr.num_fdes || !shdr.num_fres) {
+ dbg("no fde/fre entries\n");
+ return -EINVAL;
+ }
+
+ header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
+ if (header_end >= sec->sframe_end) {
+ dbg("header doesn't fit in section\n");
+ return -EINVAL;
+ }
+
+ num_fdes = shdr.num_fdes;
+ fdes_start = header_end + shdr.fdes_off;
+ fdes_end = fdes_start + (num_fdes * sizeof(struct sframe_fde));
+
+ fres_start = header_end + shdr.fres_off;
+ fres_end = fres_start + shdr.fre_len;
+
+ if (fres_start < fdes_end || fres_end > sec->sframe_end) {
+ dbg("inconsistent fde/fre offsets\n");
+ return -EINVAL;
+ }
+
+ sec->num_fdes = num_fdes;
+ sec->fdes_start = fdes_start;
+ sec->fres_start = fres_start;
+ sec->fres_end = fres_end;
+
+ sec->ra_off = shdr.cfa_fixed_ra_offset;
+ sec->fp_off = shdr.cfa_fixed_fp_offset;
+
+ return 0;
+}
+
+int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end)
+{
+ struct maple_tree *sframe_mt = &current->mm->sframe_mt;
+ struct vm_area_struct *sframe_vma, *text_vma;
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ int ret;
+
+ if (!sframe_start || !sframe_end || !text_start || !text_end) {
+ dbg("zero-length sframe/text address\n");
+ return -EINVAL;
+ }
+
+ scoped_guard(mmap_read_lock, mm) {
+ sframe_vma = vma_lookup(mm, sframe_start);
+ if (!sframe_vma || sframe_end > sframe_vma->vm_end) {
+ dbg("bad sframe address (0x%lx - 0x%lx)\n",
+ sframe_start, sframe_end);
+ return -EINVAL;
+ }
+
+ text_vma = vma_lookup(mm, text_start);
+ if (!text_vma ||
+ !(text_vma->vm_flags & VM_EXEC) ||
+ text_end > text_vma->vm_end) {
+ dbg("bad text address (0x%lx - 0x%lx)\n",
+ text_start, text_end);
+ return -EINVAL;
+ }
+ }
+
+ sec = kzalloc(sizeof(*sec), GFP_KERNEL);
+ if (!sec)
+ return -ENOMEM;
+
+ sec->sframe_start = sframe_start;
+ sec->sframe_end = sframe_end;
+ sec->text_start = text_start;
+ sec->text_end = text_end;
+
+ ret = sframe_read_header(sec);
+ if (ret)
+ goto err_free;
+
+ /* TODO nowhere to store it yet - just free it and return an error */
+ ret = -ENOSYS;
+
+err_free:
+ free_section(sec);
+ return ret;
+}
+
+int sframe_remove_section(unsigned long sframe_start)
+{
+ return -ENOSYS;
+}
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
new file mode 100644
index 000000000000..e9bfccfaf5b4
--- /dev/null
+++ b/kernel/unwind/sframe.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * From https://www.sourceware.org/binutils/docs/sframe-spec.html
+ */
+#ifndef _SFRAME_H
+#define _SFRAME_H
+
+#include <linux/types.h>
+
+#define SFRAME_VERSION_1 1
+#define SFRAME_VERSION_2 2
+#define SFRAME_MAGIC 0xdee2
+
+#define SFRAME_F_FDE_SORTED 0x1
+#define SFRAME_F_FRAME_POINTER 0x2
+
+#define SFRAME_ABI_AARCH64_ENDIAN_BIG 1
+#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE 2
+#define SFRAME_ABI_AMD64_ENDIAN_LITTLE 3
+
+#define SFRAME_FDE_TYPE_PCINC 0
+#define SFRAME_FDE_TYPE_PCMASK 1
+
+struct sframe_preamble {
+ u16 magic;
+ u8 version;
+ u8 flags;
+} __packed;
+
+struct sframe_header {
+ struct sframe_preamble preamble;
+ u8 abi_arch;
+ s8 cfa_fixed_fp_offset;
+ s8 cfa_fixed_ra_offset;
+ u8 auxhdr_len;
+ u32 num_fdes;
+ u32 num_fres;
+ u32 fre_len;
+ u32 fdes_off;
+ u32 fres_off;
+} __packed;
+
+#define SFRAME_HEADER_SIZE(header) \
+ ((sizeof(struct sframe_header) + header.auxhdr_len))
+
+#define SFRAME_AARCH64_PAUTH_KEY_A 0
+#define SFRAME_AARCH64_PAUTH_KEY_B 1
+
+struct sframe_fde {
+ s32 start_addr;
+ u32 func_size;
+ u32 fres_off;
+ u32 fres_num;
+ u8 info;
+ u8 rep_size;
+ u16 padding;
+} __packed;
+
+#define SFRAME_FUNC_FRE_TYPE(data) (data & 0xf)
+#define SFRAME_FUNC_FDE_TYPE(data) ((data >> 4) & 0x1)
+#define SFRAME_FUNC_PAUTH_KEY(data) ((data >> 5) & 0x1)
+
+#define SFRAME_BASE_REG_FP 0
+#define SFRAME_BASE_REG_SP 1
+
+#define SFRAME_FRE_CFA_BASE_REG_ID(data) (data & 0x1)
+#define SFRAME_FRE_OFFSET_COUNT(data) ((data >> 1) & 0xf)
+#define SFRAME_FRE_OFFSET_SIZE(data) ((data >> 5) & 0x3)
+#define SFRAME_FRE_MANGLED_RA_P(data) ((data >> 7) & 0x1)
+
+#endif /* _SFRAME_H */
--
2.50.1
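The SFRAME_FRE_* macros in kernel/unwind/sframe.h above describe the FRE info byte. The following userspace sketch decodes one complete FRE using them. It is modeled loosely on the FRE reading added later in this series, but toy_read_fre(), read_le() and struct toy_fre are hypothetical names. It assumes little-endian data (as on x86-64) and takes the header's fixed RA/FP offsets as parameters, where 0 means "not fixed, read it from the FRE".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FRE info byte layout, as in kernel/unwind/sframe.h. */
#define SFRAME_FRE_CFA_BASE_REG_ID(data)	((data) & 0x1)
#define SFRAME_FRE_OFFSET_COUNT(data)		(((data) >> 1) & 0xf)
#define SFRAME_FRE_OFFSET_SIZE(data)		(((data) >> 5) & 0x3)

struct toy_fre {
	uint32_t ip_off;	/* start address, relative to the function */
	uint8_t info;		/* raw info byte */
	int32_t cfa_off, ra_off, fp_off;
	unsigned int size;	/* bytes this FRE occupies */
};

/* Read a little-endian 1/2/4-byte field, optionally sign-extending it. */
static int64_t read_le(const uint8_t *p, unsigned int size, int is_signed)
{
	uint32_t v = 0;
	int64_t sv;

	for (unsigned int i = 0; i < size; i++)
		v |= (uint32_t)p[i] << (8 * i);
	sv = v;
	if (is_signed && ((v >> (8 * size - 1)) & 1))
		sv -= (int64_t)1 << (8 * size);
	return sv;
}

/* Decode one FRE from buf (len bytes available). Returns 0 or -1. */
static int toy_read_fre(const uint8_t *buf, size_t len, unsigned int fre_type,
			int8_t fixed_ra, int8_t fixed_fp, struct toy_fre *fre)
{
	unsigned int addr_size, off_size, count, i;
	int32_t offs[15];	/* the offset count field is 4 bits wide */
	const uint8_t *p = buf;

	if (fre_type > 2)	/* 0, 1, 2 => 1, 2, 4-byte start address */
		return -1;
	addr_size = 1u << fre_type;
	if (len < addr_size + 1)
		return -1;
	fre->ip_off = (uint32_t)read_le(p, addr_size, 0);
	p += addr_size;
	fre->info = *p++;

	count = SFRAME_FRE_OFFSET_COUNT(fre->info);
	if (!count || SFRAME_FRE_OFFSET_SIZE(fre->info) > 2)
		return -1;
	off_size = 1u << SFRAME_FRE_OFFSET_SIZE(fre->info);
	if (len < addr_size + 1 + (size_t)count * off_size)
		return -1;
	for (i = 0; i < count; i++, p += off_size)
		offs[i] = (int32_t)read_le(p, off_size, 1);

	/* Order: CFA offset, then RA unless fixed, then FP unless fixed (optional). */
	i = 0;
	fre->cfa_off = offs[i++];
	fre->ra_off = fixed_ra;
	if (!fixed_ra) {
		if (i == count)
			return -1;
		fre->ra_off = offs[i++];
	}
	fre->fp_off = fixed_fp;
	if (!fixed_fp && i < count)
		fre->fp_off = offs[i++];
	if (i != count)
		return -1;	/* leftover offsets: treat the entry as corrupt */
	fre->size = addr_size + 1 + count * off_size;
	return 0;
}
```

For example, the bytes `04 05 10 f0` with a 1-byte start address and a fixed RA offset of -8 decode as ip_off 4, an SP-based CFA (bit 0 set), CFA = SP + 16 and FP saved at CFA - 16.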
* [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 01/11] unwind_user/sframe: Add support for reading .sframe headers Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-28 1:46 ` Liam R. Howlett
2025-08-27 20:15 ` [PATCH v10 03/11] x86/uaccess: Add unsafe_copy_from_user() implementation Steven Rostedt
` (8 subsequent siblings)
10 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm
From: Josh Poimboeuf <jpoimboe@kernel.org>
Associate an sframe section with its mm by adding it to a per-mm maple
tree which is indexed by the corresponding text address range. A single
sframe section can be associated with multiple text ranges.
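The per-mm mapping from text address ranges to sframe sections can be pictured with a much simpler structure. In the sketch below, a sorted array of non-overlapping inclusive ranges stands in for the maple tree. toy_insert_range() and toy_load() are hypothetical and only mimic the behavior this patch relies on: mtree_insert_range() refusing an overlapping range with -EEXIST, and a range lookup like mtree_load() returning the entry covering an address. It is not how the maple tree is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_RANGES 16

struct toy_range {
	unsigned long start, last;	/* inclusive bounds, like mtree_insert_range() */
	void *entry;			/* e.g. a struct sframe_section pointer */
};

struct toy_range_map {
	struct toy_range r[TOY_MAX_RANGES];
	size_t n;	/* kept sorted by start; ranges never overlap */
};

/* Insert [start, last]; refuse overlapping ranges with -EEXIST. */
static int toy_insert_range(struct toy_range_map *m, unsigned long start,
			    unsigned long last, void *entry)
{
	size_t i = 0;

	if (start > last)
		return -EINVAL;
	if (m->n == TOY_MAX_RANGES)
		return -ENOMEM;
	/* Skip ranges that end before the new one starts. */
	while (i < m->n && m->r[i].last < start)
		i++;
	if (i < m->n && m->r[i].start <= last)
		return -EEXIST;
	memmove(&m->r[i + 1], &m->r[i], (m->n - i) * sizeof(m->r[0]));
	m->r[i] = (struct toy_range){ start, last, entry };
	m->n++;
	return 0;
}

/* Return the entry whose range covers addr, or NULL. */
static void *toy_load(const struct toy_range_map *m, unsigned long addr)
{
	size_t lo = 0, hi = m->n;

	/* Find the first range ending at or after addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (m->r[mid].last < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < m->n && m->r[lo].start <= addr)
		return m->r[lo].entry;
	return NULL;
}
```

The same entry pointer may be stored for several disjoint ranges, which is how one sframe section can cover more than one text range.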
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
arch/x86/include/asm/mmu.h | 2 +-
include/linux/mm_types.h | 3 +++
include/linux/sframe.h | 13 +++++++++
kernel/fork.c | 10 +++++++
kernel/unwind/sframe.c | 55 +++++++++++++++++++++++++++++++++++---
mm/init-mm.c | 2 ++
6 files changed, 81 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 0fe9c569d171..227a32899a59 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -87,7 +87,7 @@ typedef struct {
.context = { \
.ctx_id = 1, \
.lock = __MUTEX_INITIALIZER(mm.context.lock), \
- }
+ },
void leave_mm(void);
#define leave_mm leave_mm
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 08bc2442db93..31fbd6663047 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1210,6 +1210,9 @@ struct mm_struct {
#ifdef CONFIG_MM_ID
mm_id_t mm_id;
#endif /* CONFIG_MM_ID */
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+ struct maple_tree sframe_mt;
+#endif
} __randomize_layout;
/*
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 0584f661f698..73bf6f0b30c2 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -22,18 +22,31 @@ struct sframe_section {
signed char fp_off;
};
+#define INIT_MM_SFRAME .sframe_mt = MTREE_INIT(sframe_mt, 0),
+extern void sframe_free_mm(struct mm_struct *mm);
+
extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end);
extern int sframe_remove_section(unsigned long sframe_addr);
+static inline bool current_has_sframe(void)
+{
+ struct mm_struct *mm = current->mm;
+
+ return mm && !mtree_empty(&mm->sframe_mt);
+}
+
#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+#define INIT_MM_SFRAME
+static inline void sframe_free_mm(struct mm_struct *mm) {}
static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end)
{
return -ENOSYS;
}
static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline bool current_has_sframe(void) { return false; }
#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/fork.c b/kernel/fork.c
index af673856499d..496781b389bc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -106,6 +106,7 @@
#include <linux/pidfs.h>
#include <linux/tick.h>
#include <linux/unwind_deferred.h>
+#include <linux/sframe.h>
#include <asm/pgalloc.h>
#include <linux/uaccess.h>
@@ -690,6 +691,7 @@ void __mmdrop(struct mm_struct *mm)
mm_destroy_cid(mm);
percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
futex_hash_free(mm);
+ sframe_free_mm(mm);
free_mm(mm);
}
@@ -1027,6 +1029,13 @@ static void mmap_init_lock(struct mm_struct *mm)
#endif
}
+static void mm_init_sframe(struct mm_struct *mm)
+{
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+ mt_init(&mm->sframe_mt);
+#endif
+}
+
static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
struct user_namespace *user_ns)
{
@@ -1055,6 +1064,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->pmd_huge_pte = NULL;
#endif
mm_init_uprobes_state(mm);
+ mm_init_sframe(mm);
hugetlb_count_init(mm);
if (current->mm) {
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 20287f795b36..fa7d87ffd00a 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -122,15 +122,64 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
if (ret)
goto err_free;
- /* TODO nowhere to store it yet - just free it and return an error */
- ret = -ENOSYS;
+ ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
+ if (ret) {
+ dbg("mtree_insert_range failed: text=%lx-%lx\n",
+ sec->text_start, sec->text_end);
+ goto err_free;
+ }
+
+ return 0;
err_free:
free_section(sec);
return ret;
}
+static int __sframe_remove_section(struct mm_struct *mm,
+ struct sframe_section *sec)
+{
+ if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
+ dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+ return -EINVAL;
+ }
+
+ free_section(sec);
+
+ return 0;
+}
+
int sframe_remove_section(unsigned long sframe_start)
{
- return -ENOSYS;
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ unsigned long index = 0;
+ bool found = false;
+ int ret = 0;
+
+ mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
+ if (sec->sframe_start == sframe_start) {
+ found = true;
+ ret |= __sframe_remove_section(mm, sec);
+ }
+ }
+
+ if (!found || ret)
+ return -EINVAL;
+
+ return 0;
+}
+
+void sframe_free_mm(struct mm_struct *mm)
+{
+ struct sframe_section *sec;
+ unsigned long index = 0;
+
+ if (!mm)
+ return;
+
+ mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
+ free_section(sec);
+
+ mtree_destroy(&mm->sframe_mt);
}
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 4600e7605cab..b32fcf167cc2 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -11,6 +11,7 @@
#include <linux/atomic.h>
#include <linux/user_namespace.h>
#include <linux/iommu.h>
+#include <linux/sframe.h>
#include <asm/mmu.h>
#ifndef INIT_MM_CONTEXT
@@ -46,6 +47,7 @@ struct mm_struct init_mm = {
.user_ns = &init_user_ns,
.cpu_bitmap = CPU_BITS_NONE,
INIT_MM_CONTEXT(init_mm)
+ INIT_MM_SFRAME
};
void setup_initial_init_mm(void *start_code, void *end_code,
--
2.50.1
* [PATCH v10 03/11] x86/uaccess: Add unsafe_copy_from_user() implementation
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 01/11] unwind_user/sframe: Add support for reading .sframe headers Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 04/11] unwind_user/sframe: Add support for reading .sframe contents Steven Rostedt
` (7 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
Add an x86 implementation of unsafe_copy_from_user() similar to the
existing unsafe_copy_to_user().
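Like unsafe_copy_to_user(), the new macro copies with the widest access first (u64, then u32, u16 and u8), so a 20-byte struct sframe_fde takes three accesses. The userspace sketch below shows only that chunking. TOY_COPY_LOOP and toy_copy() are hypothetical names, and memcpy() stands in for unsafe_get_user(), so unlike the kernel macro it cannot fault or jump to an error label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One pass per access width, same shape as unsafe_copy_from_user_loop(). */
#define TOY_COPY_LOOP(dst, src, len, type, counter)			\
	while ((len) >= sizeof(type)) {					\
		type toy_v_;						\
		memcpy(&toy_v_, (src), sizeof(type));	/* the "get" */	\
		memcpy((dst), &toy_v_, sizeof(type));			\
		(dst) += sizeof(type);					\
		(src) += sizeof(type);					\
		(len) -= sizeof(type);					\
		(counter)++;						\
	}

/* Copy len bytes; ops[0..3] count the 8-, 4-, 2- and 1-byte accesses used. */
static void toy_copy(void *to, const void *from, size_t len, unsigned int ops[4])
{
	unsigned char *dst = to;
	const unsigned char *src = from;

	memset(ops, 0, 4 * sizeof(ops[0]));
	TOY_COPY_LOOP(dst, src, len, uint64_t, ops[0]);
	TOY_COPY_LOOP(dst, src, len, uint32_t, ops[1]);
	TOY_COPY_LOOP(dst, src, len, uint16_t, ops[2]);
	TOY_COPY_LOOP(dst, src, len, uint8_t, ops[3]);
}
```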
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
arch/x86/include/asm/uaccess.h | 39 +++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 3a7755c1a441..3caf02d0503e 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -599,7 +599,7 @@ _label: \
* We want the unsafe accessors to always be inlined and use
* the error labels - thus the macro games.
*/
-#define unsafe_copy_loop(dst, src, len, type, label) \
+#define unsafe_copy_to_user_loop(dst, src, len, type, label) \
while (len >= sizeof(type)) { \
unsafe_put_user(*(type *)(src),(type __user *)(dst),label); \
dst += sizeof(type); \
@@ -607,15 +607,34 @@ _label: \
len -= sizeof(type); \
}
-#define unsafe_copy_to_user(_dst,_src,_len,label) \
-do { \
- char __user *__ucu_dst = (_dst); \
- const char *__ucu_src = (_src); \
- size_t __ucu_len = (_len); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \
+#define unsafe_copy_to_user(_dst, _src, _len, label) \
+do { \
+ void __user *__dst = (_dst); \
+ const void *__src = (_src); \
+ size_t __len = (_len); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u64, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u32, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u16, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u8, label); \
+} while (0)
+
+#define unsafe_copy_from_user_loop(dst, src, len, type, label) \
+ while (len >= sizeof(type)) { \
+ unsafe_get_user(*(type *)(dst), (type __user *)(src), label); \
+ dst += sizeof(type); \
+ src += sizeof(type); \
+ len -= sizeof(type); \
+ }
+
+#define unsafe_copy_from_user(_dst, _src, _len, label) \
+do { \
+ void *__dst = (_dst); \
+ void __user *__src = (_src); \
+ size_t __len = (_len); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u64, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u32, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u16, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u8, label); \
} while (0)
#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
--
2.50.1
* [PATCH v10 04/11] unwind_user/sframe: Add support for reading .sframe contents
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
` (2 preceding siblings ...)
2025-08-27 20:15 ` [PATCH v10 03/11] x86/uaccess: Add unsafe_copy_from_user() implementation Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 05/11] unwind_user/sframe: Detect .sframe sections in executables Steven Rostedt
` (6 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
In preparation for using sframe to unwind user space stacks, add an
sframe_find() interface for finding the sframe information associated
with a given text address.
For performance, use user_read_access_begin() and the corresponding
unsafe_*() accessors. Note that use of pr_debug() in uaccess-enabled
regions would break noinstr validation, so there aren't any debug
messages yet. Those will be added in a subsequent commit.
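The first step of sframe_find() is locating the FDE that covers the IP. The sketch below is a userspace model of that lookup, loosely following __find_fde() in this patch: a binary search over FDE start offsets that also remembers the bounds seen so far, so that a probe contradicting them reveals an unsorted (corrupt) table, followed by a check that the IP is not in a gap past the function's end. toy_find_fde() and struct toy_fde are hypothetical; the table is a plain array rather than user memory read with unsafe_get_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_fde {
	int32_t start_off;	/* function start, relative to the section start */
	uint32_t func_size;
};

/*
 * Return the index of the FDE covering ip_off, -EINVAL if no function covers
 * it, or -EFAULT if the probed entries show the table is not sorted.
 */
static long toy_find_fde(const struct toy_fde *fdes, size_t num, int32_t ip_off)
{
	int32_t low_bound = INT32_MIN, high_bound = INT32_MAX;
	long lo = 0, hi = (long)num - 1, found = -1;

	while (lo <= hi) {
		long mid = lo + (hi - lo) / 2;
		int32_t off = fdes[mid].start_off;

		if (ip_off >= off) {
			/* Searching right: start offsets must not go down. */
			if (off < low_bound)
				return -EFAULT;
			low_bound = off;
			found = mid;
			lo = mid + 1;
		} else {
			/* Searching left: start offsets must not go up. */
			if (off > high_bound)
				return -EFAULT;
			high_bound = off;
			hi = mid - 1;
		}
	}
	if (found < 0)
		return -EINVAL;
	/* Make sure ip_off is not in a gap after the function. */
	if ((int64_t)ip_off >= (int64_t)fdes[found].start_off + fdes[found].func_size)
		return -EINVAL;
	return found;
}
```

The bound tracking costs two comparisons per probe and means a corrupted table is reported as an error rather than silently producing a wrong FDE for some lookups.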
Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/sframe.h | 5 +
kernel/unwind/sframe.c | 311 ++++++++++++++++++++++++++++++++++-
kernel/unwind/sframe_debug.h | 35 ++++
3 files changed, 347 insertions(+), 4 deletions(-)
create mode 100644 kernel/unwind/sframe_debug.h
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 73bf6f0b30c2..9a72209696f9 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -3,11 +3,14 @@
#define _LINUX_SFRAME_H
#include <linux/mm_types.h>
+#include <linux/srcu.h>
#include <linux/unwind_user_types.h>
#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
struct sframe_section {
+ struct rcu_head rcu;
+
unsigned long sframe_start;
unsigned long sframe_end;
unsigned long text_start;
@@ -28,6 +31,7 @@ extern void sframe_free_mm(struct mm_struct *mm);
extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end);
extern int sframe_remove_section(unsigned long sframe_addr);
+extern int sframe_find(unsigned long ip, struct unwind_user_frame *frame);
static inline bool current_has_sframe(void)
{
@@ -46,6 +50,7 @@ static inline int sframe_add_section(unsigned long sframe_start, unsigned long s
return -ENOSYS;
}
static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline int sframe_find(unsigned long ip, struct unwind_user_frame *frame) { return -ENOSYS; }
static inline bool current_has_sframe(void) { return false; }
#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index fa7d87ffd00a..b10420d19840 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -15,9 +15,303 @@
#include <linux/unwind_user_types.h>
#include "sframe.h"
+#include "sframe_debug.h"
-#define dbg(fmt, ...) \
- pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+struct sframe_fre {
+ unsigned int size;
+ u32 ip_off;
+ s32 cfa_off;
+ s32 ra_off;
+ s32 fp_off;
+ u8 info;
+};
+
+DEFINE_STATIC_SRCU(sframe_srcu);
+
+static __always_inline unsigned char fre_type_to_size(unsigned char fre_type)
+{
+ if (fre_type > 2)
+ return 0;
+ return 1 << fre_type;
+}
+
+static __always_inline unsigned char offset_size_enum_to_size(unsigned char off_size)
+{
+ if (off_size > 2)
+ return 0;
+ return 1 << off_size;
+}
+
+static __always_inline int __read_fde(struct sframe_section *sec,
+ unsigned int fde_num,
+ struct sframe_fde *fde)
+{
+ unsigned long fde_addr, ip;
+
+ fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde));
+ unsafe_copy_from_user(fde, (void __user *)fde_addr,
+ sizeof(struct sframe_fde), Efault);
+
+ ip = sec->sframe_start + fde->start_addr;
+ if (ip < sec->text_start || ip > sec->text_end)
+ return -EINVAL;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int __find_fde(struct sframe_section *sec,
+ unsigned long ip,
+ struct sframe_fde *fde)
+{
+ s32 ip_off, func_off_low = S32_MIN, func_off_high = S32_MAX;
+ struct sframe_fde __user *first, *low, *high, *found = NULL;
+ int ret;
+
+ ip_off = ip - sec->sframe_start;
+
+ first = (void __user *)sec->fdes_start;
+ low = first;
+ high = first + sec->num_fdes - 1;
+
+ while (low <= high) {
+ struct sframe_fde __user *mid;
+ s32 func_off;
+
+ mid = low + ((high - low) / 2);
+
+ unsafe_get_user(func_off, (s32 __user *)mid, Efault);
+
+ if (ip_off >= func_off) {
+ if (func_off < func_off_low)
+ return -EFAULT;
+
+ func_off_low = func_off;
+
+ found = mid;
+ low = mid + 1;
+ } else {
+ if (func_off > func_off_high)
+ return -EFAULT;
+
+ func_off_high = func_off;
+
+ high = mid - 1;
+ }
+ }
+
+ if (!found)
+ return -EINVAL;
+
+ ret = __read_fde(sec, found - first, fde);
+ if (ret)
+ return ret;
+
+ /* make sure it's not in a gap */
+ if (ip_off < fde->start_addr || ip_off >= fde->start_addr + fde->func_size)
+ return -EINVAL;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+#define ____UNSAFE_GET_USER_INC(to, from, type, label) \
+({ \
+ type __to; \
+ unsafe_get_user(__to, (type __user *)from, label); \
+ from += sizeof(__to); \
+ to = __to; \
+})
+
+#define __UNSAFE_GET_USER_INC(to, from, size, label, u_or_s) \
+({ \
+ switch (size) { \
+ case 1: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##8, label); \
+ break; \
+ case 2: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##16, label); \
+ break; \
+ case 4: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##32, label); \
+ break; \
+ default: \
+ return -EFAULT; \
+ } \
+})
+
+#define UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label) \
+ __UNSAFE_GET_USER_INC(to, from, size, label, u)
+
+#define UNSAFE_GET_USER_SIGNED_INC(to, from, size, label) \
+ __UNSAFE_GET_USER_INC(to, from, size, label, s)
+
+#define UNSAFE_GET_USER_INC(to, from, size, label) \
+ _Generic(to, \
+ u8: UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u16: UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u32: UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ s8: UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s16: UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s32: UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
+
+static __always_inline int __read_fre(struct sframe_section *sec,
+ struct sframe_fde *fde,
+ unsigned long fre_addr,
+ struct sframe_fre *fre)
+{
+ unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
+ unsigned char fre_type = SFRAME_FUNC_FRE_TYPE(fde->info);
+ unsigned char offset_count, offset_size;
+ s32 cfa_off, ra_off, fp_off;
+ unsigned long cur = fre_addr;
+ unsigned char addr_size;
+ u32 ip_off;
+ u8 info;
+
+ addr_size = fre_type_to_size(fre_type);
+ if (!addr_size)
+ return -EFAULT;
+
+ if (fre_addr + addr_size + 1 > sec->fres_end)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
+ if (fde_type == SFRAME_FDE_TYPE_PCINC && ip_off > fde->func_size)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(info, cur, 1, Efault);
+ offset_count = SFRAME_FRE_OFFSET_COUNT(info);
+ offset_size = offset_size_enum_to_size(SFRAME_FRE_OFFSET_SIZE(info));
+ if (!offset_count || !offset_size)
+ return -EFAULT;
+
+ if (cur + (offset_count * offset_size) > sec->fres_end)
+ return -EFAULT;
+
+ fre->size = addr_size + 1 + (offset_count * offset_size);
+
+ UNSAFE_GET_USER_INC(cfa_off, cur, offset_size, Efault);
+ offset_count--;
+
+ ra_off = sec->ra_off;
+ if (!ra_off) {
+ if (!offset_count--)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(ra_off, cur, offset_size, Efault);
+ }
+
+ fp_off = sec->fp_off;
+ if (!fp_off && offset_count) {
+ offset_count--;
+ UNSAFE_GET_USER_INC(fp_off, cur, offset_size, Efault);
+ }
+
+ if (offset_count)
+ return -EFAULT;
+
+ fre->ip_off = ip_off;
+ fre->cfa_off = cfa_off;
+ fre->ra_off = ra_off;
+ fre->fp_off = fp_off;
+ fre->info = info;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int __find_fre(struct sframe_section *sec,
+ struct sframe_fde *fde, unsigned long ip,
+ struct unwind_user_frame *frame)
+{
+ unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
+ struct sframe_fre *fre, *prev_fre = NULL;
+ struct sframe_fre fres[2];
+ unsigned long fre_addr;
+ bool which = false;
+ unsigned int i;
+ u32 ip_off;
+
+ ip_off = ip - (sec->sframe_start + fde->start_addr);
+
+ if (fde_type == SFRAME_FDE_TYPE_PCMASK)
+ ip_off %= fde->rep_size;
+
+ fre_addr = sec->fres_start + fde->fres_off;
+
+ for (i = 0; i < fde->fres_num; i++) {
+ int ret;
+
+ /*
+ * Alternate between the two fres[] entries for 'fre' and
+ * 'prev_fre'.
+ */
+ fre = which ? fres : fres + 1;
+ which = !which;
+
+ ret = __read_fre(sec, fde, fre_addr, fre);
+ if (ret)
+ return ret;
+
+ fre_addr += fre->size;
+
+ if (prev_fre && fre->ip_off <= prev_fre->ip_off)
+ return -EFAULT;
+
+ if (fre->ip_off > ip_off)
+ break;
+
+ prev_fre = fre;
+ }
+
+ if (!prev_fre)
+ return -EINVAL;
+ fre = prev_fre;
+
+ frame->cfa_off = fre->cfa_off;
+ frame->ra_off = fre->ra_off;
+ frame->fp_off = fre->fp_off;
+ frame->use_fp = SFRAME_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+
+ return 0;
+}
+
+int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
+{
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ struct sframe_fde fde;
+ int ret;
+
+ if (!mm)
+ return -EINVAL;
+
+ guard(srcu)(&sframe_srcu);
+
+ sec = mtree_load(&mm->sframe_mt, ip);
+ if (!sec)
+ return -EINVAL;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+
+ ret = __find_fde(sec, ip, &fde);
+ if (ret)
+ goto end;
+
+ ret = __find_fre(sec, &fde, ip, frame);
+end:
+ user_read_access_end();
+ return ret;
+}
static void free_section(struct sframe_section *sec)
{
@@ -119,8 +413,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
sec->text_end = text_end;
ret = sframe_read_header(sec);
- if (ret)
+ if (ret) {
+ dbg_print_header(sec);
goto err_free;
+ }
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
@@ -136,6 +432,13 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
return ret;
}
+static void sframe_free_srcu(struct rcu_head *rcu)
+{
+ struct sframe_section *sec = container_of(rcu, struct sframe_section, rcu);
+
+ free_section(sec);
+}
+
static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
@@ -144,7 +447,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
return -EINVAL;
}
- free_section(sec);
+ call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
return 0;
}
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
new file mode 100644
index 000000000000..055c8c8fae24
--- /dev/null
+++ b/kernel/unwind/sframe_debug.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SFRAME_DEBUG_H
+#define _SFRAME_DEBUG_H
+
+#include <linux/sframe.h>
+#include "sframe.h"
+
+#ifdef CONFIG_DYNAMIC_DEBUG
+
+#define dbg(fmt, ...) \
+ pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static __always_inline void dbg_print_header(struct sframe_section *sec)
+{
+ unsigned long fdes_end;
+
+ fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde));
+
+ dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+ "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+ "ra_off:%d fp_off:%d\n",
+ sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+ sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+ sec->ra_off, sec->fp_off);
+}
+
+#else /* !CONFIG_DYNAMIC_DEBUG */
+
+#define dbg(args...) no_printk(args)
+
+static inline void dbg_print_header(struct sframe_section *sec) {}
+
+#endif /* !CONFIG_DYNAMIC_DEBUG */
+
+#endif /* _SFRAME_DEBUG_H */
--
2.50.1
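The FDE lookup in __find_fde() above is a "predecessor" binary search. It finds the last FDE whose start_addr is at or below the IP offset, then rejects IPs that land in a gap after that function. The sketch below is a standalone userspace approximation, and several things are assumed. struct demo_fde and find_fde() are hypothetical names. The FDE table is a plain in-memory array rather than user memory read with unsafe_get_user(). The func_off_low/func_off_high sortedness checks, which make the kernel version return -EFAULT on a corrupt table, are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened FDE: only the fields the lookup needs. */
struct demo_fde {
	int32_t start_addr;	/* function start, relative to .sframe start */
	uint32_t func_size;
};

/*
 * Return the index of the last FDE with start_addr <= ip_off, or -EINVAL
 * if no FDE starts at or below ip_off or ip_off lies past the end of that
 * function (i.e. in a gap before the next one).
 */
static long find_fde(const struct demo_fde *fdes, size_t num, int32_t ip_off)
{
	long low = 0, high = (long)num - 1, found = -1;

	while (low <= high) {
		long mid = low + (high - low) / 2;

		if (ip_off >= fdes[mid].start_addr) {
			found = mid;	/* candidate; keep looking to the right */
			low = mid + 1;
		} else {
			high = mid - 1;
		}
	}

	if (found < 0)
		return -EINVAL;

	/* Widen to 64 bits so start_addr + func_size cannot wrap. */
	if (ip_off >= (int64_t)fdes[found].start_addr + fdes[found].func_size)
		return -EINVAL;	/* in a gap between functions */

	return found;
}
```

Searching for the predecessor, instead of for an exact match, is what lets a single sorted array of start addresses cover every IP inside each function.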
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v10 05/11] unwind_user/sframe: Detect .sframe sections in executables
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
` (3 preceding siblings ...)
2025-08-27 20:15 ` [PATCH v10 04/11] unwind_user/sframe: Add support for reading .sframe contents Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 06/11] unwind_user/sframe: Wire up unwind_user to sframe Steven Rostedt
` (5 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell, linux-mm
From: Josh Poimboeuf <jpoimboe@kernel.org>
When loading an ELF executable, automatically detect an .sframe section
and associate it with the mm_struct.
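As a rough userspace illustration of the program-header scan this patch adds to load_elf_binary() and load_elf_interp(), the sketch below remembers the PT_GNU_SFRAME header. It then pairs that header with every PF_X segment and applies the load bias the same way elf_add_sframe() does. struct sframe_range and collect_sframe_ranges() are hypothetical names. PT_GNU_SFRAME is defined locally from the value in the uapi hunk below, in case the libc <elf.h> does not define it yet.

```c
#include <elf.h>
#include <stddef.h>

/* Value from the include/uapi/linux/elf.h hunk; older libc headers lack it. */
#ifndef PT_GNU_SFRAME
#define PT_GNU_SFRAME (PT_LOOS + 0x474e554)
#endif

/* Hypothetical: one set of arguments for sframe_add_section(). */
struct sframe_range {
	unsigned long sframe_start, sframe_end;
	unsigned long text_start, text_end;
};

/*
 * Pass 1 remembers the (last) PT_GNU_SFRAME header; pass 2 pairs it with
 * each executable segment. Returns the number of ranges written to @out.
 */
static size_t collect_sframe_ranges(const Elf64_Phdr *phdrs, size_t phnum,
				    unsigned long load_bias,
				    struct sframe_range *out, size_t max)
{
	const Elf64_Phdr *sframe = NULL;
	size_t i, n = 0;

	for (i = 0; i < phnum; i++)
		if (phdrs[i].p_type == PT_GNU_SFRAME)
			sframe = &phdrs[i];
	if (!sframe)
		return 0;

	for (i = 0; i < phnum && n < max; i++) {
		if (!(phdrs[i].p_flags & PF_X))
			continue;
		out[n].sframe_start = load_bias + sframe->p_vaddr;
		out[n].sframe_end = out[n].sframe_start + sframe->p_memsz;
		out[n].text_start = load_bias + phdrs[i].p_vaddr;
		out[n].text_end = out[n].text_start + phdrs[i].p_memsz;
		n++;
	}
	return n;
}
```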
Cc: linux-mm@kvack.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/binfmt_elf.c | 49 +++++++++++++++++++++++++++++++++++++---
include/uapi/linux/elf.h | 1 +
2 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 264fba0d44bd..1fd7623cf9a5 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -47,6 +47,7 @@
#include <linux/dax.h>
#include <linux/uaccess.h>
#include <linux/rseq.h>
+#include <linux/sframe.h>
#include <asm/param.h>
#include <asm/page.h>
@@ -622,6 +623,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
}
+static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
+ unsigned long base_addr)
+{
+ unsigned long sframe_start, sframe_end, text_start, text_end;
+
+ sframe_start = base_addr + sframe->p_vaddr;
+ sframe_end = sframe_start + sframe->p_memsz;
+
+ text_start = base_addr + text->p_vaddr;
+ text_end = text_start + text->p_memsz;
+
+ /* Ignore return value, sframe section isn't critical */
+ sframe_add_section(sframe_start, sframe_end, text_start, text_end);
+}
+
/* This is much more generalized than the library routine read function,
so we keep this separate. Technically the library read function
is only provided so that we can read a.out libraries that have
@@ -632,7 +648,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
unsigned long no_base, struct elf_phdr *interp_elf_phdata,
struct arch_elf_state *arch_state)
{
- struct elf_phdr *eppnt;
+ struct elf_phdr *eppnt, *sframe_phdr = NULL;
unsigned long load_addr = 0;
int load_addr_set = 0;
unsigned long error = ~0UL;
@@ -658,7 +674,8 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
eppnt = interp_elf_phdata;
for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
- if (eppnt->p_type == PT_LOAD) {
+ switch (eppnt->p_type) {
+ case PT_LOAD: {
int elf_type = MAP_PRIVATE;
int elf_prot = make_prot(eppnt->p_flags, arch_state,
true, true);
@@ -697,6 +714,20 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
error = -ENOMEM;
goto out;
}
+ break;
+ }
+ case PT_GNU_SFRAME:
+ sframe_phdr = eppnt;
+ break;
+ }
+ }
+
+ if (sframe_phdr) {
+ eppnt = interp_elf_phdata;
+ for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
+ if (eppnt->p_flags & PF_X) {
+ elf_add_sframe(eppnt, sframe_phdr, load_addr);
+ }
}
}
@@ -821,7 +852,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
int first_pt_load = 1;
unsigned long error;
struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
- struct elf_phdr *elf_property_phdata = NULL;
+ struct elf_phdr *elf_property_phdata = NULL, *sframe_phdr = NULL;
unsigned long elf_brk;
bool brk_moved = false;
int retval, i;
@@ -930,6 +961,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
executable_stack = EXSTACK_DISABLE_X;
break;
+ case PT_GNU_SFRAME:
+ sframe_phdr = elf_ppnt;
+ break;
+
case PT_LOPROC ... PT_HIPROC:
retval = arch_elf_pt_proc(elf_ex, elf_ppnt,
bprm->file, false,
@@ -1227,6 +1262,14 @@ static int load_elf_binary(struct linux_binprm *bprm)
elf_brk = k;
}
+ if (sframe_phdr) {
+ for (i = 0, elf_ppnt = elf_phdata;
+ i < elf_ex->e_phnum; i++, elf_ppnt++) {
+ if ((elf_ppnt->p_flags & PF_X))
+ elf_add_sframe(elf_ppnt, sframe_phdr, load_bias);
+ }
+ }
+
e_entry = elf_ex->e_entry + load_bias;
phdr_addr += load_bias;
elf_brk += load_bias;
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 819ded2d39de..92c16c94fca8 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -41,6 +41,7 @@ typedef __u16 Elf64_Versym;
#define PT_GNU_STACK (PT_LOOS + 0x474e551)
#define PT_GNU_RELRO (PT_LOOS + 0x474e552)
#define PT_GNU_PROPERTY (PT_LOOS + 0x474e553)
+#define PT_GNU_SFRAME (PT_LOOS + 0x474e554)
/* ARM MTE memory tag segment type */
--
2.50.1
* [PATCH v10 06/11] unwind_user/sframe: Wire up unwind_user to sframe
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
` (4 preceding siblings ...)
2025-08-27 20:15 ` [PATCH v10 05/11] unwind_user/sframe: Detect .sframe sections in executables Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 07/11] unwind_user/sframe/x86: Enable sframe unwinding on x86 Steven Rostedt
` (4 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
Now that the sframe infrastructure is fully in place, make it work by
hooking it up to the unwind_user interface.
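A simplified model of the resulting method ordering follows. Because UNWIND_USER_TYPE_SFRAME now occupies bit 0, a lowest-bit-first walk tries sframe before frame pointers. An -ENOENT from the sframe step means "no usable sframe data for this IP", so the walk falls back to FP. The sketch assumes that any other failure ends the walk, which is what the `default:` branch setting `state->done` suggests. walk_next(), step_fn and the TYPE_* constants are hypothetical stand-ins, not kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>

/* Same bit order as enum unwind_user_type_bits: SFRAME = bit 0, FP = bit 1. */
enum { TYPE_SFRAME = 1u << 0, TYPE_FP = 1u << 1, NR_TYPE_BITS = 2 };

/* Hypothetical per-method step: 0 = frame unwound, -ENOENT = no data for
 * this IP (try the next method), any other negative value = hard error. */
typedef int (*step_fn)(unsigned long ip);

struct walk_state {
	unsigned int available_types;
	unsigned int current_type;
	bool done;
};

static int walk_next(struct walk_state *st, unsigned long ip,
		     step_fn sframe_step, step_fn fp_step)
{
	for (unsigned int bit = 0; bit < NR_TYPE_BITS; bit++) {
		unsigned int type = 1u << bit;
		int ret;

		if (!(st->available_types & type))
			continue;
		st->current_type = type;
		ret = (type == TYPE_SFRAME) ? sframe_step(ip) : fp_step(ip);
		if (ret == 0)
			return 0;
		if (type == TYPE_SFRAME && ret == -ENOENT)
			continue;	/* fall back to frame pointers */
		break;			/* hard error: stop the walk */
	}
	st->done = true;
	return -EINVAL;
}
```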
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v9: https://lore.kernel.org/20250717012936.619600891@kernel.org
- Update the changes to unwind/user.c to handle passing a const
unwind_user_frame pointer.
arch/Kconfig | 1 +
include/linux/unwind_user_types.h | 4 ++-
kernel/unwind/user.c | 41 +++++++++++++++++++++++++++++--
3 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 69fcabf53088..277b87af949f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -453,6 +453,7 @@ config HAVE_UNWIND_USER_FP
config HAVE_UNWIND_USER_SFRAME
bool
+ select UNWIND_USER
config HAVE_PERF_REGS
bool
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index a449f15be890..d30e8495eaa9 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -9,7 +9,8 @@
* available.
*/
enum unwind_user_type_bits {
- UNWIND_USER_TYPE_FP_BIT = 0,
+ UNWIND_USER_TYPE_SFRAME_BIT = 0,
+ UNWIND_USER_TYPE_FP_BIT = 1,
NR_UNWIND_USER_TYPE_BITS,
};
@@ -17,6 +18,7 @@ enum unwind_user_type_bits {
enum unwind_user_type {
/* Type "none" for the start of stack walk iteration. */
UNWIND_USER_TYPE_NONE = 0,
+ UNWIND_USER_TYPE_SFRAME = BIT(UNWIND_USER_TYPE_SFRAME_BIT),
UNWIND_USER_TYPE_FP = BIT(UNWIND_USER_TYPE_FP_BIT),
};
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 97a8415e3216..9d34c7659f90 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -7,17 +7,24 @@
#include <linux/sched/task_stack.h>
#include <linux/unwind_user.h>
#include <linux/uaccess.h>
+#include <linux/sframe.h>
static const struct unwind_user_frame fp_frame = {
ARCH_INIT_USER_FP_FRAME
};
+static const struct unwind_user_frame *get_fp_frame(struct pt_regs *regs)
+{
+ return &fp_frame;
+}
+
#define for_each_user_frame(state) \
for (unwind_user_start(state); !(state)->done; unwind_user_next(state))
-static int unwind_user_next_fp(struct unwind_user_state *state)
+static int unwind_user_next_common(struct unwind_user_state *state,
+ const struct unwind_user_frame *frame,
+ struct pt_regs *regs)
{
- const struct unwind_user_frame *frame = &fp_frame;
unsigned long cfa, fp, ra;
unsigned int shift;
@@ -55,6 +62,24 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
return 0;
}
+static int unwind_user_next_sframe(struct unwind_user_state *state)
+{
+ struct unwind_user_frame _frame, *frame;
+
+ /* sframe expects the frame to be local storage */
+ frame = &_frame;
+ if (sframe_find(state->ip, frame))
+ return -ENOENT;
+ return unwind_user_next_common(state, frame, task_pt_regs(current));
+}
+
+static int unwind_user_next_fp(struct unwind_user_state *state)
+{
+ struct pt_regs *regs = task_pt_regs(current);
+
+ return unwind_user_next_common(state, get_fp_frame(regs), regs);
+}
+
static int unwind_user_next(struct unwind_user_state *state)
{
unsigned long iter_mask = state->available_types;
@@ -68,6 +93,16 @@ static int unwind_user_next(struct unwind_user_state *state)
state->current_type = type;
switch (type) {
+ case UNWIND_USER_TYPE_SFRAME:
+ switch (unwind_user_next_sframe(state)) {
+ case 0:
+ return 0;
+ case -ENOENT:
+ continue; /* Try next method. */
+ default:
+ state->done = true;
+ }
+ break;
case UNWIND_USER_TYPE_FP:
if (!unwind_user_next_fp(state))
return 0;
@@ -96,6 +131,8 @@ static int unwind_user_start(struct unwind_user_state *state)
return -EINVAL;
}
+ if (current_has_sframe())
+ state->available_types |= UNWIND_USER_TYPE_SFRAME;
if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
state->available_types |= UNWIND_USER_TYPE_FP;
--
2.50.1
* [PATCH v10 07/11] unwind_user/sframe/x86: Enable sframe unwinding on x86
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
` (5 preceding siblings ...)
2025-08-27 20:15 ` [PATCH v10 06/11] unwind_user/sframe: Wire up unwind_user to sframe Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 08/11] unwind_user/sframe: Remove .sframe section on detected corruption Steven Rostedt
` (3 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
The x86 sframe 2.0 implementation works fairly well, starting with
binutils 2.41 (though some bugs are being fixed in later versions).
Enable it.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
arch/x86/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8f94c58d4de8..c3518f145f0d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -300,6 +300,7 @@ config X86
select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_UNWIND_USER_FP if X86_64
+ select HAVE_UNWIND_USER_SFRAME if X86_64
select HAVE_USER_RETURN_NOTIFIER
select HAVE_GENERIC_VDSO
select VDSO_GETRANDOM if X86_64
--
2.50.1
* [PATCH v10 08/11] unwind_user/sframe: Remove .sframe section on detected corruption
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
` (6 preceding siblings ...)
2025-08-27 20:15 ` [PATCH v10 07/11] unwind_user/sframe/x86: Enable sframe unwinding on x86 Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 09/11] unwind_user/sframe: Show file name in debug output Steven Rostedt
` (2 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
To avoid repeated attempts to use a bad .sframe section, remove it
at lookup time, as soon as the first sign of corruption is detected.
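Here is a toy model of this policy, using a flat table instead of the per-mm maple tree. The first lookup that hits corruption returns -EFAULT and unregisters the section. Later lookups for the same IP then return -EINVAL (no section) instead of re-parsing bad data. unwind_user_next_sframe() maps any sframe_find() failure to -ENOENT, so the unwinder then falls back to frame pointers. struct fake_section and fake_sframe_find() are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry in mm->sframe_mt. */
struct fake_section {
	unsigned long text_start, text_end;
	bool corrupt;		/* simulates a bad .sframe payload */
	bool registered;
};

static struct fake_section *lookup(struct fake_section *tab, size_t n,
				   unsigned long ip)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].registered && ip >= tab[i].text_start &&
		    ip < tab[i].text_end)
			return &tab[i];
	return NULL;
}

/* Same return convention as sframe_find(): -EINVAL if no section covers
 * @ip, -EFAULT on corruption, in which case the section is dropped. */
static int fake_sframe_find(struct fake_section *tab, size_t n,
			    unsigned long ip)
{
	struct fake_section *sec = lookup(tab, n, ip);

	if (!sec)
		return -EINVAL;
	if (sec->corrupt) {
		sec->registered = false;	/* like sframe_remove_section() */
		return -EFAULT;
	}
	return 0;
}
```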
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/unwind/sframe.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index b10420d19840..f246ead6c2a0 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -310,6 +310,10 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
ret = __find_fre(sec, &fde, ip, frame);
end:
user_read_access_end();
+
+ if (ret == -EFAULT)
+ WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+
return ret;
}
--
2.50.1
* [PATCH v10 09/11] unwind_user/sframe: Show file name in debug output
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
` (7 preceding siblings ...)
2025-08-27 20:15 ` [PATCH v10 08/11] unwind_user/sframe: Remove .sframe section on detected corruption Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 10/11] unwind_user/sframe: Add .sframe validation option Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 11/11] [DO NOT APPLY]unwind_user/sframe: Add prctl() interface for registering .sframe sections Steven Rostedt
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
When debugging sframe issues, the error messages aren't all that helpful
without knowing which file the corresponding .sframe section belongs to.
Prefix debug output strings with the file name.
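The prefixing works by layering one variadic macro on another. The sketch below is a userspace approximation that writes into a buffer with snprintf() instead of calling pr_debug(). Its DEMO_COMM/DEMO_PID constants are hypothetical placeholders for current->comm and current->pid. Like the kernel's dbg_sec(), it relies on a variable named sec being in scope. It also uses the GNU `##__VA_ARGS__` extension to drop the trailing comma when no arguments follow the format.

```c
#include <stdio.h>

/* Hypothetical placeholders for current->comm and current->pid. */
#define DEMO_COMM "demo"
#define DEMO_PID 42

/* Analogue of dbg(): prefix "comm (pid): ". @buf must be a char array. */
#define dbg(buf, fmt, ...) \
	snprintf(buf, sizeof(buf), "%s (%d): " fmt, DEMO_COMM, DEMO_PID, \
		 ##__VA_ARGS__)

/* Analogue of dbg_sec(): adds "filename: " and expects 'sec' in scope. */
#define dbg_sec(buf, fmt, ...) \
	dbg(buf, "%s: " fmt, sec->filename, ##__VA_ARGS__)

/* Hypothetical minimal section type carrying only the file name. */
struct demo_section {
	const char *filename;
};
```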
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/sframe.h | 4 +++-
kernel/unwind/sframe.c | 23 ++++++++++--------
kernel/unwind/sframe_debug.h | 45 +++++++++++++++++++++++++++++++-----
3 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 9a72209696f9..b79c5ec09229 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -10,7 +10,9 @@
struct sframe_section {
struct rcu_head rcu;
-
+#ifdef CONFIG_DYNAMIC_DEBUG
+ const char *filename;
+#endif
unsigned long sframe_start;
unsigned long sframe_end;
unsigned long text_start;
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index f246ead6c2a0..66d3ba3c8389 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -311,14 +311,17 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
end:
user_read_access_end();
- if (ret == -EFAULT)
+ if (ret == -EFAULT) {
+ dbg_sec("removing bad .sframe section\n");
WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+ }
return ret;
}
static void free_section(struct sframe_section *sec)
{
+ dbg_free(sec);
kfree(sec);
}
@@ -329,7 +332,7 @@ static int sframe_read_header(struct sframe_section *sec)
unsigned int num_fdes;
if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
- dbg("header usercopy failed\n");
+ dbg_sec("header usercopy failed\n");
return -EFAULT;
}
@@ -337,18 +340,18 @@ static int sframe_read_header(struct sframe_section *sec)
shdr.preamble.version != SFRAME_VERSION_2 ||
!(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
shdr.auxhdr_len) {
- dbg("bad/unsupported sframe header\n");
+ dbg_sec("bad/unsupported sframe header\n");
return -EINVAL;
}
if (!shdr.num_fdes || !shdr.num_fres) {
- dbg("no fde/fre entries\n");
+ dbg_sec("no fde/fre entries\n");
return -EINVAL;
}
header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
if (header_end >= sec->sframe_end) {
- dbg("header doesn't fit in section\n");
+ dbg_sec("header doesn't fit in section\n");
return -EINVAL;
}
@@ -360,7 +363,7 @@ static int sframe_read_header(struct sframe_section *sec)
fres_end = fres_start + shdr.fre_len;
if (fres_start < fdes_end || fres_end > sec->sframe_end) {
- dbg("inconsistent fde/fre offsets\n");
+ dbg_sec("inconsistent fde/fre offsets\n");
return -EINVAL;
}
@@ -416,6 +419,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
sec->text_start = text_start;
sec->text_end = text_end;
+ dbg_init(sec);
+
ret = sframe_read_header(sec);
if (ret) {
dbg_print_header(sec);
@@ -424,8 +429,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
- dbg("mtree_insert_range failed: text=%lx-%lx\n",
- sec->text_start, sec->text_end);
+ dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
+ sec->text_start, sec->text_end);
goto err_free;
}
@@ -447,7 +452,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
- dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+ dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
return -EINVAL;
}
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
index 055c8c8fae24..7794bf0bd78c 100644
--- a/kernel/unwind/sframe_debug.h
+++ b/kernel/unwind/sframe_debug.h
@@ -10,26 +10,59 @@
#define dbg(fmt, ...) \
pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+#define dbg_sec(fmt, ...) \
+ dbg("%s: " fmt, sec->filename, ##__VA_ARGS__)
+
static __always_inline void dbg_print_header(struct sframe_section *sec)
{
unsigned long fdes_end;
fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde));
- dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
- "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
- "ra_off:%d fp_off:%d\n",
- sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
- sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
- sec->ra_off, sec->fp_off);
+ dbg_sec("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+ "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+ "ra_off:%d fp_off:%d\n",
+ sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+ sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+ sec->ra_off, sec->fp_off);
+}
+
+static inline void dbg_init(struct sframe_section *sec)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+
+ guard(mmap_read_lock)(mm);
+ vma = vma_lookup(mm, sec->sframe_start);
+ if (!vma)
+ sec->filename = kstrdup("(vma gone???)", GFP_KERNEL);
+ else if (vma->vm_file)
+ sec->filename = kstrdup_quotable_file(vma->vm_file, GFP_KERNEL);
+ else if (vma->vm_ops && vma->vm_ops->name)
+ sec->filename = kstrdup(vma->vm_ops->name(vma), GFP_KERNEL);
+ else if (arch_vma_name(vma))
+ sec->filename = kstrdup(arch_vma_name(vma), GFP_KERNEL);
+ else if (!vma->vm_mm)
+ sec->filename = kstrdup("(vdso)", GFP_KERNEL);
+ else
+ sec->filename = kstrdup("(anonymous)", GFP_KERNEL);
+}
+
+static inline void dbg_free(struct sframe_section *sec)
+{
+ kfree(sec->filename);
}
#else /* !CONFIG_DYNAMIC_DEBUG */
#define dbg(args...) no_printk(args)
+#define dbg_sec(args...) no_printk(args)
static inline void dbg_print_header(struct sframe_section *sec) {}
+static inline void dbg_init(struct sframe_section *sec) {}
+static inline void dbg_free(struct sframe_section *sec) {}
+
#endif /* !CONFIG_DYNAMIC_DEBUG */
#endif /* _SFRAME_DEBUG_H */
--
2.50.1
* [PATCH v10 10/11] unwind_user/sframe: Add .sframe validation option
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
` (8 preceding siblings ...)
2025-08-27 20:15 ` [PATCH v10 09/11] unwind_user/sframe: Show file name in debug output Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 11/11] [DO NOT APPLY]unwind_user/sframe: Add prctl() interface for registering .sframe sections Steven Rostedt
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
Add a debug feature to validate all .sframe sections when first loading
the file rather than on demand.
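The ordering rules that the validator enforces can be sketched over records that have already been decoded. This is a hypothetical, simplified model: struct demo_fde, struct demo_fre and validate() are not kernel names. It skips the user-memory reads and per-FRE decoding that safe_read_fde() and safe_read_fre() perform, and keeps only the two sortedness checks.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical decoded records; the real structs carry more fields. */
struct demo_fre {
	unsigned int ip_off;
};

struct demo_fde {
	int start_addr;			/* relative to the .sframe start */
	const struct demo_fre *fres;
	size_t fres_num;
};

/*
 * FDE start addresses must be strictly increasing across the section,
 * and FRE ip_off values strictly increasing within each FDE.
 */
static int validate(unsigned long sframe_start,
		    const struct demo_fde *fdes, size_t num_fdes)
{
	unsigned long prev_ip = 0;

	for (size_t i = 0; i < num_fdes; i++) {
		unsigned long ip = sframe_start + fdes[i].start_addr;

		if (ip <= prev_ip)
			return -EFAULT;		/* FDEs not sorted */
		prev_ip = ip;

		for (size_t j = 1; j < fdes[i].fres_num; j++)
			if (fdes[i].fres[j].ip_off <= fdes[i].fres[j - 1].ip_off)
				return -EFAULT;	/* FREs not sorted */
	}
	return 0;
}
```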
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
arch/Kconfig | 19 +++++++++
kernel/unwind/sframe.c | 96 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 115 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 277b87af949f..918ebe3c5a85 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -455,6 +455,25 @@ config HAVE_UNWIND_USER_SFRAME
bool
select UNWIND_USER
+config SFRAME_VALIDATION
+ bool "Enable .sframe section debugging"
+ depends on HAVE_UNWIND_USER_SFRAME
+ depends on DYNAMIC_DEBUG
+ help
+ When adding an .sframe section for a task, validate the entire
+ section immediately rather than on demand.
+
+ This is a debug feature which is helpful for rooting out .sframe
+ section issues. If the .sframe section is corrupt, it will fail to
+ load immediately, with more information provided in dynamic printks.
+
+ This has a significant page cache footprint because it reads the
+ entire .sframe section for every loaded executable and shared
+ library. Also, it's done for all processes, even those which don't
+ get stack traced by the kernel. Not recommended for general use.
+
+ If unsure, say N.
+
config HAVE_PERF_REGS
bool
help
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 66d3ba3c8389..79ff3c0fc11f 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -319,6 +319,98 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
return ret;
}
+#ifdef CONFIG_SFRAME_VALIDATION
+
+static int safe_read_fde(struct sframe_section *sec,
+ unsigned int fde_num, struct sframe_fde *fde)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fde(sec, fde_num, fde);
+ user_read_access_end();
+ return ret;
+}
+
+static int safe_read_fre(struct sframe_section *sec,
+ struct sframe_fde *fde, unsigned long fre_addr,
+ struct sframe_fre *fre)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fre(sec, fde, fre_addr, fre);
+ user_read_access_end();
+ return ret;
+}
+
+static int sframe_validate_section(struct sframe_section *sec)
+{
+ unsigned long prev_ip = 0;
+ unsigned int i;
+
+ for (i = 0; i < sec->num_fdes; i++) {
+ struct sframe_fre *fre, *prev_fre = NULL;
+ unsigned long ip, fre_addr;
+ struct sframe_fde fde;
+ struct sframe_fre fres[2];
+ bool which = false;
+ unsigned int j;
+ int ret;
+
+ ret = safe_read_fde(sec, i, &fde);
+ if (ret)
+ return ret;
+
+ ip = sec->sframe_start + fde.start_addr;
+ if (ip <= prev_ip) {
+ dbg_sec("fde %u not sorted\n", i);
+ return -EFAULT;
+ }
+ prev_ip = ip;
+
+ fre_addr = sec->fres_start + fde.fres_off;
+ for (j = 0; j < fde.fres_num; j++) {
+ int ret;
+
+ fre = which ? fres : fres + 1;
+ which = !which;
+
+ ret = safe_read_fre(sec, &fde, fre_addr, fre);
+ if (ret) {
+ dbg_sec("fde %u: __read_fre(%u) failed\n", i, j);
+ dbg_sec("FDE: start_addr:0x%x func_size:0x%x fres_off:0x%x fres_num:%d info:%u rep_size:%u\n",
+ fde.start_addr, fde.func_size,
+ fde.fres_off, fde.fres_num,
+ fde.info, fde.rep_size);
+ return ret;
+ }
+
+ fre_addr += fre->size;
+
+ if (prev_fre && fre->ip_off <= prev_fre->ip_off) {
+ dbg_sec("fde %u: fre %u not sorted\n", i, j);
+ return -EFAULT;
+ }
+
+ prev_fre = fre;
+ }
+ }
+
+ return 0;
+}
+
+#else /* !CONFIG_SFRAME_VALIDATION */
+
+static int sframe_validate_section(struct sframe_section *sec) { return 0; }
+
+#endif /* !CONFIG_SFRAME_VALIDATION */
+
+
static void free_section(struct sframe_section *sec)
{
dbg_free(sec);
@@ -427,6 +519,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
goto err_free;
}
+ ret = sframe_validate_section(sec);
+ if (ret)
+ goto err_free;
+
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
--
2.50.1
* [PATCH v10 11/11] [DO NOT APPLY]unwind_user/sframe: Add prctl() interface for registering .sframe sections
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
` (9 preceding siblings ...)
2025-08-27 20:15 ` [PATCH v10 10/11] unwind_user/sframe: Add .sframe validation option Steven Rostedt
@ 2025-08-27 20:15 ` Steven Rostedt
10 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-27 20:15 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell
From: Josh Poimboeuf <jpoimboe@kernel.org>
The kernel doesn't have direct visibility into the ELF contents of shared
libraries. Add some prctl() interfaces which allow glibc to tell the
kernel where to find .sframe sections.
[
This adds a prctl() interface for testing the loading of sframes for
libraries. But this interface should really be a system call. This patch
is for testing purposes only and should not be applied to mainline.
]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/uapi/linux/prctl.h | 6 +++++-
kernel/sys.c | 9 +++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index ed3aed264aeb..b807baa8a53b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -358,7 +358,7 @@ struct prctl_mm_map {
* configuration. All bits may be locked via this call, including
* undefined bits.
*/
-#define PR_LOCK_SHADOW_STACK_STATUS 76
+#define PR_LOCK_SHADOW_STACK_STATUS 76
/*
* Controls the mode of timer_create() for CRIU restore operations.
@@ -376,4 +376,8 @@ struct prctl_mm_map {
# define PR_FUTEX_HASH_SET_SLOTS 1
# define PR_FUTEX_HASH_GET_SLOTS 2
+/* SFRAME management */
+#define PR_ADD_SFRAME 79
+#define PR_REMOVE_SFRAME 80
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 1e28b40053ce..e6ce79a3a7aa 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -65,6 +65,7 @@
#include <linux/rcupdate.h>
#include <linux/uidgid.h>
#include <linux/cred.h>
+#include <linux/sframe.h>
#include <linux/nospec.h>
@@ -2805,6 +2806,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_FUTEX_HASH:
error = futex_hash_prctl(arg2, arg3, arg4);
break;
+ case PR_ADD_SFRAME:
+ error = sframe_add_section(arg2, arg3, arg4, arg5);
+ break;
+ case PR_REMOVE_SFRAME:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = sframe_remove_section(arg2);
+ break;
default:
trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5);
error = -EINVAL;
--
2.50.1
* Re: [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree
2025-08-27 20:15 ` [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree Steven Rostedt
@ 2025-08-28 1:46 ` Liam R. Howlett
2025-08-28 14:28 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Liam R. Howlett @ 2025-08-28 1:46 UTC
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, bpf, x86, Masami Hiramatsu,
Mathieu Desnoyers, Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Jens Remus, Linus Torvalds, Andrew Morton,
Florian Weimer, Sam James, Kees Cook, Carlos O'Donell,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm
* Steven Rostedt <rostedt@kernel.org> [250827 16:24]:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
>
> Associate an sframe section with its mm by adding it to a per-mm maple
> tree which is indexed by the corresponding text address range. A single
> sframe section can be associated with multiple text ranges.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: x86@kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> arch/x86/include/asm/mmu.h | 2 +-
> include/linux/mm_types.h | 3 +++
> include/linux/sframe.h | 13 +++++++++
> kernel/fork.c | 10 +++++++
> kernel/unwind/sframe.c | 55 +++++++++++++++++++++++++++++++++++---
> mm/init-mm.c | 2 ++
> 6 files changed, 81 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> index 0fe9c569d171..227a32899a59 100644
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -87,7 +87,7 @@ typedef struct {
> .context = { \
> .ctx_id = 1, \
> .lock = __MUTEX_INITIALIZER(mm.context.lock), \
> - }
> + },
>
> void leave_mm(void);
> #define leave_mm leave_mm
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 08bc2442db93..31fbd6663047 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1210,6 +1210,9 @@ struct mm_struct {
> #ifdef CONFIG_MM_ID
> mm_id_t mm_id;
> #endif /* CONFIG_MM_ID */
> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
> + struct maple_tree sframe_mt;
> +#endif
> } __randomize_layout;
>
> /*
> diff --git a/include/linux/sframe.h b/include/linux/sframe.h
> index 0584f661f698..73bf6f0b30c2 100644
> --- a/include/linux/sframe.h
> +++ b/include/linux/sframe.h
> @@ -22,18 +22,31 @@ struct sframe_section {
> signed char fp_off;
> };
>
> +#define INIT_MM_SFRAME .sframe_mt = MTREE_INIT(sframe_mt, 0),
> +extern void sframe_free_mm(struct mm_struct *mm);
> +
> extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> unsigned long text_start, unsigned long text_end);
> extern int sframe_remove_section(unsigned long sframe_addr);
>
> +static inline bool current_has_sframe(void)
> +{
> + struct mm_struct *mm = current->mm;
> +
> + return mm && !mtree_empty(&mm->sframe_mt);
> +}
> +
> #else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
>
> +#define INIT_MM_SFRAME
> +static inline void sframe_free_mm(struct mm_struct *mm) {}
> static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> unsigned long text_start, unsigned long text_end)
> {
> return -ENOSYS;
> }
> static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
> +static inline bool current_has_sframe(void) { return false; }
>
> #endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index af673856499d..496781b389bc 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -106,6 +106,7 @@
> #include <linux/pidfs.h>
> #include <linux/tick.h>
> #include <linux/unwind_deferred.h>
> +#include <linux/sframe.h>
>
> #include <asm/pgalloc.h>
> #include <linux/uaccess.h>
> @@ -690,6 +691,7 @@ void __mmdrop(struct mm_struct *mm)
> mm_destroy_cid(mm);
> percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
> futex_hash_free(mm);
> + sframe_free_mm(mm);
>
> free_mm(mm);
> }
> @@ -1027,6 +1029,13 @@ static void mmap_init_lock(struct mm_struct *mm)
> #endif
> }
>
> +static void mm_init_sframe(struct mm_struct *mm)
> +{
> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
> + mt_init(&mm->sframe_mt);
> +#endif
> +}
> +
> static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> struct user_namespace *user_ns)
> {
> @@ -1055,6 +1064,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> mm->pmd_huge_pte = NULL;
> #endif
> mm_init_uprobes_state(mm);
> + mm_init_sframe(mm);
> hugetlb_count_init(mm);
>
> if (current->mm) {
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> index 20287f795b36..fa7d87ffd00a 100644
> --- a/kernel/unwind/sframe.c
> +++ b/kernel/unwind/sframe.c
> @@ -122,15 +122,64 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> if (ret)
> goto err_free;
>
> - /* TODO nowhere to store it yet - just free it and return an error */
> - ret = -ENOSYS;
> + ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
> + if (ret) {
> + dbg("mtree_insert_range failed: text=%lx-%lx\n",
> + sec->text_start, sec->text_end);
> + goto err_free;
> + }
> +
> + return 0;
>
> err_free:
> free_section(sec);
> return ret;
> }
>
> +static int __sframe_remove_section(struct mm_struct *mm,
> + struct sframe_section *sec)
> +{
> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> + dbg("mtree_erase failed: text=%lx\n", sec->text_start);
> + return -EINVAL;
> + }
> +
> + free_section(sec);
> +
> + return 0;
> +}
> +
> int sframe_remove_section(unsigned long sframe_start)
> {
> - return -ENOSYS;
> + struct mm_struct *mm = current->mm;
> + struct sframe_section *sec;
> + unsigned long index = 0;
> + bool found = false;
> + int ret = 0;
> +
> + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
> + if (sec->sframe_start == sframe_start) {
> + found = true;
> + ret |= __sframe_remove_section(mm, sec);
> + }
> + }
If you use the advanced interface you have to handle the locking, but it
will be faster. I'm not sure how frequently you loop across many entries,
but you can do something like:
MA_STATE(mas, &mm->sframe_mt, index, index);
mas_lock(&mas);
mas_for_each(&mas, sec, ULONG_MAX) {
...
}
mas_unlock(&mas);
The maple state contains memory addresses of internal nodes, so you
cannot just edit the tree without it being either unlocked (which
negates the gains you would have) or by using it in the modification.
This seems like a good choice considering the __sframe_remove_section()
is called from only one place. You can pass the struct ma_state through
to the remove function and use it with mas_erase().
Actually, reading it again, why are you starting a search at 0? And
why are you deleting everything after the sframe_start to ULONG_MAX?
This seems incorrect. Can you explain your plan a bit here?
> +
> + if (!found || ret)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +void sframe_free_mm(struct mm_struct *mm)
> +{
> + struct sframe_section *sec;
> + unsigned long index = 0;
> +
> + if (!mm)
> + return;
> +
> + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
> + free_section(sec);
> +
> + mtree_destroy(&mm->sframe_mt);
The same goes for this function. mt_for_each will start at the top of
the tree, lock, find your result, unlock. Each search starts from the
top of tree because it was unlocked. In the mas_ functions, the tree is
iterated in place which can be significantly faster depending on the
tree size.
Since you are not going to edit the tree you can use a maple state:
struct sframe_section *sec;
MA_STATE(mas, &mm->sframe_mt, 0, 0);
mas_lock(&mas);
mas_for_each(&mas, sec, ULONG_MAX)
free_section(sec);
mas_unlock(&mas);
mtree_destroy(&mm->sframe_mt);
> }
> diff --git a/mm/init-mm.c b/mm/init-mm.c
> index 4600e7605cab..b32fcf167cc2 100644
> --- a/mm/init-mm.c
> +++ b/mm/init-mm.c
> @@ -11,6 +11,7 @@
> #include <linux/atomic.h>
> #include <linux/user_namespace.h>
> #include <linux/iommu.h>
> +#include <linux/sframe.h>
> #include <asm/mmu.h>
>
> #ifndef INIT_MM_CONTEXT
> @@ -46,6 +47,7 @@ struct mm_struct init_mm = {
> .user_ns = &init_user_ns,
> .cpu_bitmap = CPU_BITS_NONE,
> INIT_MM_CONTEXT(init_mm)
> + INIT_MM_SFRAME
> };
>
> void setup_initial_init_mm(void *start_code, void *end_code,
> --
> 2.50.1
>
>
* Re: [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree
2025-08-28 1:46 ` Liam R. Howlett
@ 2025-08-28 14:28 ` Steven Rostedt
2025-08-28 15:27 ` Liam R. Howlett
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2025-08-28 14:28 UTC
To: Liam R. Howlett
Cc: Steven Rostedt, linux-kernel, linux-trace-kernel, bpf, x86,
Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm
On Wed, 27 Aug 2025 21:46:01 -0400
"Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> > int sframe_remove_section(unsigned long sframe_start)
> > {
> > - return -ENOSYS;
> > + struct mm_struct *mm = current->mm;
> > + struct sframe_section *sec;
> > + unsigned long index = 0;
> > + bool found = false;
> > + int ret = 0;
> > +
> > + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
> > + if (sec->sframe_start == sframe_start) {
> > + found = true;
> > + ret |= __sframe_remove_section(mm, sec);
> > + }
> > + }
>
Josh should be able to answer this better than I can, as he wrote it, and
I'm not too familiar with how to use maple trees (I'm reading the documentation
now).
> If you use the advanced interface you have to handle the locking, but it
> will be faster. I'm not sure how frequently you loop across many entries,
> but you can do something like:
>
> MA_STATE(mas, &mm->sframe_mt, index, index);
>
> mas_lock(&mas);
> mas_for_each(&mas, sec, ULONG_MAX) {
> ...
> }
> mas_unlock(&mas);
>
> The maple state contains memory addresses of internal nodes, so you
> cannot just edit the tree without it being either unlocked (which
> negates the gains you would have) or by using it in the modification.
>
> This seems like a good choice considering the __sframe_remove_section()
> is called from only one place. You can pass the struct ma_state through
> to the remove function and use it with mas_erase().
>
> Actually, reading it again, why are you starting a search at 0? And
> why are you deleting everything after the sframe_start to ULONG_MAX?
> This seems incorrect. Can you explain your plan a bit here?
Let me give a brief overview of how and why maple trees are used for
sframes:
The sframe section is mapped into the user address space from the ELF file
when the application starts. The dynamic library loader could also do a
system call to tell the kernel where the sframe is for some dynamically
loaded code. Since there can be more than one text section that has an
sframe associated with it, the mm->sframe_mt maps each range of text to
its corresponding sframe section. That is, there's one sframe
section for the code that was loaded during exec(), and then there may be a
separate sframe section for every library that is loaded. Note, it is
possible that the same sframe section may cover more than one range of text.
When doing stack walking, the instruction pointer is used as the key in the
maple tree to find its corresponding sframe section.
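To make the lookup concrete, here is a userspace model, not the maple tree itself. Each tree entry is a non-overlapping text range pointing at its section, and the instruction pointer selects the entry whose range contains it. `struct section_model`, `struct range_model` and `lookup_section()` are hypothetical names. A sorted array searched with binary search stands in for the maple tree's range lookup; in the kernel this corresponds to an `mtree_load()` on `mm->sframe_mt`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sframe_section. */
struct section_model {
	unsigned long sframe_start;
};

/*
 * One text range mapped to its section, like one entry stored by
 * mtree_insert_range(). text_end is treated as inclusive, matching how
 * the patch passes text_end as the "last" index of the range.
 */
struct range_model {
	unsigned long text_start;
	unsigned long text_end;
	struct section_model *sec;
};

/*
 * Find the section whose text range contains ip. The ranges must be
 * sorted by text_start and must not overlap. Returns NULL if no range
 * covers ip.
 */
static struct section_model *lookup_section(const struct range_model *r,
					    size_t n, unsigned long ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < r[mid].text_start)
			hi = mid;		/* ip is below this range */
		else if (ip > r[mid].text_end)
			lo = mid + 1;		/* ip is above this range */
		else
			return r[mid].sec;	/* ip falls inside the range */
	}
	return NULL;
}
```

An IP that lands in a gap between ranges (for example, code without an sframe section) returns NULL. The unwinder can then fall back to another method, such as frame pointers.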
Now, if the sframe is determined to be corrupted, it must be removed from
the current->mm->sframe_mt. It is also removed when the dynamic loader
unloads the text that it describes.
I'm guessing that the 0 to ULONG_MAX is to simply find and remove all the
associated sframe sections, as there may be more than one text range that a
single sframe section covers.
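A userspace model of that removal follows. A sorted array stands in for `mm->sframe_mt`, and `struct section_model`, `struct range_model` and `remove_by_sframe_start()` are hypothetical names, not kernel APIs. Several entries may belong to sections registered from the same `sframe_start`, so the whole key space is scanned rather than stopping at the first match. Removals go through the same cursor that drives the scan. This is the property Liam describes for passing the iterating maple state to `mas_erase()`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sframe_section. */
struct section_model {
	unsigned long sframe_start;
};

/* Hypothetical stand-in for one text-range entry in mm->sframe_mt. */
struct range_model {
	unsigned long text_start;
	unsigned long text_end;
	struct section_model *sec;
};

/*
 * Remove every range whose section was registered with sframe_start,
 * scanning the whole array once (the analogue of iterating 0..ULONG_MAX).
 * Kept entries are compacted in place through a write cursor, so no entry
 * is skipped or visited twice after a removal. Returns the number of
 * entries removed and updates *n to the new count.
 */
static size_t remove_by_sframe_start(struct range_model *r, size_t *n,
				     unsigned long sframe_start)
{
	size_t rd, wr = 0, removed = 0;

	for (rd = 0; rd < *n; rd++) {
		if (r[rd].sec->sframe_start == sframe_start) {
			removed++;		/* drop this entry */
			continue;
		}
		r[wr++] = r[rd];		/* keep it, preserving order */
	}
	*n = wr;
	return removed;
}
```

A caller would map a return value of 0 to -EINVAL, just as the patch returns -EINVAL when `found` is still false.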
Does this make sense?
Thanks for reviewing!
-- Steve
>
> > +
> > + if (!found || ret)
> > + return -EINVAL;
> > +
> > + return 0;
> > +}
> > +
> > +void sframe_free_mm(struct mm_struct *mm)
> > +{
> > + struct sframe_section *sec;
> > + unsigned long index = 0;
> > +
> > + if (!mm)
> > + return;
> > +
> > + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
> > + free_section(sec);
> > +
> > + mtree_destroy(&mm->sframe_mt);
>
* Re: [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree
2025-08-28 14:28 ` Steven Rostedt
@ 2025-08-28 15:27 ` Liam R. Howlett
2025-08-28 15:51 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Liam R. Howlett @ 2025-08-28 15:27 UTC
To: Steven Rostedt
Cc: Steven Rostedt, linux-kernel, linux-trace-kernel, bpf, x86,
Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm
* Steven Rostedt <rostedt@goodmis.org> [250828 10:28]:
> On Wed, 27 Aug 2025 21:46:01 -0400
> "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
>
> > > int sframe_remove_section(unsigned long sframe_start)
> > > {
> > > - return -ENOSYS;
> > > + struct mm_struct *mm = current->mm;
> > > + struct sframe_section *sec;
> > > + unsigned long index = 0;
> > > + bool found = false;
> > > + int ret = 0;
> > > +
> > > + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
> > > + if (sec->sframe_start == sframe_start) {
> > > + found = true;
> > > + ret |= __sframe_remove_section(mm, sec);
> > > + }
> > > + }
> >
>
> Josh should be able to answer this better than I can, as he wrote it, and
> I'm not too familiar with how to use maple trees (I'm reading the documentation
> now).
>
> > If you use the advanced interface you have to handle the locking, but it
> > will be faster. I'm not sure how frequently you loop across many entries,
> > but you can do something like:
> >
> > MA_STATE(mas, &mm->sframe_mt, index, index);
> >
> > mas_lock(&mas);
> > mas_for_each(&mas, sec, ULONG_MAX) {
> > ...
> > }
> > mas_unlock(&mas);
> >
> > The maple state contains memory addresses of internal nodes, so you
> > cannot just edit the tree without it being either unlocked (which
> > negates the gains you would have) or by using it in the modification.
> >
> > This seems like a good choice considering the __sframe_remove_section()
> > is called from only one place. You can pass the struct ma_state through
> > to the remove function and use it with mas_erase().
> >
> > Actually, reading it again, why are you starting a search at 0? And
> > why are you deleting everything after the sframe_start to ULONG_MAX?
> > This seems incorrect. Can you explain your plan a bit here?
>
> Let me give a brief overview of how and why maple trees are used for
> sframes:
>
> The sframe section is mapped into the user address space from the ELF file
> when the application starts. The dynamic library loader could also do a
> system call to tell the kernel where the sframe is for some dynamically
> loaded code. Since there can be more than one text section that has an
> sframe associated with it, the mm->sframe_mt maps each range of text to
> its corresponding sframe section. That is, there's one sframe
> section for the code that was loaded during exec(), and then there may be a
> separate sframe section for every library that is loaded. Note, it is
> possible that the same sframe section may cover more than one range of text.
>
> When doing stack walking, the instruction pointer is used as the key in the
> maple tree to find its corresponding sframe section.
>
> Now, if the sframe is determined to be corrupted, it must be removed from
> the current->mm->sframe_mt. It is also removed when the dynamic loader
> unloads the text that it describes.
>
> I'm guessing that the 0 to ULONG_MAX is to simply find and remove all the
> associated sframe sections, as there may be more than one text range that a
> single sframe section covers.
>
> Does this make sense?
>
Perhaps it's the corruption part that I'm missing here. If the sframe
is corrupt, you are iterating over all elements and checking the start
address passed in against the section start.
So if the section is corrupted then how can we depend on the
sec->sframe_start?
And is the maple tree corrupted? I mean, the mapping from sframe_start
to sec is still reliable, right?
Looking at the storing code, you store text_start - text_end to sec,
presumably the text_start cannot be smaller than the sframe_start?
Thanks,
Liam
* Re: [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree
2025-08-28 15:27 ` Liam R. Howlett
@ 2025-08-28 15:51 ` Steven Rostedt
0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2025-08-28 15:51 UTC
To: Liam R. Howlett
Cc: Steven Rostedt, linux-kernel, linux-trace-kernel, bpf, x86,
Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Jens Remus, Linus Torvalds,
Andrew Morton, Florian Weimer, Sam James, Kees Cook,
Carlos O'Donell, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm
On Thu, 28 Aug 2025 11:27:00 -0400
"Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> > Does this make sense?
> >
>
> Perhaps it's the corruption part that I'm missing here. If the sframe
> is corrupt, you are iterating over all elements and checking the start
> address passed in against the section start.
>
> So if the section is corrupted then how can we depend on the
> sec->sframe_start?
>
> And is the maple tree corrupted? I mean, the mapping from sframe_start
> to sec is still reliable, right?
>
> Looking at the storing code, you store text_start - text_end to sec,
> presumably the text_start cannot be smaller than the sframe_start?
Sorry, that's not what gets corrupted. I should have expanded on it.
The sframe section consists of two tables that describe how to get the return
address from text locations, much like how ORC works in the kernel. We get
a start and end address of where the sframe exists (that has the two
tables) and a start and end address of the text it represents.
When I said "corrupted", I meant that the sframe tables are created entirely
by user space and cannot be trusted. While reading the sframe tables, if
any anomaly is found, the section is considered "corrupted". So no, the
start and end addresses of the sframe section and of the text it covers are
not the issue; those should be validated up front (I need to check that we do ;-).
But once we start reading the sframe tables, they could hold garbage, or
have something in there that the kernel doesn't support. As soon as that is
detected, it gets removed so that it isn't looked at again.
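The up-front check of the four addresses that Steve mentions might look something like the sketch below. This is a hypothetical helper, not code from the series. It assumes both ranges are half-open `[start, end)`, and `user_limit` stands in for the top of the user address space (for example `TASK_SIZE` in the kernel):

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical up-front check of the addresses handed to
 * sframe_add_section(). Both ranges are assumed to be half-open
 * [start, end); user_limit stands in for the top of user space.
 * Returns 0 if the bounds look sane, -EINVAL otherwise.
 */
static int validate_section_bounds(unsigned long sframe_start,
				   unsigned long sframe_end,
				   unsigned long text_start,
				   unsigned long text_end,
				   unsigned long user_limit)
{
	/* Both ranges must be non-empty and correctly ordered. */
	if (sframe_start >= sframe_end || text_start >= text_end)
		return -EINVAL;

	/* Both ranges must lie entirely below the user address limit. */
	if (sframe_end > user_limit || text_end > user_limit)
		return -EINVAL;

	return 0;
}
```

This covers only the bounds, which come from the loader. The table contents would still need to be checked as they are read, as described above.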
-- Steve
Thread overview: 16+ messages
2025-08-27 20:15 [PATCH v10 00/11] unwind_deferred: Implement sframe handling Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 01/11] unwind_user/sframe: Add support for reading .sframe headers Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 02/11] unwind_user/sframe: Store sframe section data in per-mm maple tree Steven Rostedt
2025-08-28 1:46 ` Liam R. Howlett
2025-08-28 14:28 ` Steven Rostedt
2025-08-28 15:27 ` Liam R. Howlett
2025-08-28 15:51 ` Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 03/11] x86/uaccess: Add unsafe_copy_from_user() implementation Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 04/11] unwind_user/sframe: Add support for reading .sframe contents Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 05/11] unwind_user/sframe: Detect .sframe sections in executables Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 06/11] unwind_user/sframe: Wire up unwind_user to sframe Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 07/11] unwind_user/sframe/x86: Enable sframe unwinding on x86 Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 08/11] unwind_user/sframe: Remove .sframe section on detected corruption Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 09/11] unwind_user/sframe: Show file name in debug output Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 10/11] unwind_user/sframe: Add .sframe validation option Steven Rostedt
2025-08-27 20:15 ` [PATCH v10 11/11] [DO NOT APPLY]unwind_user/sframe: Add prctl() interface for registering .sframe sections Steven Rostedt