* [PATCH v8 01/27] x86/hw_breakpoint: Unify breakpoint install/uninstall
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
@ 2025-11-10 16:35 ` Jinchao Wang
2025-11-10 16:35 ` [PATCH v8 02/27] x86/hw_breakpoint: Add arch_reinstall_hw_breakpoint Jinchao Wang
` (26 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:35 UTC
To: Andrew Morton, Masami Hiramatsu (Google), Peter Zijlstra,
Randy Dunlap, Marco Elver, Mike Rapoport, Alexander Potapenko,
Adrian Hunter, Alexander Shishkin, Alice Ryhl, Andrey Konovalov,
Andrey Ryabinin, Andrii Nakryiko, Ard Biesheuvel,
Arnaldo Carvalho de Melo, Ben Segall, Bill Wendling,
Borislav Petkov, Catalin Marinas, Dave Hansen, David Hildenbrand,
David Kaplan, David S. Miller, Dietmar Eggemann, Dmitry Vyukov,
H. Peter Anvin, Ian Rogers, Ingo Molnar, James Clark,
Jinchao Wang, Jinjie Ruan, Jiri Olsa, Jonathan Corbet, Juri Lelli,
Justin Stitt, kasan-dev, Kees Cook, Liam R. Howlett, Liang Kan,
Linus Walleij, linux-arm-kernel, linux-doc, linux-kernel,
linux-mm, linux-perf-users, linux-trace-kernel, llvm,
Lorenzo Stoakes, Mark Rutland, Masahiro Yamada, Mathieu Desnoyers,
Mel Gorman, Michal Hocko, Miguel Ojeda, Nam Cao, Namhyung Kim,
Nathan Chancellor, Naveen N Rao, Nick Desaulniers, Rong Xu,
Sami Tolvanen, Steven Rostedt, Suren Baghdasaryan,
Thomas Gleixner, Thomas Weißschuh, Valentin Schneider,
Vincent Guittot, Vincenzo Frascino, Vlastimil Babka, Will Deacon,
workflows, x86
Consolidate breakpoint install and uninstall into shared helpers to reduce
code duplication.
The diffstat is misleading here (it shows more insertions than deletions),
so the stripped code size is compared instead: after refactoring, it is
reduced from 11976 bytes to 11448 bytes on my x86_64 system built with clang.
This also makes it easier to introduce arch_reinstall_hw_breakpoint().
In addition, include linux/types.h to fix a missing build dependency.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
arch/x86/include/asm/hw_breakpoint.h | 6 ++
arch/x86/kernel/hw_breakpoint.c | 141 +++++++++++++++------------
2 files changed, 84 insertions(+), 63 deletions(-)
diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h
index 0bc931cd0698..aa6adac6c3a2 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -5,6 +5,7 @@
#include <uapi/asm/hw_breakpoint.h>
#define __ARCH_HW_BREAKPOINT_H
+#include <linux/types.h>
/*
* The name should probably be something dealt in
@@ -18,6 +19,11 @@ struct arch_hw_breakpoint {
u8 type;
};
+enum bp_slot_action {
+ BP_SLOT_ACTION_INSTALL,
+ BP_SLOT_ACTION_UNINSTALL,
+};
+
#include <linux/kdebug.h>
#include <linux/percpu.h>
#include <linux/list.h>
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index b01644c949b2..3658ace4bd8d 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -48,7 +48,6 @@ static DEFINE_PER_CPU(unsigned long, cpu_debugreg[HBP_NUM]);
*/
static DEFINE_PER_CPU(struct perf_event *, bp_per_reg[HBP_NUM]);
-
static inline unsigned long
__encode_dr7(int drnum, unsigned int len, unsigned int type)
{
@@ -85,96 +84,112 @@ int decode_dr7(unsigned long dr7, int bpnum, unsigned *len, unsigned *type)
}
/*
- * Install a perf counter breakpoint.
- *
- * We seek a free debug address register and use it for this
- * breakpoint. Eventually we enable it in the debug control register.
- *
- * Atomic: we hold the counter->ctx->lock and we only handle variables
- * and registers local to this cpu.
+ * We seek a slot and change it or keep it based on the action.
+ * Returns slot number on success, negative error on failure.
+ * Must be called with IRQs disabled.
*/
-int arch_install_hw_breakpoint(struct perf_event *bp)
+static int manage_bp_slot(struct perf_event *bp, enum bp_slot_action action)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
- unsigned long *dr7;
- int i;
-
- lockdep_assert_irqs_disabled();
+ struct perf_event *old_bp;
+ struct perf_event *new_bp;
+ int slot;
+
+ switch (action) {
+ case BP_SLOT_ACTION_INSTALL:
+ old_bp = NULL;
+ new_bp = bp;
+ break;
+ case BP_SLOT_ACTION_UNINSTALL:
+ old_bp = bp;
+ new_bp = NULL;
+ break;
+ default:
+ return -EINVAL;
+ }
- for (i = 0; i < HBP_NUM; i++) {
- struct perf_event **slot = this_cpu_ptr(&bp_per_reg[i]);
+ for (slot = 0; slot < HBP_NUM; slot++) {
+ struct perf_event **curr = this_cpu_ptr(&bp_per_reg[slot]);
- if (!*slot) {
- *slot = bp;
- break;
+ if (*curr == old_bp) {
+ *curr = new_bp;
+ return slot;
}
}
- if (WARN_ONCE(i == HBP_NUM, "Can't find any breakpoint slot"))
- return -EBUSY;
+ if (old_bp) {
+ WARN_ONCE(1, "Can't find matching breakpoint slot");
+ return -EINVAL;
+ }
+
+ WARN_ONCE(1, "No free breakpoint slots");
+ return -EBUSY;
+}
+
+static void setup_hwbp(struct arch_hw_breakpoint *info, int slot, bool enable)
+{
+ unsigned long dr7;
- set_debugreg(info->address, i);
- __this_cpu_write(cpu_debugreg[i], info->address);
+ set_debugreg(info->address, slot);
+ __this_cpu_write(cpu_debugreg[slot], info->address);
- dr7 = this_cpu_ptr(&cpu_dr7);
- *dr7 |= encode_dr7(i, info->len, info->type);
+ dr7 = this_cpu_read(cpu_dr7);
+ if (enable)
+ dr7 |= encode_dr7(slot, info->len, info->type);
+ else
+ dr7 &= ~__encode_dr7(slot, info->len, info->type);
/*
- * Ensure we first write cpu_dr7 before we set the DR7 register.
- * This ensures an NMI never see cpu_dr7 0 when DR7 is not.
+ * Enabling:
+ * Ensure we first write cpu_dr7 before we set the DR7 register.
+ * This ensures an NMI never sees cpu_dr7 0 when DR7 is not.
*/
+ if (enable)
+ this_cpu_write(cpu_dr7, dr7);
+
barrier();
- set_debugreg(*dr7, 7);
+ set_debugreg(dr7, 7);
+
if (info->mask)
- amd_set_dr_addr_mask(info->mask, i);
+ amd_set_dr_addr_mask(enable ? info->mask : 0, slot);
- return 0;
+ /*
+ * Disabling:
+ * Ensure the write to cpu_dr7 is after we've set the DR7 register.
+ * This ensures an NMI never sees cpu_dr7 0 when DR7 is not.
+ */
+ if (!enable)
+ this_cpu_write(cpu_dr7, dr7);
}
/*
- * Uninstall the breakpoint contained in the given counter.
- *
- * First we search the debug address register it uses and then we disable
- * it.
- *
- * Atomic: we hold the counter->ctx->lock and we only handle variables
- * and registers local to this cpu.
+ * Find a suitable breakpoint slot and set it up based on the action.
*/
-void arch_uninstall_hw_breakpoint(struct perf_event *bp)
+static int arch_manage_bp(struct perf_event *bp, enum bp_slot_action action)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
- unsigned long dr7;
- int i;
+ struct arch_hw_breakpoint *info;
+ int slot;
lockdep_assert_irqs_disabled();
- for (i = 0; i < HBP_NUM; i++) {
- struct perf_event **slot = this_cpu_ptr(&bp_per_reg[i]);
-
- if (*slot == bp) {
- *slot = NULL;
- break;
- }
- }
-
- if (WARN_ONCE(i == HBP_NUM, "Can't find any breakpoint slot"))
- return;
+ slot = manage_bp_slot(bp, action);
+ if (slot < 0)
+ return slot;
- dr7 = this_cpu_read(cpu_dr7);
- dr7 &= ~__encode_dr7(i, info->len, info->type);
+ info = counter_arch_bp(bp);
+ setup_hwbp(info, slot, action != BP_SLOT_ACTION_UNINSTALL);
- set_debugreg(dr7, 7);
- if (info->mask)
- amd_set_dr_addr_mask(0, i);
+ return 0;
+}
- /*
- * Ensure the write to cpu_dr7 is after we've set the DR7 register.
- * This ensures an NMI never see cpu_dr7 0 when DR7 is not.
- */
- barrier();
+int arch_install_hw_breakpoint(struct perf_event *bp)
+{
+ return arch_manage_bp(bp, BP_SLOT_ACTION_INSTALL);
+}
- this_cpu_write(cpu_dr7, dr7);
+void arch_uninstall_hw_breakpoint(struct perf_event *bp)
+{
+ arch_manage_bp(bp, BP_SLOT_ACTION_UNINSTALL);
}
static int arch_bp_generic_len(int x86_len)
--
2.43.0
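The slot search that patch 01 unifies can be modeled in userspace. The following is a minimal sketch, not the kernel code: a plain array stands in for the per-CPU bp_per_reg[] slots, and every sketch_* name is hypothetical. It only illustrates the idea that install looks for a slot holding NULL and uninstall looks for the slot holding the breakpoint, so one loop can serve both actions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_HBP_NUM 4	/* x86 provides four debug address registers */

enum sketch_action { SKETCH_INSTALL, SKETCH_UNINSTALL };

/* Stand-in for the per-CPU bp_per_reg[] array; NULL marks a free slot. */
static const void *sketch_slots[SKETCH_HBP_NUM];

/* Returns the slot index on success or a negative errno on failure. */
static int sketch_manage_slot(const void *bp, enum sketch_action action)
{
	const void *old_bp, *new_bp;
	int slot;

	switch (action) {
	case SKETCH_INSTALL:	/* find a free slot and claim it */
		old_bp = NULL;
		new_bp = bp;
		break;
	case SKETCH_UNINSTALL:	/* find our slot and release it */
		old_bp = bp;
		new_bp = NULL;
		break;
	default:
		return -EINVAL;
	}

	for (slot = 0; slot < SKETCH_HBP_NUM; slot++) {
		if (sketch_slots[slot] == old_bp) {
			sketch_slots[slot] = new_bp;
			return slot;
		}
	}

	/* No match: either the breakpoint was never installed or all slots are busy. */
	return old_bp ? -EINVAL : -EBUSY;
}
```

In this model an install on a full table yields -EBUSY, while an uninstall of a breakpoint that holds no slot yields -EINVAL, mirroring the two failure paths in the patch.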
* [PATCH v8 02/27] x86/hw_breakpoint: Add arch_reinstall_hw_breakpoint
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
2025-11-10 16:35 ` [PATCH v8 01/27] x86/hw_breakpoint: Unify breakpoint install/uninstall Jinchao Wang
@ 2025-11-10 16:35 ` Jinchao Wang
2025-11-10 16:35 ` [PATCH v8 03/27] HWBP: Add modify_wide_hw_breakpoint_local() API Jinchao Wang
` (25 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:35 UTC
The new arch_reinstall_hw_breakpoint() function can be used in an
atomic context, unlike the more expensive free and re-allocation path.
This allows callers to efficiently re-establish an existing breakpoint.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
arch/x86/include/asm/hw_breakpoint.h | 2 ++
arch/x86/kernel/hw_breakpoint.c | 9 +++++++++
2 files changed, 11 insertions(+)
diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h
index aa6adac6c3a2..c22cc4e87fc5 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -21,6 +21,7 @@ struct arch_hw_breakpoint {
enum bp_slot_action {
BP_SLOT_ACTION_INSTALL,
+ BP_SLOT_ACTION_REINSTALL,
BP_SLOT_ACTION_UNINSTALL,
};
@@ -65,6 +66,7 @@ extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
int arch_install_hw_breakpoint(struct perf_event *bp);
+int arch_reinstall_hw_breakpoint(struct perf_event *bp);
void arch_uninstall_hw_breakpoint(struct perf_event *bp);
void hw_breakpoint_pmu_read(struct perf_event *bp);
void hw_breakpoint_pmu_unthrottle(struct perf_event *bp);
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 3658ace4bd8d..29c9369264d4 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -99,6 +99,10 @@ static int manage_bp_slot(struct perf_event *bp, enum bp_slot_action action)
old_bp = NULL;
new_bp = bp;
break;
+ case BP_SLOT_ACTION_REINSTALL:
+ old_bp = bp;
+ new_bp = bp;
+ break;
case BP_SLOT_ACTION_UNINSTALL:
old_bp = bp;
new_bp = NULL;
@@ -187,6 +191,11 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
return arch_manage_bp(bp, BP_SLOT_ACTION_INSTALL);
}
+int arch_reinstall_hw_breakpoint(struct perf_event *bp)
+{
+ return arch_manage_bp(bp, BP_SLOT_ACTION_REINSTALL);
+}
+
void arch_uninstall_hw_breakpoint(struct perf_event *bp)
{
arch_manage_bp(bp, BP_SLOT_ACTION_UNINSTALL);
--
2.43.0
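The effect of the reinstall action in patch 02 can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: the arrays stand in for per-CPU state and the debug address registers, and all sketch_* names are hypothetical. The point it shows is that reinstall keeps the slot the breakpoint already owns and only rewrites the programmed address, so no slot is freed or allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_HBP_NUM 4

struct sketch_bp {
	unsigned long address;	/* address the breakpoint should watch */
};

static struct sketch_bp *sketch_slots[SKETCH_HBP_NUM];	/* slot owners */
static unsigned long sketch_debugreg[SKETCH_HBP_NUM];	/* models DR0..DR3 */

/* Return the index of the slot owned by @owner (NULL means free), or -1. */
static int sketch_find(struct sketch_bp *owner)
{
	for (int i = 0; i < SKETCH_HBP_NUM; i++) {
		if (sketch_slots[i] == owner)
			return i;
	}
	return -1;
}

/* Claim a free slot and program the address. */
static int sketch_install(struct sketch_bp *bp)
{
	int slot = sketch_find(NULL);

	if (slot < 0)
		return -EBUSY;
	sketch_slots[slot] = bp;
	sketch_debugreg[slot] = bp->address;
	return 0;
}

/* Keep the existing slot; only refresh the programmed address. */
static int sketch_reinstall(struct sketch_bp *bp)
{
	int slot = sketch_find(bp);

	if (slot < 0)
		return -EINVAL;	/* bp was never installed */
	sketch_debugreg[slot] = bp->address;
	return 0;
}
```

A caller in this model changes bp->address and then calls sketch_reinstall(); a breakpoint that was never installed is rejected rather than silently given a new slot.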
* [PATCH v8 03/27] HWBP: Add modify_wide_hw_breakpoint_local() API
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
2025-11-10 16:35 ` [PATCH v8 01/27] x86/hw_breakpoint: Unify breakpoint install/uninstall Jinchao Wang
2025-11-10 16:35 ` [PATCH v8 02/27] x86/hw_breakpoint: Add arch_reinstall_hw_breakpoint Jinchao Wang
@ 2025-11-10 16:35 ` Jinchao Wang
2025-11-10 16:35 ` [PATCH v8 04/27] mm/ksw: add build system support Jinchao Wang
` (24 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:35 UTC
From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Add the arch-wide interface modify_wide_hw_breakpoint_local(), which allows
hwbp users to update the watch address online. It is available only if the
arch selects CONFIG_HAVE_REINSTALL_HW_BREAKPOINT.
Note that the type can only be changed to a compatible type, because the
hwbp slot is not released and re-reserved based on the new type.
For instance, you cannot change HW_BREAKPOINT_W to HW_BREAKPOINT_X.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
arch/Kconfig | 10 ++++++++++
arch/x86/Kconfig | 1 +
include/linux/hw_breakpoint.h | 6 ++++++
kernel/events/hw_breakpoint.c | 37 +++++++++++++++++++++++++++++++++++
4 files changed, 54 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 61130b88964b..c45fe5366125 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -456,6 +456,16 @@ config HAVE_MIXED_BREAKPOINTS_REGS
Select this option if your arch implements breakpoints under the
latter fashion.
+config HAVE_REINSTALL_HW_BREAKPOINT
+ bool
+ depends on HAVE_HW_BREAKPOINT
+ help
+ Depending on the arch implementation of hardware breakpoints,
+ some of them are able to update the breakpoint configuration
+ without releasing and re-reserving the hardware breakpoint
+ register. Which parts of the configuration can be updated depends
+ on the hardware and software implementation.
+
config HAVE_USER_RETURN_NOTIFIER
bool
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fa3b616af03a..4d2ef8a45681 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -245,6 +245,7 @@ config X86
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT
+ select HAVE_REINSTALL_HW_BREAKPOINT
select HAVE_IOREMAP_PROT
select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
select HAVE_IRQ_TIME_ACCOUNTING
diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index db199d653dd1..ea373f2587f8 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -81,6 +81,9 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
void *context);
+extern int modify_wide_hw_breakpoint_local(struct perf_event *bp,
+ struct perf_event_attr *attr);
+
extern int register_perf_hw_breakpoint(struct perf_event *bp);
extern void unregister_hw_breakpoint(struct perf_event *bp);
extern void unregister_wide_hw_breakpoint(struct perf_event * __percpu *cpu_events);
@@ -124,6 +127,9 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
void *context) { return NULL; }
static inline int
+modify_wide_hw_breakpoint_local(struct perf_event *bp,
+ struct perf_event_attr *attr) { return -ENOSYS; }
+static inline int
register_perf_hw_breakpoint(struct perf_event *bp) { return -ENOSYS; }
static inline void unregister_hw_breakpoint(struct perf_event *bp) { }
static inline void
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 8ec2cb688903..5ee1522a99c9 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -887,6 +887,43 @@ void unregister_wide_hw_breakpoint(struct perf_event * __percpu *cpu_events)
}
EXPORT_SYMBOL_GPL(unregister_wide_hw_breakpoint);
+/**
+ * modify_wide_hw_breakpoint_local - update breakpoint config for local CPU
+ * @bp: the hwbp perf event for this CPU
+ * @attr: the new attribute for @bp
+ *
+ * This does not release and re-reserve the slot of a HWBP; it just reuses the
+ * current slot on the local CPU, so users must update the other CPUs
+ * themselves.
+ * Also, since the slot is not released and re-reserved, this cannot change
+ * the HWBP to an incompatible type.
+ * Return: 0 on success, or an error if @attr is invalid or the CPU fails to
+ * update the debug register for the new @attr.
+ */
+#ifdef CONFIG_HAVE_REINSTALL_HW_BREAKPOINT
+int modify_wide_hw_breakpoint_local(struct perf_event *bp,
+ struct perf_event_attr *attr)
+{
+ int ret;
+
+ if (find_slot_idx(bp->attr.bp_type) != find_slot_idx(attr->bp_type))
+ return -EINVAL;
+
+ ret = hw_breakpoint_arch_parse(bp, attr, counter_arch_bp(bp));
+ if (ret)
+ return ret;
+
+ return arch_reinstall_hw_breakpoint(bp);
+}
+#else
+int modify_wide_hw_breakpoint_local(struct perf_event *bp,
+ struct perf_event_attr *attr)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+EXPORT_SYMBOL_GPL(modify_wide_hw_breakpoint_local);
+
/**
* hw_breakpoint_is_used - check if breakpoints are currently used
*
--
2.43.0
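The compatible-type rule described in patch 03 can be sketched in userspace as follows. This is an illustration only: the sketch_* names are hypothetical, and the split into one data class and one instruction class is an assumption made for the example; how the kernel's find_slot_idx() groups types on a given architecture may differ. The type values match the HW_BREAKPOINT_* constants in include/uapi/linux/hw_breakpoint.h.

```c
#include <assert.h>
#include <errno.h>

/* Same numeric values as HW_BREAKPOINT_R/W/RW/X in the uapi header. */
enum {
	SKETCH_BP_R  = 1,
	SKETCH_BP_W  = 2,
	SKETCH_BP_RW = SKETCH_BP_R | SKETCH_BP_W,
	SKETCH_BP_X  = 4,
};

/* Assumed slot classes: data watchpoints versus instruction breakpoints. */
enum sketch_class { SKETCH_CLASS_INST, SKETCH_CLASS_DATA };

static enum sketch_class sketch_class_of(int bp_type)
{
	return (bp_type & SKETCH_BP_RW) ? SKETCH_CLASS_DATA : SKETCH_CLASS_INST;
}

struct sketch_bp {
	int bp_type;
	unsigned long addr;
};

/*
 * Update type and address in place. The slot is never released, so a
 * change that would need a slot of a different class is refused.
 */
static int sketch_modify_local(struct sketch_bp *bp, int new_type,
			       unsigned long new_addr)
{
	if (sketch_class_of(bp->bp_type) != sketch_class_of(new_type))
		return -EINVAL;

	bp->bp_type = new_type;
	bp->addr = new_addr;
	return 0;
}
```

Under these assumptions, moving a write watchpoint to a new address or widening it to read/write succeeds, while turning it into an execute breakpoint fails with -EINVAL and leaves the breakpoint unchanged.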
* [PATCH v8 04/27] mm/ksw: add build system support
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (2 preceding siblings ...)
2025-11-10 16:35 ` [PATCH v8 03/27] HWBP: Add modify_wide_hw_breakpoint_local() API Jinchao Wang
@ 2025-11-10 16:35 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 05/27] mm/ksw: add ksw_config struct and parser Jinchao Wang
` (23 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:35 UTC
Add Kconfig and Makefile infrastructure.
The implementation is located under `mm/kstackwatch/`.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
include/linux/kstackwatch.h | 5 +++++
mm/Kconfig | 1 +
mm/Makefile | 1 +
mm/kstackwatch/Kconfig | 14 ++++++++++++++
mm/kstackwatch/Makefile | 2 ++
mm/kstackwatch/kernel.c | 23 +++++++++++++++++++++++
mm/kstackwatch/stack.c | 1 +
mm/kstackwatch/watch.c | 1 +
8 files changed, 48 insertions(+)
create mode 100644 include/linux/kstackwatch.h
create mode 100644 mm/kstackwatch/Kconfig
create mode 100644 mm/kstackwatch/Makefile
create mode 100644 mm/kstackwatch/kernel.c
create mode 100644 mm/kstackwatch/stack.c
create mode 100644 mm/kstackwatch/watch.c
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
new file mode 100644
index 000000000000..0273ef478a26
--- /dev/null
+++ b/include/linux/kstackwatch.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _KSTACKWATCH_H
+#define _KSTACKWATCH_H
+
+#endif /* _KSTACKWATCH_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 0e26f4fc8717..61d4e6edadf2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1373,5 +1373,6 @@ config FIND_NORMAL_PAGE
def_bool n
source "mm/damon/Kconfig"
+source "mm/kstackwatch/Kconfig"
endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 21abb3353550..efc101816f00 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -92,6 +92,7 @@ obj-$(CONFIG_PAGE_POISONING) += page_poison.o
obj-$(CONFIG_KASAN) += kasan/
obj-$(CONFIG_KFENCE) += kfence/
obj-$(CONFIG_KMSAN) += kmsan/
+obj-$(CONFIG_KSTACKWATCH) += kstackwatch/
obj-$(CONFIG_FAILSLAB) += failslab.o
obj-$(CONFIG_FAIL_PAGE_ALLOC) += fail_page_alloc.o
obj-$(CONFIG_MEMTEST) += memtest.o
diff --git a/mm/kstackwatch/Kconfig b/mm/kstackwatch/Kconfig
new file mode 100644
index 000000000000..496caf264f35
--- /dev/null
+++ b/mm/kstackwatch/Kconfig
@@ -0,0 +1,14 @@
+config KSTACKWATCH
+ bool "Kernel Stack Watch"
+ depends on HAVE_HW_BREAKPOINT && KPROBES && FPROBE && STACKTRACE
+ help
+ A lightweight real-time debugging tool to detect stack corruption
+ and abnormal stack usage patterns in the kernel. It monitors stack
+ boundaries and detects overwrites in real time using hardware
+ breakpoints and probe-based instrumentation.
+
+ This feature is intended for kernel developers or advanced users
+ diagnosing rare stack overflow or memory corruption bugs. It may
+ introduce minor overhead during runtime monitoring.
+
+ If unsure, say N.
diff --git a/mm/kstackwatch/Makefile b/mm/kstackwatch/Makefile
new file mode 100644
index 000000000000..c99c621eac02
--- /dev/null
+++ b/mm/kstackwatch/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_KSTACKWATCH) += kstackwatch.o
+kstackwatch-y := kernel.o stack.o watch.o
diff --git a/mm/kstackwatch/kernel.c b/mm/kstackwatch/kernel.c
new file mode 100644
index 000000000000..78f1d019225f
--- /dev/null
+++ b/mm/kstackwatch/kernel.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+
+static int __init kstackwatch_init(void)
+{
+ pr_info("module loaded\n");
+ return 0;
+}
+
+static void __exit kstackwatch_exit(void)
+{
+ pr_info("module unloaded\n");
+}
+
+module_init(kstackwatch_init);
+module_exit(kstackwatch_exit);
+
+MODULE_AUTHOR("Jinchao Wang");
+MODULE_DESCRIPTION("Kernel Stack Watch");
+MODULE_LICENSE("GPL");
+
diff --git a/mm/kstackwatch/stack.c b/mm/kstackwatch/stack.c
new file mode 100644
index 000000000000..cec594032515
--- /dev/null
+++ b/mm/kstackwatch/stack.c
@@ -0,0 +1 @@
+// SPDX-License-Identifier: GPL-2.0
diff --git a/mm/kstackwatch/watch.c b/mm/kstackwatch/watch.c
new file mode 100644
index 000000000000..cec594032515
--- /dev/null
+++ b/mm/kstackwatch/watch.c
@@ -0,0 +1 @@
+// SPDX-License-Identifier: GPL-2.0
--
2.43.0
* [PATCH v8 05/27] mm/ksw: add ksw_config struct and parser
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (3 preceding siblings ...)
2025-11-10 16:35 ` [PATCH v8 04/27] mm/ksw: add build system support Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 06/27] mm/ksw: add singleton debugfs interface Jinchao Wang
` (22 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC
Add struct ksw_config and ksw_parse_config() to parse a user-supplied
configuration string of key=value pairs.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
include/linux/kstackwatch.h | 33 +++++++++++
mm/kstackwatch/kernel.c | 114 ++++++++++++++++++++++++++++++++++++
2 files changed, 147 insertions(+)
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
index 0273ef478a26..dd00c4c8922e 100644
--- a/include/linux/kstackwatch.h
+++ b/include/linux/kstackwatch.h
@@ -2,4 +2,37 @@
#ifndef _KSTACKWATCH_H
#define _KSTACKWATCH_H
+#include <linux/types.h>
+
+#define MAX_CONFIG_STR_LEN 128
+
+struct ksw_config {
+ char *func_name;
+ u16 depth;
+
+ /*
+ * watched variable info:
+ * - func_offset : instruction offset in the function, typically the
+ * assignment of the watched variable, where ksw
+ * registers a kprobe post-handler.
+ * - sp_offset : offset from stack pointer at func_offset. Usually 0.
+ * - watch_len : size of the watched variable (1, 2, 4, or 8 bytes).
+ */
+ u16 func_offset;
+ u16 sp_offset;
+ u16 watch_len;
+
+ /* max number of hwbps that can be used */
+ u16 max_watch;
+
+ /* search canary as watch target automatically */
+ u16 auto_canary;
+
+ /* panic on watchpoint hit */
+ u16 panic_hit;
+
+ /* copy of the raw user input, saved for display */
+ char *user_input;
+};
+
#endif /* _KSTACKWATCH_H */
diff --git a/mm/kstackwatch/kernel.c b/mm/kstackwatch/kernel.c
index 78f1d019225f..50104e78cf3d 100644
--- a/mm/kstackwatch/kernel.c
+++ b/mm/kstackwatch/kernel.c
@@ -1,16 +1,130 @@
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/kstackwatch.h>
+#include <linux/kstrtox.h>
+#include <linux/slab.h>
#include <linux/module.h>
+#include <linux/string.h>
+
+static struct ksw_config *ksw_config;
+
+struct param_map {
+ const char *name; /* long name */
+ const char *short_name; /* short name (2 letters) */
+ size_t offset; /* offsetof(struct ksw_config, field) */
+ bool is_string; /* true for string */
+};
+
+/* macro generates both long and short name automatically */
+#define PMAP(field, short, is_str) \
+ { #field, #short, offsetof(struct ksw_config, field), is_str }
+
+static const struct param_map ksw_params[] = {
+ PMAP(func_name, fn, true),
+ PMAP(func_offset, fo, false),
+ PMAP(depth, dp, false),
+ PMAP(max_watch, mw, false),
+ PMAP(sp_offset, so, false),
+ PMAP(watch_len, wl, false),
+ PMAP(auto_canary, ac, false),
+ PMAP(panic_hit, ph, false),
+};
+
+static int ksw_parse_param(struct ksw_config *config, const char *key,
+ const char *val)
+{
+ const struct param_map *pm = NULL;
+ int ret;
+
+ for (int i = 0; i < ARRAY_SIZE(ksw_params); i++) {
+ if (strcmp(key, ksw_params[i].name) == 0 ||
+ strcmp(key, ksw_params[i].short_name) == 0) {
+ pm = &ksw_params[i];
+ break;
+ }
+ }
+
+ if (!pm)
+ return -EINVAL;
+
+ if (pm->is_string) {
+ char **dst = (char **)((char *)config + pm->offset);
+ *dst = kstrdup(val, GFP_KERNEL);
+ if (!*dst)
+ return -ENOMEM;
+ } else {
+ ret = kstrtou16(val, 0, (u16 *)((char *)config + pm->offset));
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+/*
+ * Configuration string format:
+ * param_name=<value> [param_name=<value> ...]
+ *
+ * Required parameters:
+ * - func_name |fn (str) : target function name
+ * - func_offset|fo (u16) : instruction pointer offset
+ *
+ * Optional parameters:
+ * - depth |dp (u16) : recursion depth
+ * - max_watch |mw (u16) : maximum number of watchpoints
+ * - sp_offset |so (u16) : offset from stack pointer at func_offset
+ * - watch_len |wl (u16) : watch length (1,2,4,8)
+ */
+static int __maybe_unused ksw_parse_config(char *buf, struct ksw_config *config)
+{
+ char *part, *key, *val;
+ int ret;
+
+ kfree(config->func_name);
+ kfree(config->user_input);
+ memset(config, 0, sizeof(*config));
+
+ buf = strim(buf);
+ config->user_input = kstrdup(buf, GFP_KERNEL);
+ if (!config->user_input)
+ return -ENOMEM;
+
+ while ((part = strsep(&buf, " \t\n")) != NULL) {
+ if (*part == '\0')
+ continue;
+
+ key = strsep(&part, "=");
+ val = part;
+ if (!key || !val)
+ continue;
+ ret = ksw_parse_param(config, key, val);
+ if (ret)
+ pr_warn("unsupported param %s=%s\n", key, val);
+ }
+
+ if (!config->func_name) {
+ pr_err("Missing required parameter: func_name\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
static int __init kstackwatch_init(void)
{
+ ksw_config = kzalloc(sizeof(*ksw_config), GFP_KERNEL);
+ if (!ksw_config)
+ return -ENOMEM;
+
pr_info("module loaded\n");
return 0;
}
static void __exit kstackwatch_exit(void)
{
+ kfree(ksw_config);
+
pr_info("module unloaded\n");
}
--
2.43.0
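The key=value format handled by ksw_parse_config() can be tried out in userspace. The following is a minimal sketch using only ISO C library calls, not a port of the kernel code: struct sketch_config is a hypothetical, trimmed-down counterpart of struct ksw_config with a fixed-size name buffer, and the sketch_* helpers are illustrative names. As in the patch, unknown or malformed parameters are skipped and a missing func_name is an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down counterpart of struct ksw_config. */
struct sketch_config {
	char func_name[64];
	uint16_t func_offset;
	uint16_t watch_len;
};

/* Parse an unsigned 16-bit value; base 0 accepts decimal, hex and octal. */
static int sketch_parse_u16(const char *val, uint16_t *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(val, &end, 0);
	if (errno || end == val || *end != '\0' || v > UINT16_MAX)
		return -EINVAL;
	*out = (uint16_t)v;
	return 0;
}

/* Match a key by its long or two-letter short name and store the value. */
static int sketch_parse_param(struct sketch_config *cfg, const char *key,
			      const char *val)
{
	if (!strcmp(key, "func_name") || !strcmp(key, "fn")) {
		if (strlen(val) >= sizeof(cfg->func_name))
			return -EINVAL;
		strcpy(cfg->func_name, val);
		return 0;
	}
	if (!strcmp(key, "func_offset") || !strcmp(key, "fo"))
		return sketch_parse_u16(val, &cfg->func_offset);
	if (!strcmp(key, "watch_len") || !strcmp(key, "wl"))
		return sketch_parse_u16(val, &cfg->watch_len);
	return -EINVAL;
}

/* Parse "key=value key=value ..." in place; @buf is modified. */
static int sketch_parse_config(char *buf, struct sketch_config *cfg)
{
	memset(cfg, 0, sizeof(*cfg));

	while (*buf) {
		char *key, *val;

		buf += strspn(buf, " \t\n");	/* skip separators */
		if (!*buf)
			break;
		key = buf;
		buf += strcspn(buf, " \t\n");	/* advance to end of token */
		if (*buf)
			*buf++ = '\0';

		val = strchr(key, '=');
		if (!val)
			continue;		/* token without '=' is ignored */
		*val++ = '\0';

		/* Like the patch, skip parameters that fail to parse. */
		(void)sketch_parse_param(cfg, key, val);
	}

	return cfg->func_name[0] ? 0 : -EINVAL;
}
```

With this sketch, an input such as "fn=do_work fo=0x10 wl=8" fills in all three fields, and an input without fn or func_name is rejected with -EINVAL; do_work is only a placeholder function name.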
* [PATCH v8 06/27] mm/ksw: add singleton debugfs interface
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (4 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 05/27] mm/ksw: add ksw_config struct and parser Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 07/27] mm/ksw: add HWBP pre-allocation Jinchao Wang
` (21 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC
Provide a debugfs config file to read or update the configuration.
Only a single process can open this file at a time; this is enforced with
the atomic flag dbgfs_config_busy to prevent concurrent access.
ksw_get_config() exposes the configuration pointer as const.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
include/linux/kstackwatch.h | 3 ++
mm/kstackwatch/kernel.c | 103 ++++++++++++++++++++++++++++++++++--
2 files changed, 103 insertions(+), 3 deletions(-)
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
index dd00c4c8922e..ada5ac64190c 100644
--- a/include/linux/kstackwatch.h
+++ b/include/linux/kstackwatch.h
@@ -35,4 +35,7 @@ struct ksw_config {
char *user_input;
};
+// singleton, only modified in kernel.c
+const struct ksw_config *ksw_get_config(void);
+
#endif /* _KSTACKWATCH_H */
diff --git a/mm/kstackwatch/kernel.c b/mm/kstackwatch/kernel.c
index 50104e78cf3d..87fef139f494 100644
--- a/mm/kstackwatch/kernel.c
+++ b/mm/kstackwatch/kernel.c
@@ -1,13 +1,18 @@
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/debugfs.h>
#include <linux/kstackwatch.h>
#include <linux/kstrtox.h>
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/string.h>
+#include <linux/uaccess.h>
+static atomic_t dbgfs_config_busy = ATOMIC_INIT(0);
static struct ksw_config *ksw_config;
+static struct dentry *dbgfs_config;
+static struct dentry *dbgfs_dir;
struct param_map {
const char *name; /* long name */
@@ -76,7 +81,7 @@ static int ksw_parse_param(struct ksw_config *config, const char *key,
* - sp_offset |so (u16) : offset from stack pointer at func_offset
* - watch_len |wl (u16) : watch length (1,2,4,8)
*/
-static int __maybe_unused ksw_parse_config(char *buf, struct ksw_config *config)
+static int ksw_parse_config(char *buf, struct ksw_config *config)
{
char *part, *key, *val;
int ret;
@@ -111,18 +116,110 @@ static int __maybe_unused ksw_parse_config(char *buf, struct ksw_config *config)
return 0;
}
+static ssize_t ksw_dbgfs_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ return simple_read_from_buffer(buf, count, ppos, ksw_config->user_input,
+ ksw_config->user_input ? strlen(ksw_config->user_input) : 0);
+}
+
+static ssize_t ksw_dbgfs_write(struct file *file, const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ char input[MAX_CONFIG_STR_LEN];
+ int ret;
+
+ if (count == 0 || count >= sizeof(input))
+ return -EINVAL;
+
+ if (copy_from_user(input, buffer, count))
+ return -EFAULT;
+
+ input[count] = '\0';
+ strim(input);
+
+ if (!strlen(input)) {
+ pr_info("config cleared\n");
+ return count;
+ }
+
+ ret = ksw_parse_config(input, ksw_config);
+ if (ret) {
+ pr_err("Failed to parse config %d\n", ret);
+ return ret;
+ }
+
+ return count;
+}
+
+static int ksw_dbgfs_open(struct inode *inode, struct file *file)
+{
+ if (atomic_cmpxchg(&dbgfs_config_busy, 0, 1))
+ return -EBUSY;
+ return 0;
+}
+
+static int ksw_dbgfs_release(struct inode *inode, struct file *file)
+{
+ atomic_set(&dbgfs_config_busy, 0);
+ return 0;
+}
+
+static const struct file_operations kstackwatch_fops = {
+ .owner = THIS_MODULE,
+ .open = ksw_dbgfs_open,
+ .read = ksw_dbgfs_read,
+ .write = ksw_dbgfs_write,
+ .release = ksw_dbgfs_release,
+ .llseek = default_llseek,
+};
+
+const struct ksw_config *ksw_get_config(void)
+{
+ return ksw_config;
+}
+
static int __init kstackwatch_init(void)
{
+ int ret = 0;
+
ksw_config = kzalloc(sizeof(*ksw_config), GFP_KERNEL);
- if (!ksw_config)
- return -ENOMEM;
+ if (!ksw_config) {
+ ret = -ENOMEM;
+ goto err_alloc;
+ }
+
+ dbgfs_dir = debugfs_create_dir("kstackwatch", NULL);
+ if (!dbgfs_dir) {
+ ret = -ENOMEM;
+ goto err_dir;
+ }
+
+ dbgfs_config = debugfs_create_file("config", 0600, dbgfs_dir, NULL,
+ &kstackwatch_fops);
+ if (!dbgfs_config) {
+ ret = -ENOMEM;
+ goto err_file;
+ }
pr_info("module loaded\n");
return 0;
+
+err_file:
+ debugfs_remove_recursive(dbgfs_dir);
+ dbgfs_dir = NULL;
+err_dir:
+ kfree(ksw_config);
+ ksw_config = NULL;
+err_alloc:
+ return ret;
}
static void __exit kstackwatch_exit(void)
{
+ debugfs_remove_recursive(dbgfs_dir);
+ kfree(ksw_config->func_name);
+ kfree(ksw_config->user_input);
kfree(ksw_config);
pr_info("module unloaded\n");
--
2.43.0
* [PATCH v8 07/27] mm/ksw: add HWBP pre-allocation
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (5 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 06/27] mm/ksw: add singleton debugfs interface Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 08/27] mm/ksw: Add atomic watchpoint management api Jinchao Wang
` (20 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Pre-allocate per-CPU hardware breakpoints at init with a placeholder
address, which will be retargeted dynamically in the kprobe handler.
This avoids allocation in atomic context.
At most max_watch breakpoints are allocated (0 means no limit).
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
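The allocation loop, including the "0 means no limit" convention, can be
sketched in userspace like this. Everything here is an illustrative
assumption: HW_SLOTS stands in for the number of hardware debug registers,
slot_reserve() stands in for register_wide_hw_breakpoint(), and -1 stands in
for the kernel error code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical number of hardware debug slots (assumption). */
#define HW_SLOTS 4

static int slots_used;

/* Models a breakpoint registration that fails once the slots run out. */
static bool slot_reserve(void)
{
	if (slots_used >= HW_SLOTS)
		return false;
	slots_used++;
	return true;
}

/*
 * Reserve up to max_watch slots; max_watch == 0 means "as many as possible".
 * Returns the number reserved, or -1 if not even one could be reserved.
 */
static int prealloc(int max_watch)
{
	int success = 0;

	while (!max_watch || success < max_watch) {
		if (!slot_reserve())
			return success > 0 ? success : -1;
		success++;
	}
	return success;
}

/* Usage: a bounded request, an unbounded request, then exhaustion. */
static void prealloc_demo(void)
{
	slots_used = 0;
	assert(prealloc(2) == 2);
	slots_used = 0;
	assert(prealloc(0) == HW_SLOTS);
	assert(prealloc(1) == -1);
}
```

With max_watch == 0 the loop only terminates when a reservation fails, which
is why a partial success is reported as a count rather than as an error.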
include/linux/kstackwatch.h | 13 ++++++
mm/kstackwatch/watch.c | 93 +++++++++++++++++++++++++++++++++++++
2 files changed, 106 insertions(+)
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
index ada5ac64190c..eb9f2b4f2109 100644
--- a/include/linux/kstackwatch.h
+++ b/include/linux/kstackwatch.h
@@ -2,6 +2,9 @@
#ifndef _KSTACKWATCH_H
#define _KSTACKWATCH_H
+#include <linux/llist.h>
+#include <linux/percpu.h>
+#include <linux/perf_event.h>
#include <linux/types.h>
#define MAX_CONFIG_STR_LEN 128
@@ -38,4 +41,14 @@ struct ksw_config {
// singleton, only modified in kernel.c
const struct ksw_config *ksw_get_config(void);
+/* watch management */
+struct ksw_watchpoint {
+ struct perf_event *__percpu *event;
+ struct perf_event_attr attr;
+ struct llist_node node; // for atomic watch_on and off
+ struct list_head list; // for cpu online and offline
+};
+int ksw_watch_init(void);
+void ksw_watch_exit(void);
+
#endif /* _KSTACKWATCH_H */
diff --git a/mm/kstackwatch/watch.c b/mm/kstackwatch/watch.c
index cec594032515..4947eac32c61 100644
--- a/mm/kstackwatch/watch.c
+++ b/mm/kstackwatch/watch.c
@@ -1 +1,94 @@
// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/cpuhotplug.h>
+#include <linux/hw_breakpoint.h>
+#include <linux/irqflags.h>
+#include <linux/kstackwatch.h>
+#include <linux/mutex.h>
+#include <linux/printk.h>
+
+static LLIST_HEAD(free_wp_list);
+static LIST_HEAD(all_wp_list);
+static DEFINE_MUTEX(all_wp_mutex);
+
+static ulong holder;
+
+static void ksw_watch_handler(struct perf_event *bp,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ pr_err("========== KStackWatch: Caught stack corruption =======\n");
+ pr_err("config %s\n", ksw_get_config()->user_input);
+ dump_stack();
+ pr_err("=================== KStackWatch End ===================\n");
+
+ if (ksw_get_config()->panic_hit)
+ panic("Stack corruption detected");
+}
+
+static int ksw_watch_alloc(void)
+{
+ int max_watch = ksw_get_config()->max_watch;
+ struct ksw_watchpoint *wp;
+ int success = 0;
+ int ret;
+
+ init_llist_head(&free_wp_list);
+
+ /* max_watch == 0 means no limit */
+ while (!max_watch || success < max_watch) {
+ wp = kzalloc(sizeof(*wp), GFP_KERNEL);
+ if (!wp)
+ return success > 0 ? success : -EINVAL;
+
+ hw_breakpoint_init(&wp->attr);
+ wp->attr.bp_addr = (ulong)&holder;
+ wp->attr.bp_len = sizeof(ulong);
+ wp->attr.bp_type = HW_BREAKPOINT_W;
+ wp->event = register_wide_hw_breakpoint(&wp->attr,
+ ksw_watch_handler, wp);
+ if (IS_ERR((void *)wp->event)) {
+ ret = PTR_ERR((void *)wp->event);
+ kfree(wp);
+ return success > 0 ? success : ret;
+ }
+ llist_add(&wp->node, &free_wp_list);
+ mutex_lock(&all_wp_mutex);
+ list_add(&wp->list, &all_wp_list);
+ mutex_unlock(&all_wp_mutex);
+ success++;
+ }
+
+ return success;
+}
+
+static void ksw_watch_free(void)
+{
+ struct ksw_watchpoint *wp, *tmp;
+
+ mutex_lock(&all_wp_mutex);
+ list_for_each_entry_safe(wp, tmp, &all_wp_list, list) {
+ list_del(&wp->list);
+ unregister_wide_hw_breakpoint(wp->event);
+ kfree(wp);
+ }
+ mutex_unlock(&all_wp_mutex);
+}
+
+int ksw_watch_init(void)
+{
+ int ret;
+
+ ret = ksw_watch_alloc();
+ if (ret <= 0)
+ return -EBUSY;
+
+
+ return 0;
+}
+
+void ksw_watch_exit(void)
+{
+ ksw_watch_free();
+}
--
2.43.0
* [PATCH v8 08/27] mm/ksw: Add atomic watchpoint management api
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (6 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 07/27] mm/ksw: add HWBP pre-allocation Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 09/27] mm/ksw: ignore false positives from exit trampolines Jinchao Wang
` (19 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Add three functions for atomic lifecycle management of watchpoints:
- ksw_watch_get(): Acquires a watchpoint from a llist.
- ksw_watch_on(): Enables the watchpoint on all online CPUs.
- ksw_watch_off(): Disables the watchpoint and returns it to the llist.
For cross-CPU synchronization, updates are propagated using direct
modification on the local CPU and asynchronous IPIs for remote CPUs.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
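The get/on/off lifecycle listed in the commit message can be illustrated with
a single-threaded userspace model. This is a sketch, not the patch's
implementation: the kernel code uses a lockless llist and per-CPU IPIs, both
of which are omitted here, and struct wp, wp_put(), wp_get(), wp_on() and
wp_off() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long holder;	/* placeholder target, as in the patch */

struct wp {
	unsigned long addr;	/* currently watched address */
	struct wp *next;	/* free-list link */
};

static struct wp *free_list;

/* Park a watchpoint on the placeholder and return it to the free list. */
static void wp_put(struct wp *w)
{
	w->addr = (unsigned long)&holder;
	w->next = free_list;
	free_list = w;
}

/* Take one watchpoint from the free list, or NULL if none is left. */
static struct wp *wp_get(void)
{
	struct wp *w = free_list;

	if (!w)
		return NULL;
	free_list = w->next;
	w->next = NULL;
	return w;
}

/* Retarget an acquired watchpoint at a real address. */
static void wp_on(struct wp *w, unsigned long addr)
{
	w->addr = addr;
}

/* Disarm: point back at the placeholder and release. */
static void wp_off(struct wp *w)
{
	wp_put(w);
}

/* Usage: a free watchpoint always targets the placeholder. */
static void lifecycle_demo(void)
{
	static struct wp pool[2];
	struct wp *w;

	wp_put(&pool[0]);
	wp_put(&pool[1]);

	w = wp_get();
	assert(w && w->addr == (unsigned long)&holder);
	wp_on(w, 0x1000);
	assert(w->addr == 0x1000);
	wp_off(w);
	assert(w->addr == (unsigned long)&holder);
}
```

The invariant shown is the one the patch checks with WARN_ON_ONCE(): an entry
on the free list watches the placeholder, and an armed entry does not.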
include/linux/kstackwatch.h | 4 ++
mm/kstackwatch/watch.c | 85 ++++++++++++++++++++++++++++++++++++-
2 files changed, 88 insertions(+), 1 deletion(-)
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
index eb9f2b4f2109..d7ea89c8c6af 100644
--- a/include/linux/kstackwatch.h
+++ b/include/linux/kstackwatch.h
@@ -44,11 +44,15 @@ const struct ksw_config *ksw_get_config(void);
/* watch management */
struct ksw_watchpoint {
struct perf_event *__percpu *event;
+ call_single_data_t __percpu *csd;
struct perf_event_attr attr;
struct llist_node node; // for atomic watch_on and off
struct list_head list; // for cpu online and offline
};
int ksw_watch_init(void);
void ksw_watch_exit(void);
+int ksw_watch_get(struct ksw_watchpoint **out_wp);
+int ksw_watch_on(struct ksw_watchpoint *wp, ulong watch_addr, u16 watch_len);
+int ksw_watch_off(struct ksw_watchpoint *wp);
#endif /* _KSTACKWATCH_H */
diff --git a/mm/kstackwatch/watch.c b/mm/kstackwatch/watch.c
index 4947eac32c61..3817a172dc25 100644
--- a/mm/kstackwatch/watch.c
+++ b/mm/kstackwatch/watch.c
@@ -27,11 +27,83 @@ static void ksw_watch_handler(struct perf_event *bp,
panic("Stack corruption detected");
}
+static void ksw_watch_on_local_cpu(void *info)
+{
+ struct ksw_watchpoint *wp = info;
+ struct perf_event *bp;
+ ulong flags;
+ int cpu;
+ int ret;
+
+ local_irq_save(flags);
+ cpu = raw_smp_processor_id();
+ bp = per_cpu(*wp->event, cpu);
+ if (!bp) {
+ local_irq_restore(flags);
+ return;
+ }
+
+ ret = modify_wide_hw_breakpoint_local(bp, &wp->attr);
+ local_irq_restore(flags);
+ WARN(ret, "fail to reinstall HWBP on CPU%d ret %d", cpu, ret);
+}
+
+static void ksw_watch_update(struct ksw_watchpoint *wp, ulong addr, u16 len)
+{
+ call_single_data_t *csd;
+ int cur_cpu;
+ int cpu;
+
+ wp->attr.bp_addr = addr;
+ wp->attr.bp_len = len;
+
+ cur_cpu = raw_smp_processor_id();
+ for_each_online_cpu(cpu) {
+ /* remote cpu first */
+ if (cpu == cur_cpu)
+ continue;
+ csd = per_cpu_ptr(wp->csd, cpu);
+ smp_call_function_single_async(cpu, csd);
+ }
+ ksw_watch_on_local_cpu(wp);
+}
+
+int ksw_watch_get(struct ksw_watchpoint **out_wp)
+{
+ struct ksw_watchpoint *wp;
+ struct llist_node *node;
+
+ node = llist_del_first(&free_wp_list);
+ if (!node)
+ return -EBUSY;
+
+ wp = llist_entry(node, struct ksw_watchpoint, node);
+ WARN_ON_ONCE(wp->attr.bp_addr != (u64)&holder);
+
+ *out_wp = wp;
+ return 0;
+}
+int ksw_watch_on(struct ksw_watchpoint *wp, ulong watch_addr, u16 watch_len)
+{
+ ksw_watch_update(wp, watch_addr, watch_len);
+ return 0;
+}
+
+int ksw_watch_off(struct ksw_watchpoint *wp)
+{
+ WARN_ON_ONCE(wp->attr.bp_addr == (u64)&holder);
+ ksw_watch_update(wp, (ulong)&holder, sizeof(ulong));
+ llist_add(&wp->node, &free_wp_list);
+ return 0;
+}
+
static int ksw_watch_alloc(void)
{
int max_watch = ksw_get_config()->max_watch;
struct ksw_watchpoint *wp;
+ call_single_data_t *csd;
int success = 0;
+ int cpu;
int ret;
init_llist_head(&free_wp_list);
@@ -41,6 +113,16 @@ static int ksw_watch_alloc(void)
wp = kzalloc(sizeof(*wp), GFP_KERNEL);
if (!wp)
return success > 0 ? success : -EINVAL;
+ wp->csd = alloc_percpu(call_single_data_t);
+ if (!wp->csd) {
+ kfree(wp);
+ return success > 0 ? success : -EINVAL;
+ }
+
+ for_each_possible_cpu(cpu) {
+ csd = per_cpu_ptr(wp->csd, cpu);
+ INIT_CSD(csd, ksw_watch_on_local_cpu, wp);
+ }
hw_breakpoint_init(&wp->attr);
wp->attr.bp_addr = (ulong)&holder;
@@ -50,6 +132,7 @@ static int ksw_watch_alloc(void)
ksw_watch_handler, wp);
if (IS_ERR((void *)wp->event)) {
ret = PTR_ERR((void *)wp->event);
+ free_percpu(wp->csd);
kfree(wp);
return success > 0 ? success : ret;
}
@@ -71,6 +154,7 @@ static void ksw_watch_free(void)
list_for_each_entry_safe(wp, tmp, &all_wp_list, list) {
list_del(&wp->list);
unregister_wide_hw_breakpoint(wp->event);
+ free_percpu(wp->csd);
kfree(wp);
}
mutex_unlock(&all_wp_mutex);
@@ -84,7 +168,6 @@ int ksw_watch_init(void)
if (ret <= 0)
return -EBUSY;
-
return 0;
}
--
2.43.0
* [PATCH v8 09/27] mm/ksw: ignore false positives from exit trampolines
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (7 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 08/27] mm/ksw: Add atomic watchpoint management api Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 10/27] mm/ksw: support CPU hotplug Jinchao Wang
` (18 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Trampolines run after the watched function returns but before the
exit_handler is called, and they run in the original stack frame, so the
trampoline code may overwrite the watched stack address.
These false positives should be ignored. is_ftrace_trampoline() does
not cover all trampolines, so add a local check to handle the remaining
cases.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
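The local check amounts to a half-open address-range test applied to each
saved stack entry. A minimal userspace sketch, assuming the trampoline bounds
have already been resolved; in_range() and trace_hits_trampoline() are
hypothetical names, and the addresses in the demo are made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if ip lies in [start, end); unresolved (zero) bounds never match. */
static bool in_range(unsigned long ip, unsigned long start, unsigned long end)
{
	return start && end && ip >= start && ip < end;
}

/* True if any saved return address falls inside the trampoline. */
static bool trace_hits_trampoline(const unsigned long *entries, size_t nr,
				  unsigned long start, unsigned long end)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (in_range(entries[i], start, end))
			return true;
	return false;
}

/* Usage: only a trace that passes through [0x2000, 0x2100) is ignored. */
static void trampoline_demo(void)
{
	const unsigned long clean[] = { 0x1000, 0x3000 };
	const unsigned long tramp[] = { 0x1000, 0x2040 };

	assert(!trace_hits_trampoline(clean, 2, 0x2000, 0x2100));
	assert(trace_hits_trampoline(tramp, 2, 0x2000, 0x2100));
}
```

The end bound is exclusive because it is computed as start plus symbol size,
so an address equal to end already belongs to the next symbol.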
mm/kstackwatch/watch.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/mm/kstackwatch/watch.c b/mm/kstackwatch/watch.c
index 3817a172dc25..f922b4164be5 100644
--- a/mm/kstackwatch/watch.c
+++ b/mm/kstackwatch/watch.c
@@ -2,6 +2,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/cpuhotplug.h>
+#include <linux/ftrace.h>
#include <linux/hw_breakpoint.h>
#include <linux/irqflags.h>
#include <linux/kstackwatch.h>
@@ -14,10 +15,46 @@ static DEFINE_MUTEX(all_wp_mutex);
static ulong holder;
+#define TRAMPOLINE_NAME "return_to_handler"
+#define TRAMPOLINE_DEPTH 16
+
+/* Resolved once, then reused */
+static unsigned long tramp_start, tramp_end;
+
+static void ksw_watch_resolve_trampoline(void)
+{
+ unsigned long sz, off;
+
+ if (likely(tramp_start && tramp_end))
+ return;
+
+ tramp_start = kallsyms_lookup_name(TRAMPOLINE_NAME);
+ if (tramp_start && kallsyms_lookup_size_offset(tramp_start, &sz, &off))
+ tramp_end = tramp_start + sz;
+}
+
+static bool ksw_watch_in_trampoline(unsigned long ip)
+{
+ if (tramp_start && tramp_end && ip >= tramp_start && ip < tramp_end)
+ return true;
+ return false;
+}
static void ksw_watch_handler(struct perf_event *bp,
struct perf_sample_data *data,
struct pt_regs *regs)
{
+ unsigned long entries[TRAMPOLINE_DEPTH];
+ int i, nr = 0;
+
+ nr = stack_trace_save_regs(regs, entries, TRAMPOLINE_DEPTH, 0);
+ for (i = 0; i < nr; i++) {
+ //ignore trampoline
+ if (is_ftrace_trampoline(entries[i]))
+ return;
+ if (ksw_watch_in_trampoline(entries[i]))
+ return;
+ }
+
pr_err("========== KStackWatch: Caught stack corruption =======\n");
pr_err("config %s\n", ksw_get_config()->user_input);
dump_stack();
@@ -164,6 +201,7 @@ int ksw_watch_init(void)
{
int ret;
+ ksw_watch_resolve_trampoline();
ret = ksw_watch_alloc();
if (ret <= 0)
return -EBUSY;
--
2.43.0
* [PATCH v8 10/27] mm/ksw: support CPU hotplug
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (8 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 09/27] mm/ksw: ignore false positives from exit trampolines Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 11/27] sched/ksw: add per-task context Jinchao Wang
` (17 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Register CPU online/offline callbacks via cpuhp_setup_state_nocalls()
so stack watches are installed/removed dynamically as CPUs come online
or go offline.
When a new CPU comes online, register a hardware breakpoint for the holder,
avoiding races with watch_on()/watch_off() that may run on another CPU. The
watch address will be updated the next time watch_on() is called.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
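The hotplug behaviour described above (a newly onlined CPU starts on the
placeholder and only picks up the real address at the next watch_on()) can be
modelled in userspace. This is a sketch under assumptions: MODEL_CPUS,
struct wp_model and the three functions are hypothetical, and real per-CPU
breakpoints are replaced by a plain array.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CPUS 4	/* hypothetical CPU count */

static unsigned long holder;	/* placeholder target, as in the patch */
#define HOLDER ((unsigned long)&holder)

struct wp_model {
	unsigned long installed[MODEL_CPUS];	/* what each CPU watches */
	bool online[MODEL_CPUS];
};

/* A CPU coming online is armed with the placeholder only. */
static void cpu_online(struct wp_model *wp, int cpu)
{
	wp->online[cpu] = true;
	wp->installed[cpu] = HOLDER;
}

/* A CPU going offline loses its breakpoint. */
static void cpu_offline(struct wp_model *wp, int cpu)
{
	wp->online[cpu] = false;
	wp->installed[cpu] = 0;
}

/* Retarget the watch on every CPU that is online right now. */
static void watch_on(struct wp_model *wp, unsigned long addr)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_CPUS; cpu++)
		if (wp->online[cpu])
			wp->installed[cpu] = addr;
}

/* Usage: CPU 1 joins late and catches up at the next watch_on(). */
static void hotplug_demo(void)
{
	struct wp_model wp = { 0 };

	cpu_online(&wp, 0);
	watch_on(&wp, 0x1000);
	cpu_online(&wp, 1);
	assert(wp.installed[0] == 0x1000);
	assert(wp.installed[1] == HOLDER);
	watch_on(&wp, 0x2000);
	assert(wp.installed[1] == 0x2000);
}
```

Arming the new CPU with the placeholder rather than the current target means
the online callback never has to read state that watch_on()/watch_off() may
be changing on another CPU.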
mm/kstackwatch/watch.c | 52 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/mm/kstackwatch/watch.c b/mm/kstackwatch/watch.c
index f922b4164be5..99184f63d7e3 100644
--- a/mm/kstackwatch/watch.c
+++ b/mm/kstackwatch/watch.c
@@ -85,6 +85,48 @@ static void ksw_watch_on_local_cpu(void *info)
WARN(ret, "fail to reinstall HWBP on CPU%d ret %d", cpu, ret);
}
+static int ksw_watch_cpu_online(unsigned int cpu)
+{
+ struct perf_event_attr attr;
+ struct ksw_watchpoint *wp;
+ call_single_data_t *csd;
+ struct perf_event *bp;
+
+ mutex_lock(&all_wp_mutex);
+ list_for_each_entry(wp, &all_wp_list, list) {
+ attr = wp->attr;
+ attr.bp_addr = (u64)&holder;
+ bp = perf_event_create_kernel_counter(&attr, cpu, NULL,
+ ksw_watch_handler, wp);
+ if (IS_ERR(bp)) {
+ pr_warn("%s failed to create watch on CPU %d: %ld\n",
+ __func__, cpu, PTR_ERR(bp));
+ continue;
+ }
+
+ per_cpu(*wp->event, cpu) = bp;
+ csd = per_cpu_ptr(wp->csd, cpu);
+ INIT_CSD(csd, ksw_watch_on_local_cpu, wp);
+ }
+ mutex_unlock(&all_wp_mutex);
+ return 0;
+}
+
+static int ksw_watch_cpu_offline(unsigned int cpu)
+{
+ struct ksw_watchpoint *wp;
+ struct perf_event *bp;
+
+ mutex_lock(&all_wp_mutex);
+ list_for_each_entry(wp, &all_wp_list, list) {
+ bp = per_cpu(*wp->event, cpu);
+ if (bp)
+ unregister_hw_breakpoint(bp);
+ }
+ mutex_unlock(&all_wp_mutex);
+ return 0;
+}
+
static void ksw_watch_update(struct ksw_watchpoint *wp, ulong addr, u16 len)
{
call_single_data_t *csd;
@@ -206,6 +248,16 @@ int ksw_watch_init(void)
if (ret <= 0)
return -EBUSY;
+ ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "kstackwatch:online",
+ ksw_watch_cpu_online,
+ ksw_watch_cpu_offline);
+ if (ret < 0) {
+ ksw_watch_free();
+ pr_err("Failed to register CPU hotplug notifier\n");
+ return ret;
+ }
+
return 0;
}
--
2.43.0
* [PATCH v8 11/27] sched/ksw: add per-task context
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (9 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 10/27] mm/ksw: support CPU hotplug Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 12/27] mm/ksw: add entry kprobe and exit fprobe management Jinchao Wang
` (16 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Introduce struct ksw_ctx to enable lockless per-task state
tracking. This is required because KStackWatch operates in NMI context
(via kprobe handler) where traditional locking is unsafe.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
include/linux/kstackwatch_types.h | 14 ++++++++++++++
include/linux/sched.h | 5 +++++
2 files changed, 19 insertions(+)
create mode 100644 include/linux/kstackwatch_types.h
diff --git a/include/linux/kstackwatch_types.h b/include/linux/kstackwatch_types.h
new file mode 100644
index 000000000000..8c4e9b0f0c6a
--- /dev/null
+++ b/include/linux/kstackwatch_types.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KSTACKWATCH_TYPES_H
+#define _LINUX_KSTACKWATCH_TYPES_H
+#include <linux/types.h>
+
+struct ksw_watchpoint;
+struct ksw_ctx {
+ struct ksw_watchpoint *wp;
+ ulong sp;
+ u16 depth;
+ u16 generation;
+};
+
+#endif /* _LINUX_KSTACKWATCH_TYPES_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b469878de25c..db49325428b3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -22,6 +22,7 @@
#include <linux/sem_types.h>
#include <linux/shm.h>
#include <linux/kmsan_types.h>
+#include <linux/kstackwatch_types.h>
#include <linux/mutex_types.h>
#include <linux/plist_types.h>
#include <linux/hrtimer_types.h>
@@ -1487,6 +1488,10 @@ struct task_struct {
struct kmsan_ctx kmsan_ctx;
#endif
+#if IS_ENABLED(CONFIG_KSTACKWATCH)
+ struct ksw_ctx ksw_ctx;
+#endif
+
#if IS_ENABLED(CONFIG_KUNIT)
struct kunit *kunit_test;
#endif
--
2.43.0
* [PATCH v8 12/27] mm/ksw: add entry kprobe and exit fprobe management
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (10 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 11/27] sched/ksw: add per-task context Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 13/27] mm/ksw: add per-task ctx tracking Jinchao Wang
` (15 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Provide ksw_stack_init() and ksw_stack_exit() to manage entry and exit
probes for the target function from ksw_get_config().
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
include/linux/kstackwatch.h | 4 ++
mm/kstackwatch/stack.c | 100 ++++++++++++++++++++++++++++++++++++
2 files changed, 104 insertions(+)
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
index d7ea89c8c6af..afedd9823de9 100644
--- a/include/linux/kstackwatch.h
+++ b/include/linux/kstackwatch.h
@@ -41,6 +41,10 @@ struct ksw_config {
// singleton, only modified in kernel.c
const struct ksw_config *ksw_get_config(void);
+/* stack management */
+int ksw_stack_init(void);
+void ksw_stack_exit(void);
+
/* watch management */
struct ksw_watchpoint {
struct perf_event *__percpu *event;
diff --git a/mm/kstackwatch/stack.c b/mm/kstackwatch/stack.c
index cec594032515..3aa02f8370af 100644
--- a/mm/kstackwatch/stack.c
+++ b/mm/kstackwatch/stack.c
@@ -1 +1,101 @@
// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/atomic.h>
+#include <linux/fprobe.h>
+#include <linux/kprobes.h>
+#include <linux/kstackwatch.h>
+#include <linux/kstackwatch_types.h>
+#include <linux/printk.h>
+
+static struct kprobe entry_probe;
+static struct fprobe exit_probe;
+
+static int ksw_stack_prepare_watch(struct pt_regs *regs,
+ const struct ksw_config *config,
+ ulong *watch_addr, u16 *watch_len)
+{
+ /* the implementation will be added in following patches */
+ *watch_addr = 0;
+ *watch_len = 0;
+ return 0;
+}
+
+static void ksw_stack_entry_handler(struct kprobe *p, struct pt_regs *regs,
+ unsigned long flags)
+{
+ struct ksw_ctx *ctx = &current->ksw_ctx;
+ ulong watch_addr;
+ u16 watch_len;
+ int ret;
+
+ ret = ksw_watch_get(&ctx->wp);
+ if (ret)
+ return;
+
+ ret = ksw_stack_prepare_watch(regs, ksw_get_config(), &watch_addr,
+ &watch_len);
+ if (ret) {
+ ksw_watch_off(ctx->wp);
+ ctx->wp = NULL;
+ pr_err("failed to prepare watch target: %d\n", ret);
+ return;
+ }
+
+ ret = ksw_watch_on(ctx->wp, watch_addr, watch_len);
+ if (ret) {
+ pr_err("failed to watch on depth:%d addr:0x%lx len:%u %d\n",
+ ksw_get_config()->depth, watch_addr, watch_len, ret);
+ return;
+ }
+
+}
+
+static void ksw_stack_exit_handler(struct fprobe *fp, unsigned long ip,
+ unsigned long ret_ip,
+ struct ftrace_regs *regs, void *data)
+{
+ struct ksw_ctx *ctx = &current->ksw_ctx;
+
+
+ if (ctx->wp) {
+ ksw_watch_off(ctx->wp);
+ ctx->wp = NULL;
+ ctx->sp = 0;
+ }
+}
+
+int ksw_stack_init(void)
+{
+ int ret;
+ char *symbuf = NULL;
+
+ memset(&entry_probe, 0, sizeof(entry_probe));
+ entry_probe.symbol_name = ksw_get_config()->func_name;
+ entry_probe.offset = ksw_get_config()->func_offset;
+ entry_probe.post_handler = ksw_stack_entry_handler;
+ ret = register_kprobe(&entry_probe);
+ if (ret) {
+ pr_err("failed to register kprobe ret %d\n", ret);
+ return ret;
+ }
+
+ memset(&exit_probe, 0, sizeof(exit_probe));
+ exit_probe.exit_handler = ksw_stack_exit_handler;
+ symbuf = (char *)ksw_get_config()->func_name;
+
+ ret = register_fprobe_syms(&exit_probe, (const char **)&symbuf, 1);
+ if (ret < 0) {
+ pr_err("failed to register fprobe ret %d\n", ret);
+ unregister_kprobe(&entry_probe);
+ return ret;
+ }
+
+ return 0;
+}
+
+void ksw_stack_exit(void)
+{
+ unregister_fprobe(&exit_probe);
+ unregister_kprobe(&entry_probe);
+}
--
2.43.0
* [PATCH v8 13/27] mm/ksw: add per-task ctx tracking
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (11 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 12/27] mm/ksw: add entry kprobe and exit fprobe management Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 14/27] mm/ksw: resolve stack watch addr and len Jinchao Wang
` (14 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Each task tracks its depth, stack pointer, and generation. A watchpoint is
enabled only when the configured depth is reached, and disabled on function
exit.
The context is reset when probes are disabled, when the generation
changes, or when the exit depth becomes inconsistent (an exit is seen
while the depth is already zero).
Duplicate arming on the same frame is skipped.
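For readers who want to experiment with the depth rule outside the kernel,
here is a rough userspace sketch. It is not part of this patch: struct
demo_ctx and every demo_* name are hypothetical stand-ins for struct ksw_ctx,
probe_enable, probe_generation and the configured depth, and the watchpoint
handling is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-task struct ksw_ctx. */
struct demo_ctx {
	uint16_t depth;      /* nesting depth of the probed function */
	uint16_t generation; /* generation seen at the last reset */
};

/* Stand-ins for probe_enable, probe_generation and the target depth. */
static bool demo_enable = true;
static uint16_t demo_generation = 1;
static uint16_t demo_target_depth = 1;

static void demo_reset(struct demo_ctx *ctx)
{
	ctx->depth = 0;
	ctx->generation = demo_generation;
}

/*
 * Returns true when the call being entered (entry == true) or left
 * (entry == false) sits exactly at the configured depth.
 */
static bool demo_check_ctx(struct demo_ctx *ctx, bool entry)
{
	uint16_t cur_depth;

	if (!demo_enable) {
		demo_reset(ctx);
		return false;
	}

	/* A stale generation means the probes were re-registered. */
	if (ctx->generation != demo_generation)
		demo_reset(ctx);

	/* Exit without a matching entry: inconsistent, start over. */
	if (!entry && !ctx->depth) {
		demo_reset(ctx);
		return false;
	}

	if (entry)
		cur_depth = ctx->depth++; /* depth before this call */
	else
		cur_depth = --ctx->depth; /* depth after leaving the call */

	return cur_depth == demo_target_depth;
}
```

With a target depth of 1, the outermost call (depth 0) is ignored and only
the first nested call arms the watch; the matching exit is the one that
disarms it.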
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/kstackwatch/stack.c | 67 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 67 insertions(+)
diff --git a/mm/kstackwatch/stack.c b/mm/kstackwatch/stack.c
index 3aa02f8370af..96014eb4cb12 100644
--- a/mm/kstackwatch/stack.c
+++ b/mm/kstackwatch/stack.c
@@ -11,6 +11,53 @@
static struct kprobe entry_probe;
static struct fprobe exit_probe;
+static bool probe_enable;
+static u16 probe_generation;
+
+static void ksw_reset_ctx(void)
+{
+ struct ksw_ctx *ctx = &current->ksw_ctx;
+
+ if (ctx->wp)
+ ksw_watch_off(ctx->wp);
+
+ ctx->wp = NULL;
+ ctx->sp = 0;
+ ctx->depth = 0;
+ ctx->generation = READ_ONCE(probe_generation);
+}
+
+static bool ksw_stack_check_ctx(bool entry)
+{
+ struct ksw_ctx *ctx = &current->ksw_ctx;
+ u16 cur_enable = READ_ONCE(probe_enable);
+ u16 cur_generation = READ_ONCE(probe_generation);
+ u16 cur_depth, target_depth = ksw_get_config()->depth;
+
+ if (!cur_enable) {
+ ksw_reset_ctx();
+ return false;
+ }
+
+ if (ctx->generation != cur_generation)
+ ksw_reset_ctx();
+
+ if (!entry && !ctx->depth) {
+ ksw_reset_ctx();
+ return false;
+ }
+
+ if (entry)
+ cur_depth = ctx->depth++;
+ else
+ cur_depth = --ctx->depth;
+
+ if (cur_depth == target_depth)
+ return true;
+ else
+ return false;
+}
+
static int ksw_stack_prepare_watch(struct pt_regs *regs,
const struct ksw_config *config,
ulong *watch_addr, u16 *watch_len)
@@ -25,10 +72,22 @@ static void ksw_stack_entry_handler(struct kprobe *p, struct pt_regs *regs,
unsigned long flags)
{
struct ksw_ctx *ctx = &current->ksw_ctx;
+ ulong stack_pointer;
ulong watch_addr;
u16 watch_len;
int ret;
+ stack_pointer = kernel_stack_pointer(regs);
+
+ /*
+ * triggered more than once, may be in a loop
+ */
+ if (ctx->wp && ctx->sp == stack_pointer)
+ return;
+
+ if (!ksw_stack_check_ctx(true))
+ return;
+
ret = ksw_watch_get(&ctx->wp);
if (ret)
return;
@@ -49,6 +108,7 @@ static void ksw_stack_entry_handler(struct kprobe *p, struct pt_regs *regs,
return;
}
+ ctx->sp = stack_pointer;
}
static void ksw_stack_exit_handler(struct fprobe *fp, unsigned long ip,
@@ -57,6 +117,8 @@ static void ksw_stack_exit_handler(struct fprobe *fp, unsigned long ip,
{
struct ksw_ctx *ctx = &current->ksw_ctx;
+ if (!ksw_stack_check_ctx(false))
+ return;
if (ctx->wp) {
ksw_watch_off(ctx->wp);
@@ -91,11 +153,16 @@ int ksw_stack_init(void)
return ret;
}
+ WRITE_ONCE(probe_generation, READ_ONCE(probe_generation) + 1);
+ WRITE_ONCE(probe_enable, true);
+
return 0;
}
void ksw_stack_exit(void)
{
+ WRITE_ONCE(probe_enable, false);
+ WRITE_ONCE(probe_generation, READ_ONCE(probe_generation) + 1);
unregister_fprobe(&exit_probe);
unregister_kprobe(&entry_probe);
}
--
2.43.0
* [PATCH v8 14/27] mm/ksw: resolve stack watch addr and len
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Add helpers to find the address and length of either the stack canary
or a local variable of the probed function, based on ksw_get_config().
The canary search is limited to a fixed number of steps
(MAX_CANARY_SEARCH_STEPS) to avoid scanning the entire stack. The
computed address and length are validated to lie within the kernel
stack.
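As a rough userspace illustration of the two checks (not part of this
patch), the sketch below scans an ordinary array instead of a kernel stack.
The demo_* names and DEMO_MAX_STEPS are hypothetical; the canary value and
the stack bounds are passed in explicitly rather than read from current.

```c
#include <stddef.h>

#define DEMO_MAX_STEPS 128

/*
 * Scan upward from sp for a word equal to canary, giving up after
 * DEMO_MAX_STEPS words or at end, whichever comes first.
 * Returns a pointer to the match, or NULL if none was found.
 */
static const unsigned long *demo_find_canary(const unsigned long *sp,
					      const unsigned long *end,
					      unsigned long canary)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_STEPS; i++) {
		if (&sp[i] >= end)
			break;
		if (sp[i] == canary)
			return &sp[i];
	}
	return NULL;
}

/* Returns 1 when [addr, addr + size) lies entirely inside [start, end). */
static int demo_range_ok(unsigned long addr, size_t size,
			 unsigned long start, unsigned long end)
{
	if (!addr || !size)
		return 0;
	return addr >= start && addr + size <= end;
}
```

The step limit bounds the cost of a miss, while the range check rejects a
watch window that would start below the stack or run past its end.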
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/kstackwatch/stack.c | 80 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 77 insertions(+), 3 deletions(-)
diff --git a/mm/kstackwatch/stack.c b/mm/kstackwatch/stack.c
index 96014eb4cb12..60371b292915 100644
--- a/mm/kstackwatch/stack.c
+++ b/mm/kstackwatch/stack.c
@@ -8,6 +8,7 @@
#include <linux/kstackwatch_types.h>
#include <linux/printk.h>
+#define MAX_CANARY_SEARCH_STEPS 128
static struct kprobe entry_probe;
static struct fprobe exit_probe;
@@ -58,13 +59,86 @@ static bool ksw_stack_check_ctx(bool entry)
return false;
}
+static unsigned long ksw_find_stack_canary_addr(struct pt_regs *regs)
+{
+ unsigned long *stack_ptr, *stack_end, *stack_base;
+ unsigned long expected_canary;
+ unsigned int i;
+
+ stack_ptr = (unsigned long *)kernel_stack_pointer(regs);
+
+ stack_base = (unsigned long *)(current->stack);
+
+ // TODO: limit it to the current frame
+ stack_end = (unsigned long *)((char *)current->stack + THREAD_SIZE);
+
+ expected_canary = current->stack_canary;
+
+ if (stack_ptr < stack_base || stack_ptr >= stack_end) {
+ pr_err("Stack pointer 0x%lx out of bounds [0x%lx, 0x%lx)\n",
+ (unsigned long)stack_ptr, (unsigned long)stack_base,
+ (unsigned long)stack_end);
+ return 0;
+ }
+
+ for (i = 0; i < MAX_CANARY_SEARCH_STEPS; i++) {
+ if (&stack_ptr[i] >= stack_end)
+ break;
+
+ if (stack_ptr[i] == expected_canary) {
+ pr_debug("canary found i:%d 0x%lx\n", i,
+ (unsigned long)&stack_ptr[i]);
+ return (unsigned long)&stack_ptr[i];
+ }
+ }
+
+ pr_debug("canary not found in first %d steps\n",
+ MAX_CANARY_SEARCH_STEPS);
+ return 0;
+}
+
+static int ksw_stack_validate_addr(unsigned long addr, size_t size)
+{
+ unsigned long stack_start, stack_end;
+
+ if (!addr || !size)
+ return -EINVAL;
+
+ stack_start = (unsigned long)current->stack;
+ stack_end = stack_start + THREAD_SIZE;
+
+ if (addr < stack_start || (addr + size) > stack_end)
+ return -ERANGE;
+
+ return 0;
+}
+
static int ksw_stack_prepare_watch(struct pt_regs *regs,
const struct ksw_config *config,
ulong *watch_addr, u16 *watch_len)
{
- /* implement logic will be added in following patches */
- *watch_addr = 0;
- *watch_len = 0;
+ ulong addr;
+ u16 len;
+
+ if (ksw_get_config()->auto_canary) {
+ addr = ksw_find_stack_canary_addr(regs);
+ if (!addr)
+ return -EINVAL;
+ len = sizeof(ulong);
+ } else {
+ addr = kernel_stack_pointer(regs) + ksw_get_config()->sp_offset;
+ len = ksw_get_config()->watch_len;
+ if (!len)
+ len = sizeof(ulong);
+ }
+
+ if (ksw_stack_validate_addr(addr, len)) {
+ pr_err("invalid stack addr:0x%lx len :%u\n", addr, len);
+ return -EINVAL;
+ }
+
+ *watch_addr = addr;
+ *watch_len = len;
return 0;
}
--
2.43.0
* [PATCH v8 15/27] mm/ksw: limit canary search to current stack frame
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Use the compiler-provided frame pointer when CONFIG_FRAME_POINTER is
enabled to restrict the stack canary search range to the current
function frame. This prevents scanning beyond valid stack bounds and
improves reliability across architectures.
Also add explicit handling for missing CONFIG_STACKPROTECTOR and make
the canary-not-found message more visible by raising it from pr_debug()
to pr_err().
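The range-limiting rule itself is small enough to sketch in userspace (this
is an illustration, not part of the patch). demo_limit_end() is a
hypothetical helper that takes the frame pointer as a parameter; in the
patch the value comes from __builtin_frame_address(0), a GCC/Clang builtin.

```c
/*
 * Pick the upper bound for the canary search.  The frame pointer is
 * used only when it lies strictly between the stack pointer and the
 * end of the stack; otherwise the original bound is kept.
 */
static const unsigned long *demo_limit_end(const unsigned long *sp,
					    const unsigned long *end,
					    const unsigned long *fp)
{
	if (fp > sp && fp < end)
		return fp;
	return end;
}
```

A frame pointer at or below the stack pointer is treated as unusable, so the
search silently falls back to the full-stack bound in that case.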
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/kstackwatch/stack.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/mm/kstackwatch/stack.c b/mm/kstackwatch/stack.c
index 60371b292915..3455d1e70db9 100644
--- a/mm/kstackwatch/stack.c
+++ b/mm/kstackwatch/stack.c
@@ -64,15 +64,32 @@ static unsigned long ksw_find_stack_canary_addr(struct pt_regs *regs)
unsigned long *stack_ptr, *stack_end, *stack_base;
unsigned long expected_canary;
unsigned int i;
+#ifdef CONFIG_FRAME_POINTER
+ unsigned long *fp = NULL;
+#endif
stack_ptr = (unsigned long *)kernel_stack_pointer(regs);
-
stack_base = (unsigned long *)(current->stack);
- // TODO: limit it to the current frame
stack_end = (unsigned long *)((char *)current->stack + THREAD_SIZE);
+#ifdef CONFIG_FRAME_POINTER
+ /*
+ * Use the compiler-provided frame pointer.
+ * Limit the search to the current frame
+ * Works on any arch that keeps FP when CONFIG_FRAME_POINTER=y.
+ */
+ fp = __builtin_frame_address(0);
+ if (fp > stack_ptr && fp < stack_end)
+ stack_end = fp;
+#endif
+
+#ifdef CONFIG_STACKPROTECTOR
expected_canary = current->stack_canary;
+#else
+ pr_err("no canary without CONFIG_STACKPROTECTOR\n");
+ return 0;
+#endif
if (stack_ptr < stack_base || stack_ptr >= stack_end) {
pr_err("Stack pointer 0x%lx out of bounds [0x%lx, 0x%lx)\n",
@@ -85,15 +102,11 @@ static unsigned long ksw_find_stack_canary_addr(struct pt_regs *regs)
if (&stack_ptr[i] >= stack_end)
break;
- if (stack_ptr[i] == expected_canary) {
- pr_debug("canary found i:%d 0x%lx\n", i,
- (unsigned long)&stack_ptr[i]);
+ if (stack_ptr[i] == expected_canary)
return (unsigned long)&stack_ptr[i];
- }
}
- pr_debug("canary not found in first %d steps\n",
- MAX_CANARY_SEARCH_STEPS);
+ pr_err("canary not found in first %d steps\n", MAX_CANARY_SEARCH_STEPS);
return 0;
}
--
2.43.0
* [PATCH v8 16/27] mm/ksw: manage probe and HWBP lifecycle via procfs
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
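Allow KStackWatch to be enabled and disabled dynamically by writing a
configuration to its debugfs config file. A write first stops any active
watch, then parses the new input and starts watching again.
With this patch, the entire system becomes functional.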
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/kstackwatch/kernel.c | 60 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 58 insertions(+), 2 deletions(-)
diff --git a/mm/kstackwatch/kernel.c b/mm/kstackwatch/kernel.c
index 87fef139f494..a0e676e60692 100644
--- a/mm/kstackwatch/kernel.c
+++ b/mm/kstackwatch/kernel.c
@@ -14,6 +14,43 @@ static struct ksw_config *ksw_config;
static struct dentry *dbgfs_config;
static struct dentry *dbgfs_dir;
+static bool watching_active;
+
+static int ksw_start_watching(void)
+{
+ int ret;
+
+ /*
+ * Watch init will preallocate the HWBP,
+ * so it must happen before stack init
+ */
+ ret = ksw_watch_init();
+ if (ret) {
+ pr_err("ksw_watch_init ret: %d\n", ret);
+ return ret;
+ }
+
+ ret = ksw_stack_init();
+ if (ret) {
+ pr_err("ksw_stack_init ret: %d\n", ret);
+ ksw_watch_exit();
+ return ret;
+ }
+ watching_active = true;
+
+ pr_info("start watching: %s\n", ksw_config->user_input);
+ return 0;
+}
+
+static void ksw_stop_watching(void)
+{
+ ksw_stack_exit();
+ ksw_watch_exit();
+ watching_active = false;
+
+ pr_info("stop watching: %s\n", ksw_config->user_input);
+}
+
struct param_map {
const char *name; /* long name */
const char *short_name; /* short name (2 letters) */
@@ -119,8 +156,18 @@ static int ksw_parse_config(char *buf, struct ksw_config *config)
static ssize_t ksw_dbgfs_read(struct file *file, char __user *buf, size_t count,
loff_t *ppos)
{
- return simple_read_from_buffer(buf, count, ppos, ksw_config->user_input,
- ksw_config->user_input ? strlen(ksw_config->user_input) : 0);
+ const char *out;
+ size_t len;
+
+ if (watching_active && ksw_config->user_input) {
+ out = ksw_config->user_input;
+ len = strlen(out);
+ } else {
+ out = "not watching\n";
+ len = strlen(out);
+ }
+
+ return simple_read_from_buffer(buf, count, ppos, out, len);
}
static ssize_t ksw_dbgfs_write(struct file *file, const char __user *buffer,
@@ -135,6 +182,9 @@ static ssize_t ksw_dbgfs_write(struct file *file, const char __user *buffer,
if (copy_from_user(input, buffer, count))
return -EFAULT;
+ if (watching_active)
+ ksw_stop_watching();
+
input[count] = '\0';
strim(input);
@@ -149,6 +199,12 @@ static ssize_t ksw_dbgfs_write(struct file *file, const char __user *buffer,
return ret;
}
+ ret = ksw_start_watching();
+ if (ret) {
+ pr_err("Failed to start watching with %d\n", ret);
+ return ret;
+ }
+
return count;
}
--
2.43.0
* [PATCH v8 17/27] mm/ksw: add KSTACKWATCH_PROFILING to measure probe cost
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
CONFIG_KSTACKWATCH_PROFILING enables runtime measurement of KStackWatch
probe latencies. When profiling is enabled, KStackWatch collects
entry/exit latencies in its probe callbacks. When KStackWatch is
disabled by clearing its config file, the previously collected statistics
are printed.
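The accumulate-then-average bookkeeping can be sketched in userspace as
follows (an illustration only, not part of this patch). struct demo_stats
and the demo_* helpers are hypothetical; timestamps are passed in explicitly
here, whereas the patch reads them with ktime_get_ns() and get_cycles() and
keeps one set of counters per CPU.

```c
#include <stdint.h>

/* One bucket, e.g. "entry with watch" or "exit without watch". */
struct demo_stats {
	uint64_t total_ns; /* sum of all measured durations */
	uint64_t count;    /* number of samples in the sum */
};

/* Add one sample covering [start_ns, end_ns] to the bucket. */
static void demo_account(struct demo_stats *st, uint64_t start_ns,
			 uint64_t end_ns)
{
	st->total_ns += end_ns - start_ns;
	st->count++;
}

/* Average duration, or 0 for an empty bucket (avoids dividing by 0). */
static uint64_t demo_avg_ns(const struct demo_stats *st)
{
	return st->count ? st->total_ns / st->count : 0;
}
```

Keeping separate buckets for the watched and unwatched paths is what lets
the final report show the cost of arming a watchpoint apart from the cost of
the probe alone.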
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/kstackwatch/Kconfig | 10 +++
mm/kstackwatch/stack.c | 185 ++++++++++++++++++++++++++++++++++++++---
2 files changed, 183 insertions(+), 12 deletions(-)
diff --git a/mm/kstackwatch/Kconfig b/mm/kstackwatch/Kconfig
index 496caf264f35..3c9385a15c33 100644
--- a/mm/kstackwatch/Kconfig
+++ b/mm/kstackwatch/Kconfig
@@ -12,3 +12,13 @@ config KSTACKWATCH
introduce minor overhead during runtime monitoring.
If unsure, say N.
+
+config KSTACKWATCH_PROFILING
+ bool "KStackWatch profiling"
+ depends on KSTACKWATCH
+ help
+ Measure probe latency and overhead in KStackWatch. It records
+ entry/exit probe times (ns and cycles) and shows statistics when
+ stopping. Useful for performance tuning, not for production use.
+
+ If unsure, say N.
diff --git a/mm/kstackwatch/stack.c b/mm/kstackwatch/stack.c
index 3455d1e70db9..72ae2d3adeec 100644
--- a/mm/kstackwatch/stack.c
+++ b/mm/kstackwatch/stack.c
@@ -6,7 +6,10 @@
#include <linux/kprobes.h>
#include <linux/kstackwatch.h>
#include <linux/kstackwatch_types.h>
+#include <linux/ktime.h>
+#include <linux/percpu.h>
#include <linux/printk.h>
+#include <linux/timex.h>
#define MAX_CANARY_SEARCH_STEPS 128
static struct kprobe entry_probe;
@@ -15,6 +18,120 @@ static struct fprobe exit_probe;
static bool probe_enable;
static u16 probe_generation;
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+struct measure_data {
+ u64 total_entry_with_watch_ns;
+ u64 total_entry_with_watch_cycles;
+ u64 total_entry_without_watch_ns;
+ u64 total_entry_without_watch_cycles;
+ u64 total_exit_with_watch_ns;
+ u64 total_exit_with_watch_cycles;
+ u64 total_exit_without_watch_ns;
+ u64 total_exit_without_watch_cycles;
+ u64 entry_with_watch_count;
+ u64 entry_without_watch_count;
+ u64 exit_with_watch_count;
+ u64 exit_without_watch_count;
+};
+
+static DEFINE_PER_CPU(struct measure_data, measure_stats);
+
+struct measure_ctx {
+ u64 ns_start;
+ u64 cycles_start;
+};
+
+static __always_inline void measure_start(struct measure_ctx *ctx)
+{
+ ctx->ns_start = ktime_get_ns();
+ ctx->cycles_start = get_cycles();
+}
+
+static __always_inline void measure_end(struct measure_ctx *ctx, u64 *total_ns,
+ u64 *total_cycles, u64 *count)
+{
+ u64 ns_end = ktime_get_ns();
+ u64 c_end = get_cycles();
+
+ *total_ns += ns_end - ctx->ns_start;
+ *total_cycles += c_end - ctx->cycles_start;
+ (*count)++;
+}
+
+static void show_measure_stats(void)
+{
+ int cpu;
+ struct measure_data sum = {};
+
+ for_each_possible_cpu(cpu) {
+ struct measure_data *md = per_cpu_ptr(&measure_stats, cpu);
+
+ sum.total_entry_with_watch_ns += md->total_entry_with_watch_ns;
+ sum.total_entry_with_watch_cycles +=
+ md->total_entry_with_watch_cycles;
+ sum.total_entry_without_watch_ns +=
+ md->total_entry_without_watch_ns;
+ sum.total_entry_without_watch_cycles +=
+ md->total_entry_without_watch_cycles;
+
+ sum.total_exit_with_watch_ns += md->total_exit_with_watch_ns;
+ sum.total_exit_with_watch_cycles +=
+ md->total_exit_with_watch_cycles;
+ sum.total_exit_without_watch_ns +=
+ md->total_exit_without_watch_ns;
+ sum.total_exit_without_watch_cycles +=
+ md->total_exit_without_watch_cycles;
+
+ sum.entry_with_watch_count += md->entry_with_watch_count;
+ sum.entry_without_watch_count += md->entry_without_watch_count;
+ sum.exit_with_watch_count += md->exit_with_watch_count;
+ sum.exit_without_watch_count += md->exit_without_watch_count;
+ }
+
+#define AVG(ns, cnt) ((cnt) ? ((ns) / (cnt)) : 0ULL)
+
+ pr_info("entry (with watch): %llu ns, %llu cycles (%llu samples)\n",
+ AVG(sum.total_entry_with_watch_ns, sum.entry_with_watch_count),
+ AVG(sum.total_entry_with_watch_cycles,
+ sum.entry_with_watch_count),
+ sum.entry_with_watch_count);
+
+ pr_info("entry (without watch): %llu ns, %llu cycles (%llu samples)\n",
+ AVG(sum.total_entry_without_watch_ns,
+ sum.entry_without_watch_count),
+ AVG(sum.total_entry_without_watch_cycles,
+ sum.entry_without_watch_count),
+ sum.entry_without_watch_count);
+
+ pr_info("exit (with watch): %llu ns, %llu cycles (%llu samples)\n",
+ AVG(sum.total_exit_with_watch_ns, sum.exit_with_watch_count),
+ AVG(sum.total_exit_with_watch_cycles,
+ sum.exit_with_watch_count),
+ sum.exit_with_watch_count);
+
+ pr_info("exit (without watch): %llu ns, %llu cycles (%llu samples)\n",
+ AVG(sum.total_exit_without_watch_ns,
+ sum.exit_without_watch_count),
+ AVG(sum.total_exit_without_watch_cycles,
+ sum.exit_without_watch_count),
+ sum.exit_without_watch_count);
+}
+
+static void reset_measure_stats(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct measure_data *md = per_cpu_ptr(&measure_stats, cpu);
+
+ memset(md, 0, sizeof(*md));
+ }
+
+ pr_info("measure stats reset.\n");
+}
+
+#endif
+
static void ksw_reset_ctx(void)
{
struct ksw_ctx *ctx = &current->ksw_ctx;
@@ -159,25 +276,28 @@ static void ksw_stack_entry_handler(struct kprobe *p, struct pt_regs *regs,
unsigned long flags)
{
struct ksw_ctx *ctx = &current->ksw_ctx;
- ulong stack_pointer;
- ulong watch_addr;
+ ulong stack_pointer, watch_addr;
u16 watch_len;
int ret;
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+ struct measure_ctx m;
+ struct measure_data *md = this_cpu_ptr(&measure_stats);
+ bool watched = false;
+
+ measure_start(&m);
+#endif
stack_pointer = kernel_stack_pointer(regs);
- /*
- * triggered more than once, may be in a loop
- */
if (ctx->wp && ctx->sp == stack_pointer)
- return;
+ goto out;
if (!ksw_stack_check_ctx(true))
- return;
+ goto out;
ret = ksw_watch_get(&ctx->wp);
if (ret)
- return;
+ goto out;
ret = ksw_stack_prepare_watch(regs, ksw_get_config(), &watch_addr,
&watch_len);
@@ -185,17 +305,32 @@ static void ksw_stack_entry_handler(struct kprobe *p, struct pt_regs *regs,
ksw_watch_off(ctx->wp);
ctx->wp = NULL;
pr_err("failed to prepare watch target: %d\n", ret);
- return;
+ goto out;
}
ret = ksw_watch_on(ctx->wp, watch_addr, watch_len);
if (ret) {
pr_err("failed to watch on depth:%d addr:0x%lx len:%u %d\n",
ksw_get_config()->depth, watch_addr, watch_len, ret);
- return;
+ goto out;
}
ctx->sp = stack_pointer;
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+ watched = true;
+#endif
+
+out:
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+ if (watched)
+ measure_end(&m, &md->total_entry_with_watch_ns,
+ &md->total_entry_with_watch_cycles,
+ &md->entry_with_watch_count);
+ else
+ measure_end(&m, &md->total_entry_without_watch_ns,
+ &md->total_entry_without_watch_cycles,
+ &md->entry_without_watch_count);
+#endif
}
static void ksw_stack_exit_handler(struct fprobe *fp, unsigned long ip,
@@ -203,15 +338,36 @@ static void ksw_stack_exit_handler(struct fprobe *fp, unsigned long ip,
struct ftrace_regs *regs, void *data)
{
struct ksw_ctx *ctx = &current->ksw_ctx;
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+ struct measure_ctx m;
+ struct measure_data *md = this_cpu_ptr(&measure_stats);
+ bool watched = false;
+ measure_start(&m);
+#endif
if (!ksw_stack_check_ctx(false))
- return;
+ goto out;
if (ctx->wp) {
ksw_watch_off(ctx->wp);
ctx->wp = NULL;
ctx->sp = 0;
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+ watched = true;
+#endif
}
+
+out:
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+ if (watched)
+ measure_end(&m, &md->total_exit_with_watch_ns,
+ &md->total_exit_with_watch_cycles,
+ &md->exit_with_watch_count);
+ else
+ measure_end(&m, &md->total_exit_without_watch_ns,
+ &md->total_exit_without_watch_cycles,
+ &md->exit_without_watch_count);
+#endif
}
int ksw_stack_init(void)
@@ -239,7 +395,9 @@ int ksw_stack_init(void)
unregister_kprobe(&entry_probe);
return ret;
}
-
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+ reset_measure_stats();
+#endif
WRITE_ONCE(probe_generation, READ_ONCE(probe_generation) + 1);
WRITE_ONCE(probe_enable, true);
@@ -252,4 +410,7 @@ void ksw_stack_exit(void)
WRITE_ONCE(probe_generation, READ_ONCE(probe_generation) + 1);
unregister_fprobe(&exit_probe);
unregister_kprobe(&entry_probe);
+#ifdef CONFIG_KSTACKWATCH_PROFILING
+ show_measure_stats();
+#endif
}
--
2.43.0
* [PATCH v8 18/27] arm64/hw_breakpoint: Add arch_reinstall_hw_breakpoint
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Add arch_reinstall_hw_breakpoint() to restore a hardware breakpoint
in an atomic context. Unlike the full uninstall and reallocation
path, this lightweight function re-establishes an existing breakpoint
by calling hw_breakpoint_control() with HW_BREAKPOINT_RESTORE.
This aligns ARM64 with x86 support for atomic breakpoint reinstalls.
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/hw_breakpoint.h | 1 +
arch/arm64/kernel/hw_breakpoint.c | 5 +++++
3 files changed, 7 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6663ffd23f25..fa35dfa2f5cc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -232,6 +232,7 @@ config ARM64
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && \
HW_PERF_EVENTS && HAVE_PERF_EVENTS_NMI
select HAVE_HW_BREAKPOINT if PERF_EVENTS
+ select HAVE_REINSTALL_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IOREMAP_PROT
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_LIVEPATCH
diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
index bd81cf17744a..6c98bbbc6aa6 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -119,6 +119,7 @@ extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
extern int arch_install_hw_breakpoint(struct perf_event *bp);
+extern int arch_reinstall_hw_breakpoint(struct perf_event *bp);
extern void arch_uninstall_hw_breakpoint(struct perf_event *bp);
extern void hw_breakpoint_pmu_read(struct perf_event *bp);
extern int hw_breakpoint_slots(int type);
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index ab76b36dce82..bd7d23d7893d 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -292,6 +292,11 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
return hw_breakpoint_control(bp, HW_BREAKPOINT_INSTALL);
}
+int arch_reinstall_hw_breakpoint(struct perf_event *bp)
+{
+ return hw_breakpoint_control(bp, HW_BREAKPOINT_RESTORE);
+}
+
void arch_uninstall_hw_breakpoint(struct perf_event *bp)
{
hw_breakpoint_control(bp, HW_BREAKPOINT_UNINSTALL);
--
2.43.0
* [PATCH v8 19/27] arm64/hwbp/ksw: integrate KStackWatch handler support
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Add support for identifying KStackWatch watchpoints in the ARM64
hardware breakpoint handler. When a watchpoint belongs to KStackWatch,
the handler bypasses single-step re-arming to allow proper recovery.
Introduce is_ksw_watch_handler() to detect KStackWatch-managed
breakpoints and use it in watchpoint_report() under
CONFIG_KSTACKWATCH.
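The detection boils down to comparing a stored callback pointer against a
known handler. A minimal userspace sketch of that idea (not part of this
patch) is shown below; struct demo_event, demo_handler_t and both handlers
are hypothetical stand-ins for struct perf_event, perf_overflow_handler_t
and ksw_watch_handler().

```c
#include <stdbool.h>

struct demo_event;
typedef void (*demo_handler_t)(struct demo_event *ev);

/* Stand-in for struct perf_event: only the callback matters here. */
struct demo_event {
	demo_handler_t overflow_handler;
};

static void demo_ksw_handler(struct demo_event *ev)
{
	(void)ev; /* would report the stack corruption */
}

static void demo_other_handler(struct demo_event *ev)
{
	(void)ev; /* some unrelated consumer of the event */
}

/* An event belongs to the tool iff its callback is the tool's handler. */
static bool demo_is_ksw_event(const struct demo_event *ev)
{
	return ev->overflow_handler == demo_ksw_handler;
}
```

Identifying ownership by handler address needs no extra flag in the event
structure, at the cost of exporting a small predicate from the owner.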
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
arch/arm64/kernel/hw_breakpoint.c | 7 +++++++
include/linux/kstackwatch.h | 2 ++
mm/kstackwatch/watch.c | 8 ++++++++
3 files changed, 17 insertions(+)
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index bd7d23d7893d..7abcd988c5c2 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -14,6 +14,9 @@
#include <linux/errno.h>
#include <linux/hw_breakpoint.h>
#include <linux/kprobes.h>
+#ifdef CONFIG_KSTACKWATCH
+#include <linux/kstackwatch.h>
+#endif
#include <linux/perf_event.h>
#include <linux/ptrace.h>
#include <linux/smp.h>
@@ -738,6 +741,10 @@ static int watchpoint_report(struct perf_event *wp, unsigned long addr,
struct pt_regs *regs)
{
int step = is_default_overflow_handler(wp);
+#ifdef CONFIG_KSTACKWATCH
+ if (is_ksw_watch_handler(wp))
+ step = 1;
+#endif
struct arch_hw_breakpoint *info = counter_arch_bp(wp);
info->trigger = addr;
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
index afedd9823de9..ce3882acc5dc 100644
--- a/include/linux/kstackwatch.h
+++ b/include/linux/kstackwatch.h
@@ -53,6 +53,8 @@ struct ksw_watchpoint {
struct llist_node node; // for atomic watch_on and off
struct list_head list; // for cpu online and offline
};
+
+bool is_ksw_watch_handler(struct perf_event *event);
int ksw_watch_init(void);
void ksw_watch_exit(void);
int ksw_watch_get(struct ksw_watchpoint **out_wp);
diff --git a/mm/kstackwatch/watch.c b/mm/kstackwatch/watch.c
index 99184f63d7e3..c2aa912bf4c4 100644
--- a/mm/kstackwatch/watch.c
+++ b/mm/kstackwatch/watch.c
@@ -64,6 +64,14 @@ static void ksw_watch_handler(struct perf_event *bp,
panic("Stack corruption detected");
}
+bool is_ksw_watch_handler(struct perf_event *event)
+{
+ perf_overflow_handler_t overflow_handler = event->overflow_handler;
+
+ if (unlikely(overflow_handler == ksw_watch_handler))
+ return true;
+ return false;
+}
static void ksw_watch_on_local_cpu(void *info)
{
struct ksw_watchpoint *wp = info;
--
2.43.0
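The check in this patch identifies a watchpoint by comparing its callback pointer with the tool's own handler. A minimal user-space sketch of that idea follows; the struct, the three handlers and report_step() are hypothetical stand-ins for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct event;
typedef void (*overflow_handler_t)(struct event *ev);

struct event {
	overflow_handler_t overflow_handler;
	int last;	/* records which handler ran last */
};

/* Distinct bodies so each handler keeps a distinct address. */
void default_handler(struct event *ev) { ev->last = 0; }
void ksw_handler(struct event *ev) { ev->last = 1; }
void other_handler(struct event *ev) { ev->last = 2; }

/* An event belongs to the tool if its callback is the tool's handler. */
bool is_ksw_handler(const struct event *ev)
{
	assert(ev != NULL);
	return ev->overflow_handler == ksw_handler;
}

/* Returns 1 when the caller should step past the access itself. */
int report_step(const struct event *ev)
{
	int step = (ev->overflow_handler == default_handler);

	if (is_ksw_handler(ev))
		step = 1;
	return step;
}
```

In this sketch, events using the default handler or the tool's handler return 1 from report_step(), and any other handler returns 0.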
* [PATCH v8 20/27] mm/ksw: add self-debug helpers
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (18 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 19/27] arm64/hwbp/ksw: integrate KStackWatch handler support Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 21/27] mm/ksw: add test module Jinchao Wang
` (7 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Provide two debug helpers:
- ksw_watch_show(): print the current watch target address and length.
- ksw_watch_fire(): intentionally trigger the watchpoint immediately
by writing to the watched address, useful for testing HWBP behavior.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
include/linux/kstackwatch.h | 2 ++
mm/kstackwatch/watch.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
index ce3882acc5dc..6daded932ba6 100644
--- a/include/linux/kstackwatch.h
+++ b/include/linux/kstackwatch.h
@@ -60,5 +60,7 @@ void ksw_watch_exit(void);
int ksw_watch_get(struct ksw_watchpoint **out_wp);
int ksw_watch_on(struct ksw_watchpoint *wp, ulong watch_addr, u16 watch_len);
int ksw_watch_off(struct ksw_watchpoint *wp);
+void ksw_watch_show(void);
+void ksw_watch_fire(void);
#endif /* _KSTACKWATCH_H */
diff --git a/mm/kstackwatch/watch.c b/mm/kstackwatch/watch.c
index c2aa912bf4c4..a298c31848a2 100644
--- a/mm/kstackwatch/watch.c
+++ b/mm/kstackwatch/watch.c
@@ -273,3 +273,37 @@ void ksw_watch_exit(void)
{
ksw_watch_free();
}
+
+/* self debug function */
+void ksw_watch_show(void)
+{
+ struct ksw_watchpoint *wp = current->ksw_ctx.wp;
+
+ if (!wp) {
+ pr_info("nothing to show\n");
+ return;
+ }
+
+ pr_info("watch target bp_addr: 0x%llx len:%llu\n", wp->attr.bp_addr,
+ wp->attr.bp_len);
+}
+EXPORT_SYMBOL_GPL(ksw_watch_show);
+
+/* self debug function */
+void ksw_watch_fire(void)
+{
+ struct ksw_watchpoint *wp;
+ char *ptr;
+
+ wp = current->ksw_ctx.wp;
+
+ if (!wp) {
+ pr_info("nothing to fire\n");
+ return;
+ }
+
+ ptr = (char *)wp->attr.bp_addr;
+ pr_warn("watch triggered immediately\n");
+ *ptr = 0x42; // This should trigger immediately for any bp_len
+}
+EXPORT_SYMBOL_GPL(ksw_watch_fire);
--
2.43.0
* [PATCH v8 21/27] mm/ksw: add test module
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (19 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 20/27] mm/ksw: add self-debug helpers Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 22/27] mm/ksw: add stack overflow test Jinchao Wang
` (6 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Add a standalone test module for KStackWatch to validate functionality
in controlled scenarios.
The module exposes a simple interface via debugfs
(/sys/kernel/debug/kstackwatch/test), allowing specific test cases to
be triggered with commands such as:
echo test0 > /sys/kernel/debug/kstackwatch/test
To ensure predictable behavior during testing, test.o is built with
inlining and sibling-call optimization disabled and at a reduced
optimization level.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
show addr of buf and watch_addr of test case
---
include/linux/kstackwatch.h | 2 +
mm/kstackwatch/Kconfig | 10 +++
mm/kstackwatch/Makefile | 6 ++
mm/kstackwatch/kernel.c | 5 ++
mm/kstackwatch/test.c | 121 ++++++++++++++++++++++++++++++++++++
5 files changed, 144 insertions(+)
create mode 100644 mm/kstackwatch/test.c
diff --git a/include/linux/kstackwatch.h b/include/linux/kstackwatch.h
index 6daded932ba6..7711efe85240 100644
--- a/include/linux/kstackwatch.h
+++ b/include/linux/kstackwatch.h
@@ -40,6 +40,8 @@ struct ksw_config {
// singleton, only modified in kernel.c
const struct ksw_config *ksw_get_config(void);
+struct dentry *ksw_get_dbgdir(void);
+
/* stack management */
int ksw_stack_init(void);
diff --git a/mm/kstackwatch/Kconfig b/mm/kstackwatch/Kconfig
index 3c9385a15c33..343b492ddbd3 100644
--- a/mm/kstackwatch/Kconfig
+++ b/mm/kstackwatch/Kconfig
@@ -22,3 +22,13 @@ config KSTACKWATCH_PROFILING
stopping. Useful for performance tuning, not for production use.
If unsure, say N.
+
+config KSTACKWATCH_TEST
+ tristate "KStackWatch Test Module"
+ depends on KSTACKWATCH
+ help
+ This module provides controlled stack corruption scenarios to verify
+ the functionality of KStackWatch. It is useful for development and
+ validation of the KStackWatch mechanism.
+
+ If unsure, say N.
diff --git a/mm/kstackwatch/Makefile b/mm/kstackwatch/Makefile
index c99c621eac02..a2c7cd647f69 100644
--- a/mm/kstackwatch/Makefile
+++ b/mm/kstackwatch/Makefile
@@ -1,2 +1,8 @@
obj-$(CONFIG_KSTACKWATCH) += kstackwatch.o
kstackwatch-y := kernel.o stack.o watch.o
+
+obj-$(CONFIG_KSTACKWATCH_TEST) += kstackwatch_test.o
+kstackwatch_test-y := test.o
+CFLAGS_test.o := -fno-inline \
+ -fno-optimize-sibling-calls \
+ -fno-pic -fno-pie -O0 -Og
diff --git a/mm/kstackwatch/kernel.c b/mm/kstackwatch/kernel.c
index a0e676e60692..b25cf6830b15 100644
--- a/mm/kstackwatch/kernel.c
+++ b/mm/kstackwatch/kernel.c
@@ -235,6 +235,11 @@ const struct ksw_config *ksw_get_config(void)
return ksw_config;
}
+struct dentry *ksw_get_dbgdir(void)
+{
+ return dbgfs_dir;
+}
+
static int __init kstackwatch_init(void)
{
int ret = 0;
diff --git a/mm/kstackwatch/test.c b/mm/kstackwatch/test.c
new file mode 100644
index 000000000000..2969564b1a00
--- /dev/null
+++ b/mm/kstackwatch/test.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/kstackwatch.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/prandom.h>
+#include <linux/printk.h>
+#include <linux/random.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+
+static struct dentry *test_file;
+
+#define BUFFER_SIZE 32
+
+static void test_watch_fire(void)
+{
+ u64 buffer[BUFFER_SIZE] = { 0 };
+
+ pr_info("entry of %s\n", __func__);
+ ksw_watch_show();
+ pr_info("buf: 0x%px\n", buffer);
+
+ ksw_watch_fire();
+
+ barrier_data(buffer);
+ pr_info("exit of %s\n", __func__);
+}
+
+static ssize_t test_dbgfs_write(struct file *file, const char __user *buffer,
+ size_t count, loff_t *pos)
+{
+ char cmd[256];
+ int test_num;
+
+ if (count >= sizeof(cmd))
+ return -EINVAL;
+
+ if (copy_from_user(cmd, buffer, count))
+ return -EFAULT;
+
+ cmd[count] = '\0';
+ strim(cmd);
+
+ pr_info("received command: %s\n", cmd);
+
+ if (sscanf(cmd, "test%d", &test_num) == 1) {
+ switch (test_num) {
+ case 0:
+ test_watch_fire();
+ break;
+ default:
+ pr_err("Unknown test number %d\n", test_num);
+ return -EINVAL;
+ }
+ } else {
+ pr_err("invalid command format. Use 'testN'.\n");
+ return -EINVAL;
+ }
+
+ return count;
+}
+
+static ssize_t test_dbgfs_read(struct file *file, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ static const char usage[] =
+ "KStackWatch Simplified Test Module\n"
+ "============ usage ===============\n"
+ "Usage:\n"
+ "echo test{i} > /sys/kernel/debug/kstackwatch/test\n"
+ " test0 - test watch fire\n";
+
+ return simple_read_from_buffer(buffer, count, ppos, usage,
+ strlen(usage));
+}
+
+static const struct file_operations test_dbgfs_fops = {
+ .owner = THIS_MODULE,
+ .read = test_dbgfs_read,
+ .write = test_dbgfs_write,
+ .llseek = noop_llseek,
+};
+
+static int __init kstackwatch_test_init(void)
+{
+ struct dentry *ksw_dir = ksw_get_dbgdir();
+
+ if (!ksw_dir) {
+ pr_err("kstackwatch must be loaded first\n");
+ return -ENODEV;
+ }
+
+ test_file = debugfs_create_file("test", 0600, ksw_dir, NULL,
+ &test_dbgfs_fops);
+ if (IS_ERR(test_file)) {
+ pr_err("Failed to create debugfs test file\n");
+ return PTR_ERR(test_file);
+ }
+
+ pr_info("module loaded\n");
+ return 0;
+}
+
+static void __exit kstackwatch_test_exit(void)
+{
+ debugfs_remove(test_file);
+ pr_info("module unloaded\n");
+}
+
+module_init(kstackwatch_test_init);
+module_exit(kstackwatch_test_exit);
+
+MODULE_AUTHOR("Jinchao Wang");
+MODULE_DESCRIPTION("KStackWatch Test Module");
+MODULE_LICENSE("GPL");
--
2.43.0
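The command handling in test_dbgfs_write() (bounded copy, terminate, trim, then match "testN") can be tried outside the kernel. The following is a user-space sketch only: parse_test_cmd() and CMD_MAX are names chosen here, memcpy() stands in for copy_from_user(), and only trailing whitespace is trimmed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CMD_MAX 256

/*
 * Parse a "testN" command of the given length. The input does not
 * need to be NUL-terminated, like a buffer copied from user space.
 * Returns the test number, or -1 if the input is too long or malformed.
 */
int parse_test_cmd(const char *input, size_t count)
{
	char cmd[CMD_MAX];
	size_t len;
	int test_num;

	assert(input != NULL);

	/* Leave room for the terminating NUL. */
	if (count >= sizeof(cmd))
		return -1;

	memcpy(cmd, input, count);
	cmd[count] = '\0';

	/* Trim trailing whitespace such as the newline added by echo. */
	len = strlen(cmd);
	while (len > 0 && isspace((unsigned char)cmd[len - 1]))
		cmd[--len] = '\0';

	if (sscanf(cmd, "test%d", &test_num) != 1)
		return -1;
	return test_num;
}
```

A caller would then switch on the returned number, treating -1 and unknown numbers as invalid input.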
* [PATCH v8 22/27] mm/ksw: add stack overflow test
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (20 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 21/27] mm/ksw: add test module Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 23/27] mm/ksw: add recursive depth test Jinchao Wang
` (5 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Extend the test module with a new test case (test1) that intentionally
overflows a local u64 buffer to corrupt the stack canary.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
show addr of buf and watch_addr of test case
---
mm/kstackwatch/test.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/mm/kstackwatch/test.c b/mm/kstackwatch/test.c
index 2969564b1a00..b3f363d9e1e8 100644
--- a/mm/kstackwatch/test.c
+++ b/mm/kstackwatch/test.c
@@ -32,6 +32,22 @@ static void test_watch_fire(void)
pr_info("exit of %s\n", __func__);
}
+static void test_canary_overflow(void)
+{
+ u64 buffer[BUFFER_SIZE];
+
+ pr_info("entry of %s\n", __func__);
+ ksw_watch_show();
+ pr_info("buf: 0x%px\n", buffer);
+
+ /* intentionally overflow */
+ for (int i = BUFFER_SIZE; i < BUFFER_SIZE + 10; i++)
+ buffer[i] = 0xdeadbeefdeadbeef;
+ barrier_data(buffer);
+
+ pr_info("exit of %s\n", __func__);
+}
+
static ssize_t test_dbgfs_write(struct file *file, const char __user *buffer,
size_t count, loff_t *pos)
{
@@ -54,6 +70,9 @@ static ssize_t test_dbgfs_write(struct file *file, const char __user *buffer,
case 0:
test_watch_fire();
break;
+ case 1:
+ test_canary_overflow();
+ break;
default:
pr_err("Unknown test number %d\n", test_num);
return -EINVAL;
@@ -74,7 +93,8 @@ static ssize_t test_dbgfs_read(struct file *file, char __user *buffer,
"============ usage ===============\n"
"Usage:\n"
"echo test{i} > /sys/kernel/debug/kstackwatch/test\n"
- " test0 - test watch fire\n";
+ " test0 - test watch fire\n"
+ " test1 - test canary overflow\n";
return simple_read_from_buffer(buffer, count, ppos, usage,
strlen(usage));
--
2.43.0
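The overflow in test_canary_overflow() writes ten words past the end of a 32-word buffer. The effect can be modelled safely in user space by placing guard words after the buffer inside one array. This is a simulation under stated assumptions: struct sim_frame, GUARD_WORDS and GUARD_PATTERN are hypothetical, and a real stack frame layout depends on the compiler and architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 32
#define GUARD_WORDS 10
#define GUARD_PATTERN 0x5a5a5a5a5a5a5a5aULL

/* BUFFER_SIZE usable words followed by GUARD_WORDS guard words. */
struct sim_frame {
	uint64_t words[BUFFER_SIZE + GUARD_WORDS];
};

void sim_frame_init(struct sim_frame *f)
{
	assert(f != NULL);
	for (size_t i = 0; i < BUFFER_SIZE; i++)
		f->words[i] = 0;
	for (size_t i = BUFFER_SIZE; i < BUFFER_SIZE + GUARD_WORDS; i++)
		f->words[i] = GUARD_PATTERN;
}

/* Store val into words[start .. start + count - 1]; stores past the
 * simulated region are dropped so the sketch itself stays in bounds. */
void sim_frame_write(struct sim_frame *f, size_t start, size_t count,
		     uint64_t val)
{
	assert(f != NULL);
	for (size_t i = start; i < start + count; i++)
		if (i < BUFFER_SIZE + GUARD_WORDS)
			f->words[i] = val;
}

/* Offset of the first overwritten guard word, or -1 if all are intact. */
int sim_frame_first_damage(const struct sim_frame *f)
{
	assert(f != NULL);
	for (size_t i = 0; i < GUARD_WORDS; i++)
		if (f->words[BUFFER_SIZE + i] != GUARD_PATTERN)
			return (int)i;
	return -1;
}
```

Writing indices 0 to 31 leaves every guard word intact, while writing from index 32 onward, as the test loop does, damages the first guard word.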
* [PATCH v8 23/27] mm/ksw: add recursive depth test
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (21 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 22/27] mm/ksw: add stack overflow test Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 24/27] mm/ksw: add multi-thread corruption test cases Jinchao Wang
` (4 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Introduce a test that performs stack writes in recursive calls to exercise
stack watch at a specific recursion depth.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/kstackwatch/test.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/mm/kstackwatch/test.c b/mm/kstackwatch/test.c
index b3f363d9e1e8..1d196f72faba 100644
--- a/mm/kstackwatch/test.c
+++ b/mm/kstackwatch/test.c
@@ -17,6 +17,7 @@
static struct dentry *test_file;
#define BUFFER_SIZE 32
+#define MAX_DEPTH 6
static void test_watch_fire(void)
{
@@ -48,6 +49,21 @@ static void test_canary_overflow(void)
pr_info("exit of %s\n", __func__);
}
+static void test_recursive_depth(int depth)
+{
+ u64 buffer[BUFFER_SIZE];
+
+ pr_info("entry of %s depth:%d\n", __func__, depth);
+
+ if (depth < MAX_DEPTH)
+ test_recursive_depth(depth + 1);
+
+ buffer[0] = depth;
+ barrier_data(buffer);
+
+ pr_info("exit of %s depth:%d\n", __func__, depth);
+}
+
static ssize_t test_dbgfs_write(struct file *file, const char __user *buffer,
size_t count, loff_t *pos)
{
@@ -73,6 +89,9 @@ static ssize_t test_dbgfs_write(struct file *file, const char __user *buffer,
case 1:
test_canary_overflow();
break;
+ case 2:
+ test_recursive_depth(0);
+ break;
default:
pr_err("Unknown test number %d\n", test_num);
return -EINVAL;
@@ -94,7 +113,8 @@ static ssize_t test_dbgfs_read(struct file *file, char __user *buffer,
"Usage:\n"
"echo test{i} > /sys/kernel/debug/kstackwatch/test\n"
" test0 - test watch fire\n"
- " test1 - test canary overflow\n";
+ " test1 - test canary overflow\n"
+ " test2 - test recursive func\n";
return simple_read_from_buffer(buffer, count, ppos, usage,
strlen(usage));
--
2.43.0
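One way to picture watching "at a specific recursion depth" is a per-task depth counter that is checked on function entry and decremented on exit. The sketch below only illustrates that bookkeeping; struct depth_watch and its helpers are hypothetical, and it is an assumption that the tool tracks depth in a comparable way.

```c
#include <assert.h>
#include <stddef.h>

struct depth_watch {
	int target_depth;	/* hypothetical: depth at which to arm */
	int current_depth;	/* number of active nested calls */
	int arm_count;		/* how often the target depth was entered */
};

/* Called on function entry: arm only when the target depth is reached. */
void depth_enter(struct depth_watch *w)
{
	assert(w != NULL);
	if (w->current_depth == w->target_depth)
		w->arm_count++;
	w->current_depth++;
}

/* Called on function exit: unwind the depth counter. */
void depth_exit(struct depth_watch *w)
{
	assert(w != NULL && w->current_depth > 0);
	w->current_depth--;
}

/* Mirrors the shape of the recursive test: recurse first, then return. */
void recurse(struct depth_watch *w, int depth, int max_depth)
{
	depth_enter(w);
	if (depth < max_depth)
		recurse(w, depth + 1, max_depth);
	depth_exit(w);
}
```

With a target depth of 3 and a maximum depth of 6, the entry hook arms once and the counter is back at zero when the outermost call returns.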
* [PATCH v8 24/27] mm/ksw: add multi-thread corruption test cases
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (22 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 23/27] mm/ksw: add recursive depth test Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 25/27] tools/ksw: add arch-specific test script Jinchao Wang
` (3 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
These tests share a common structure and are grouped together.
- buggy():
exposes the stack address to corrupting(); may omit waiting
- corrupting():
reads the exposed pointer and modifies memory;
if buggy() omits waiting, victim()'s buffer is corrupted
- victim():
initializes a local buffer and later verifies it;
reports an error if the buffer was unexpectedly modified
buggy() and victim() run in the worker() thread and have similar stack
frame sizes, which simplifies testing. By adjusting fence_size in
corrupting(), the test can trigger either silent corruption or overflow
across threads.
- Test 3: one worker, 20 loops, silent corruption
- Test 4: 20 workers, one loop each, silent corruption
- Test 5: one worker, one loop, overflow corruption
Test 4 also exercises multiple watchpoint instances.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
mm/ksw: add KSTACKWATCH_PROFILING to measure probe cost
Introduce CONFIG_KSTACKWATCH_PROFILING to enable optional runtime
profiling in KStackWatch. When enabled, it records entry and exit
probe latencies (in nanoseconds and CPU cycles) and reports averaged
statistics at module exit.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/kstackwatch/test.c | 186 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 185 insertions(+), 1 deletion(-)
diff --git a/mm/kstackwatch/test.c b/mm/kstackwatch/test.c
index 1d196f72faba..4bd0e5026fd9 100644
--- a/mm/kstackwatch/test.c
+++ b/mm/kstackwatch/test.c
@@ -19,6 +19,20 @@ static struct dentry *test_file;
#define BUFFER_SIZE 32
#define MAX_DEPTH 6
+struct work_node {
+ ulong *ptr;
+ u64 start_ns;
+ struct completion done;
+ struct list_head list;
+};
+
+static DECLARE_COMPLETION(work_res);
+static DEFINE_MUTEX(work_mutex);
+static LIST_HEAD(work_list);
+
+static int global_fence_size;
+static int global_loop_count;
+
static void test_watch_fire(void)
{
u64 buffer[BUFFER_SIZE] = { 0 };
@@ -64,6 +78,164 @@ static void test_recursive_depth(int depth)
pr_info("exit of %s depth:%d\n", __func__, depth);
}
+static struct work_node *test_mthread_buggy(int thread_id, int seq_id)
+{
+ ulong buf[BUFFER_SIZE];
+ struct work_node *node;
+ bool trigger;
+
+ node = kmalloc(sizeof(*node), GFP_KERNEL);
+ if (!node)
+ return NULL;
+
+ init_completion(&node->done);
+ node->ptr = buf;
+ node->start_ns = ktime_get_ns();
+ mutex_lock(&work_mutex);
+ list_add(&node->list, &work_list);
+ mutex_unlock(&work_mutex);
+ complete(&work_res);
+
+ trigger = (get_random_u32() % 100) < 10;
+ if (trigger)
+ return node; /* let the caller handle cleanup */
+
+ wait_for_completion(&node->done);
+ kfree(node);
+ return NULL;
+}
+
+#define CORRUPTING_MINIOR_WAIT_NS (100000)
+#define VICTIM_MINIOR_WAIT_NS (300000)
+
+static inline void silent_wait_us(u64 start_ns, u64 min_wait_us)
+{
+ u64 diff_ns, remain_us;
+
+ diff_ns = ktime_get_ns() - start_ns;
+ if (diff_ns < min_wait_us * 1000ULL) {
+ remain_us = min_wait_us - (diff_ns >> 10);
+ usleep_range(remain_us, remain_us + 200);
+ }
+}
+
+static void test_mthread_victim(int thread_id, int seq_id, u64 start_ns)
+{
+ ulong buf[BUFFER_SIZE];
+
+ for (int j = 0; j < BUFFER_SIZE; j++)
+ buf[j] = 0xdeadbeef + seq_id;
+ if (start_ns)
+ silent_wait_us(start_ns, VICTIM_MINIOR_WAIT_NS);
+
+ for (int j = 0; j < BUFFER_SIZE; j++) {
+ if (buf[j] != (0xdeadbeef + seq_id)) {
+ pr_warn("victim[%d][%d]: unhappy buf[%d]=0x%lx\n",
+ thread_id, seq_id, j, buf[j]);
+ return;
+ }
+ }
+
+ pr_info("victim[%d][%d]: happy\n", thread_id, seq_id);
+}
+
+static int test_mthread_corrupting(void *data)
+{
+ struct work_node *node;
+ int fence_size;
+
+ while (!kthread_should_stop()) {
+ if (!wait_for_completion_timeout(&work_res, HZ))
+ continue;
+ while (true) {
+ mutex_lock(&work_mutex);
+ node = list_first_entry_or_null(&work_list,
+ struct work_node, list);
+ if (node)
+ list_del(&node->list);
+ mutex_unlock(&work_mutex);
+
+ if (!node)
+ break; /* no more nodes, exit inner loop */
+ silent_wait_us(node->start_ns,
+ CORRUPTING_MINIOR_WAIT_NS);
+
+ fence_size = READ_ONCE(global_fence_size);
+ for (int i = fence_size; i < BUFFER_SIZE - fence_size;
+ i++)
+ node->ptr[i] = 0xabcdabcd;
+
+ complete(&node->done);
+ }
+ }
+
+ return 0;
+}
+
+static int test_mthread_worker(void *data)
+{
+ int thread_id = (long)data;
+ int loop_count;
+ struct work_node *node;
+
+ loop_count = READ_ONCE(global_loop_count);
+
+ for (int i = 0; i < loop_count; i++) {
+ node = test_mthread_buggy(thread_id, i);
+
+ if (node)
+ test_mthread_victim(thread_id, i, node->start_ns);
+ else
+ test_mthread_victim(thread_id, i, 0);
+ if (node) {
+ wait_for_completion(&node->done);
+ kfree(node);
+ }
+ }
+ return 0;
+}
+
+static void test_mthread_case(int num_workers, int loop_count, int fence_size)
+{
+ static struct task_struct *corrupting;
+ static struct task_struct **workers;
+
+ WRITE_ONCE(global_loop_count, loop_count);
+ WRITE_ONCE(global_fence_size, fence_size);
+
+ init_completion(&work_res);
+ workers = kmalloc_array(num_workers, sizeof(void *), GFP_KERNEL);
+ memset(workers, 0, sizeof(struct task_struct *) * num_workers);
+
+ corrupting = kthread_run(test_mthread_corrupting, NULL, "corrupting");
+ if (IS_ERR(corrupting)) {
+ pr_err("failed to create corrupting thread\n");
+ return;
+ }
+
+ for (ulong i = 0; i < num_workers; i++) {
+ workers[i] = kthread_run(test_mthread_worker, (void *)i,
+ "worker_%lu", i);
+ if (IS_ERR(workers[i])) {
+ pr_err("failed to create worker thread %lu\n", i);
+ workers[i] = NULL;
+ }
+ }
+
+ for (ulong i = 0; i < num_workers; i++) {
+ if (workers[i] && workers[i]->__state != TASK_DEAD) {
+ usleep_range(1000, 2000);
+ i--;
+ }
+ }
+ kfree(workers);
+
+ if (corrupting && !IS_ERR(corrupting)) {
+ kthread_stop(corrupting);
+ corrupting = NULL;
+ }
+}
+
static ssize_t test_dbgfs_write(struct file *file, const char __user *buffer,
size_t count, loff_t *pos)
{
@@ -92,6 +264,15 @@ static ssize_t test_dbgfs_write(struct file *file, const char __user *buffer,
case 2:
test_recursive_depth(0);
break;
+ case 3:
+ test_mthread_case(1, 20, BUFFER_SIZE / 4);
+ break;
+ case 4:
+ test_mthread_case(200, 1, BUFFER_SIZE / 4);
+ break;
+ case 5:
+ test_mthread_case(1, 1, -3);
+ break;
default:
pr_err("Unknown test number %d\n", test_num);
return -EINVAL;
@@ -114,7 +295,10 @@ static ssize_t test_dbgfs_read(struct file *file, char __user *buffer,
"echo test{i} > /sys/kernel/debug/kstackwatch/test\n"
" test0 - test watch fire\n"
" test1 - test canary overflow\n"
- " test2 - test recursive func\n";
+ " test2 - test recursive func\n"
+ " test3 - test silent corruption\n"
+ " test4 - test multiple silent corruption\n"
+ " test5 - test prologue corruption\n";
return simple_read_from_buffer(buffer, count, ppos, usage,
strlen(usage));
--
2.43.0
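The role of fence_size is easiest to see by counting which indices the corrupting loop touches. The sketch below replays the loop bounds from test_mthread_corrupting() without writing any memory; struct hit_counts and corrupt_range() are names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_SIZE 32

struct hit_counts {
	int inside;	/* indices within [0, BUFFER_SIZE) */
	int outside;	/* indices before or after the buffer */
};

/*
 * Classify the indices visited by
 *   for (i = fence_size; i < BUFFER_SIZE - fence_size; i++)
 * A positive fence keeps the writes inside the buffer; a negative
 * fence extends them past both ends.
 */
struct hit_counts corrupt_range(int fence_size)
{
	struct hit_counts c = { 0, 0 };

	/* Sanity bounds for this sketch only. */
	assert(fence_size > -BUFFER_SIZE && fence_size <= BUFFER_SIZE / 2);

	for (int i = fence_size; i < BUFFER_SIZE - fence_size; i++) {
		if (i >= 0 && i < BUFFER_SIZE)
			c.inside++;
		else
			c.outside++;
	}
	return c;
}
```

With fence_size = BUFFER_SIZE / 4 = 8 (tests 3 and 4) the loop covers indices 8 to 23, that is 16 words, all inside the buffer. With fence_size = -3 (test 5) it covers indices -3 to 34: all 32 buffer words plus 3 words on each side.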
* [PATCH v8 25/27] tools/ksw: add arch-specific test script
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (23 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 24/27] mm/ksw: add multi-thread corruption test cases Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 26/27] docs: add KStackWatch document Jinchao Wang
` (2 subsequent siblings)
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
Add a shell script under tools/kstackwatch to run self-tests such as
canary overflow and recursive depth. The script supports both x86_64
and arm64, selecting parameters automatically based on uname -m.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
tools/kstackwatch/kstackwatch_test.sh | 85 +++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
create mode 100755 tools/kstackwatch/kstackwatch_test.sh
diff --git a/tools/kstackwatch/kstackwatch_test.sh b/tools/kstackwatch/kstackwatch_test.sh
new file mode 100755
index 000000000000..6e83397d3213
--- /dev/null
+++ b/tools/kstackwatch/kstackwatch_test.sh
@@ -0,0 +1,85 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+echo "IMPORTANT: Before running, make sure you have updated the config values!"
+
+usage() {
+ echo "Usage: $0 [0-5]"
+ echo " 0 - test watch fire"
+ echo " 1 - test canary overflow"
+ echo " 2 - test recursive depth"
+ echo " 3 - test silent corruption"
+ echo " 4 - test multi-threaded silent corruption"
+ echo " 5 - test multi-threaded overflow"
+}
+
+run_test_x86_64() {
+ local test_num=$1
+ case "$test_num" in
+ 0) echo fn=test_watch_fire fo=0x29 ac=1 >/sys/kernel/debug/kstackwatch/config
+ echo test0 > /sys/kernel/debug/kstackwatch/test
+ ;;
+ 1) echo fn=test_canary_overflow fo=0x14 >/sys/kernel/debug/kstackwatch/config
+ echo test1 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ 2) echo fn=test_recursive_depth fo=0x2f dp=3 wl=8 so=0 >/sys/kernel/debug/kstackwatch/config
+ echo test2 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ 3) echo fn=test_mthread_victim fo=0x4c so=64 wl=8 >/sys/kernel/debug/kstackwatch/config
+ echo test3 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ 4) echo fn=test_mthread_victim fo=0x4c so=64 wl=8 >/sys/kernel/debug/kstackwatch/config
+ echo test4 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ 5) echo fn=test_mthread_buggy fo=0x16 so=0x100 wl=8 >/sys/kernel/debug/kstackwatch/config
+ echo test5 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ *) usage
+ exit 1 ;;
+ esac
+ # Reset watch after test
+ echo >/sys/kernel/debug/kstackwatch/config
+}
+
+run_test_aarch64() {
+ local test_num=$1
+ case "$test_num" in
+ 0) echo fn=test_watch_fire fo=0x50 ac=1 >/sys/kernel/debug/kstackwatch/config
+ echo test0 > /sys/kernel/debug/kstackwatch/test
+ ;;
+ 1) echo fn=test_canary_overflow fo=0x20 so=264 >/sys/kernel/debug/kstackwatch/config
+ echo test1 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ 2) echo fn=test_recursive_depth fo=0x34 dp=3 wl=8 so=8 >/sys/kernel/debug/kstackwatch/config
+ echo test2 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ 3) echo fn=test_mthread_victim fo=0x6c so=0x48 wl=8 >/sys/kernel/debug/kstackwatch/config
+ echo test3 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ 4) echo fn=test_mthread_victim fo=0x6c so=0x48 wl=8 >/sys/kernel/debug/kstackwatch/config
+ echo test4 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ 5) echo fn=test_mthread_buggy fo=0x20 so=264 >/sys/kernel/debug/kstackwatch/config
+ echo test5 >/sys/kernel/debug/kstackwatch/test
+ ;;
+ *) usage
+ exit 1 ;;
+ esac
+ # Reset watch after test
+ echo >/sys/kernel/debug/kstackwatch/config
+}
+
+# Check root and module
+[ "$EUID" -ne 0 ] && echo "Run as root" && exit 1
+for f in /sys/kernel/debug/kstackwatch/config /sys/kernel/debug/kstackwatch/test; do
+ [ ! -f "$f" ] && echo "$f not found" && exit 1
+done
+
+# Run
+[ -z "$1" ] && { usage; exit 0; }
+
+arch=$(uname -m)
+case "$arch" in
+ x86_64|aarch64) run_test_${arch/aarch64/arm64} "$1" ;;
+ *) echo "Unsupported architecture: $arch" && exit 1 ;;
+esac
--
2.43.0
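
The dispatch at the end of kstackwatch_test.sh depends on how `uname -m` names the machine: arm64 kernels report `aarch64`, while the script's per-architecture function is named `run_test_arm64`. A minimal sketch of an explicit mapping follows; the helper name `ksw_test_func` is hypothetical and not part of the patch.

```shell
#!/bin/bash
# Hypothetical helper (not in the patch): map a `uname -m` string to the
# name of the per-architecture test function in kstackwatch_test.sh.
ksw_test_func() {
    case "$1" in
    x86_64)  echo run_test_x86_64 ;;
    aarch64) echo run_test_arm64 ;;
    *)       return 1 ;;
    esac
}

# Usage sketch (commented out because it needs the functions from the script):
# func=$(ksw_test_func "$(uname -m)") || { echo "Unsupported architecture"; exit 1; }
# "$func" "$1"
```

Keeping the mapping in one place means the function names can follow the kernel's architecture naming without depending on the exact string that `uname -m` returns.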
* [PATCH v8 26/27] docs: add KStackWatch document
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (24 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 25/27] tools/ksw: add arch-specific test script Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 16:36 ` [PATCH v8 27/27] MAINTAINERS: add entry for KStackWatch Jinchao Wang
2025-11-10 17:33 ` [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Matthew Wilcox
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
To: Andrew Morton, Masami Hiramatsu (Google), Peter Zijlstra,
Randy Dunlap, Marco Elver, Mike Rapoport, Alexander Potapenko,
Adrian Hunter, Alexander Shishkin, Alice Ryhl, Andrey Konovalov,
Andrey Ryabinin, Andrii Nakryiko, Ard Biesheuvel,
Arnaldo Carvalho de Melo, Ben Segall, Bill Wendling,
Borislav Petkov, Catalin Marinas, Dave Hansen, David Hildenbrand,
David Kaplan, David S. Miller, Dietmar Eggemann, Dmitry Vyukov,
H. Peter Anvin, Ian Rogers, Ingo Molnar, James Clark,
Jinchao Wang, Jinjie Ruan, Jiri Olsa, Jonathan Corbet, Juri Lelli,
Justin Stitt, kasan-dev, Kees Cook, Liam R. Howlett, Liang Kan,
Linus Walleij, linux-arm-kernel, linux-doc, linux-kernel,
linux-mm, linux-perf-users, linux-trace-kernel, llvm,
Lorenzo Stoakes, Mark Rutland, Masahiro Yamada, Mathieu Desnoyers,
Mel Gorman, Michal Hocko, Miguel Ojeda, Nam Cao, Namhyung Kim,
Nathan Chancellor, Naveen N Rao, Nick Desaulniers, Rong Xu,
Sami Tolvanen, Steven Rostedt, Suren Baghdasaryan,
Thomas Gleixner, Thomas Weißschuh, Valentin Schneider,
Vincent Guittot, Vincenzo Frascino, Vlastimil Babka, Will Deacon,
workflows, x86
Add documentation for KStackWatch under Documentation/.
It provides an overview, main features, usage details, configuration
parameters, and example scenarios with test cases. The document also
explains how to locate function offsets and interpret logs.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kstackwatch.rst | 377 ++++++++++++++++++++++++
2 files changed, 378 insertions(+)
create mode 100644 Documentation/dev-tools/kstackwatch.rst
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 4b8425e348ab..272ae9b76863 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -32,6 +32,7 @@ Documentation/process/debugging/index.rst
lkmm/index
kfence
kselftest
+ kstackwatch
kunit/index
ktap
checkuapi
diff --git a/Documentation/dev-tools/kstackwatch.rst b/Documentation/dev-tools/kstackwatch.rst
new file mode 100644
index 000000000000..9b710b90e512
--- /dev/null
+++ b/Documentation/dev-tools/kstackwatch.rst
@@ -0,0 +1,377 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================
+Kernel Stack Watch (KStackWatch)
+=================================
+
+Overview
+========
+
+KStackWatch is a lightweight debugging tool designed to detect kernel stack
+corruption in real time. It installs a hardware breakpoint (watchpoint) at a
+function's specified offset using *kprobe.post_handler* and removes it in
+*fprobe.exit_handler*. This covers the full execution window and reports
+corruption immediately with time, location, and call stack.
+
+Main features:
+
+* Immediate and precise stack corruption detection
+* Support for multiple concurrent watchpoints with configurable limits
+* Lockless design, usable in any context
+* Depth filter for recursive calls
+* Low memory and CPU overhead
+* Flexible debugfs configuration with key=val syntax
+* Architecture support: x86_64 and arm64
+* Auto-canary detection to simplify configuration
+
+Performance Impact
+==================
+
+Runtime overhead was measured on Intel Core Ultra 5 125H @ 3 GHz running
+kernel 6.17, using test4:
+
++------------------------+-------------+---------+
+| Type                   | Time (ns)   | Cycles  |
++========================+=============+=========+
+| entry with watch       | 10892       | 32620   |
++------------------------+-------------+---------+
+| entry without watch    | 159         | 466     |
++------------------------+-------------+---------+
+| exit with watch        | 12541       | 37556   |
++------------------------+-------------+---------+
+| exit without watch     | 124         | 369     |
++------------------------+-------------+---------+
+
+From a broader perspective, the overall comparison is as follows:
+
++----------------------------+----------------------+-------------------------+
+| Mode                       | CPU Overhead (add)   | Memory Overhead (add)   |
++============================+======================+=========================+
+| Compiled but not enabled   | None                 | ~20 B per task          |
++----------------------------+----------------------+-------------------------+
+| Enabled, no function hit   | None                 | ~few hundred B          |
++----------------------------+----------------------+-------------------------+
+| Func hit, HWBP not toggled | ~140 ns per call     | None                    |
++----------------------------+----------------------+-------------------------+
+| Func hit, HWBP toggled     | ~11–12 µs per call   | None                    |
++----------------------------+----------------------+-------------------------+
+
+Overhead is negligible until a watched function is hit, making KStackWatch
+usable in production environments where stack corruption is suspected but
+kernel rebuilds are not feasible.
+
+Kconfig Options
+===============
+
+The following configuration options control KStackWatch builds:
+
+- CONFIG_KSTACKWATCH
+
+ Builds the kernel with KStackWatch enabled.
+
+- CONFIG_KSTACKWATCH_PROFILING
+
+ Measures probe runtime overhead for performance analysis and tuning.
+
+- CONFIG_KSTACKWATCH_TEST
+
+ Builds a test module to validate KStackWatch functionality.
+
+Usage
+=====
+
+KStackWatch provides optional configurations for different use cases.
+CONFIG_KSTACKWATCH enables real-time stack corruption detection using hardware breakpoints and probes.
+CONFIG_KSTACKWATCH_PROFILING allows measurement of probe latency and overhead for performance analysis.
+CONFIG_KSTACKWATCH_TEST builds a test module for validating KStackWatch functionality under controlled conditions.
+
+KStackWatch is configured through */sys/kernel/debug/kstackwatch/config* using a
+key=value format. Both long and short forms are supported. Writing an empty
+string disables the watch.
+
+.. code-block:: bash
+
+ # long form
+ echo func_name=? func_offset=? ... > /sys/kernel/debug/kstackwatch/config
+
+ # short form
+ echo fn=? fo=? ... > /sys/kernel/debug/kstackwatch/config
+
+ # disable
+ echo > /sys/kernel/debug/kstackwatch/config
+
+The func_name and the func_offset where the watchpoint should be placed must be
+known. This information can be obtained from *objdump* or other tools.
+
+Required parameters
+--------------------
+
++--------------+--------+-----------------------------------------+
+| Parameter    | Short  | Description                             |
++==============+========+=========================================+
+| func_name    | fn     | Name of the target function             |
++--------------+--------+-----------------------------------------+
+| func_offset  | fo     | Instruction offset from function start  |
++--------------+--------+-----------------------------------------+
+
+Optional parameters
+--------------------
+
+All optional parameters default to 0 and can be omitted.
+Values may be given in decimal or hexadecimal.
+
++--------------+--------+------------------------------------------------+
+| Parameter    | Short  | Description                                    |
++==============+========+================================================+
+| auto_canary  | ac     | Automatically calculate the canary sp_offset   |
++--------------+--------+------------------------------------------------+
+| depth        | dp     | Recursion depth filter                         |
++--------------+--------+------------------------------------------------+
+|              |        | Maximum number of concurrent watchpoints       |
+| max_watch    | mw     | (default 0, capped by available hardware       |
+|              |        | breakpoints)                                   |
++--------------+--------+------------------------------------------------+
+| panic_hit    | ph     | Panic system on watchpoint hit (default 0)     |
++--------------+--------+------------------------------------------------+
+| sp_offset    | so     | Watched address offset from stack pointer      |
++--------------+--------+------------------------------------------------+
+| watch_len    | wl     | Watch length in bytes (1, 2, 4, 8 on x86_64)   |
++--------------+--------+------------------------------------------------+
+
+
+Workflow Example
+================
+
+Silent corruption
+-----------------
+
+Consider *test3* in *kstackwatch_test.sh*. Run it directly:
+
+.. code-block:: bash
+
+ echo test3 >/sys/kernel/debug/kstackwatch/test
+
+Sometimes, *test_mthread_victim()* reports that it is unhappy:
+
+.. code-block:: text
+
+ [ 7.807082] kstackwatch_test: victim[0][11]: unhappy buf[8]=0xabcdabcd
+
+Its source code is:
+
+.. code-block:: c
+
+ static void test_mthread_victim(int thread_id, int seq_id, u64 start_ns)
+ {
+ ulong buf[BUFFER_SIZE];
+
+ for (int j = 0; j < BUFFER_SIZE; j++)
+ buf[j] = 0xdeadbeef + seq_id;
+
+ if (start_ns)
+ silent_wait_us(start_ns, VICTIM_MINIOR_WAIT_NS);
+
+ for (int j = 0; j < BUFFER_SIZE; j++) {
+ if (buf[j] != (0xdeadbeef + seq_id)) {
+ pr_warn("victim[%d][%d]: unhappy buf[%d]=0x%lx\n",
+ thread_id, seq_id, j, buf[j]);
+ return;
+ }
+ }
+
+ pr_info("victim[%d][%d]: happy\n", thread_id, seq_id);
+ }
+
+From the source code, the report indicates buf[8] was unexpectedly modified,
+a case of silent corruption.
+
+Configuration
+-------------
+
+Since buf[8] is the corrupted variable, the following configuration shows
+how to use KStackWatch to detect its corruption.
+
+func_name
+~~~~~~~~~~~
+
+As seen, buf[8] is initialized and modified in *test_mthread_victim*\(\),
+which sets *func_name*.
+
+func_offset & sp_offset
+~~~~~~~~~~~~~~~~~~~~~~~~~
+The watchpoint should be set after the assignment and as close as
+possible, which sets *func_offset*.
+
+The watchpoint should be set to watch buf[8], which sets *sp_offset*.
+
+Use objdump to disassemble the function:
+
+.. code-block:: bash
+
+ objdump -S --disassemble=test_mthread_victim vmlinux
+
+A shortened output is:
+
+.. code-block:: text
+
+ static void test_mthread_victim(int thread_id, int seq_id, u64 start_ns)
+ {
+ ffffffff815cb4e0: e8 5b 9b ca ff call ffffffff81275040 <__fentry__>
+ ffffffff815cb4e5: 55 push %rbp
+ ffffffff815cb4e6: 53 push %rbx
+ ffffffff815cb4e7: 48 81 ec 08 01 00 00 sub $0x108,%rsp
+ ffffffff815cb4ee: 89 fd mov %edi,%ebp
+ ffffffff815cb4f0: 89 f3 mov %esi,%ebx
+ ffffffff815cb4f2: 49 89 d0 mov %rdx,%r8
+ ffffffff815cb4f5: 65 48 8b 05 0b cb 80 mov %gs:0x280cb0b(%rip),%rax # ffffffff83dd8008 <__stack_chk_guard>
+ ffffffff815cb4fc: 02
+ ffffffff815cb4fd: 48 89 84 24 00 01 00 mov %rax,0x100(%rsp)
+ ffffffff815cb504: 00
+ ffffffff815cb505: 31 c0 xor %eax,%eax
+ ulong buf[BUFFER_SIZE];
+ ffffffff815cb507: 48 89 e2 mov %rsp,%rdx
+ ffffffff815cb50a: b9 20 00 00 00 mov $0x20,%ecx
+ ffffffff815cb50f: 48 89 d7 mov %rdx,%rdi
+ ffffffff815cb512: f3 48 ab rep stos %rax,%es:(%rdi)
+
+ for (int j = 0; j < BUFFER_SIZE; j++)
+ ffffffff815cb515: eb 10 jmp ffffffff815cb527 <test_mthread_victim+0x47>
+ buf[j] = 0xdeadbeef + seq_id;
+ ffffffff815cb517: 8d 93 ef be ad de lea -0x21524111(%rbx),%edx
+ ffffffff815cb51d: 48 63 c8 movslq %eax,%rcx
+ ffffffff815cb520: 48 89 14 cc mov %rdx,(%rsp,%rcx,8)
+ ffffffff815cb524: 83 c0 01 add $0x1,%eax
+ ffffffff815cb527: 83 f8 1f cmp $0x1f,%eax
+ ffffffff815cb52a: 7e eb jle ffffffff815cb517 <test_mthread_victim+0x37>
+ if (start_ns)
+ ffffffff815cb52c: 4d 85 c0 test %r8,%r8
+ ffffffff815cb52f: 75 21 jne ffffffff815cb552 <test_mthread_victim+0x72>
+ silent_wait_us(start_ns, VICTIM_MINIOR_WAIT_NS);
+ ...
+ ffffffff815cb571: 48 8b 84 24 00 01 00 mov 0x100(%rsp),%rax
+ ffffffff815cb579: 65 48 2b 05 87 ca 80 sub %gs:0x280ca87(%rip),%rax # ffffffff83dd8008 <__stack_chk_guard>
+ ...
+ ffffffff815cb5a1: eb ce jmp ffffffff815cb571 <test_mthread_victim+0x91>
+ }
+ ffffffff815cb5a3: e8 d8 86 f1 00 call ffffffff824e3c80 <__stack_chk_fail>
+
+
+func_offset
+^^^^^^^^^^^
+
+The function begins at ffffffff815cb4e0. The *buf* array is initialized in a loop.
+The instruction storing values into the array is at ffffffff815cb520, and the
+first instruction after the loop is at ffffffff815cb52c.
+
+Because KStackWatch uses *kprobe.post_handler*, the watchpoint can be
+set right after ffffffff815cb520. However, this causes false positives
+because the watchpoint is active before buf[8] is assigned.
+
+An alternative is to place the watchpoint at ffffffff815cb52c, right
+after the loop. This avoids false positives but leaves a small window
+for false negatives.
+
+In this document, ffffffff815cb52c is chosen for cleaner logs. If false
+negatives are suspected, repeat the test to catch the corruption.
+
+The required offset is calculated from the function start:
+
+*func_offset* is 0x4c (ffffffff815cb52c - ffffffff815cb4e0).
+
+sp_offset
+^^^^^^^^^^^
+
+From the disassembly, the buf array is at the top of the stack,
+meaning buf == rsp. Therefore, buf[8] sits at rsp + 8 * sizeof(ulong) =
+rsp + 64. Thus, *sp_offset* is 64.
+
+Other parameters
+~~~~~~~~~~~~~~~~~~
+
+* *depth* is 0, as test_mthread_victim is not recursive
+* *max_watch* is 0 to use all available hardware breakpoints
+* *watch_len* is 8, the size of a ulong on x86_64
+
+Parameters with a value of 0 can be omitted as defaults.
+
+Configure the watch:
+
+.. code-block:: bash
+
+ echo "fn=test_mthread_victim fo=0x4c so=64 wl=8" > /sys/kernel/debug/kstackwatch/config
+
+Now rerun the test:
+
+.. code-block:: bash
+
+ echo test3 >/sys/kernel/debug/kstackwatch/test
+
+The dmesg log shows:
+
+.. code-block:: text
+
+ [ 7.607074] kstackwatch: ========== KStackWatch: Caught stack corruption =======
+ [ 7.607077] kstackwatch: config fn=test_mthread_victim fo=0x4c so=64 wl=8
+ [ 7.607080] CPU: 2 UID: 0 PID: 347 Comm: corrupting Not tainted 6.17.0-rc7-00022-g90270f3db80a-dirty #509 PREEMPT(voluntary)
+ [ 7.607083] Call Trace:
+ [ 7.607084] <#DB>
+ [ 7.607085] dump_stack_lvl+0x66/0xa0
+ [ 7.607091] ksw_watch_handler.part.0+0x2b/0x60
+ [ 7.607094] ksw_watch_handler+0xba/0xd0
+ [ 7.607095] ? test_mthread_corrupting+0x48/0xd0
+ [ 7.607097] ? kthread+0x10d/0x210
+ [ 7.607099] ? ret_from_fork+0x187/0x1e0
+ [ 7.607102] ? ret_from_fork_asm+0x1a/0x30
+ [ 7.607105] __perf_event_overflow+0x154/0x570
+ [ 7.607108] perf_bp_event+0xb4/0xc0
+ [ 7.607112] ? look_up_lock_class+0x59/0x150
+ [ 7.607115] hw_breakpoint_exceptions_notify+0xf7/0x110
+ [ 7.607117] notifier_call_chain+0x44/0x110
+ [ 7.607119] atomic_notifier_call_chain+0x5f/0x110
+ [ 7.607121] notify_die+0x4c/0xb0
+ [ 7.607123] exc_debug_kernel+0xaf/0x170
+ [ 7.607126] asm_exc_debug+0x1e/0x40
+ [ 7.607127] RIP: 0010:test_mthread_corrupting+0x48/0xd0
+ [ 7.607129] Code: c7 80 0a 24 83 e8 48 f1 f1 00 48 85 c0 74 dd eb 30 bb 00 00 00 00 eb 59 48 63 c2 48 c1 e0 03 48 03 03 be cd ab cd ab 48 89 30 <83> c2 01 b8 20 00 00 00 29 c8 39 d0 7f e0 48 8d 7b 10 e8 d1 86 d4
+ [ 7.607130] RSP: 0018:ffffc90000acfee0 EFLAGS: 00000286
+ [ 7.607132] RAX: ffffc90000a13de8 RBX: ffff888102d57580 RCX: 0000000000000008
+ [ 7.607132] RDX: 0000000000000008 RSI: 00000000abcdabcd RDI: ffffc90000acfe00
+ [ 7.607133] RBP: ffff8881085bc800 R08: 0000000000000001 R09: 0000000000000000
+ [ 7.607133] R10: 0000000000000001 R11: 0000000000000000 R12: ffff888105398000
+ [ 7.607134] R13: ffff8881085bc800 R14: ffffffff815cb660 R15: 0000000000000000
+ [ 7.607134] ? __pfx_test_mthread_corrupting+0x10/0x10
+ [ 7.607137] </#DB>
+ [ 7.607138] <TASK>
+ [ 7.607138] kthread+0x10d/0x210
+ [ 7.607140] ? __pfx_kthread+0x10/0x10
+ [ 7.607141] ret_from_fork+0x187/0x1e0
+ [ 7.607143] ? __pfx_kthread+0x10/0x10
+ [ 7.607144] ret_from_fork_asm+0x1a/0x30
+ [ 7.607147] </TASK>
+ [ 7.607147] kstackwatch: =================== KStackWatch End ===================
+ [ 7.807082] kstackwatch_test: victim[0][11]: unhappy buf[8]=0xabcdabcd
+
+The line ``RIP: 0010:test_mthread_corrupting+0x48/0xd0`` shows the exact
+location where the corruption occurred. Now that the ``corrupting()`` function has
+been identified, it is straightforward to trace back to ``buggy()`` and fix the bug.
+
+
+More usage examples and corruption scenarios are provided in
+``kstackwatch_test.sh`` and ``mm/kstackwatch/test.c``.
+
+Limitations
+===========
+
+* Limited by available hardware breakpoints
+* Only one function can be watched at a time
+* Canary search limited to 128 * sizeof(ulong) from the current stack
+ pointer. This is sufficient for most cases, but has three limitations:
+
+ - If the stack frame is larger, the search may fail.
+ - If the function does not have a canary, the search may fail.
+ - If stack memory occasionally contains the same value as the canary,
+ it may be incorrectly matched.
+
+ In these cases, the user can provide the canary location using
+ ``sp_offset``, or treat any memory in the function prologue
+ as the canary.
--
2.43.0
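
The func_offset and sp_offset values derived in the workflow example of kstackwatch.rst are plain address arithmetic. The following sketch recomputes them in bash from the addresses in the objdump listing and prints the resulting config line. The helper names `ksw_func_offset` and `ksw_sp_offset` are illustrative and not part of the series, and sizeof(ulong) is assumed to be 8, as on x86_64.

```shell
#!/bin/bash
# Illustrative helpers, not part of the patch series.

# Offset of an instruction from the start of its function.
# Arguments: function start, then instruction address, both as hex
# strings without the 0x prefix (the form objdump prints).
ksw_func_offset() {
    printf '0x%x\n' $(( 0x$2 - 0x$1 ))
}

# Byte offset of array element $1 from the stack pointer, assuming the
# array starts at the stack pointer and each element is $2 bytes wide.
ksw_sp_offset() {
    echo $(( $1 * $2 ))
}

# Addresses taken from the disassembly of test_mthread_victim().
fo=$(ksw_func_offset ffffffff815cb4e0 ffffffff815cb52c)
# buf[8] with 8-byte elements (assumed sizeof(ulong) on x86_64).
so=$(ksw_sp_offset 8 8)

echo "fn=test_mthread_victim fo=$fo so=$so wl=8"
```

With the addresses from the listing, this prints the same line the document writes to the config file: `fn=test_mthread_victim fo=0x4c so=64 wl=8`. The addresses change with every kernel build, so the offsets must be recomputed from a fresh disassembly.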
* [PATCH v8 27/27] MAINTAINERS: add entry for KStackWatch
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (25 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 26/27] docs: add KStackWatch document Jinchao Wang
@ 2025-11-10 16:36 ` Jinchao Wang
2025-11-10 17:33 ` [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Matthew Wilcox
27 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-10 16:36 UTC (permalink / raw)
To: Andrew Morton, Masami Hiramatsu (Google), Peter Zijlstra,
Randy Dunlap, Marco Elver, Mike Rapoport, Alexander Potapenko,
Adrian Hunter, Alexander Shishkin, Alice Ryhl, Andrey Konovalov,
Andrey Ryabinin, Andrii Nakryiko, Ard Biesheuvel,
Arnaldo Carvalho de Melo, Ben Segall, Bill Wendling,
Borislav Petkov, Catalin Marinas, Dave Hansen, David Hildenbrand,
David Kaplan, David S. Miller, Dietmar Eggemann, Dmitry Vyukov,
H. Peter Anvin, Ian Rogers, Ingo Molnar, James Clark,
Jinchao Wang, Jinjie Ruan, Jiri Olsa, Jonathan Corbet, Juri Lelli,
Justin Stitt, kasan-dev, Kees Cook, Liam R. Howlett, Liang Kan,
Linus Walleij, linux-arm-kernel, linux-doc, linux-kernel,
linux-mm, linux-perf-users, linux-trace-kernel, llvm,
Lorenzo Stoakes, Mark Rutland, Masahiro Yamada, Mathieu Desnoyers,
Mel Gorman, Michal Hocko, Miguel Ojeda, Nam Cao, Namhyung Kim,
Nathan Chancellor, Naveen N Rao, Nick Desaulniers, Rong Xu,
Sami Tolvanen, Steven Rostedt, Suren Baghdasaryan,
Thomas Gleixner, Thomas Weißschuh, Valentin Schneider,
Vincent Guittot, Vincenzo Frascino, Vlastimil Babka, Will Deacon,
workflows, x86
Add a maintainer entry for Kernel Stack Watch.
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
MAINTAINERS | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index ddecf1ef3bed..9757775de515 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13615,6 +13615,15 @@ F: Documentation/filesystems/smb/ksmbd.rst
F: fs/smb/common/
F: fs/smb/server/
+KERNEL STACK WATCH
+M: Jinchao Wang <wangjinchao600@gmail.com>
+S: Maintained
+F: Documentation/dev-tools/kstackwatch.rst
+F: include/linux/kstackwatch.h
+F: include/linux/kstackwatch_types.h
+F: mm/kstackwatch/
+F: tools/kstackwatch/
+
KERNEL UNIT TESTING FRAMEWORK (KUnit)
M: Brendan Higgins <brendan.higgins@linux.dev>
M: David Gow <davidgow@google.com>
--
2.43.0
* Re: [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool
2025-11-10 16:35 [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Jinchao Wang
` (26 preceding siblings ...)
2025-11-10 16:36 ` [PATCH v8 27/27] MAINTAINERS: add entry for KStackWatch Jinchao Wang
@ 2025-11-10 17:33 ` Matthew Wilcox
2025-11-12 2:14 ` Jinchao Wang
27 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2025-11-10 17:33 UTC (permalink / raw)
To: Jinchao Wang
Cc: Andrew Morton, Masami Hiramatsu (Google), Peter Zijlstra,
Randy Dunlap, Marco Elver, Mike Rapoport, Alexander Potapenko,
Adrian Hunter, Alexander Shishkin, Alice Ryhl, Andrey Konovalov,
Andrey Ryabinin, Andrii Nakryiko, Ard Biesheuvel,
Arnaldo Carvalho de Melo, Ben Segall, Bill Wendling,
Borislav Petkov, Catalin Marinas, Dave Hansen, David Hildenbrand,
David Kaplan, David S. Miller, Dietmar Eggemann, Dmitry Vyukov,
H. Peter Anvin, Ian Rogers, Ingo Molnar, James Clark, Jinjie Ruan,
Jiri Olsa, Jonathan Corbet, Juri Lelli, Justin Stitt, kasan-dev,
Kees Cook, Liam R. Howlett, Liang Kan, Linus Walleij,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm,
linux-perf-users, linux-trace-kernel, llvm, Lorenzo Stoakes,
Mark Rutland, Masahiro Yamada, Mathieu Desnoyers, Mel Gorman,
Michal Hocko, Miguel Ojeda, Nam Cao, Namhyung Kim,
Nathan Chancellor, Naveen N Rao, Nick Desaulniers, Rong Xu,
Sami Tolvanen, Steven Rostedt, Suren Baghdasaryan,
Thomas Gleixner, Thomas Weißschuh, Valentin Schneider,
Vincent Guittot, Vincenzo Frascino, Vlastimil Babka, Will Deacon,
workflows, x86
On Tue, Nov 11, 2025 at 12:35:55AM +0800, Jinchao Wang wrote:
> Earlier this year, I debugged a stack corruption panic that revealed the
> limitations of existing debugging tools. The bug persisted for 739 days
> before being fixed (CVE-2025-22036), and my reproduction scenario
> differed from the CVE report—highlighting how unpredictably these bugs
> manifest.
Well, this demonstrates the dangers of keeping this problem siloed
within your own exfat group. The fix made in 1bb7ff4204b6 is wrong!
It was fixed properly in 7375f22495e7 which lists its Fixes: as
Linux-2.6.12-rc2, but that's simply the beginning of git history.
It's actually been there since v2.4.6.4 where it's documented as simply:
- some subtle fs/buffer.c race conditions (Andrew Morton, me)
As far as I can tell the changes made in 1bb7ff4204b6 should be
reverted.
> Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
> code in __blk_flush_plug(), I found it difficult to trace all logic
> paths due to indirect function calls through function pointers.
So why is the solution here not simply to fix KASAN instead of this
giant patch series?
* Re: [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool
2025-11-10 17:33 ` [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool Matthew Wilcox
@ 2025-11-12 2:14 ` Jinchao Wang
2025-11-12 20:36 ` Matthew Wilcox
0 siblings, 1 reply; 32+ messages in thread
From: Jinchao Wang @ 2025-11-12 2:14 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Masami Hiramatsu (Google), Peter Zijlstra,
Randy Dunlap, Marco Elver, Mike Rapoport, Alexander Potapenko,
Adrian Hunter, Alexander Shishkin, Alice Ryhl, Andrey Konovalov,
Andrey Ryabinin, Andrii Nakryiko, Ard Biesheuvel,
Arnaldo Carvalho de Melo, Ben Segall, Bill Wendling,
Borislav Petkov, Catalin Marinas, Dave Hansen, David Hildenbrand,
David Kaplan, David S. Miller, Dietmar Eggemann, Dmitry Vyukov,
H. Peter Anvin, Ian Rogers, Ingo Molnar, James Clark, Jinjie Ruan,
Jiri Olsa, Jonathan Corbet, Juri Lelli, Justin Stitt, kasan-dev,
Kees Cook, Liam R. Howlett, Liang Kan, Linus Walleij,
linux-arm-kernel, linux-doc, linux-kernel, linux-mm,
linux-perf-users, linux-trace-kernel, llvm, Lorenzo Stoakes,
Mark Rutland, Masahiro Yamada, Mathieu Desnoyers, Mel Gorman,
Michal Hocko, Miguel Ojeda, Nam Cao, Namhyung Kim,
Nathan Chancellor, Naveen N Rao, Nick Desaulniers, Rong Xu,
Sami Tolvanen, Steven Rostedt, Suren Baghdasaryan,
Thomas Gleixner, Thomas Weißschuh, Valentin Schneider,
Vincent Guittot, Vincenzo Frascino, Vlastimil Babka, Will Deacon,
workflows, x86
On Mon, Nov 10, 2025 at 05:33:22PM +0000, Matthew Wilcox wrote:
> On Tue, Nov 11, 2025 at 12:35:55AM +0800, Jinchao Wang wrote:
> > Earlier this year, I debugged a stack corruption panic that revealed the
> > limitations of existing debugging tools. The bug persisted for 739 days
> > before being fixed (CVE-2025-22036), and my reproduction scenario
> > differed from the CVE report—highlighting how unpredictably these bugs
> > manifest.
>
> Well, this demonstrates the dangers of keeping this problem siloed
> within your own exfat group. The fix made in 1bb7ff4204b6 is wrong!
> It was fixed properly in 7375f22495e7 which lists its Fixes: as
> Linux-2.6.12-rc2, but that's simply the beginning of git history.
> It's actually been there since v2.4.6.4 where it's documented as simply:
>
> - some subtle fs/buffer.c race conditions (Andrew Morton, me)
>
> As far as I can tell the changes made in 1bb7ff4204b6 should be
> reverted.
Thank you for the correction and the detailed history. I wasn't aware this
dated back to v2.4.6.4. I'm not part of the exfat group; I simply
encountered a bug that 1bb7ff4204b6 happened to resolve in my scenario.
The timeline actually illustrates the exact problem KStackWatch addresses:
a bug introduced in 2001, partially addressed in 2025, then properly fixed
months later. The 24-year gap suggests these silent stack corruptions are
extremely difficult to locate.
>
> > Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
> > code in __blk_flush_plug(), I found it difficult to trace all logic
> > paths due to indirect function calls through function pointers.
>
> So why is the solution here not simply to fix KASAN instead of this
> giant patch series?
KASAN caught 7375f22495e7 because put_bh() accessed bh->b_count after
wait_on_buffer() of another thread returned—the stack was invalid.
In 1bb7ff4204b6 and my case, corruption occurred before the victim
function of another thread returned. The stack remained valid to KASAN,
so no warning triggered. This is timing-dependent, not a KASAN deficiency.
Making KASAN treat parts of an active stack frame as invalid would be
complex and add significant overhead, making the bug even harder to
reproduce. KASAN's overhead already prevented reproduction in my
environment.
KStackWatch takes a different approach: it watches stack frames regardless
of whether KASAN considers them valid or invalid, with much less overhead,
thereby preserving reproduction scenarios.
The value proposition:
Finding where corruption occurs is the bottleneck. Once located,
subsystem experts can analyze the root cause. Without that location, even
experts are stuck.
If KStackWatch had existed earlier, this 24-year-old bug might have been
found sooner when someone hit a similar corruption. The same applies to
other stack corruption bugs.
I'd appreciate your thoughts on whether this addresses your concerns.
Best regards,
Jinchao
* Re: [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool
2025-11-12 2:14 ` Jinchao Wang
@ 2025-11-12 20:36 ` Matthew Wilcox
2025-11-13 4:40 ` Jinchao Wang
0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2025-11-12 20:36 UTC (permalink / raw)
To: Jinchao Wang
Cc: kasan-dev, linux-arm-kernel, linux-doc, linux-kernel, linux-mm,
linux-perf-users, linux-trace-kernel, llvm, workflows, x86
[dropping all the individual email addresses; leaving only the
mailing lists]
On Wed, Nov 12, 2025 at 10:14:29AM +0800, Jinchao Wang wrote:
> On Mon, Nov 10, 2025 at 05:33:22PM +0000, Matthew Wilcox wrote:
> > On Tue, Nov 11, 2025 at 12:35:55AM +0800, Jinchao Wang wrote:
> > > Earlier this year, I debugged a stack corruption panic that revealed the
> > > limitations of existing debugging tools. The bug persisted for 739 days
> > > before being fixed (CVE-2025-22036), and my reproduction scenario
> > > differed from the CVE report—highlighting how unpredictably these bugs
> > > manifest.
> >
> > Well, this demonstrates the dangers of keeping this problem siloed
> > within your own exfat group. The fix made in 1bb7ff4204b6 is wrong!
> > It was fixed properly in 7375f22495e7 which lists its Fixes: as
> > Linux-2.6.12-rc2, but that's simply the beginning of git history.
> > It's actually been there since v2.4.6.4 where it's documented as simply:
> >
> > - some subtle fs/buffer.c race conditions (Andrew Morton, me)
> >
> > As far as I can tell the changes made in 1bb7ff4204b6 should be
> > reverted.
>
> Thank you for the correction and the detailed history. I wasn't aware this
> dated back to v2.4.6.4. I'm not part of the exfat group; I simply
> encountered a bug that 1bb7ff4204b6 happened to resolve in my scenario.
> The timeline actually illustrates the exact problem KStackWatch addresses:
> a bug introduced in 2001, partially addressed in 2025, then properly fixed
> months later. The 24-year gap suggests these silent stack corruptions are
> extremely difficult to locate.
I think that's a misdiagnosis caused by not understanding the limited
circumstances in which the problem occurs. To hit this problem, you
have to have a buffer_head allocated on the stack. That doesn't happen
in many places:
fs/buffer.c: struct buffer_head tmp = {
fs/direct-io.c: struct buffer_head map_bh = { 0, };
fs/ext2/super.c: struct buffer_head tmp_bh;
fs/ext2/super.c: struct buffer_head tmp_bh;
fs/ext4/mballoc-test.c: struct buffer_head bitmap_bh;
fs/ext4/mballoc-test.c: struct buffer_head gd_bh;
fs/gfs2/bmap.c: struct buffer_head bh;
fs/gfs2/bmap.c: struct buffer_head bh;
fs/isofs/inode.c: struct buffer_head dummy;
fs/jfs/super.c: struct buffer_head tmp_bh;
fs/jfs/super.c: struct buffer_head tmp_bh;
fs/mpage.c: struct buffer_head map_bh;
fs/mpage.c: struct buffer_head map_bh;
It's far more common for buffer_heads to be allocated from slab and
attached to folios. The other necessary condition to hit this problem
is that get_block() has to actually read the data from disk. That's
not normal either! Most filesystems just fill in the metadata about
the block and defer the actual read to when the data is wanted. That's
the high-performance way to do it.
So our opportunity to catch this bug was highly limited by the fact that
we just don't run the codepaths that would allow it to trigger.
> > > Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
> > > code in __blk_flush_plug(), I found it difficult to trace all logic
> > > paths due to indirect function calls through function pointers.
> >
> > So why is the solution here not simply to fix KASAN instead of this
> > giant patch series?
>
> KASAN caught 7375f22495e7 because put_bh() accessed bh->b_count after
> wait_on_buffer() of another thread returned—the stack was invalid.
> In 1bb7ff4204b6 and my case, corruption occurred before the victim
> function of another thread returned. The stack remained valid to KASAN,
> so no warning triggered. This is timing-dependent, not a KASAN deficiency.
I agree that it's a narrow race window, but nevertheless KASAN did catch
it with ntfs and not with exfat. The KASAN documentation states that
it can catch this kind of bug:
Generic KASAN supports finding bugs in all of slab, page_alloc, vmap, vmalloc,
stack, and global memory.
Software Tag-Based KASAN supports slab, page_alloc, vmalloc, and stack memory.
Hardware Tag-Based KASAN supports slab, page_alloc, and non-executable vmalloc
memory.
(hm, were you using hwkasan instead of swkasan, and that's why you
couldn't see it?)
* Re: [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool
2025-11-12 20:36 ` Matthew Wilcox
@ 2025-11-13 4:40 ` Jinchao Wang
0 siblings, 0 replies; 32+ messages in thread
From: Jinchao Wang @ 2025-11-13 4:40 UTC (permalink / raw)
To: Matthew Wilcox
Cc: kasan-dev, linux-arm-kernel, linux-doc, linux-kernel, linux-mm,
linux-perf-users, linux-trace-kernel, llvm, workflows, x86
On Wed, Nov 12, 2025 at 08:36:33PM +0000, Matthew Wilcox wrote:
> [dropping all the individual email addresses; leaving only the
> mailing lists]
>
> On Wed, Nov 12, 2025 at 10:14:29AM +0800, Jinchao Wang wrote:
> > On Mon, Nov 10, 2025 at 05:33:22PM +0000, Matthew Wilcox wrote:
> > > On Tue, Nov 11, 2025 at 12:35:55AM +0800, Jinchao Wang wrote:
> > > > Earlier this year, I debugged a stack corruption panic that revealed the
> > > > limitations of existing debugging tools. The bug persisted for 739 days
> > > > before being fixed (CVE-2025-22036), and my reproduction scenario
> > > > differed from the CVE report—highlighting how unpredictably these bugs
> > > > manifest.
> > >
> > > Well, this demonstrates the dangers of keeping this problem siloed
> > > within your own exfat group. The fix made in 1bb7ff4204b6 is wrong!
> > > It was fixed properly in 7375f22495e7 which lists its Fixes: as
> > > Linux-2.6.12-rc2, but that's simply the beginning of git history.
> > > It's actually been there since v2.4.6.4 where it's documented as simply:
> > >
> > > - some subtle fs/buffer.c race conditions (Andrew Morton, me)
> > >
> > > As far as I can tell the changes made in 1bb7ff4204b6 should be
> > > reverted.
> >
> > Thank you for the correction and the detailed history. I wasn't aware this
> > dated back to v2.4.6.4. I'm not part of the exfat group; I simply
> > encountered a bug that 1bb7ff4204b6 happened to resolve in my scenario.
> > The timeline actually illustrates the exact problem KStackWatch addresses:
> > a bug introduced in 2001, partially addressed in 2025, then properly fixed
> > months later. The 24-year gap suggests these silent stack corruptions are
> > extremely difficult to locate.
>
> I think that's a misdiagnosis caused by not understanding the limited
> circumstances in which the problem occurs. To hit this problem, you
> have to have a buffer_head allocated on the stack. That doesn't happen
> in many places:
>
> fs/buffer.c: struct buffer_head tmp = {
> fs/direct-io.c: struct buffer_head map_bh = { 0, };
> fs/ext2/super.c: struct buffer_head tmp_bh;
> fs/ext2/super.c: struct buffer_head tmp_bh;
> fs/ext4/mballoc-test.c: struct buffer_head bitmap_bh;
> fs/ext4/mballoc-test.c: struct buffer_head gd_bh;
> fs/gfs2/bmap.c: struct buffer_head bh;
> fs/gfs2/bmap.c: struct buffer_head bh;
> fs/isofs/inode.c: struct buffer_head dummy;
> fs/jfs/super.c: struct buffer_head tmp_bh;
> fs/jfs/super.c: struct buffer_head tmp_bh;
> fs/mpage.c: struct buffer_head map_bh;
> fs/mpage.c: struct buffer_head map_bh;
>
> It's far more common for buffer_heads to be allocated from slab and
> attached to folios. The other necessary condition to hit this problem
> is that get_block() has to actually read the data from disk. That's
> not normal either! Most filesystems just fill in the metadata about
> the block and defer the actual read to when the data is wanted. That's
> the high-performance way to do it.
>
> So our opportunity to catch this bug was highly limited by the fact that
> we just don't run the codepaths that would allow it to trigger.
>
> > > > Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
> > > > code in __blk_flush_plug(), I found it difficult to trace all logic
> > > > paths due to indirect function calls through function pointers.
> > >
> > > So why is the solution here not simply to fix KASAN instead of this
> > > giant patch series?
> >
> > KASAN caught 7375f22495e7 because put_bh() accessed bh->b_count after
> > wait_on_buffer() of another thread returned—the stack was invalid.
> > In 1bb7ff4204b6 and my case, corruption occurred before the victim
> > function of another thread returned. The stack remained valid to KASAN,
> > so no warning triggered. This is timing-dependent, not a KASAN deficiency.
>
> I agree that it's a narrow race window, but nevertheless KASAN did catch
> it with ntfs and not with exfat. The KASAN documentation states that
> it can catch this kind of bug:
>
> Generic KASAN supports finding bugs in all of slab, page_alloc, vmap, vmalloc,
> stack, and global memory.
>
> Software Tag-Based KASAN supports slab, page_alloc, vmalloc, and stack memory.
>
> Hardware Tag-Based KASAN supports slab, page_alloc, and non-executable vmalloc
> memory.
>
> (hm, were you using hwkasan instead of swkasan, and that's why you
> couldn't see it?)
>
You're right that these conditions are narrow. However, when these bugs
hit, they're severe and extremely difficult to debug. This year alone,
this specific buffer_head bug was hit at least twice: 1bb7ff4204b6 and my
case. Over 24 years, others likely encountered it but lacked tools to
pinpoint the root cause.

I used software KASAN for the exfat case, but the bug didn't reproduce,
likely due to timing changes from the overhead. More fundamentally, the
corruption was in-bounds within active stack frames, which KASAN cannot
detect by design.
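
To make the in-bounds case concrete, here is a minimal user-space sketch of
the pattern, not the kernel code itself. It assumes POSIX threads, and every
name in it (struct fake_bh, completer(), victim_sees_remote_write()) is
hypothetical and chosen only for illustration. A second thread writes through
a pointer into the caller's stack frame while that frame is still live, so
each store lands inside a valid object and never touches a redzone or a
poisoned region.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for a stack-allocated struct buffer_head. */
struct fake_bh {
	int b_count;
	unsigned char b_data[16];
};

/*
 * Plays the role of a completion path running in another thread.
 * It writes only inside the bounds of the object it was handed.
 */
static void *completer(void *arg)
{
	struct fake_bh *bh = arg;

	memset(bh->b_data, 0xAA, sizeof(bh->b_data));
	bh->b_count = -1;
	return NULL;
}

/*
 * The "victim": owns the on-stack object and is still executing while
 * the other thread modifies it. Returns the b_count it observes after
 * the other thread has finished.
 */
int victim_sees_remote_write(void)
{
	struct fake_bh bh = { .b_count = 1 };
	pthread_t t;

	if (pthread_create(&t, NULL, completer, &bh) != 0)
		return 1;	/* thread creation failed: value untouched */

	/* The frame stays live until the join completes. */
	pthread_join(t, NULL);

	return bh.b_count;
}
```

Calling victim_sees_remote_write() returns -1: the victim's local was changed
behind its back. Built with a user-space sanitizer such as AddressSanitizer
(for example `cc -fsanitize=address -pthread`), this is not expected to
produce a report, since the stores are in bounds and the frame has not been
released. A use-after-return variant, where the writer runs after the victim
function has returned, is the kind of access a sanitizer has a chance to flag.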

Beyond buffer_head, I encountered another stack corruption bug in network
drivers this year. Without KStackWatch, I had to manually instrument the
code to locate where corruption occurred.

These issues may be more common than they appear. Given Linux's massive
user base combined with the kernel's huge codebase and the large volume of
driver code, both in-tree and out-of-tree, even narrow conditions will be
hit.

Since posting earlier versions, several developers have contacted me about
using KStackWatch for their own issues. KStackWatch fills a gap: it can
pinpoint in-bounds stack corruption with much lower overhead than KASAN.