From: Jinchao Wang <wangjinchao600@gmail.com>
To: "Andrew Morton" <akpm@linux-foundation.org>,
"Masami Hiramatsu (Google)" <mhiramat@kernel.org>,
"Peter Zijlstra" <peterz@infradead.org>,
"Randy Dunlap" <rdunlap@infradead.org>,
"Marco Elver" <elver@google.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Alexander Potapenko" <glider@google.com>,
"Adrian Hunter" <adrian.hunter@intel.com>,
"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
"Alice Ryhl" <aliceryhl@google.com>,
"Andrey Konovalov" <andreyknvl@gmail.com>,
"Andrey Ryabinin" <ryabinin.a.a@gmail.com>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Ard Biesheuvel" <ardb@kernel.org>,
"Arnaldo Carvalho de Melo" <acme@kernel.org>,
"Ben Segall" <bsegall@google.com>,
"Bill Wendling" <morbo@google.com>,
"Borislav Petkov" <bp@alien8.de>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"David Hildenbrand" <david@redhat.com>,
"David Kaplan" <david.kaplan@amd.com>,
"David S. Miller" <davem@davemloft.net>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Dmitry Vyukov" <dvyukov@google.com>,
"H. Peter Anvin" <hpa@zytor.com>,
"Ian Rogers" <irogers@google.com>,
"Ingo Molnar" <mingo@redhat.com>,
"James Clark" <james.clark@linaro.org>,
"Jinchao Wang" <wangjinchao600@gmail.com>,
"Jinjie Ruan" <ruanjinjie@huawei.com>,
"Jiri Olsa" <jolsa@kernel.org>,
"Jonathan Corbet" <corbet@lwn.net>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Justin Stitt" <justinstitt@google.com>,
kasan-dev@googlegroups.com, "Kees Cook" <kees@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Liang Kan" <kan.liang@linux.intel.com>,
"Linus Walleij" <linus.walleij@linaro.org>,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-perf-users@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, llvm@lists.linux.dev,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Mark Rutland" <mark.rutland@arm.com>,
"Masahiro Yamada" <masahiroy@kernel.org>,
"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
"Mel Gorman" <mgorman@suse.de>, "Michal Hocko" <mhocko@suse.com>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Nam Cao" <namcao@linutronix.de>,
"Namhyung Kim" <namhyung@kernel.org>,
"Nathan Chancellor" <nathan@kernel.org>,
"Naveen N Rao" <naveen@kernel.org>,
"Nick Desaulniers" <nick.desaulniers+lkml@gmail.com>,
"Rong Xu" <xur@google.com>,
"Sami Tolvanen" <samitolvanen@google.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
"Valentin Schneider" <vschneid@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Vincenzo Frascino" <vincenzo.frascino@arm.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Will Deacon" <will@kernel.org>,
workflows@vger.kernel.org, x86@kernel.org
Subject: [PATCH v8 00/27] mm/ksw: Introduce KStackWatch debugging tool
Date: Tue, 11 Nov 2025 00:35:55 +0800
Message-ID: <20251110163634.3686676-1-wangjinchao600@gmail.com>
Earlier this year, I debugged a stack corruption panic that revealed the
limitations of existing debugging tools. The bug persisted for 739 days
before being fixed (CVE-2025-22036), and my reproduction scenario
differed from the CVE report—highlighting how unpredictably these bugs
manifest.
The panic call trace:
<4>[89318.486564] <TASK>
<4>[89318.486570] dump_stack_lvl+0x48/0x70
<4>[89318.486580] dump_stack+0x10/0x20
<4>[89318.486586] panic+0x345/0x3a0
<4>[89318.486596] ? __blk_flush_plug+0x121/0x130
<4>[89318.486603] __stack_chk_fail+0x14/0x20
<4>[89318.486612] __blk_flush_plug+0x121/0x130
...27 other frames omitted
<4>[89318.486824] ksys_read+0x6b/0xf0
<4>[89318.486829] __x64_sys_read+0x19/0x30
<4>[89318.486834] x64_sys_call+0x1ada/0x25c0
<4>[89318.486840] do_syscall_64+0x7f/0x180
<4>[89318.486847] ? exc_page_fault+0x94/0x1b0
<4>[89318.486855] entry_SYSCALL_64_after_hwframe+0x73/0x7b
<4>[89318.486866] </TASK>
Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
code in __blk_flush_plug(), I found it difficult to trace all logic
paths due to indirect function calls through function pointers.
I added canary-locating code to obtain the canary address and value,
then inserted extensive debugging code to track canary modifications. I
observed the canary being corrupted between two unrelated assignments,
indicating corruption by another thread—a silent stack corruption bug.
I then added hardware breakpoint (hwbp) code, but still failed to catch
the corruption. After adding PID filters, function parameter filters,
and depth filters, I discovered the corruption occurred in
end_buffer_read_sync() via atomic_dec(&bh->b_count), where bh->b_count
overlapped with __blk_flush_plug()'s canary address. Tracing the bh
lifecycle revealed the root cause in exfat_get_block()—a function not
even present in the panic call trace.
This bug was later assigned CVE-2025-22036
(https://lore.kernel.org/all/2025041658-CVE-2025-22036-6469@gregkh/).
The vulnerability was introduced in commit 11a347fb6cef (March 13, 2023)
and fixed in commit 1bb7ff4204b6 (March 21, 2025)—persisting for 739
days. Notably, my reproduction scenario differed significantly from that
described in the CVE report, highlighting how these bugs manifest
unpredictably across different workloads.
This experience revealed how notoriously difficult stack corruption bugs
are to debug: they may stop reproducing once KASAN is enabled, call
traces are misleading, and the actual culprit often lies outside the
visible call chain. Manual
instrumentation with hardware breakpoints and filters was effective but
extremely time-consuming.
This motivated KStackWatch: automating the debugging workflow I manually
performed, making hardware breakpoint-based stack monitoring readily
available to all kernel developers facing similar issues.
KStackWatch is a lightweight debugging tool to detect kernel stack
corruption in real time. It installs a hardware breakpoint (watchpoint)
at a function's specified offset using kprobe.post_handler and removes
it in fprobe.exit_handler. This covers the full execution window and
reports corruption immediately with time, location, and a call stack.
Beyond automating proven debugging workflows, KStackWatch incorporates
robust engineering to handle complex scenarios like context switches,
recursion, and concurrent execution, making it suitable for broad
debugging use cases.
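The arm-on-entry, disarm-on-exit lifecycle with a depth filter can be
modeled in user space. The sketch below is only an illustration, not code
from this series: the names (model_ctx, model_entry, model_exit) are
hypothetical, and it compares a saved snapshot at function exit instead of
using a hardware breakpoint, so unlike KStackWatch it cannot report the
write at the moment it happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-task state for a user-space model. The real series
 * keeps its own per-task context; this struct is an assumption.
 */
struct model_ctx {
	int depth;        /* current recursion depth of the watched function */
	int target_depth; /* only the call at this depth is watched */
	bool armed;       /* true while a slot is being watched */
	const volatile unsigned long *addr; /* watched stack slot */
	unsigned long snapshot;             /* value of the slot when armed */
};

/* Models the entry hook: arm the watch only at the configured depth. */
static void model_entry(struct model_ctx *c, const volatile unsigned long *slot)
{
	if (c->depth == c->target_depth && !c->armed) {
		c->addr = slot;
		c->snapshot = *slot;
		c->armed = true;
	}
	c->depth++;
}

/*
 * Models the exit hook: disarm when the watched call returns.
 * Returns true if the watched slot changed while it was armed.
 */
static bool model_exit(struct model_ctx *c)
{
	bool corrupted = false;

	assert(c->depth > 0);
	c->depth--;
	if (c->armed && c->depth == c->target_depth) {
		corrupted = (*c->addr != c->snapshot);
		c->armed = false;
	}
	return corrupted;
}
```

With target_depth set to 1, the outermost call (depth 0) is ignored and only
the first recursive call is watched; nested calls deeper than that neither
re-arm nor disarm the watch.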
## Key Features
* Immediate and precise stack corruption detection
* Support for multiple concurrent watchpoints with configurable limits
* Lockless design, usable in any context
* Depth filter for recursive calls
* Low overhead of memory and CPU
* Flexible debugfs configuration with key=val syntax
* Architecture support: x86_64 and arm64
* Auto-canary detection to simplify configuration
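To illustrate the key=val syntax, here is a minimal user-space sketch of a
parser for such input. It is not the parser from this series: the struct and
function names are hypothetical, and only three keys mentioned in the
changelog (watch_len, auto_canary, panic_hit) are handled. Treating every
value as an integer is an assumption; the real option set and value types
may differ. The default of sizeof(unsigned long) for a zero watch_len
follows the v8 changelog entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the configuration, for illustration only. */
struct sketch_config {
	unsigned long watch_len;
	bool auto_canary;
	bool panic_hit;
};

/* True if the first key_len bytes of tok spell exactly name. */
static bool key_is(const char *tok, size_t key_len, const char *name)
{
	return strlen(name) == key_len && memcmp(tok, name, key_len) == 0;
}

/*
 * Parse whitespace-separated "key=val" tokens into cfg.
 * Returns 0 on success, -1 on a malformed token or unknown key.
 */
static int sketch_parse(const char *input, struct sketch_config *cfg)
{
	const char *p = input;

	assert(input && cfg);
	while (*p) {
		size_t tok_len, key_len, val_len;
		const char *eq;
		char val[32], *end;
		unsigned long n;

		p += strspn(p, " \t\n");          /* skip separators */
		if (!*p)
			break;
		tok_len = strcspn(p, " \t\n");    /* length of this token */
		eq = memchr(p, '=', tok_len);
		if (!eq || eq == p)
			return -1;                /* no '=' or empty key */
		key_len = (size_t)(eq - p);
		val_len = tok_len - key_len - 1;
		if (val_len == 0 || val_len >= sizeof(val))
			return -1;                /* empty or oversized value */
		memcpy(val, eq + 1, val_len);
		val[val_len] = '\0';
		n = strtoul(val, &end, 0);
		if (*end != '\0')
			return -1;                /* value is not a number */

		if (key_is(p, key_len, "watch_len"))
			cfg->watch_len = n;
		else if (key_is(p, key_len, "auto_canary"))
			cfg->auto_canary = (n != 0);
		else if (key_is(p, key_len, "panic_hit"))
			cfg->panic_hit = (n != 0);
		else
			return -1;                /* unknown key */
		p += tok_len;
	}
	/* A zero watch_len falls back to the size of one machine word. */
	if (cfg->watch_len == 0)
		cfg->watch_len = sizeof(unsigned long);
	return 0;
}
```

For example, parsing "auto_canary=1 watch_len=0" into a zeroed struct would
enable auto_canary and leave watch_len at sizeof(unsigned long).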
## Architecture Support
KStackWatch currently supports x86_64 and arm64. The design is
architecture-agnostic, requiring only:
* Hardware breakpoint modification in atomic context
Arm64 support required only ~20 lines of code (patches 18 and 19). Future ports
to other architectures (e.g., riscv) should be straightforward for
developers familiar with their hardware breakpoint implementations.
## Performance Impact
Runtime overhead was measured on Intel Core Ultra 5 125H @ 3 GHz running
kernel 6.17, using test4 from patch 24:
Type | Time (ns) | Cycles
-----------------------------------------------
entry with watch | 10892 | 32620
entry without watch | 159 | 466
exit with watch | 12541 | 37556
exit without watch | 124 | 369
Comparison with other scenarios:
Mode | CPU Overhead (add) | Memory Overhead (add)
----------------------------+----------------------+-------------------------
Compiled but not enabled | None | ~20 B per task
Enabled, no function hit | None | ~few hundred B
Func hit, HWBP not toggled | ~140 ns per call | None
Func hit, HWBP toggled | ~11–12 µs per call | None
CPU overhead is incurred only when a watched function is hit, making
KStackWatch suitable for production environments where stack corruption
is suspected but kernel rebuilds are not feasible.
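As a sanity check on the first table, the cycle counts can be converted to
nanoseconds at the stated 3 GHz clock. The helper below is a hypothetical
illustration using integer math that rounds down; the results land within
a few percent of the measured Time (ns) column.

```c
#include <assert.h>

/*
 * Convert a cycle count to nanoseconds for a clock given in MHz.
 * Integer division, so the result is rounded down.
 */
static unsigned long cycles_to_ns(unsigned long cycles, unsigned long mhz)
{
	assert(mhz > 0);
	return cycles * 1000UL / mhz;
}
```

At 3000 MHz, cycles_to_ns(32620, 3000) gives 10873 against the measured
10892 ns for "entry with watch", and cycles_to_ns(37556, 3000) gives 12518
against the measured 12541 ns for "exit with watch".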
## Validation
To validate the approach, this series includes a self-contained test module and
a companion shell script. The module provides several test cases covering
scenarios such as canary overflow, recursive depth tracking, multi-threaded
silent corruption, and return address (retaddr) overwrite. A detailed workflow example and usage
guide are provided in the documentation (patch 26).
While KStackWatch itself is a new tool and has not yet discovered production
bugs, it automates the exact methodology that I used to manually uncover
CVE-2025-22036. The tool is designed to make this powerful debugging technique
readily available to kernel developers, enabling them to efficiently detect and
diagnose similar stack corruption issues in the future.
---
Patches 1–3 of this series are also used in the wprobe work proposed by
Masami Hiramatsu, so there may be some overlap between our patches.
Patch 3 comes directly from Masami Hiramatsu (thanks).
---
Changelog:
v8:
* Add arm64 support
* Implement hwbp_reinstall() for arm64.
* Use single-step mode as default in ksw_watch_handler().
* Add latency measurements for probe handlers.
* Update configuration options
* Introduce explicit auto_canary parameter.
* Default watch_len to sizeof(unsigned long) when zero.
* Replace panic_on_catch with panic_hit ksw_config option.
* Enable KStackWatch in non-debug builds.
* Limit canary search range to the current stack frame when possible.
* Add automatic architecture detection for test parameters.
* Move kstackwatch.h to include/linux/.
* Relocate Kconfig fragments to the kstackwatch/ directory.
v7:
https://lore.kernel.org/all/20251009105650.168917-1-wangjinchao600@gmail.com/
* Fix maintainer entry to alphabetical position
v6:
https://lore.kernel.org/all/20250930024402.1043776-1-wangjinchao600@gmail.com/
* Replace procfs with debugfs interface
* Fix typos
v5:
https://lore.kernel.org/all/20250924115124.194940-1-wangjinchao600@gmail.com/
* Support key=value input format
* Support multiple watchpoints
* Support watching instruction inside loop
* Support recursion depth tracking with generation
* Ignore triggers from fprobe trampoline
* Split watch_on into watch_get and watch_on to fail fast
* Handle ksw_stack_prepare_watch error
* Rewrite silent corruption test
* Add multiple watchpoints test
* Add an example in documentation
v4:
https://lore.kernel.org/all/20250912101145.465708-1-wangjinchao600@gmail.com/
* Solve the lockdep issues with:
* per-task KStackWatch context to track depth
* atomic flag to protect watched_addr
* Use refactored version of arch_reinstall_hw_breakpoint
v3:
https://lore.kernel.org/all/20250910052335.1151048-1-wangjinchao600@gmail.com/
* Use modify_wide_hw_breakpoint_local() (from Masami)
* Add atomic flag to restrict /proc/kstackwatch to a single opener
* Protect stack probe with an atomic PID flag
* Handle CPU hotplug for watchpoints
* Add preempt_disable/enable in ksw_watch_on_local_cpu()
* Introduce const struct ksw_config *ksw_get_config(void) and use it
* Switch to global watch_attr, remove struct watch_info
* Validate local_var_len in parser()
* Handle case when canary is not found
* Use dump_stack() instead of show_regs() to allow module build
* Reduce logging and comments
* Format logs with KBUILD_MODNAME
* Remove unused headers
* Add new document
v2:
https://lore.kernel.org/all/20250904002126.1514566-1-wangjinchao600@gmail.com/
* Make hardware breakpoint and stack operations
architecture-independent.
v1:
https://lore.kernel.org/all/20250828073311.1116593-1-wangjinchao600@gmail.com/
* Replaced kretprobe with fprobe for function exit hooking, as
suggested by Masami Hiramatsu
* Introduced per-task depth logic to track recursion across scheduling
* Removed the use of workqueue for a more efficient corruption check
* Reordered patches for better logical flow
* Simplified and improved commit messages throughout the series
* Removed initial archcheck which should be improved later
* Replaced the multiple-thread test with silent corruption test
* Split self-tests into a separate patch to improve clarity.
* Added a new entry for KStackWatch to the MAINTAINERS file.
---
Jinchao Wang (26):
x86/hw_breakpoint: Unify breakpoint install/uninstall
x86/hw_breakpoint: Add arch_reinstall_hw_breakpoint
mm/ksw: add build system support
mm/ksw: add ksw_config struct and parser
mm/ksw: add singleton debugfs interface
mm/ksw: add HWBP pre-allocation
mm/ksw: Add atomic watchpoint management api
mm/ksw: ignore false positives from exit trampolines
mm/ksw: support CPU hotplug
sched/ksw: add per-task context
mm/ksw: add entry kprobe and exit fprobe management
mm/ksw: add per-task ctx tracking
mm/ksw: resolve stack watch addr and len
mm/ksw: limit canary search to current stack frame
mm/ksw: manage probe and HWBP lifecycle via procfs
mm/ksw: add KSTACKWATCH_PROFILING to measure probe cost
arm64/hw_breakpoint: Add arch_reinstall_hw_breakpoint
arm64/hwbp/ksw: integrate KStackWatch handler support
mm/ksw: add self-debug helpers
mm/ksw: add test module
mm/ksw: add stack overflow test
mm/ksw: add recursive depth test
mm/ksw: add multi-thread corruption test cases
tools/ksw: add arch-specific test script
docs: add KStackWatch document
MAINTAINERS: add entry for KStackWatch
Masami Hiramatsu (Google) (1):
HWBP: Add modify_wide_hw_breakpoint_local() API
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kstackwatch.rst | 377 +++++++++++++++++++++
MAINTAINERS | 9 +
arch/Kconfig | 10 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/hw_breakpoint.h | 1 +
arch/arm64/kernel/hw_breakpoint.c | 12 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/hw_breakpoint.h | 8 +
arch/x86/kernel/hw_breakpoint.c | 148 +++++----
include/linux/hw_breakpoint.h | 6 +
include/linux/kstackwatch.h | 68 ++++
include/linux/kstackwatch_types.h | 14 +
include/linux/sched.h | 5 +
kernel/events/hw_breakpoint.c | 37 +++
mm/Kconfig | 1 +
mm/Makefile | 1 +
mm/kstackwatch/Kconfig | 34 ++
mm/kstackwatch/Makefile | 8 +
mm/kstackwatch/kernel.c | 295 +++++++++++++++++
mm/kstackwatch/stack.c | 416 ++++++++++++++++++++++++
mm/kstackwatch/test.c | 345 ++++++++++++++++++++
mm/kstackwatch/watch.c | 309 ++++++++++++++++++
tools/kstackwatch/kstackwatch_test.sh | 85 +++++
24 files changed, 2130 insertions(+), 62 deletions(-)
create mode 100644 Documentation/dev-tools/kstackwatch.rst
create mode 100644 include/linux/kstackwatch.h
create mode 100644 include/linux/kstackwatch_types.h
create mode 100644 mm/kstackwatch/Kconfig
create mode 100644 mm/kstackwatch/Makefile
create mode 100644 mm/kstackwatch/kernel.c
create mode 100644 mm/kstackwatch/stack.c
create mode 100644 mm/kstackwatch/test.c
create mode 100644 mm/kstackwatch/watch.c
create mode 100755 tools/kstackwatch/kstackwatch_test.sh
-- 
2.43.0