From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Feng Tang <feng.tang@linux.alibaba.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	paulmck@kernel.org, Douglas Anderson <dianders@chromium.org>,
	Thomas Gleixner <tglx@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] kernel: add a simple timer based software watchpoint
Date: Mon, 22 Jun 2026 10:42:06 +0200
Message-ID: <e59ca845-2134-45c5-ad31-5e4348bbbd5f@kernel.org>
In-Reply-To: <20260622081430.37557-1-feng.tang@linux.alibaba.com>

On 6/22/26 10:14, Feng Tang wrote:
> While debugging some nasty BIOS/hardware-related memory corruption
> issues, we found that using a periodic timer to monitor a specific
> DRAM/MMIO physical address is very useful, as it acts like a basic
> software watchpoint.
> 
> For those bugs, who changes (corrupts) the DRAM or MMIO register, and
> when, is hard to trace, and sometimes even a hardware JTAG debugger
> can't help (say the physical address watchpoint doesn't work).
> 
> The biggest shortcoming of this method is that it can never capture
> the exact point like a hardware watchpoint. The idea is to approach
> that point by adjusting the timer interval, hoping the caught context
> has enough debug info (which did help us in solving quite a few
> BIOS/hardware bugs).
> 
> The workflow is simple: after the suspected address is identified,
> start a periodic timer polling it to catch whether its value has
> changed to the target 'magic' value, then halt the CPU (better to
> have only one CPU online), or panic, or print out system information,
> so that the error environment is frozen for further checking, or let
> kexec/kdump record the vmcore, etc.
> 
> One real use case was:
> "
> On an arm64 platform, some BIOS/HW config made the OS boot prone to
> stalling in the systemd init phase. The reproducer was then simplified
> by booting to a console with a function-reduced rootfs, where running
> the 'less' command always triggered a 'segmentation fault'.
> 
> Using GDB, a static array of 'less' was found corrupted before
> being written, and one byte in its memory was always '0x33'. At this
> stage the static array is initially in the bss segment, and backed by
> the kernel zero page after the first read, so it was an obvious memory
> corruption.
> 
> HW engineers tried to capture HW traces after the issue happened, but
> could not find valuable hints, as the corruption could happen long
> before 'less' is run, and the trace/context of that time was gone.
> 
> As the physical address of the kernel zero page was known, and the
> offset of the corrupted byte was fixed, the corrupted address (call
> it A) was known. But the HW debugger failed to break at the point
> where address A was written with '0x33'. Then this method was used to
> monitor 'writing 0x33 to A' with a 30ms interval, and to halt the
> system with 'while (1);' (the system was made UP by using the
> 'nr_cpus=1' cmdline parameter) once hit. The HW people then collected
> the HW trace they needed and root-caused it to a bad config.
> "
> 
> The culprits of memory corruption issues we met are mainly:
> * broken devices (like ethernet card)
> * BIOS runtime service
> * silicon bugs
> * kernel itself
> 
> As the kernel already has many useful debug features like slub_debug,
> kasan, kfence, kmemleak, etc., this method is a better fit for the
> first three types.
> 
> All the settings are module parameters:
> 
>   watch_interval_ms: SW watchpoint check interval in ms
>   paddr_dram_to_watch: Physical dram address to monitor.
>   target_dram_val: Expected value at the dram address that triggers the watchpoint.
>   paddr_mmio_to_watch: Physical mmio address to monitor. Must be 32-bit aligned.
>   target_mmio_val: Expected value at the mmio address that triggers the watchpoint.
>   panic_on_hit: Trigger kernel panic when watchpoint condition hits.
>   hang_on_hit: halt the CPU (wait for HW debugger)
> 
> This provides the basic function, and there are more TODOs:
>   * add a ftrace mode to do function level monitoring with ftrace hook,
>     which is more accurate timing wise, as suggested by Steven Rostedt
>   * merge the DRAM/MMIO interfaces and auto-detect whether the
>     address is DRAM or MMIO
>   * support runtime changing the address
>   * move the starting point earlier in boot phase
>   * monitor a whole memory region
>   * currently only 'changing to a value' is monitored; add support
>     for 'changing from a value'
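
As a rough illustration of the polling flow described in the quoted
text (this is not code from the patch), the idea can be modeled in
plain userspace C. Everything here is an assumption made for the
sketch: the names sw_watch, sw_watch_poll and sw_watch_simulate are
hypothetical, "ticks" stand in for milliseconds, and the corruption is
simulated by a store inside the loop. A kernel version would poll from
a periodic timer callback and halt, panic, or dump state on a hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the polling idea; not the patch's code. */
struct sw_watch {
	const volatile uint8_t *addr;	/* location being watched */
	uint8_t target;			/* 'magic' value that counts as a hit */
};

/* One polling step: returns 1 if the watched byte holds the target value. */
static int sw_watch_poll(const struct sw_watch *w)
{
	assert(w != NULL && w->addr != NULL);
	return *w->addr == w->target;
}

/*
 * Simulate periodic polling every 'interval' ticks. A simulated rogue
 * writer stores 0x33 into the watched byte at 'corrupt_tick'. Returns the
 * first tick at which a poll notices the hit, or -1 if none does (or if
 * 'interval' is not positive).
 */
static int sw_watch_simulate(int interval, int corrupt_tick, int max_ticks)
{
	volatile uint8_t byte = 0;
	struct sw_watch w = { .addr = &byte, .target = 0x33 };
	int tick;

	if (interval <= 0)
		return -1;

	for (tick = 0; tick < max_ticks; tick++) {
		if (tick == corrupt_tick)
			byte = 0x33;	/* simulated corruption */
		if (tick % interval == 0 && sw_watch_poll(&w))
			return tick;	/* a real tool would halt or panic here */
	}
	return -1;
}
```

With a 30-tick interval and a corruption at tick 45,
sw_watch_simulate(30, 45, 200) reports the hit at tick 60. That gap is
the shortcoming mentioned in the quoted text: the caught context can
lag the actual write by up to one polling interval, and shrinking the
interval narrows it.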

That really looks more like the kind of thing you would want to carry as an OOT
hack for your special debugging needs :)

-- 
Cheers,

David
