From: Steven Rostedt <rostedt@goodmis.org>
To: Feng Tang <feng.tang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Petr Mladek <pmladek@suse.com>,
paulmck@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/sys_info: add a simple timer based memory corruption detector
Date: Wed, 10 Jun 2026 12:50:13 -0400 [thread overview]
Message-ID: <20260610125013.4cd775e3@robin> (raw)
In-Reply-To: <20260527034324.51136-1-feng.tang@linux.alibaba.com>
On Wed, 27 May 2026 11:43:24 +0800
Feng Tang <feng.tang@linux.alibaba.com> wrote:
> During debugging some bios/hardware related nasty memory corruption
> issues, we found using periodic timer to monitor specific dram/mmio
> physical address is very useful for debugging, which acts like
> a basic software watchpoint.
>
> For those bugs, who (and when) change(corrupt) those dram or mmio
> register is hard to trace, and sometimes even hardware jtag debugger
> can't help (say the physical address watchpoint doesn't work).
>
> The biggest shortcoming is it can never capture the exact point like
> a hardware watchpoint, no matter how small the timer interval is set,
> the idea is trying to approach the point, hoping the caught context
> have enough debug info (which did help us in solving bios/hardware
> bugs)
Instead of using a timer, can't you use the function tracer? That is,
have the value checked at *every* function call. You can easily add a
custom callback that gets called when every function is executed.
See https://docs.kernel.org/trace/ftrace-uses.html
>
> The working flow is simple: after suspected address is identified,
> start periodic timer polling it to catch if its value is changed to
> target 'magic' value, then halt the cpu (better limit to have only
> one cpu online), or panic, or print out system information, so that
> the error environment is frozen for further check , or let
> kexec/kdump to record the vmore, etc.
>
> All the settings are module parameters:
>
> watch_interval_ms: SW watchpoint check interval in ms
> paddr_dram_to_watch: Physical dram address to monitor.
> target_dram_val: Expected value at the dram address that triggers the watchpoint.
> paddr_mmio_to_watch: Physical mmio address to monitor. Must be 32-bit aligned.
> target_mmio_val: Expected value at the mmio address that triggers the watchpoint.
> panic_on_hit: Trigger kernel panic when watchpoint condition hits.
> hang_on_hit: halt the CPU (wait for HW debugger)
>
> This RFC is trying to show the idea and get feedback, and there are
> some todos:
> * merge the dram/mmio interface to auto detect it's dram or mmio
> * support runtime changing the address
> * move the starting point earlier in boot phase
> * currently is monitoring 'changing to a value', add support
> for 'changing from a value'
>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> +static void sw_watchpoint_timer_fn(struct timer_list *unused)
> +{
> + bool hit = false;
> +
> + if (vaddr_mmio && (*vaddr_mmio == target_mmio_val)) {
> + pr_info("MMIO [@0x%lx] hit the target value [0x%x]!\n",
> + paddr_mmio_to_watch, target_mmio_val);
> + hit = true;
> + }
> +
> + if (vaddr_dram && (*vaddr_dram == target_dram_val)) {
> + pr_info("DRAM [@0x%lx] hit the target value [0x%lx]!\n",
> + paddr_dram_to_watch, target_dram_val);
> + hit = true;
> + }
> +
> + if (hit) {
> + sys_info(0);
> +
> + /* Useful for attaching HW debugger */
> + if (hang_on_hit) {
> + pr_warn("Will dead loop on this CPU\n");
> + while (1);
> + }
> +
> + /* Could be used to trigger kexec/kdump */
> + if (panic_on_hit)
> + panic("SW watchpoint hit!");
> +
> + if (check_once)
> + return;
> + }
The above function would be:
static bool no_check;
static void sw_watchpoint_fn(unsigned long ip, unsigned long pip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
bool hit = false;
int bit;
if (no_check)
return;
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
[ do the above code ]
if (check_once) {
// possibly call irq_work to unregister ftrace
no_check = true;
}
ftrace_test_recursion_unlock(bit);
}
static struct ftrace_ops sw_watchpoint_fops = {
.func = sw_watchpoint_fn;
};
[..]
> +
> + mod_timer(&sw_watchpoint_timer, jiffies + msecs_to_jiffies(watch_interval_ms));
> +}
> +
> +static int __init sw_watchpoint_timer_init(void)
> +{
> + if (paddr_mmio_to_watch) {
> + vaddr_mmio = ioremap(paddr_mmio_to_watch & PAGE_MASK, PAGE_SIZE);
> + if (!vaddr_mmio)
> + return -ENOMEM;
> +
> + vaddr_mmio += (paddr_mmio_to_watch % PAGE_SIZE) / 4;
> + }
> +
> + if (paddr_dram_to_watch) {
> + vaddr_dram = phys_to_virt(paddr_dram_to_watch);
> + if (!vaddr_dram)
> + return -ENOMEM;
> + }
> +
> + timer_setup(&sw_watchpoint_timer, sw_watchpoint_timer_fn, 0);
> + sw_watchpoint_timer.expires = jiffies + msecs_to_jiffies(watch_interval_ms);
> + add_timer(&sw_watchpoint_timer);
Instead of the above, have:
register_ftrace_function(&sw_watchpoint_fops);
-- Steve
> +
> + return 0;
> +}
> +core_initcall(sw_watchpoint_timer_init);
> +#endif
>
> base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
prev parent reply other threads:[~2026-06-10 16:50 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-27 3:43 [PATCH] lib/sys_info: add a simple timer based memory corruption detector Feng Tang
2026-06-08 8:03 ` Petr Mladek
2026-06-08 9:53 ` Feng Tang
2026-06-10 16:50 ` Steven Rostedt [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260610125013.4cd775e3@robin \
--to=rostedt@goodmis.org \
--cc=akpm@linux-foundation.org \
--cc=feng.tang@linux.alibaba.com \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=pmladek@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox