The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Feng Tang <feng.tang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	paulmck@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/sys_info: add a simple timer based memory corruption detector
Date: Wed, 10 Jun 2026 12:50:13 -0400	[thread overview]
Message-ID: <20260610125013.4cd775e3@robin> (raw)
In-Reply-To: <20260527034324.51136-1-feng.tang@linux.alibaba.com>

On Wed, 27 May 2026 11:43:24 +0800
Feng Tang <feng.tang@linux.alibaba.com> wrote:

> During debugging some bios/hardware related nasty memory corruption
> issues, we found using periodic timer to monitor specific dram/mmio
> physical address is very useful for debugging, which acts like
> a basic software watchpoint.
> 
> For those bugs,  who (and when) change(corrupt) those dram or mmio
> register is hard to trace, and sometimes even hardware jtag debugger
> can't help (say the physical address watchpoint doesn't work).
> 
> The biggest shortcoming is it can never capture the exact point like
> a hardware watchpoint, no matter how small the timer interval is set,
> the idea is trying to approach the point, hoping the caught context
> have enough debug info (which did help us in solving bios/hardware
> bugs)

Instead of using a timer, can't you use the function tracer? That is,
have the value checked at *every* function call. You can easily add a
custom callback that gets called when every function is executed.

See https://docs.kernel.org/trace/ftrace-uses.html

> 
> The working flow is simple: after suspected address is identified,
> start periodic timer polling it to catch if its value is changed to
> target 'magic' value, then halt the cpu (better limit to have only
> one cpu online), or panic, or print out system information, so that
> the error environment is frozen for further check , or let
> kexec/kdump to record the vmore, etc.
> 
> All the settings are module parameters:
> 
>   watch_interval_ms: SW watchpoint check interval in ms
>   paddr_dram_to_watch: Physical dram address to monitor.
>   target_dram_val: Expected value at the dram address that triggers the watchpoint.
>   paddr_mmio_to_watch: Physical mmio address to monitor. Must be 32-bit aligned.
>   target_mmio_val: Expected value at the mmio address that triggers the watchpoint.
>   panic_on_hit: Trigger kernel panic when watchpoint condition hits.
>   hang_on_hit: halt the CPU (wait for HW debugger)
> 
> This RFC is trying to show the idea and get feedback, and there are
> some todos:
>   * merge the dram/mmio interface to auto detect it's dram or mmio
>   * support runtime changing the address
>   * move the starting point earlier in boot phase
>   * currently is monitoring 'changing to a value', add support
>     for 'changing from a value'
> 
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>


> +static void sw_watchpoint_timer_fn(struct timer_list *unused)
> +{
> +	bool hit = false;
> +
> +	if (vaddr_mmio && (*vaddr_mmio == target_mmio_val)) {
> +		pr_info("MMIO [@0x%lx] hit the target value [0x%x]!\n",
> +			paddr_mmio_to_watch, target_mmio_val);
> +		hit = true;
> +	}
> +
> +	if (vaddr_dram && (*vaddr_dram == target_dram_val)) {
> +		pr_info("DRAM [@0x%lx] hit the target value [0x%lx]!\n",
> +			paddr_dram_to_watch, target_dram_val);
> +		hit = true;
> +	}
> +
> +	if (hit) {
> +		sys_info(0);
> +
> +		/* Useful for attaching HW debugger */
> +		if (hang_on_hit) {
> +			pr_warn("Will dead loop on this CPU\n");
> +			while (1);
> +		}
> +
> +		/* Could be used to trigger kexec/kdump */
> +		if (panic_on_hit)
> +			panic("SW watchpoint hit!");
> +
> +		if (check_once)
> +			return;
> +	}

The above function would be:

static bool no_check;

static void sw_watchpoint_fn(unsigned long ip, unsigned long pip,
			     struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	bool hit = false;
	int bit;

	if (no_check)
		return;

	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;

	[ do the above code ]

		if (check_once) {
			// possibly call irq_work to unregister ftrace
			no_check = true;
		}

	ftrace_test_recursion_unlock(bit);
}

static struct ftrace_ops sw_watchpoint_fops = {
	.func = sw_watchpoint_fn;
};

[..]

> +
> +	mod_timer(&sw_watchpoint_timer, jiffies + msecs_to_jiffies(watch_interval_ms));
> +}
> +
> +static int __init sw_watchpoint_timer_init(void)
> +{
> +	if (paddr_mmio_to_watch) {
> +		vaddr_mmio = ioremap(paddr_mmio_to_watch & PAGE_MASK, PAGE_SIZE);
> +		if (!vaddr_mmio)
> +			return -ENOMEM;
> +
> +		vaddr_mmio += (paddr_mmio_to_watch % PAGE_SIZE) / 4;
> +	}
> +
> +	if (paddr_dram_to_watch) {
> +		vaddr_dram = phys_to_virt(paddr_dram_to_watch);
> +		if (!vaddr_dram)
> +			return -ENOMEM;
> +	}
> +
> +	timer_setup(&sw_watchpoint_timer, sw_watchpoint_timer_fn, 0);
> +	sw_watchpoint_timer.expires = jiffies + msecs_to_jiffies(watch_interval_ms);
> +	add_timer(&sw_watchpoint_timer);

Instead of the above, have:

	register_ftrace_function(&sw_watchpoint_fops);

-- Steve

> +
> +	return 0;
> +}
> +core_initcall(sw_watchpoint_timer_init);
> +#endif
> 
> base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d


      parent reply	other threads:[~2026-06-10 16:50 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-27  3:43 [PATCH] lib/sys_info: add a simple timer based memory corruption detector Feng Tang
2026-06-08  8:03 ` Petr Mladek
2026-06-08  9:53   ` Feng Tang
2026-06-10 16:50 ` Steven Rostedt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260610125013.4cd775e3@robin \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=feng.tang@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=pmladek@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox