All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Feng Tang <feng.tang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	paulmck@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/sys_info: add a simple timer based memory corruption detector
Date: Wed, 10 Jun 2026 12:50:13 -0400	[thread overview]
Message-ID: <20260610125013.4cd775e3@robin> (raw)
In-Reply-To: <20260527034324.51136-1-feng.tang@linux.alibaba.com>

On Wed, 27 May 2026 11:43:24 +0800
Feng Tang <feng.tang@linux.alibaba.com> wrote:

> During debugging some bios/hardware related nasty memory corruption
> issues, we found using periodic timer to monitor specific dram/mmio
> physical address is very useful for debugging, which acts like
> a basic software watchpoint.
> 
> For those bugs,  who (and when) change(corrupt) those dram or mmio
> register is hard to trace, and sometimes even hardware jtag debugger
> can't help (say the physical address watchpoint doesn't work).
> 
> The biggest shortcoming is it can never capture the exact point like
> a hardware watchpoint, no matter how small the timer interval is set,
> the idea is trying to approach the point, hoping the caught context
> have enough debug info (which did help us in solving bios/hardware
> bugs)

Instead of using a timer, can't you use the function tracer? That is,
have the value checked at *every* function call. You can easily add a
custom callback that gets called when every function is executed.

See https://docs.kernel.org/trace/ftrace-uses.html

> 
> The working flow is simple: after suspected address is identified,
> start periodic timer polling it to catch if its value is changed to
> target 'magic' value, then halt the cpu (better limit to have only
> one cpu online), or panic, or print out system information, so that
> the error environment is frozen for further check , or let
> kexec/kdump to record the vmore, etc.
> 
> All the settings are module parameters:
> 
>   watch_interval_ms: SW watchpoint check interval in ms
>   paddr_dram_to_watch: Physical dram address to monitor.
>   target_dram_val: Expected value at the dram address that triggers the watchpoint.
>   paddr_mmio_to_watch: Physical mmio address to monitor. Must be 32-bit aligned.
>   target_mmio_val: Expected value at the mmio address that triggers the watchpoint.
>   panic_on_hit: Trigger kernel panic when watchpoint condition hits.
>   hang_on_hit: halt the CPU (wait for HW debugger)
> 
> This RFC is trying to show the idea and get feedback, and there are
> some todos:
>   * merge the dram/mmio interface to auto detect it's dram or mmio
>   * support runtime changing the address
>   * move the starting point earlier in boot phase
>   * currently is monitoring 'changing to a value', add support
>     for 'changing from a value'
> 
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>


> +static void sw_watchpoint_timer_fn(struct timer_list *unused)
> +{
> +	bool hit = false;
> +
> +	if (vaddr_mmio && (*vaddr_mmio == target_mmio_val)) {
> +		pr_info("MMIO [@0x%lx] hit the target value [0x%x]!\n",
> +			paddr_mmio_to_watch, target_mmio_val);
> +		hit = true;
> +	}
> +
> +	if (vaddr_dram && (*vaddr_dram == target_dram_val)) {
> +		pr_info("DRAM [@0x%lx] hit the target value [0x%lx]!\n",
> +			paddr_dram_to_watch, target_dram_val);
> +		hit = true;
> +	}
> +
> +	if (hit) {
> +		sys_info(0);
> +
> +		/* Useful for attaching HW debugger */
> +		if (hang_on_hit) {
> +			pr_warn("Will dead loop on this CPU\n");
> +			while (1);
> +		}
> +
> +		/* Could be used to trigger kexec/kdump */
> +		if (panic_on_hit)
> +			panic("SW watchpoint hit!");
> +
> +		if (check_once)
> +			return;
> +	}

The above function would be:

static bool no_check;

static void sw_watchpoint_fn(unsigned long ip, unsigned long pip,
			     struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	bool hit = false;
	int bit;

	if (no_check)
		return;

	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;

	[ do the above code ]

		if (check_once) {
			// possibly call irq_work to unregister ftrace
			no_check = true;
		}

	ftrace_test_recursion_unlock(bit);
}

static struct ftrace_ops sw_watchpoint_fops = {
	.func = sw_watchpoint_fn;
};

[..]

> +
> +	mod_timer(&sw_watchpoint_timer, jiffies + msecs_to_jiffies(watch_interval_ms));
> +}
> +
> +static int __init sw_watchpoint_timer_init(void)
> +{
> +	if (paddr_mmio_to_watch) {
> +		vaddr_mmio = ioremap(paddr_mmio_to_watch & PAGE_MASK, PAGE_SIZE);
> +		if (!vaddr_mmio)
> +			return -ENOMEM;
> +
> +		vaddr_mmio += (paddr_mmio_to_watch % PAGE_SIZE) / 4;
> +	}
> +
> +	if (paddr_dram_to_watch) {
> +		vaddr_dram = phys_to_virt(paddr_dram_to_watch);
> +		if (!vaddr_dram)
> +			return -ENOMEM;
> +	}
> +
> +	timer_setup(&sw_watchpoint_timer, sw_watchpoint_timer_fn, 0);
> +	sw_watchpoint_timer.expires = jiffies + msecs_to_jiffies(watch_interval_ms);
> +	add_timer(&sw_watchpoint_timer);

Instead of the above, have:

	register_ftrace_function(&sw_watchpoint_fops);

-- Steve

> +
> +	return 0;
> +}
> +core_initcall(sw_watchpoint_timer_init);
> +#endif
> 
> base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d


      parent reply	other threads:[~2026-06-10 16:50 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-27  3:43 [PATCH] lib/sys_info: add a simple timer based memory corruption detector Feng Tang
2026-06-08  8:03 ` Petr Mladek
2026-06-08  9:53   ` Feng Tang
2026-06-10 16:50 ` Steven Rostedt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260610125013.4cd775e3@robin \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=feng.tang@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=pmladek@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.