All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: mrungta@google.com
Cc: Jonathan Corbet <corbet@lwn.net>,
	Jinchao Wang <wangjinchao600@gmail.com>,
	Yunhui Cui <cuiyunhui@bytedance.com>,
	Stephane Eranian <eranian@google.com>,
	Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
	Feng Tang <feng.tang@linux.alibaba.com>,
	Max Kellermann <max.kellermann@ionos.com>,
	Douglas Anderson <dianders@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
Date: Wed, 4 Mar 2026 15:44:17 +0100	[thread overview]
Message-ID: <aahFQaHxNFsoaxEb@pathway.suse.cz> (raw)
In-Reply-To: <20260212-hardlockup-watchdog-fixes-v1-1-745f1dce04c3@google.com>

On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
> 
> Currently, arch_touch_nmi_watchdog() causes an early return that
> skips updating hrtimer_interrupts_saved. This leads to stale
> comparisons and delayed lockup detection.
> 
> Update the saved interrupt count before checking the touched flag
> to ensure detection timeliness.

IMHO, it is not that easy, see below.

So I am curious. Have you found this when debugging a particular
problem or just by reading the code, please?

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
>  
>  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  {
> +	bool is_hl;
>  	int hardlockup_all_cpu_backtrace;
> +	/*
> +	 * Check for a hardlockup by making sure the CPU's timer
> +	 * interrupt is incrementing. The timer interrupt should have
> +	 * fired multiple times before we overflow'd. If it hasn't
> +	 * then this is a good indication the cpu is stuck
> +	 *
> +	 * Purposely check this _before_ checking watchdog_hardlockup_touched
> +	 * so we make sure we still update the saved value of the interrupts.
> +	 * Without that we'll take an extra round through this function before
> +	 * we can detect a lockup.
> +	 */
> +
> +	is_hl = is_hardlockup(cpu);
>  
>  	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
>  		per_cpu(watchdog_hardlockup_touched, cpu) = false;

Hmm, this does not look correct to me.

1. First, let's agree on the meaning of "watchdog_hardlockup_touched".

   My understanding is that arch_touch_nmi_watchdog() is called by code
   which might block interrupts (timers) for a long time and wants to
   hide it.

   Blocking interrupts for too long is _bad_. In the ideal word,
   nobody should call arch_touch_nmi_watchdog() because we want
   to know about all sinners.

   In the real world, we allow to hide some sinners because
   they might produce "false" positives, see touch_nmi_watchdog()
   callers:

     + Most callers are printk() related.

       We might argue whether it is a false positive or not.

       The argument for "touching the watchdog" is that slow serial
       consoles might block IRQs for a long time. But they work as
       expected and can't do better.

       Also the stall is kind of expected in this case. We could
       confuse users and/or hide the original problem if the stall
       was reported.

       Note that there is a bigger problem with the legacy console
       drivers. printk() tries to emit them immediately. And the
       current console_lock() owner become responsible for flushing
       new messages added by other CPUs in parallel.

       It is better with the new NBCON console drivers which are
       offloaded to a kthread. Here, printk() tries to flush them directly
       only when called in an emergency mode (WARN, stall report)
       or in panic().

     + There are some other callers, for example, multi_stop_cpu(),
       or hv_do_rep_hypercall_ex(). IMHO, they create stalls on
       purpose.


2. Let's look at is_hardlockup() in detail:

    static bool is_hardlockup(unsigned int cpu)
    {
    	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
    
    	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
    		per_cpu(hrtimer_interrupts_missed, cpu)++;
    		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
    			return true;
    
    		return false;
    	}
    
    	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
    	per_cpu(hrtimer_interrupts_missed, cpu) = 0;
    
    	return false;
    }

    If we call it when the watchdog was touched then
    (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)

        =>  per_cpu(hrtimer_interrupts_missed, cpu)++;

    is called even when watchdog was touched.

    As a result, we might report stall which should have been hidden,
    for example:

CPU0				   CPU1

 <NMI>
   watchdog_hardlockup_check() # passes
     is_hardlockup() # no
       hr_int_saved = hr_int;
       hr_int_missed = 0;
 </NMI>

  local_irq_save()
    printk()
      console_trylock()
      console_unlock()
        console_flush_all()

           touch_nmi_watchdog()

				   // Other CPUs print many messages,
				   // e.g. during boot when initializing a lot of HW
				   for (i=0; i<1000; i++) do
				       printk();

      <NMI>
        watchdog_hardlockup_check()
	  is_hardlockup() # yes
	    hr_int_missed++  # 1

          # skip because touched
      </NMI>

         touch_nmi_watchdog()

      <NMI>
        watchdog_hardlockup_check()
	  is_hardlockup() # yes
	    hr_int_missed++  # 2

          # skip because touched
      </NMI>

    ... repeat many times ...

  local_irq_restore()

    # this might normally trigger handling of pending IRQs
    # including the timers. But IMHO, it can be offloaded
    # to a kthread (at least on RT)

      <NMI>
        watchdog_hardlockup_check()
	  is_hardlockup() # yes
	    hr_int_missed++  # might be already 3, 4,...

          Report hardlockup even when all the "hr_int_missed"
	  values should have been ignored because of
	  touch_watchdog.

      </NMI>


A solution might be clearing "hrtimer_interrupts_missed"
when the watchdog was touched.

But honestly, I am not sure if this is worth the complexity.


Higher level look:
------------------

My understanding is that this patch has an effect only when
touch_nmi_watchdog() is called as frequently as
watchdog_hardlockup_check().

The original code gives the system more time to recover after
a known stall (touch_nmi_watchdog() called).

The new code is more eager to report a stall. It might be more prone
to report "false" positives.

IMHO, the root of the problem is that touch_nmi_watchdog() is
called too frequently. And this patch is rather dancing around
then fixing it.


Alternative:
------------

An alternative solution might to detect and report when too many
watchdog_hardlockup_check() calls are ignored because of
touch_nmi_watchdog().

It might help to find a mis-use of touch_nmi_watchdog(). The question
is what details should be reported in this case.

It should be optional because touch_nmi_watchdog() is supposed
to hide "well-known" sinners after all.



> @@ -195,13 +209,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  
>  	hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
>  					1 : sysctl_hardlockup_all_cpu_backtrace;
> -	/*
> -	 * Check for a hardlockup by making sure the CPU's timer
> -	 * interrupt is incrementing. The timer interrupt should have
> -	 * fired multiple times before we overflow'd. If it hasn't
> -	 * then this is a good indication the cpu is stuck
> -	 */
> -	if (is_hardlockup(cpu)) {
> +
> +	if (is_hl) {
>  		unsigned int this_cpu = smp_processor_id();
>  		unsigned long flags;
>  

Best Regards,
Petr

  parent reply	other threads:[~2026-03-04 14:44 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta
2026-02-12 21:12   ` Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-04 14:44   ` Petr Mladek [this message]
2026-03-05  0:58     ` Doug Anderson
2026-03-05 11:27       ` Petr Mladek
2026-03-05 16:13         ` Doug Anderson
2026-03-09 13:33           ` Petr Mladek
2026-03-11  2:51             ` Mayank Rungta
2026-03-11 13:56               ` Petr Mladek
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta
2026-02-12 21:12   ` Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-05 12:33   ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta
2026-02-12 21:12   ` Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson
2026-03-05 13:46   ` Petr Mladek
2026-03-05 16:45     ` Doug Anderson
2026-03-11 14:07       ` Petr Mladek
2026-03-12 21:02         ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta
2026-02-12 21:12   ` Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aahFQaHxNFsoaxEb@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=cuiyunhui@bytedance.com \
    --cc=dianders@chromium.org \
    --cc=eranian@google.com \
    --cc=feng.tang@linux.alibaba.com \
    --cc=irogers@google.com \
    --cc=lihuafei1@huawei.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=max.kellermann@ionos.com \
    --cc=mrungta@google.com \
    --cc=wangjinchao600@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.