public inbox for linux-doc@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: mrungta@google.com
Cc: Jonathan Corbet <corbet@lwn.net>,
	Jinchao Wang <wangjinchao600@gmail.com>,
	Yunhui Cui <cuiyunhui@bytedance.com>,
	Stephane Eranian <eranian@google.com>,
	Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
	Feng Tang <feng.tang@linux.alibaba.com>,
	Max Kellermann <max.kellermann@ionos.com>,
	Douglas Anderson <dianders@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
Date: Wed, 4 Mar 2026 15:44:17 +0100	[thread overview]
Message-ID: <aahFQaHxNFsoaxEb@pathway.suse.cz> (raw)
In-Reply-To: <20260212-hardlockup-watchdog-fixes-v1-1-745f1dce04c3@google.com>

On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
> 
> Currently, arch_touch_nmi_watchdog() causes an early return that
> skips updating hrtimer_interrupts_saved. This leads to stale
> comparisons and delayed lockup detection.
> 
> Update the saved interrupt count before checking the touched flag
> to ensure detection timeliness.

IMHO, it is not that easy, see below.

So I am curious. Have you found this when debugging a particular
problem or just by reading the code, please?

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
>  
>  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  {
> +	bool is_hl;
>  	int hardlockup_all_cpu_backtrace;
> +	/*
> +	 * Check for a hardlockup by making sure the CPU's timer
> +	 * interrupt is incrementing. The timer interrupt should have
> +	 * fired multiple times before we overflow'd. If it hasn't
> +	 * then this is a good indication the cpu is stuck
> +	 *
> +	 * Purposely check this _before_ checking watchdog_hardlockup_touched
> +	 * so we make sure we still update the saved value of the interrupts.
> +	 * Without that we'll take an extra round through this function before
> +	 * we can detect a lockup.
> +	 */
> +
> +	is_hl = is_hardlockup(cpu);
>  
>  	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
>  		per_cpu(watchdog_hardlockup_touched, cpu) = false;

Hmm, this does not look correct to me.

1. First, let's agree on the meaning of "watchdog_hardlockup_touched".

   My understanding is that arch_touch_nmi_watchdog() is called by code
   which might block interrupts (timers) for a long time and wants to
   hide it.

   Blocking interrupts for too long is _bad_. In the ideal word,
   nobody should call arch_touch_nmi_watchdog() because we want
   to know about all sinners.

   In the real world, we allow to hide some sinners because
   they might produce "false" positives, see touch_nmi_watchdog()
   callers:

     + Most callers are printk() related.

       We might argue whether it is a false positive or not.

       The argument for "touching the watchdog" is that slow serial
       consoles might block IRQs for a long time. But they work as
       expected and can't do better.

       Also the stall is kind of expected in this case. We could
       confuse users and/or hide the original problem if the stall
       was reported.

       Note that there is a bigger problem with the legacy console
       drivers. printk() tries to emit them immediately. And the
       current console_lock() owner become responsible for flushing
       new messages added by other CPUs in parallel.

       It is better with the new NBCON console drivers which are
       offloaded to a kthread. Here, printk() tries to flush them directly
       only when called in an emergency mode (WARN, stall report)
       or in panic().

     + There are some other callers, for example, multi_stop_cpu(),
       or hv_do_rep_hypercall_ex(). IMHO, they create stalls on
       purpose.


2. Let's look at is_hardlockup() in detail:

    static bool is_hardlockup(unsigned int cpu)
    {
    	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
    
    	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
    		per_cpu(hrtimer_interrupts_missed, cpu)++;
    		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
    			return true;
    
    		return false;
    	}
    
    	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
    	per_cpu(hrtimer_interrupts_missed, cpu) = 0;
    
    	return false;
    }

    If we call it when the watchdog was touched then
    (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)

        =>  per_cpu(hrtimer_interrupts_missed, cpu)++;

    is called even when watchdog was touched.

    As a result, we might report stall which should have been hidden,
    for example:

CPU0				   CPU1

 <NMI>
   watchdog_hardlockup_check() # passes
     is_hardlockup() # no
       hr_int_saved = hr_int;
       hr_int_missed = 0;
 </NMI>

  local_irq_save()
    printk()
      console_trylock()
      console_unlock()
        console_flush_all()

           touch_nmi_watchdog()

				   // Other CPUs print many messages,
				   // e.g. during boot when initializing a lot of HW
				   for (i=0; i<1000; i++) do
				       printk();

      <NMI>
        watchdog_hardlockup_check()
	  is_hardlockup() # yes
	    hr_int_missed++  # 1

          # skip because touched
      </NMI>

         touch_nmi_watchdog()

      <NMI>
        watchdog_hardlockup_check()
	  is_hardlockup() # yes
	    hr_int_missed++  # 2

          # skip because touched
      </NMI>

    ... repeat many times ...

  local_irq_restore()

    # this might normally trigger handling of pending IRQs
    # including the timers. But IMHO, it can be offloaded
    # to a kthread (at least on RT)

      <NMI>
        watchdog_hardlockup_check()
	  is_hardlockup() # yes
	    hr_int_missed++  # might be already 3, 4,...

          Report hardlockup even when all the "hr_int_missed"
	  values should have been ignored because of
	  touch_watchdog.

      </NMI>


A solution might be clearing "hrtimer_interrupts_missed"
when the watchdog was touched.

But honestly, I am not sure if this is worth the complexity.


Higher level look:
------------------

My understanding is that this patch has an effect only when
touch_nmi_watchdog() is called as frequently as
watchdog_hardlockup_check().

The original code gives the system more time to recover after
a known stall (touch_nmi_watchdog() called).

The new code is more eager to report a stall. It might be more prone
to report "false" positives.

IMHO, the root of the problem is that touch_nmi_watchdog() is
called too frequently. And this patch is rather dancing around
then fixing it.


Alternative:
------------

An alternative solution might to detect and report when too many
watchdog_hardlockup_check() calls are ignored because of
touch_nmi_watchdog().

It might help to find a mis-use of touch_nmi_watchdog(). The question
is what details should be reported in this case.

It should be optional because touch_nmi_watchdog() is supposed
to hide "well-known" sinners after all.



> @@ -195,13 +209,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  
>  	hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
>  					1 : sysctl_hardlockup_all_cpu_backtrace;
> -	/*
> -	 * Check for a hardlockup by making sure the CPU's timer
> -	 * interrupt is incrementing. The timer interrupt should have
> -	 * fired multiple times before we overflow'd. If it hasn't
> -	 * then this is a good indication the cpu is stuck
> -	 */
> -	if (is_hardlockup(cpu)) {
> +
> +	if (is_hl) {
>  		unsigned int this_cpu = smp_processor_id();
>  		unsigned long flags;
>  

Best Regards,
Petr

  parent reply	other threads:[~2026-03-04 14:44 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-04 14:44   ` Petr Mladek [this message]
2026-03-05  0:58     ` Doug Anderson
2026-03-05 11:27       ` Petr Mladek
2026-03-05 16:13         ` Doug Anderson
2026-03-09 13:33           ` Petr Mladek
2026-03-11  2:51             ` Mayank Rungta
2026-03-11 13:56               ` Petr Mladek
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-05 12:33   ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson
2026-03-05 13:46   ` Petr Mladek
2026-03-05 16:45     ` Doug Anderson
2026-03-11 14:07       ` Petr Mladek
2026-03-12 21:02         ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aahFQaHxNFsoaxEb@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=cuiyunhui@bytedance.com \
    --cc=dianders@chromium.org \
    --cc=eranian@google.com \
    --cc=feng.tang@linux.alibaba.com \
    --cc=irogers@google.com \
    --cc=lihuafei1@huawei.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=max.kellermann@ionos.com \
    --cc=mrungta@google.com \
    --cc=wangjinchao600@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox