public inbox for linux-doc@vger.kernel.org
From: Petr Mladek <pmladek@suse.com>
To: Doug Anderson <dianders@chromium.org>
Cc: mrungta@google.com, Jonathan Corbet <corbet@lwn.net>,
	Jinchao Wang <wangjinchao600@gmail.com>,
	Yunhui Cui <cuiyunhui@bytedance.com>,
	Stephane Eranian <eranian@google.com>,
	Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
	Feng Tang <feng.tang@linux.alibaba.com>,
	Max Kellermann <max.kellermann@ionos.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
Date: Thu, 5 Mar 2026 12:27:38 +0100	[thread overview]
Message-ID: <aaloqnsgdVp75xcV@pathway.suse.cz> (raw)
In-Reply-To: <CAD=FV=Vw7EQd1dDFx0Q0rHNgxRZfJCURRvysz=H9Vg+E-ae1Dg@mail.gmail.com>

On Wed 2026-03-04 16:58:35, Doug Anderson wrote:
> Hi,
> 
> On Wed, Mar 4, 2026 at 6:44 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> > > From: Mayank Rungta <mrungta@google.com>
> > >
> > > Currently, arch_touch_nmi_watchdog() causes an early return that
> > > skips updating hrtimer_interrupts_saved. This leads to stale
> > > comparisons and delayed lockup detection.
> > >
> > > Update the saved interrupt count before checking the touched flag
> > > to ensure detection timeliness.
> >
> > IMHO, it is not that easy, see below.
> >
> > So I am curious. Have you found this when debugging a particular
> > problem or just by reading the code, please?
> 
> As I understand it, Mayank found this because the watchdog was
> reacting significantly more slowly than he expected. In his case, he
> tracked it down to the fact that the 8250 console driver has several
> calls to touch_nmi_watchdog(), including on every call to
> console_write(). This caused the watchdog to take _much_ longer to fire.
> 
> On devices that are fairly chatty with their serial console output,
> console_write() is called almost constantly. That means that the
> watchdog is being touched constantly. If I remember correctly, Mayank
> tracked it down as:
> 
> * watchdog_hardlockup_check() called and saves counter (1000)
> * timer runs and updates the counter (1000 -> 1001)
> * touch_nmi_watchdog() is called
> * CPU locks up
> * 10 seconds pass
> * watchdog_hardlockup_check() called and saves counter (1001)
> * 10 seconds pass
> * watchdog_hardlockup_check() called and notices touch

Great visualization!

Nit: It seems to be actually the other way around:

 * 10 seconds pass
 * watchdog_hardlockup_check() called and notices touch and skips updating counters
 * 10 seconds pass
 * watchdog_hardlockup_check() called and saves counter (1001)

> * 10 seconds pass
> * watchdog_hardlockup_check() called and finally detects lockup
> 
> ...so we detect the lockup after 30 seconds, which is pretty bad. With
> his new scheme, we'd detect the lockup in 20 seconds.

Fair enough.

> > > @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
> > >
> > >  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > >  {
> > > +     bool is_hl;
> > >       int hardlockup_all_cpu_backtrace;
> > > +     /*
> > > +      * Check for a hardlockup by making sure the CPU's timer
> > > +      * interrupt is incrementing. The timer interrupt should have
> > > +      * fired multiple times before we overflow'd. If it hasn't
> > > +      * then this is a good indication the cpu is stuck
> > > +      *
> > > +      * Purposely check this _before_ checking watchdog_hardlockup_touched
> > > +      * so we make sure we still update the saved value of the interrupts.
> > > +      * Without that we'll take an extra round through this function before
> > > +      * we can detect a lockup.
> > > +      */
> > > +
> > > +     is_hl = is_hardlockup(cpu);
> > >
> > >       if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > >               per_cpu(watchdog_hardlockup_touched, cpu) = false;
> >
> > Hmm, this does not look correct to me.
> >
> > 2. Let's look at is_hardlockup() in detail:
> >
> >     static bool is_hardlockup(unsigned int cpu)
> >     {
> >         int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> >         if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> >                 per_cpu(hrtimer_interrupts_missed, cpu)++;
> >                 if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> >                         return true;
> >
> >                 return false;
> >         }
> >
> >         per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> >         per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> >
> >         return false;
> >     }
> >
> >     If we call it when the watchdog was touched and
> >     (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) holds, then
> >
> >         per_cpu(hrtimer_interrupts_missed, cpu)++;
> >
> >     still happens, even though the watchdog was touched.
> >
> >     As a result, we might report stall which should have been hidden,
> >     for example:
> >
> > CPU0                               CPU1
> >
> >  <NMI>
> >    watchdog_hardlockup_check() # passes
> >      is_hardlockup() # no
> >        hr_int_saved = hr_int;
> >        hr_int_missed = 0;
> >  </NMI>
> >
> >   local_irq_save()
> >     printk()
> >       console_trylock()
> >       console_unlock()
> >         console_flush_all()
> >
> >            touch_nmi_watchdog()
> >
> >                                    // Other CPUs print many messages,
> >                                    // e.g. during boot when initializing a lot of HW
> >                                    for (i=0; i<1000; i++) do
> >                                        printk();
> >
> >       <NMI>
> >         watchdog_hardlockup_check()
> >           is_hardlockup() # yes
> >             hr_int_missed++  # 1
> >
> >           # skip because touched
> >       </NMI>
> >
> >          touch_nmi_watchdog()
> >
> >       <NMI>
> >         watchdog_hardlockup_check()
> >           is_hardlockup() # yes
> >             hr_int_missed++  # 2
> >
> >           # skip because touched
> >       </NMI>
> >
> >     ... repeat many times ...
> >
> >   local_irq_restore()
> >
> >     # this might normally trigger handling of pending IRQs
> >     # including the timers. But IMHO, it can be offloaded
> >     # to a kthread (at least on RT)
> >
> >       <NMI>
> >         watchdog_hardlockup_check()
> >           is_hardlockup() # yes
> >             hr_int_missed++  # might be already 3, 4,...
> >
> >           Report hardlockup even when all the "hr_int_missed"
> >           values should have been ignored because of
> >           touch_watchdog.
> >
> >       </NMI>
> >
> >
> > A solution might be clearing "hrtimer_interrupts_missed"
> > when the watchdog was touched.
> 
> Great catch! When I was thinking about Mayank's patches, I thought
> about them independently. ...and I believe that independently, each
> patch is fine. The problem is that together they have exactly the
> problem you indicated.

Heh, I was not aware that "hrtimer_interrupts_missed" was added by
the 3rd patch. I looked at the final code with all patches applied ;-)

> Clearing "hrtimer_interrupts_missed" seems like the right solution in
> Mayank's patch #3.

OK, so this 1st patch moves the "is_hardlockup()" call up because it
has side effects, and adds a 4-line comment to explain why.
But it still causes problems in combination with the 3rd patch.

A better solution might be to separate the check from the update/reset
of the values. Something like this (on top of this patchset, just
compilation tested):

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 30199eaeb5d7..4d0851f0f412 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
 	per_cpu(watchdog_hardlockup_touched, cpu) = true;
 }
 
-static bool is_hardlockup(unsigned int cpu)
+static void watchdog_hardlockup_update_reset(unsigned int cpu)
 {
 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
 
-	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
-		per_cpu(hrtimer_interrupts_missed, cpu)++;
-		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
-			return true;
-
-		return false;
-	}
-
 	/*
 	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
 	 * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
@@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
 	 */
 	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
 	per_cpu(hrtimer_interrupts_missed, cpu) = 0;
+}
 
-	return false;
+static bool is_hardlockup(unsigned int cpu)
+{
+	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
+
+	if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
+		return false;
+
+	per_cpu(hrtimer_interrupts_missed, cpu)++;
+	if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
+		return false;
+
+	return true;
 }
 
 static void watchdog_hardlockup_kick(void)
@@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
-	bool is_hl;
 	int hardlockup_all_cpu_backtrace;
-	/*
-	 * Check for a hardlockup by making sure the CPU's timer
-	 * interrupt is incrementing. The timer interrupt should have
-	 * fired multiple times before we overflow'd. If it hasn't
-	 * then this is a good indication the cpu is stuck
-	 *
-	 * Purposely check this _before_ checking watchdog_hardlockup_touched
-	 * so we make sure we still update the saved value of the interrupts.
-	 * Without that we'll take an extra round through this function before
-	 * we can detect a lockup.
-	 */
-
-	is_hl = is_hardlockup(cpu);
 
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
+		watchdog_hardlockup_update_reset(cpu);
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
 		return;
 	}
@@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
 					1 : sysctl_hardlockup_all_cpu_backtrace;
 
-	if (is_hl) {
+	/*
+	 * Check for a hardlockup by making sure the CPU's timer
+	 * interrupt is incrementing. The timer interrupt should have
+	 * fired multiple times before we overflow'd. If it hasn't
+	 * then this is a good indication the cpu is stuck
+	 */
+	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
 		unsigned long flags;
 
@@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 
 		per_cpu(watchdog_hardlockup_warned, cpu) = true;
 	} else {
+		watchdog_hardlockup_update_reset(cpu);
 		per_cpu(watchdog_hardlockup_warned, cpu) = false;
 	}
 }

> > But honestly, I am not sure if this is worth the complexity.
> >
> >
> > Higher level look:
> > ------------------
> >
> > My understanding is that this patch has an effect only when
> > touch_nmi_watchdog() is called as frequently as
> > watchdog_hardlockup_check().
> >
> > The original code gives the system more time to recover after
> > a known stall (touch_nmi_watchdog() called).
> >
> > The new code is more eager to report a stall. It might be more prone
> > to report "false" positives.
> >
> > IMHO, the root of the problem is that touch_nmi_watchdog() is
> > called too frequently. And this patch is rather dancing around
> > than fixing it.
> 
> I don't think it's really any more likely to report false positives
> after the bug you pointed out is fixed. The old watchdog was just too
> conservative. With Mayank's proposal I think calling
> touch_nmi_watchdog() should reset the watchdog the same amount as
> letting the hrtimer run once and that seems like a very reasonable
> interpretation.

Fair enough.

> > Alternative:
> > ------------
> >
> > An alternative solution might be to detect and report when too many
> > watchdog_hardlockup_check() calls are ignored because of
> > touch_nmi_watchdog().
> >
> > It might help to find a mis-use of touch_nmi_watchdog(). The question
> > is what details should be reported in this case.
> >
> > It should be optional because touch_nmi_watchdog() is supposed
> > to hide "well-known" sinners after all.
> 
> Hmmmm. I certainly support trying to reduce the number of places that
> call touch_nmi_watchdog(), but at the same time I don't think Mayank's
> patch is "dancing around" the problem. IMO considering the
> touch_nmi_watchdog() to be "pretend a timer interrupt fired" is the
> intuitive way one would think the call should work. The fact that the
> code gave an entire extra 10 seconds before the watchdog could be
> caught just feels like a bug that should be fixed.
> 
> For the 8250 driver in particular, it looks like the
> touch_nmi_watchdog() was removed from serial8250_console_write() as
> part of nbcon, but then that got reverted. That would still leave two
> other touch_nmi_watchdog() calls in that driver...

Sigh, it seems that touch_nmi_watchdog() can't be removed easily.

Best Regards,
Petr


Thread overview: 21+ messages
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-04 14:44   ` Petr Mladek
2026-03-05  0:58     ` Doug Anderson
2026-03-05 11:27       ` Petr Mladek [this message]
2026-03-05 16:13         ` Doug Anderson
2026-03-09 13:33           ` Petr Mladek
2026-03-11  2:51             ` Mayank Rungta
2026-03-11 13:56               ` Petr Mladek
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-05 12:33   ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson
2026-03-05 13:46   ` Petr Mladek
2026-03-05 16:45     ` Doug Anderson
2026-03-11 14:07       ` Petr Mladek
2026-03-12 21:02         ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson
