From: Petr Mladek <pmladek@suse.com>
To: Doug Anderson <dianders@chromium.org>
Cc: mrungta@google.com, Jonathan Corbet <corbet@lwn.net>,
Jinchao Wang <wangjinchao600@gmail.com>,
Yunhui Cui <cuiyunhui@bytedance.com>,
Stephane Eranian <eranian@google.com>,
Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
Feng Tang <feng.tang@linux.alibaba.com>,
Max Kellermann <max.kellermann@ionos.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
Date: Thu, 5 Mar 2026 12:27:38 +0100 [thread overview]
Message-ID: <aaloqnsgdVp75xcV@pathway.suse.cz> (raw)
In-Reply-To: <CAD=FV=Vw7EQd1dDFx0Q0rHNgxRZfJCURRvysz=H9Vg+E-ae1Dg@mail.gmail.com>
On Wed 2026-03-04 16:58:35, Doug Anderson wrote:
> Hi,
>
> On Wed, Mar 4, 2026 at 6:44 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Thu 2026-02-12 14:12:10, Mayank Rungta via B4 Relay wrote:
> > > From: Mayank Rungta <mrungta@google.com>
> > >
> > > Currently, arch_touch_nmi_watchdog() causes an early return that
> > > skips updating hrtimer_interrupts_saved. This leads to stale
> > > comparisons and delayed lockup detection.
> > >
> > > Update the saved interrupt count before checking the touched flag
> > > to ensure detection timeliness.
> >
> > IMHO, it is not that easy, see below.
> >
> > So I am curious. Have you found this when debugging a particular
> > problem or just by reading the code, please?
>
> As I understand it, Mayank found this because the watchdog was
> reacting significantly more slowly than he expected. In his caes, he
> tracked it down to the fact that 8250 console driver has several calls
> to touch_nmi_watchdog(), including on every call to console_write().
> This caused the watchdog to take _much_ longer to fire.
>
> On devices that fairly chatty w/ their output to the serial console,
> the console_write() is called almost constantly. That means that the
> watchdog is being touched constantly. If I remember Mayank tracked it
> down as:
>
> * watchdog_hardlockup_check() called and saves counter (1000)
> * timer runs and updates the timer (1000 -> 1001)
> * touch_nmi_watchdog() is called
> * CPU locks up
> * 10 seconds pass
> * watchdog_hardlockup_check() called and saves counter (1001)
> * 10 seconds pass
> * watchdog_hardlockup_check() called and notices touch
Great visualization!
Nit: It seems to be actually the other way around:
* 10 seconds pass
* watchdog_hardlockup_check() called and notices touch and skips updating counters
* 10 seconds pass
* watchdog_hardlockup_check() called and saves counter (1001)
> * 10 seconds pass
> * watchdog_hardlockup_check() called and finally detects lockup
>
> ...so we detect the lockup after 30 seconds, which is pretty bad. With
> his new scheme, we'd detect the lockup in 20 seconds.
Fair enough.
> > > @@ -186,7 +186,21 @@ static void watchdog_hardlockup_kick(void)
> > >
> > > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > {
> > > + bool is_hl;
> > > int hardlockup_all_cpu_backtrace;
> > > + /*
> > > + * Check for a hardlockup by making sure the CPU's timer
> > > + * interrupt is incrementing. The timer interrupt should have
> > > + * fired multiple times before we overflow'd. If it hasn't
> > > + * then this is a good indication the cpu is stuck
> > > + *
> > > + * Purposely check this _before_ checking watchdog_hardlockup_touched
> > > + * so we make sure we still update the saved value of the interrupts.
> > > + * Without that we'll take an extra round through this function before
> > > + * we can detect a lockup.
> > > + */
> > > +
> > > + is_hl = is_hardlockup(cpu);
> > >
> > > if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > > per_cpu(watchdog_hardlockup_touched, cpu) = false;
> >
> > Hmm, this does not look correct to me.
> >
> > 2. Let's look at is_hardlockup() in detail:
> >
> > static bool is_hardlockup(unsigned int cpu)
> > {
> > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > per_cpu(hrtimer_interrupts_missed, cpu)++;
> > if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > return true;
> >
> > return false;
> > }
> >
> > per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> >
> > return false;
> > }
> >
> > If we call it when the watchdog was touched then
> > (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> >
> > => per_cpu(hrtimer_interrupts_missed, cpu)++;
> >
> > is called even when watchdog was touched.
> >
> > As a result, we might report stall which should have been hidden,
> > for example:
> >
> > CPU0 CPU1
> >
> > <NMI>
> > watchdog_hardlockup_check() # passes
> > is_hardlockup() # no
> > hr_int_saved = hr_int;
> > hr_int_missed = 0;
> > </NMI>
> >
> > local_irq_save()
> > printk()
> > console_trylock()
> > console_unlock()
> > console_flush_all()
> >
> > touch_nmi_watchdog()
> >
> > // Other CPUs print many messages,
> > // e.g. during boot when initializing a lot of HW
> > for (i=0; i<1000; i++) do
> > printk();
> >
> > <NMI>
> > watchdog_hardlockup_check()
> > is_hardlockup() # yes
> > hr_int_missed++ # 1
> >
> > # skip because touched
> > </NMI>
> >
> > touch_nmi_watchdog()
> >
> > <NMI>
> > watchdog_hardlockup_check()
> > is_hardlockup() # yes
> > hr_int_missed++ # 2
> >
> > # skip because touched
> > </NMI>
> >
> > ... repeat many times ...
> >
> > local_irq_restore()
> >
> > # this might normally trigger handling of pending IRQs
> > # including the timers. But IMHO, it can be offloaded
> > # to a kthread (at least on RT)
> >
> > <NMI>
> > watchdog_hardlockup_check()
> > is_hardlockup() # yes
> > hr_int_missed++ # might be already 3, 4,...
> >
> > Report hardlockup even when all the "hr_int_missed"
> > values should have been ignored because of
> > touch_watchdog.
> >
> > </NMI>
> >
> >
> > A solution might be clearing "hrtimer_interrupts_missed"
> > when the watchdog was touched.
>
> Great catch! When I was thinking about Mayank's patches, I thought
> about them independently. ...and I believe that independently, each
> patch is fine. The problem is that together they have exactly the
> problem you indicated.
Heh, I was not aware that "hrtimer_interrupts_missed" was added by
the 3rd patch. I looked at the final code with all patches applied ;-)
> Clearing "hrtimer_interrupts_missed" seems like the right solution in
> Mayank's patch #3.
OK, this 1st patch moves "is_hardlockup()" up because it has some
"side effects". It adds a 4-line comment to explain it.
But it still causes problems in the 3rd patch.
A better solution might be to separate the check and update/reset
of the values. Something like (on top of this patchset, just
compilation tested):
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 30199eaeb5d7..4d0851f0f412 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
per_cpu(watchdog_hardlockup_touched, cpu) = true;
}
-static bool is_hardlockup(unsigned int cpu)
+static void watchdog_hardlockup_update_reset(unsigned int cpu)
{
int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
- if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
- per_cpu(hrtimer_interrupts_missed, cpu)++;
- if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
- return true;
-
- return false;
- }
-
/*
* NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
* for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
@@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
*/
per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
per_cpu(hrtimer_interrupts_missed, cpu) = 0;
+}
- return false;
+static bool is_hardlockup(unsigned int cpu)
+{
+ int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
+
+ if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
+ return false;
+
+ per_cpu(hrtimer_interrupts_missed, cpu)++;
+ if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
+ return false;
+
+ return true;
}
static void watchdog_hardlockup_kick(void)
@@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
{
- bool is_hl;
int hardlockup_all_cpu_backtrace;
- /*
- * Check for a hardlockup by making sure the CPU's timer
- * interrupt is incrementing. The timer interrupt should have
- * fired multiple times before we overflow'd. If it hasn't
- * then this is a good indication the cpu is stuck
- *
- * Purposely check this _before_ checking watchdog_hardlockup_touched
- * so we make sure we still update the saved value of the interrupts.
- * Without that we'll take an extra round through this function before
- * we can detect a lockup.
- */
-
- is_hl = is_hardlockup(cpu);
if (per_cpu(watchdog_hardlockup_touched, cpu)) {
+ watchdog_hardlockup_update_reset(cpu);
per_cpu(watchdog_hardlockup_touched, cpu) = false;
return;
}
@@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
1 : sysctl_hardlockup_all_cpu_backtrace;
- if (is_hl) {
+ /*
+ * Check for a hardlockup by making sure the CPU's timer
+ * interrupt is incrementing. The timer interrupt should have
+ * fired multiple times before we overflow'd. If it hasn't
+ * then this is a good indication the cpu is stuck
+ */
+ if (is_hardlockup(cpu)) {
unsigned int this_cpu = smp_processor_id();
unsigned long flags;
@@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
per_cpu(watchdog_hardlockup_warned, cpu) = true;
} else {
+ watchdog_hardlockup_update_reset(cpu);
per_cpu(watchdog_hardlockup_warned, cpu) = false;
}
}
> > But honestly, I am not sure if this is worth the complexity.
> >
> >
> > Higher level look:
> > ------------------
> >
> > My understanding is that this patch has an effect only when
> > touch_nmi_watchdog() is called as frequently as
> > watchdog_hardlockup_check().
> >
> > The original code gives the system more time to recover after
> > a known stall (touch_nmi_watchdog() called).
> >
> > The new code is more eager to report a stall. It might be more prone
> > to report "false" positives.
> >
> > IMHO, the root of the problem is that touch_nmi_watchdog() is
> > called too frequently. And this patch is rather dancing around
> > then fixing it.
>
> I don't think it's really any more likely to report false positives
> after the bug you pointed out is fixed. The old watchdog was just too
> conservative. With Mayank's proposal I think calling
> touch_nmi_watchdog() should reset the watchdog the same amount as
> letting the hrtimer run once and that seems like a very reasonable
> interpretation.
Fair enough.
> > Alternative:
> > ------------
> >
> > An alternative solution might to detect and report when too many
> > watchdog_hardlockup_check() calls are ignored because of
> > touch_nmi_watchdog().
> >
> > It might help to find a mis-use of touch_nmi_watchdog(). The question
> > is what details should be reported in this case.
> >
> > It should be optional because touch_nmi_watchdog() is supposed
> > to hide "well-known" sinners after all.
>
> Hmmmm. I certainly support trying to reduce the number of places that
> call touch_nmi_watchdog(), but at the same time I don't think Mayank's
> patch is "dancing around" the problem. IMO considering the
> touch_nmi_watchdog() to be "pretend a timer interrupt fired" is the
> intuitive way one would think the call should work. The fact that the
> code gave an entire extra 10 seconds before the watchdog could be
> caught just feels like a bug that should be fixed.
>
> For the 8250 driver in particular, it looks like the
> touch_nmi_watchdog() was removed from serial8250_console_write() as
> part of nbcon, but then that got reverted. That would still leave two
> other touch_nmi_watchdog() calls in that driver...
Sigh, it seems that touch_nmi_watchdog() can't be removed easily.
Best Regards,
Petr
next prev parent reply other threads:[~2026-03-05 11:27 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-04 14:44 ` Petr Mladek
2026-03-05 0:58 ` Doug Anderson
2026-03-05 11:27 ` Petr Mladek [this message]
2026-03-05 16:13 ` Doug Anderson
2026-03-09 13:33 ` Petr Mladek
2026-03-11 2:51 ` Mayank Rungta
2026-03-11 13:56 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-05 12:33 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
2026-03-05 13:46 ` Petr Mladek
2026-03-05 16:45 ` Doug Anderson
2026-03-11 14:07 ` Petr Mladek
2026-03-12 21:02 ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aaloqnsgdVp75xcV@pathway.suse.cz \
--to=pmladek@suse.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=cuiyunhui@bytedance.com \
--cc=dianders@chromium.org \
--cc=eranian@google.com \
--cc=feng.tang@linux.alibaba.com \
--cc=irogers@google.com \
--cc=lihuafei1@huawei.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=max.kellermann@ionos.com \
--cc=mrungta@google.com \
--cc=wangjinchao600@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.