From: Petr Mladek <pmladek@suse.com>
To: Mayank Rungta <mrungta@google.com>
Cc: Doug Anderson <dianders@chromium.org>,
Jonathan Corbet <corbet@lwn.net>,
Jinchao Wang <wangjinchao600@gmail.com>,
Yunhui Cui <cuiyunhui@bytedance.com>,
Stephane Eranian <eranian@google.com>,
Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
Feng Tang <feng.tang@linux.alibaba.com>,
Max Kellermann <max.kellermann@ionos.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
Date: Wed, 11 Mar 2026 14:56:45 +0100 [thread overview]
Message-ID: <abF0naZb9lnFeiid@pathway.suse.cz> (raw)
In-Reply-To: <CABz7rdA+kYq1VMasBemp7iNsk8O4+eAJdb57d+dTZLjJmazv0Q@mail.gmail.com>
On Tue 2026-03-10 19:51:21, Mayank Rungta wrote:
> > On Mon, Mar 9, 2026 at 6:33 AM Petr Mladek <pmladek@suse.com> wrote:
> > On Thu 2026-03-05 08:13:39, Doug Anderson wrote:
> > > On Thu, Mar 5, 2026 at 3:27 AM Petr Mladek <pmladek@suse.com> wrote:
> > > > A better solution might be to separate the check and update/reset
> > > > of the values. Something like (on top of this patchset, just
> > > > compilation tested):
> > > >
> > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > > index 30199eaeb5d7..4d0851f0f412 100644
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > > > per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > > > }
> > > >
> > > > -static bool is_hardlockup(unsigned int cpu)
> > > > +static void watchdog_hardlockup_update_reset(unsigned int cpu)
> > > > {
> > > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > >
> > > > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > > > - per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > > - if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > > > - return true;
> > > > -
> > > > - return false;
> > > > - }
> > > > -
> > > > /*
> > > > * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > > > * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > > > @@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
> > > > */
> > > > per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > > > per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> > > > +}
> > > >
> > > > - return false;
> > > > +static bool is_hardlockup(unsigned int cpu)
> > > > +{
> > > > + int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > > +
> > > > + if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
> > > > + return false;
> > > > +
> > > > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > > + if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> > > > + return false;
> > > > +
> > > > + return true;
> > > > }
> > > >
> > > > static void watchdog_hardlockup_kick(void)
> > > > @@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
> > > >
> > > > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > > {
> > > > - bool is_hl;
> > > > int hardlockup_all_cpu_backtrace;
> > > > - /*
> > > > - * Check for a hardlockup by making sure the CPU's timer
> > > > - * interrupt is incrementing. The timer interrupt should have
> > > > - * fired multiple times before we overflow'd. If it hasn't
> > > > - * then this is a good indication the cpu is stuck
> > > > - *
> > > > - * Purposely check this _before_ checking watchdog_hardlockup_touched
> > > > - * so we make sure we still update the saved value of the interrupts.
> > > > - * Without that we'll take an extra round through this function before
> > > > - * we can detect a lockup.
> > > > - */
> > > > -
> > > > - is_hl = is_hardlockup(cpu);
> > > >
> > > > if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > > > + watchdog_hardlockup_update_reset(cpu);
> > > > per_cpu(watchdog_hardlockup_touched, cpu) = false;
> > > > return;
> > > > }
> > > > @@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > > hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> > > > 1 : sysctl_hardlockup_all_cpu_backtrace;
> > > >
> > > > - if (is_hl) {
> > > > + /*
> > > > + * Check for a hardlockup by making sure the CPU's timer
> > > > + * interrupt is incrementing. The timer interrupt should have
> > > > + * fired multiple times before we overflow'd. If it hasn't
> > > > + * then this is a good indication the cpu is stuck
> > > > + */
> > > > + if (is_hardlockup(cpu)) {
> > > > unsigned int this_cpu = smp_processor_id();
> > > > unsigned long flags;
> > > >
> > > > @@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > >
> > > > per_cpu(watchdog_hardlockup_warned, cpu) = true;
> > > > } else {
> > > > + watchdog_hardlockup_update_reset(cpu);
> > > > per_cpu(watchdog_hardlockup_warned, cpu) = false;
> > > > }
> > > > }
> > >
> > > I haven't tested it, but that actually looks like a pretty nice final
> > > result to me. Mayank: What do you think? You'd have to figure out how
> > > to rework your two patches to incorporate Petr's ideas.
> > >
>
> Thanks for your suggestion, this is pretty close, but we cannot call
> watchdog_hardlockup_update_reset(cpu) in a general else block. If we
> did, the hrtimer_interrupts_missed count would be reset on every check
> that isn't a hardlockup; even when no progress was actually made and
> we would never hit the watchdog_threshold.
Great catch!
> I’ve moved the update/reset inside the "progress detected" path within
> is_hardlockup() instead:
>
> static bool is_hardlockup(unsigned int cpu)
> {
> int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>
> if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
> watchdog_hardlockup_update_reset(cpu);
> return false;
> }
>
> per_cpu(hrtimer_interrupts_missed, cpu)++;
> if (per_cpu(hrtimer_interrupts_missed, cpu) <
> watchdog_hardlockup_miss_thresh)
> return false;
>
> return true;
> }
Looks good to me.
> I still went ahead and flipped the `is_hardlockup` test in the main
> check function to keep the indentation clean
Great.
Best Regards,
Petr
next prev parent reply other threads:[~2026-03-11 13:56 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-04 14:44 ` Petr Mladek
2026-03-05 0:58 ` Doug Anderson
2026-03-05 11:27 ` Petr Mladek
2026-03-05 16:13 ` Doug Anderson
2026-03-09 13:33 ` Petr Mladek
2026-03-11 2:51 ` Mayank Rungta
2026-03-11 13:56 ` Petr Mladek [this message]
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-05 12:33 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
2026-03-05 13:46 ` Petr Mladek
2026-03-05 16:45 ` Doug Anderson
2026-03-11 14:07 ` Petr Mladek
2026-03-12 21:02 ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abF0naZb9lnFeiid@pathway.suse.cz \
--to=pmladek@suse.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=cuiyunhui@bytedance.com \
--cc=dianders@chromium.org \
--cc=eranian@google.com \
--cc=feng.tang@linux.alibaba.com \
--cc=irogers@google.com \
--cc=lihuafei1@huawei.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=max.kellermann@ionos.com \
--cc=mrungta@google.com \
--cc=wangjinchao600@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.