From: Petr Mladek <pmladek@suse.com>
To: Doug Anderson <dianders@chromium.org>
Cc: mrungta@google.com, Jonathan Corbet <corbet@lwn.net>,
Jinchao Wang <wangjinchao600@gmail.com>,
Yunhui Cui <cuiyunhui@bytedance.com>,
Stephane Eranian <eranian@google.com>,
Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
Feng Tang <feng.tang@linux.alibaba.com>,
Max Kellermann <max.kellermann@ionos.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
Date: Mon, 9 Mar 2026 14:33:36 +0100 [thread overview]
Message-ID: <aa7MMCuJcIafi-gM@pathway.suse.cz> (raw)
In-Reply-To: <CAD=FV=WVPmR4-QNp-_pFoSfRXDWq_Mv+hWNZVjtvu+GUHCmT+A@mail.gmail.com>
On Thu 2026-03-05 08:13:39, Doug Anderson wrote:
> Hi,
>
> On Thu, Mar 5, 2026 at 3:27 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > > * watchdog_hardlockup_check() called and saves counter (1000)
> > > * timer runs and updates the timer (1000 -> 1001)
> > > * touch_nmi_watchdog() is called
> > > * CPU locks up
> > > * 10 seconds pass
> > > * watchdog_hardlockup_check() called and saves counter (1001)
> > > * 10 seconds pass
> > > * watchdog_hardlockup_check() called and notices touch
> >
> > Great visualization!
> >
> > Nit: It seems to be actually the other way around:
> >
> > * 10 seconds pass
> > * watchdog_hardlockup_check() called and notices touch and skips updating counters
> > * 10 seconds pass
> > * watchdog_hardlockup_check() called and saves counter (1001)
>
> Oops, right! :-) Mayank: it's probably worth adding some form of the
> (corrected) example here to the commit message. Also, you could
> mention in the commit message that you were seeing real problems
> because of the 8250 console prints with the general rule that if
> someone asks a question during the a review it's worth including that
> info in the next version of the commit message. ;-)
>
>
> > A better solution might be to separate the check and update/reset
> > of the values. Something like (on top of this patchset, just
> > compilation tested):
> >
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 30199eaeb5d7..4d0851f0f412 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > }
> >
> > -static bool is_hardlockup(unsigned int cpu)
> > +static void watchdog_hardlockup_update_reset(unsigned int cpu)
> > {
> > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > - if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > - per_cpu(hrtimer_interrupts_missed, cpu)++;
> > - if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > - return true;
> > -
> > - return false;
> > - }
> > -
> > /*
> > * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > @@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
> > */
> > per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> > +}
> >
> > - return false;
> > +static bool is_hardlockup(unsigned int cpu)
> > +{
> > + int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > +
> > + if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
> > + return false;
> > +
> > + per_cpu(hrtimer_interrupts_missed, cpu)++;
> > + if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> > + return false;
> > +
> > + return true;
> > }
> >
> > static void watchdog_hardlockup_kick(void)
> > @@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
> >
> > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > {
> > - bool is_hl;
> > int hardlockup_all_cpu_backtrace;
> > - /*
> > - * Check for a hardlockup by making sure the CPU's timer
> > - * interrupt is incrementing. The timer interrupt should have
> > - * fired multiple times before we overflow'd. If it hasn't
> > - * then this is a good indication the cpu is stuck
> > - *
> > - * Purposely check this _before_ checking watchdog_hardlockup_touched
> > - * so we make sure we still update the saved value of the interrupts.
> > - * Without that we'll take an extra round through this function before
> > - * we can detect a lockup.
> > - */
> > -
> > - is_hl = is_hardlockup(cpu);
> >
> > if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > + watchdog_hardlockup_update_reset(cpu);
> > per_cpu(watchdog_hardlockup_touched, cpu) = false;
> > return;
> > }
> > @@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> > 1 : sysctl_hardlockup_all_cpu_backtrace;
> >
> > - if (is_hl) {
> > + /*
> > + * Check for a hardlockup by making sure the CPU's timer
> > + * interrupt is incrementing. The timer interrupt should have
> > + * fired multiple times before we overflow'd. If it hasn't
> > + * then this is a good indication the cpu is stuck
> > + */
> > + if (is_hardlockup(cpu)) {
> > unsigned int this_cpu = smp_processor_id();
> > unsigned long flags;
> >
> > @@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >
> > per_cpu(watchdog_hardlockup_warned, cpu) = true;
> > } else {
> > + watchdog_hardlockup_update_reset(cpu);
> > per_cpu(watchdog_hardlockup_warned, cpu) = false;
> > }
> > }
>
> I haven't tested it, but that actually looks like a pretty nice final
> result to me. Mayank: What do you think? You'd have to figure out how
> to rework your two patches to incorporate Petr's ideas.
>
> Petr: Since you gave your ideas as a diff, what are you thinking in
> terms of tags on Mayank's v2? You didn't provide a Signed-off-by on
> your diff, so I guess you're expecting Mayank not to incorprate it
> directly but take it as a "suggestion" for improving his patches (AKA
> not add any of your tags to his v2).
I expected that Mayank could rework his patchset using ideas from the
diff. Feel free to use the changes as they are and copy&paste them
from my diff. It is just a refactoring.
> One nit: in the final result, it might be nice to invert the
> "is_hardlockup()" test so we can return early and get rid of a level
> of indentation. AKA:
>
> if (!is_hardlockup(cpu)) {
> watchdog_hardlockup_update_reset(cpu);
> per_cpu(watchdog_hardlockup_warned, cpu) = false;
> return;
> }
>
> Not only does it reduce indentation, but it also keeps the two calls
> to watchdog_hardlockup_update_reset() closer to each other.
Yeah, that would be great. I actually wanted to do it in my diff
as well. But I did not do it to keep the diff simple.
It might be better to invert the logic as a separate preparation
patch so that we do not hide other changes in the reshuffling.
Best Regards,
Petr
next prev parent reply other threads:[~2026-03-09 13:33 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-04 14:44 ` Petr Mladek
2026-03-05 0:58 ` Doug Anderson
2026-03-05 11:27 ` Petr Mladek
2026-03-05 16:13 ` Doug Anderson
2026-03-09 13:33 ` Petr Mladek [this message]
2026-03-11 2:51 ` Mayank Rungta
2026-03-11 13:56 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
2026-02-13 16:29 ` Doug Anderson
2026-03-05 12:33 ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
2026-03-05 13:46 ` Petr Mladek
2026-03-05 16:45 ` Doug Anderson
2026-03-11 14:07 ` Petr Mladek
2026-03-12 21:02 ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
2026-02-13 16:30 ` Doug Anderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aa7MMCuJcIafi-gM@pathway.suse.cz \
--to=pmladek@suse.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=cuiyunhui@bytedance.com \
--cc=dianders@chromium.org \
--cc=eranian@google.com \
--cc=feng.tang@linux.alibaba.com \
--cc=irogers@google.com \
--cc=lihuafei1@huawei.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=max.kellermann@ionos.com \
--cc=mrungta@google.com \
--cc=wangjinchao600@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox