public inbox for linux-doc@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Mayank Rungta <mrungta@google.com>
Cc: Doug Anderson <dianders@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Jinchao Wang <wangjinchao600@gmail.com>,
	Yunhui Cui <cuiyunhui@bytedance.com>,
	Stephane Eranian <eranian@google.com>,
	Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
	Feng Tang <feng.tang@linux.alibaba.com>,
	Max Kellermann <max.kellermann@ionos.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
Date: Wed, 11 Mar 2026 14:56:45 +0100	[thread overview]
Message-ID: <abF0naZb9lnFeiid@pathway.suse.cz> (raw)
In-Reply-To: <CABz7rdA+kYq1VMasBemp7iNsk8O4+eAJdb57d+dTZLjJmazv0Q@mail.gmail.com>

On Tue 2026-03-10 19:51:21, Mayank Rungta wrote:
> > On Mon, Mar 9, 2026 at 6:33 AM Petr Mladek <pmladek@suse.com> wrote:
> > On Thu 2026-03-05 08:13:39, Doug Anderson wrote:
> > > On Thu, Mar 5, 2026 at 3:27 AM Petr Mladek <pmladek@suse.com> wrote:
> > > > A better solution might be to separate the check and update/reset
> > > > of the values. Something like (on top of this patchset, just
> > > > compilation tested):
> > > >
> > > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > > index 30199eaeb5d7..4d0851f0f412 100644
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > > >         per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > > >  }
> > > >
> > > > -static bool is_hardlockup(unsigned int cpu)
> > > > +static void watchdog_hardlockup_update_reset(unsigned int cpu)
> > > >  {
> > > >         int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > >
> > > > -       if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > > > -               per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > > -               if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > > > -                       return true;
> > > > -
> > > > -               return false;
> > > > -       }
> > > > -
> > > >         /*
> > > >          * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > > >          * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > > > @@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
> > > >          */
> > > >         per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > > >         per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> > > > +}
> > > >
> > > > -       return false;
> > > > +static bool is_hardlockup(unsigned int cpu)
> > > > +{
> > > > +       int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > > +
> > > > +       if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
> > > > +               return false;
> > > > +
> > > > +       per_cpu(hrtimer_interrupts_missed, cpu)++;
> > > > +       if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> > > > +               return false;
> > > > +
> > > > +       return true;
> > > >  }
> > > >
> > > >  static void watchdog_hardlockup_kick(void)
> > > > @@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
> > > >
> > > >  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > >  {
> > > > -       bool is_hl;
> > > >         int hardlockup_all_cpu_backtrace;
> > > > -       /*
> > > > -        * Check for a hardlockup by making sure the CPU's timer
> > > > -        * interrupt is incrementing. The timer interrupt should have
> > > > -        * fired multiple times before we overflow'd. If it hasn't
> > > > -        * then this is a good indication the cpu is stuck
> > > > -        *
> > > > -        * Purposely check this _before_ checking watchdog_hardlockup_touched
> > > > -        * so we make sure we still update the saved value of the interrupts.
> > > > -        * Without that we'll take an extra round through this function before
> > > > -        * we can detect a lockup.
> > > > -        */
> > > > -
> > > > -       is_hl = is_hardlockup(cpu);
> > > >
> > > >         if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > > > +               watchdog_hardlockup_update_reset(cpu);
> > > >                 per_cpu(watchdog_hardlockup_touched, cpu) = false;
> > > >                 return;
> > > >         }
> > > > @@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > >         hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> > > >                                         1 : sysctl_hardlockup_all_cpu_backtrace;
> > > >
> > > > -       if (is_hl) {
> > > > +       /*
> > > > +        * Check for a hardlockup by making sure the CPU's timer
> > > > +        * interrupt is incrementing. The timer interrupt should have
> > > > +        * fired multiple times before we overflow'd. If it hasn't
> > > > +        * then this is a good indication the cpu is stuck
> > > > +        */
> > > > +       if (is_hardlockup(cpu)) {
> > > >                 unsigned int this_cpu = smp_processor_id();
> > > >                 unsigned long flags;
> > > >
> > > > @@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> > > >
> > > >                 per_cpu(watchdog_hardlockup_warned, cpu) = true;
> > > >         } else {
> > > > +               watchdog_hardlockup_update_reset(cpu);
> > > >                 per_cpu(watchdog_hardlockup_warned, cpu) = false;
> > > >         }
> > > >  }
> > >
> > > I haven't tested it, but that actually looks like a pretty nice final
> > > result to me. Mayank: What do you think? You'd have to figure out how
> > > to rework your two patches to incorporate Petr's ideas.
> > >
> 
> Thanks for your suggestion, this is pretty close, but we cannot call
> watchdog_hardlockup_update_reset(cpu) in a general else block. If we
> did, the hrtimer_interrupts_missed count would be reset on every check
> that isn't a hardlockup; even when no progress was actually made and
> we would never hit the watchdog_threshold.

Great catch!

> I’ve moved the update/reset inside the "progress detected" path within
> is_hardlockup() instead:
> 
> static bool is_hardlockup(unsigned int cpu)
> {
>         int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> 
>         if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
>                 watchdog_hardlockup_update_reset(cpu);
>                 return false;
>         }
> 
>         per_cpu(hrtimer_interrupts_missed, cpu)++;
>         if (per_cpu(hrtimer_interrupts_missed, cpu) <
> watchdog_hardlockup_miss_thresh)
>                 return false;
> 
>         return true;
> }

Looks good to me.

> I still went ahead and flipped the `is_hardlockup` test in the main
> check function to keep the indentation clean

Great.

Best Regards,
Petr

  reply	other threads:[~2026-03-11 13:56 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-04 14:44   ` Petr Mladek
2026-03-05  0:58     ` Doug Anderson
2026-03-05 11:27       ` Petr Mladek
2026-03-05 16:13         ` Doug Anderson
2026-03-09 13:33           ` Petr Mladek
2026-03-11  2:51             ` Mayank Rungta
2026-03-11 13:56               ` Petr Mladek [this message]
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-05 12:33   ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson
2026-03-05 13:46   ` Petr Mladek
2026-03-05 16:45     ` Doug Anderson
2026-03-11 14:07       ` Petr Mladek
2026-03-12 21:02         ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abF0naZb9lnFeiid@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=cuiyunhui@bytedance.com \
    --cc=dianders@chromium.org \
    --cc=eranian@google.com \
    --cc=feng.tang@linux.alibaba.com \
    --cc=irogers@google.com \
    --cc=lihuafei1@huawei.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=max.kellermann@ionos.com \
    --cc=mrungta@google.com \
    --cc=wangjinchao600@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox