All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Doug Anderson <dianders@chromium.org>
Cc: mrungta@google.com, Jonathan Corbet <corbet@lwn.net>,
	Jinchao Wang <wangjinchao600@gmail.com>,
	Yunhui Cui <cuiyunhui@bytedance.com>,
	Stephane Eranian <eranian@google.com>,
	Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
	Feng Tang <feng.tang@linux.alibaba.com>,
	Max Kellermann <max.kellermann@ionos.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check
Date: Mon, 9 Mar 2026 14:33:36 +0100	[thread overview]
Message-ID: <aa7MMCuJcIafi-gM@pathway.suse.cz> (raw)
In-Reply-To: <CAD=FV=WVPmR4-QNp-_pFoSfRXDWq_Mv+hWNZVjtvu+GUHCmT+A@mail.gmail.com>

On Thu 2026-03-05 08:13:39, Doug Anderson wrote:
> Hi,
> 
> On Thu, Mar 5, 2026 at 3:27 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > > * watchdog_hardlockup_check() called and saves counter (1000)
> > > * timer runs and updates the timer (1000 -> 1001)
> > > * touch_nmi_watchdog() is called
> > > * CPU locks up
> > > * 10 seconds pass
> > > * watchdog_hardlockup_check() called and saves counter (1001)
> > > * 10 seconds pass
> > > * watchdog_hardlockup_check() called and notices touch
> >
> > Great visualization!
> >
> > Nit: It seems to be actually the other way around:
> >
> >  * 10 seconds pass
> >  * watchdog_hardlockup_check() called and notices touch and skips updating counters
> >  * 10 seconds pass
> >  * watchdog_hardlockup_check() called and saves counter (1001)
> 
> Oops, right! :-) Mayank: it's probably worth adding some form of the
> (corrected) example here to the commit message. Also, you could
> mention in the commit message that you were seeing real problems
> because of the 8250 console prints with the general rule that if
> someone asks a question during the a review it's worth including that
> info in the next version of the commit message. ;-)
> 
> 
> > A better solution might be to separate the check and update/reset
> > of the values. Something like (on top of this patchset, just
> > compilation tested):
> >
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 30199eaeb5d7..4d0851f0f412 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -167,18 +167,10 @@ void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> >         per_cpu(watchdog_hardlockup_touched, cpu) = true;
> >  }
> >
> > -static bool is_hardlockup(unsigned int cpu)
> > +static void watchdog_hardlockup_update_reset(unsigned int cpu)
> >  {
> >         int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > -       if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > -               per_cpu(hrtimer_interrupts_missed, cpu)++;
> > -               if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
> > -                       return true;
> > -
> > -               return false;
> > -       }
> > -
> >         /*
> >          * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> >          * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > @@ -186,8 +178,20 @@ static bool is_hardlockup(unsigned int cpu)
> >          */
> >         per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> >         per_cpu(hrtimer_interrupts_missed, cpu) = 0;
> > +}
> >
> > -       return false;
> > +static bool is_hardlockup(unsigned int cpu)
> > +{
> > +       int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > +
> > +       if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint)
> > +               return false;
> > +
> > +       per_cpu(hrtimer_interrupts_missed, cpu)++;
> > +       if (per_cpu(hrtimer_interrupts_missed, cpu) < watchdog_hardlockup_miss_thresh)
> > +               return false;
> > +
> > +       return true;
> >  }
> >
> >  static void watchdog_hardlockup_kick(void)
> > @@ -200,23 +204,10 @@ static void watchdog_hardlockup_kick(void)
> >
> >  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >  {
> > -       bool is_hl;
> >         int hardlockup_all_cpu_backtrace;
> > -       /*
> > -        * Check for a hardlockup by making sure the CPU's timer
> > -        * interrupt is incrementing. The timer interrupt should have
> > -        * fired multiple times before we overflow'd. If it hasn't
> > -        * then this is a good indication the cpu is stuck
> > -        *
> > -        * Purposely check this _before_ checking watchdog_hardlockup_touched
> > -        * so we make sure we still update the saved value of the interrupts.
> > -        * Without that we'll take an extra round through this function before
> > -        * we can detect a lockup.
> > -        */
> > -
> > -       is_hl = is_hardlockup(cpu);
> >
> >         if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> > +               watchdog_hardlockup_update_reset(cpu);
> >                 per_cpu(watchdog_hardlockup_touched, cpu) = false;
> >                 return;
> >         }
> > @@ -224,7 +215,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >         hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
> >                                         1 : sysctl_hardlockup_all_cpu_backtrace;
> >
> > -       if (is_hl) {
> > +       /*
> > +        * Check for a hardlockup by making sure the CPU's timer
> > +        * interrupt is incrementing. The timer interrupt should have
> > +        * fired multiple times before we overflow'd. If it hasn't
> > +        * then this is a good indication the cpu is stuck
> > +        */
> > +       if (is_hardlockup(cpu)) {
> >                 unsigned int this_cpu = smp_processor_id();
> >                 unsigned long flags;
> >
> > @@ -290,6 +287,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >
> >                 per_cpu(watchdog_hardlockup_warned, cpu) = true;
> >         } else {
> > +               watchdog_hardlockup_update_reset(cpu);
> >                 per_cpu(watchdog_hardlockup_warned, cpu) = false;
> >         }
> >  }
> 
> I haven't tested it, but that actually looks like a pretty nice final
> result to me. Mayank: What do you think? You'd have to figure out how
> to rework your two patches to incorporate Petr's ideas.
> 
> Petr: Since you gave your ideas as a diff, what are you thinking in
> terms of tags on Mayank's v2? You didn't provide a Signed-off-by on
> your diff, so I guess you're expecting Mayank not to incorprate it
> directly but take it as a "suggestion" for improving his patches (AKA
> not add any of your tags to his v2).

I expected that Mayank could rework his patchset using ideas from the
diff. Feel free to use the changes as they are and copy&paste them
from my diff. It is just a refactoring.

> One nit: in the final result, it might be nice to invert the
> "is_hardlockup()" test so we can return early and get rid of a level
> of indentation. AKA:
> 
> if (!is_hardlockup(cpu)) {
>   watchdog_hardlockup_update_reset(cpu);
>   per_cpu(watchdog_hardlockup_warned, cpu) = false;
>   return;
> }
> 
> Not only does it reduce indentation, but it also keeps the two calls
> to watchdog_hardlockup_update_reset() closer to each other.

Yeah, that would be great. I actually wanted to do it in my diff
as well. But I did not do it to keep the diff simple.

It might be better to invert the logic as a separate preparation
patch so that we do not hide other changes in the reshuffling.

Best Regards,
Petr

  reply	other threads:[~2026-03-09 13:33 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-12 21:12 [PATCH 0/4] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta
2026-02-12 21:12 ` Mayank Rungta via B4 Relay
2026-02-12 21:12 ` [PATCH 1/4] watchdog/hardlockup: Always update saved interrupts during check Mayank Rungta
2026-02-12 21:12   ` Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-04 14:44   ` Petr Mladek
2026-03-05  0:58     ` Doug Anderson
2026-03-05 11:27       ` Petr Mladek
2026-03-05 16:13         ` Doug Anderson
2026-03-09 13:33           ` Petr Mladek [this message]
2026-03-11  2:51             ` Mayank Rungta
2026-03-11 13:56               ` Petr Mladek
2026-02-12 21:12 ` [PATCH 2/4] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta
2026-02-12 21:12   ` Mayank Rungta via B4 Relay
2026-02-13 16:29   ` Doug Anderson
2026-03-05 12:33   ` Petr Mladek
2026-02-12 21:12 ` [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta
2026-02-12 21:12   ` Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson
2026-03-05 13:46   ` Petr Mladek
2026-03-05 16:45     ` Doug Anderson
2026-03-11 14:07       ` Petr Mladek
2026-03-12 21:02         ` Doug Anderson
2026-02-12 21:12 ` [PATCH 4/4] doc: watchdog: Document buddy detector Mayank Rungta
2026-02-12 21:12   ` Mayank Rungta via B4 Relay
2026-02-13 16:30   ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa7MMCuJcIafi-gM@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=cuiyunhui@bytedance.com \
    --cc=dianders@chromium.org \
    --cc=eranian@google.com \
    --cc=feng.tang@linux.alibaba.com \
    --cc=irogers@google.com \
    --cc=lihuafei1@huawei.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=max.kellermann@ionos.com \
    --cc=mrungta@google.com \
    --cc=wangjinchao600@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.