public inbox for linux-doc@vger.kernel.org
From: Petr Mladek <pmladek@suse.com>
To: mrungta@google.com
Cc: Jonathan Corbet <corbet@lwn.net>,
	Jinchao Wang <wangjinchao600@gmail.com>,
	Yunhui Cui <cuiyunhui@bytedance.com>,
	Stephane Eranian <eranian@google.com>,
	Ian Rogers <irogers@google.com>, Li Huafei <lihuafei1@huawei.com>,
	Feng Tang <feng.tang@linux.alibaba.com>,
	Max Kellermann <max.kellermann@ionos.com>,
	Douglas Anderson <dianders@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness
Date: Thu, 5 Mar 2026 14:46:56 +0100
Message-ID: <aamJUImqf4WfTu3d@pathway.suse.cz>
In-Reply-To: <20260212-hardlockup-watchdog-fixes-v1-3-745f1dce04c3@google.com>

On Thu 2026-02-12 14:12:12, Mayank Rungta via B4 Relay wrote:
> From: Mayank Rungta <mrungta@google.com>
> 
> Currently, the buddy system only performs checks every 3rd sample. With
> a 4-second interval, if a check window is missed, the next check occurs
> 12 seconds later, potentially delaying hard lockup detection for up to
> 24 seconds.
> 
> Modify the buddy system to perform checks at every interval (4s).
> Introduce a missed-interrupt threshold to maintain the existing grace
> period while reducing the detection window to 8-12 seconds.
> 
> Best and worst case detection scenarios:
> 
> Before (12s check window):
> - Best case: Lockup occurs after first check but just before heartbeat
>   interval. Detected in ~8s (8s till next check).
> - Worst case: Lockup occurs just after a check.
>   Detected in ~24s (missed check + 12s till next check + 12s logic).
> 
> After (4s check window with threshold of 3):
> - Best case: Lockup occurs just before a check.
>   Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
> - Worst case: Lockup occurs just after a check.
>   Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).

One might argue that the interval <8s,24s> is not much worse than
<6s,20s> achieved by the perf detector.

But I personally like that the spread of <8s,12s> is smaller, so
the result is more predictable. And it is relatively cheap.

People might have a different opinion. But I am fine with this change.
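
For reference, the <8s,12s> window can be reproduced with a small
userspace model. This is only a sketch, not kernel code. It follows the
simplified arithmetic in the patch description: checks happen every
interval, and every check after the lockup is assumed to count as a miss.
The helper names detect_latency_ms() and check_detect_latency() are
made up for this illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper, not kernel code: time in ms from the lockup until
 * the thresh-th buddy check after it. Assumes every check after the
 * lockup counts as a miss, as in the patch description.
 * since_last_check_ms must be in the range [0, interval_ms].
 */
unsigned int detect_latency_ms(unsigned int interval_ms, unsigned int thresh,
			       unsigned int since_last_check_ms)
{
	/* Time remaining until the first check after the lockup. */
	unsigned int first_check = interval_ms - since_last_check_ms;

	/* The remaining (thresh - 1) checks are one interval apart. */
	return first_check + (thresh - 1) * interval_ms;
}

/* Reproduces the ~8s and ~12s figures for a 4s interval, threshold 3. */
void check_detect_latency(void)
{
	/* Lockup just before a check: 0s + 4s + 4s. */
	assert(detect_latency_ms(4000, 3, 4000) == 8000);
	/* Lockup just after a check: 4s + 4s + 4s. */
	assert(detect_latency_ms(4000, 3, 0) == 12000);
}
```

Any lockup time between two checks lands between these two bounds in
this model.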

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
>  {
>  	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>  
> -	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> -		return true;
> +	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> +		per_cpu(hrtimer_interrupts_missed, cpu)++;
> +		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)

This would return true for every check when missed >= 3.
As a result, the hardlockup would be reported every 4s.

I would keep the 12s cadence and change this to:

		if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh == 0)
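
To illustrate the difference in reporting cadence, here is a minimal
userspace sketch, not kernel code. MISS_THRESH stands in for
watchdog_hardlockup_miss_thresh and is assumed to be 3. The function
names are made up for this example. It counts how many checks would
return true while the missed counter keeps growing during a long lockup.

```c
#include <assert.h>
#include <stdbool.h>

#define MISS_THRESH 3	/* assumed value of watchdog_hardlockup_miss_thresh */

/* Condition as posted: true on every check once the threshold is reached. */
bool report_ge(int missed)
{
	return missed >= MISS_THRESH;
}

/* Suggested condition: true only on every MISS_THRESH-th miss. */
bool report_mod(int missed)
{
	return missed % MISS_THRESH == 0;
}

/* Count how many of 'checks' consecutive missed checks would report. */
int count_reports(bool (*cond)(int), int checks)
{
	int missed = 0, reports = 0;

	for (int i = 0; i < checks; i++) {
		missed++;		/* mirrors hrtimer_interrupts_missed++ */
		if (cond(missed))
			reports++;
	}
	return reports;
}

/* 9 checks at 4s each model a 36s lockup. */
void check_cadence(void)
{
	assert(count_reports(report_ge, 9) == 7);	/* every 4s from 12s on */
	assert(count_reports(report_mod, 9) == 3);	/* at 12s, 24s, 36s */
}
```

With the modulo variant, the first report still comes after three
misses, and later reports keep the 12s spacing.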

> +			return true;
> +
> +		return false;
> +	}
>  
>  	/*
>  	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> --- a/kernel/watchdog_buddy.c
> +++ b/kernel/watchdog_buddy.c
> @@ -86,14 +87,6 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
>  {
>  	unsigned int next_cpu;
>  
> -	/*
> -	 * Test for hardlockups every 3 samples. The sample period is
> -	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> -	 *  watchdog_thresh (over by 20%).
> -	 */
> -	if (hrtimer_interrupts % 3 != 0)
> -		return;

It would be symmetric with the "% 3" above.

> -
>  	/* check for a hardlockup on the next CPU */
>  	next_cpu = watchdog_next_cpu(smp_processor_id());
>  	if (next_cpu >= nr_cpu_ids)

Best Regards,
Petr

