From: Lance Yang <lance.yang@linux.dev>
To: lirongqing <lirongqing@baidu.com>
Cc: Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <chleroy@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-aspeed@lists.ozlabs.org, linux-openrisc@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, dri-devel@lists.freedesktop.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
wireguard@lists.zx2c4.com, netdev@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] watchdog: softlockup: panic when lockup duration exceeds N thresholds
Date: Wed, 17 Dec 2025 14:27:22 +0800 [thread overview]
Message-ID: <e216e4ae-b882-454d-be8f-24f21a3549d9@linux.dev> (raw)
In-Reply-To: <20251216074521.2796-1-lirongqing@baidu.com>
On 2025/12/16 15:45, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
>
> The softlockup_panic sysctl is currently a binary option: panic immediately
> or never panic on soft lockups.
>
> Panicking on any soft lockup, regardless of duration, can be overly
> aggressive for brief stalls that may be caused by legitimate operations.
> Conversely, never panicking may allow severe system hangs to persist
> undetected.
>
> Extend softlockup_panic to accept an integer threshold, allowing the kernel
> to panic only when the normalized lockup duration exceeds N watchdog
> threshold periods. This provides finer-grained control to distinguish
> between transient delays and persistent system failures.
>
> The accepted values are:
> - 0: Don't panic (unchanged)
> - 1: Panic when duration >= 1 * threshold (20s default, original behavior)
> - N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.)
>
> The original behavior is preserved for values 0 and 1, maintaining full
> backward compatibility while allowing systems to tolerate brief lockups
> while still catching severe, persistent hangs.
Thanks! Just a couple of minor things below ;)
>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 10 +++++-----
> arch/arm/configs/aspeed_g5_defconfig | 2 +-
> arch/arm/configs/pxa3xx_defconfig | 2 +-
> arch/openrisc/configs/or1klitex_defconfig | 2 +-
> arch/powerpc/configs/skiroot_defconfig | 2 +-
> drivers/gpu/drm/ci/arm.config | 2 +-
> drivers/gpu/drm/ci/arm64.config | 2 +-
> drivers/gpu/drm/ci/x86_64.config | 2 +-
> kernel/watchdog.c | 8 +++++---
> lib/Kconfig.debug | 13 +++++++------
> tools/testing/selftests/bpf/config | 2 +-
> tools/testing/selftests/wireguard/qemu/kernel.config | 2 +-
> 12 files changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a8d0afd..27c5f96 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6934,12 +6934,12 @@ Kernel parameters
>
> softlockup_panic=
> [KNL] Should the soft-lockup detector generate panics.
> - Format: 0 | 1
> + Format: <int>
>
> - A value of 1 instructs the soft-lockup detector
> - to panic the machine when a soft-lockup occurs. It is
> - also controlled by the kernel.softlockup_panic sysctl
> - and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
> + A value of non-zero instructs the soft-lockup detector
> + to panic the machine when a soft-lockup duration exceeds
> + N thresholds. It is also controlled by the kernel.softlockup_panic
> + sysctl and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
> respective build-time switch to that functionality.
Seems like kernel/configs/debug.config still has the old format
"# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set" ...
Should be updated to "CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=0", right?
>
> softlockup_all_cpu_backtrace=
> diff --git a/arch/arm/configs/aspeed_g5_defconfig b/arch/arm/configs/aspeed_g5_defconfig
> index 2e6ea13..ec558e5 100644
> --- a/arch/arm/configs/aspeed_g5_defconfig
> +++ b/arch/arm/configs/aspeed_g5_defconfig
> @@ -306,7 +306,7 @@ CONFIG_SCHED_STACK_END_CHECK=y
> CONFIG_PANIC_ON_OOPS=y
> CONFIG_PANIC_TIMEOUT=-1
> CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
> CONFIG_BOOTPARAM_HUNG_TASK_PANIC=1
> CONFIG_WQ_WATCHDOG=y
> # CONFIG_SCHED_DEBUG is not set
> diff --git a/arch/arm/configs/pxa3xx_defconfig b/arch/arm/configs/pxa3xx_defconfig
> index 07d422f..fb272e3 100644
> --- a/arch/arm/configs/pxa3xx_defconfig
> +++ b/arch/arm/configs/pxa3xx_defconfig
> @@ -100,7 +100,7 @@ CONFIG_PRINTK_TIME=y
> CONFIG_DEBUG_KERNEL=y
> CONFIG_MAGIC_SYSRQ=y
> CONFIG_DEBUG_SHIRQ=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
> # CONFIG_SCHED_DEBUG is not set
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
> diff --git a/arch/openrisc/configs/or1klitex_defconfig b/arch/openrisc/configs/or1klitex_defconfig
> index fb1eb9a..984b0e3 100644
> --- a/arch/openrisc/configs/or1klitex_defconfig
> +++ b/arch/openrisc/configs/or1klitex_defconfig
> @@ -52,5 +52,5 @@ CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,bpf"
> CONFIG_PRINTK_TIME=y
> CONFIG_PANIC_ON_OOPS=y
> CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
> CONFIG_BUG_ON_DATA_CORRUPTION=y
> diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
> index 2b71a6d..a4114fc 100644
> --- a/arch/powerpc/configs/skiroot_defconfig
> +++ b/arch/powerpc/configs/skiroot_defconfig
> @@ -289,7 +289,7 @@ CONFIG_SCHED_STACK_END_CHECK=y
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_PANIC_ON_OOPS=y
> CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
> CONFIG_HARDLOCKUP_DETECTOR=y
> CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
> CONFIG_WQ_WATCHDOG=y
> diff --git a/drivers/gpu/drm/ci/arm.config b/drivers/gpu/drm/ci/arm.config
> index 411e814..d7c5167 100644
> --- a/drivers/gpu/drm/ci/arm.config
> +++ b/drivers/gpu/drm/ci/arm.config
> @@ -52,7 +52,7 @@ CONFIG_TMPFS=y
> CONFIG_PROVE_LOCKING=n
> CONFIG_DEBUG_LOCKDEP=n
> CONFIG_SOFTLOCKUP_DETECTOR=n
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=n
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=0
>
> CONFIG_FW_LOADER_COMPRESS=y
>
> diff --git a/drivers/gpu/drm/ci/arm64.config b/drivers/gpu/drm/ci/arm64.config
> index fddfbd4..ea0e307 100644
> --- a/drivers/gpu/drm/ci/arm64.config
> +++ b/drivers/gpu/drm/ci/arm64.config
> @@ -161,7 +161,7 @@ CONFIG_TMPFS=y
> CONFIG_PROVE_LOCKING=n
> CONFIG_DEBUG_LOCKDEP=n
> CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>
> CONFIG_DETECT_HUNG_TASK=y
>
> diff --git a/drivers/gpu/drm/ci/x86_64.config b/drivers/gpu/drm/ci/x86_64.config
> index 8eaba388..7ac98a7 100644
> --- a/drivers/gpu/drm/ci/x86_64.config
> +++ b/drivers/gpu/drm/ci/x86_64.config
> @@ -47,7 +47,7 @@ CONFIG_TMPFS=y
> CONFIG_PROVE_LOCKING=n
> CONFIG_DEBUG_LOCKDEP=n
> CONFIG_SOFTLOCKUP_DETECTOR=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
>
> CONFIG_DETECT_HUNG_TASK=y
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 0685e3a..a5fa116 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -363,7 +363,7 @@ static struct cpumask watchdog_allowed_mask __read_mostly;
>
> /* Global variables, exported for sysctl */
> unsigned int __read_mostly softlockup_panic =
> - IS_ENABLED(CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC);
> + CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC;
>
> static bool softlockup_initialized __read_mostly;
> static u64 __read_mostly sample_period;
> @@ -879,7 +879,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>
> add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
> - if (softlockup_panic)
> + duration = duration / get_softlockup_thresh();
Nit: reusing "duration" here makes things a bit confusing, maybe just
use a temp variable?
thresh_count = duration / get_softlockup_thresh();
if (softlockup_panic && thresh_count >= softlockup_panic)
panic("softlockup: hung tasks");
Cheers,
Lance
> +
> + if (softlockup_panic && duration >= softlockup_panic)
> panic("softlockup: hung tasks");
> }
>
> @@ -1228,7 +1230,7 @@ static const struct ctl_table watchdog_sysctls[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec_minmax,
> .extra1 = SYSCTL_ZERO,
> - .extra2 = SYSCTL_ONE,
> + .extra2 = SYSCTL_INT_MAX,
> },
> {
> .procname = "softlockup_sys_info",
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index ba36939..17a7a77 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1110,13 +1110,14 @@ config SOFTLOCKUP_DETECTOR_INTR_STORM
> the CPU stats and the interrupt counts during the "soft lockups".
>
> config BOOTPARAM_SOFTLOCKUP_PANIC
> - bool "Panic (Reboot) On Soft Lockups"
> + int "Panic (Reboot) On Soft Lockups"
> depends on SOFTLOCKUP_DETECTOR
> + default 0
> help
> - Say Y here to enable the kernel to panic on "soft lockups",
> - which are bugs that cause the kernel to loop in kernel
> - mode for more than 20 seconds (configurable using the watchdog_thresh
> - sysctl), without giving other tasks a chance to run.
> + Set to a non-zero value N to enable the kernel to panic on "soft
> + lockups", which are bugs that cause the kernel to loop in kernel
> + mode for more than (N * 20 seconds) (configurable using the
> + watchdog_thresh sysctl), without giving other tasks a chance to run.
>
> The panic can be used in combination with panic_timeout,
> to cause the system to reboot automatically after a
> @@ -1124,7 +1125,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
> high-availability systems that have uptime guarantees and
> where a lockup must be resolved ASAP.
>
> - Say N if unsure.
> + Say 0 if unsure.
>
> config HAVE_HARDLOCKUP_DETECTOR_BUDDY
> bool
> diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> index 558839e..2485538 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -1,6 +1,6 @@
> CONFIG_BLK_DEV_LOOP=y
> CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
> CONFIG_BPF=y
> CONFIG_BPF_EVENTS=y
> CONFIG_BPF_JIT=y
> diff --git a/tools/testing/selftests/wireguard/qemu/kernel.config b/tools/testing/selftests/wireguard/qemu/kernel.config
> index 0504c11..bb89d2d 100644
> --- a/tools/testing/selftests/wireguard/qemu/kernel.config
> +++ b/tools/testing/selftests/wireguard/qemu/kernel.config
> @@ -80,7 +80,7 @@ CONFIG_HARDLOCKUP_DETECTOR=y
> CONFIG_WQ_WATCHDOG=y
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
> -CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
> +CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=1
> CONFIG_BOOTPARAM_HUNG_TASK_PANIC=1
> CONFIG_PANIC_TIMEOUT=-1
> CONFIG_STACKTRACE=y
next prev parent reply other threads:[~2025-12-17 6:38 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-16 7:45 [PATCH] watchdog: softlockup: panic when lockup duration exceeds N thresholds lirongqing
2025-12-17 6:27 ` Lance Yang [this message]
2025-12-17 7:43 ` 答复: [外部邮件] " Li,Rongqing
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e216e4ae-b882-454d-be8f-24f21a3549d9@linux.dev \
--to=lance.yang@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=bpf@vger.kernel.org \
--cc=chleroy@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=eddyz87@gmail.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-aspeed@lists.ozlabs.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-openrisc@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lirongqing@baidu.com \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=npiggin@gmail.com \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=wireguard@lists.zx2c4.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).