linux-aspeed.lists.ozlabs.org archive mirror
* [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time
@ 2025-10-15  6:36 lirongqing
  2025-10-16  5:07 ` Lance Yang
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: lirongqing @ 2025-10-15  6:36 UTC
  To: Andrew Morton, Lance Yang, Masami Hiramatsu, linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-aspeed, wireguard, netdev,
	linux-kselftest, Li RongQing, Andrew Jeffery, Anshuman Khandual,
	Arnd Bergmann, David Hildenbrand, Florian Westphal, Jakub Kicinski,
	Jason A . Donenfeld, Joel Granados, Joel Stanley, Jonathan Corbet,
	Kees Cook, Liam Howlett, Lorenzo Stoakes, Paul E . McKenney,
	Pawan Gupta, Petr Mladek, Phil Auld, Randy Dunlap, Russell King,
	Shuah Khan, Simon Horman, Stanislav Fomichev, Steven Rostedt

From: Li RongQing <lirongqing@baidu.com>

Currently, when 'hung_task_panic' is enabled, the kernel panics
immediately upon detecting the first hung task. However, some hung
tasks are transient and allow system recovery, while persistent hangs
should trigger a panic when accumulating beyond a threshold.

Extend the 'hung_task_panic' sysctl to accept a threshold value
specifying the number of hung tasks that must be detected before
triggering a kernel panic. This provides finer control for environments
where transient hangs may occur but persistent hangs should be fatal.

The sysctl now accepts:
- 0: don't panic (maintains original behavior)
- 1: panic on first hung task (maintains original behavior)
- N > 1: panic after N hung tasks are detected in a single scan

This maintains backward compatibility while providing flexibility for
different hang scenarios.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Cc: Andrew Jeffery <andrew@codeconstruct.com.au>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
Changes since v3: comment modifications, as suggested by Lance, Masami, Randy and Petr
Changes since v2: extend hung_task_panic instead of adding a new sysctl, as suggested by Kees Cook

 Documentation/admin-guide/kernel-parameters.txt      | 20 +++++++++++++-------
 Documentation/admin-guide/sysctl/kernel.rst          |  9 +++++----
 arch/arm/configs/aspeed_g5_defconfig                 |  2 +-
 kernel/configs/debug.config                          |  2 +-
 kernel/hung_task.c                                   | 15 ++++++++++-----
 lib/Kconfig.debug                                    |  9 +++++----
 tools/testing/selftests/wireguard/qemu/kernel.config |  2 +-
 7 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a51ab46..492f0bc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1992,14 +1992,20 @@
 			the added memory block itself do not be affected.
 
 	hung_task_panic=
-			[KNL] Should the hung task detector generate panics.
-			Format: 0 | 1
+			[KNL] Number of hung tasks to trigger kernel panic.
+			Format: <int>
+
+			When set to a non-zero value, a kernel panic will be triggered if
+			the number of detected hung tasks reaches this value.
+
+			0: don't panic
+			1: panic immediately on first hung task
+			N: panic after N hung tasks are detected in a single scan
 
-			A value of 1 instructs the kernel to panic when a
-			hung task is detected. The default value is controlled
-			by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
-			option. The value selected by this boot parameter can
-			be changed later by the kernel.hung_task_panic sysctl.
+			The default value is controlled by the
+			CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time option. The value
+			selected by this boot parameter can be changed later by the
+			kernel.hung_task_panic sysctl.
 
 	hvc_iucv=	[S390]	Number of z/VM IUCV hypervisor console (HVC)
 				terminal devices. Valid values: 0..8
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index f3ee807..0065a55 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -397,13 +397,14 @@ a hung task is detected.
 hung_task_panic
 ===============
 
-Controls the kernel's behavior when a hung task is detected.
+When set to a non-zero value, a kernel panic will be triggered if the
+number of hung tasks found during a single scan reaches this value.
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
-= =================================================
+= =======================================================
 0 Continue operation. This is the default behavior.
-1 Panic immediately.
-= =================================================
+N Panic when N hung tasks are found during a single scan.
+= =======================================================
 
 
 hung_task_check_count
diff --git a/arch/arm/configs/aspeed_g5_defconfig b/arch/arm/configs/aspeed_g5_defconfig
index 61cee1e..c3b0d5f 100644
--- a/arch/arm/configs/aspeed_g5_defconfig
+++ b/arch/arm/configs/aspeed_g5_defconfig
@@ -308,7 +308,7 @@ CONFIG_PANIC_ON_OOPS=y
 CONFIG_PANIC_TIMEOUT=-1
 CONFIG_SOFTLOCKUP_DETECTOR=y
 CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
-CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC=1
 CONFIG_WQ_WATCHDOG=y
 # CONFIG_SCHED_DEBUG is not set
 CONFIG_FUNCTION_TRACER=y
diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config
index e81327d..9f6ab7d 100644
--- a/kernel/configs/debug.config
+++ b/kernel/configs/debug.config
@@ -83,7 +83,7 @@ CONFIG_SLUB_DEBUG_ON=y
 #
 # Debug Oops, Lockups and Hangs
 #
-# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC=0
 # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
 CONFIG_DEBUG_ATOMIC_SLEEP=y
 CONFIG_DETECT_HUNG_TASK=y
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index b2c1f14..84b4b04 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -81,7 +81,7 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
  * hung task is detected:
  */
 static unsigned int __read_mostly sysctl_hung_task_panic =
-	IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
+	CONFIG_BOOTPARAM_HUNG_TASK_PANIC;
 
 static int
 hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
@@ -218,8 +218,11 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout)
+static void check_hung_task(struct task_struct *t, unsigned long timeout,
+		unsigned long prev_detect_count)
 {
+	unsigned long total_hung_task;
+
 	if (!task_is_hung(t, timeout))
 		return;
 
@@ -229,9 +232,10 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	 */
 	sysctl_hung_task_detect_count++;
 
+	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
-	if (sysctl_hung_task_panic) {
+	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
 		console_verbose();
 		hung_task_show_lock = true;
 		hung_task_call_panic = true;
@@ -300,6 +304,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
+	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
 
 	/*
 	 * If the system crashed already then all bets are off,
@@ -320,7 +325,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout);
+		check_hung_task(t, timeout, prev_detect_count);
 	}
  unlock:
 	rcu_read_unlock();
@@ -389,7 +394,7 @@ static const struct ctl_table hung_task_sysctls[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE,
+		.extra2		= SYSCTL_INT_MAX,
 	},
 	{
 		.procname	= "hung_task_check_count",
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 3034e294..3976c90 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1258,12 +1258,13 @@ config DEFAULT_HUNG_TASK_TIMEOUT
 	  Keeping the default should be fine in most cases.
 
 config BOOTPARAM_HUNG_TASK_PANIC
-	bool "Panic (Reboot) On Hung Tasks"
+	int "Number of hung tasks to trigger kernel panic"
 	depends on DETECT_HUNG_TASK
+	default 0
 	help
-	  Say Y here to enable the kernel to panic on "hung tasks",
-	  which are bugs that cause the kernel to leave a task stuck
-	  in uninterruptible "D" state.
+	  When set to a non-zero value, a kernel panic will be triggered
+	  if the number of hung tasks found during a single scan reaches
+	  this value.
 
 	  The panic can be used in combination with panic_timeout,
 	  to cause the system to reboot automatically after a
diff --git a/tools/testing/selftests/wireguard/qemu/kernel.config b/tools/testing/selftests/wireguard/qemu/kernel.config
index 936b18b..0504c11 100644
--- a/tools/testing/selftests/wireguard/qemu/kernel.config
+++ b/tools/testing/selftests/wireguard/qemu/kernel.config
@@ -81,7 +81,7 @@ CONFIG_WQ_WATCHDOG=y
 CONFIG_DETECT_HUNG_TASK=y
 CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
 CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
-CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC=1
 CONFIG_PANIC_TIMEOUT=-1
 CONFIG_STACKTRACE=y
 CONFIG_EARLY_PRINTK=y
-- 
2.9.4
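
The per-scan counting in the kernel/hung_task.c hunks above can be modelled in user space. What follows is only a rough sketch, not the kernel code: `detect_count` stands in for sysctl_hung_task_detect_count, and `scan_would_panic()` and its `hung[]` array are hypothetical names invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cumulative counter, standing in for sysctl_hung_task_detect_count. */
static unsigned long detect_count;

/*
 * Hypothetical user-space model of one detector scan.
 * hung[i] is true if task i is considered hung; threshold plays the
 * role of sysctl_hung_task_panic. Returns true if the scan would
 * request a panic.
 */
static bool scan_would_panic(const bool *hung, size_t ntasks,
			     unsigned int threshold)
{
	/* Snapshot taken once per scan, like prev_detect_count. */
	unsigned long prev = detect_count;
	bool call_panic = false;
	size_t i;

	for (i = 0; i < ntasks; i++) {
		if (!hung[i])
			continue;

		/* The cumulative counter grows even when threshold is 0. */
		detect_count++;

		/* Compare only the hung tasks seen in this scan. */
		if (threshold && detect_count - prev >= threshold)
			call_panic = true;
	}
	return call_panic;
}
```

Because the comparison uses the difference from the per-scan snapshot, hung tasks counted in earlier scans do not bring a later scan any closer to the threshold.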




* Re: [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time
  2025-10-15  6:36 [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time lirongqing
@ 2025-10-16  5:07 ` Lance Yang
  2025-10-16  5:57   ` [External Mail] " Li,Rongqing
  2025-10-16  8:02 ` Masami Hiramatsu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Lance Yang @ 2025-10-16  5:07 UTC
  To: lirongqing, Andrew Morton
  Cc: linux-doc, linux-arm-kernel, linux-aspeed, wireguard, netdev,
	linux-kselftest, Masami Hiramatsu, Andrew Jeffery,
	Anshuman Khandual, Arnd Bergmann, David Hildenbrand,
	Florian Westphal, Jakub Kicinski, Jason A . Donenfeld,
	Joel Granados, Joel Stanley, Jonathan Corbet, Kees Cook,
	Liam Howlett, Lorenzo Stoakes, Paul E . McKenney, Pawan Gupta,
	Petr Mladek, Phil Auld, Randy Dunlap, Russell King, Shuah Khan,
	Simon Horman, Stanislav Fomichev, Steven Rostedt, linux-kernel

LGTM. It works as expected, thanks!

On 2025/10/15 14:36, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>

For the commit message, I'd suggest the following for better clarity:

```
The hung_task_panic sysctl is currently a blunt instrument: it's all
or nothing.

Panicking on a single hung task can be an overreaction to a transient
glitch. A more reliable indicator of a systemic problem is when multiple
tasks hang simultaneously.

Extend hung_task_panic to accept an integer threshold, allowing the kernel
to panic only when N hung tasks are detected in a single scan. This
provides finer control to distinguish between isolated incidents and
system-wide failures.

The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic on the first hung task (unchanged)
- N > 1: Panic after N hung tasks are detected in a single scan

The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility.
```

If you agree, likely no need to resend - Andrew could pick it up
directly when applying :)
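
As a side note, the accepted values listed above can be expressed as a small user-space helper. This is a sketch under assumptions: both function names are hypothetical, and the [0, INT_MAX] range only mirrors the SYSCTL_ZERO / SYSCTL_INT_MAX bounds in the patch; it does not reproduce the exact parsing done by proc_dointvec_minmax.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a candidate hung_task_panic value.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * string is not a decimal integer in the range [0, INT_MAX].
 */
static int parse_hung_task_panic(const char *s, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0')
		return -1;
	if (v < 0 || v > INT_MAX)
		return -1;

	*out = (int)v;
	return 0;
}

/* Describe a threshold according to the accepted-values list. */
static const char *describe_hung_task_panic(int n)
{
	if (n == 0)
		return "never panic";
	if (n == 1)
		return "panic on the first hung task";
	return "panic once N hung tasks are seen in a single scan";
}
```

For instance, parse_hung_task_panic("3", &v) succeeds with v set to 3, while "-1" and "abc" are rejected.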

> 
> Currently, when 'hung_task_panic' is enabled, the kernel panics
> immediately upon detecting the first hung task. However, some hung
> tasks are transient and allow system recovery, while persistent hangs
> should trigger a panic when accumulating beyond a threshold.
> 
> Extend the 'hung_task_panic' sysctl to accept a threshold value
> specifying the number of hung tasks that must be detected before
> triggering a kernel panic. This provides finer control for environments
> where transient hangs may occur but persistent hangs should be fatal.
> 
> The sysctl now accepts:
> - 0: don't panic (maintains original behavior)
> - 1: panic on first hung task (maintains original behavior)
> - N > 1: panic after N hung tasks are detected in a single scan
> 
> This maintains backward compatibility while providing flexibility for
> different hang scenarios.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Cc: Andrew Jeffery <andrew@codeconstruct.com.au>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Jason A. Donenfeld <jason@zx2c4.com>
> Cc: Joel Granados <joel.granados@kernel.org>
> Cc: Joel Stanley <joel@jms.id.au>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Liam Howlett <liam.howlett@oracle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
> Cc: "Paul E . McKenney" <paulmck@kernel.org>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Phil Auld <pauld@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Simon Horman <horms@kernel.org>
> Cc: Stanislav Fomichev <sdf@fomichev.me>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---

So:

Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Lance Yang <lance.yang@linux.dev>

Cheers,
Lance



* RE: [External Mail] Re: [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time
  2025-10-16  5:07 ` Lance Yang
@ 2025-10-16  5:57   ` Li,Rongqing
  2025-10-16 20:50     ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Li,Rongqing @ 2025-10-16  5:57 UTC
  To: Lance Yang, Andrew Morton
  Cc: linux-doc@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-aspeed@lists.ozlabs.org, wireguard@lists.zx2c4.com,
	netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Masami Hiramatsu, Andrew Jeffery, Anshuman Khandual,
	Arnd Bergmann, David Hildenbrand, Florian Westphal, Jakub Kicinski,
	Jason A . Donenfeld, Joel Granados, Joel Stanley, Jonathan Corbet,
	Kees Cook, Liam Howlett, Lorenzo Stoakes, Paul E . McKenney,
	Pawan Gupta, Petr Mladek, Phil Auld, Randy Dunlap, Russell King,
	Shuah Khan, Simon Horman, Stanislav Fomichev, Steven Rostedt,
	linux-kernel@vger.kernel.org


> LGTM. It works as expected, thanks!
> 
> On 2025/10/15 14:36, lirongqing wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> 
> For the commit message, I'd suggest the following for better clarity:
> 
> ```
> The hung_task_panic sysctl is currently a blunt instrument: it's all or nothing.
> 
> Panicking on a single hung task can be an overreaction to a transient glitch. A
> more reliable indicator of a systemic problem is when multiple tasks hang
> simultaneously.
> 
> Extend hung_task_panic to accept an integer threshold, allowing the kernel
> to panic only when N hung tasks are detected in a single scan. This provides
> finer control to distinguish between isolated incidents and system-wide
> failures.
> 
> The accepted values are:
> - 0: Don't panic (unchanged)
> - 1: Panic on the first hung task (unchanged)
> - N > 1: Panic after N hung tasks are detected in a single scan
> 
> The original behavior is preserved for values 0 and 1, maintaining full
> backward compatibility.
> ```
> 
> If you agree, likely no need to resend - Andrew could pick it up directly when
> applying :)
> 

This is better.

Andrew, could you pick it up directly?

Thanks

-Li



* Re: [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time
  2025-10-15  6:36 [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time lirongqing
  2025-10-16  5:07 ` Lance Yang
@ 2025-10-16  8:02 ` Masami Hiramatsu
  2025-10-16 12:47 ` Paul Menzel
  2025-10-17  5:17 ` Andrew Jeffery
  3 siblings, 0 replies; 8+ messages in thread
From: Masami Hiramatsu @ 2025-10-16  8:02 UTC
  To: lirongqing
  Cc: Andrew Morton, Lance Yang, linux-kernel, linux-doc,
	linux-arm-kernel, linux-aspeed, wireguard, netdev,
	linux-kselftest, Andrew Jeffery, Anshuman Khandual, Arnd Bergmann,
	David Hildenbrand, Florian Westphal, Jakub Kicinski,
	Jason A . Donenfeld, Joel Granados, Joel Stanley, Jonathan Corbet,
	Kees Cook, Liam Howlett, Lorenzo Stoakes, Paul E . McKenney,
	Pawan Gupta, Petr Mladek, Phil Auld, Randy Dunlap, Russell King,
	Shuah Khan, Simon Horman, Stanislav Fomichev, Steven Rostedt

On Wed, 15 Oct 2025 14:36:15 +0800
lirongqing <lirongqing@baidu.com> wrote:

> From: Li RongQing <lirongqing@baidu.com>
> 
> Currently, when 'hung_task_panic' is enabled, the kernel panics
> immediately upon detecting the first hung task. However, some hung
> tasks are transient and allow system recovery, while persistent hangs
> should trigger a panic when accumulating beyond a threshold.
> 
> Extend the 'hung_task_panic' sysctl to accept a threshold value
> specifying the number of hung tasks that must be detected before
> triggering a kernel panic. This provides finer control for environments
> where transient hangs may occur but persistent hangs should be fatal.
> 
> The sysctl now accepts:
> - 0: don't panic (maintains original behavior)
> - 1: panic on first hung task (maintains original behavior)
> - N > 1: panic after N hung tasks are detected in a single scan
> 
> This maintains backward compatibility while providing flexibility for
> different hang scenarios.

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thank you,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>



* Re: [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time
  2025-10-15  6:36 [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time lirongqing
  2025-10-16  5:07 ` Lance Yang
  2025-10-16  8:02 ` Masami Hiramatsu
@ 2025-10-16 12:47 ` Paul Menzel
  2025-10-17  2:09   ` [External Mail] " Li,Rongqing
  2025-10-17  5:17 ` Andrew Jeffery
  3 siblings, 1 reply; 8+ messages in thread
From: Paul Menzel @ 2025-10-16 12:47 UTC
  To: Li RongQing
  Cc: Andrew Morton, Lance Yang, Masami Hiramatsu, linux-kernel,
	linux-doc, linux-arm-kernel, linux-aspeed, wireguard, netdev,
	linux-kselftest, Andrew Jeffery, Anshuman Khandual, Arnd Bergmann,
	David Hildenbrand, Florian Westphal, Jakub Kicinski,
	Jason A . Donenfeld, Joel Granados, Joel Stanley, Jonathan Corbet,
	Kees Cook, Liam Howlett, Lorenzo Stoakes, Paul E . McKenney,
	Pawan Gupta, Petr Mladek, Phil Auld, Randy Dunlap, Russell King,
	Shuah Khan, Simon Horman, Stanislav Fomichev, Steven Rostedt

Dear RongQing,


Thank you for the patch. One minor comment regarding the Kconfig 
description.

On 15.10.25 at 08:36, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> Currently, when 'hung_task_panic' is enabled, the kernel panics
> immediately upon detecting the first hung task. However, some hung
> tasks are transient and allow system recovery, while persistent hangs
> should trigger a panic when accumulating beyond a threshold.
> 
> Extend the 'hung_task_panic' sysctl to accept a threshold value
> specifying the number of hung tasks that must be detected before
> triggering a kernel panic. This provides finer control for environments
> where transient hangs may occur but persistent hangs should be fatal.
> 
> The sysctl now accepts:
> - 0: don't panic (maintains original behavior)
> - 1: panic on first hung task (maintains original behavior)
> - N > 1: panic after N hung tasks are detected in a single scan
> 
> This maintains backward compatibility while providing flexibility for
> different hang scenarios.
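For readers following along, the semantics listed above can be modelled in a
few lines of userspace C. This is only a sketch under assumptions: the
function and parameter names are hypothetical, it is not the code from
kernel/hung_task.c, and the >= comparison follows the documentation wording
in this patch ("reaches this value").

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified userspace model of the hung_task_panic threshold semantics.
 * NOT the kernel implementation; all names here are hypothetical.
 *
 * panic_threshold: value of the hung_task_panic setting (0 disables panic)
 * hung_in_scan:    number of hung tasks found during one detector scan
 */
static bool hung_task_should_panic(unsigned int panic_threshold,
				   unsigned int hung_in_scan)
{
	/* 0: never panic, regardless of how many tasks are hung. */
	if (panic_threshold == 0)
		return false;

	/* N >= 1: panic once the per-scan count reaches N. */
	return hung_in_scan >= panic_threshold;
}

/* Small self-check covering the three documented cases. */
static void hung_task_model_selfcheck(void)
{
	assert(!hung_task_should_panic(0, 10));	/* 0: don't panic */
	assert(hung_task_should_panic(1, 1));	/* 1: first hung task */
	assert(!hung_task_should_panic(3, 2));	/* N=3: below threshold */
	assert(hung_task_should_panic(3, 3));	/* N=3: threshold reached */
}
```

In this model the count is per scan, so a threshold of 1 behaves like the
old boolean setting, which is why existing configurations keep working.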
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Cc: Andrew Jeffery <andrew@codeconstruct.com.au>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Florian Wesphal <fw@strlen.de>
> Cc: Jakub Kacinski <kuba@kernel.org>
> Cc: Jason A. Donenfeld <jason@zx2c4.com>
> Cc: Joel Granados <joel.granados@kernel.org>
> Cc: Joel Stanley <joel@jms.id.au>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Liam Howlett <liam.howlett@oracle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
> Cc: "Paul E . McKenney" <paulmck@kernel.org>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Phil Auld <pauld@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Simon Horman <horms@kernel.org>
> Cc: Stanislav Fomichev <sdf@fomichev.me>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
> diff with v3: comments modification, suggested by Lance, Masami, Randy and Petr
> diff with v2: do not add a new sysctl, extend hung_task_panic, suggested by Kees Cook
> 
>   Documentation/admin-guide/kernel-parameters.txt      | 20 +++++++++++++-------
>   Documentation/admin-guide/sysctl/kernel.rst          |  9 +++++----
>   arch/arm/configs/aspeed_g5_defconfig                 |  2 +-
>   kernel/configs/debug.config                          |  2 +-
>   kernel/hung_task.c                                   | 15 ++++++++++-----
>   lib/Kconfig.debug                                    |  9 +++++----
>   tools/testing/selftests/wireguard/qemu/kernel.config |  2 +-
>   7 files changed, 36 insertions(+), 23 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a51ab46..492f0bc 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1992,14 +1992,20 @@
>   			the added memory block itself do not be affected.
>   
>   	hung_task_panic=
> -			[KNL] Should the hung task detector generate panics.
> -			Format: 0 | 1
> +			[KNL] Number of hung tasks to trigger kernel panic.
> +			Format: <int>
> +
> +			When set to a non-zero value, a kernel panic will be triggered if
> +			the number of detected hung tasks reaches this value.
> +
> +			0: don't panic
> +			1: panic immediately on first hung task
> +			N: panic after N hung tasks are detected in a single scan
>   
> -			A value of 1 instructs the kernel to panic when a
> -			hung task is detected. The default value is controlled
> -			by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
> -			option. The value selected by this boot parameter can
> -			be changed later by the kernel.hung_task_panic sysctl.
> +			The default value is controlled by the
> +			CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time option. The value
> +			selected by this boot parameter can be changed later by the
> +			kernel.hung_task_panic sysctl.
>   
>   	hvc_iucv=	[S390]	Number of z/VM IUCV hypervisor console (HVC)
>   				terminal devices. Valid values: 0..8
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index f3ee807..0065a55 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -397,13 +397,14 @@ a hung task is detected.
>   hung_task_panic
>   ===============
>   
> -Controls the kernel's behavior when a hung task is detected.
> +When set to a non-zero value, a kernel panic will be triggered if the
> +number of hung tasks found during a single scan reaches this value.
>   This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>   
> -= =================================================
> += =======================================================
>   0 Continue operation. This is the default behavior.
> -1 Panic immediately.
> -= =================================================
> +N Panic when N hung tasks are found during a single scan.
> += =======================================================
>   
>   
>   hung_task_check_count

[…]

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 3034e294..3976c90 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1258,12 +1258,13 @@ config DEFAULT_HUNG_TASK_TIMEOUT
>   	  Keeping the default should be fine in most cases.
>   
>   config BOOTPARAM_HUNG_TASK_PANIC
> -	bool "Panic (Reboot) On Hung Tasks"
> +	int "Number of hung tasks to trigger kernel panic"
>   	depends on DETECT_HUNG_TASK
> +	default 0
>   	help
> -	  Say Y here to enable the kernel to panic on "hung tasks",
> -	  which are bugs that cause the kernel to leave a task stuck
> -	  in uninterruptible "D" state.
> +	  When set to a non-zero value, a kernel panic will be triggered
> +	  if the number of hung tasks found during a single scan reaches
> +	  this value.
>   
>   	  The panic can be used in combination with panic_timeout,
>   	  to cause the system to reboot automatically after a
Why not leave the sentence about the uninterruptible "D" state in there?

Also, it sounds like some people are actually using this in production.
Maybe it should be moved out of `Kconfig.debug` too?


Kind regards,

Paul



* Re: [External Mail] Re: [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time
  2025-10-16  5:57   ` [External Mail] " Li,Rongqing
@ 2025-10-16 20:50     ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2025-10-16 20:50 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Lance Yang, linux-doc@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-aspeed@lists.ozlabs.org, wireguard@lists.zx2c4.com,
	netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Masami Hiramatsu, Andrew Jeffery, Anshuman Khandual,
	Arnd Bergmann, David Hildenbrand, Florian Wesphal, Jakub Kacinski,
	Jason A . Donenfeld, Joel Granados, Joel Stanley, Jonathan Corbet,
	Kees Cook, Liam Howlett, Lorenzo Stoakes, Paul E . McKenney,
	Pawan Gupta, Petr Mladek, Phil Auld, Randy Dunlap, Russell King,
	Shuah Khan, Simon Horman, Stanislav Fomichev, Steven Rostedt,
	linux-kernel@vger.kernel.org

On Thu, 16 Oct 2025 05:57:34 +0000 "Li,Rongqing" <lirongqing@baidu.com> wrote:

> > If you agree, likely no need to resend - Andrew could pick it up directly when
> > applying :)
> > 
> 
> This is better;
> 
> Andrew, could you pick it up directly

No problems, thanks.



* RE: [External Mail] Re: [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time
  2025-10-16 12:47 ` Paul Menzel
@ 2025-10-17  2:09   ` Li,Rongqing
  0 siblings, 0 replies; 8+ messages in thread
From: Li,Rongqing @ 2025-10-17  2:09 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Andrew Morton, Lance Yang, Masami Hiramatsu,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-aspeed@lists.ozlabs.org, wireguard@lists.zx2c4.com,
	netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Andrew Jeffery, Anshuman Khandual, Arnd Bergmann,
	David Hildenbrand, Florian Wesphal, Jakub Kacinski,
	Jason A . Donenfeld, Joel Granados, Joel Stanley, Jonathan Corbet,
	Kees Cook, Liam Howlett, Lorenzo Stoakes, Paul E . McKenney,
	Pawan Gupta, Petr Mladek, Phil Auld, Randy Dunlap, Russell King,
	Shuah Khan, Simon Horman, Stanislav Fomichev, Steven Rostedt


> 
> On 15.10.25 at 08:36, lirongqing wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> >
> > Currently, when 'hung_task_panic' is enabled, the kernel panics
> > immediately upon detecting the first hung task. However, some hung
> > tasks are transient and allow system recovery, while persistent hangs
> > should trigger a panic when accumulating beyond a threshold.
> >
> > Extend the 'hung_task_panic' sysctl to accept a threshold value
> > specifying the number of hung tasks that must be detected before
> > triggering a kernel panic. This provides finer control for environments
> > where transient hangs may occur but persistent hangs should be fatal.
> >
> > The sysctl now accepts:
> > - 0: don't panic (maintains original behavior)
> > - 1: panic on first hung task (maintains original behavior)
> > - N > 1: panic after N hung tasks are detected in a single scan
> >
> > This maintains backward compatibility while providing flexibility for
> > different hang scenarios.
> >
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > Cc: Andrew Jeffery <andrew@codeconstruct.com.au>
> > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Florian Wesphal <fw@strlen.de>
> > Cc: Jakub Kacinski <kuba@kernel.org>
> > Cc: Jason A. Donenfeld <jason@zx2c4.com>
> > Cc: Joel Granados <joel.granados@kernel.org>
> > Cc: Joel Stanley <joel@jms.id.au>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Kees Cook <kees@kernel.org>
> > Cc: Lance Yang <lance.yang@linux.dev>
> > Cc: Liam Howlett <liam.howlett@oracle.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
> > Cc: "Paul E . McKenney" <paulmck@kernel.org>
> > Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > Cc: Petr Mladek <pmladek@suse.com>
> > Cc: Phil Auld <pauld@redhat.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Russell King <linux@armlinux.org.uk>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Simon Horman <horms@kernel.org>
> > Cc: Stanislav Fomichev <sdf@fomichev.me>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > ---
> > diff with v3: comments modification, suggested by Lance, Masami, Randy and Petr
> > diff with v2: do not add a new sysctl, extend hung_task_panic, suggested by Kees Cook
> >
> >   Documentation/admin-guide/kernel-parameters.txt      | 20 +++++++++++++-------
> >   Documentation/admin-guide/sysctl/kernel.rst          |  9 +++++----
> >   arch/arm/configs/aspeed_g5_defconfig                 |  2 +-
> >   kernel/configs/debug.config                          |  2 +-
> >   kernel/hung_task.c                                   | 15 ++++++++++-----
> >   lib/Kconfig.debug                                    |  9 +++++----
> >   tools/testing/selftests/wireguard/qemu/kernel.config |  2 +-
> >   7 files changed, 36 insertions(+), 23 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index a51ab46..492f0bc 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1992,14 +1992,20 @@
> >   			the added memory block itself do not be affected.
> >
> >   	hung_task_panic=
> > -			[KNL] Should the hung task detector generate panics.
> > -			Format: 0 | 1
> > +			[KNL] Number of hung tasks to trigger kernel panic.
> > +			Format: <int>
> > +
> > +			When set to a non-zero value, a kernel panic will be triggered if
> > +			the number of detected hung tasks reaches this value.
> > +
> > +			0: don't panic
> > +			1: panic immediately on first hung task
> > +			N: panic after N hung tasks are detected in a single scan
> >
> > -			A value of 1 instructs the kernel to panic when a
> > -			hung task is detected. The default value is controlled
> > -			by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
> > -			option. The value selected by this boot parameter can
> > -			be changed later by the kernel.hung_task_panic sysctl.
> > +			The default value is controlled by the
> > +			CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time option. The value
> > +			selected by this boot parameter can be changed later by the
> > +			kernel.hung_task_panic sysctl.
> >
> >   	hvc_iucv=	[S390]	Number of z/VM IUCV hypervisor console (HVC)
> >   				terminal devices. Valid values: 0..8
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index f3ee807..0065a55 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -397,13 +397,14 @@ a hung task is detected.
> >   hung_task_panic
> >   ===============
> >
> > -Controls the kernel's behavior when a hung task is detected.
> > +When set to a non-zero value, a kernel panic will be triggered if the
> > +number of hung tasks found during a single scan reaches this value.
> >   This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
> >
> > -= =================================================
> > += =======================================================
> >   0 Continue operation. This is the default behavior.
> > -1 Panic immediately.
> > -= =================================================
> > +N Panic when N hung tasks are found during a single scan.
> > += =======================================================
> >
> >
> >   hung_task_check_count
> 
> […]
> 
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 3034e294..3976c90 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1258,12 +1258,13 @@ config DEFAULT_HUNG_TASK_TIMEOUT
> >   	  Keeping the default should be fine in most cases.
> >
> >   config BOOTPARAM_HUNG_TASK_PANIC
> > -	bool "Panic (Reboot) On Hung Tasks"
> > +	int "Number of hung tasks to trigger kernel panic"
> >   	depends on DETECT_HUNG_TASK
> > +	default 0
> >   	help
> > -	  Say Y here to enable the kernel to panic on "hung tasks",
> > -	  which are bugs that cause the kernel to leave a task stuck
> > -	  in uninterruptible "D" state.
> > +	  When set to a non-zero value, a kernel panic will be triggered
> > +	  if the number of hung tasks found during a single scan reaches
> > +	  this value.
> >
> >   	  The panic can be used in combination with panic_timeout,
> >   	  to cause the system to reboot automatically after a
> Why not leave the sentence about the uninterruptible "D" state in there?
> 
That sentence implies that a hung task is always caused by a kernel bug, but
it may also be caused by a hardware failure (or a virtio backend bug), so I
did not keep it.

> Also, it sounds like, some are actually using this in production. Maybe it
> should be moved out of `Kconfig.debug` too?
> 

I think hung task panic is a useful feature; it should be moved out of Kconfig.debug.

Thanks

-Li

> 
> Kind regards,
> 
> Paul


* Re: [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time
  2025-10-15  6:36 [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time lirongqing
                   ` (2 preceding siblings ...)
  2025-10-16 12:47 ` Paul Menzel
@ 2025-10-17  5:17 ` Andrew Jeffery
  3 siblings, 0 replies; 8+ messages in thread
From: Andrew Jeffery @ 2025-10-17  5:17 UTC (permalink / raw)
  To: lirongqing, Andrew Morton, Lance Yang, Masami Hiramatsu,
	linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-aspeed, wireguard, netdev,
	linux-kselftest, Anshuman Khandual, Arnd Bergmann,
	David Hildenbrand, Florian Wesphal, Jakub Kacinski,
	Jason A . Donenfeld, Joel Granados, Joel Stanley, Jonathan Corbet,
	Kees Cook, Liam Howlett, Lorenzo Stoakes, Paul E . McKenney,
	Pawan Gupta, Petr Mladek, Phil Auld, Randy Dunlap, Russell King,
	Shuah Khan, Simon Horman, Stanislav Fomichev, Steven Rostedt

On Wed, 2025-10-15 at 14:36 +0800, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> Currently, when 'hung_task_panic' is enabled, the kernel panics
> immediately upon detecting the first hung task. However, some hung
> tasks are transient and allow system recovery, while persistent hangs
> should trigger a panic when accumulating beyond a threshold.
> 
> Extend the 'hung_task_panic' sysctl to accept a threshold value
> specifying the number of hung tasks that must be detected before
> triggering a kernel panic. This provides finer control for environments
> where transient hangs may occur but persistent hangs should be fatal.
> 
> The sysctl now accepts:
> - 0: don't panic (maintains original behavior)
> - 1: panic on first hung task (maintains original behavior)
> - N > 1: panic after N hung tasks are detected in a single scan
> 
> This maintains backward compatibility while providing flexibility for
> different hang scenarios.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Cc: Andrew Jeffery <andrew@codeconstruct.com.au>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Florian Wesphal <fw@strlen.de>
> Cc: Jakub Kacinski <kuba@kernel.org>
> Cc: Jason A. Donenfeld <jason@zx2c4.com>
> Cc: Joel Granados <joel.granados@kernel.org>
> Cc: Joel Stanley <joel@jms.id.au>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Liam Howlett <liam.howlett@oracle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
> Cc: "Paul E . McKenney" <paulmck@kernel.org>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Phil Auld <pauld@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Simon Horman <horms@kernel.org>
> Cc: Stanislav Fomichev <sdf@fomichev.me>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
> diff with v3: comments modification, suggested by Lance, Masami, Randy and Petr
> diff with v2: do not add a new sysctl, extend hung_task_panic, suggested by Kees Cook
> 
>  Documentation/admin-guide/kernel-parameters.txt      | 20 +++++++++++++-------
>  Documentation/admin-guide/sysctl/kernel.rst          |  9 +++++----
>  arch/arm/configs/aspeed_g5_defconfig                 |  2 +-

For the aspeed_g5_defconfig change:

Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au>



end of thread, other threads:[~2025-10-18  0:42 UTC | newest]

Thread overview: 8+ messages
2025-10-15  6:36 [PATCH][v4] hung_task: Panic when there are more than N hung tasks at the same time lirongqing
2025-10-16  5:07 ` Lance Yang
2025-10-16  5:57   ` [External Mail] " Li,Rongqing
2025-10-16 20:50     ` Andrew Morton
2025-10-16  8:02 ` Masami Hiramatsu
2025-10-16 12:47 ` Paul Menzel
2025-10-17  2:09   ` [External Mail] " Li,Rongqing
2025-10-17  5:17 ` Andrew Jeffery
