[PATCH][v2] hung_task: Panic after fixed number of hung tasks

linux-doc.vger.kernel.org archive mirror

* [PATCH][v2] hung_task: Panic after fixed number of hung tasks
@ 2025-09-28  5:31 lirongqing
  2025-09-28  6:55 ` Lance Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: lirongqing @ 2025-09-28  5:31 UTC
  To: corbet, akpm, lance.yang, mhiramat, paulmck, pawan.kumar.gupta,
	mingo, dave.hansen, rostedt, kees, arnd, lirongqing, feng.tang,
	pauld, joel.granados, linux-doc, linux-kernel

From: Li RongQing <lirongqing@baidu.com>

Currently, when hung_task_panic is enabled, the kernel panics immediately
upon detecting the first hung task. However, some hung tasks are transient
and the system can recover fully, while others are unrecoverable and
trigger consecutive hung task reports, in which case a panic is expected.

This commit adds a new sysctl parameter, hung_task_count_to_panic, to allow
specifying the number of consecutive hung tasks that must be detected
before triggering a kernel panic. This provides finer control for
environments where transient hangs may happen but persistent hangs should
still be fatal.

Acked-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
Diff with v1: change documentation as Lance suggested

 Documentation/admin-guide/sysctl/kernel.rst |  8 ++++++++
 kernel/hung_task.c                          | 14 +++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 8b49eab..98b47a7 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -405,6 +405,14 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 1 Panic immediately.
 = =================================================
 
+hung_task_count_to_panic
+========================
+
+When set to a non-zero value, a kernel panic will be triggered if the
+number of detected hung tasks reaches this value.
+
+Note that setting hung_task_panic=1 will still cause an immediate panic
+on the first hung task.
 
 hung_task_check_count
 =====================
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 8708a12..87a6421 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -83,6 +83,8 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
 static unsigned int __read_mostly sysctl_hung_task_panic =
 	IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
 
+static unsigned int __read_mostly sysctl_hung_task_count_to_panic;
+
 static int
 hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
 {
@@ -219,7 +221,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 
 	trace_sched_process_hang(t);
 
-	if (sysctl_hung_task_panic) {
+	if (sysctl_hung_task_panic ||
+	    (sysctl_hung_task_count_to_panic &&
+	     (sysctl_hung_task_detect_count >= sysctl_hung_task_count_to_panic))) {
 		console_verbose();
 		hung_task_show_lock = true;
 		hung_task_call_panic = true;
@@ -388,6 +392,14 @@ static const struct ctl_table hung_task_sysctls[] = {
 		.extra2		= SYSCTL_ONE,
 	},
 	{
+		.procname	= "hung_task_count_to_panic",
+		.data		= &sysctl_hung_task_count_to_panic,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+	},
+	{
 		.procname	= "hung_task_check_count",
 		.data		= &sysctl_hung_task_check_count,
 		.maxlen		= sizeof(int),
-- 
2.9.4
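
The condition added to check_hung_task() can be modelled in userspace
as a rough sketch. The function name below is hypothetical; its three
parameters stand in for sysctl_hung_task_panic,
sysctl_hung_task_count_to_panic and sysctl_hung_task_detect_count, and
the detect count is assumed to be a running total that is never reset.

```c
#include <stdbool.h>

/*
 * Userspace model of the condition the patch adds to check_hung_task().
 * hung_task_should_panic() is a hypothetical name, not a kernel function.
 */
static bool hung_task_should_panic(unsigned int panic_enabled,
				   unsigned int count_to_panic,
				   unsigned long detect_count)
{
	/* hung_task_panic=1 keeps its old meaning: panic on the first report. */
	if (panic_enabled)
		return true;

	/* A threshold of 0 leaves the new check disabled. */
	if (!count_to_panic)
		return false;

	/* Assumption: detect_count is a running total that only grows. */
	return detect_count >= count_to_panic;
}
```

Under this model a threshold of 3 with hung_task_panic=0 stays quiet
for the first two reports and requests a panic on the third.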



* Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-09-28  5:31 [PATCH][v2] hung_task: Panic after fixed number of hung tasks lirongqing
@ 2025-09-28  6:55 ` Lance Yang
  2025-09-28  7:03   ` [External Mail] " Li,Rongqing
  2025-09-29  0:47 ` Masami Hiramatsu
  2025-10-11  0:25 ` Randy Dunlap
  2 siblings, 1 reply; 11+ messages in thread
From: Lance Yang @ 2025-09-28  6:55 UTC
  To: lirongqing
  Cc: corbet, mingo, pauld, joel.granados, arnd, linux-kernel,
	linux-doc, dave.hansen, akpm, feng.tang, kees, mhiramat, paulmck,
	pawan.kumar.gupta, rostedt

Hey Li,

On 2025/9/28 13:31, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> Currently, when hung_task_panic is enabled, kernel will panic immediately
> upon detecting the first hung task. However, some hung tasks are transient
> and the system can recover fully, while others are unrecoverable and
> trigger consecutive hung task reports, and a panic is expected.
> 
> This commit adds a new sysctl parameter hung_task_count_to_panic to allows
> specifying the number of consecutive hung tasks that must be detected
> before triggering a kernel panic. This provides finer control for
> environments where transient hangs maybe happen but persistent hangs should
> still be fatal.
> 
> Acked-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---

It's working as expected. So:
Tested-by: Lance Yang <lance.yang@linux.dev>

But on second thought: regarding this new sysctl parameter, I was wondering
if a name like max_hung_task_count_to_panic might be a bit more explicit,
just to follow the convention from max_rcu_stall_to_panic.

No strong opinion on this, though :)

Cheers,
Lance




* RE: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-09-28  6:55 ` Lance Yang
@ 2025-09-28  7:03   ` Li,Rongqing
  2025-09-28  7:12     ` Lance Yang
  0 siblings, 1 reply; 11+ messages in thread
From: Li,Rongqing @ 2025-09-28  7:03 UTC
  To: Lance Yang
  Cc: corbet@lwn.net, mingo@kernel.org, pauld@redhat.com,
	joel.granados@kernel.org, arnd@arndb.de,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	dave.hansen@linux.intel.com, akpm@linux-foundation.org,
	feng.tang@linux.alibaba.com, kees@kernel.org, mhiramat@kernel.org,
	paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com,
	rostedt@goodmis.org

> On 2025/9/28 13:31, lirongqing wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> >
> > Currently, when hung_task_panic is enabled, kernel will panic
> > immediately upon detecting the first hung task. However, some hung
> > tasks are transient and the system can recover fully, while others are
> > unrecoverable and trigger consecutive hung task reports, and a panic is
> expected.
> >
> > This commit adds a new sysctl parameter hung_task_count_to_panic to
> > allows specifying the number of consecutive hung tasks that must be
> > detected before triggering a kernel panic. This provides finer control
> > for environments where transient hangs maybe happen but persistent
> > hangs should still be fatal.
> >
> > Acked-by: Lance Yang <lance.yang@linux.dev>
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> 
> It's working as expect. So:
> Tested-by: Lance Yang <lance.yang@linux.dev>
> 
> But on second thought: regarding this new sysctl parameter, I was wondering if
> a name like max_hung_task_count_to_panic might be a bit more explicit, just to
> follow the convention from max_rcu_stall_to_panic.
> 

All of the existing hung task sysctl parameters start with "hung_task".
Should we keep this convention? If so, we could name it
"hung_task_max_to_panic". If not, we could call it
"max_hang_task_to_panic".

-Li





* Re: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-09-28  7:03   ` [External Mail] " Li,Rongqing
@ 2025-09-28  7:12     ` Lance Yang
  0 siblings, 0 replies; 11+ messages in thread
From: Lance Yang @ 2025-09-28  7:12 UTC
  To: Li,Rongqing, akpm@linux-foundation.org, mhiramat@kernel.org
  Cc: corbet@lwn.net, mingo@kernel.org, pauld@redhat.com,
	joel.granados@kernel.org, arnd@arndb.de,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	dave.hansen@linux.intel.com, feng.tang@linux.alibaba.com,
	kees@kernel.org, paulmck@kernel.org,
	pawan.kumar.gupta@linux.intel.com, rostedt@goodmis.org



On 2025/9/28 15:03, Li,Rongqing wrote:
>> On 2025/9/28 13:31, lirongqing wrote:
>>> From: Li RongQing <lirongqing@baidu.com>
>>>
>>> Currently, when hung_task_panic is enabled, kernel will panic
>>> immediately upon detecting the first hung task. However, some hung
>>> tasks are transient and the system can recover fully, while others are
>>> unrecoverable and trigger consecutive hung task reports, and a panic is
>> expected.
>>>
>>> This commit adds a new sysctl parameter hung_task_count_to_panic to
>>> allows specifying the number of consecutive hung tasks that must be
>>> detected before triggering a kernel panic. This provides finer control
>>> for environments where transient hangs maybe happen but persistent
>>> hangs should still be fatal.
>>>
>>> Acked-by: Lance Yang <lance.yang@linux.dev>
>>> Signed-off-by: Li RongQing <lirongqing@baidu.com>
>>> ---
>>
>> It's working as expect. So:
>> Tested-by: Lance Yang <lance.yang@linux.dev>
>>
>> But on second thought: regarding this new sysctl parameter, I was wondering if
>> a name like max_hung_task_count_to_panic might be a bit more explicit, just to
>> follow the convention from max_rcu_stall_to_panic.
>>
> 
> I see that all the hung task sysctl parameters start with "hung_task"? Should we keep this convention? If so, we could name it "hung_task_max_to_panic". If not, we could call it "max_hang_task_to_panic"?

Well, let's see what other folks think ;)


* Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-09-28  5:31 [PATCH][v2] hung_task: Panic after fixed number of hung tasks lirongqing
  2025-09-28  6:55 ` Lance Yang
@ 2025-09-29  0:47 ` Masami Hiramatsu
  2025-10-11 12:03   ` [External Mail] " Li,Rongqing
  2025-10-11  0:25 ` Randy Dunlap
  2 siblings, 1 reply; 11+ messages in thread
From: Masami Hiramatsu @ 2025-09-29  0:47 UTC
  To: lirongqing
  Cc: corbet, akpm, lance.yang, paulmck, pawan.kumar.gupta, mingo,
	dave.hansen, rostedt, kees, arnd, feng.tang, pauld, joel.granados,
	linux-doc, linux-kernel

On Sun, 28 Sep 2025 13:31:37 +0800
lirongqing <lirongqing@baidu.com> wrote:

> From: Li RongQing <lirongqing@baidu.com>
> 
> Currently, when hung_task_panic is enabled, kernel will panic immediately
> upon detecting the first hung task. However, some hung tasks are transient
> and the system can recover fully, while others are unrecoverable and
> trigger consecutive hung task reports, and a panic is expected.
> 
> This commit adds a new sysctl parameter hung_task_count_to_panic to allows
> specifying the number of consecutive hung tasks that must be detected
> before triggering a kernel panic. This provides finer control for
> environments where transient hangs maybe happen but persistent hangs should
> still be fatal.

IIUC, there may be multiple groups of tasks that need different timeouts
for hang checks, and you want to set the hung task timeout to match
the shorter one while ignoring the longer ones at that point.

If so, this is essentially a problem with a long operation that is
performed under TASK_UNINTERRUPTIBLE. Ideally, the progress of such an
operation should be checked periodically, and the hang check should be
reset unless the task is really blocked.
But this is not currently implemented. (For example, depending on
the media, it may not be possible to check whether a long IO is still
in progress.)

The hung_task detector will therefore report even these long operations
as hung tasks. But if you set a long detection time to accommodate
them, you will also have to wait that long to detect hangs that
occur within a short period of time.

Hung tasks on one major lock can spread in a domino effect.
So setting a reasonably short detection time, but not panicking
until there are enough of them, seems like a reasonable strategy.
But in this case, I think we also need a "hard timeout limit"
for hung tasks, which will detect the longer ones. Also, you should
use the peak value, not the accumulated value.

If a stall is really transient (that is, the task is not actually hung),
the accumulation of such normal but slow operations will still trip
the hung task panic.
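
The difference between an accumulated count and a peak count can be
sketched in userspace as follows. This is only an illustration under
stated assumptions: hung_per_scan[] is a made-up record of how many
hung tasks each scan found, and neither function exists in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: hung_per_scan[i] is the number of hung tasks
 * found by the i-th scan.
 */

/* Accumulating policy: panic once the running total reaches the threshold. */
static bool panics_on_total(const unsigned int *hung_per_scan, size_t scans,
			    unsigned int threshold)
{
	unsigned long total = 0;

	for (size_t i = 0; i < scans; i++) {
		total += hung_per_scan[i];
		if (threshold && total >= threshold)
			return true;
	}
	return false;
}

/* Peak policy: panic only if a single scan finds enough hung tasks. */
static bool panics_on_peak(const unsigned int *hung_per_scan, size_t scans,
			   unsigned int threshold)
{
	for (size_t i = 0; i < scans; i++) {
		if (threshold && hung_per_scan[i] >= threshold)
			return true;
	}
	return false;
}
```

With a threshold of 3, isolated slow operations such as {1, 0, 1, 0, 1}
trip the accumulating policy but not the peak policy, while a domino
burst such as {0, 4, 0} trips both.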

Thank you,

> 
> Acked-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> Diff with v1: change documentation as Lance suggested
> 
>  Documentation/admin-guide/sysctl/kernel.rst |  8 ++++++++
>  kernel/hung_task.c                          | 14 +++++++++++++-
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 8b49eab..98b47a7 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -405,6 +405,14 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>  1 Panic immediately.
>  = =================================================
>  
> +hung_task_count_to_panic
> +=====================
> +
> +When set to a non-zero value, a kernel panic will be triggered if the
> +number of detected hung tasks reaches this value.
> +
> +Note that setting hung_task_panic=1 will still cause an immediate panic
> +on the first hung task.

What happens if it is 0?



-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-09-28  5:31 [PATCH][v2] hung_task: Panic after fixed number of hung tasks lirongqing
  2025-09-28  6:55 ` Lance Yang
  2025-09-29  0:47 ` Masami Hiramatsu
@ 2025-10-11  0:25 ` Randy Dunlap
  2025-10-11  5:47   ` Kees Cook
  2 siblings, 1 reply; 11+ messages in thread
From: Randy Dunlap @ 2025-10-11  0:25 UTC
  To: lirongqing, corbet, akpm, lance.yang, mhiramat, paulmck,
	pawan.kumar.gupta, mingo, dave.hansen, rostedt, kees, arnd,
	feng.tang, pauld, joel.granados, linux-doc, linux-kernel

Hi,

On 9/27/25 10:31 PM, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> Currently, when hung_task_panic is enabled, kernel will panic immediately
> upon detecting the first hung task. However, some hung tasks are transient
> and the system can recover fully, while others are unrecoverable and
> trigger consecutive hung task reports, and a panic is expected.
> 
> This commit adds a new sysctl parameter hung_task_count_to_panic to allows
> specifying the number of consecutive hung tasks that must be detected
> before triggering a kernel panic. This provides finer control for
> environments where transient hangs maybe happen but persistent hangs should
> still be fatal.
> 
> Acked-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> Diff with v1: change documentation as Lance suggested
> 
>  Documentation/admin-guide/sysctl/kernel.rst |  8 ++++++++
>  kernel/hung_task.c                          | 14 +++++++++++++-
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 8b49eab..98b47a7 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -405,6 +405,14 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>  1 Panic immediately.
>  = =================================================
>  
> +hung_task_count_to_panic
> +=====================

The underline length should be at least as long as the title to
prevent kernel-doc build warnings. Same length is preferred.
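
As a small illustration of that rule, a check along these lines could
be used. The helper name is hypothetical, and the sketch assumes plain
ASCII titles so that byte length equals column width.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when an RST section underline is at
 * least as long as its title. Assumes ASCII, so bytes equal columns.
 */
static bool rst_underline_ok(const char *title, const char *underline)
{
	return strlen(underline) >= strlen(title);
}
```

The 24-character title hung_task_count_to_panic fails this check with
the 21-character underline used in the patch.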

> +
> +When set to a non-zero value, a kernel panic will be triggered if the
> +number of detected hung tasks reaches this value.
> +
> +Note that setting hung_task_panic=1 will still cause an immediate panic
> +on the first hung task.
>  
>  hung_task_check_count
>  =====================

-- 
~Randy



* Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-10-11  0:25 ` Randy Dunlap
@ 2025-10-11  5:47   ` Kees Cook
  2025-10-11 10:57     ` [External Mail] " Li,Rongqing
  0 siblings, 1 reply; 11+ messages in thread
From: Kees Cook @ 2025-10-11  5:47 UTC
  To: Randy Dunlap, lirongqing, corbet, akpm, lance.yang, mhiramat,
	paulmck, pawan.kumar.gupta, mingo, dave.hansen, rostedt, arnd,
	feng.tang, pauld, joel.granados, linux-doc, linux-kernel



On October 10, 2025 5:25:05 PM PDT, Randy Dunlap <rdunlap@infradead.org> wrote:
>Hi,
>
>On 9/27/25 10:31 PM, lirongqing wrote:
>> From: Li RongQing <lirongqing@baidu.com>
>> 
>> Currently, when hung_task_panic is enabled, kernel will panic immediately
>> upon detecting the first hung task. However, some hung tasks are transient
>> and the system can recover fully, while others are unrecoverable and
>> trigger consecutive hung task reports, and a panic is expected.
>> 
>> This commit adds a new sysctl parameter hung_task_count_to_panic to allows
>> specifying the number of consecutive hung tasks that must be detected

Why make a new sysctl? Can't you just use hung_task_panic and raise the max to INT_MAX?

-Kees
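
One possible reading of this single-knob idea is sketched below. The
function name and the exact semantics are assumptions made for
illustration; the thread does not settle them.

```c
#include <stdbool.h>

/*
 * Hypothetical single-knob semantics for hung_task_panic:
 *   panic_threshold == 0: never panic
 *   panic_threshold == 1: panic on the first hung task (today's behaviour)
 *   panic_threshold == N: panic once N hung tasks have been counted
 */
static bool single_knob_should_panic(unsigned int panic_threshold,
				     unsigned long hung_count)
{
	return panic_threshold && hung_count >= panic_threshold;
}
```

Existing settings of 0 and 1 would keep their current meaning under
this reading, which is what makes widening the range attractive.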


-- 
Kees Cook


* RE: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-10-11  5:47   ` Kees Cook
@ 2025-10-11 10:57     ` Li,Rongqing
  2025-10-11 23:58       ` Li,Rongqing
  0 siblings, 1 reply; 11+ messages in thread
From: Li,Rongqing @ 2025-10-11 10:57 UTC
  To: Kees Cook, Randy Dunlap, corbet@lwn.net,
	akpm@linux-foundation.org, lance.yang@linux.dev,
	mhiramat@kernel.org, paulmck@kernel.org,
	pawan.kumar.gupta@linux.intel.com, mingo@kernel.org,
	dave.hansen@linux.intel.com, rostedt@goodmis.org, arnd@arndb.de,
	feng.tang@linux.alibaba.com, pauld@redhat.com,
	joel.granados@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org


> On October 10, 2025 5:25:05 PM PDT, Randy Dunlap <rdunlap@infradead.org>
> wrote:
> >Hi,
> >
> >On 9/27/25 10:31 PM, lirongqing wrote:
> >> From: Li RongQing <lirongqing@baidu.com>
> >>
> >> Currently, when hung_task_panic is enabled, kernel will panic
> >> immediately upon detecting the first hung task. However, some hung
> >> tasks are transient and the system can recover fully, while others
> >> are unrecoverable and trigger consecutive hung task reports, and a panic is
> expected.
> >>
> >> This commit adds a new sysctl parameter hung_task_count_to_panic to
> >> allows specifying the number of consecutive hung tasks that must be
> >> detected
> 
> Why make a new sysctl? Can't you just use hung_task_panic and raise the max
> to INT_MAX?
> 


However, that would prevent hung task warnings from being printed. These
warnings are very useful for identifying which tasks are hanging and
where they are stuck.

With this feature, I would also like to shorten
sysctl_hung_task_timeout_secs so that more information is reported.

RCU already has a similar feature; see commit dfe564045c653d ("rcu: Panic
after fixed number of stalls").

-Li




* RE: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-09-29  0:47 ` Masami Hiramatsu
@ 2025-10-11 12:03   ` Li,Rongqing
  2025-10-11 14:53     ` Li,Rongqing
  0 siblings, 1 reply; 11+ messages in thread
From: Li,Rongqing @ 2025-10-11 12:03 UTC
  To: Masami Hiramatsu
  Cc: corbet@lwn.net, akpm@linux-foundation.org, lance.yang@linux.dev,
	paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com,
	mingo@kernel.org, dave.hansen@linux.intel.com,
	rostedt@goodmis.org, kees@kernel.org, arnd@arndb.de,
	feng.tang@linux.alibaba.com, pauld@redhat.com,
	joel.granados@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Sent: September 29, 2025 8:48
> To: Li,Rongqing <lirongqing@baidu.com>
> Cc: corbet@lwn.net; akpm@linux-foundation.org; lance.yang@linux.dev;
> paulmck@kernel.org; pawan.kumar.gupta@linux.intel.com; mingo@kernel.org;
> dave.hansen@linux.intel.com; rostedt@goodmis.org; kees@kernel.org;
> arnd@arndb.de; feng.tang@linux.alibaba.com; pauld@redhat.com;
> joel.granados@kernel.org; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed number of hung
> tasks
> 
> On Sun, 28 Sep 2025 13:31:37 +0800
> lirongqing <lirongqing@baidu.com> wrote:
> 
> > From: Li RongQing <lirongqing@baidu.com>
> >
> > Currently, when hung_task_panic is enabled, kernel will panic
> > immediately upon detecting the first hung task. However, some hung
> > tasks are transient and the system can recover fully, while others are
> > unrecoverable and trigger consecutive hung task reports, and a panic is
> expected.
> >
> > This commit adds a new sysctl parameter hung_task_count_to_panic to
> > allows specifying the number of consecutive hung tasks that must be
> > detected before triggering a kernel panic. This provides finer control
> > for environments where transient hangs maybe happen but persistent
> > hangs should still be fatal.
> 
> IIUC, perhaps there are multiple groups that require different timeouts for
> hang checks, and you want to set the hung task timeout to match the shorter
> one, but ignore the longer ones at that point.
> 
> If so, this is essentially a problem with a long process that is performed under
> TASK_UNINTERRUPTIBLE. Ideally, the progress of such process should be
> checked periodically and the hang check should be reset unless it is real
> blocked.
> But this is not currently implemented. (For example, depending on the media,
> it may not be possible to check whether long IO is being
> performed.)
> 
> The hung_tasks will even simulate these types of hangs as task hang-ups. But if
> you set a long detection time accordingly, you will also have to wait until that
> detection time for hangs that occur in a short period of time.
> 
> The hung tasks on one major lock can spread in a domino effect.
> So setting a reasonably short detection time, but not panicking until there are
> enough of them, seems like a reasonable strategy.
> But in this case, I think we also need a "hard timeout limit"
> of hung tasks, which will detect longer ones. And also you should use peak
> value not accumulation value.
> 
> If it is really transient (thus, it is not hung), accumulation of such normal but
> just slow operation will still kick hung_tasks.
> 


Would it be reasonable to trigger a panic only after hung tasks have been
detected in a certain number of consecutive checks?

Something like:

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d17cd3f..045bef5 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -304,6 +304,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
        int max_count = sysctl_hung_task_check_count;
        unsigned long last_break = jiffies;
        struct task_struct *g, *t;
+       unsigned long pre_detect_count = sysctl_hung_task_detect_count;
+       static unsigned long contiguous_detect_count;

        /*
         * If the system crashed already then all bets are off,
@@ -326,6 +328,15 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)

                check_hung_task(t, timeout);
        }
+
+       if (sysctl_hung_task_detect_count != pre_detect_count) {
+               contiguous_detect_count++;
+               if (sysctl_max_hung_task_to_panic &&
+                   contiguous_detect_count > sysctl_max_hung_task_to_panic)
+                       hung_task_call_panic = true;
+       } else {
+               contiguous_detect_count = 0;
+       }
  unlock:
        rcu_read_unlock();
        if (hung_task_show_lock)



-Li
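
The consecutive-scan counting in the diff above can be modelled in
userspace as a rough sketch. The struct and function names are
hypothetical, and the model assumes the running detect count only
changes while a scan is in progress, so comparing against the value
left by the previous scan matches the pre_detect_count snapshot.

```c
#include <stdbool.h>

/* Hypothetical state kept across scans; mirrors contiguous_detect_count. */
struct scan_state {
	unsigned long prev_total;	/* detect count after the previous scan */
	unsigned long consecutive;	/* scans in a row that found a hung task */
};

/*
 * Call once per scan with the current running total of detected hung
 * tasks. Returns true when the panic condition from the proposal is met.
 */
static bool scan_finished(struct scan_state *st, unsigned long total,
			  unsigned int max_to_panic)
{
	bool panic = false;

	if (total != st->prev_total) {
		st->consecutive++;
		/* As in the proposal, the comparison is strictly greater-than. */
		if (max_to_panic && st->consecutive > max_to_panic)
			panic = true;
	} else {
		/* A clean scan resets the streak. */
		st->consecutive = 0;
	}
	st->prev_total = total;
	return panic;
}
```

With max_to_panic set to 2, three scans in a row that each find a new
hung task request a panic, while a single clean scan in between resets
the streak.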

> Thank you,
> 
> >
> > Acked-by: Lance Yang <lance.yang@linux.dev>
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> > Diff with v1: change documentation as Lance suggested
> >
> >  Documentation/admin-guide/sysctl/kernel.rst |  8 ++++++++
> >  kernel/hung_task.c                          | 14 +++++++++++++-
> >  2 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 8b49eab..98b47a7 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -405,6 +405,14 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
> >  1 Panic immediately.
> >  = =================================================
> >
> > +hung_task_count_to_panic
> > +========================
> > +
> > +When set to a non-zero value, a kernel panic will be triggered if the
> > +number of detected hung tasks reaches this value.
> > +
> > +Note that setting hung_task_panic=1 will still cause an immediate
> > +panic on the first hung task.
> 
> What happens if it is 0?
> 
> >
> >  hung_task_check_count
> >  =====================
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index 8708a12..87a6421 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -83,6 +83,8 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
> >  static unsigned int __read_mostly sysctl_hung_task_panic =
> >  	IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
> >
> > +static unsigned int __read_mostly sysctl_hung_task_count_to_panic;
> > +
> >  static int
> >  hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
> >  {
> > @@ -219,7 +221,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> >
> >  	trace_sched_process_hang(t);
> >
> > -	if (sysctl_hung_task_panic) {
> > +	if (sysctl_hung_task_panic ||
> > +	    (sysctl_hung_task_count_to_panic &&
> > +	     (sysctl_hung_task_detect_count >= sysctl_hung_task_count_to_panic))) {
> >  		console_verbose();
> >  		hung_task_show_lock = true;
> >  		hung_task_call_panic = true;
> > @@ -388,6 +392,14 @@ static const struct ctl_table hung_task_sysctls[] = {
> >  		.extra2		= SYSCTL_ONE,
> >  	},
> >  	{
> > +		.procname	= "hung_task_count_to_panic",
> > +		.data		= &sysctl_hung_task_count_to_panic,
> > +		.maxlen		= sizeof(int),
> > +		.mode		= 0644,
> > +		.proc_handler	= proc_dointvec_minmax,
> > +		.extra1		= SYSCTL_ZERO,
> > +	},
> > +	{
> >  		.procname	= "hung_task_check_count",
> >  		.data		= &sysctl_hung_task_check_count,
> >  		.maxlen		= sizeof(int),
> > --
> > 2.9.4
> >
> 
> 
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>


* RE: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-10-11 12:03   ` [External Mail] " Li,Rongqing
@ 2025-10-11 14:53     ` Li,Rongqing
  0 siblings, 0 replies; 11+ messages in thread
From: Li,Rongqing @ 2025-10-11 14:53 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: corbet@lwn.net, akpm@linux-foundation.org, lance.yang@linux.dev,
	paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com,
	mingo@kernel.org, dave.hansen@linux.intel.com,
	rostedt@goodmis.org, kees@kernel.org, arnd@arndb.de,
	feng.tang@linux.alibaba.com, pauld@redhat.com,
	joel.granados@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Li,Rongqing
> Sent: October 11, 2025 20:03
> To: 'Masami Hiramatsu' <mhiramat@kernel.org>
> Cc: corbet@lwn.net; akpm@linux-foundation.org; lance.yang@linux.dev;
> paulmck@kernel.org; pawan.kumar.gupta@linux.intel.com; mingo@kernel.org;
> dave.hansen@linux.intel.com; rostedt@goodmis.org; kees@kernel.org;
> arnd@arndb.de; feng.tang@linux.alibaba.com; pauld@redhat.com;
> joel.granados@kernel.org; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: RE: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed
> number of hung tasks
> 
> 
> 
> > -----Original Message-----
> > From: Masami Hiramatsu <mhiramat@kernel.org>
> > Sent: September 29, 2025 8:48
> > To: Li,Rongqing <lirongqing@baidu.com>
> > Cc: corbet@lwn.net; akpm@linux-foundation.org; lance.yang@linux.dev;
> > paulmck@kernel.org; pawan.kumar.gupta@linux.intel.com;
> > mingo@kernel.org; dave.hansen@linux.intel.com; rostedt@goodmis.org;
> > kees@kernel.org; arnd@arndb.de; feng.tang@linux.alibaba.com;
> > pauld@redhat.com; joel.granados@kernel.org; linux-doc@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed
> > number of hung tasks
> >
> > On Sun, 28 Sep 2025 13:31:37 +0800
> > lirongqing <lirongqing@baidu.com> wrote:
> >
> > > From: Li RongQing <lirongqing@baidu.com>
> > >
> > > Currently, when hung_task_panic is enabled, the kernel will panic
> > > immediately upon detecting the first hung task. However, some hung
> > > tasks are transient and the system can recover fully, while others
> > > are unrecoverable and trigger consecutive hung task reports, and a
> > > panic is expected.
> > >
> > > This commit adds a new sysctl parameter hung_task_count_to_panic to
> > > allow specifying the number of consecutive hung tasks that must be
> > > detected before triggering a kernel panic. This provides finer
> > > control for environments where transient hangs may happen but
> > > persistent hangs should still be fatal.
> >
> > IIUC, perhaps there are multiple groups that require different
> > timeouts for hang checks, and you want to set the hung task timeout to
> > match the shorter one, but ignore the longer ones at that point.
> >
> > If so, this is essentially a problem with a long process that is
> > performed under TASK_UNINTERRUPTIBLE. Ideally, the progress of such
> > process should be checked periodically and the hang check should be
> > reset unless it is really blocked.
> > But this is not currently implemented. (For example, depending on the
> > media, it may not be possible to check whether long IO is being
> > performed.)
> >
> > The hung_tasks will even simulate these types of hangs as task
> > hang-ups. But if you set a long detection time accordingly, you will
> > also have to wait until that detection time for hangs that occur in a
> > short period of time.
> >
> > The hung tasks on one major lock can spread in a domino effect.
> > So setting a reasonably short detection time, but not panicking until
> > there are enough of them, seems like a reasonable strategy.
> > But in this case, I think we also need a "hard timeout limit"
> > for hung tasks, which will catch the longer ones. You should also use
> > the peak value, not the accumulated value.
> >
> > If a stall is really transient (and thus not a hang), an accumulation of
> > such normal but slow operations will still trip the hung task detector.
> >
> 
> 
> Is it reasonable to detect the existence of a hung task continuously for a
> certain number of times to trigger panic?
> 
> Like
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index d17cd3f..045bef5 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -304,6 +304,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>         int max_count = sysctl_hung_task_check_count;
>         unsigned long last_break = jiffies;
>         struct task_struct *g, *t;
> +       unsigned long pre_detect_count = sysctl_hung_task_detect_count;
> +       static unsigned long contiguous_detect_count;
> 
>         /*
>          * If the system crashed already then all bets are off,
> @@ -326,6 +328,15 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> 
>                 check_hung_task(t, timeout);
>         }
> +
> +       if (sysctl_hung_task_detect_count != pre_detect_count) {
> +               contiguous_detect_count++;
> +               if (sysctl_max_hung_task_to_panic &&
> +                               contiguous_detect_count > sysctl_max_hung_task_to_panic)
> +                       hung_task_call_panic = 1;
> +       }
> +       else
> +               contiguous_detect_count = 0;
>   unlock:
>         rcu_read_unlock();
>         if (hung_task_show_lock)
> 
> 

A single task hanging for an extended period may not be a critical issue,
since users can usually still log in to the system to investigate. However,
if many tasks hang at the same time, for example I/O hangs caused by a disk
failure, users may no longer be able to log in. That is a serious problem
and a panic is expected. So I think the solution should look like this:

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d17cd3f..52ebf18 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -304,6 +304,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
        int max_count = sysctl_hung_task_check_count;
        unsigned long last_break = jiffies;
        struct task_struct *g, *t;
+       unsigned long pre_detect_count = sysctl_hung_task_detect_count;

        /*
         * If the system crashed already then all bets are off,
@@ -326,6 +327,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)

                check_hung_task(t, timeout);
        }
+
+       if (sysctl_max_hung_task_to_panic &&
+           sysctl_hung_task_detect_count - pre_detect_count > sysctl_max_hung_task_to_panic)
+               hung_task_call_panic = true;
  unlock:
        rcu_read_unlock();
        if (hung_task_show_lock)
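
For reference, the per-scan rule can be sketched as a small standalone
function. This is only an illustration: the function and parameter names are
hypothetical, and treating a limit of 0 as "disabled" is an assumption of the
sketch rather than something settled in this thread:

```c
#include <stdbool.h>

/*
 * Decide whether a single scan found enough new hung tasks to panic.
 * hung_before and hung_after are the cumulative detection counter sampled
 * before and after the scan; max_hung_per_scan is the assumed sysctl value.
 */
static bool burst_should_panic(unsigned long hung_before,
			       unsigned long hung_after,
			       unsigned long max_hung_per_scan)
{
	/* Assumption: 0 means the feature is switched off. */
	if (!max_hung_per_scan)
		return false;

	/* Only tasks newly reported in this scan are counted. */
	return hung_after - hung_before > max_hung_per_scan;
}
```

So with a limit of 3, four new hung tasks found in one scan request a panic,
while three do not, and hangs reported by earlier scans are not counted.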


-Li

> > -Li
> 
> > Thank you,
> >
> > >
> > > Acked-by: Lance Yang <lance.yang@linux.dev>
> > > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > > ---
> > > Diff with v1: change documentation as Lance suggested
> > >
> > >  Documentation/admin-guide/sysctl/kernel.rst |  8 ++++++++
> > >  kernel/hung_task.c                          | 14 +++++++++++++-
> > >  2 files changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > > index 8b49eab..98b47a7 100644
> > > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > > @@ -405,6 +405,14 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
> > >  1 Panic immediately.
> > >  = =================================================
> > >
> > > +hung_task_count_to_panic
> > > +========================
> > > +
> > > +When set to a non-zero value, a kernel panic will be triggered if
> > > +the number of detected hung tasks reaches this value.
> > > +
> > > +Note that setting hung_task_panic=1 will still cause an immediate
> > > +panic on the first hung task.
> >
> > What happens if it is 0?
> >
> > >
> > >  hung_task_check_count
> > >  =====================
> > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > index 8708a12..87a6421 100644
> > > --- a/kernel/hung_task.c
> > > +++ b/kernel/hung_task.c
> > > @@ -83,6 +83,8 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
> > >  static unsigned int __read_mostly sysctl_hung_task_panic =
> > >  	IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
> > >
> > > +static unsigned int __read_mostly sysctl_hung_task_count_to_panic;
> > > +
> > >  static int
> > >  hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
> > >  {
> > > @@ -219,7 +221,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> > >
> > >  	trace_sched_process_hang(t);
> > >
> > > -	if (sysctl_hung_task_panic) {
> > > +	if (sysctl_hung_task_panic ||
> > > +	    (sysctl_hung_task_count_to_panic &&
> > > +	     (sysctl_hung_task_detect_count >= sysctl_hung_task_count_to_panic))) {
> > >  		console_verbose();
> > >  		hung_task_show_lock = true;
> > >  		hung_task_call_panic = true;
> > > @@ -388,6 +392,14 @@ static const struct ctl_table hung_task_sysctls[] = {
> > >  		.extra2		= SYSCTL_ONE,
> > >  	},
> > >  	{
> > > +		.procname	= "hung_task_count_to_panic",
> > > +		.data		= &sysctl_hung_task_count_to_panic,
> > > +		.maxlen		= sizeof(int),
> > > +		.mode		= 0644,
> > > +		.proc_handler	= proc_dointvec_minmax,
> > > +		.extra1		= SYSCTL_ZERO,
> > > +	},
> > > +	{
> > >  		.procname	= "hung_task_check_count",
> > >  		.data		= &sysctl_hung_task_check_count,
> > >  		.maxlen		= sizeof(int),
> > > --
> > > 2.9.4
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google) <mhiramat@kernel.org>


* RE: [External Mail] Re: [PATCH][v2] hung_task: Panic after fixed number of hung tasks
  2025-10-11 10:57     ` [External Mail] " Li,Rongqing
@ 2025-10-11 23:58       ` Li,Rongqing
  0 siblings, 0 replies; 11+ messages in thread
From: Li,Rongqing @ 2025-10-11 23:58 UTC (permalink / raw)
  To: Kees Cook, Randy Dunlap, corbet@lwn.net,
	akpm@linux-foundation.org, lance.yang@linux.dev,
	mhiramat@kernel.org, paulmck@kernel.org,
	pawan.kumar.gupta@linux.intel.com, mingo@kernel.org,
	dave.hansen@linux.intel.com, rostedt@goodmis.org, arnd@arndb.de,
	feng.tang@linux.alibaba.com, pauld@redhat.com,
	joel.granados@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org



> 
> 
> > On October 10, 2025 5:25:05 PM PDT, Randy Dunlap
> > <rdunlap@infradead.org>
> > wrote:
> > >Hi,
> > >
> > >On 9/27/25 10:31 PM, lirongqing wrote:
> > >> From: Li RongQing <lirongqing@baidu.com>
> > >>
> > >> Currently, when hung_task_panic is enabled, the kernel will panic
> > >> immediately upon detecting the first hung task. However, some hung
> > >> tasks are transient and the system can recover fully, while others
> > >> are unrecoverable and trigger consecutive hung task reports, and a
> > >> panic is expected.
> > >>
> > >> This commit adds a new sysctl parameter hung_task_count_to_panic to
> > >> allow specifying the number of consecutive hung tasks that must be
> > >> detected
> >
> > Why make a new sysctl? Can't you just use hung_task_panic and raise
> > the max to INT_MAX?
> >
> 

Sorry, I misunderstood at first.

I'm not sure whether the meaning of the hung_task_panic sysctl can be changed.
If it is changed, BOOTPARAM_HUNG_TASK_PANIC would have to change as well, and
I don't know whether changing both would cause issues for users.

If no one objects, I will reuse hung_task_panic instead of adding a new sysctl.
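
As a rough illustration of what reusing hung_task_panic as a threshold could
look like, here is a minimal sketch. The function name is hypothetical and the
semantics (0 = never panic, 1 = panic on the first hung task, N = panic once N
hung tasks have been detected) are an assumption, not an agreed design; the
">=" comparison follows the v2 patch:

```c
#include <stdbool.h>

/*
 * Assumed semantics for a count-valued hung_task_panic:
 *   0 - never panic on hung tasks
 *   1 - panic on the first hung task (the existing behaviour)
 *   N - panic once the detection count reaches N
 */
static bool hung_task_panic_reached(unsigned int hung_task_panic,
				    unsigned long detect_count)
{
	return hung_task_panic && detect_count >= hung_task_panic;
}
```

Under this assumption the existing values 0 and 1 keep their current meaning,
so only users who write a larger value would see different behaviour.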

Thanks

-Li

> 
> However, this will prevent the printing of hung task warnings. Hung task
> warnings are very useful for identifying which tasks are hanging and where
> they are stuck.
> 
> With this feature, I would like to shorten sysctl_hung_task_timeout_secs to
> get more information.
> 
> RCU also has a similar feature, added in commit dfe564045c653d ("rcu: Panic
> after fixed number of stalls").
> 
> -Li
> 



end of thread, other threads:[~2025-10-12  0:00 UTC | newest]

Thread overview: 11+ messages
2025-09-28  5:31 [PATCH][v2] hung_task: Panic after fixed number of hung tasks lirongqing
2025-09-28  6:55 ` Lance Yang
2025-09-28  7:03   ` [External Mail] " Li,Rongqing
2025-09-28  7:12     ` Lance Yang
2025-09-29  0:47 ` Masami Hiramatsu
2025-10-11 12:03   ` [External Mail] " Li,Rongqing
2025-10-11 14:53     ` Li,Rongqing
2025-10-11  0:25 ` Randy Dunlap
2025-10-11  5:47   ` Kees Cook
2025-10-11 10:57     ` [External Mail] " Li,Rongqing
2025-10-11 23:58       ` Li,Rongqing
