[PATCH] hung_task: Skip scan on idle systems

public inbox for linux-kernel@vger.kernel.org

* [PATCH] hung_task: Skip scan on idle systems
@ 2026-01-26  3:45 Aaron Tomlin
  2026-01-26  5:23 ` Lance Yang
  0 siblings, 1 reply; 5+ messages in thread
From: Aaron Tomlin @ 2026-01-26  3:45 UTC
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: neelx, sean, mproche, chjohnst, nick.lange, linux-kernel

At present, the hung task detector behaves in an unoptimised manner: it
wakes up periodically (every check_interval_secs, defaulting to 120
seconds) and performs an O(N) scan of the entire process list,
regardless of the system's actual state. On idle embedded devices,
virtual machines, or large servers with no activity, this behaviour
unnecessarily consumes CPU cycles and memory bandwidth, hindering
power-saving states.

To rectify this, this patch introduces an adaptive "green" polling
mechanism. The detector will now verify whether the system is
effectively idle before committing to a full process scan.

To implement this, we utilise the standard get_avenrun() API to verify
the global system load. Tasks in the TASK_UNINTERRUPTIBLE (D) state
explicitly contribute to the system load average; consequently, if the
1-minute load average is zero, we can confidently infer that no tasks
are currently hung, allowing us to bypass the expensive process scan.

Crucially, we invoke get_avenrun(load, 0, 0) with both the offset and
shift parameters set to zero. This configuration is deliberate and
necessary for safety:

        1. Zero Offset: Prevents the application of any artificial
           rounding bias usually intended for human-readable display.

        2. Zero Shift: Retrieves the raw fixed-point value (where 1.0
           load = 2048) without applying the optional left shift that
           rescales it for other consumers.

This ensures maximum sensitivity: even a microscopic fractional load
(e.g., a single task entering D state momentarily) will register as a
non-zero raw value. This guarantees that we never encounter a false
negative where a valid hung task is ignored due to integer truncation or
rounding errors.

This heuristic significantly minimises the detector's footprint on
healthy systems whilst maintaining robust reliability for genuine hangs.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/hung_task.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d2254c91450b..7b9f5c1bd35e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -17,6 +17,7 @@
 #include <linux/export.h>
 #include <linux/panic_notifier.h>
 #include <linux/sysctl.h>
+#include <linux/sched/loadavg.h>
 #include <linux/suspend.h>
 #include <linux/utsname.h>
 #include <linux/sched/signal.h>
@@ -503,6 +504,7 @@ static int watchdog(void *dummy)
 	for ( ; ; ) {
 		unsigned long timeout = sysctl_hung_task_timeout_secs;
 		unsigned long interval = sysctl_hung_task_check_interval_secs;
+		unsigned long load[3];
 		long t;

 		if (interval == 0)
@@ -511,8 +513,12 @@ static int watchdog(void *dummy)
 		t = hung_timeout_jiffies(hung_last_checked, interval);
 		if (t <= 0) {
 			if (!atomic_xchg(&reset_hung_task, 0) &&
-			    !hung_detector_suspended)
-				check_hung_uninterruptible_tasks(timeout);
+			    !hung_detector_suspended) {
+				/* Check 1-min load to detect idle system */
+				get_avenrun(load, 0, 0);
+				if (load[0] > 0)
+					check_hung_uninterruptible_tasks(timeout);
+			}
 			hung_last_checked = jiffies;
 			continue;
 		}
-- 
2.51.0

* Re: [PATCH] hung_task: Skip scan on idle systems
  2026-01-26  3:45 [PATCH] hung_task: Skip scan on idle systems Aaron Tomlin
@ 2026-01-26  5:23 ` Lance Yang
  2026-01-26 20:14   ` Aaron Tomlin
  0 siblings, 1 reply; 5+ messages in thread
From: Lance Yang @ 2026-01-26  5:23 UTC
  To: Aaron Tomlin
  Cc: neelx, sean, pmladek, mhiramat, akpm, joel.granados, mproche,
	chjohnst, nick.lange, gregkh, linux-kernel

Hi Aaron,

Keep one patch or series under review at a time, especially in the
same subsystem ...

Maintainers/Reviewers have limited bandwidth and can focus better
on one thing at a time.

Please, be patient! Just wait for it to be merged or rejected before
sending the next.

On 2026/1/26 11:45, Aaron Tomlin wrote:
> At present, the hung task detector behaves in an unoptimised manner: it
> wakes up periodically (every check_interval_secs, defaulting to 120
> seconds) and performs an O(N) scan of the entire process list,
> regardless of the system's actual state. On idle embedded devices,
> virtual machines, or large servers with no activity, this behaviour
> unnecessarily consumes CPU cycles and memory bandwidth, hindering
> power-saving states.
> 
> To rectify this, this patch introduces an adaptive "green" polling
> mechanism. The detector will now verify whether the system is
> effectively idle before committing to a full process scan.
> 
> To implement this, we utilise the standard get_avenrun() API to verify
> the global system load. Tasks in the TASK_UNINTERRUPTIBLE (D) state
> explicitly contribute to the system load average; consequently, if the
> 1-minute load average is zero, we can confidently infer that no tasks
> are currently hung, allowing us to bypass the expensive process scan.
> 
> Crucially, we invoke get_avenrun(load, 0, 0) with both the offset and
> shift parameters set to zero. This configuration is deliberate and
> necessary for safety:
> 
>          1. Zero Offset: Prevents the application of any artificial
>             rounding bias usually intended for human-readable display.
> 
>          2. Zero Shift: Retrieves the raw fixed-point value (where 1.0
>             load = 2048) without applying the optional left shift that
>             rescales it for other consumers.
> 
> This ensures maximum sensitivity: even a microscopic fractional load
> (e.g., a single task entering D state momentarily) will register as a
> non-zero raw value. This guarantees that we never encounter a false
> negative where a valid hung task is ignored due to integer truncation or
> rounding errors.
> 
> This heuristic significantly minimises the detector's footprint on
> healthy systems whilst maintaining robust reliability for genuine hangs.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>   kernel/hung_task.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index d2254c91450b..7b9f5c1bd35e 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -17,6 +17,7 @@
>   #include <linux/export.h>
>   #include <linux/panic_notifier.h>
>   #include <linux/sysctl.h>
> +#include <linux/sched/loadavg.h>
>   #include <linux/suspend.h>
>   #include <linux/utsname.h>
>   #include <linux/sched/signal.h>
> @@ -503,6 +504,7 @@ static int watchdog(void *dummy)
>   	for ( ; ; ) {
>   		unsigned long timeout = sysctl_hung_task_timeout_secs;
>   		unsigned long interval = sysctl_hung_task_check_interval_secs;
> +		unsigned long load[3];
>   		long t;
>   
>   		if (interval == 0)
> @@ -511,8 +513,12 @@ static int watchdog(void *dummy)
>   		t = hung_timeout_jiffies(hung_last_checked, interval);
>   		if (t <= 0) {
>   			if (!atomic_xchg(&reset_hung_task, 0) &&
> -			    !hung_detector_suspended)
> -				check_hung_uninterruptible_tasks(timeout);
> +			    !hung_detector_suspended) {
> +				/* Check 1-min load to detect idle system */
> +				get_avenrun(load, 0, 0);
> +				if (load[0] > 0)
> +					check_hung_uninterruptible_tasks(timeout);

The optimization is not worth the trouble.

I don't think the assumption that "load[0] == 0 means no hung tasks" is
100% correct.

So that would miss actual hung tasks - a false negative, which is worse
than the "wasted scan" you're trying to avoid.

Also, I don't *really* care about optimizing something that runs once
every 120 seconds :)

Nacked-by: Lance Yang <lance.yang@linux.dev>

* Re: [PATCH] hung_task: Skip scan on idle systems
  2026-01-26  5:23 ` Lance Yang
@ 2026-01-26 20:14   ` Aaron Tomlin
  2026-02-02 13:55     ` Petr Mladek
  0 siblings, 1 reply; 5+ messages in thread
From: Aaron Tomlin @ 2026-01-26 20:14 UTC
  To: Lance Yang
  Cc: neelx, sean, pmladek, mhiramat, akpm, joel.granados, mproche,
	chjohnst, nick.lange, gregkh, linux-kernel

On Mon, Jan 26, 2026 at 01:23:01PM +0800, Lance Yang wrote:
> Hi Aaron,

Hi Lance,

> Keep one patch or series under review at a time, especially in the
> same subsystem ...

Understood. That's fair.

> > @@ -503,6 +504,7 @@ static int watchdog(void *dummy)
> >   	for ( ; ; ) {
> >   		unsigned long timeout = sysctl_hung_task_timeout_secs;
> >   		unsigned long interval = sysctl_hung_task_check_interval_secs;
> > +		unsigned long load[3];
> >   		long t;
> >   		if (interval == 0)
> > @@ -511,8 +513,12 @@ static int watchdog(void *dummy)
> >   		t = hung_timeout_jiffies(hung_last_checked, interval);
> >   		if (t <= 0) {
> >   			if (!atomic_xchg(&reset_hung_task, 0) &&
> > -			    !hung_detector_suspended)
> > -				check_hung_uninterruptible_tasks(timeout);
> > +			    !hung_detector_suspended) {
> > +				/* Check 1-min load to detect idle system */
> > +				get_avenrun(load, 0, 0);
> > +				if (load[0] > 0)
> > +					check_hung_uninterruptible_tasks(timeout);
> 
> The optimization is not worth the trouble.
> 
> I don't think the assumption that "load[0] == 0 means no hung tasks" is
> 100% correct.
> 
> So that would miss actual hung tasks - a false negative, which is worse
> than the "wasted scan" you're trying to avoid.
> 
> Also, I don't *really* care about optimizing something that runs once
> every 120 seconds :)
> 
> Nacked-by: Lance Yang <lance.yang@linux.dev>

Yes, please ignore. This is indeed wrong.

Regarding the value of the optimisation, while a 120-second interval
implies a low frequency, the cost of the scan is O(N). On large servers
with high thread counts (even if idle), iterating the entire task list
dirties cache lines and consumes memory bandwidth unnecessarily.

Nevertheless, we currently do not have a way to economically compute the
total number of tasks in TASK_UNINTERRUPTIBLE state.


Kind regards,
-- 
Aaron Tomlin

* Re: [PATCH] hung_task: Skip scan on idle systems
  2026-01-26 20:14   ` Aaron Tomlin
@ 2026-02-02 13:55     ` Petr Mladek
  2026-02-07 21:01       ` Aaron Tomlin
  0 siblings, 1 reply; 5+ messages in thread
From: Petr Mladek @ 2026-02-02 13:55 UTC
  To: Aaron Tomlin
  Cc: Lance Yang, neelx, sean, mhiramat, akpm, joel.granados, mproche,
	chjohnst, nick.lange, gregkh, linux-kernel

On Mon 2026-01-26 15:14:27, Aaron Tomlin wrote:
> On Mon, Jan 26, 2026 at 01:23:01PM +0800, Lance Yang wrote:
> > Hi Aaron,
> 
> Hi Lance,
> 
> > Keep one patch or series under review at a time, especially in the
> > same subsystem ...

+1 :-)

> Understood. That's fair.
> 
> > > @@ -503,6 +504,7 @@ static int watchdog(void *dummy)
> > >   	for ( ; ; ) {
> > >   		unsigned long timeout = sysctl_hung_task_timeout_secs;
> > >   		unsigned long interval = sysctl_hung_task_check_interval_secs;
> > > +		unsigned long load[3];
> > >   		long t;
> > >   		if (interval == 0)
> > > @@ -511,8 +513,12 @@ static int watchdog(void *dummy)
> > >   		t = hung_timeout_jiffies(hung_last_checked, interval);
> > >   		if (t <= 0) {
> > >   			if (!atomic_xchg(&reset_hung_task, 0) &&
> > > -			    !hung_detector_suspended)
> > > -				check_hung_uninterruptible_tasks(timeout);
> > > +			    !hung_detector_suspended) {
> > > +				/* Check 1-min load to detect idle system */
> > > +				get_avenrun(load, 0, 0);
> > > +				if (load[0] > 0)
> > > +					check_hung_uninterruptible_tasks(timeout);
> > 
> > The optimization is not worth the trouble.
> > 
> > I don't think the assumption that "load[0] == 0 means no hung tasks" is
> > 100% correct.
> > 
> > So that would miss actual hung tasks - a false negative, which is worse
> > than the "wasted scan" you're trying to avoid.
> > 
> > Also, I don't *really* care about optimizing something that runs once
> > every 120 seconds :)
> > 
> > Nacked-by: Lance Yang <lance.yang@linux.dev>
> 
> Yes, please ignore. This is indeed wrong.
> 
> Regarding the value of the optimisation, while a 120-second interval
> implies a low frequency, the cost of the scan is O(N). On large servers
> with high thread counts (even if idle), iterating the entire task list
> dirties cache lines and consumes memory bandwidth unnecessarily.
> 
> Nevertheless, we currently do not have a way to economically compute the
> total number of tasks in TASK_UNINTERRUPTIBLE state.

It makes some sense. And the check of the average load is trivial
so it might be acceptable.

But I somehow doubt that it works. Have you ever seen a system with
(avenrun[0] == 0)? IMHO, it might be pretty hard to achieve it.
Or maybe I am too pessimistic. Or are there embedded systems which can
only be woken by some interrupt from a sensor? Do embedded systems
run the hung task detector?

In other words, is this patch solving a theoretical scenario?
Did you test it in practice, please?

Best Regards,
Petr

* Re: [PATCH] hung_task: Skip scan on idle systems
  2026-02-02 13:55     ` Petr Mladek
@ 2026-02-07 21:01       ` Aaron Tomlin
  0 siblings, 0 replies; 5+ messages in thread
From: Aaron Tomlin @ 2026-02-07 21:01 UTC
  To: Petr Mladek
  Cc: Lance Yang, neelx, sean, mhiramat, akpm, joel.granados, mproche,
	chjohnst, nick.lange, gregkh, linux-kernel

On Mon, Feb 02, 2026 at 02:55:41PM +0100, Petr Mladek wrote:
> It makes some sense. And the check of the average load is trivial
> so it might be acceptable.
> 
> But I somehow doubt that it works. Have you ever seen a system with
> (avenrun[0] == 0)? IMHO, it might be pretty hard to achieve it.
> Or maybe I am too pessimistic. Or are there embedded systems which can
> only be woken by some interrupt from a sensor? Do embedded systems
> run the hung task detector?
> 
> In other words, is this patch solving a theoretical scenario?
> Did you test it in practice, please?
> 
> Best Regards,

Hi Petr,

You are entirely correct; this was a purely theoretical proposition.

I have not validated this against a production workload to quantify any
potential savings. Achieving a load average of exactly zero is elusive in
practice on modern systems, rendering the optimisation likely ineffective.

Please consider this patch withdrawn.


Best regards,
-- 
Aaron Tomlin
