public inbox for linux-kernel@vger.kernel.org
* [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
@ 2026-01-25 13:58 Aaron Tomlin
  2026-01-25 13:58 ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Aaron Tomlin @ 2026-01-25 13:58 UTC (permalink / raw)
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: neelx, sean, mproche, chjohnst, nick.lange, linux-kernel

Hi Lance, Greg, Petr, Joel, Andrew,

This series introduces the ability to reset
/proc/sys/kernel/hung_task_detect_count.

Writing a "0" value to this file atomically resets the counter of detected
hung tasks. This functionality provides system administrators with the
means to clear the cumulative diagnostic history following incident
resolution, thereby simplifying subsequent monitoring without necessitating
a system restart.

The updated logic ensures that the long-running scan (which is inherently
preemptible and subject to rcu_lock_break()) does not become desynchronised
from the global state. By treating the initial read as a "version
snapshot", the kernel can guarantee that the cumulative count is only
updated if the underlying state remained stable throughout the scan.
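
The snapshot-and-commit pattern described above can be sketched in plain
C11 userspace code. This is only an illustrative model, not the kernel
implementation: names such as commit_scan() and read_count() are
hypothetical, and the kernel uses atomic_long_t with its own cmpxchg
helpers rather than <stdatomic.h>:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative model of the "version snapshot" idea: the counter is
 * read once before the scan, and the final commit only succeeds if no
 * concurrent reset changed the value in the meantime. */
static atomic_long detect_count = 0;

/* Returns true when the batched update was committed, false when a
 * concurrent reset invalidated the snapshot. */
bool commit_scan(long snapshot, long this_round_count)
{
	long expected = snapshot;

	return atomic_compare_exchange_strong(&detect_count, &expected,
					      snapshot + this_round_count);
}

long read_count(void)
{
	return atomic_load(&detect_count);
}

void reset_count(void)
{
	atomic_store(&detect_count, 0);
}
```

If a reset lands between the snapshot and the commit, the compare
fails and the scan's update is simply dropped, which is the intended
"reset wins" behaviour.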

Please let me know your thoughts.


Changes since v6 [1]:
 - Decoupled the detection logic from the reporting logic.
   check_hung_task() was renamed to hung_task_info() and the task_is_hung()
   check was hoisted into the primary check_hung_uninterruptible_tasks()
   loop (Petr Mladek)

 - Changed the global statistic update mechanism from incremental atomic
   increments (per task) to a single batched atomic_long_cmpxchg_relaxed()
   at the end of the scan (Petr Mladek)

 - Strengthened the memory ordering using atomic_long_read_acquire() and
   atomic_long_cmpxchg_release(). This ensures strict synchronisation
   between the scan and concurrent userspace resets (Petr Mladek)

 - Updated the inline reporting comment to remove the hardcoded "2 minutes"
   reference

Changes since v5 [2]:
 - Introduced a preparatory patch (Joel Granados)
 - Extended custom proc_handler to handle SYSCTL_USER_TO_KERN writes,
   strictly validating that only a value of "0" is permitted for resets
   (Joel Granados)
 - Transitioned from atomic_long_inc_return_relaxed() to a more robust
   read_acquire/cmpxchg_release pattern to ensure "All-or-Nothing" scan
   updates (Petr Mladek)
 - Re-introduced hung_task_diagnostics() for better readability and
   consistent metadata publication

Changes since v4 [3]:
 - Added missing underflow check (Lance Yang) 

Changes since v3 [4]:
 - Use atomic operations to ensure cross-CPU visibility and prevent an integer underflow
 - Use acquire/release semantics for memory ordering (Petr Mladek)
 - Move quoted string to a single line (Petr Mladek)
 - Remove variables coredump_msg and disable_msg to simplify code (Petr Mladek)
 - Add trailing "\n" to all strings to ensure immediate console flushing (Petr Mladek)
 - Improve the hung task counter documentation (Joel Granados)
 - Reject non-zero writes with -EINVAL (Joel Granados)
 - Translate to the new sysctl API (Petr Mladek)

Changes since v2 [5]:
 - Avoided a needless double update to hung_task_detect_count (Lance Yang)
 - Restored previous use of pr_err() for each message (Greg KH)
 - Provided a complete descriptive comment for the helper

Changes since v1 [6]:
 - Removed write-only sysfs attribute (Lance Yang)
 - Modified procfs hung_task_detect_count instead (Lance Yang)
 - Introduced a custom proc_handler
 - Updated documentation (Lance Yang)
 - Added 'static inline' as a hint to eliminate any function call overhead
 - Removed clutter through encapsulation

[1]: https://lore.kernel.org/lkml/20260115023229.3028462-1-atomlin@atomlin.com/
[2]: https://lore.kernel.org/lkml/20251231004125.2380105-1-atomlin@atomlin.com/
[3]: https://lore.kernel.org/lkml/20251222014210.2032214-1-atomlin@atomlin.com/
[4]: https://lore.kernel.org/all/20251216030036.1822217-1-atomlin@atomlin.com/
[5]: https://lore.kernel.org/lkml/20251211033004.1628875-1-atomlin@atomlin.com/
[6]: https://lore.kernel.org/lkml/20251209041218.1583600-1-atomlin@atomlin.com/

Aaron Tomlin (2):
  hung_task: Refactor detection logic and atomicise detection count
  hung_task: Enable runtime reset of hung_task_detect_count

 Documentation/admin-guide/sysctl/kernel.rst |   3 +-
 kernel/hung_task.c                          | 108 +++++++++++++++-----
 2 files changed, 82 insertions(+), 29 deletions(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count
  2026-01-25 13:58 [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
@ 2026-01-25 13:58 ` Aaron Tomlin
  2026-02-02  6:10   ` Masami Hiramatsu
                     ` (2 more replies)
  2026-01-25 13:58 ` [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  2026-02-01 19:48 ` [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
  2 siblings, 3 replies; 19+ messages in thread
From: Aaron Tomlin @ 2026-01-25 13:58 UTC (permalink / raw)
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: neelx, sean, mproche, chjohnst, nick.lange, linux-kernel

The check_hung_task() function currently conflates two distinct
responsibilities: validating whether a task is hung and handling the
subsequent reporting (printing warnings, triggering panics, or
tracepoints).

This patch refactors the logic by introducing hung_task_info(), a
function dedicated solely to reporting. The actual detection check,
task_is_hung(), is hoisted into the primary loop within
check_hung_uninterruptible_tasks(). This separation clearly decouples
the mechanism of detection from the policy of reporting.

Furthermore, to facilitate future support for concurrent hung task
detection, the global sysctl_hung_task_detect_count variable is
converted from unsigned long to atomic_long_t. Consequently, the
counting logic is updated to accumulate the number of hung tasks locally
(this_round_count) during the iteration. The global counter is then
updated atomically via atomic_long_cmpxchg_relaxed() once the loop
concludes, rather than incrementally during the scan.

These changes are strictly preparatory and introduce no functional
change to the system's runtime behaviour.
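
The batching described above (local accumulation during the scan, one
atomic commit at the end) can be modelled in userspace C11 as follows.
The task list is reduced to an array of flags and all names
(scan_tasks(), hung_total) are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_long hung_total = 0;

long total_hung(void)
{
	return atomic_load(&hung_total);
}

/* Count "hung" entries locally, then publish the batch with a single
 * compare-and-swap, mirroring the atomic_long_cmpxchg_relaxed() call
 * at the end of check_hung_uninterruptible_tasks(). */
long scan_tasks(const int *is_hung, size_t n)
{
	long total_count = atomic_load(&hung_total);	/* snapshot */
	long this_round_count = 0;

	for (size_t i = 0; i < n; i++)
		if (is_hung[i])
			this_round_count++;	/* local, no per-task atomic */

	if (this_round_count) {
		long expected = total_count;

		atomic_compare_exchange_strong(&hung_total, &expected,
					       total_count + this_round_count);
	}
	return this_round_count;
}
```

The design point is that contention on the shared counter is paid once
per scan rather than once per hung task.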

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d2254c91450b..df10830ed9ef 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -36,7 +36,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
 /*
  * Total number of tasks detected as hung since boot:
  */
-static unsigned long __read_mostly sysctl_hung_task_detect_count;
+static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
 
 /*
  * Limit number of tasks checked in a batch.
@@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout,
-		unsigned long prev_detect_count)
+/**
+ * hung_task_info - Print diagnostic details for a hung task
+ * @t: Pointer to the detected hung task.
+ * @timeout: Timeout threshold for detecting hung tasks
+ * @this_round_count: Count of hung tasks detected in the current iteration
+ *
+ * Print structured information about the specified hung task, if warnings
+ * are enabled or if the panic batch threshold is exceeded.
+ */
+static void hung_task_info(struct task_struct *t, unsigned long timeout,
+			   unsigned long this_round_count)
 {
-	unsigned long total_hung_task;
-
-	if (!task_is_hung(t, timeout))
-		return;
-
-	/*
-	 * This counter tracks the total number of tasks detected as hung
-	 * since boot.
-	 */
-	sysctl_hung_task_detect_count++;
-
-	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
-	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+	if (sysctl_hung_task_panic && this_round_count >= sysctl_hung_task_panic) {
 		console_verbose();
 		hung_task_call_panic = true;
 	}
 
 	/*
-	 * Ok, the task did not get scheduled for more than 2 minutes,
-	 * complain:
+	 * The given task did not get scheduled for more than
+	 * CONFIG_DEFAULT_HUNG_TASK_TIMEOUT. Therefore, complain
+	 * accordingly
 	 */
 	if (sysctl_hung_task_warnings || hung_task_call_panic) {
 		if (sysctl_hung_task_warnings > 0)
@@ -297,18 +295,18 @@ static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
 
 /*
  * Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
- * a really long time (120 seconds). If that happens, print out
- * a warning.
+ * a really long time. If that happens, print out a warning.
  */
 static void check_hung_uninterruptible_tasks(unsigned long timeout)
 {
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+	unsigned long total_count, this_round_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
+	total_count = atomic_long_read(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -316,10 +314,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (test_taint(TAINT_DIE) || did_panic)
 		return;
 
-
+	this_round_count = 0;
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
-
 		if (!max_count--)
 			goto unlock;
 		if (time_after(jiffies, last_break + HUNG_TASK_LOCK_BREAK)) {
@@ -328,14 +325,25 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout, prev_detect_count);
+		if (task_is_hung(t, timeout)) {
+			this_round_count++;
+			hung_task_info(t, timeout, this_round_count);
+		}
 	}
  unlock:
 	rcu_read_unlock();
 
-	if (!(sysctl_hung_task_detect_count - prev_detect_count))
+	if (!this_round_count)
 		return;
 
+	/*
+	 * This counter tracks the total number of tasks detected as hung
+	 * since boot.
+	 */
+	atomic_long_cmpxchg_relaxed(&sysctl_hung_task_detect_count,
+				    total_count, total_count +
+				    this_round_count);
+
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-25 13:58 [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
  2026-01-25 13:58 ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
@ 2026-01-25 13:58 ` Aaron Tomlin
  2026-02-02  6:09   ` Masami Hiramatsu
  2026-02-02 13:26   ` Petr Mladek
  2026-02-01 19:48 ` [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
  2 siblings, 2 replies; 19+ messages in thread
From: Aaron Tomlin @ 2026-01-25 13:58 UTC (permalink / raw)
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: neelx, sean, mproche, chjohnst, nick.lange, linux-kernel

Currently, the hung_task_detect_count sysctl provides a cumulative count
of hung tasks since boot. In long-running, high-availability
environments, this counter may lose its utility if it cannot be reset
once an incident has been resolved. Furthermore, the previous
implementation relied upon implicit ordering, which could not strictly
guarantee that diagnostic metadata published by one CPU was visible to
the panic logic on another.

This patch introduces the capability to reset the detection count by
writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
has been updated to validate this input and atomically reset the
counter.
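
The write-side policy (only "0" is accepted, anything else is rejected
with -EINVAL) can be sketched in userspace C. This models only the
validation step, not the real proc_handler plumbing; write_detect_count()
and count_model are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

static atomic_long count_model = 0;

void seed_count(long v)
{
	atomic_store(&count_model, v);
}

long current_count(void)
{
	return atomic_load(&count_model);
}

/* Model of the proc_handler write path: a value of 0 atomically resets
 * the counter; any non-zero write is rejected and leaves it untouched. */
int write_detect_count(long value)
{
	if (value != 0)
		return -EINVAL;
	atomic_store(&count_model, 0);
	return 0;
}
```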

The synchronisation of sysctl_hung_task_detect_count relies upon a
transactional model to ensure the integrity of the detection counter
against concurrent resets from userspace. The application of
atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
and provides the following guarantees:

    1. Prevention of Load-Store Reordering via Acquire Semantics:
       By utilising atomic_long_read_acquire() to snapshot the counter
       before initiating the task traversal, we establish a strict
       memory barrier. This prevents the compiler or hardware from
       reordering the initial load to a point later in the scan. Without
       this "acquire" barrier, a delayed load could potentially read a
       "0" value resulting from a userspace reset that occurred
       mid-scan. This would lead to the subsequent cmpxchg succeeding
       erroneously, thereby overwriting the user's reset with stale
       increment data.

    2. Atomicity of the "Commit" Phase via Release Semantics:
       The atomic_long_cmpxchg_release() serves as the transaction's commit
       point. The "release" barrier ensures that all diagnostic
       recordings and task-state observations made during the scan are
       globally visible before the counter is incremented.

    3. Race Condition Resolution:
       This pairing effectively detects any "out-of-band" reset of the
       counter. If
       sysctl_hung_task_detect_count is modified via the procfs
       interface during the scan, the final cmpxchg will detect the
       discrepancy between the current value and the "acquire" snapshot.
       Consequently, the update will fail, ensuring that a reset command
       from the administrator is prioritised over a scan that may have
       been invalidated by that very reset.
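
The three guarantees above can be demonstrated deterministically with
C11 explicit memory orders standing in for the kernel primitives:
memory_order_acquire for atomic_long_read_acquire() and
memory_order_release for atomic_long_cmpxchg_release(). The helper
names (snapshot_count(), commit_count()) are illustrative only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_long g_count = 0;

/* Acquire snapshot taken before the scan starts. */
long snapshot_count(void)
{
	return atomic_load_explicit(&g_count, memory_order_acquire);
}

/* Release commit at the end of the scan; fails if the snapshot has
 * been invalidated by a concurrent reset. */
bool commit_count(long snapshot, long delta)
{
	long expected = snapshot;

	return atomic_compare_exchange_strong_explicit(
			&g_count, &expected, snapshot + delta,
			memory_order_release, memory_order_relaxed);
}

/* Stand-in for the procfs reset path. */
void reset_from_userspace(void)
{
	atomic_store(&g_count, 0);
}
```

A reset interleaved between snapshot and commit makes the cmpxchg
observe the discrepancy and fail, so the administrator's reset is
preserved exactly as point 3 describes.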

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  3 +-
 kernel/hung_task.c                          | 58 ++++++++++++++++++---
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 239da22c4e28..68da4235225a 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -418,7 +418,8 @@ hung_task_detect_count
 ======================
 
 Indicates the total number of tasks that have been detected as hung since
-the system boot.
+the system boot or since the counter was reset. The counter is zeroed when
+a value of 0 is written.
 
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index df10830ed9ef..350093de0535 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -306,7 +306,11 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
-	total_count = atomic_long_read(&sysctl_hung_task_detect_count);
+	/*
+	 * The counter might get reset. Remember the initial value.
+	 * Acquire prevents reordering task checks before this point.
+	 */
+	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -337,10 +341,11 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		return;
 
 	/*
-	 * This counter tracks the total number of tasks detected as hung
-	 * since boot.
+	 * Do not count this round when the global counter has been reset
+	 * during this check. Release ensures we see all hang details
+	 * recorded during the scan.
 	 */
-	atomic_long_cmpxchg_relaxed(&sysctl_hung_task_detect_count,
+	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
 				    total_count, total_count +
 				    this_round_count);
 
@@ -366,6 +371,46 @@ static long hung_timeout_jiffies(unsigned long last_checked,
 }
 
 #ifdef CONFIG_SYSCTL
+
+/**
+ * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
+ * @table: Pointer to the struct ctl_table definition for this proc entry
+ * @dir: Flag indicating the operation
+ * @buffer: User space buffer for data transfer
+ * @lenp: Pointer to the length of the data being transferred
+ * @ppos: Pointer to the current file offset
+ *
+ * This handler is used for reading the current hung task detection count
+ * and for resetting it to zero when a write operation is performed using a
+ * zero value only.
+ * Return: 0 on success, or a negative error code on failure.
+ */
+static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
+					 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	unsigned long detect_count;
+	struct ctl_table proxy_table;
+	int err;
+
+	proxy_table = *table;
+	proxy_table.data = &detect_count;
+
+	if (SYSCTL_KERN_TO_USER(dir))
+		detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
+
+	err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
+	if (err < 0)
+		return err;
+
+	if (SYSCTL_USER_TO_KERN(dir)) {
+		if (detect_count)
+			return -EINVAL;
+		atomic_long_set(&sysctl_hung_task_detect_count, 0);
+	}
+
+	return 0;
+}
+
 /*
  * Process updating of timeout sysctl
  */
@@ -446,10 +491,9 @@ static const struct ctl_table hung_task_sysctls[] = {
 	},
 	{
 		.procname	= "hung_task_detect_count",
-		.data		= &sysctl_hung_task_detect_count,
 		.maxlen		= sizeof(unsigned long),
-		.mode		= 0444,
-		.proc_handler	= proc_doulongvec_minmax,
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_detect_count,
 	},
 	{
 		.procname	= "hung_task_sys_info",
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
  2026-01-25 13:58 [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
  2026-01-25 13:58 ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
  2026-01-25 13:58 ` [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
@ 2026-02-01 19:48 ` Aaron Tomlin
  2 siblings, 0 replies; 19+ messages in thread
From: Aaron Tomlin @ 2026-02-01 19:48 UTC (permalink / raw)
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: neelx, sean, mproche, chjohnst, nick.lange, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

On Sun, Jan 25, 2026 at 08:58:46AM -0500, Aaron Tomlin wrote:
> This series introduces the ability to reset
> /proc/sys/kernel/hung_task_detect_count.

Hi Andrew, Lance, Greg, Petr, Joel and Masami,

I am keen to ascertain if this series now aligns with your expectations. I
would appreciate your confirmation that this implementation is suitable for
acceptance before I commit to any further patch work or minor refinements.


Kind regards,
-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-25 13:58 ` [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
@ 2026-02-02  6:09   ` Masami Hiramatsu
  2026-02-02 13:26   ` Petr Mladek
  1 sibling, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2026-02-02  6:09 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, gregkh, pmladek, joel.granados, neelx, sean,
	mproche, chjohnst, nick.lange, linux-kernel

On Sun, 25 Jan 2026 08:58:48 -0500
Aaron Tomlin <atomlin@atomlin.com> wrote:

> Currently, the hung_task_detect_count sysctl provides a cumulative count
> of hung tasks since boot. In long-running, high-availability
> environments, this counter may lose its utility if it cannot be reset
> once an incident has been resolved. Furthermore, the previous
> implementation relied upon implicit ordering, which could not strictly
> guarantee that diagnostic metadata published by one CPU was visible to
> the panic logic on another.
> 
> This patch introduces the capability to reset the detection count by
> writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
> has been updated to validate this input and atomically reset the
> counter.
> 
> The synchronisation of sysctl_hung_task_detect_count relies upon a
> transactional model to ensure the integrity of the detection counter
> against concurrent resets from userspace. The application of
> atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
> and provides the following guarantees:
> 
>     1. Prevention of Load-Store Reordering via Acquire Semantics:
>        By utilising atomic_long_read_acquire() to snapshot the counter
>        before initiating the task traversal, we establish a strict
>        memory barrier. This prevents the compiler or hardware from
>        reordering the initial load to a point later in the scan. Without
>        this "acquire" barrier, a delayed load could potentially read a
>        "0" value resulting from a userspace reset that occurred
>        mid-scan. This would lead to the subsequent cmpxchg succeeding
>        erroneously, thereby overwriting the user's reset with stale
>        increment data.
> 
>     2. Atomicity of the "Commit" Phase via Release Semantics:
>        The atomic_long_cmpxchg_release() serves as the transaction's commit
>        point. The "release" barrier ensures that all diagnostic
>        recordings and task-state observations made during the scan are
>        globally visible before the counter is incremented.
> 
>     3. Race Condition Resolution:
>        This pairing effectively detects any "out-of-band" reset of the
>        counter. If
>        sysctl_hung_task_detect_count is modified via the procfs
>        interface during the scan, the final cmpxchg will detect the
>        discrepancy between the current value and the "acquire" snapshot.
>        Consequently, the update will fail, ensuring that a reset command
>        from the administrator is prioritised over a scan that may have
>        been invalidated by that very reset.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> ---
>  Documentation/admin-guide/sysctl/kernel.rst |  3 +-
>  kernel/hung_task.c                          | 58 ++++++++++++++++++---
>  2 files changed, 53 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 239da22c4e28..68da4235225a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -418,7 +418,8 @@ hung_task_detect_count
>  ======================
>  
>  Indicates the total number of tasks that have been detected as hung since
> -the system boot.
> +the system boot or since the counter was reset. The counter is zeroed when
> +a value of 0 is written.
>  
>  This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>  
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index df10830ed9ef..350093de0535 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -306,7 +306,11 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	int need_warning = sysctl_hung_task_warnings;
>  	unsigned long si_mask = hung_task_si_mask;
>  
> -	total_count = atomic_long_read(&sysctl_hung_task_detect_count);
> +	/*
> +	 * The counter might get reset. Remember the initial value.
> +	 * Acquire prevents reordering task checks before this point.
> +	 */
> +	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>  	/*
>  	 * If the system crashed already then all bets are off,
>  	 * do not report extra hung tasks:
> @@ -337,10 +341,11 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  		return;
>  
>  	/*
> -	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot.
> +	 * Do not count this round when the global counter has been reset
> +	 * during this check. Release ensures we see all hang details
> +	 * recorded during the scan.
>  	 */
> -	atomic_long_cmpxchg_relaxed(&sysctl_hung_task_detect_count,
> +	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
>  				    total_count, total_count +
>  				    this_round_count);
>  
> @@ -366,6 +371,46 @@ static long hung_timeout_jiffies(unsigned long last_checked,
>  }
>  
>  #ifdef CONFIG_SYSCTL
> +
> +/**
> + * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
> + * @table: Pointer to the struct ctl_table definition for this proc entry
> + * @dir: Flag indicating the operation
> + * @buffer: User space buffer for data transfer
> + * @lenp: Pointer to the length of the data being transferred
> + * @ppos: Pointer to the current file offset
> + *
> + * This handler is used for reading the current hung task detection count
> + * and for resetting it to zero when a write operation is performed using a
> + * zero value only.
> + * Return: 0 on success, or a negative error code on failure.
> + */
> +static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
> +					 void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	unsigned long detect_count;
> +	struct ctl_table proxy_table;
> +	int err;
> +
> +	proxy_table = *table;
> +	proxy_table.data = &detect_count;
> +
> +	if (SYSCTL_KERN_TO_USER(dir))
> +		detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
> +
> +	err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
> +	if (err < 0)
> +		return err;
> +
> +	if (SYSCTL_USER_TO_KERN(dir)) {
> +		if (detect_count)
> +			return -EINVAL;
> +		atomic_long_set(&sysctl_hung_task_detect_count, 0);
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Process updating of timeout sysctl
>   */
> @@ -446,10 +491,9 @@ static const struct ctl_table hung_task_sysctls[] = {
>  	},
>  	{
>  		.procname	= "hung_task_detect_count",
> -		.data		= &sysctl_hung_task_detect_count,
>  		.maxlen		= sizeof(unsigned long),
> -		.mode		= 0444,
> -		.proc_handler	= proc_doulongvec_minmax,
> +		.mode		= 0644,
> +		.proc_handler	= proc_dohung_task_detect_count,
>  	},
>  	{
>  		.procname	= "hung_task_sys_info",
> -- 
> 2.51.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count
  2026-01-25 13:58 ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
@ 2026-02-02  6:10   ` Masami Hiramatsu
  2026-02-02 12:59   ` Petr Mladek
  2026-02-03  3:05   ` Lance Yang
  2 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2026-02-02  6:10 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, gregkh, pmladek, joel.granados, neelx, sean,
	mproche, chjohnst, nick.lange, linux-kernel

On Sun, 25 Jan 2026 08:58:47 -0500
Aaron Tomlin <atomlin@atomlin.com> wrote:

> The check_hung_task() function currently conflates two distinct
> responsibilities: validating whether a task is hung and handling the
> subsequent reporting (printing warnings, triggering panics, or
> tracepoints).
> 
> This patch refactors the logic by introducing hung_task_info(), a
> function dedicated solely to reporting. The actual detection check,
> task_is_hung(), is hoisted into the primary loop within
> check_hung_uninterruptible_tasks(). This separation clearly decouples
> the mechanism of detection from the policy of reporting.
> 
> Furthermore, to facilitate future support for concurrent hung task
> detection, the global sysctl_hung_task_detect_count variable is
> converted from unsigned long to atomic_long_t. Consequently, the
> counting logic is updated to accumulate the number of hung tasks locally
> (this_round_count) during the iteration. The global counter is then
> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
> concludes, rather than incrementally during the scan.
> 
> These changes are strictly preparatory and introduce no functional
> change to the system's runtime behaviour.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> ---
>  kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
>  1 file changed, 33 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index d2254c91450b..df10830ed9ef 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -36,7 +36,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>  /*
>   * Total number of tasks detected as hung since boot:
>   */
> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>  
>  /*
>   * Limit number of tasks checked in a batch.
> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
>  }
>  #endif
>  
> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -		unsigned long prev_detect_count)
> +/**
> + * hung_task_info - Print diagnostic details for a hung task
> + * @t: Pointer to the detected hung task.
> + * @timeout: Timeout threshold for detecting hung tasks
> + * @this_round_count: Count of hung tasks detected in the current iteration
> + *
> + * Print structured information about the specified hung task, if warnings
> + * are enabled or if the panic batch threshold is exceeded.
> + */
> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
> +			   unsigned long this_round_count)
>  {
> -	unsigned long total_hung_task;
> -
> -	if (!task_is_hung(t, timeout))
> -		return;
> -
> -	/*
> -	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot.
> -	 */
> -	sysctl_hung_task_detect_count++;
> -
> -	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>  	trace_sched_process_hang(t);
>  
> -	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> +	if (sysctl_hung_task_panic && this_round_count >= sysctl_hung_task_panic) {
>  		console_verbose();
>  		hung_task_call_panic = true;
>  	}
>  
>  	/*
> -	 * Ok, the task did not get scheduled for more than 2 minutes,
> -	 * complain:
> +	 * The given task did not get scheduled for more than
> +	 * CONFIG_DEFAULT_HUNG_TASK_TIMEOUT. Therefore, complain
> +	 * accordingly
>  	 */
>  	if (sysctl_hung_task_warnings || hung_task_call_panic) {
>  		if (sysctl_hung_task_warnings > 0)
> @@ -297,18 +295,18 @@ static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
>  
>  /*
>   * Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
> - * a really long time (120 seconds). If that happens, print out
> - * a warning.
> + * a really long time. If that happens, print out a warning.
>   */
>  static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  {
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
>  	struct task_struct *g, *t;
> -	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> +	unsigned long total_count, this_round_count;
>  	int need_warning = sysctl_hung_task_warnings;
>  	unsigned long si_mask = hung_task_si_mask;
>  
> +	total_count = atomic_long_read(&sysctl_hung_task_detect_count);
>  	/*
>  	 * If the system crashed already then all bets are off,
>  	 * do not report extra hung tasks:
> @@ -316,10 +314,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	if (test_taint(TAINT_DIE) || did_panic)
>  		return;
>  
> -
> +	this_round_count = 0;
>  	rcu_read_lock();
>  	for_each_process_thread(g, t) {
> -
>  		if (!max_count--)
>  			goto unlock;
>  		if (time_after(jiffies, last_break + HUNG_TASK_LOCK_BREAK)) {
> @@ -328,14 +325,25 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  			last_break = jiffies;
>  		}
>  
> -		check_hung_task(t, timeout, prev_detect_count);
> +		if (task_is_hung(t, timeout)) {
> +			this_round_count++;
> +			hung_task_info(t, timeout, this_round_count);
> +		}
>  	}
>   unlock:
>  	rcu_read_unlock();
>  
> -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
> +	if (!this_round_count)
>  		return;
>  
> +	/*
> +	 * This counter tracks the total number of tasks detected as hung
> +	 * since boot.
> +	 */
> +	atomic_long_cmpxchg_relaxed(&sysctl_hung_task_detect_count,
> +				    total_count, total_count +
> +				    this_round_count);
> +
>  	if (need_warning || hung_task_call_panic) {
>  		si_mask |= SYS_INFO_LOCKS;
>  
> -- 
> 2.51.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count
  2026-01-25 13:58 ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
  2026-02-02  6:10   ` Masami Hiramatsu
@ 2026-02-02 12:59   ` Petr Mladek
  2026-02-03  3:05   ` Lance Yang
  2 siblings, 0 replies; 19+ messages in thread
From: Petr Mladek @ 2026-02-02 12:59 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, mhiramat, gregkh, joel.granados, neelx, sean,
	mproche, chjohnst, nick.lange, linux-kernel

On Sun 2026-01-25 08:58:47, Aaron Tomlin wrote:
> The check_hung_task() function currently conflates two distinct
> responsibilities: validating whether a task is hung and handling the
> subsequent reporting (printing warnings, triggering panics, or
> tracepoints).
> 
> This patch refactors the logic by introducing hung_task_info(), a
> function dedicated solely to reporting. The actual detection check,
> task_is_hung(), is hoisted into the primary loop within
> check_hung_uninterruptible_tasks(). This separation clearly decouples
> the mechanism of detection from the policy of reporting.
> 
> Furthermore, to facilitate future support for concurrent hung task
> detection, the global sysctl_hung_task_detect_count variable is
> converted from unsigned long to atomic_long_t. Consequently, the
> counting logic is updated to accumulate the number of hung tasks locally
> (this_round_count) during the iteration. The global counter is then
> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
> concludes, rather than incrementally during the scan.
> 
> These changes are strictly preparatory and introduce no functional
> change to the system's runtime behaviour.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>

LGTM. Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


* Re: [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-25 13:58 ` [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  2026-02-02  6:09   ` Masami Hiramatsu
@ 2026-02-02 13:26   ` Petr Mladek
  1 sibling, 0 replies; 19+ messages in thread
From: Petr Mladek @ 2026-02-02 13:26 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, mhiramat, gregkh, joel.granados, neelx, sean,
	mproche, chjohnst, nick.lange, linux-kernel

On Sun 2026-01-25 08:58:48, Aaron Tomlin wrote:
> Currently, the hung_task_detect_count sysctl provides a cumulative count
> of hung tasks since boot. In long-running, high-availability
> environments, this counter may lose its utility if it cannot be reset
> once an incident has been resolved. Furthermore, the previous
> implementation relied upon implicit ordering, which could not strictly
> guarantee that diagnostic metadata published by one CPU was visible to
> the panic logic on another.
> 
> This patch introduces the capability to reset the detection count by
> writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
> has been updated to validate this input and atomically reset the
> counter.
> 
> The synchronisation of sysctl_hung_task_detect_count relies upon a
> transactional model to ensure the integrity of the detection counter
> against concurrent resets from userspace. The application of
> atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
> and provides the following guarantees:
> 
>     1. Prevention of Load-Store Reordering via Acquire Semantics:
>        By utilising atomic_long_read_acquire() to snapshot the counter
>        before initiating the task traversal, we establish a strict
>        memory barrier. This prevents the compiler or hardware from
>        reordering the initial load to a point later in the scan. Without
>        this "acquire" barrier, a delayed load could potentially read a
>        "0" value resulting from a userspace reset that occurred
>        mid-scan. This would lead to the subsequent cmpxchg succeeding
>        erroneously, thereby overwriting the user's reset with stale
>        increment data.
> 
>     2. Atomicity of the "Commit" Phase via Release Semantics:
>        The atomic_long_cmpxchg_release() serves as the transaction's
>        commit point. The "release" barrier ensures that all diagnostic
>        recordings and task-state observations made during the scan are
>        globally visible before the counter is incremented.
> 
>     3. Race Condition Resolution:
>        This pairing effectively detects any "out-of-band" reset of the
>        counter. If sysctl_hung_task_detect_count is modified via the
>        procfs interface during the scan, the final cmpxchg will detect
>        the discrepancy between the current value and the "acquire"
>        snapshot. Consequently, the update will fail, ensuring that a
>        reset command from the administrator is prioritised over a scan
>        that may have been invalidated by that very reset.
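
The snapshot/commit transaction enumerated above can be sketched with
userspace C11 atomics. This is an illustrative analogue only: the
kernel's atomic_long_* primitives are spelled differently and the scan
body is elided here, but the acquire/release semantics map directly.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for sysctl_hung_task_detect_count. */
static atomic_long detect_count = 0;

/* "Acquire" snapshot taken before the scan: later task checks cannot
 * be reordered before this load. */
static long scan_begin(void)
{
	return atomic_load_explicit(&detect_count, memory_order_acquire);
}

/* "Release" commit at the end of the scan: publishes everything the
 * scan recorded, and fails if the counter was reset meanwhile. */
static bool scan_commit(long snap, long found_this_round)
{
	long want = snap + found_this_round;

	return atomic_compare_exchange_strong_explicit(
		&detect_count, &snap, want,
		memory_order_release, memory_order_relaxed);
}

/* What `echo 0 > hung_task_detect_count` amounts to. */
static void sysctl_reset(void)
{
	atomic_store(&detect_count, 0);
}
```

A reset that lands between scan_begin() and scan_commit() moves the
counter away from the snapshot, so the commit fails and the
administrator's reset survives.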
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>

LGTM, feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


* Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count
  2026-01-25 13:58 ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
  2026-02-02  6:10   ` Masami Hiramatsu
  2026-02-02 12:59   ` Petr Mladek
@ 2026-02-03  3:05   ` Lance Yang
  2026-02-03  3:08     ` Lance Yang
  2 siblings, 1 reply; 19+ messages in thread
From: Lance Yang @ 2026-02-03  3:05 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: neelx, sean, akpm, mproche, chjohnst, nick.lange, linux-kernel,
	mhiramat, joel.granados, pmladek, gregkh



On 2026/1/25 21:58, Aaron Tomlin wrote:
> The check_hung_task() function currently conflates two distinct
> responsibilities: validating whether a task is hung and handling the
> subsequent reporting (printing warnings, triggering panics, or
> tracepoints).
> 
> This patch refactors the logic by introducing hung_task_info(), a
> function dedicated solely to reporting. The actual detection check,
> task_is_hung(), is hoisted into the primary loop within
> check_hung_uninterruptible_tasks(). This separation clearly decouples
> the mechanism of detection from the policy of reporting.
> 
> Furthermore, to facilitate future support for concurrent hung task
> detection, the global sysctl_hung_task_detect_count variable is
> converted from unsigned long to atomic_long_t. Consequently, the
> counting logic is updated to accumulate the number of hung tasks locally
> (this_round_count) during the iteration. The global counter is then
> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
> concludes, rather than incrementally during the scan.
> 
> These changes are strictly preparatory and introduce no functional
> change to the system's runtime behaviour.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>   kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
>   1 file changed, 33 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index d2254c91450b..df10830ed9ef 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -36,7 +36,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>   /*
>    * Total number of tasks detected as hung since boot:
>    */
> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>   
>   /*
>    * Limit number of tasks checked in a batch.
> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
>   }
>   #endif
>   
> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -		unsigned long prev_detect_count)
> +/**
> + * hung_task_info - Print diagnostic details for a hung task
> + * @t: Pointer to the detected hung task.
> + * @timeout: Timeout threshold for detecting hung tasks
> + * @this_round_count: Count of hung tasks detected in the current iteration
> + *
> + * Print structured information about the specified hung task, if warnings
> + * are enabled or if the panic batch threshold is exceeded.
> + */
> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
> +			   unsigned long this_round_count)
>   {
> -	unsigned long total_hung_task;
> -
> -	if (!task_is_hung(t, timeout))
> -		return;
> -
> -	/*
> -	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot.
> -	 */
> -	sysctl_hung_task_detect_count++;

Previously, the global detect count was updated immediately when a hung
task was found. BUT now, it only updates after the full scan finishes ...

Ideally, the count should update as soon as possible, so that userspace
can react in time :)

For example, by migrating critical containers away from the node before
the situation gets worse - something we already do.

Cheers,
Lance


* Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count
  2026-02-03  3:05   ` Lance Yang
@ 2026-02-03  3:08     ` Lance Yang
  2026-02-03  9:03       ` Petr Mladek
  0 siblings, 1 reply; 19+ messages in thread
From: Lance Yang @ 2026-02-03  3:08 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: neelx, sean, akpm, mproche, chjohnst, nick.lange, linux-kernel,
	mhiramat, joel.granados, pmladek, gregkh



On 2026/2/3 11:05, Lance Yang wrote:
> 
> 
> On 2026/1/25 21:58, Aaron Tomlin wrote:
>> The check_hung_task() function currently conflates two distinct
>> responsibilities: validating whether a task is hung and handling the
>> subsequent reporting (printing warnings, triggering panics, or
>> tracepoints).
>>
>> This patch refactors the logic by introducing hung_task_info(), a
>> function dedicated solely to reporting. The actual detection check,
>> task_is_hung(), is hoisted into the primary loop within
>> check_hung_uninterruptible_tasks(). This separation clearly decouples
>> the mechanism of detection from the policy of reporting.
>>
>> Furthermore, to facilitate future support for concurrent hung task
>> detection, the global sysctl_hung_task_detect_count variable is
>> converted from unsigned long to atomic_long_t. Consequently, the
>> counting logic is updated to accumulate the number of hung tasks locally
>> (this_round_count) during the iteration. The global counter is then
>> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
>> concludes, rather than incrementally during the scan.
>>
>> These changes are strictly preparatory and introduce no functional
>> change to the system's runtime behaviour.
>>
>> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
>> ---
>>   kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
>>   1 file changed, 33 insertions(+), 25 deletions(-)
>>
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index d2254c91450b..df10830ed9ef 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -36,7 +36,7 @@ static int __read_mostly 
>> sysctl_hung_task_check_count = PID_MAX_LIMIT;
>>   /*
>>    * Total number of tasks detected as hung since boot:
>>    */
>> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
>> +static atomic_long_t sysctl_hung_task_detect_count = 
>> ATOMIC_LONG_INIT(0);
>>   /*
>>    * Limit number of tasks checked in a batch.
>> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct 
>> task_struct *task, unsigned long ti
>>   }
>>   #endif
>> -static void check_hung_task(struct task_struct *t, unsigned long 
>> timeout,
>> -        unsigned long prev_detect_count)
>> +/**
>> + * hung_task_info - Print diagnostic details for a hung task
>> + * @t: Pointer to the detected hung task.
>> + * @timeout: Timeout threshold for detecting hung tasks
>> + * @this_round_count: Count of hung tasks detected in the current 
>> iteration
>> + *
>> + * Print structured information about the specified hung task, if 
>> warnings
>> + * are enabled or if the panic batch threshold is exceeded.
>> + */
>> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
>> +               unsigned long this_round_count)
>>   {
>> -    unsigned long total_hung_task;
>> -
>> -    if (!task_is_hung(t, timeout))
>> -        return;
>> -
>> -    /*
>> -     * This counter tracks the total number of tasks detected as hung
>> -     * since boot.
>> -     */
>> -    sysctl_hung_task_detect_count++;
> 
> Previously, the global detect count updated immediately when a hung task
> was found. BUT now, it only updates after the full scan finishes ...
> 
> Ideally, the count should update as soon as possible, so that userspace
> can react in time :)
> 
> For example, by migrating critical containers away from the node before
> the situation gets worse - something we already do.

Sorry, I should have said that earlier - just realized it ...



* Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count
  2026-02-03  3:08     ` Lance Yang
@ 2026-02-03  9:03       ` Petr Mladek
  2026-02-03 11:01         ` Lance Yang
  2026-02-04 14:07         ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
  0 siblings, 2 replies; 19+ messages in thread
From: Petr Mladek @ 2026-02-03  9:03 UTC (permalink / raw)
  To: Lance Yang
  Cc: Aaron Tomlin, neelx, sean, akpm, mproche, chjohnst, nick.lange,
	linux-kernel, mhiramat, joel.granados, gregkh

On Tue 2026-02-03 11:08:33, Lance Yang wrote:
> On 2026/2/3 11:05, Lance Yang wrote:
> > On 2026/1/25 21:58, Aaron Tomlin wrote:
> > > The check_hung_task() function currently conflates two distinct
> > > responsibilities: validating whether a task is hung and handling the
> > > subsequent reporting (printing warnings, triggering panics, or
> > > tracepoints).
> > > 
> > > This patch refactors the logic by introducing hung_task_info(), a
> > > function dedicated solely to reporting. The actual detection check,
> > > task_is_hung(), is hoisted into the primary loop within
> > > check_hung_uninterruptible_tasks(). This separation clearly decouples
> > > the mechanism of detection from the policy of reporting.
> > > 
> > > Furthermore, to facilitate future support for concurrent hung task
> > > detection, the global sysctl_hung_task_detect_count variable is
> > > converted from unsigned long to atomic_long_t. Consequently, the
> > > counting logic is updated to accumulate the number of hung tasks locally
> > > (this_round_count) during the iteration. The global counter is then
> > > updated atomically via atomic_long_cmpxchg_relaxed() once the loop
> > > concludes, rather than incrementally during the scan.
> > > 
> > > These changes are strictly preparatory and introduce no functional
> > > change to the system's runtime behaviour.
> > > 
> > > Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> > > ---
> > >   kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
> > >   1 file changed, 33 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > index d2254c91450b..df10830ed9ef 100644
> > > --- a/kernel/hung_task.c
> > > +++ b/kernel/hung_task.c
> > > @@ -36,7 +36,7 @@ static int __read_mostly
> > > sysctl_hung_task_check_count = PID_MAX_LIMIT;
> > >   /*
> > >    * Total number of tasks detected as hung since boot:
> > >    */
> > > -static unsigned long __read_mostly sysctl_hung_task_detect_count;
> > > +static atomic_long_t sysctl_hung_task_detect_count =
> > > ATOMIC_LONG_INIT(0);
> > >   /*
> > >    * Limit number of tasks checked in a batch.
> > > @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct
> > > task_struct *task, unsigned long ti
> > >   }
> > >   #endif
> > > -static void check_hung_task(struct task_struct *t, unsigned long
> > > timeout,
> > > -        unsigned long prev_detect_count)
> > > +/**
> > > + * hung_task_info - Print diagnostic details for a hung task
> > > + * @t: Pointer to the detected hung task.
> > > + * @timeout: Timeout threshold for detecting hung tasks
> > > + * @this_round_count: Count of hung tasks detected in the current
> > > iteration
> > > + *
> > > + * Print structured information about the specified hung task, if
> > > warnings
> > > + * are enabled or if the panic batch threshold is exceeded.
> > > + */
> > > +static void hung_task_info(struct task_struct *t, unsigned long timeout,
> > > +               unsigned long this_round_count)
> > >   {
> > > -    unsigned long total_hung_task;
> > > -
> > > -    if (!task_is_hung(t, timeout))
> > > -        return;
> > > -
> > > -    /*
> > > -     * This counter tracks the total number of tasks detected as hung
> > > -     * since boot.
> > > -     */
> > > -    sysctl_hung_task_detect_count++;
> > 
> > Previously, the global detect count updated immediately when a hung task
> > was found. BUT now, it only updates after the full scan finishes ...
> > 
> > Ideally, the count should update as soon as possible, so that userspace
> > can react in time :)
> > 
> > For example, by migrating critical containers away from the node before
> > the situation gets worse - something we already do.
> 
> Sorry, I should have said that earlier - just realized it ...

Better late than sorry ;-)

That said, is the delay really critical? I guess that the userspace
checks the counter in regular intervals (seconds or tens of seconds).
Or is there any way to get a notification immediately?

Anyway, I thought how the counting and barriers might work when
we update the global counter immediately. And I came up with
the following:

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 350093de0535..8bc043fbe89c 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long total_count, this_round_count;
+	unsigned long this_round_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
-	/*
-	 * The counter might get reset. Remember the initial value.
-	 * Acquire prevents reordering task checks before this point.
-	 */
-	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		}
 
 		if (task_is_hung(t, timeout)) {
+			/*
+			 * Increment the global counter so that userspace could
+			 * start migrating tasks ASAP. But count the current
+			 * round separately because userspace could reset
+			 * the global counter at any time.
+			 */
+			atomic_long_inc(&sysctl_hung_task_detect_count);
 			this_round_count++;
 			hung_task_info(t, timeout, this_round_count);
 		}
@@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (!this_round_count)
 		return;
 
-	/*
-	 * Do not count this round when the global counter has been reset
-	 * during this check. Release ensures we see all hang details
-	 * recorded during the scan.
-	 */
-	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
-				    total_count, total_count +
-				    this_round_count);
-
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;
 

I am not sure if the comment above the increment is needed.
Well, it might help people understand the motivation without
digging into the git log history.

Best Regards,
Petr


* Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count
  2026-02-03  9:03       ` Petr Mladek
@ 2026-02-03 11:01         ` Lance Yang
  2026-02-04 11:04           ` [PATCH] hung_task: Increment the global counter immediately Petr Mladek
  2026-02-04 14:07         ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
  1 sibling, 1 reply; 19+ messages in thread
From: Lance Yang @ 2026-02-03 11:01 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Aaron Tomlin, neelx, sean, akpm, mproche, chjohnst, nick.lange,
	linux-kernel, mhiramat, joel.granados, gregkh



On 2026/2/3 17:03, Petr Mladek wrote:
> On Tue 2026-02-03 11:08:33, Lance Yang wrote:
>> On 2026/2/3 11:05, Lance Yang wrote:
>>> On 2026/1/25 21:58, Aaron Tomlin wrote:
>>>> The check_hung_task() function currently conflates two distinct
>>>> responsibilities: validating whether a task is hung and handling the
>>>> subsequent reporting (printing warnings, triggering panics, or
>>>> tracepoints).
>>>>
>>>> This patch refactors the logic by introducing hung_task_info(), a
>>>> function dedicated solely to reporting. The actual detection check,
>>>> task_is_hung(), is hoisted into the primary loop within
>>>> check_hung_uninterruptible_tasks(). This separation clearly decouples
>>>> the mechanism of detection from the policy of reporting.
>>>>
>>>> Furthermore, to facilitate future support for concurrent hung task
>>>> detection, the global sysctl_hung_task_detect_count variable is
>>>> converted from unsigned long to atomic_long_t. Consequently, the
>>>> counting logic is updated to accumulate the number of hung tasks locally
>>>> (this_round_count) during the iteration. The global counter is then
>>>> updated atomically via atomic_long_cmpxchg_relaxed() once the loop
>>>> concludes, rather than incrementally during the scan.
>>>>
>>>> These changes are strictly preparatory and introduce no functional
>>>> change to the system's runtime behaviour.
>>>>
>>>> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
>>>> ---
>>>>    kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
>>>>    1 file changed, 33 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>>> index d2254c91450b..df10830ed9ef 100644
>>>> --- a/kernel/hung_task.c
>>>> +++ b/kernel/hung_task.c
>>>> @@ -36,7 +36,7 @@ static int __read_mostly
>>>> sysctl_hung_task_check_count = PID_MAX_LIMIT;
>>>>    /*
>>>>     * Total number of tasks detected as hung since boot:
>>>>     */
>>>> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
>>>> +static atomic_long_t sysctl_hung_task_detect_count =
>>>> ATOMIC_LONG_INIT(0);
>>>>    /*
>>>>     * Limit number of tasks checked in a batch.
>>>> @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct
>>>> task_struct *task, unsigned long ti
>>>>    }
>>>>    #endif
>>>> -static void check_hung_task(struct task_struct *t, unsigned long
>>>> timeout,
>>>> -        unsigned long prev_detect_count)
>>>> +/**
>>>> + * hung_task_info - Print diagnostic details for a hung task
>>>> + * @t: Pointer to the detected hung task.
>>>> + * @timeout: Timeout threshold for detecting hung tasks
>>>> + * @this_round_count: Count of hung tasks detected in the current
>>>> iteration
>>>> + *
>>>> + * Print structured information about the specified hung task, if
>>>> warnings
>>>> + * are enabled or if the panic batch threshold is exceeded.
>>>> + */
>>>> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
>>>> +               unsigned long this_round_count)
>>>>    {
>>>> -    unsigned long total_hung_task;
>>>> -
>>>> -    if (!task_is_hung(t, timeout))
>>>> -        return;
>>>> -
>>>> -    /*
>>>> -     * This counter tracks the total number of tasks detected as hung
>>>> -     * since boot.
>>>> -     */
>>>> -    sysctl_hung_task_detect_count++;
>>>
>>> Previously, the global detect count updated immediately when a hung task
>>> was found. BUT now, it only updates after the full scan finishes ...
>>>
>>> Ideally, the count should update as soon as possible, so that userspace
>>> can react in time :)
>>>
>>> For example, by migrating critical containers away from the node before
>>> the situation gets worse - something we already do.
>>
>> Sorry, I should have said that earlier - just realized it ...
> 
> Better late then sorry ;-)

;P

> 
> That said, is the delay really critical? I guess that the userspace
> checks the counter in regular intervals (seconds or tens of seconds).
> Or is there any way to get a notification immediately?

Just rely on polling the counter every 0.x seconds.

I don't think that the full scan would take many seconds, but reporting
(e.g. pr_err) could be slow somehow ...
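
Such a watcher can stay small. A minimal sketch of one polling step
(the helper names and the rebase-on-reset policy are illustrative;
only the sysctl path is real):

```python
HUNG_COUNT = "/proc/sys/kernel/hung_task_detect_count"

def read_count(path=HUNG_COUNT):
    with open(path) as f:
        return int(f.read().strip())

def poll_once(last, path=HUNG_COUNT):
    """Return (newly_detected, new_baseline) for one polling step.

    A value below the previous baseline means the counter was reset
    via the sysctl, so we just rebase without reporting anything.
    """
    cur = read_count(path)
    delta = cur - last if cur > last else 0
    return delta, cur
```

The watcher calls poll_once() every 0.x seconds and kicks off the
migration whenever the delta is positive.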

> 
> Anyway, I thought how the counting and barriers might work when
> we update the global counter immediately. And I came up with
> the following:

Nice! That should be doing the right thing.

> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..8bc043fbe89c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   	int max_count = sysctl_hung_task_check_count;
>   	unsigned long last_break = jiffies;
>   	struct task_struct *g, *t;
> -	unsigned long total_count, this_round_count;
> +	unsigned long this_round_count;
>   	int need_warning = sysctl_hung_task_warnings;
>   	unsigned long si_mask = hung_task_si_mask;
>   
> -	/*
> -	 * The counter might get reset. Remember the initial value.
> -	 * Acquire prevents reordering task checks before this point.
> -	 */
> -	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>   	/*
>   	 * If the system crashed already then all bets are off,
>   	 * do not report extra hung tasks:
> @@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   		}
>   
>   		if (task_is_hung(t, timeout)) {
> +			/*
> +			 * Increment the global counter so that userspace could
> +			 * start migrating tasks ASAP. But count the current
> +			 * round separately because userspace could reset
> +			 * the global counter at any time.
> +			 */
> +			atomic_long_inc(&sysctl_hung_task_detect_count);


Atomic increment with relaxed ordering, which is good enough and works 
well, IIUC.
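
In C11 terms the per-detection update amounts to a relaxed fetch-add
(a userspace sketch; the kernel spells this atomic_long_inc(), whose
non-value-returning form likewise carries no ordering):

```c
#include <stdatomic.h>

/* Userspace stand-in for sysctl_hung_task_detect_count. */
static atomic_long detect_count = 0;

/* Bump the global counter as soon as a hung task is found. Relaxed
 * ordering suffices: readers merely poll the raw value via procfs
 * and never pair it with other scan-side state. */
static void note_hung_task(void)
{
	atomic_fetch_add_explicit(&detect_count, 1, memory_order_relaxed);
}
```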


>   			this_round_count++;
>   			hung_task_info(t, timeout, this_round_count);
>   		}
> @@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   	if (!this_round_count)
>   		return;
>   
> -	/*
> -	 * Do not count this round when the global counter has been reset
> -	 * during this check. Release ensures we see all hang details
> -	 * recorded during the scan.
> -	 */
> -	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
> -				    total_count, total_count +
> -				    this_round_count);
> -
>   	if (need_warning || hung_task_call_panic) {
>   		si_mask |= SYS_INFO_LOCKS;
>   
> 
> I am not sure of the comment above the increment is needed.
> Well, it might help anyone to understand the motivation without
> digging in the git log history.

Looks good to me. Could you send it as a follow-up patch?

Cheers,
Lance


* [PATCH] hung_task: Increment the global counter immediately
  2026-02-03 11:01         ` Lance Yang
@ 2026-02-04 11:04           ` Petr Mladek
  2026-02-04 11:21             ` Lance Yang
                               ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Petr Mladek @ 2026-02-04 11:04 UTC (permalink / raw)
  To: Lance Yang
  Cc: Aaron Tomlin, neelx, sean, akpm, mproche, chjohnst, nick.lange,
	linux-kernel, mhiramat, joel.granados, gregkh

A recent change allowed the global counter of hung tasks to be reset
via the sysctl interface. A potential race with the regular check was
resolved by updating the global counter only once at the end of the check.

However, the hung task check can take a significant amount of time,
particularly when task information is being dumped to slow serial
consoles. Some users monitor this global counter to trigger immediate
migration of critical containers. Delaying the increment until the
full check completes postpones these high-priority rescue operations.

Update the global counter as soon as a hung task is detected. Since
the value is read asynchronously, a relaxed atomic operation is
sufficient.

Reported-by: Lance Yang <lance.yang@linux.dev>
Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
This is a followup patch for
https://lore.kernel.org/r/20260125135848.3356585-1-atomlin@atomlin.com

Note that I could not use commit IDs because the original
patchset is not in a stable tree yet. In fact, it seems
that it is not even in linux-next at the moment.

Best Regards,
Petr

 kernel/hung_task.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 350093de0535..8bc043fbe89c 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long total_count, this_round_count;
+	unsigned long this_round_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
-	/*
-	 * The counter might get reset. Remember the initial value.
-	 * Acquire prevents reordering task checks before this point.
-	 */
-	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		}
 
 		if (task_is_hung(t, timeout)) {
+			/*
+			 * Increment the global counter so that userspace could
+			 * start migrating tasks ASAP. But count the current
+			 * round separately because userspace could reset
+			 * the global counter at any time.
+			 */
+			atomic_long_inc(&sysctl_hung_task_detect_count);
 			this_round_count++;
 			hung_task_info(t, timeout, this_round_count);
 		}
@@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (!this_round_count)
 		return;
 
-	/*
-	 * Do not count this round when the global counter has been reset
-	 * during this check. Release ensures we see all hang details
-	 * recorded during the scan.
-	 */
-	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
-				    total_count, total_count +
-				    this_round_count);
-
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] hung_task: Increment the global counter immediately
  2026-02-04 11:04           ` [PATCH] hung_task: Increment the global counter immediately Petr Mladek
@ 2026-02-04 11:21             ` Lance Yang
  2026-02-04 14:00             ` Aaron Tomlin
  2026-02-04 18:05             ` Andrew Morton
  2 siblings, 0 replies; 19+ messages in thread
From: Lance Yang @ 2026-02-04 11:21 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Aaron Tomlin, neelx, sean, akpm, mproche, chjohnst, nick.lange,
	linux-kernel, mhiramat, joel.granados, gregkh



On 2026/2/4 19:04, Petr Mladek wrote:
> A recent change allowed to reset the global counter of hung tasks using
> the sysctl interface. A potential race with the regular check has been
> solved by updating the global counter only once at the end of the check.
> 
> However, the hung task check can take a significant amount of time,
> particularly when task information is being dumped to slow serial
> consoles. Some users monitor this global counter to trigger immediate
> migration of critical containers. Delaying the increment until the
> full check completes postpones these high-priority rescue operations.
> 
> Update the global counter as soon as a hung task is detected. Since
> the value is read asynchronously, a relaxed atomic operation is
> sufficient.
> 
> Reported-by: Lance Yang <lance.yang@linux.dev>
> Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---

Cool! Looks good to me:

Reviewed-by: Lance Yang <lance.yang@linux.dev>

> This is a followup patch for
> https://lore.kernel.org/r/20260125135848.3356585-1-atomlin@atomlin.com
> 
> Note that I could not use commit IDs because the original
> patchset is not in a stable tree yet. In fact, it seems
> that it is not even in linux-next at the moment.
> 
> Best Regards,
> Petr
> 
>   kernel/hung_task.c | 23 ++++++++---------------
>   1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..8bc043fbe89c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   	int max_count = sysctl_hung_task_check_count;
>   	unsigned long last_break = jiffies;
>   	struct task_struct *g, *t;
> -	unsigned long total_count, this_round_count;
> +	unsigned long this_round_count;
>   	int need_warning = sysctl_hung_task_warnings;
>   	unsigned long si_mask = hung_task_si_mask;
>   
> -	/*
> -	 * The counter might get reset. Remember the initial value.
> -	 * Acquire prevents reordering task checks before this point.
> -	 */
> -	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>   	/*
>   	 * If the system crashed already then all bets are off,
>   	 * do not report extra hung tasks:
> @@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   		}
>   
>   		if (task_is_hung(t, timeout)) {
> +			/*
> +			 * Increment the global counter so that userspace could
> +			 * start migrating tasks ASAP. But count the current
> +			 * round separately because userspace could reset
> +			 * the global counter at any time.
> +			 */
> +			atomic_long_inc(&sysctl_hung_task_detect_count);
>   			this_round_count++;
>   			hung_task_info(t, timeout, this_round_count);
>   		}
> @@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   	if (!this_round_count)
>   		return;
>   
> -	/*
> -	 * Do not count this round when the global counter has been reset
> -	 * during this check. Release ensures we see all hang details
> -	 * recorded during the scan.
> -	 */
> -	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
> -				    total_count, total_count +
> -				    this_round_count);
> -
>   	if (need_warning || hung_task_call_panic) {
>   		si_mask |= SYS_INFO_LOCKS;
>   


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] hung_task: Increment the global counter immediately
  2026-02-04 11:04           ` [PATCH] hung_task: Increment the global counter immediately Petr Mladek
  2026-02-04 11:21             ` Lance Yang
@ 2026-02-04 14:00             ` Aaron Tomlin
  2026-02-04 18:05             ` Andrew Morton
  2 siblings, 0 replies; 19+ messages in thread
From: Aaron Tomlin @ 2026-02-04 14:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Lance Yang, neelx, sean, akpm, mproche, chjohnst, nick.lange,
	linux-kernel, mhiramat, joel.granados, gregkh

On Wed, Feb 04, 2026 at 12:04:54PM +0100, Petr Mladek wrote:
> A recent change allowed to reset the global counter of hung tasks using
> the sysctl interface. A potential race with the regular check has been
> solved by updating the global counter only once at the end of the check.
> 
> However, the hung task check can take a significant amount of time,
> particularly when task information is being dumped to slow serial
> consoles. Some users monitor this global counter to trigger immediate
> migration of critical containers. Delaying the increment until the
> full check completes postpones these high-priority rescue operations.
> 
> Update the global counter as soon as a hung task is detected. Since
> the value is read asynchronously, a relaxed atomic operation is
> sufficient.
> 
> Reported-by: Lance Yang <lance.yang@linux.dev>
> Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
> This is a followup patch for
> https://lore.kernel.org/r/20260125135848.3356585-1-atomlin@atomlin.com
> 
> Note that I could not use commit IDs because the original
> patchset is not in a stable tree yet. In fact, it seems
> that it is not even in linux-next at the moment.
> 
> Best Regards,
> Petr
> 
>  kernel/hung_task.c | 23 ++++++++---------------
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..8bc043fbe89c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
>  	struct task_struct *g, *t;
> -	unsigned long total_count, this_round_count;
> +	unsigned long this_round_count;
>  	int need_warning = sysctl_hung_task_warnings;
>  	unsigned long si_mask = hung_task_si_mask;
>  
> -	/*
> -	 * The counter might get reset. Remember the initial value.
> -	 * Acquire prevents reordering task checks before this point.
> -	 */
> -	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>  	/*
>  	 * If the system crashed already then all bets are off,
>  	 * do not report extra hung tasks:
> @@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  		}
>  
>  		if (task_is_hung(t, timeout)) {
> +			/*
> +			 * Increment the global counter so that userspace could
> +			 * start migrating tasks ASAP. But count the current
> +			 * round separately because userspace could reset
> +			 * the global counter at any time.
> +			 */
> +			atomic_long_inc(&sysctl_hung_task_detect_count);
>  			this_round_count++;
>  			hung_task_info(t, timeout, this_round_count);
>  		}
> @@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	if (!this_round_count)
>  		return;
>  
> -	/*
> -	 * Do not count this round when the global counter has been reset
> -	 * during this check. Release ensures we see all hang details
> -	 * recorded during the scan.
> -	 */
> -	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
> -				    total_count, total_count +
> -				    this_round_count);
> -
>  	if (need_warning || hung_task_call_panic) {
>  		si_mask |= SYS_INFO_LOCKS;
>  
> -- 
> 2.52.0
> 

Agreed.
This is correct given the architectural shift from "Batched" to "Immediate"
updates.

Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>

-- 
Aaron Tomlin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count
  2026-02-03  9:03       ` Petr Mladek
  2026-02-03 11:01         ` Lance Yang
@ 2026-02-04 14:07         ` Aaron Tomlin
  1 sibling, 0 replies; 19+ messages in thread
From: Aaron Tomlin @ 2026-02-04 14:07 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Lance Yang, neelx, sean, akpm, mproche, chjohnst, nick.lange,
	linux-kernel, mhiramat, joel.granados, gregkh


On Tue, Feb 03, 2026 at 10:03:52AM +0100, Petr Mladek wrote:
> On Tue 2026-02-03 11:08:33, Lance Yang wrote:
> > On 2026/2/3 11:05, Lance Yang wrote:
> > > On 2026/1/25 21:58, Aaron Tomlin wrote:
> > > > The check_hung_task() function currently conflates two distinct
> > > > responsibilities: validating whether a task is hung and handling the
> > > > subsequent reporting (printing warnings, triggering panics, or
> > > > tracepoints).
> > > > 
> > > > This patch refactors the logic by introducing hung_task_info(), a
> > > > function dedicated solely to reporting. The actual detection check,
> > > > task_is_hung(), is hoisted into the primary loop within
> > > > check_hung_uninterruptible_tasks(). This separation clearly decouples
> > > > the mechanism of detection from the policy of reporting.
> > > > 
> > > > Furthermore, to facilitate future support for concurrent hung task
> > > > detection, the global sysctl_hung_task_detect_count variable is
> > > > converted from unsigned long to atomic_long_t. Consequently, the
> > > > counting logic is updated to accumulate the number of hung tasks locally
> > > > (this_round_count) during the iteration. The global counter is then
> > > > updated atomically via atomic_long_cmpxchg_relaxed() once the loop
> > > > concludes, rather than incrementally during the scan.
> > > > 
> > > > These changes are strictly preparatory and introduce no functional
> > > > change to the system's runtime behaviour.
> > > > 
> > > > Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> > > > ---
> > > >   kernel/hung_task.c | 58 ++++++++++++++++++++++++++--------------------
> > > >   1 file changed, 33 insertions(+), 25 deletions(-)
> > > > 
> > > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > > index d2254c91450b..df10830ed9ef 100644
> > > > --- a/kernel/hung_task.c
> > > > +++ b/kernel/hung_task.c
> > > > @@ -36,7 +36,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
> > > >   /*
> > > >    * Total number of tasks detected as hung since boot:
> > > >    */
> > > > -static unsigned long __read_mostly sysctl_hung_task_detect_count;
> > > > +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
> > > >   /*
> > > >    * Limit number of tasks checked in a batch.
> > > > @@ -223,31 +223,29 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
> > > >   }
> > > >   #endif
> > > > -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> > > > -        unsigned long prev_detect_count)
> > > > +/**
> > > > + * hung_task_info - Print diagnostic details for a hung task
> > > > + * @t: Pointer to the detected hung task.
> > > > + * @timeout: Timeout threshold for detecting hung tasks
> > > > + * @this_round_count: Count of hung tasks detected in the current iteration
> > > > + *
> > > > + * Print structured information about the specified hung task, if warnings
> > > > + * are enabled or if the panic batch threshold is exceeded.
> > > > + */
> > > > +static void hung_task_info(struct task_struct *t, unsigned long timeout,
> > > > +               unsigned long this_round_count)
> > > >   {
> > > > -    unsigned long total_hung_task;
> > > > -
> > > > -    if (!task_is_hung(t, timeout))
> > > > -        return;
> > > > -
> > > > -    /*
> > > > -     * This counter tracks the total number of tasks detected as hung
> > > > -     * since boot.
> > > > -     */
> > > > -    sysctl_hung_task_detect_count++;
> > > 
> > > Previously, the global detect count updated immediately when a hung task
> > > was found. BUT now, it only updates after the full scan finishes ...
> > > 
> > > Ideally, the count should update as soon as possible, so that userspace
> > > can react in time :)
> > > 
> > > For example, by migrating critical containers away from the node before
> > > the situation gets worse - something we already do.
> > 
> > Sorry, I should have said that earlier - just realized it ...
> 
> Better late than sorry ;-)
> 
> That said, is the delay really critical? I guess that the userspace
> checks the counter in regular intervals (seconds or tens of seconds).
> Or is there any way to get a notification immediately?
> 
> Anyway, I thought how the counting and barriers might work when
> we update the global counter immediately. And I came up with
> the following:
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 350093de0535..8bc043fbe89c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -302,15 +302,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
>  	struct task_struct *g, *t;
> -	unsigned long total_count, this_round_count;
> +	unsigned long this_round_count;
>  	int need_warning = sysctl_hung_task_warnings;
>  	unsigned long si_mask = hung_task_si_mask;
>  
> -	/*
> -	 * The counter might get reset. Remember the initial value.
> -	 * Acquire prevents reordering task checks before this point.
> -	 */
> -	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>  	/*
>  	 * If the system crashed already then all bets are off,
>  	 * do not report extra hung tasks:
> @@ -330,6 +325,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  		}
>  
>  		if (task_is_hung(t, timeout)) {
> +			/*
> +			 * Increment the global counter so that userspace could
> +			 * start migrating tasks ASAP. But count the current
> +			 * round separately because userspace could reset
> +			 * the global counter at any time.
> +			 */
> +			atomic_long_inc(&sysctl_hung_task_detect_count);
>  			this_round_count++;
>  			hung_task_info(t, timeout, this_round_count);
>  		}
> @@ -340,15 +342,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	if (!this_round_count)
>  		return;
>  
> -	/*
> -	 * Do not count this round when the global counter has been reset
> -	 * during this check. Release ensures we see all hang details
> -	 * recorded during the scan.
> -	 */
> -	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
> -				    total_count, total_count +
> -				    this_round_count);
> -
>  	if (need_warning || hung_task_call_panic) {
>  		si_mask |= SYS_INFO_LOCKS;
>  
> 
> I am not sure of the comment above the increment is needed.
> Well, it might help anyone to understand the motivation without
> digging in the git log history.

Hi Petr,

Agreed.

By moving to a "relaxed" atomic_long_inc(), we now rely on the
atomicity of each individual operation. If a user resets the counter
(writes 0) concurrently:

        CPU 0                               CPU 1
khungtaskd increments (1)
                                        User resets (0)
khungtaskd increments (1)


In the above, the counter reflects 1. This is acceptable behaviour for a
"live" counter. The strict protection against "lost updates" needed by
the batched calculation (i.e., old + new) is not required for a simple
atomic increment.


Kind regards,
-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] hung_task: Increment the global counter immediately
  2026-02-04 11:04           ` [PATCH] hung_task: Increment the global counter immediately Petr Mladek
  2026-02-04 11:21             ` Lance Yang
  2026-02-04 14:00             ` Aaron Tomlin
@ 2026-02-04 18:05             ` Andrew Morton
  2026-02-06 20:54               ` Aaron Tomlin
  2 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2026-02-04 18:05 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Lance Yang, Aaron Tomlin, neelx, sean, mproche, chjohnst,
	nick.lange, linux-kernel, mhiramat, joel.granados, gregkh

On Wed, 4 Feb 2026 12:04:54 +0100 Petr Mladek <pmladek@suse.com> wrote:

> A recent change allowed to reset the global counter of hung tasks using
> the sysctl interface. A potential race with the regular check has been
> solved by updating the global counter only once at the end of the check.
> 
> However, the hung task check can take a significant amount of time,
> particularly when task information is being dumped to slow serial
> consoles. Some users monitor this global counter to trigger immediate
> migration of critical containers. Delaying the increment until the
> full check completes postpones these high-priority rescue operations.
> 
> Update the global counter as soon as a hung task is detected. Since
> the value is read asynchronously, a relaxed atomic operation is
> sufficient.
> 
> Reported-by: Lance Yang <lance.yang@linux.dev>
> Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
> This is a followup patch for
> https://lore.kernel.org/r/20260125135848.3356585-1-atomlin@atomlin.com
> 
> Note that I could not use commit IDs because the original
> patchset is not in a stable tree yet. In fact, it seems
> that it is not even in linux-next at the moment.

Yes, I've gone into "fixes and trivial stuff only" mode, as we're at -rc8.

Aaron, please incorporate Petr's fix into v8 and resend towards the end
of the merge window?

Thanks.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] hung_task: Increment the global counter immediately
  2026-02-04 18:05             ` Andrew Morton
@ 2026-02-06 20:54               ` Aaron Tomlin
  2026-02-07  6:10                 ` Lance Yang
  0 siblings, 1 reply; 19+ messages in thread
From: Aaron Tomlin @ 2026-02-06 20:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Lance Yang, neelx, sean, mproche, chjohnst,
	nick.lange, linux-kernel, mhiramat, joel.granados, gregkh


On Wed, Feb 04, 2026 at 10:05:49AM -0800, Andrew Morton wrote:
> On Wed, 4 Feb 2026 12:04:54 +0100 Petr Mladek <pmladek@suse.com> wrote:
> 
> > A recent change allowed to reset the global counter of hung tasks using
> > the sysctl interface. A potential race with the regular check has been
> > solved by updating the global counter only once at the end of the check.
> > 
> > However, the hung task check can take a significant amount of time,
> > particularly when task information is being dumped to slow serial
> > consoles. Some users monitor this global counter to trigger immediate
> > migration of critical containers. Delaying the increment until the
> > full check completes postpones these high-priority rescue operations.
> > 
> > Update the global counter as soon as a hung task is detected. Since
> > the value is read asynchronously, a relaxed atomic operation is
> > sufficient.
> > 
> > Reported-by: Lance Yang <lance.yang@linux.dev>
> > Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
> > Signed-off-by: Petr Mladek <pmladek@suse.com>
> > ---
> > This is a followup patch for
> > https://lore.kernel.org/r/20260125135848.3356585-1-atomlin@atomlin.com
> > 
> > Note that I could not use commit IDs because the original
> > patchset is not in a stable tree yet. In fact, it seems
> > that it is not even in linux-next at the moment.
> 
> Yes, I've gone into "fixes and trivial stuff only" mode, as we're at -rc8.
> 
> Aaron, please incorporate Petr's fix into v8 and resend towards the end
> of the merge window?
> 
> Thanks.

Hi Andrew,

Absolutely.

Kind regards,
-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] hung_task: Increment the global counter immediately
  2026-02-06 20:54               ` Aaron Tomlin
@ 2026-02-07  6:10                 ` Lance Yang
  0 siblings, 0 replies; 19+ messages in thread
From: Lance Yang @ 2026-02-07  6:10 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: Petr Mladek, neelx, sean, mproche, chjohnst, nick.lange,
	linux-kernel, mhiramat, joel.granados, gregkh, Andrew Morton



On 2026/2/7 04:54, Aaron Tomlin wrote:
> On Wed, Feb 04, 2026 at 10:05:49AM -0800, Andrew Morton wrote:
>> On Wed, 4 Feb 2026 12:04:54 +0100 Petr Mladek <pmladek@suse.com> wrote:
>>
>>> A recent change allowed to reset the global counter of hung tasks using
>>> the sysctl interface. A potential race with the regular check has been
>>> solved by updating the global counter only once at the end of the check.
>>>
>>> However, the hung task check can take a significant amount of time,
>>> particularly when task information is being dumped to slow serial
>>> consoles. Some users monitor this global counter to trigger immediate
>>> migration of critical containers. Delaying the increment until the
>>> full check completes postpones these high-priority rescue operations.
>>>
>>> Update the global counter as soon as a hung task is detected. Since
>>> the value is read asynchronously, a relaxed atomic operation is
>>> sufficient.
>>>
>>> Reported-by: Lance Yang <lance.yang@linux.dev>
>>> Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
>>> Signed-off-by: Petr Mladek <pmladek@suse.com>
>>> ---
>>> This is a followup patch for
>>> https://lore.kernel.org/r/20260125135848.3356585-1-atomlin@atomlin.com
>>>
>>> Note that I could not use commit IDs because the original
>>> patchset is not in a stable tree yet. In fact, it seems
>>> that it is not even in linux-next at the moment.
>>
>> Yes, I've gone into "fixes and trivial stuff only" mode, as we're at -rc8.
>>
>> Aaron, please incorporate Petr's fix into v8 and resend towards the end
>> of the merge window?
>>
>> Thanks.
> 
> Hi Andrew,
> 
> Absolutely.

Don't forget to credit Petr - just saying :)


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2026-02-07  6:10 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-25 13:58 [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
2026-01-25 13:58 ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
2026-02-02  6:10   ` Masami Hiramatsu
2026-02-02 12:59   ` Petr Mladek
2026-02-03  3:05   ` Lance Yang
2026-02-03  3:08     ` Lance Yang
2026-02-03  9:03       ` Petr Mladek
2026-02-03 11:01         ` Lance Yang
2026-02-04 11:04           ` [PATCH] hung_task: Increment the global counter immediately Petr Mladek
2026-02-04 11:21             ` Lance Yang
2026-02-04 14:00             ` Aaron Tomlin
2026-02-04 18:05             ` Andrew Morton
2026-02-06 20:54               ` Aaron Tomlin
2026-02-07  6:10                 ` Lance Yang
2026-02-04 14:07         ` [v7 PATCH 1/2] hung_task: Refactor detection logic and atomicise detection count Aaron Tomlin
2026-01-25 13:58 ` [v7 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
2026-02-02  6:09   ` Masami Hiramatsu
2026-02-02 13:26   ` Petr Mladek
2026-02-01 19:48 ` [v7 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox