[v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector

* [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
@ 2026-01-15  2:32 Aaron Tomlin
  2026-01-15  2:32 ` [v6 PATCH 1/2] hung_task: Convert detection count to atomic_long_t Aaron Tomlin
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Aaron Tomlin @ 2026-01-15  2:32 UTC
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: sean, linux-kernel

Hi Lance, Greg, Petr, Joel, Andrew,

This series introduces the ability to reset
/proc/sys/kernel/hung_task_detect_count.

Writing a "0" value to this file atomically resets the counter of detected
hung tasks. This functionality provides system administrators with the
means to clear the cumulative diagnostic history following incident
resolution, thereby simplifying subsequent monitoring without necessitating
a system restart.

The updated logic ensures that the long-running scan (which is inherently
preemptible and subject to rcu_lock_break()) does not become desynchronised
from the global state. By treating the initial read as a "version snapshot",
the kernel can guarantee that the cumulative count only updates if the
underlying state remained stable throughout the duration of the
scan.

Please let me know your thoughts.


Changes since v5 [1]:
 - Introduced a preparatory patch (Joel Granados)
 - Extended custom proc_handler to handle SYSCTL_USER_TO_KERN writes,
   strictly validating that only a value of "0" is permitted for resets
   (Joel Granados)
 - Transitioned from atomic_long_inc_return_relaxed() to a more robust
   read_acquire/cmpxchg_release pattern to ensure "All-or-Nothing" scan
   updates (Petr Mladek)
 - Re-introduced hung_task_diagnostics() for better readability and
   consistent metadata publication

Changes since v4 [2]:
 - Added missing underflow check (Lance Yang) 

Changes since v3 [3]:
 - Use atomic operations to ensure cross-CPU visibility and prevent an integer underflow
 - Use acquire/release semantics for memory ordering (Petr Mladek)
 - Move quoted string to a single line (Petr Mladek)
 - Remove variables coredump_msg and disable_msg to simplify code (Petr Mladek)
 - Add trailing "\n" to all strings to ensure immediate console flushing (Petr Mladek)
 - Improve the hung task counter documentation (Joel Granados)
 - Reject non-zero writes with -EINVAL (Joel Granados)
 - Translate to the new sysctl API (Petr Mladek)

Changes since v2 [4]:
 - Avoided a needless double update to hung_task_detect_count (Lance Yang)
 - Restored previous use of pr_err() for each message (Greg KH)
 - Provided a complete descriptive comment for the helper

Changes since v1 [5]:
 - Removed write-only sysfs attribute (Lance Yang)
 - Modified procfs hung_task_detect_count instead (Lance Yang)
 - Introduced a custom proc_handler
 - Updated documentation (Lance Yang)
 - Added 'static inline' as a hint to eliminate any function call overhead
 - Removed clutter through encapsulation

[1]: https://lore.kernel.org/lkml/20251231004125.2380105-1-atomlin@atomlin.com/
[2]: https://lore.kernel.org/lkml/20251222014210.2032214-1-atomlin@atomlin.com/
[3]: https://lore.kernel.org/all/20251216030036.1822217-1-atomlin@atomlin.com/
[4]: https://lore.kernel.org/lkml/20251211033004.1628875-1-atomlin@atomlin.com/
[5]: https://lore.kernel.org/lkml/20251209041218.1583600-1-atomlin@atomlin.com/

Aaron Tomlin (2):
  hung_task: Convert detection count to atomic_long_t
  hung_task: Enable runtime reset of hung_task_detect_count

 Documentation/admin-guide/sysctl/kernel.rst |   3 +-
 kernel/hung_task.c                          | 130 +++++++++++++++-----
 2 files changed, 99 insertions(+), 34 deletions(-)

-- 
2.51.0


* [v6 PATCH 1/2] hung_task: Convert detection count to atomic_long_t
  2026-01-15  2:32 [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
@ 2026-01-15  2:32 ` Aaron Tomlin
  2026-01-15  2:32 ` [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  2026-01-15  3:24 ` [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Lance Yang
  2 siblings, 0 replies; 12+ messages in thread
From: Aaron Tomlin @ 2026-01-15  2:32 UTC
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: sean, linux-kernel

To facilitate the future introduction of a runtime reset mechanism for
the hung task detector, it is necessary to manage
sysctl_hung_task_detect_count via atomic operations. This ensures that
concurrent modifications - specifically between the khungtaskd kernel
thread and potential future user-space writers - are handled safely
without the requirement for heavyweight locking.

Consequently, this patch converts the variable from unsigned long to
atomic_long_t. Accordingly, the increment logic within check_hung_task()
is updated to utilise atomic_long_inc_return_relaxed().

Furthermore, a custom proc_handler, proc_dohung_task_detect_count(), is
introduced to bridge the interface between the atomic variable and the
standard sysctl infrastructure. Note that as the sysctl entry retains
its read-only permission (0444) within the scope of this commit, the
handler implementation is currently restricted to read operations via a
proxy variable. The logic requisite for handling user-space writes is
reserved for a subsequent patch which will formally enable the reset
capability.
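
For illustration only (this is not part of the patch): the proxy-variable
approach described above can be modelled in userspace with C11 atomics.
The names proxy_demo_count, format_ulong() and show_count() are
hypothetical stand-ins for the counter, for a stock handler such as
proc_doulongvec_minmax(), and for the custom handler respectively.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for sysctl_hung_task_detect_count. */
static atomic_ulong proxy_demo_count;

/*
 * Hypothetical generic formatter: like a stock handler, it only knows
 * how to print a plain unsigned long reached through a pointer.
 */
static int format_ulong(const unsigned long *data, char *buf, size_t len)
{
	return snprintf(buf, len, "%lu\n", *data);
}

/*
 * Custom wrapper: copy the atomic into a local "proxy" value, then hand
 * the generic formatter a pointer to that copy instead of the atomic.
 */
static int show_count(char *buf, size_t len)
{
	unsigned long snapshot;

	assert(buf != NULL && len > 0);
	snapshot = atomic_load_explicit(&proxy_demo_count,
					memory_order_relaxed);
	return format_ulong(&snapshot, buf, len);
}
```

The generic code never touches the atomic directly; it only ever sees the
local copy, which is why the table entry no longer needs a .data pointer.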

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/hung_task.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d2254c91450b..b5ad7a755eb5 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -17,6 +17,7 @@
 #include <linux/export.h>
 #include <linux/panic_notifier.h>
 #include <linux/sysctl.h>
+#include <linux/atomic.h>
 #include <linux/suspend.h>
 #include <linux/utsname.h>
 #include <linux/sched/signal.h>
@@ -36,7 +37,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
 /*
  * Total number of tasks detected as hung since boot:
  */
-static unsigned long __read_mostly sysctl_hung_task_detect_count;
+static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
 
 /*
  * Limit number of tasks checked in a batch.
@@ -224,9 +225,9 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 #endif
 
 static void check_hung_task(struct task_struct *t, unsigned long timeout,
-		unsigned long prev_detect_count)
+			    unsigned long prev_detect_count)
 {
-	unsigned long total_hung_task;
+	unsigned long total_hung_task, cur_detect_count;
 
 	if (!task_is_hung(t, timeout))
 		return;
@@ -235,9 +236,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 	 * This counter tracks the total number of tasks detected as hung
 	 * since boot.
 	 */
-	sysctl_hung_task_detect_count++;
+	cur_detect_count = atomic_long_inc_return_relaxed(&sysctl_hung_task_detect_count);
+	total_hung_task = cur_detect_count - prev_detect_count;
 
-	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
 	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
@@ -305,10 +306,11 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+	unsigned long prev_detect_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
+	prev_detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -333,7 +335,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
  unlock:
 	rcu_read_unlock();
 
-	if (!(sysctl_hung_task_detect_count - prev_detect_count))
+	if (!(atomic_long_read(&sysctl_hung_task_detect_count) -
+				prev_detect_count))
 		return;
 
 	if (need_warning || hung_task_call_panic) {
@@ -358,6 +361,31 @@ static long hung_timeout_jiffies(unsigned long last_checked,
 }
 
 #ifdef CONFIG_SYSCTL
+
+/**
+ * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
+ * @table: Pointer to the struct ctl_table definition for this proc entry
+ * @dir: Flag indicating the operation
+ * @buffer: User space buffer for data transfer
+ * @lenp: Pointer to the length of the data being transferred
+ * @ppos: Pointer to the current file offset
+ *
+ * This handler is used for reading the current hung task detection count.
+ * Returns 0 on success or a negative error code on failure.
+ */
+static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
+					 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	unsigned long detect_count;
+	struct ctl_table proxy_table;
+
+	detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
+	proxy_table = *table;
+	proxy_table.data = &detect_count;
+
+	return proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
+}
+
 /*
  * Process updating of timeout sysctl
  */
@@ -438,10 +466,9 @@ static const struct ctl_table hung_task_sysctls[] = {
 	},
 	{
 		.procname	= "hung_task_detect_count",
-		.data		= &sysctl_hung_task_detect_count,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0444,
-		.proc_handler	= proc_doulongvec_minmax,
+		.proc_handler	= proc_dohung_task_detect_count,
 	},
 	{
 		.procname	= "hung_task_sys_info",
-- 
2.51.0


* [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-15  2:32 [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
  2026-01-15  2:32 ` [v6 PATCH 1/2] hung_task: Convert detection count to atomic_long_t Aaron Tomlin
@ 2026-01-15  2:32 ` Aaron Tomlin
  2026-01-15  3:06   ` Lance Yang
  2026-01-15  3:24 ` [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Lance Yang
  2 siblings, 1 reply; 12+ messages in thread
From: Aaron Tomlin @ 2026-01-15  2:32 UTC
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: sean, linux-kernel

Currently, the hung_task_detect_count sysctl provides a cumulative count
of hung tasks since boot. In long-running, high-availability
environments, this counter may lose its utility if it cannot be reset
once an incident has been resolved. Furthermore, the previous
implementation relied upon implicit ordering, which could not strictly
guarantee that diagnostic metadata published by one CPU was visible to
the panic logic on another.

This patch introduces the capability to reset the detection count by
writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
has been updated to validate this input and atomically reset the
counter.
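
As a rough userspace model of the write-side validation (an assumption
about the intended behaviour, not the kernel code itself): parse the
written value, reject anything other than zero with -EINVAL, and only
then clear the counter. write_model_count and handle_reset_write() are
hypothetical names, and strtoul() merely stands in for the kernel's
parsing.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for sysctl_hung_task_detect_count. */
static atomic_ulong write_model_count;

/*
 * Model of the write path: returns 0 and clears the counter when the
 * input parses to zero, otherwise returns -EINVAL and leaves the
 * counter untouched.
 */
static int handle_reset_write(const char *buf)
{
	unsigned long val;
	char *end;

	assert(buf != NULL);

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;		/* out of range or not a number */

	if (*end == '\n')		/* tolerate the newline from echo */
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	if (val != 0)
		return -EINVAL;		/* only a reset to zero is allowed */

	atomic_store(&write_model_count, 0);
	return 0;
}
```

The counter is modified only after every check has passed, so a rejected
write leaves the existing count intact.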

The synchronisation of sysctl_hung_task_detect_count relies upon a
transactional model to ensure the integrity of the detection counter
against concurrent resets from userspace. The application of
atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
and provides the following guarantees:

    1. Prevention of Load-Store Reordering via Acquire Semantics. By
       utilising atomic_long_read_acquire() to snapshot the counter
       before initiating the task traversal, we establish a strict
       memory barrier. This prevents the compiler or hardware from
       reordering the initial load to a point later in the scan. Without
       this "acquire" barrier, a delayed load could potentially read a
       "0" value resulting from a userspace reset that occurred
       mid-scan. This would lead to the subsequent cmpxchg succeeding
       erroneously, thereby overwriting the user's reset with stale
       increment data.

    2. Atomicity of the "Commit" Phase via Release Semantics. The
       atomic_long_cmpxchg_release() serves as the transaction's commit
       point. The "release" barrier ensures that all diagnostic
       recordings and task-state observations made during the scan are
       globally visible before the counter is incremented.

    3. Race Condition Resolution. This pairing effectively detects any
       "out-of-band" reset of the counter. If
       sysctl_hung_task_detect_count is modified via the procfs
       interface during the scan, the final cmpxchg will detect the
       discrepancy between the current value and the "acquire" snapshot.
       Consequently, the update will fail, ensuring that a reset command
       from the administrator is prioritised over a scan that may have
       been invalidated by that very reset.
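
The snapshot/commit scheme above can be sketched in userspace with C11
atomics. This is a minimal model under stated assumptions, not the
kernel implementation: scan_model_count, scan_begin(), scan_commit() and
reset_count() are hypothetical names, and the C11 memory orders stand in
for atomic_long_read_acquire() and atomic_long_cmpxchg_release().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for sysctl_hung_task_detect_count. */
static atomic_long scan_model_count;

/* Start of a scan: snapshot the counter with acquire ordering. */
static long scan_begin(void)
{
	return atomic_load_explicit(&scan_model_count, memory_order_acquire);
}

/*
 * End of a scan: add this round's detections only if the counter still
 * holds the snapshot. Returns false, leaving the counter untouched,
 * when the value changed while the scan was running.
 */
static bool scan_commit(long snapshot, long this_round)
{
	assert(this_round > 0);
	return atomic_compare_exchange_strong_explicit(&scan_model_count,
						       &snapshot,
						       snapshot + this_round,
						       memory_order_release,
						       memory_order_relaxed);
}

/* Model of a userspace write of "0". */
static void reset_count(void)
{
	atomic_store_explicit(&scan_model_count, 0, memory_order_relaxed);
}
```

With the counter at 5, a reset between scan_begin() and scan_commit()
makes the commit fail and the counter stays at 0. Note that a
compare-and-exchange only compares values: a reset that leaves the
counter equal to the snapshot (i.e. the counter was already 0) is not
detected, and that round is still counted.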

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |   3 +-
 kernel/hung_task.c                          | 109 +++++++++++++-------
 2 files changed, 75 insertions(+), 37 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 239da22c4e28..68da4235225a 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -418,7 +418,8 @@ hung_task_detect_count
 ======================
 
 Indicates the total number of tasks that have been detected as hung since
-the system boot.
+the system boot or since the counter was reset. The counter is zeroed when
+a value of 0 is written.
 
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index b5ad7a755eb5..2eb9c861bdcc 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -224,24 +224,43 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout,
-			    unsigned long prev_detect_count)
+/**
+ * hung_task_diagnostics - Print structured diagnostic info for a hung task.
+ * @t: Pointer to the detected hung task.
+ *
+ * This function consolidates the printing of core diagnostic information
+ * for a task found to be blocked.
+ */
+static inline void hung_task_diagnostics(struct task_struct *t)
 {
-	unsigned long total_hung_task, cur_detect_count;
-
-	if (!task_is_hung(t, timeout))
-		return;
-
-	/*
-	 * This counter tracks the total number of tasks detected as hung
-	 * since boot.
-	 */
-	cur_detect_count = atomic_long_inc_return_relaxed(&sysctl_hung_task_detect_count);
-	total_hung_task = cur_detect_count - prev_detect_count;
+	unsigned long blocked_secs = (jiffies - t->last_switch_time) / HZ;
+
+	pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
+		t->comm, t->pid, blocked_secs);
+	pr_err("      %s %s %.*s\n",
+		print_tainted(), init_utsname()->release,
+		(int)strcspn(init_utsname()->version, " "),
+		init_utsname()->version);
+	if (t->flags & PF_POSTCOREDUMP)
+		pr_err("      Blocked by coredump.\n");
+	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
+}
 
+/**
+ * hung_task_info - Print diagnostic details for a hung task
+ * @t: Pointer to the detected hung task.
+ * @timeout: Timeout threshold for detecting hung tasks
+ * @this_round_count: Count of hung tasks detected in the current iteration
+ *
+ * Print structured information about the specified hung task, if warnings
+ * are enabled or if the panic batch threshold is exceeded.
+ */
+static void hung_task_info(struct task_struct *t, unsigned long timeout,
+			   unsigned long this_round_count)
+{
 	trace_sched_process_hang(t);
 
-	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+	if (sysctl_hung_task_panic && this_round_count >= sysctl_hung_task_panic) {
 		console_verbose();
 		hung_task_call_panic = true;
 	}
@@ -251,18 +270,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 	 * complain:
 	 */
 	if (sysctl_hung_task_warnings || hung_task_call_panic) {
-		if (sysctl_hung_task_warnings > 0)
-			sysctl_hung_task_warnings--;
-		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
-		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
-		pr_err("      %s %s %.*s\n",
-			print_tainted(), init_utsname()->release,
-			(int)strcspn(init_utsname()->version, " "),
-			init_utsname()->version);
-		if (t->flags & PF_POSTCOREDUMP)
-			pr_err("      Blocked by coredump.\n");
-		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
-			" disables this message.\n");
+		hung_task_diagnostics(t);
 		sched_show_task(t);
 		debug_show_blocker(t, timeout);
 
@@ -306,11 +314,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long prev_detect_count;
+	unsigned long total_count, this_round_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
-	prev_detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
+	/* The counter might get reset. Remember the initial value.
+	 * Acquire prevents reordering task checks before this point.
+	 */
+	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -318,7 +329,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (test_taint(TAINT_DIE) || did_panic)
 		return;
 
-
+	this_round_count = 0;
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
 
@@ -330,15 +341,26 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout, prev_detect_count);
+		if (task_is_hung(t, timeout)) {
+			this_round_count++;
+			hung_task_info(t, timeout, this_round_count);
+		}
 	}
  unlock:
 	rcu_read_unlock();
 
-	if (!(atomic_long_read(&sysctl_hung_task_detect_count) -
-				prev_detect_count))
+	if (!this_round_count)
 		return;
 
+	/*
+	 * Do not count this round when the global counter has been reset
+	 * during this check. Release ensures we see all hang details
+	 * recorded during the scan.
+	 */
+	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
+				    total_count, total_count +
+				    this_round_count);
+
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;
 
@@ -370,20 +392,35 @@ static long hung_timeout_jiffies(unsigned long last_checked,
  * @lenp: Pointer to the length of the data being transferred
  * @ppos: Pointer to the current file offset
  *
- * This handler is used for reading the current hung task detection count.
- * Returns 0 on success or a negative error code on failure.
+ * This handler is used for reading the current hung task detection count
+ * and for resetting it to zero when a write operation is performed using a
+ * zero value only. Returns 0 on success or a negative error code on
+ * failure.
  */
 static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
 					 void *buffer, size_t *lenp, loff_t *ppos)
 {
 	unsigned long detect_count;
 	struct ctl_table proxy_table;
+	int err;
 
-	detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
 	proxy_table = *table;
 	proxy_table.data = &detect_count;
 
-	return proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
+	if (SYSCTL_KERN_TO_USER(dir))
+		detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
+
+	err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
+	if (err < 0)
+		return err;
+
+	if (SYSCTL_USER_TO_KERN(dir)) {
+		if (detect_count)
+			return -EINVAL;
+		atomic_long_set(&sysctl_hung_task_detect_count, 0);
+	}
+
+	return 0;
 }
 
 /*
-- 
2.51.0


* Re: [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-15  2:32 ` [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
@ 2026-01-15  3:06   ` Lance Yang
  2026-01-15 18:24     ` Aaron Tomlin
  0 siblings, 1 reply; 12+ messages in thread
From: Lance Yang @ 2026-01-15  3:06 UTC
  To: Aaron Tomlin
  Cc: sean, linux-kernel, pmladek, gregkh, mhiramat, akpm,
	joel.granados



On 2026/1/15 10:32, Aaron Tomlin wrote:
> Currently, the hung_task_detect_count sysctl provides a cumulative count
> of hung tasks since boot. In long-running, high-availability
> environments, this counter may lose its utility if it cannot be reset
> once an incident has been resolved. Furthermore, the previous
> implementation relied upon implicit ordering, which could not strictly
> guarantee that diagnostic metadata published by one CPU was visible to
> the panic logic on another.
> 
> This patch introduces the capability to reset the detection count by
> writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
> has been updated to validate this input and atomically reset the
> counter.
> 
> The synchronisation of sysctl_hung_task_detect_count relies upon a
> transactional model to ensure the integrity of the detection counter
> against concurrent resets from userspace. The application of
> atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
> and provides the following guarantees:
> 
>      1. Prevention of Load-Store Reordering via Acquire Semantics By
>         utilising atomic_long_read_acquire() to snapshot the counter
>         before initiating the task traversal, we establish a strict
>         memory barrier. This prevents the compiler or hardware from
>         reordering the initial load to a point later in the scan. Without
>         this "acquire" barrier, a delayed load could potentially read a
>         "0" value resulting from a userspace reset that occurred
>         mid-scan. This would lead to the subsequent cmpxchg succeeding
>         erroneously, thereby overwriting the user's reset with stale
>         increment data.
> 
>      2. Atomicity of the "Commit" Phase via Release Semantics The
>         atomic_long_cmpxchg_release() serves as the transaction's commit
>         point. The "release" barrier ensures that all diagnostic
>         recordings and task-state observations made during the scan are
>         globally visible before the counter is incremented.
> 
>      3. Race Condition Resolution This pairing effectively detects any
>         "out-of-band" reset of the counter. If
>         sysctl_hung_task_detect_count is modified via the procfs
>         interface during the scan, the final cmpxchg will detect the
>         discrepancy between the current value and the "acquire" snapshot.
>         Consequently, the update will fail, ensuring that a reset command
>         from the administrator is prioritised over a scan that may have
>         been invalidated by that very reset.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>   Documentation/admin-guide/sysctl/kernel.rst |   3 +-
>   kernel/hung_task.c                          | 109 +++++++++++++-------
>   2 files changed, 75 insertions(+), 37 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 239da22c4e28..68da4235225a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -418,7 +418,8 @@ hung_task_detect_count
>   ======================
>   
>   Indicates the total number of tasks that have been detected as hung since
> -the system boot.
> +the system boot or since the counter was reset. The counter is zeroed when
> +a value of 0 is written.
>   
>   This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>   
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index b5ad7a755eb5..2eb9c861bdcc 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -224,24 +224,43 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
>   }
>   #endif
>   
> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -			    unsigned long prev_detect_count)
> +/**
> + * hung_task_diagnostics - Print structured diagnostic info for a hung task.
> + * @t: Pointer to the detected hung task.
> + *
> + * This function consolidates the printing of core diagnostic information
> + * for a task found to be blocked.
> + */
> +static inline void hung_task_diagnostics(struct task_struct *t)
>   {
> -	unsigned long total_hung_task, cur_detect_count;
> -
> -	if (!task_is_hung(t, timeout))
> -		return;
> -
> -	/*
> -	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot.
> -	 */
> -	cur_detect_count = atomic_long_inc_return_relaxed(&sysctl_hung_task_detect_count);
> -	total_hung_task = cur_detect_count - prev_detect_count;
> +	unsigned long blocked_secs = (jiffies - t->last_switch_time) / HZ;
> +
> +	pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> +		t->comm, t->pid, blocked_secs);
> +	pr_err("      %s %s %.*s\n",
> +		print_tainted(), init_utsname()->release,
> +		(int)strcspn(init_utsname()->version, " "),
> +		init_utsname()->version);
> +	if (t->flags & PF_POSTCOREDUMP)
> +		pr_err("      Blocked by coredump.\n");
> +	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
> +}

I see hung_task_diagnostics() is still in this patch. I thought
we'd concluded that[1] the refactoring wasn't really necessary for a
single-use block?

[1] https://lore.kernel.org/all/noze3vhqjbsuulvvoaw4h5yeinggpwfslrit5vsd2dllfo4ath@qgmp22hoibgn/

* Re: [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
  2026-01-15  2:32 [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
  2026-01-15  2:32 ` [v6 PATCH 1/2] hung_task: Convert detection count to atomic_long_t Aaron Tomlin
  2026-01-15  2:32 ` [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
@ 2026-01-15  3:24 ` Lance Yang
  2026-01-15 18:18   ` Aaron Tomlin
  2 siblings, 1 reply; 12+ messages in thread
From: Lance Yang @ 2026-01-15  3:24 UTC
  To: Aaron Tomlin
  Cc: sean, linux-kernel, pmladek, gregkh, akpm, joel.granados,
	mhiramat



On 2026/1/15 10:32, Aaron Tomlin wrote:
> Hi Lance, Greg, Petr, Joel, Andrew,
> 
> This series introduces the ability to reset
> /proc/sys/kernel/hung_task_detect_count.
> 
> Writing a "0" value to this file atomically resets the counter of detected
> hung tasks. This functionality provides system administrators with the
> means to clear the cumulative diagnostic history following incident
> resolution, thereby simplifying subsequent monitoring without necessitating
> a system restart.
> 
> The updated logic ensures that the long-running scan (which is inherently
> preemptible and subject to rcu_lock_break()) does not become desynchronised
> from the global state. By treating the initial read as a "version snapshot"
> the kernel can guarantee that the cumulative count only updates if the
> underlying state remained stable throughout the duration of the
> scan.
> 
> Please let me know your thoughts.

There is a mismatch here with what Joel and Petr suggested ...

IIUC, we should just do:
- Patch 1: Full cmpxchg-based counting (Petr's POC), sysctl read-only
- Patch 2: Add write handler for userspace reset

That way Patch 1 is the real logic change, and Patch 2 is just adding
the userspace interface.


Thanks,
Lance

* Re: [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
  2026-01-15  3:24 ` [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Lance Yang
@ 2026-01-15 18:18   ` Aaron Tomlin
  2026-01-16  2:22     ` Lance Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Aaron Tomlin @ 2026-01-15 18:18 UTC
  To: Lance Yang
  Cc: sean, linux-kernel, pmladek, gregkh, akpm, joel.granados,
	mhiramat

On Thu, Jan 15, 2026 at 11:24:13AM +0800, Lance Yang wrote:
> IIUC, we should just do:
> - Patch 1: Full cmpxchg-based counting (Petr's POC), sysctl read-only
> - Patch 2: Add write handler for userspace reset
> 
> That way Patch 1 is the real logic change, and Patch 2 is just adding
> the userspace interface.

Hi Lance,

Thank you for your feedback.
If I am not mistaken, Joel suggested the following structure [1]:

    1. Create a preparatory patch to change the data type to atomic_long_t
    2. Introduce the required functionality to support a reset to "0"

[1]: https://lore.kernel.org/lkml/d4vx6k7d4tlagesq55yrbma26i2nyt4tpijkme6ckioeyfqfec@txrs27niaj2m/

Kind regards,
-- 
Aaron Tomlin


* Re: [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-15  3:06   ` Lance Yang
@ 2026-01-15 18:24     ` Aaron Tomlin
  2026-01-16  2:10       ` Lance Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Aaron Tomlin @ 2026-01-15 18:24 UTC
  To: Lance Yang
  Cc: sean, linux-kernel, pmladek, gregkh, mhiramat, akpm,
	joel.granados

On Thu, Jan 15, 2026 at 11:06:16AM +0800, Lance Yang wrote:
> I see hung_task_diagnostics() is still in this patch. I thought
> we'd concluded that[1] the refactoring wasn't really necessary for a
> single-use block?
> 
> [1] https://lore.kernel.org/all/noze3vhqjbsuulvvoaw4h5yeinggpwfslrit5vsd2dllfo4ath@qgmp22hoibgn/

Hi Lance,

Please accept my apologies for the oversight; I certainly did not intend to
disregard our previous conclusion regarding the refactoring.

However, in light of the additional modifications suggested by Petr, it
appeared to me that re-introducing hung_task_diagnostics() resulted in a
significantly cleaner implementation. By encapsulating the diagnostic
output logic, we separate the formatting concerns from the control flow,
which seemed to improve the overall readability of the function.

That being said, if you still consider this abstraction to be redundant, I
am entirely amenable to dropping it and reverting to the inline approach.

Please let me know your preference.

Kind regards,
-- 
Aaron Tomlin


* Re: [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-15 18:24     ` Aaron Tomlin
@ 2026-01-16  2:10       ` Lance Yang
  0 siblings, 0 replies; 12+ messages in thread
From: Lance Yang @ 2026-01-16  2:10 UTC
  To: Aaron Tomlin
  Cc: sean, linux-kernel, pmladek, gregkh, mhiramat, akpm,
	joel.granados



On 2026/1/16 02:24, Aaron Tomlin wrote:
> On Thu, Jan 15, 2026 at 11:06:16AM +0800, Lance Yang wrote:
>> I see hung_task_diagnostics() is still in this patch. I thought
>> we'd concluded that[1] the refactoring wasn't really necessary for a
>> single-use block?
>>
>> [1] https://lore.kernel.org/all/noze3vhqjbsuulvvoaw4h5yeinggpwfslrit5vsd2dllfo4ath@qgmp22hoibgn/
> 
> Hi Lance,
> 
> Please accept my apologies for the oversight; I certainly did not intend to
> disregard our previous conclusion regarding the refactoring.
> 
> However, in light of the additional modifications suggested by Petr, it
> appeared to me that re-introducing hung_task_diagnostics() resulted in a
> significantly cleaner implementation. By encapsulating the diagnostic
> output logic, we separate the formatting concerns from the control flow,
> which seemed to improve the overall readability of the function.
> 
> That being said, if you still consider this abstraction to be redundant, I
> am entirely amenable to dropping it and reverting to the inline approach.
> 
> Please let me know your preference.

Thanks for explaining!

Personally, I still do not think the helper is necessary here.

Especially for a single-use block, the abstraction doesn't add much
value.

The inline version is actually more straightforward - you can see
the diagnostic output right where it's used without jumping to another
function.

Please drop the helper and keep that code inline :)


* Re: [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
  2026-01-15 18:18   ` Aaron Tomlin
@ 2026-01-16  2:22     ` Lance Yang
  2026-01-20  9:46       ` Petr Mladek
  0 siblings, 1 reply; 12+ messages in thread
From: Lance Yang @ 2026-01-16  2:22 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: sean, linux-kernel, pmladek, gregkh, akpm, joel.granados,
	mhiramat



On 2026/1/16 02:18, Aaron Tomlin wrote:
> On Thu, Jan 15, 2026 at 11:24:13AM +0800, Lance Yang wrote:
>> IIUC, we should just do:
>> - Patch 1: Full cmpxchg-based counting (Petr's POC), sysctl read-only
>> - Patch 2: Add write handler for userspace reset
>>
>> That way Patch 1 is the real logic change, and Patch 2 is just adding
>> the userspace interface.
> 
> Hi Lance,
> 
> Thank you for your feedback.
> If I am not mistaken, Joel suggested the following structure [1]:
> 
>      1. Create a preparatory patch to change the data type to atomic_long_t
>      2. Introduce the required functionality to support a reset to "0"
> 
> [1]: https://lore.kernel.org/lkml/d4vx6k7d4tlagesq55yrbma26i2nyt4tpijkme6ckioeyfqfec@txrs27niaj2m/
> 

Yeah, either way works :)

But that way (changing to atomic with the old logic first, then
rewriting to the new logic) seems like it creates more churn
and makes review harder.

Or just all in one?

I'd hope Petr and Joel can comment.


* Re: [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
  2026-01-16  2:22     ` Lance Yang
@ 2026-01-20  9:46       ` Petr Mladek
  2026-01-20 11:48         ` Lance Yang
  2026-01-23  0:59         ` Aaron Tomlin
  0 siblings, 2 replies; 12+ messages in thread
From: Petr Mladek @ 2026-01-20  9:46 UTC (permalink / raw)
  To: Lance Yang
  Cc: Aaron Tomlin, sean, linux-kernel, gregkh, akpm, joel.granados,
	mhiramat

On Fri 2026-01-16 10:22:34, Lance Yang wrote:
> 
> 
> On 2026/1/16 02:18, Aaron Tomlin wrote:
> > On Thu, Jan 15, 2026 at 11:24:13AM +0800, Lance Yang wrote:
> > > IIUC, we should just do:
> > > - Patch 1: Full cmpxchg-based counting (Petr's POC), sysctl read-only
> > > - Patch 2: Add write handler for userspace reset
> > > 
> > > That way Patch 1 is the real logic change, and Patch 2 is just adding
> > > the userspace interface.
> > 
> > Hi Lance,
> > 
> > Thank you for your feedback.
> > If I am not mistaken, Joel suggested the following structure [1]:
> > 
> >      1. Create a preparatory patch to change the data type to atomic_long_t
> >      2. Introduce the required functionality to support a reset to "0"
> > 
> > [1]: https://lore.kernel.org/lkml/d4vx6k7d4tlagesq55yrbma26i2nyt4tpijkme6ckioeyfqfec@txrs27niaj2m/
> > 
> 
> Yeah, either way works :)
> 
> But that way (changing to atomic with the old logic first, then
> rewriting to the new logic) seems like it creates more churn
> and makes review harder.

I agree that adding the atomic and keeping the old logic is not
good. I would prefer to split it into two patches the following
way:

  1. Reshuffle the code so that "sysctl_hung_task_detect_count"
     gets incremented in check_hung_uninterruptible_tasks()
     and hung_task_info() will just get "this_round_count".

     Plus convert "sysctl_hung_task_detect_count" to atomic.

     It is the change that I suggested at
     https://lore.kernel.org/lkml/aWTzhLSWQRIGt8Xu@pathway.suse.cz/

     This way, it would be clear why the reshuffling was done.
     And the atomic operations will get the right acquire/release
     semantics right away.


   2. Add support to reset the counter to "0".

      It should be quite a simple patch, easy to review.


I think that this is how Joel meant it. We could even have 3 patches:

   1. Move "sysctl_hung_task_detect_count" increment to
      check_hung_uninterruptible_tasks().

   2. Convert the counter to atomic operations.

   3. Add reset to "0" support.

But I think that two patches might be good enough.
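
To make the intended semantics concrete, here is a minimal userspace
sketch of the snapshot/cmpxchg idea using C11 atomics. It is not the
kernel code: the names detect_count, scan_begin(), scan_commit(),
count_reset() and count_write() are hypothetical, and the string
comparison in count_write() is only a stand-in for the real sysctl
parsing. It assumes the scan takes an acquire snapshot first and
publishes "this_round_count" with a release cmpxchg, so that a reset
racing with the scan drops the whole round instead of applying part
of it.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for sysctl_hung_task_detect_count. */
static atomic_ulong detect_count;

/* Take the "version snapshot" before a scan starts. */
static unsigned long scan_begin(void)
{
	return atomic_load_explicit(&detect_count, memory_order_acquire);
}

/*
 * Publish the hangs found in one scan. The update succeeds only if the
 * counter still holds the snapshot value; otherwise the round is dropped.
 */
static bool scan_commit(unsigned long snapshot, unsigned long this_round_count)
{
	if (!this_round_count)
		return true;	/* nothing to publish */

	return atomic_compare_exchange_strong_explicit(&detect_count,
			&snapshot, snapshot + this_round_count,
			memory_order_release, memory_order_relaxed);
}

/* Reset path: what a write of "0" does in this model. */
static void count_reset(void)
{
	atomic_store_explicit(&detect_count, 0, memory_order_release);
}

/* Simplified write handler: only "0" (with optional newline) is accepted. */
static int count_write(const char *buf)
{
	if (strcmp(buf, "0") != 0 && strcmp(buf, "0\n") != 0)
		return -EINVAL;

	count_reset();
	return 0;
}
```

Note that in this simplified model a reset is invisible to the scan if
the counter was already zero when the snapshot was taken; the commit
then succeeds as if nothing had happened, which is harmless here.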

Best Regards,
Petr


* Re: [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
  2026-01-20  9:46       ` Petr Mladek
@ 2026-01-20 11:48         ` Lance Yang
  2026-01-23  0:59         ` Aaron Tomlin
  1 sibling, 0 replies; 12+ messages in thread
From: Lance Yang @ 2026-01-20 11:48 UTC (permalink / raw)
  To: Petr Mladek, Aaron Tomlin
  Cc: sean, linux-kernel, gregkh, akpm, joel.granados, mhiramat



On 2026/1/20 17:46, Petr Mladek wrote:
> On Fri 2026-01-16 10:22:34, Lance Yang wrote:
>>
>>
>> On 2026/1/16 02:18, Aaron Tomlin wrote:
>>> On Thu, Jan 15, 2026 at 11:24:13AM +0800, Lance Yang wrote:
>>>> IIUC, we should just do:
>>>> - Patch 1: Full cmpxchg-based counting (Petr's POC), sysctl read-only
>>>> - Patch 2: Add write handler for userspace reset
>>>>
>>>> That way Patch 1 is the real logic change, and Patch 2 is just adding
>>>> the userspace interface.
>>>
>>> Hi Lance,
>>>
>>> Thank you for your feedback.
>>> If I am not mistaken, Joel suggested the following structure [1]:
>>>
>>>       1. Create a preparatory patch to change the data type to atomic_long_t
>>>       2. Introduce the required functionality to support a reset to "0"
>>>
>>> [1]: https://lore.kernel.org/lkml/d4vx6k7d4tlagesq55yrbma26i2nyt4tpijkme6ckioeyfqfec@txrs27niaj2m/
>>>
>>
>> Yeah, either way works :)
>>
>> But that way (changing to atomic with the old logic first, then
>> rewriting to the new logic) seems like it creates more churn
>> and makes review harder.
> 
> I agree that adding the atomic and keeping the old logic is not
> good. I would prefer to split it into two patches the following
> way:
> 
>    1. Reshuffle the code so that "sysctl_hung_task_detect_count"
>       gets incremented in check_hung_uninterruptible_tasks()
>       and hung_task_info() will just get "this_round_count".
> 
>       Plus convert "sysctl_hung_task_detect_count" to atomic.
> 
>       It is the change that I suggested at
>       https://lore.kernel.org/lkml/aWTzhLSWQRIGt8Xu@pathway.suse.cz/
> 
>       This way, it would be clear why the reshuffling was done.
>       And the atomic operations will get the right acquire/release
>       semantics right away.
> 
> 
>     2. Add support to reset the counter to "0".
> 
>        It should be quite a simple patch, easy to review.

+1

Thanks,
Lance

> 
> 
> I think that this is how Joel meant it. We could even have 3 patches:
> 
>     1. Move "sysctl_hung_task_detect_count" increment to
>        check_hung_uninterruptible_tasks().
> 
>     2. Convert the counter to atomic operations.
> 
>     3. Add reset to "0" support.
> 
> But I think that two patches might be good enough.
> 
> Best Regards,
> Petr



* Re: [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
  2026-01-20  9:46       ` Petr Mladek
  2026-01-20 11:48         ` Lance Yang
@ 2026-01-23  0:59         ` Aaron Tomlin
  1 sibling, 0 replies; 12+ messages in thread
From: Aaron Tomlin @ 2026-01-23  0:59 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Lance Yang, sean, linux-kernel, gregkh, akpm, joel.granados,
	mhiramat


On Tue, Jan 20, 2026 at 10:46:14AM +0100, Petr Mladek wrote:
> I agree that adding the atomic and keeping the old logic is not
> good. I would prefer to split it into two patches the following
> way:
> 
>   1. Reshuffle the code so that "sysctl_hung_task_detect_count"
>      gets incremented in check_hung_uninterruptible_tasks()
>      and hung_task_info() will just get "this_round_count".
> 
>      Plus convert "sysctl_hung_task_detect_count" to atomic.
> 
>      It is the change that I suggested at
>      https://lore.kernel.org/lkml/aWTzhLSWQRIGt8Xu@pathway.suse.cz/
> 
>      This way, it would be clear why the reshuffling was done.
>      And the atomic operations will get the right acquire/release
>      semantics right away.
> 
> 
>    2. Add support to reset the counter to "0".
> 
>       It should be quite a simple patch, easy to review.

Acknowledged.

> I think that this is how Joel meant it. We could even have 3 patches:
> 
>    1. Move "sysctl_hung_task_detect_count" increment to
>       check_hung_uninterruptible_tasks().
> 
>    2. Convert the counter to atomic operations.
> 
>    3. Add reset to "0" support.
> 
> But I think that two patches might be good enough.

Understood. I'll sort it out.


Kind regards,
-- 
Aaron Tomlin


Thread overview: 12+ messages
2026-01-15  2:32 [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
2026-01-15  2:32 ` [v6 PATCH 1/2] hung_task: Convert detection count to atomic_long_t Aaron Tomlin
2026-01-15  2:32 ` [v6 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
2026-01-15  3:06   ` Lance Yang
2026-01-15 18:24     ` Aaron Tomlin
2026-01-16  2:10       ` Lance Yang
2026-01-15  3:24 ` [v6 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Lance Yang
2026-01-15 18:18   ` Aaron Tomlin
2026-01-16  2:22     ` Lance Yang
2026-01-20  9:46       ` Petr Mladek
2026-01-20 11:48         ` Lance Yang
2026-01-23  0:59         ` Aaron Tomlin
