public inbox for linux-kernel@vger.kernel.org
* [v5 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector
@ 2025-12-31  0:41 Aaron Tomlin
  2025-12-31  0:41 ` [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning Aaron Tomlin
  2025-12-31  0:41 ` [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  0 siblings, 2 replies; 22+ messages in thread
From: Aaron Tomlin @ 2025-12-31  0:41 UTC (permalink / raw)
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: sean, linux-kernel

Hi Lance, Greg, Petr, Joel, Andrew,

This series introduces the ability to reset
/proc/sys/kernel/hung_task_detect_count.

Writing a zero value to this file atomically resets the counter of detected
hung tasks. This functionality provides system administrators with the
means to clear the cumulative diagnostic history following incident
resolution, thereby simplifying subsequent monitoring without necessitating
a system restart.

The implementation uses atomic acquire/release semantics to ensure that
diagnostic metadata published by one CPU is correctly observed by the
monitoring thread on another CPU.
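For illustration only, the underflow-safe delta computation used by the
detector can be modelled in plain userspace C. The helper name hung_delta()
is hypothetical; the kernel open-codes this logic in check_hung_task() and
check_hung_uninterruptible_tasks():

```c
#include <assert.h>

/*
 * Userspace model of the underflow-safe delta: a snapshot of the
 * counter is taken before a scan (prev) and compared with the value
 * afterwards (cur). If the counter was reset mid-scan, cur is smaller
 * than prev, so cur itself is the number of hangs seen since the reset.
 */
static unsigned long hung_delta(unsigned long prev, unsigned long cur)
{
	return (cur >= prev) ? cur - prev : cur;
}
```

For example, prev = 5 and cur = 8 yields 3 new hangs, while a reset followed
by two detections (cur = 2) yields 2 instead of a near-ULONG_MAX wrapped
value.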

Please let me know your thoughts.


Changes since v4 [1]:
 - Added missing release semantics for memory ordering (Lance Yang)

Changes since v3 [2]:
 - Use atomic operations to ensure cross-CPU visibility and prevent an integer underflow
 - Use acquire/release semantics for memory ordering (Petr Mladek)
 - Move quoted string to a single line (Petr Mladek)
 - Remove variables coredump_msg and disable_msg to simplify code (Petr Mladek)
 - Add trailing "\n" to all strings to ensure immediate console flushing (Petr Mladek)
 - Improve the hung task counter documentation (Joel Granados)
 - Reject non-zero writes with -EINVAL (Joel Granados)
 - Translate to the new sysctl API (Petr Mladek)

Changes since v2 [3]:
 - Avoided a needless double update to hung_task_detect_count (Lance Yang)
 - Restored previous use of pr_err() for each message (Greg KH)
 - Provided a complete descriptive comment for the helper

Changes since v1 [4]:
 - Removed write-only sysfs attribute (Lance Yang)
 - Modified procfs hung_task_detect_count instead (Lance Yang)
 - Introduced a custom proc_handler
 - Updated documentation (Lance Yang)
 - Added 'static inline' as a hint to eliminate any function call overhead
 - Removed clutter through encapsulation

[1]: https://lore.kernel.org/lkml/20251222014210.2032214-1-atomlin@atomlin.com/
[2]: https://lore.kernel.org/all/20251216030036.1822217-1-atomlin@atomlin.com/
[3]: https://lore.kernel.org/lkml/20251211033004.1628875-1-atomlin@atomlin.com/
[4]: https://lore.kernel.org/lkml/20251209041218.1583600-1-atomlin@atomlin.com/

Aaron Tomlin (2):
  hung_task: Introduce helper for hung task warning
  hung_task: Enable runtime reset of hung_task_detect_count

 Documentation/admin-guide/sysctl/kernel.rst |   3 +-
 kernel/hung_task.c                          | 115 ++++++++++++++++----
 2 files changed, 96 insertions(+), 22 deletions(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning
  2025-12-31  0:41 [v5 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
@ 2025-12-31  0:41 ` Aaron Tomlin
  2026-01-01  9:49   ` Lance Yang
  2025-12-31  0:41 ` [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  1 sibling, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2025-12-31  0:41 UTC (permalink / raw)
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: sean, linux-kernel

Consolidate the multi-line console output block for reporting a hung
task into a new helper function, hung_task_diagnostics(). This improves
readability in the main check_hung_task() loop and makes the diagnostic
output structure easier to maintain and update in the future.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/hung_task.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d2254c91450b..00c3296fd692 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -223,6 +223,28 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 }
 #endif
 
+/**
+ * hung_task_diagnostics - Print structured diagnostic info for a hung task.
+ * @t: Pointer to the detected hung task.
+ *
+ * This function consolidates the printing of core diagnostic information
+ * for a task found to be blocked.
+ */
+static inline void hung_task_diagnostics(struct task_struct *t)
+{
+	unsigned long blocked_secs = (jiffies - t->last_switch_time) / HZ;
+
+	pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
+	       t->comm, t->pid, blocked_secs);
+	pr_err("      %s %s %.*s\n",
+	       print_tainted(), init_utsname()->release,
+	       (int)strcspn(init_utsname()->version, " "),
+	       init_utsname()->version);
+	if (t->flags & PF_POSTCOREDUMP)
+		pr_err("      Blocked by coredump.\n");
+	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
+}
+
 static void check_hung_task(struct task_struct *t, unsigned long timeout,
 		unsigned long prev_detect_count)
 {
@@ -252,16 +274,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 	if (sysctl_hung_task_warnings || hung_task_call_panic) {
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
-		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
-		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
-		pr_err("      %s %s %.*s\n",
-			print_tainted(), init_utsname()->release,
-			(int)strcspn(init_utsname()->version, " "),
-			init_utsname()->version);
-		if (t->flags & PF_POSTCOREDUMP)
-			pr_err("      Blocked by coredump.\n");
-		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
-			" disables this message.\n");
+		hung_task_diagnostics(t);
 		sched_show_task(t);
 		debug_show_blocker(t, timeout);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-31  0:41 [v5 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
  2025-12-31  0:41 ` [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning Aaron Tomlin
@ 2025-12-31  0:41 ` Aaron Tomlin
  2026-01-01  9:46   ` Lance Yang
                     ` (3 more replies)
  1 sibling, 4 replies; 22+ messages in thread
From: Aaron Tomlin @ 2025-12-31  0:41 UTC (permalink / raw)
  To: akpm, lance.yang, mhiramat, gregkh, pmladek, joel.granados
  Cc: sean, linux-kernel

Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.

Writing a value of zero to this file atomically resets the counter of
detected hung tasks. This grants system administrators the ability to
clear the cumulative diagnostic history after resolving an incident,
simplifying monitoring without requiring a system restart.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  3 +-
 kernel/hung_task.c                          | 82 ++++++++++++++++++---
 2 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 239da22c4e28..68da4235225a 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -418,7 +418,8 @@ hung_task_detect_count
 ======================
 
 Indicates the total number of tasks that have been detected as hung since
-the system boot.
+the system boot or since the counter was reset. The counter is zeroed when
+a value of 0 is written.
 
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 00c3296fd692..3bc72a4e4032 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -17,6 +17,7 @@
 #include <linux/export.h>
 #include <linux/panic_notifier.h>
 #include <linux/sysctl.h>
+#include <linux/atomic.h>
 #include <linux/suspend.h>
 #include <linux/utsname.h>
 #include <linux/sched/signal.h>
@@ -36,7 +37,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
 /*
  * Total number of tasks detected as hung since boot:
  */
-static unsigned long __read_mostly sysctl_hung_task_detect_count;
+static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
 
 /*
  * Limit number of tasks checked in a batch.
@@ -246,20 +247,26 @@ static inline void hung_task_diagnostics(struct task_struct *t)
 }
 
 static void check_hung_task(struct task_struct *t, unsigned long timeout,
-		unsigned long prev_detect_count)
+			    unsigned long prev_detect_count)
 {
-	unsigned long total_hung_task;
+	unsigned long total_hung_task, cur_detect_count;
 
 	if (!task_is_hung(t, timeout))
 		return;
 
 	/*
 	 * This counter tracks the total number of tasks detected as hung
-	 * since boot.
+	 * since boot. If a reset occurred during the scan, we treat the
+	 * current count as the new delta to avoid an underflow error.
+	 * Ensure hang details are globally visible before the counter
+	 * update.
 	 */
-	sysctl_hung_task_detect_count++;
+	cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);
+	if (cur_detect_count >= prev_detect_count)
+		total_hung_task = cur_detect_count - prev_detect_count;
+	else
+		total_hung_task = cur_detect_count;
 
-	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
 	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
@@ -318,10 +325,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+	unsigned long cur_detect_count, prev_detect_count, delta;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
+	/* Acquire prevents reordering task checks before this point. */
+	prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
  unlock:
 	rcu_read_unlock();
 
-	if (!(sysctl_hung_task_detect_count - prev_detect_count))
+	/* Ensures we see all hang details recorded during the scan. */
+	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
+	if (cur_detect_count < prev_detect_count)
+		delta = cur_detect_count;
+	else
+		delta = cur_detect_count - prev_detect_count;
+
+	if (!delta)
 		return;
 
 	if (need_warning || hung_task_call_panic) {
@@ -371,6 +387,51 @@ static long hung_timeout_jiffies(unsigned long last_checked,
 }
 
 #ifdef CONFIG_SYSCTL
+
+/**
+ * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
+ * @table: Pointer to the struct ctl_table definition for this proc entry
+ * @dir: Flag indicating the operation
+ * @buffer: User space buffer for data transfer
+ * @lenp: Pointer to the length of the data being transferred
+ * @ppos: Pointer to the current file offset
+ *
+ * This handler is used for reading the current hung task detection count
+ * and for resetting it to zero when a write operation is performed using a
+ * zero value only. Returns 0 on success or a negative error code on
+ * failure.
+ */
+static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
+					 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	unsigned long detect_count;
+	struct ctl_table proxy_table;
+	int err;
+
+	proxy_table = *table;
+	proxy_table.data = &detect_count;
+
+	if (SYSCTL_KERN_TO_USER(dir)) {
+		detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
+
+		return proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
+	}
+
+	err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
+	if (err < 0)
+		return err;
+
+	if (SYSCTL_USER_TO_KERN(dir)) {
+		/* The only valid value for clearing is zero. */
+		if (detect_count)
+			return -EINVAL;
+		atomic_long_set(&sysctl_hung_task_detect_count, 0);
+	}
+
+	*ppos += *lenp;
+	return err;
+}
+
 /*
  * Process updating of timeout sysctl
  */
@@ -451,10 +512,9 @@ static const struct ctl_table hung_task_sysctls[] = {
 	},
 	{
 		.procname	= "hung_task_detect_count",
-		.data		= &sysctl_hung_task_detect_count,
 		.maxlen		= sizeof(unsigned long),
-		.mode		= 0444,
-		.proc_handler	= proc_doulongvec_minmax,
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_detect_count,
 	},
 	{
 		.procname	= "hung_task_sys_info",
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-31  0:41 ` [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
@ 2026-01-01  9:46   ` Lance Yang
  2026-01-01 23:14   ` Joel Granados
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Lance Yang @ 2026-01-01  9:46 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: sean, linux-kernel, joel.granados, pmladek, akpm, mhiramat,
	gregkh



On 2025/12/31 08:41, Aaron Tomlin wrote:
> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
> 
> Writing a value of zero to this file atomically resets the counter of
> detected hung tasks. This grants system administrators the ability to
> clear the cumulative diagnostic history after resolving an incident,
> simplifying monitoring without requiring a system restart.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---

Overall, looks good to me :)

>   Documentation/admin-guide/sysctl/kernel.rst |  3 +-
>   kernel/hung_task.c                          | 82 ++++++++++++++++++---
>   2 files changed, 73 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 239da22c4e28..68da4235225a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -418,7 +418,8 @@ hung_task_detect_count
>   ======================
>   
>   Indicates the total number of tasks that have been detected as hung since
> -the system boot.
> +the system boot or since the counter was reset. The counter is zeroed when
> +a value of 0 is written.
>   
>   This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>   
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 00c3296fd692..3bc72a4e4032 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -17,6 +17,7 @@
>   #include <linux/export.h>
>   #include <linux/panic_notifier.h>
>   #include <linux/sysctl.h>
> +#include <linux/atomic.h>
>   #include <linux/suspend.h>
>   #include <linux/utsname.h>
>   #include <linux/sched/signal.h>
> @@ -36,7 +37,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>   /*
>    * Total number of tasks detected as hung since boot:
>    */
> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>   
>   /*
>    * Limit number of tasks checked in a batch.
> @@ -246,20 +247,26 @@ static inline void hung_task_diagnostics(struct task_struct *t)
>   }
>   
>   static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -		unsigned long prev_detect_count)
> +			    unsigned long prev_detect_count)
>   {
> -	unsigned long total_hung_task;
> +	unsigned long total_hung_task, cur_detect_count;
>   
>   	if (!task_is_hung(t, timeout))
>   		return;
>   
>   	/*
>   	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot.
> +	 * since boot. If a reset occurred during the scan, we treat the
> +	 * current count as the new delta to avoid an underflow error.
> +	 * Ensure hang details are globally visible before the counter
> +	 * update.
>   	 */
> -	sysctl_hung_task_detect_count++;
> +	cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);
> +	if (cur_detect_count >= prev_detect_count)
> +		total_hung_task = cur_detect_count - prev_detect_count;
> +	else
> +		total_hung_task = cur_detect_count;
>   
> -	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>   	trace_sched_process_hang(t);
>   
>   	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> @@ -318,10 +325,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   	int max_count = sysctl_hung_task_check_count;
>   	unsigned long last_break = jiffies;
>   	struct task_struct *g, *t;
> -	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> +	unsigned long cur_detect_count, prev_detect_count, delta;
>   	int need_warning = sysctl_hung_task_warnings;
>   	unsigned long si_mask = hung_task_si_mask;
>   
> +	/* Acquire prevents reordering task checks before this point. */
> +	prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>   	/*
>   	 * If the system crashed already then all bets are off,
>   	 * do not report extra hung tasks:
> @@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>    unlock:
>   	rcu_read_unlock();
>   
> -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
> +	/* Ensures we see all hang details recorded during the scan. */
> +	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> +	if (cur_detect_count < prev_detect_count)
> +		delta = cur_detect_count;
> +	else
> +		delta = cur_detect_count - prev_detect_count;
> +
> +	if (!delta)
>   		return;

Right. The underflow case looks properly addressed: the check is applied
in both check_hung_task() and check_hung_uninterruptible_tasks() to
handle a reset during a scan.

>   
>   	if (need_warning || hung_task_call_panic) {
> @@ -371,6 +387,51 @@ static long hung_timeout_jiffies(unsigned long last_checked,
>   }
>   
>   #ifdef CONFIG_SYSCTL
> +
> +/**
> + * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
> + * @table: Pointer to the struct ctl_table definition for this proc entry
> + * @dir: Flag indicating the operation
> + * @buffer: User space buffer for data transfer
> + * @lenp: Pointer to the length of the data being transferred
> + * @ppos: Pointer to the current file offset
> + *
> + * This handler is used for reading the current hung task detection count
> + * and for resetting it to zero when a write operation is performed using a
> + * zero value only. Returns 0 on success or a negative error code on
> + * failure.
> + */
> +static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
> +					 void *buffer, size_t *lenp, loff_t *ppos)
> +{

This proc_handler is probably better left for Petr and Joel to review ;)

Nothing else jumped out at me, so:

Acked-by: Lance Yang <lance.yang@linux.dev>

> +	unsigned long detect_count;
> +	struct ctl_table proxy_table;
> +	int err;
> +
> +	proxy_table = *table;
> +	proxy_table.data = &detect_count;
> +
> +	if (SYSCTL_KERN_TO_USER(dir)) {
> +		detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
> +
> +		return proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
> +	}
> +
> +	err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
> +	if (err < 0)
> +		return err;
> +
> +	if (SYSCTL_USER_TO_KERN(dir)) {
> +		/* The only valid value for clearing is zero. */
> +		if (detect_count)
> +			return -EINVAL;
> +		atomic_long_set(&sysctl_hung_task_detect_count, 0);
> +	}
> +
> +	*ppos += *lenp;
> +	return err;
> +}
> +
>   /*
>    * Process updating of timeout sysctl
>    */
> @@ -451,10 +512,9 @@ static const struct ctl_table hung_task_sysctls[] = {
>   	},
>   	{
>   		.procname	= "hung_task_detect_count",
> -		.data		= &sysctl_hung_task_detect_count,
>   		.maxlen		= sizeof(unsigned long),
> -		.mode		= 0444,
> -		.proc_handler	= proc_doulongvec_minmax,
> +		.mode		= 0644,
> +		.proc_handler	= proc_dohung_task_detect_count,
>   	},
>   	{
>   		.procname	= "hung_task_sys_info",


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning
  2025-12-31  0:41 ` [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning Aaron Tomlin
@ 2026-01-01  9:49   ` Lance Yang
  2026-01-01 19:28     ` Aaron Tomlin
  0 siblings, 1 reply; 22+ messages in thread
From: Lance Yang @ 2026-01-01  9:49 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: sean, akpm, gregkh, linux-kernel, joel.granados, pmladek,
	mhiramat



On 2025/12/31 08:41, Aaron Tomlin wrote:
> Consolidate the multi-line console output block for reporting a hung
> task into a new helper function, hung_task_diagnostics(). This improves
> readability in the main check_hung_task() loop and makes the diagnostic
> output structure easier to maintain and update in the future.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>   kernel/hung_task.c | 33 +++++++++++++++++++++++----------
>   1 file changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index d2254c91450b..00c3296fd692 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -223,6 +223,28 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
>   }
>   #endif
>   
> +/**
> + * hung_task_diagnostics - Print structured diagnostic info for a hung task.
> + * @t: Pointer to the detected hung task.
> + *
> + * This function consolidates the printing of core diagnostic information
> + * for a task found to be blocked.
> + */
> +static inline void hung_task_diagnostics(struct task_struct *t)
> +{
> +	unsigned long blocked_secs = (jiffies - t->last_switch_time) / HZ;
> +
> +	pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> +	       t->comm, t->pid, blocked_secs);
> +	pr_err("      %s %s %.*s\n",
> +	       print_tainted(), init_utsname()->release,
> +	       (int)strcspn(init_utsname()->version, " "),
> +	       init_utsname()->version);
> +	if (t->flags & PF_POSTCOREDUMP)
> +		pr_err("      Blocked by coredump.\n");
> +	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
> +}
> +
>   static void check_hung_task(struct task_struct *t, unsigned long timeout,
>   		unsigned long prev_detect_count)
>   {
> @@ -252,16 +274,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
>   	if (sysctl_hung_task_warnings || hung_task_call_panic) {
>   		if (sysctl_hung_task_warnings > 0)
>   			sysctl_hung_task_warnings--;
> -		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> -		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
> -		pr_err("      %s %s %.*s\n",
> -			print_tainted(), init_utsname()->release,
> -			(int)strcspn(init_utsname()->version, " "),
> -			init_utsname()->version);
> -		if (t->flags & PF_POSTCOREDUMP)
> -			pr_err("      Blocked by coredump.\n");
> -		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
> -			" disables this message.\n");
> +		hung_task_diagnostics(t);

I am wondering whether we should leave that code as-is to
avoid unnecessary churn ...

That code was not particularly complex or duplicated :)

>   		sched_show_task(t);
>   		debug_show_blocker(t, timeout);
>   


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning
  2026-01-01  9:49   ` Lance Yang
@ 2026-01-01 19:28     ` Aaron Tomlin
  2026-01-02  3:40       ` Lance Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-01-01 19:28 UTC (permalink / raw)
  To: Lance Yang
  Cc: sean, akpm, gregkh, linux-kernel, joel.granados, pmladek,
	mhiramat

On Thu, Jan 01, 2026 at 05:49:59PM +0800, Lance Yang wrote:
> I am wondering whether we should leave that code as-is to
> avoid unnecessary churn ...
> 
> That code was not particularly complex or duplicated :)

Hi Lance,

While I agree the current logic is simple, separating the verbose reporting
from the detection loop significantly improves the readability of
check_hung_task(). This refactoring introduces no runtime overhead
(static inline) while providing a cleaner, encapsulated structure for any
future diagnostic enhancements.


Kind regards,
-- 
Aaron Tomlin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-31  0:41 ` [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  2026-01-01  9:46   ` Lance Yang
@ 2026-01-01 23:14   ` Joel Granados
  2026-01-02  1:24     ` Aaron Tomlin
  2026-01-06 11:51   ` Joel Granados
  2026-01-08 14:41   ` Petr Mladek
  3 siblings, 1 reply; 22+ messages in thread
From: Joel Granados @ 2026-01-01 23:14 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, mhiramat, gregkh, pmladek, sean, linux-kernel


On Tue, Dec 30, 2025 at 07:41:25PM -0500, Aaron Tomlin wrote:
> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
> 
> Writing a value of zero to this file atomically resets the counter of
> detected hung tasks. This grants system administrators the ability to
> clear the cumulative diagnostic history after resolving an incident,
> simplifying monitoring without requiring a system restart.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst |  3 +-
>  kernel/hung_task.c                          | 82 ++++++++++++++++++---
>  2 files changed, 73 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 239da22c4e28..68da4235225a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -418,7 +418,8 @@ hung_task_detect_count
>  ======================
>  
>  Indicates the total number of tasks that have been detected as hung since
> -the system boot.
> +the system boot or since the counter was reset. The counter is zeroed when
> +a value of 0 is written.
>  
>  This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>  
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 00c3296fd692..3bc72a4e4032 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -17,6 +17,7 @@
>  #include <linux/export.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/sysctl.h>
> +#include <linux/atomic.h>
>  #include <linux/suspend.h>
>  #include <linux/utsname.h>
>  #include <linux/sched/signal.h>
> @@ -36,7 +37,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>  /*
>   * Total number of tasks detected as hung since boot:
>   */
> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>  
>  /*
>   * Limit number of tasks checked in a batch.
> @@ -246,20 +247,26 @@ static inline void hung_task_diagnostics(struct task_struct *t)
>  }
>  
>  static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -		unsigned long prev_detect_count)
> +			    unsigned long prev_detect_count)
>  {
> -	unsigned long total_hung_task;
> +	unsigned long total_hung_task, cur_detect_count;
>  
>  	if (!task_is_hung(t, timeout))
>  		return;
>  
>  	/*
>  	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot.
> +	 * since boot. If a reset occurred during the scan, we treat the
> +	 * current count as the new delta to avoid an underflow error.
> +	 * Ensure hang details are globally visible before the counter
> +	 * update.
>  	 */
> -	sysctl_hung_task_detect_count++;
> +	cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);
> +	if (cur_detect_count >= prev_detect_count)
> +		total_hung_task = cur_detect_count - prev_detect_count;
> +	else
> +		total_hung_task = cur_detect_count;
>  
> -	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>  	trace_sched_process_hang(t);
>  
>  	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> @@ -318,10 +325,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
>  	struct task_struct *g, *t;
> -	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> +	unsigned long cur_detect_count, prev_detect_count, delta;
>  	int need_warning = sysctl_hung_task_warnings;
>  	unsigned long si_mask = hung_task_si_mask;
>  
> +	/* Acquire prevents reordering task checks before this point. */
> +	prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>  	/*
>  	 * If the system crashed already then all bets are off,
>  	 * do not report extra hung tasks:
> @@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   unlock:
>  	rcu_read_unlock();
>  
> -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
> +	/* Ensures we see all hang details recorded during the scan. */
> +	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> +	if (cur_detect_count < prev_detect_count)
> +		delta = cur_detect_count;
> +	else
> +		delta = cur_detect_count - prev_detect_count;
> +
> +	if (!delta)
>  		return;
>  
>  	if (need_warning || hung_task_call_panic) {
> @@ -371,6 +387,51 @@ static long hung_timeout_jiffies(unsigned long last_checked,
>  }
>  
>  #ifdef CONFIG_SYSCTL
> +
> +/**
> + * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
> + * @table: Pointer to the struct ctl_table definition for this proc entry
> + * @dir: Flag indicating the operation
> + * @buffer: User space buffer for data transfer
> + * @lenp: Pointer to the length of the data being transferred
> + * @ppos: Pointer to the current file offset
> + *
> + * This handler is used for reading the current hung task detection count
> + * and for resetting it to zero when a write operation is performed using a
> + * zero value only. Returns 0 on success or a negative error code on
> + * failure.
> + */
> +static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
> +					 void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	unsigned long detect_count;
> +	struct ctl_table proxy_table;
> +	int err;
> +
> +	proxy_table = *table;
> +	proxy_table.data = &detect_count;
> +
> +	if (SYSCTL_KERN_TO_USER(dir)) {
> +		detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
> +
> +		return proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
> +	}
> +
> +	err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
> +	if (err < 0)
> +		return err;
> +
> +	if (SYSCTL_USER_TO_KERN(dir)) {
> +		/* The only valid value for clearing is zero. */
> +		if (detect_count)
> +			return -EINVAL;
> +		atomic_long_set(&sysctl_hung_task_detect_count, 0);
> +	}
> +
> +	*ppos += *lenp;
> +	return err;
> +}
> +
>  /*
>   * Process updating of timeout sysctl
>   */
> @@ -451,10 +512,9 @@ static const struct ctl_table hung_task_sysctls[] = {
>  	},
>  	{
>  		.procname	= "hung_task_detect_count",
> -		.data		= &sysctl_hung_task_detect_count,
>  		.maxlen		= sizeof(unsigned long),
> -		.mode		= 0444,
> -		.proc_handler	= proc_doulongvec_minmax,
> +		.mode		= 0644,
> +		.proc_handler	= proc_dohung_task_detect_count,
>  	},
>  	{
>  		.procname	= "hung_task_sys_info",

I don't understand why you need a custom proc_handler here.
Couldn't you just do this:
  	{
  		.procname	    = "hung_task_detect_count",
  		.data		      = &sysctl_hung_task_detect_count,
  		.maxlen		    = sizeof(unsigned long),
  		.mode		      = 0644,
  		.proc_handler	= proc_doulongvec_minmax,
      .extra1       = 0,
      .extra2       = 0,
    },

???

What am I missing?

Best

-- 

Joel Granados

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-01 23:14   ` Joel Granados
@ 2026-01-02  1:24     ` Aaron Tomlin
  2026-01-05 10:53       ` Joel Granados
  0 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-01-02  1:24 UTC (permalink / raw)
  To: Joel Granados
  Cc: akpm, lance.yang, mhiramat, gregkh, pmladek, sean, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1119 bytes --]

On Fri, Jan 02, 2026 at 12:14:39AM +0100, Joel Granados wrote:
> I don't understand why you need a custom proc_handler here.
> Couldn't you just do this:
>   	{
>   		.procname	    = "hung_task_detect_count",
>   		.data		      = &sysctl_hung_task_detect_count,
>   		.maxlen		    = sizeof(unsigned long),
>   		.mode		      = 0644,
>   		.proc_handler	= proc_doulongvec_minmax,
>       .extra1       = 0,
>       .extra2       = 0,
>     },
> 
> ???
> 
> What am I missing?

Hi Joel,

The reason for the custom handler is that sysctl_hung_task_detect_count is
now an atomic_long_t.

The generic proc_doulongvec_minmax() handler expects a pointer to a raw
unsigned long and performs standard assignments. If we pointed it directly
at an atomic type, we would lose the atomicity and memory barriers required
for the reset logic to work correctly alongside check_hung_task().

The custom handler acts as a wrapper to ensure we use atomic_long_read()
and atomic_long_set(), while also enforcing the policy that only a value of
'0' is valid for writing.


Kind regards,
-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning
  2026-01-01 19:28     ` Aaron Tomlin
@ 2026-01-02  3:40       ` Lance Yang
  2026-01-02 19:02         ` Aaron Tomlin
  0 siblings, 1 reply; 22+ messages in thread
From: Lance Yang @ 2026-01-02  3:40 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: sean, akpm, gregkh, linux-kernel, joel.granados, pmladek,
	mhiramat



On 2026/1/2 03:28, Aaron Tomlin wrote:
> On Thu, Jan 01, 2026 at 05:49:59PM +0800, Lance Yang wrote:
>> I am wondering whether we should leave that code as-is to
>> avoid unnecessary churn ...
>>
>> That code was not particularly complex or duplicated :)
> 
> Hi Lance,
> 
> While I agree the current logic is simple, separating the verbose reporting
> from the detection loop significantly improves the readability of
> check_hung_task(). This refactoring introduces no runtime overhead
> (static inline) while providing a cleaner, encapsulated structure for any
> future diagnostic enhancements.

For a single-use diagnostic block, I still think this refactoring
does not add much practical value ...

Let's leave the code alone - it's probably not worth the churn :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning
  2026-01-02  3:40       ` Lance Yang
@ 2026-01-02 19:02         ` Aaron Tomlin
  0 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-01-02 19:02 UTC (permalink / raw)
  To: Lance Yang
  Cc: sean, akpm, gregkh, linux-kernel, joel.granados, pmladek,
	mhiramat

[-- Attachment #1: Type: text/plain, Size: 565 bytes --]

On Fri, Jan 02, 2026 at 11:40:24AM +0800, Lance Yang wrote:
> For a single-use diagnostic block, I still think this refactoring
> does not add much practical value ...
> 
> Let's leave the code alone - it's probably not worth the churn :)

Hi Lance,

Fair point. While I remain of the view that the encapsulation offers a bit
more clarity, I certainly appreciate your point regarding unnecessary churn
for a single-use block.

I shall drop this patch from the series in the next iteration to keep
things concise.


Best regards,
-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-02  1:24     ` Aaron Tomlin
@ 2026-01-05 10:53       ` Joel Granados
  2026-01-05 14:42         ` Aaron Tomlin
  0 siblings, 1 reply; 22+ messages in thread
From: Joel Granados @ 2026-01-05 10:53 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, mhiramat, gregkh, pmladek, sean, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]

On Thu, Jan 01, 2026 at 08:24:42PM -0500, Aaron Tomlin wrote:
> On Fri, Jan 02, 2026 at 12:14:39AM +0100, Joel Granados wrote:
> > I don't understand why you need a custom proc_handler here.
> > Couldn't you just do this:
> >   	{
> >   		.procname	    = "hung_task_detect_count",
> >   		.data		      = &sysctl_hung_task_detect_count,
> >   		.maxlen		    = sizeof(unsigned long),
> >   		.mode		      = 0644,
> >   		.proc_handler	= proc_doulongvec_minmax,
> >       .extra1       = 0,
> >       .extra2       = 0,
> >     },
> > 
> > ???
> > 
> > What am I missing?
> 
> Hi Joel,
> 
> The reason for the custom handler is that sysctl_hung_task_detect_count is
> now an atomic_long_t.
> 
> The generic proc_doulongvec_minmax() handler expects a pointer to a raw
> unsigned long and performs standard assignments. If we pointed it directly
> at an atomic type, we would lose the atomicity and memory barriers required
> for the reset logic to work correctly alongside check_hung_task().
> 
> The custom handler acts as a wrapper to ensure we use atomic_long_read()
> and atomic_long_set(), while also enforcing the policy that only a value of
> '0' is valid for writing
It is clear to me that you need a custom handler after changing the type
to an atomic_long_t.

It is not clear to me why you changed it to atomic_long_t. It was
already being modified, read, and written when it was a raw unsigned long.
What has changed that requires atomic_long_t?

Best


-- 

Joel Granados

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-05 10:53       ` Joel Granados
@ 2026-01-05 14:42         ` Aaron Tomlin
  2026-01-06 11:36           ` Joel Granados
  0 siblings, 1 reply; 22+ messages in thread
From: Aaron Tomlin @ 2026-01-05 14:42 UTC (permalink / raw)
  To: Joel Granados
  Cc: akpm, lance.yang, mhiramat, gregkh, pmladek, neelx, sean,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2934 bytes --]

On Mon, Jan 05, 2026 at 11:53:08AM +0100, Joel Granados wrote:
> It is clear to me that you need a custom handler after changing the type
> to an atomic_long_t.
> 
> It is not clear to me why you changed it to atomic_long_t. It was
> already being modified, read, and written when it was a raw unsigned long.
> What has changed that requires atomic_long_t?

Hi Joel,

Thank you very much for your review and for raising this important
question. I appreciate the opportunity to clarify the reasoning behind the
type change.

The rationale for switching to atomic_long_t is primarily the shift from a
single-writer model to a multi-writer model, alongside the need for strict
memory ordering, which standard unsigned long operations cannot guarantee.

1. Atomicity (precluding the "Lost Update" scenario)

   In the existing implementation, sysctl_hung_task_detect_count is
   effectively read-only for userspace. There is only one writer: the
   watchdog thread. By introducing the ability for an administrator to
   write '0' (reset), we create a race condition between:

    - The watchdog thread attempting to increment the counter
      (detect_count++)

    - The administrator attempting to reset the counter (detect_count = 0)

On many architectures, a standard increment is a non-atomic
Read-Modify-Write sequence. Without atomic_long_inc(), we would risk a
"Lost Update" scenario where a reset occurs precisely between the
watchdog's load and store instructions, causing the counter to revert to a
high value immediately after the reset.

2. Memory Ordering (Acquire/Release Semantics)

   Beyond basic atomicity, the use of atomic_long_inc_return_release()
   and atomic_long_read_acquire() is necessary to enforce correct
   synchronisation without the overhead of full memory barriers.

    - Release semantics (in check_hung_task()): We utilise Release semantics
      when incrementing the counter. This guarantees that all memory
      operations associated with detecting the hang (the reads inside
      task_is_hung()) are globally visible before the counter is updated.
      This effectively "publishes" the hang event only after the detection
      logic is fully complete

    - Acquire semantics (in check_hung_uninterruptible_tasks()): We utilise
      Acquire semantics when reading the counter in the summary loop. This
      ensures that we do not reorder the loop's subsequent checks to occur
      before we have observed the current counter state

If we were to rely on _relaxed atomics or a simple unsigned long, the CPU
could theoretically reorder these operations, potentially leading to
inconsistencies where the summary loop observes a counter increment but
sees stale task state data.

I hope this explanation clarifies the necessity of these changes. Please
let me know if you have any further queries.


Kind regards,
-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-05 14:42         ` Aaron Tomlin
@ 2026-01-06 11:36           ` Joel Granados
  2026-01-07  1:49             ` Aaron Tomlin
  0 siblings, 1 reply; 22+ messages in thread
From: Joel Granados @ 2026-01-06 11:36 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, mhiramat, gregkh, pmladek, neelx, sean,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3670 bytes --]

On Mon, Jan 05, 2026 at 09:42:08AM -0500, Aaron Tomlin wrote:
> On Mon, Jan 05, 2026 at 11:53:08AM +0100, Joel Granados wrote:
> > It is clear to me that you need a custom handler after changing the type
> > to an atomic_long_t.
> > 
> > It is not clear to me why you changed it to atomic_long_t. It was
> > already being modified, read, and written when it was a raw unsigned long.
> > What has changed that requires atomic_long_t?
> 
> Hi Joel,
> 
> Thank you very much for your review and for raising this important
> question. I appreciate the opportunity to clarify the reasoning behind the
> type change.
> 
> The rationale for switching to atomic_long_t is primarily due to the shift
> from a Single-Writer model to a Multi-Writer model, alongside the need for
> strict memory ordering which standard unsigned long operations cannot
> guarantee.
> 
> 1. Atomicity (precluding the "Lost Update" scenario) In the existing
>    implementation, sysctl_hung_task_detect_count is effectively read-only
>    for userspace. There is only one writer: the watchdog thread. By
>    introducing the ability for an administrator to write '0' (reset), we
>    create a race condition between:
> 
>     - The watchdog thread attempting to increment the counter
>       (detect_count++)
> 
>     - The administrator attempting to reset the counter (detect_count = 0)
> 
> On many architectures, a standard increment is a non-atomic
> Read-Modify-Write sequence. Without atomic_long_inc(), we would risk a
> "Lost Update" scenario where a reset occurs precisely between the
> watchdog's load and store instructions, causing the counter to revert to a
> high value immediately after the reset.
> 
> 2. Memory Ordering (Acquire/Release Semantics) Beyond basic atomicity, the
>    use of atomic_long_inc_return_release() and atomic_long_read_acquire()
>    is necessary to enforce correct synchronisation without the overhead of
>    full memory barriers
> 
>     - Release semantics (in check_hung_task()): We utilise Release semantics
>       when incrementing the counter. This guarantees that all memory
>       operations associated with detecting the hang (the reads inside
>       task_is_hung()) are globally visible before the counter is updated.
>       This effectively "publishes" the hang event only after the detection
>       logic is fully complete
> 
>     - Acquire semantics (in check_hung_uninterruptible_tasks()): We utilise
>       Acquire semantics when reading the counter in the summary loop. This
>       ensures that we do not reorder the loop's subsequent checks to occur
>       before we have observed the current counter state
> 
> If we were to rely on _relaxed atomics or a simple unsigned long, the CPU
> could theoretically reorder these operations, potentially leading to
> inconsistencies where the summary loop observes a counter increment but
> sees stale task state data.
> 
> I hope this explanation clarifies the necessity of these changes. Please
> let me know if you have any further queries.
> 
> 
> Kind regards,
> -- 
> Aaron Tomlin
Hey Aaron

Thx for the very clear explanation. Makes sense.

Can you please add this as part of the commit message.

Actually, if you are up for it, it would be nice to split this into two
commits (this will give you an opportunity to add all this nice
explanation from this mail). One that changes the type and is a
preparation commit. And two, the changes to make the variable writable
from user space.

I have more comments on the diff, but I'll "answer" those to the patch
mail.

Thx again.

Best

-- 

Joel Granados

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-31  0:41 ` [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  2026-01-01  9:46   ` Lance Yang
  2026-01-01 23:14   ` Joel Granados
@ 2026-01-06 11:51   ` Joel Granados
  2026-01-07  3:37     ` Aaron Tomlin
  2026-01-08 14:41   ` Petr Mladek
  3 siblings, 1 reply; 22+ messages in thread
From: Joel Granados @ 2026-01-06 11:51 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, mhiramat, gregkh, pmladek, sean, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7275 bytes --]

On Tue, Dec 30, 2025 at 07:41:25PM -0500, Aaron Tomlin wrote:
> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
> 
> Writing a value of zero to this file atomically resets the counter of
> detected hung tasks. This grants system administrators the ability to
> clear the cumulative diagnostic history after resolving an incident,
> simplifying monitoring without requiring a system restart.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst |  3 +-
>  kernel/hung_task.c                          | 82 ++++++++++++++++++---
>  2 files changed, 73 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 239da22c4e28..68da4235225a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -418,7 +418,8 @@ hung_task_detect_count
>  ======================
>  
>  Indicates the total number of tasks that have been detected as hung since
> -the system boot.
> +the system boot or since the counter was reset. The counter is zeroed when
> +a value of 0 is written.
>  
>  This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>  
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 00c3296fd692..3bc72a4e4032 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -17,6 +17,7 @@
>  #include <linux/export.h>
>  #include <linux/panic_notifier.h>
>  #include <linux/sysctl.h>
> +#include <linux/atomic.h>
>  #include <linux/suspend.h>
>  #include <linux/utsname.h>
>  #include <linux/sched/signal.h>
> @@ -36,7 +37,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>  /*
>   * Total number of tasks detected as hung since boot:
>   */
> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>  
>  /*
>   * Limit number of tasks checked in a batch.
> @@ -246,20 +247,26 @@ static inline void hung_task_diagnostics(struct task_struct *t)
>  }
>  
>  static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -		unsigned long prev_detect_count)
> +			    unsigned long prev_detect_count)
>  {
> -	unsigned long total_hung_task;
> +	unsigned long total_hung_task, cur_detect_count;
>  
>  	if (!task_is_hung(t, timeout))
>  		return;
>  
>  	/*
>  	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot.
> +	 * since boot. If a reset occurred during the scan, we treat the
> +	 * current count as the new delta to avoid an underflow error.
> +	 * Ensure hang details are globally visible before the counter
> +	 * update.
>  	 */
> -	sysctl_hung_task_detect_count++;
> +	cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);
> +	if (cur_detect_count >= prev_detect_count)
> +		total_hung_task = cur_detect_count - prev_detect_count;
> +	else
> +		total_hung_task = cur_detect_count;
>  
> -	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>  	trace_sched_process_hang(t);
>  
>  	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> @@ -318,10 +325,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
>  	struct task_struct *g, *t;
> -	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> +	unsigned long cur_detect_count, prev_detect_count, delta;
>  	int need_warning = sysctl_hung_task_warnings;
>  	unsigned long si_mask = hung_task_si_mask;
>  
> +	/* Acquire prevents reordering task checks before this point. */
> +	prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>  	/*
>  	 * If the system crashed already then all bets are off,
>  	 * do not report extra hung tasks:
> @@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   unlock:
>  	rcu_read_unlock();
>  
> -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
> +	/* Ensures we see all hang details recorded during the scan. */
> +	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> +	if (cur_detect_count < prev_detect_count)
> +		delta = cur_detect_count;
> +	else
> +		delta = cur_detect_count - prev_detect_count;
> +
> +	if (!delta)
>  		return;
>  
>  	if (need_warning || hung_task_call_panic) {
> @@ -371,6 +387,51 @@ static long hung_timeout_jiffies(unsigned long last_checked,
>  }
>  
>  #ifdef CONFIG_SYSCTL
> +
> +/**
> + * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
> + * @table: Pointer to the struct ctl_table definition for this proc entry
> + * @dir: Flag indicating the operation
> + * @buffer: User space buffer for data transfer
> + * @lenp: Pointer to the length of the data being transferred
> + * @ppos: Pointer to the current file offset
> + *
> + * This handler is used for reading the current hung task detection count
> + * and for resetting it to zero when a write operation is performed using a
> + * zero value only. Returns 0 on success or a negative error code on
> + * failure.
> + */
> +static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
> +					 void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	unsigned long detect_count;
> +	struct ctl_table proxy_table;
> +	int err;
> +
> +	proxy_table = *table;
> +	proxy_table.data = &detect_count;
> +
> +	if (SYSCTL_KERN_TO_USER(dir)) {
> +		detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
> +
> +		return proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
> +	}
Could you do something like this (untested):

  if (SYSCTL_KERN_TO_USER(dir))
    detect_count = atomic_long_read(&sysctl_hung_task_detect_count);

  err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
  if (err < 0)
    return err;

  if (SYSCTL_USER_TO_KERN(dir)) {
    if (detect_count)
      return -EINVAL;

    atomic_long_set(&sysctl_hung_task_detect_count, 0);
  }

  return 0;

> +
> +	err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
> +	if (err < 0)
> +		return err;
> +
> +	if (SYSCTL_USER_TO_KERN(dir)) {
> +		/* The only valid value for clearing is zero. */
> +		if (detect_count)
> +			return -EINVAL;
> +		atomic_long_set(&sysctl_hung_task_detect_count, 0);
> +	}
> +
> +	*ppos += *lenp;
why do you advance ppos here? It is already advanced when you call
proc_doulongvec_minmax.

> +	return err;
> +}
> +
>  /*
>   * Process updating of timeout sysctl
>   */
> @@ -451,10 +512,9 @@ static const struct ctl_table hung_task_sysctls[] = {
>  	},
>  	{
>  		.procname	= "hung_task_detect_count",
> -		.data		= &sysctl_hung_task_detect_count,
>  		.maxlen		= sizeof(unsigned long),
> -		.mode		= 0444,
> -		.proc_handler	= proc_doulongvec_minmax,
> +		.mode		= 0644,
> +		.proc_handler	= proc_dohung_task_detect_count,
>  	},
>  	{
>  		.procname	= "hung_task_sys_info",
> -- 
> 2.51.0
> 

-- 

Joel Granados

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-06 11:36           ` Joel Granados
@ 2026-01-07  1:49             ` Aaron Tomlin
  0 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-01-07  1:49 UTC (permalink / raw)
  To: Joel Granados
  Cc: akpm, lance.yang, mhiramat, gregkh, pmladek, neelx, sean,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

On Tue, Jan 06, 2026 at 12:36:10PM +0100, Joel Granados wrote:
> Hey Aaron

Hi Joel,

> Thx for the very clear explanation. Makes sense.

My pleasure.

> Can you please add this as part of the commit message.
> 
> Actually, if you are up for it, it would be nice to split this into two
> commits (this will give you an opportunity to add all this nice
> explanation from this mail). One that changes the type and is a
> preparation commit. And two, the changes to make the variable writable
> from user space.

Certainly, I will structure the first as a preparatory commit to amend the
type - utilising that opportunity to include the detailed technical
explanation - and the second to implement the user-space write
capabilities.

> I have more comments on the diff, but I'll "answer" those to the patch
> mail.

Thank you.


Kind regards,
-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-06 11:51   ` Joel Granados
@ 2026-01-07  3:37     ` Aaron Tomlin
  0 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-01-07  3:37 UTC (permalink / raw)
  To: Joel Granados
  Cc: akpm, lance.yang, mhiramat, gregkh, pmladek, sean, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 763 bytes --]

On Tue, Jan 06, 2026 at 12:51:14PM +0100, Joel Granados wrote:
> Could you do something like this (untested):
> 
>   if (SYSCTL_KERN_TO_USER(dir))
>     detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
> 
>   err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
>   if (err < 0)
>     return err;
> 
>   if (SYSCTL_USER_TO_KERN(dir)) {
>     if (detect_count)
>       return -EINVAL;
> 
>     atomic_long_set(&sysctl_hung_task_detect_count, 0);
>   }
> 
>   return 0;

Fair enough. If the input is malformed or out of bounds, it exits early
with an error.

> why do you advance ppos here? It is already advanced when you call
> proc_doulongvec_minmax.

Acknowledged.


Kind regards,
-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-31  0:41 ` [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
                     ` (2 preceding siblings ...)
  2026-01-06 11:51   ` Joel Granados
@ 2026-01-08 14:41   ` Petr Mladek
  2026-01-09 13:50     ` Lance Yang
  2026-01-10 15:55     ` Aaron Tomlin
  3 siblings, 2 replies; 22+ messages in thread
From: Petr Mladek @ 2026-01-08 14:41 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: akpm, lance.yang, mhiramat, gregkh, joel.granados, sean,
	linux-kernel

On Tue 2025-12-30 19:41:25, Aaron Tomlin wrote:
> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
> 
> Writing a value of zero to this file atomically resets the counter of
> detected hung tasks. This grants system administrators the ability to
> clear the cumulative diagnostic history after resolving an incident,
> simplifying monitoring without requiring a system restart.

> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -36,7 +37,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>  /*
>   * Total number of tasks detected as hung since boot:
>   */
> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>  
>  /*
>   * Limit number of tasks checked in a batch.
> @@ -246,20 +247,26 @@ static inline void hung_task_diagnostics(struct task_struct *t)
>  }
>  
>  static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -		unsigned long prev_detect_count)
> +			    unsigned long prev_detect_count)
>  {
> -	unsigned long total_hung_task;
> +	unsigned long total_hung_task, cur_detect_count;
>  
>  	if (!task_is_hung(t, timeout))
>  		return;
>  
>  	/*
>  	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot.
> +	 * since boot. If a reset occurred during the scan, we treat the
> +	 * current count as the new delta to avoid an underflow error.
> +	 * Ensure hang details are globally visible before the counter
> +	 * update.
>  	 */
> -	sysctl_hung_task_detect_count++;
> +	cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);

The _release() feels a bit weird here because the counter might
get incremented more times during one scan.

IMHO, it should be perfectly fine to use the _relaxed version here
because it is in the middle of the acquire/release, see below.
The important thing here is that the load/modify/store operation
is done atomically.

> +	if (cur_detect_count >= prev_detect_count)
> +		total_hung_task = cur_detect_count - prev_detect_count;
> +	else
> +		total_hung_task = cur_detect_count;
>  
> -	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>  	trace_sched_process_hang(t);
>  
>  	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> @@ -318,10 +325,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
>  	struct task_struct *g, *t;
> -	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> +	unsigned long cur_detect_count, prev_detect_count, delta;
>  	int need_warning = sysctl_hung_task_warnings;
>  	unsigned long si_mask = hung_task_si_mask;
>  
> +	/* Acquire prevents reordering task checks before this point. */
> +	prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);

This value is read before the scan started => _acquire
semantic/barrier fits here.

>  	/*
>  	 * If the system crashed already then all bets are off,
>  	 * do not report extra hung tasks:
> @@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   unlock:
>  	rcu_read_unlock();
>  
> -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
> +	/* Ensures we see all hang details recorded during the scan. */
> +	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);

This value is read at the end of the scan => _release
semantic/barrier should be here.

> +	if (cur_detect_count < prev_detect_count)
> +		delta = cur_detect_count;
> +	else
> +		delta = cur_detect_count - prev_detect_count;
> +
> +	if (!delta)
>  		return;
>  
>  	if (need_warning || hung_task_call_panic) {

Otherwise, I do not have anything more to add. I agree with the other
proposals, for example:

   + remove 1st patch
   + split 2nd patch into two
   + changes in the sysctl code proposed by Joel

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-08 14:41   ` Petr Mladek
@ 2026-01-09 13:50     ` Lance Yang
  2026-01-12 13:13       ` Petr Mladek
  2026-01-10 15:55     ` Aaron Tomlin
  1 sibling, 1 reply; 22+ messages in thread
From: Lance Yang @ 2026-01-09 13:50 UTC (permalink / raw)
  To: Petr Mladek, Aaron Tomlin
  Cc: akpm, mhiramat, gregkh, joel.granados, sean, linux-kernel



On 2026/1/8 22:41, Petr Mladek wrote:
> On Tue 2025-12-30 19:41:25, Aaron Tomlin wrote:
>> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
>>
>> Writing a value of zero to this file atomically resets the counter of
>> detected hung tasks. This grants system administrators the ability to
>> clear the cumulative diagnostic history after resolving an incident,
>> simplifying monitoring without requiring a system restart.
> 
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -36,7 +37,7 @@ static int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
>>   /*
>>    * Total number of tasks detected as hung since boot:
>>    */
>> -static unsigned long __read_mostly sysctl_hung_task_detect_count;
>> +static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
>>   
>>   /*
>>    * Limit number of tasks checked in a batch.
>> @@ -246,20 +247,26 @@ static inline void hung_task_diagnostics(struct task_struct *t)
>>   }
>>   
>>   static void check_hung_task(struct task_struct *t, unsigned long timeout,
>> -		unsigned long prev_detect_count)
>> +			    unsigned long prev_detect_count)
>>   {
>> -	unsigned long total_hung_task;
>> +	unsigned long total_hung_task, cur_detect_count;
>>   
>>   	if (!task_is_hung(t, timeout))
>>   		return;
>>   
>>   	/*
>>   	 * This counter tracks the total number of tasks detected as hung
>> -	 * since boot.
>> +	 * since boot. If a reset occurred during the scan, we treat the
>> +	 * current count as the new delta to avoid an underflow error.
>> +	 * Ensure hang details are globally visible before the counter
>> +	 * update.
>>   	 */
>> -	sysctl_hung_task_detect_count++;
>> +	cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);
> 
> The _release() feels a bit weird here because the counter might
> get incremented more times during one scan.
> 
> IMHO, it should be perfectly fine to use the _relaxed version here
> because it is in the middle of the acquire/release, see below.
> The important thing here is that the load/modify/store operation
> is done atomically.

Right, we only need atomicity here, not the ordering guarantee :)

> 
>> +	if (cur_detect_count >= prev_detect_count)
>> +		total_hung_task = cur_detect_count - prev_detect_count;
>> +	else
>> +		total_hung_task = cur_detect_count;
>>   
>> -	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>>   	trace_sched_process_hang(t);
>>   
>>   	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
>> @@ -318,10 +325,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>>   	int max_count = sysctl_hung_task_check_count;
>>   	unsigned long last_break = jiffies;
>>   	struct task_struct *g, *t;
>> -	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
>> +	unsigned long cur_detect_count, prev_detect_count, delta;
>>   	int need_warning = sysctl_hung_task_warnings;
>>   	unsigned long si_mask = hung_task_si_mask;
>>   
>> +	/* Acquire prevents reordering task checks before this point. */
>> +	prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> 
> This value is read before the scan started => _acquire
> semantic/barrier fits here.
> 
>>   	/*
>>   	 * If the system crashed already then all bets are off,
>>   	 * do not report extra hung tasks:
>> @@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>>    unlock:
>>   	rcu_read_unlock();
>>   
>> -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
>> +	/* Ensures we see all hang details recorded during the scan. */
>> +	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> 
> This value is read at the end of the scan => _release
> semantic/barrier should be here.

Seems like _acquire is still correct here, because it is a load.

_release semantics apply to stores, while _acquire on a load
ensures subsequent memory accesses are not reordered before it.

Or smp_mb()?

In the same thread, atomic operations on the same variable are not
reordered with respect to each other, even the _relaxed variant
preserves program order for that variable, IIRC.

So the increment will always complete before the final read in
program order, and the read will see the updated value (unless
another CPU resets it concurrently, which is a logical race, not
a reordering issue).

So, it would be:

   prev = atomic_long_read_acquire(&counter);     // scan start
   ...
   cur = atomic_long_inc_return_relaxed(&counter); // during scan
   ...
   cur = atomic_long_read_acquire(&counter);      // scan end

The first _acquire ensures no task-checking code is reordered
before the start read, the middle increment is just atomic
without extra barriers, and the final _acquire makes sure we
observe all hang details before computing the delta.

That said, I also see the value in using _release or smp_mb()
here to pair with the _acquire at the start, as that would make
the ordering semantics clearer to readers.

Cheers,
Lance

> 
>> +	if (cur_detect_count < prev_detect_count)
>> +		delta = cur_detect_count;
>> +	else
>> +		delta = cur_detect_count - prev_detect_count;
>> +
>> +	if (!delta)
>>   		return;
>>   
>>   	if (need_warning || hung_task_call_panic) {
> 
> Otherwise, I do not have anything more to add. I agree with the other
> proposals, for example:
> 
>     + remove 1st patch
>     + split 2nd patch into two
>     + changes in the sysctl code proposed by Joel
> 
> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-08 14:41   ` Petr Mladek
  2026-01-09 13:50     ` Lance Yang
@ 2026-01-10 15:55     ` Aaron Tomlin
  1 sibling, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-01-10 15:55 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, lance.yang, mhiramat, gregkh, joel.granados, sean,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2085 bytes --]

On Thu, Jan 08, 2026 at 03:41:28PM +0100, Petr Mladek wrote:
> The _release() feels a bit weird here because the counter might
> get incremented more times during one scan.
> 
> IMHO, it should be perfectly fine to use the _relaxed version here
> because it is in the middle of the acquire/release, see below.
> The important thing here is that the load/modify/store operation
> is done atomically.

Hi Petr,

Thank you for your review and the detailed feedback.

I agree with your assessment regarding the counter update within
check_hung_task(). The _release semantics are indeed unnecessary in that
specific context; I shall switch to atomic_long_inc_return_relaxed() as the
atomicity of the operation suffices there.

> > -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
> > +	/* Ensures we see all hang details recorded during the scan. */
> > +	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> 
> This value is read at the end of the scan => _release
> semantic/barrier should be here.

Since we are performing a read/load operation here to capture the current
state, atomic_long_read_acquire() is the correct primitive. It ensures that
if we observe the updated counter, we are also guaranteed to observe any
associated data (such as the hang details) that were "published" by the
writer before the counter was updated. This approach is preferable to
inserting a full memory barrier (smp_mb()), as it provides the necessary
ordering guarantees with less overhead.

> Otherwise, I do not have anything more to add. I agree with the other
> proposals, for example:
> 
>    + remove 1st patch
>    + split 2nd patch into two
>    + changes in the sysctl code proposed by Joel

I entirely agree with your other points regarding the patch structure. I
shall discard the first patch, split the second into distinct logical
changes, and incorporate Joel’s suggestions for the sysctl code.

I will prepare a follow-up patch series incorporating these changes.


Kind regards,
-- 
Aaron Tomlin



* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-09 13:50     ` Lance Yang
@ 2026-01-12 13:13       ` Petr Mladek
  2026-01-12 14:43         ` Lance Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2026-01-12 13:13 UTC (permalink / raw)
  To: Lance Yang
  Cc: Aaron Tomlin, akpm, mhiramat, gregkh, joel.granados, sean,
	linux-kernel

On Fri 2026-01-09 21:50:20, Lance Yang wrote:
> 
> 
> On 2026/1/8 22:41, Petr Mladek wrote:
> > On Tue 2025-12-30 19:41:25, Aaron Tomlin wrote:
> > > Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
> > > 
> > > Writing a value of zero to this file atomically resets the counter of
> > > detected hung tasks. This grants system administrators the ability to
> > > clear the cumulative diagnostic history after resolving an incident,
> > > simplifying monitoring without requiring a system restart.
> > 
> > > --- a/kernel/hung_task.c
> > > +++ b/kernel/hung_task.c
> > > @@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> > >    unlock:
> > >   	rcu_read_unlock();
> > > -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
> > > +	/* Ensures we see all hang details recorded during the scan. */
> > > +	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> > 
> > This value is read at the end of the scan => _release
> > semantic/barrier should be here.
> 
> Seems like _acquire is still correct here, because it is a load.
> 
> _release semantics apply to stores, while _acquire on a load
> ensures subsequent memory accesses are not reordered before it.

Right!

> Or smp_mb()?
> 
> In the same thread, atomic operations on the same variable are not
> reordered with respect to each other, even the _relaxed variant
> preserves program order for that variable, IIRC.
> 
> So the increment will always complete before the final read in
> program order, and the read will see the updated value (unless
> another CPU resets it concurrently, which is a logical race, not
> a reordering issue).
> 
> So, it would be:
> 
>   prev = atomic_long_read_acquire(&counter);     // scan start
>   ...
>   cur = atomic_long_inc_return_relaxed(&counter); // during scan
>   ...
>   cur = atomic_long_read_acquire(&counter);      // scan end
> 
> The first _acquire ensures no task-checking code is reordered
> before the start read, the middle increment is just atomic
> without extra barriers, and the final _acquire makes sure we
> observe all hang details before computing the delta.

The acquire/relaxed/acquire semantic looks weird. The problem is
that we do not use the counter as a lock.

I thought about a sane approach and the following came to my
mind:

From c28b74c35d653f527aa9017c32630ad08180fb4e Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Mon, 12 Jan 2026 14:00:52 +0100
Subject: [POC] hung_task: Update the global counter using a proper
 acquire/release semantic

The global counter of hung tasks might get reset when the check
is in progress. Also the number of hung tasks detected in the current
round is important to decide whether panic() is needed or not.

Handle races by:

1. Remember the total counter at the beginning of the check.
2. Count the current round in a local variable.
3. Update the total counter only when the value has not been modified
   during the check.

Note that this is only compile tested.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/hung_task.c | 53 +++++++++++++++++-----------------------------
 1 file changed, 20 insertions(+), 33 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 3bc72a4e4032..c939cd3d8a2c 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -246,30 +246,12 @@ static inline void hung_task_diagnostics(struct task_struct *t)
 	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
 }
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout,
-			    unsigned long prev_detect_count)
+static void hung_task_info(struct task_struct *t, unsigned long timeout,
+			   unsigned long this_round_count)
 {
-	unsigned long total_hung_task, cur_detect_count;
-
-	if (!task_is_hung(t, timeout))
-		return;
-
-	/*
-	 * This counter tracks the total number of tasks detected as hung
-	 * since boot. If a reset occurred during the scan, we treat the
-	 * current count as the new delta to avoid an underflow error.
-	 * Ensure hang details are globally visible before the counter
-	 * update.
-	 */
-	cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);
-	if (cur_detect_count >= prev_detect_count)
-		total_hung_task = cur_detect_count - prev_detect_count;
-	else
-		total_hung_task = cur_detect_count;
-
 	trace_sched_process_hang(t);
 
-	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+	if (sysctl_hung_task_panic && this_round_count >= sysctl_hung_task_panic) {
 		console_verbose();
 		hung_task_call_panic = true;
 	}
@@ -325,12 +307,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long cur_detect_count, prev_detect_count, delta;
+	unsigned long total_count, this_round_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
-	/* Acquire prevents reordering task checks before this point. */
-	prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
+	/* The counter might get reset. Remember the initial value. */
+	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
+
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -339,6 +322,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		return;
 
 
+	this_round_count = 0UL;
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
 
@@ -350,21 +334,24 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout, prev_detect_count);
+		if (task_is_hung(t, timeout)) {
+			this_round_count++;
+			hung_task_info(t, timeout, this_round_count);
+		}
 	}
  unlock:
 	rcu_read_unlock();
 
-	/* Ensures we see all hang details recorded during the scan. */
-	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
-	if (cur_detect_count < prev_detect_count)
-		delta = cur_detect_count;
-	else
-		delta = cur_detect_count - prev_detect_count;
-
-	if (!delta)
+	if (!this_round_count)
 		return;
 
+	/*
+	 * Do not count this round when the global counter has been reset
+	 * during this check.
+	 */
+	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count, total_count,
+				    total_count + this_round_count);
+
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;
 
-- 
2.52.0

It is just a POC. Feel free to do the refactoring another way.

Best Regards,
Petr


* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-12 13:13       ` Petr Mladek
@ 2026-01-12 14:43         ` Lance Yang
  2026-01-15  2:20           ` Aaron Tomlin
  0 siblings, 1 reply; 22+ messages in thread
From: Lance Yang @ 2026-01-12 14:43 UTC (permalink / raw)
  To: Petr Mladek, Aaron Tomlin
  Cc: akpm, mhiramat, gregkh, joel.granados, sean, linux-kernel



On 2026/1/12 21:13, Petr Mladek wrote:
> On Fri 2026-01-09 21:50:20, Lance Yang wrote:
>>
>>
>> On 2026/1/8 22:41, Petr Mladek wrote:
>>> On Tue 2025-12-30 19:41:25, Aaron Tomlin wrote:
>>>> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
>>>>
>>>> Writing a value of zero to this file atomically resets the counter of
>>>> detected hung tasks. This grants system administrators the ability to
>>>> clear the cumulative diagnostic history after resolving an incident,
>>>> simplifying monitoring without requiring a system restart.
>>>
>>>> --- a/kernel/hung_task.c
>>>> +++ b/kernel/hung_task.c
>>>> @@ -346,7 +355,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>>>>     unlock:
>>>>    	rcu_read_unlock();
>>>> -	if (!(sysctl_hung_task_detect_count - prev_detect_count))
>>>> +	/* Ensures we see all hang details recorded during the scan. */
>>>> +	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
>>>
>>> This value is read at the end of the scan => _release
>>> semantic/barrier should be here.
>>
>> Seems like _acquire is still correct here, because it is a load.
>>
>> _release semantics apply to stores, while _acquire on a load
>> ensures subsequent memory accesses are not reordered before it.
> 
> Right!
> 
>> Or smp_mb()?
>>
>> In the same thread, atomic operations on the same variable are not
>> reordered with respect to each other, even the _relaxed variant
>> preserves program order for that variable, IIRC.
>>
>> So the increment will always complete before the final read in
>> program order, and the read will see the updated value (unless
>> another CPU resets it concurrently, which is a logical race, not
>> a reordering issue).
>>
>> So, it would be:
>>
>>    prev = atomic_long_read_acquire(&counter);     // scan start
>>    ...
>>    cur = atomic_long_inc_return_relaxed(&counter); // during scan
>>    ...
>>    cur = atomic_long_read_acquire(&counter);      // scan end
>>
>> The first _acquire ensures no task-checking code is reordered
>> before the start read, the middle increment is just atomic
>> without extra barriers, and the final _acquire makes sure we
>> observe all hang details before computing the delta.
> 
> The acquire/relaxed/acquire semantic looks weird. The problem is
> that we do not use the counter as a lock.
> 
> I thought about a sane approach and the following came to my
> mind:
> 
>  From c28b74c35d653f527aa9017c32630ad08180fb4e Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Mon, 12 Jan 2026 14:00:52 +0100
> Subject: [POC] hung_task: Update the global counter using a proper
>   acquire/release semantic
> 
> The global counter of hung tasks might get reset when the check
> is in progress. Also the number of hung tasks detected in the current
> round is important to decide whether panic() is needed or not.
> 
> Handle races by:
> 
> 1. Remember the total counter at the beginning of the check.
> 2. Count the current round in a local variable.
> 3. Update the total counter only when the value has not been modified
>     during the check.

Cool!

> 
> Note that this is only compile tested.
> 
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>   kernel/hung_task.c | 53 +++++++++++++++++-----------------------------
>   1 file changed, 20 insertions(+), 33 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 3bc72a4e4032..c939cd3d8a2c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -246,30 +246,12 @@ static inline void hung_task_diagnostics(struct task_struct *t)
>   	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\n");
>   }
>   
> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -			    unsigned long prev_detect_count)
> +static void hung_task_info(struct task_struct *t, unsigned long timeout,
> +			   unsigned long this_round_count)
>   {
> -	unsigned long total_hung_task, cur_detect_count;
> -
> -	if (!task_is_hung(t, timeout))
> -		return;
> -
> -	/*
> -	 * This counter tracks the total number of tasks detected as hung
> -	 * since boot. If a reset occurred during the scan, we treat the
> -	 * current count as the new delta to avoid an underflow error.
> -	 * Ensure hang details are globally visible before the counter
> -	 * update.
> -	 */
> -	cur_detect_count = atomic_long_inc_return_release(&sysctl_hung_task_detect_count);
> -	if (cur_detect_count >= prev_detect_count)
> -		total_hung_task = cur_detect_count - prev_detect_count;
> -	else
> -		total_hung_task = cur_detect_count;
> -
>   	trace_sched_process_hang(t);
>   
> -	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> +	if (sysctl_hung_task_panic && this_round_count >= sysctl_hung_task_panic) {
>   		console_verbose();
>   		hung_task_call_panic = true;
>   	}
> @@ -325,12 +307,13 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   	int max_count = sysctl_hung_task_check_count;
>   	unsigned long last_break = jiffies;
>   	struct task_struct *g, *t;
> -	unsigned long cur_detect_count, prev_detect_count, delta;
> +	unsigned long total_count, this_round_count;
>   	int need_warning = sysctl_hung_task_warnings;
>   	unsigned long si_mask = hung_task_si_mask;
>   
> -	/* Acquire prevents reordering task checks before this point. */
> -	prev_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> +	/* The counter might get reset. Remember the initial value. */
> +	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> +
>   	/*
>   	 * If the system crashed already then all bets are off,
>   	 * do not report extra hung tasks:
> @@ -339,6 +322,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   		return;
>   
>   
> +	this_round_count = 0UL;
>   	rcu_read_lock();
>   	for_each_process_thread(g, t) {
>   
> @@ -350,21 +334,24 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>   			last_break = jiffies;
>   		}
>   
> -		check_hung_task(t, timeout, prev_detect_count);
> +		if (task_is_hung(t, timeout)) {
> +			this_round_count++;
> +			hung_task_info(t, timeout, this_round_count);
> +		}
>   	}
>    unlock:
>   	rcu_read_unlock();
>   
> -	/* Ensures we see all hang details recorded during the scan. */
> -	cur_detect_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
> -	if (cur_detect_count < prev_detect_count)
> -		delta = cur_detect_count;
> -	else
> -		delta = cur_detect_count - prev_detect_count;
> -
> -	if (!delta)
> +	if (!this_round_count)
>   		return;
>   
> +	/*
> +	 * Do not count this round when the global counter has been reset
> +	 * during this check.
> +	 */
> +	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count, total_count,
> +				    total_count + this_round_count);
> +
>   	if (need_warning || hung_task_call_panic) {
>   		si_mask |= SYS_INFO_LOCKS;
>   

In general, the POC makes a lot of sense ;)

The acquire/release pairing is now straightforward:
- _read_acquire at start
- local counting (no atomics)
- _cmpxchg_release at end

And yeah, if cmpxchg fails, we simply do not count this round,
which is reasonable behavior.

So, Aaron, can you take care of refactoring/testing based on
Petr's approach?

Cheers,
Lance


* Re: [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2026-01-12 14:43         ` Lance Yang
@ 2026-01-15  2:20           ` Aaron Tomlin
  0 siblings, 0 replies; 22+ messages in thread
From: Aaron Tomlin @ 2026-01-15  2:20 UTC (permalink / raw)
  To: Lance Yang
  Cc: Petr Mladek, akpm, mhiramat, gregkh, joel.granados, sean,
	linux-kernel


On Mon, Jan 12, 2026 at 10:43:35PM +0800, Lance Yang wrote:
> In general, the POC makes a lot of sense ;)
> 
> The acquire/release pairing is now straightforward:
> - _read_acquire at start
> - local counting (no atomics)
> - _cmpxchg_release at end
> 
> And yeah, if cmpxchg fails, we simply do not count this round,
> which is reasonable behavior.
> 
> So, Aaron, can you take care of refactoring/testing based on
> Petr's approach?

Hi Lance,

Sure!

-- 
Aaron Tomlin



end of thread, other threads:[~2026-01-15  2:20 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-31  0:41 [v5 PATCH 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
2025-12-31  0:41 ` [v5 PATCH 1/2] hung_task: Introduce helper for hung task warning Aaron Tomlin
2026-01-01  9:49   ` Lance Yang
2026-01-01 19:28     ` Aaron Tomlin
2026-01-02  3:40       ` Lance Yang
2026-01-02 19:02         ` Aaron Tomlin
2025-12-31  0:41 ` [v5 PATCH 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
2026-01-01  9:46   ` Lance Yang
2026-01-01 23:14   ` Joel Granados
2026-01-02  1:24     ` Aaron Tomlin
2026-01-05 10:53       ` Joel Granados
2026-01-05 14:42         ` Aaron Tomlin
2026-01-06 11:36           ` Joel Granados
2026-01-07  1:49             ` Aaron Tomlin
2026-01-06 11:51   ` Joel Granados
2026-01-07  3:37     ` Aaron Tomlin
2026-01-08 14:41   ` Petr Mladek
2026-01-09 13:50     ` Lance Yang
2026-01-12 13:13       ` Petr Mladek
2026-01-12 14:43         ` Lance Yang
2026-01-15  2:20           ` Aaron Tomlin
2026-01-10 15:55     ` Aaron Tomlin
