* [PATCH v2 0/2] hung_task: Provide runtime reset interface for hung task detector
@ 2025-12-11 3:30 Aaron Tomlin
  2025-12-11 3:30 ` [PATCH v2 1/2] hung_task: Consolidate hung task warning into an atomic log block Aaron Tomlin
  2025-12-11 3:30 ` [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  0 siblings, 2 replies; 9+ messages in thread
From: Aaron Tomlin @ 2025-12-11 3:30 UTC
To: akpm, lance.yang, mhiramat, gregkh, pmladek; +Cc: sean, linux-kernel

Hi Lance, Greg, Petr,

This series introduces the ability to reset
/proc/sys/kernel/hung_task_detect_count and consolidates hung task warning
into a single, atomic log block.

Writing any value to this file atomically resets the counter of detected
hung tasks to zero. This functionality provides system administrators with
the means to clear the cumulative diagnostic history following incident
resolution, thereby simplifying subsequent monitoring without necessitating
a system restart.

I intend to retain the use of the hung_task_diagnostics() helper function
to consolidate the multi-line logging logic for a detected hung task. The
primary goal is to improve code quality, readability, and ensure diagnostic
output integrity.

Please let me know your thoughts.

Changes since v1 [1]:

 - Removed write-only sysfs attribute (Lance Yang)
 - Modified procfs hung_task_detect_count instead (Lance Yang)
 - Introduced a custom proc_handler
 - Updated documentation (Lance Yang)
 - Added 'static inline' as a hint to eliminate any function call overhead
 - Removed clutter through encapsulation

[1]: https://lore.kernel.org/lkml/20251209041218.1583600-1-atomlin@atomlin.com/

Aaron Tomlin (2):
  hung_task: Consolidate hung task warning into an atomic log block
  hung_task: Enable runtime reset of hung_task_detect_count

 Documentation/admin-guide/sysctl/kernel.rst |  2 +-
 kernel/hung_task.c                          | 69 +++++++++++++++++----
 2 files changed, 58 insertions(+), 13 deletions(-)

-- 
2.51.0

* [PATCH v2 1/2] hung_task: Consolidate hung task warning into an atomic log block
  2025-12-11 3:30 [PATCH v2 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
@ 2025-12-11 3:30 ` Aaron Tomlin
  2025-12-11 8:02   ` Greg KH
  2025-12-11 3:30 ` [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  1 sibling, 1 reply; 9+ messages in thread
From: Aaron Tomlin @ 2025-12-11 3:30 UTC
To: akpm, lance.yang, mhiramat, gregkh, pmladek; +Cc: sean, linux-kernel

Consolidate the multi-line console output in check_hung_task() into a new
helper function, hung_task_diagnostics().

This patch ensures the entire diagnostic block (task info, kernel
version, and sysctl advice) is logged to the ring buffer via a single
pr_err() call. This is critical in a concurrent environment to prevent
message lines from interleaving with other CPU activity, thus
maintaining contextual integrity of the warning message.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/hung_task.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d2254c91450b..6f3fb26378b5 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -223,6 +223,34 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 }
 #endif
 
+/**
+ * hung_task_diagnostics - Print structured diagnostic info for a hung task.
+ * @t: The struct task_struct of the detected hung task.
+ *
+ * This function consolidates the printing of core diagnostic information
+ * for a task found to be blocked. This approach ensures atomic logging
+ * of the multi-line message block, preventing interleaving by other
+ * console activity, thus maintaining contextual clarity.
+ */
+static inline void hung_task_diagnostics(struct task_struct *t)
+{
+	unsigned long blocked_secs = (jiffies - t->last_switch_time) / HZ;
+	const char *coredump_msg = "";
+	const char *disable_msg =
+		"\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
+		" disables this message.\n";
+
+	if (t->flags & PF_POSTCOREDUMP)
+		coredump_msg = "      Blocked by coredump.\n";
+
+	pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n"
+	       "      %s %s %.*s\n%s%s",
+	       t->comm, t->pid, blocked_secs,
+	       print_tainted(), init_utsname()->release,
+	       (int)strcspn(init_utsname()->version, " "),
+	       init_utsname()->version, coredump_msg, disable_msg);
+}
+
 static void check_hung_task(struct task_struct *t, unsigned long timeout,
 		unsigned long prev_detect_count)
 {
@@ -252,16 +280,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 	if (sysctl_hung_task_warnings || hung_task_call_panic) {
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
-		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
-		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
-		pr_err("      %s %s %.*s\n",
-			print_tainted(), init_utsname()->release,
-			(int)strcspn(init_utsname()->version, " "),
-			init_utsname()->version);
-		if (t->flags & PF_POSTCOREDUMP)
-			pr_err("      Blocked by coredump.\n");
-		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
-			" disables this message.\n");
+		hung_task_diagnostics(t);
 		sched_show_task(t);
 		debug_show_blocker(t, timeout);
 
-- 
2.51.0

* Re: [PATCH v2 1/2] hung_task: Consolidate hung task warning into an atomic log block
  2025-12-11 3:30 ` [PATCH v2 1/2] hung_task: Consolidate hung task warning into an atomic log block Aaron Tomlin
@ 2025-12-11 8:02   ` Greg KH
  2025-12-15 23:44     ` Aaron Tomlin
  0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2025-12-11 8:02 UTC
To: Aaron Tomlin; +Cc: akpm, lance.yang, mhiramat, pmladek, sean, linux-kernel

On Wed, Dec 10, 2025 at 10:30:03PM -0500, Aaron Tomlin wrote:
> Consolidate the multi-line console output in check_hung_task() into a new
> helper function, hung_task_diagnostics().
>
> This patch ensures the entire diagnostic block (task info, kernel
> version, and sysctl advice) is logged to the ring buffer via a single
> pr_err() call. This is critical in a concurrent environment to prevent
> message lines from interleaving with other CPU activity, thus
> maintaining contextual integrity of the warning message.

If this message is "critical", then it should not be going through the
syslog as that is NOT a "critical" way to communicate things to
userspace.

What is currently breaking today with the multi-line message that you
have? Why is this so much more special than the normal oops / warning /
oom and other type messages that are multi-lines today?

I'm all for moving this to a single function, but I'm not ok with
multi-line messages in one pr_err() call like this, sorry. Especially
one that contains a "here is how to disable this" message like this one
does, that surely is NOT a "critical" thing.

thanks,

greg k-h

* Re: [PATCH v2 1/2] hung_task: Consolidate hung task warning into an atomic log block
  2025-12-11 8:02   ` Greg KH
@ 2025-12-15 23:44     ` Aaron Tomlin
  0 siblings, 0 replies; 9+ messages in thread
From: Aaron Tomlin @ 2025-12-15 23:44 UTC
To: Greg KH; +Cc: akpm, lance.yang, mhiramat, pmladek, sean, linux-kernel

On Thu, Dec 11, 2025 at 05:02:08PM +0900, Greg KH wrote:
> I'm all for moving this to a single function, but I'm not ok with
> multi-line messages in one pr_err() call like this, sorry.

Hi Greg,

Thank you for your feedback on the patch.

I agree with your assessment regarding the severity and the use of a single
pr_err() call for multi-line output. My previous description of this
message being "critical" was certainly an overstatement; this warning is
not more critical than an Oops or an OOM report, which also manage
multi-line output adequately.

I will revert to the previous multi-line implementation but will still
consolidate the logic into the new helper function and make some additional
improvements to hopefully improve the code structure and readability
somewhat.

Kind regards,
-- 
Aaron Tomlin

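A minimal userspace sketch of the direction agreed in this reply: the helper stays, but each line of the warning is emitted by its own log call rather than by one multi-line pr_err(). Every name prefixed with model_ is hypothetical and only stands in for kernel facilities (pr_err(), struct task_struct, the PF_POSTCOREDUMP test); it is not the code of any later revision of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the few task fields the warning uses. */
struct model_task {
	const char *comm;
	int pid;
	unsigned long blocked_secs;
	int post_coredump;	/* models t->flags & PF_POSTCOREDUMP */
};

/* Stand-in for pr_err(): one call emits exactly one log record. */
static void model_pr_err(const char *line)
{
	fprintf(stderr, "%s\n", line);
}

/*
 * Emit the warning one record per line and return the number of records.
 * The tainted/release line of the real message is left out for brevity.
 */
static int model_hung_task_diagnostics(const struct model_task *t)
{
	char line[128];
	int records = 0;

	snprintf(line, sizeof(line),
		 "INFO: task %s:%d blocked for more than %lu seconds.",
		 t->comm, t->pid, t->blocked_secs);
	model_pr_err(line);
	records++;

	if (t->post_coredump) {
		model_pr_err("      Blocked by coredump.");
		records++;
	}

	model_pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.");
	records++;

	return records;
}
```

Keeping one record per line matches how the pre-patch code and other multi-line kernel reports behave, at the cost of possible interleaving with output from other CPUs.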
* [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-11 3:30 [PATCH v2 0/2] hung_task: Provide runtime reset interface for hung task detector Aaron Tomlin
  2025-12-11 3:30 ` [PATCH v2 1/2] hung_task: Consolidate hung task warning into an atomic log block Aaron Tomlin
@ 2025-12-11 3:30 ` Aaron Tomlin
  2025-12-11 5:14   ` Lance Yang
                    ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Aaron Tomlin @ 2025-12-11 3:30 UTC
To: akpm, lance.yang, mhiramat, gregkh, pmladek; +Cc: sean, linux-kernel

Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.

Writing any value to this file atomically resets the counter of detected
hung tasks to zero. This grants system administrators the ability to clear
the cumulative diagnostic history after resolving an incident, simplifying
monitoring without requiring a system restart.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  2 +-
 kernel/hung_task.c                          | 30 +++++++++++++++++++--
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 239da22c4e28..43c17b919969 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -418,7 +418,7 @@ hung_task_detect_count
 ======================
 
 Indicates the total number of tasks that have been detected as hung since
-the system boot.
+the system boot. The counter can be reset to zero when written to.
 
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 6f3fb26378b5..979b7e2fcc19 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -377,6 +377,32 @@ static long hung_timeout_jiffies(unsigned long last_checked,
 }
 
 #ifdef CONFIG_SYSCTL
+
+/**
+ * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
+ *
+ * Handles read access for the hung task counter. For write access, it
+ * accepts any successfully parsed value and atomically resets the counter
+ * to zero. Returns the byte count written on success or a negative error
+ * code on failure.
+ */
+static int proc_dohung_task_detect_count(const struct ctl_table *table, int write,
+					 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	if (!write)
+		return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+
+	ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	WRITE_ONCE(sysctl_hung_task_detect_count, 0);
+
+	return ret;
+}
+
 /*
  * Process updating of timeout sysctl
  */
@@ -459,8 +485,8 @@ static const struct ctl_table hung_task_sysctls[] = {
 		.procname	= "hung_task_detect_count",
 		.data		= &sysctl_hung_task_detect_count,
 		.maxlen		= sizeof(unsigned long),
-		.mode		= 0444,
-		.proc_handler	= proc_doulongvec_minmax,
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_detect_count,
 	},
 	{
 		.procname	= "hung_task_sys_info",
-- 
2.51.0

* Re: [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-11 3:30 ` [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
@ 2025-12-11 5:14   ` Lance Yang
  2025-12-15 23:38     ` Aaron Tomlin
  2025-12-11 15:49   ` kernel test robot
  2025-12-15 5:00   ` kernel test robot
  2 siblings, 1 reply; 9+ messages in thread
From: Lance Yang @ 2025-12-11 5:14 UTC
To: Aaron Tomlin; +Cc: sean, linux-kernel, pmladek, gregkh, mhiramat, akpm

On 2025/12/11 11:30, Aaron Tomlin wrote:
> Introduce support for writing to /proc/sys/kernel/hung_task_detect_count.
>
> Writing any value to this file atomically resets the counter of detected
> hung tasks to zero. This grants system administrators the ability to clear
> the cumulative diagnostic history after resolving an incident, simplifying
> monitoring without requiring a system restart.
>
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst |  2 +-
>  kernel/hung_task.c                          | 30 +++++++++++++++++++--
>  2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 239da22c4e28..43c17b919969 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -418,7 +418,7 @@ hung_task_detect_count
>  ======================
>
>  Indicates the total number of tasks that have been detected as hung since
> -the system boot.
> +the system boot. The counter can be reset to zero when written to.
>
>  This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 6f3fb26378b5..979b7e2fcc19 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -377,6 +377,32 @@ static long hung_timeout_jiffies(unsigned long last_checked,
>  }
>
>  #ifdef CONFIG_SYSCTL
> +
> +/**
> + * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
> + *
> + * Handles read access for the hung task counter. For write access, it
> + * accepts any successfully parsed value and atomically resets the counter
> + * to zero. Returns the byte count written on success or a negative error
> + * code on failure.
> + */
> +static int proc_dohung_task_detect_count(const struct ctl_table *table, int write,
> +					 void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	int ret;
> +
> +	if (!write)
> +		return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
> +
> +	ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);

Since the intent is "any write resets to zero", we could skip parsing
the input entirely (untested):

	WRITE_ONCE(sysctl_hung_task_detect_count, 0);
	*ppos += *lenp;
	return 0;

See vmstat_refresh() for a similar pattern :)

> +	if (ret)
> +		return ret;
> +
> +	WRITE_ONCE(sysctl_hung_task_detect_count, 0);
> +
> +	return ret;
> +}

Cheers,
Lance

> +
>  /*
>   * Process updating of timeout sysctl
>   */
> @@ -459,8 +485,8 @@ static const struct ctl_table hung_task_sysctls[] = {
>  		.procname	= "hung_task_detect_count",
>  		.data		= &sysctl_hung_task_detect_count,
>  		.maxlen		= sizeof(unsigned long),
> -		.mode		= 0444,
> -		.proc_handler	= proc_doulongvec_minmax,
> +		.mode		= 0644,
> +		.proc_handler	= proc_dohung_task_detect_count,
>  	},
>  	{
>  		.procname	= "hung_task_sys_info",

* Re: [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-11 5:14   ` Lance Yang
@ 2025-12-15 23:38     ` Aaron Tomlin
  0 siblings, 0 replies; 9+ messages in thread
From: Aaron Tomlin @ 2025-12-15 23:38 UTC
To: Lance Yang; +Cc: sean, linux-kernel, pmladek, gregkh, mhiramat, akpm

On Thu, Dec 11, 2025 at 01:14:38PM +0800, Lance Yang wrote:
> Since the intent is "any write resets to zero", we could skip parsing
> the input entirely (untested):
>
> 	WRITE_ONCE(sysctl_hung_task_detect_count, 0);
> 	*ppos += *lenp;
> 	return 0;
>

Hi Lance,

Acknowledged. I will simply test for a 'write' operation.

Kind regards,
-- 
Aaron Tomlin

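The control flow agreed here can be illustrated with a small userspace model: a write ignores its payload, marks the input as consumed by advancing the file position, and zeroes the counter, while a read simply reports the current value. All model_ names are hypothetical, the kernel's WRITE_ONCE() and proc_doulongvec_minmax() are replaced by a plain store and a plain copy, and this is only a sketch of the idea, not the handler from a later revision.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for sysctl_hung_task_detect_count (hypothetical). */
static unsigned long model_detect_count;

/*
 * Model of the simplified handler. On write, the payload is never parsed:
 * the counter is zeroed and *ppos advances by *lenp so the caller sees the
 * input as consumed. On read, the current value is copied to *out.
 * Returns 0 on success, following the proc_handler convention.
 */
static int model_detect_count_handler(int write, unsigned long *out,
				      size_t *lenp, long long *ppos)
{
	if (write) {
		model_detect_count = 0;	/* WRITE_ONCE() in the kernel */
		*ppos += (long long)*lenp;
		return 0;
	}

	*out = model_detect_count;
	return 0;
}
```

Skipping the parse means that input such as "abc" resets the counter as well, which is consistent with the stated "any write resets to zero" intent.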
* Re: [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-11 3:30 ` [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  2025-12-11 5:14   ` Lance Yang
@ 2025-12-11 15:49   ` kernel test robot
  2025-12-15 5:00   ` kernel test robot
  2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-12-11 15:49 UTC
To: Aaron Tomlin, akpm, lance.yang, mhiramat, gregkh, pmladek
Cc: oe-kbuild-all, sean, linux-kernel

Hi Aaron,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master next-20251211]
[cannot apply to v6.18]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Tomlin/hung_task-Consolidate-hung-task-warning-into-an-atomic-log-block/20251211-113605
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20251211033004.1628875-3-atomlin%40atomlin.com
patch subject: [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count
config: s390-randconfig-001-20251211 (https://download.01.org/0day-ci/archive/20251211/202512112355.FQD4j4e8-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251211/202512112355.FQD4j4e8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512112355.FQD4j4e8-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: kernel/hung_task.c:390 function parameter 'table' not described in 'proc_dohung_task_detect_count'
>> Warning: kernel/hung_task.c:390 function parameter 'write' not described in 'proc_dohung_task_detect_count'
>> Warning: kernel/hung_task.c:390 function parameter 'buffer' not described in 'proc_dohung_task_detect_count'
>> Warning: kernel/hung_task.c:390 function parameter 'lenp' not described in 'proc_dohung_task_detect_count'
>> Warning: kernel/hung_task.c:390 function parameter 'ppos' not described in 'proc_dohung_task_detect_count'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count
  2025-12-11 3:30 ` [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count Aaron Tomlin
  2025-12-11 5:14   ` Lance Yang
  2025-12-11 15:49   ` kernel test robot
@ 2025-12-15 5:00   ` kernel test robot
  2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-12-15 5:00 UTC
To: Aaron Tomlin, akpm, lance.yang, mhiramat, gregkh, pmladek
Cc: oe-kbuild-all, sean, linux-kernel

Hi Aaron,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.19-rc1 next-20251215]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Tomlin/hung_task-Consolidate-hung-task-warning-into-an-atomic-log-block/20251211-113605
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20251211033004.1628875-3-atomlin%40atomlin.com
patch subject: [PATCH v2 2/2] hung_task: Enable runtime reset of hung_task_detect_count
config: x86_64-rhel-9.4-ltp (https://download.01.org/0day-ci/archive/20251215/202512150521.53lGwfu6-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251215/202512150521.53lGwfu6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512150521.53lGwfu6-lkp@intel.com/

All warnings (new ones prefixed by >>):

   Warning: kernel/hung_task.c:390 function parameter 'table' not described in 'proc_dohung_task_detect_count'
>> Warning: kernel/hung_task.c:390 function parameter 'write' not described in 'proc_dohung_task_detect_count'
>> Warning: kernel/hung_task.c:390 function parameter 'buffer' not described in 'proc_dohung_task_detect_count'
>> Warning: kernel/hung_task.c:390 function parameter 'lenp' not described in 'proc_dohung_task_detect_count'
>> Warning: kernel/hung_task.c:390 function parameter 'ppos' not described in 'proc_dohung_task_detect_count'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
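These warnings arise because the comment above the handler opens with the kernel-doc marker (/**) but describes none of the parameters. One way to address them, sketched under the assumption that the handler keeps its v2 signature, is to add an @name line for each parameter; the other common option is to downgrade the opener to a plain /* comment. In the sketch below the types are userspace stand-ins (struct ctl_table is only forward-declared and long long replaces loff_t), the parameter descriptions are illustrative wording, and the body is a stub, since only the comment block matters.

```c
#include <assert.h>
#include <stddef.h>

struct ctl_table;	/* opaque here; the real definition lives in the kernel */

/**
 * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
 * @table: the sysctl table entry being accessed
 * @write: non-zero for a write access, zero for a read
 * @buffer: buffer holding the data read from or written to the file
 * @lenp: in/out length of the data in @buffer
 * @ppos: in/out file position
 *
 * Return: 0 on success or a negative error code on failure.
 */
static int proc_dohung_task_detect_count(const struct ctl_table *table, int write,
					 void *buffer, size_t *lenp,
					 long long *ppos /* loff_t in the kernel */)
{
	/* Stub body: this sketch only demonstrates the comment layout. */
	(void)table;
	(void)write;
	(void)buffer;
	(void)lenp;
	(void)ppos;
	return 0;
}
```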