* [PATCH v2 0/4] Enable hung_task and lockup cases to dump system info on demand
@ 2025-11-13 11:10 Feng Tang
2025-11-13 11:10 ` [PATCH v2 1/4] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Feng Tang @ 2025-11-13 11:10 UTC
To: Andrew Morton, Petr Mladek, Lance Yang, Jonathan Corbet, paulmck,
Steven Rostedt, linux-kernel
Cc: Feng Tang
When working on kernel stability issues, panic, task-hung and soft/hard
lockup cases are frequently encountered. To debug them, users may need
a lot of system information captured at that moment, such as task call
stacks, lock info, memory info, ftrace dump, etc.
The panic case already uses sys_info() for this purpose, and has a
'panic_sys_info' sysctl interface (also settable via the kernel command
line) that takes a human-readable string like
"tasks,mem,timers,locks,ftrace,..." to control what kinds of information
are dumped. The same is also helpful for debugging task-hung and lockup
cases.
So this patchset introduces similar sys_info sysctl interfaces for the
task-hung and lockup cases.
Please note that this is mainly for debugging, and the info dumping
could be intrusive, e.g. dumping call stacks for all tasks when the
system has a huge number of tasks; the same goes for ftrace dump (we may
add tracing_stop() and tracing_start() around it).
Locally, these have been used in our bug chasing for stability issues
and were helpful.
As Andrew suggested, add a configurable global 'kernel_sys_info' knob.
When error scenarios like panic/hung-task/lockup etc. don't set up
their own sys_info knob and call sys_info() with parameter "0", this
global knob takes effect. It could also be used for other kernel cases
like OOM, which may not need a dedicated sys_info knob.
Code-wise, these 4 patches are independent of each other and could be
applied separately.
Please help to review, thanks!
- Feng
Changelog:
v2:
* Add 0004 patch to add the default kernel sys_info knob (Andrew)
* Simplify the code for hung_sys_info (Petr)
* Use separate sys_info interfaces for hardlockup and softlockup (Petr)
* Consider the ALL_CPU_BT handling for the hardlockup case (Petr)
* Collect Reviewed-by tags.
* Put soft/hard sys_info knob into correct kernel config domain.
Feng Tang (4):
docs: panic: correct some sys_ifo names in sysctl doc
hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
watchdog: add sys_info sysctls to dump sys info on system lockup
sys_info: add a default kernel sys_info mask
Documentation/admin-guide/sysctl/kernel.rst | 23 +++++++-
kernel/hung_task.c | 62 +++++++++++++--------
kernel/watchdog.c | 44 ++++++++++++++-
lib/sys_info.c | 31 ++++++++++-
4 files changed, 130 insertions(+), 30 deletions(-)
--
2.43.5
* [PATCH v2 1/4] docs: panic: correct some sys_ifo names in sysctl doc
2025-11-13 11:10 [PATCH v2 0/4] Enable hung_task and lockup cases to dump system info on demand Feng Tang
@ 2025-11-13 11:10 ` Feng Tang
2025-11-13 11:10 ` [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
` (2 subsequent siblings)
3 siblings, 0 replies; 18+ messages in thread
From: Feng Tang @ 2025-11-13 11:10 UTC
To: Andrew Morton, Petr Mladek, Lance Yang, Jonathan Corbet, paulmck,
Steven Rostedt, linux-kernel
Cc: Feng Tang
Some sys_info names were not updated during patch iterations; the
correct names are defined in lib/sys_info.c.
Fixes: d747755917bf ("panic: add 'panic_sys_info' sysctl to take human readable string parameter")
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
Documentation/admin-guide/sysctl/kernel.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0065a55bc09e..a397eeccaea7 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -911,8 +911,8 @@ to 'panic_print'. Possible values are:
============= ===================================================
tasks print all tasks info
mem print system memory info
-timer print timers info
-lock print locks info if CONFIG_LOCKDEP is on
+timers print timers info
+locks print locks info if CONFIG_LOCKDEP is on
ftrace print ftrace buffer
all_bt print all CPUs backtrace (if available in the arch)
blocked_tasks print only tasks in uninterruptible (blocked) state
--
2.43.5
* [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-13 11:10 [PATCH v2 0/4] Enable hung_task and lockup cases to dump system info on demand Feng Tang
2025-11-13 11:10 ` [PATCH v2 1/4] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
@ 2025-11-13 11:10 ` Feng Tang
2025-11-14 15:36 ` Petr Mladek
2025-11-16 7:58 ` Lance Yang
2025-11-13 11:10 ` [PATCH v2 3/4] watchdog: add sys_info sysctls to dump sys info on system lockup Feng Tang
2025-11-13 11:10 ` [PATCH v2 4/4] sys_info: add a default kernel sys_info mask Feng Tang
3 siblings, 2 replies; 18+ messages in thread
From: Feng Tang @ 2025-11-13 11:10 UTC
To: Andrew Morton, Petr Mladek, Lance Yang, Jonathan Corbet, paulmck,
Steven Rostedt, linux-kernel
Cc: Feng Tang
When a task hang happens, developers may need different kinds of system
information (call stacks, memory info, locks, etc.) to help debugging.
Add a 'hung_task_sys_info' sysctl knob that takes a human-readable string
like "tasks,mem,timers,locks,ftrace,...", and when a task hang happens,
all requested information will be dumped (refer to lib/sys_info.c for
more details).
Meanwhile, the newly introduced sys_info() call is used to unify some
existing info-dumping knobs.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Documentation/admin-guide/sysctl/kernel.rst | 5 ++
kernel/hung_task.c | 62 +++++++++++++--------
2 files changed, 43 insertions(+), 24 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index a397eeccaea7..45b4408dad31 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -422,6 +422,11 @@ the system boot.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
+hung_task_sys_info
+==================
+A comma separated list of extra system information to be dumped when
+a hung task is detected, for example, "tasks,mem,timers,locks,...".
+Refer to the 'panic_sys_info' section below for more details.
hung_task_timeout_secs
======================
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 5ac0e66a1361..5b3a7785d3a2 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -24,6 +24,7 @@
#include <linux/sched/sysctl.h>
#include <linux/hung_task.h>
#include <linux/rwsem.h>
+#include <linux/sys_info.h>
#include <trace/events/sched.h>
@@ -59,12 +60,17 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
static int __read_mostly sysctl_hung_task_warnings = 10;
static int __read_mostly did_panic;
-static bool hung_task_show_lock;
static bool hung_task_call_panic;
-static bool hung_task_show_all_bt;
static struct task_struct *watchdog_task;
+/*
+ * A bitmask to control what kinds of system info to be printed when
+ * a hung task is detected, it could be task, memory, lock etc. Refer
+ * include/linux/sys_info.h for detailed bit definition.
+ */
+static unsigned long hung_task_si_mask;
+
#ifdef CONFIG_SMP
/*
* Should we dump all CPUs backtraces in a hung task event?
@@ -217,11 +223,8 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
}
#endif
-static void check_hung_task(struct task_struct *t, unsigned long timeout,
- unsigned long prev_detect_count)
+static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
- unsigned long total_hung_task;
-
if (!task_is_hung(t, timeout))
return;
@@ -231,20 +234,13 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
*/
sysctl_hung_task_detect_count++;
- total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
trace_sched_process_hang(t);
- if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
- console_verbose();
- hung_task_show_lock = true;
- hung_task_call_panic = true;
- }
-
/*
* Ok, the task did not get scheduled for more than 2 minutes,
* complain:
*/
- if (sysctl_hung_task_warnings || hung_task_call_panic) {
+ if (sysctl_hung_task_warnings) {
if (sysctl_hung_task_warnings > 0)
sysctl_hung_task_warnings--;
pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
@@ -259,10 +255,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
" disables this message.\n");
sched_show_task(t);
debug_show_blocker(t, timeout);
- hung_task_show_lock = true;
- if (sysctl_hung_task_all_cpu_backtrace)
- hung_task_show_all_bt = true;
if (!sysctl_hung_task_warnings)
pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
}
@@ -302,8 +295,11 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
int max_count = sysctl_hung_task_check_count;
unsigned long last_break = jiffies;
+ unsigned long total_hung_task;
struct task_struct *g, *t;
unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+ int need_warning = sysctl_hung_task_warnings;
+ unsigned long si_mask = hung_task_si_mask;
/*
* If the system crashed already then all bets are off,
@@ -312,7 +308,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
if (test_taint(TAINT_DIE) || did_panic)
return;
- hung_task_show_lock = false;
+
rcu_read_lock();
for_each_process_thread(g, t) {
@@ -324,18 +320,29 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
last_break = jiffies;
}
- check_hung_task(t, timeout, prev_detect_count);
+ check_hung_task(t, timeout);
}
unlock:
rcu_read_unlock();
- if (hung_task_show_lock)
- debug_show_all_locks();
- if (hung_task_show_all_bt) {
- hung_task_show_all_bt = false;
- trigger_all_cpu_backtrace();
+ total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
+ if (!total_hung_task)
+ return;
+
+ if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+ console_verbose();
+ hung_task_call_panic = true;
+ }
+
+ if (need_warning || hung_task_call_panic) {
+ si_mask |= SYS_INFO_LOCKS;
+
+ if (sysctl_hung_task_all_cpu_backtrace)
+ si_mask |= SYS_INFO_ALL_BT;
}
+ sys_info(si_mask);
+
if (hung_task_call_panic)
panic("hung_task: blocked tasks");
}
@@ -434,6 +441,13 @@ static const struct ctl_table hung_task_sysctls[] = {
.mode = 0444,
.proc_handler = proc_doulongvec_minmax,
},
+ {
+ .procname = "hung_task_sys_info",
+ .data = &hung_task_si_mask,
+ .maxlen = sizeof(hung_task_si_mask),
+ .mode = 0644,
+ .proc_handler = sysctl_sys_info_handler,
+ },
};
static void __init hung_task_sysctl_init(void)
--
2.43.5
* [PATCH v2 3/4] watchdog: add sys_info sysctls to dump sys info on system lockup
2025-11-13 11:10 [PATCH v2 0/4] Enable hung_task and lockup cases to dump system info on demand Feng Tang
2025-11-13 11:10 ` [PATCH v2 1/4] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
2025-11-13 11:10 ` [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
@ 2025-11-13 11:10 ` Feng Tang
2025-11-14 15:44 ` Petr Mladek
2025-11-13 11:10 ` [PATCH v2 4/4] sys_info: add a default kernel sys_info mask Feng Tang
3 siblings, 1 reply; 18+ messages in thread
From: Feng Tang @ 2025-11-13 11:10 UTC
To: Andrew Morton, Petr Mladek, Lance Yang, Jonathan Corbet, paulmck,
Steven Rostedt, linux-kernel
Cc: Feng Tang
When a soft/hard lockup happens, developers may need different kinds of
system information (call stacks, memory info, locks, etc.) to help
debugging.
Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs that
take a human-readable string like "tasks,mem,timers,locks,ftrace,...",
and when a system lockup happens, all requested information will be
printed out (refer to lib/sys_info.c for more details).
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Documentation/admin-guide/sysctl/kernel.rst | 5 +++
kernel/watchdog.c | 44 +++++++++++++++++++--
2 files changed, 46 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 45b4408dad31..176520283f1a 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -582,6 +582,11 @@ if leaking kernel pointer values to unprivileged users is a concern.
When ``kptr_restrict`` is set to 2, kernel pointers printed using
%pK will be replaced with 0s regardless of privileges.
+softlockup_sys_info & hardlockup_sys_info
+=========================================
+A comma separated list of extra system information to be dumped when
+a soft/hard lockup is detected, for example, "tasks,mem,timers,locks,...".
+Refer to the 'panic_sys_info' section below for more details.
modprobe
========
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 659f5844393c..bbd11562e4c4 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -25,6 +25,7 @@
#include <linux/stop_machine.h>
#include <linux/sysctl.h>
#include <linux/tick.h>
+#include <linux/sys_info.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
@@ -65,6 +66,13 @@ int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
unsigned int __read_mostly hardlockup_panic =
IS_ENABLED(CONFIG_BOOTPARAM_HARDLOCKUP_PANIC);
+/*
+ * Bitmask to control what kinds of system info are printed when a
+ * hard lockup is detected; it could be task, memory, lock etc.
+ * Refer include/linux/sys_info.h for detailed bit definition.
+ */
+static unsigned long hardlockup_si_mask;
+
#ifdef CONFIG_SYSFS
static unsigned int hardlockup_count;
@@ -178,11 +186,15 @@ static void watchdog_hardlockup_kick(void)
void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
{
+ int hardlockup_all_cpu_backtrace;
+
if (per_cpu(watchdog_hardlockup_touched, cpu)) {
per_cpu(watchdog_hardlockup_touched, cpu) = false;
return;
}
+ hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
+ 1 : sysctl_hardlockup_all_cpu_backtrace;
/*
* Check for a hardlockup by making sure the CPU's timer
* interrupt is incrementing. The timer interrupt should have
@@ -205,7 +217,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
* Prevent multiple hard-lockup reports if one cpu is already
* engaged in dumping all cpu back traces.
*/
- if (sysctl_hardlockup_all_cpu_backtrace) {
+ if (hardlockup_all_cpu_backtrace) {
if (test_and_set_bit_lock(0, &hard_lockup_nmi_warn))
return;
}
@@ -234,12 +246,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
trigger_single_cpu_backtrace(cpu);
}
- if (sysctl_hardlockup_all_cpu_backtrace) {
+ if (hardlockup_all_cpu_backtrace) {
trigger_allbutcpu_cpu_backtrace(cpu);
if (!hardlockup_panic)
clear_bit_unlock(0, &hard_lockup_nmi_warn);
}
+ sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
if (hardlockup_panic)
nmi_panic(regs, "Hard LOCKUP");
@@ -330,6 +343,13 @@ static void lockup_detector_update_enable(void)
int __read_mostly sysctl_softlockup_all_cpu_backtrace;
#endif
+/*
+ * Bitmask to control what kinds of system info are printed when a
+ * soft lockup is detected; it could be task, memory, lock etc.
+ * Refer include/linux/sys_info.h for detailed bit definition.
+ */
+static unsigned long softlockup_si_mask;
+
static struct cpumask watchdog_allowed_mask __read_mostly;
/* Global variables, exported for sysctl */
@@ -746,7 +766,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
unsigned long touch_ts, period_ts, now;
struct pt_regs *regs = get_irq_regs();
int duration;
- int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
+ int softlockup_all_cpu_backtrace;
unsigned long flags;
if (!watchdog_enabled)
@@ -758,6 +778,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
if (panic_in_progress())
return HRTIMER_NORESTART;
+ softlockup_all_cpu_backtrace = (softlockup_si_mask & SYS_INFO_ALL_BT) ?
+ 1 : sysctl_softlockup_all_cpu_backtrace;
+
watchdog_hardlockup_kick();
/* kick the softlockup detector */
@@ -846,6 +869,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
}
add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
+ sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
if (softlockup_panic)
panic("softlockup: hung tasks");
}
@@ -1197,6 +1221,13 @@ static const struct ctl_table watchdog_sysctls[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
+ {
+ .procname = "softlockup_sys_info",
+ .data = &softlockup_si_mask,
+ .maxlen = sizeof(softlockup_si_mask),
+ .mode = 0644,
+ .proc_handler = sysctl_sys_info_handler,
+ },
#ifdef CONFIG_SMP
{
.procname = "softlockup_all_cpu_backtrace",
@@ -1219,6 +1250,13 @@ static const struct ctl_table watchdog_sysctls[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
+ {
+ .procname = "hardlockup_sys_info",
+ .data = &hardlockup_si_mask,
+ .maxlen = sizeof(hardlockup_si_mask),
+ .mode = 0644,
+ .proc_handler = sysctl_sys_info_handler,
+ },
#ifdef CONFIG_SMP
{
.procname = "hardlockup_all_cpu_backtrace",
--
2.43.5
* [PATCH v2 4/4] sys_info: add a default kernel sys_info mask
2025-11-13 11:10 [PATCH v2 0/4] Enable hung_task and lockup cases to dump system info on demand Feng Tang
` (2 preceding siblings ...)
2025-11-13 11:10 ` [PATCH v2 3/4] watchdog: add sys_info sysctls to dump sys info on system lockup Feng Tang
@ 2025-11-13 11:10 ` Feng Tang
3 siblings, 0 replies; 18+ messages in thread
From: Feng Tang @ 2025-11-13 11:10 UTC
To: Andrew Morton, Petr Mladek, Lance Yang, Jonathan Corbet, paulmck,
Steven Rostedt, linux-kernel
Cc: Feng Tang
Add a global default sys_info mask. When users want the same system
information for many error cases (panic, hung task, lockup, ...), they
can choose to set this global knob only once, instead of setting up each
individual sys_info knob.
This just adds a 'lazy' option, and doesn't change existing kernel
behavior, as the mask is 0 by default.
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Documentation/admin-guide/sysctl/kernel.rst | 9 ++++++
lib/sys_info.c | 31 ++++++++++++++++++++-
2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 176520283f1a..239da22c4e28 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -521,6 +521,15 @@ default), only processes with the CAP_SYS_ADMIN capability may create
io_uring instances.
+kernel_sys_info
+===============
+A comma separated list of extra system information to be dumped by
+kernel error handlers, for example, "tasks,mem,timers,locks,...".
+Refer to the 'panic_sys_info' section below for more details.
+
+It serves as the default kernel control knob, which will take effect
+when a kernel module calls sys_info() with parameter==0.
+
kexec_load_disabled
===================
diff --git a/lib/sys_info.c b/lib/sys_info.c
index 323624093e54..f32a06ec9ed4 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -24,6 +24,13 @@ static const char * const si_names[] = {
[ilog2(SYS_INFO_BLOCKED_TASKS)] = "blocked_tasks",
};
+/*
+ * Default kernel sys_info mask.
+ * If a kernel module calls sys_info() with "parameter == 0", then
+ * this mask will be used.
+ */
+static unsigned long kernel_si_mask;
+
/* Expecting string like "xxx_sys_info=tasks,mem,timers,locks,ftrace,..." */
unsigned long sys_info_parse_param(char *str)
{
@@ -110,9 +117,26 @@ int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write,
else
return sys_info_read_handler(&table, buffer, lenp, ppos, ro_table->data);
}
+
+static const struct ctl_table sys_info_sysctls[] = {
+ {
+ .procname = "kernel_sys_info",
+ .data = &kernel_si_mask,
+ .maxlen = sizeof(kernel_si_mask),
+ .mode = 0644,
+ .proc_handler = sysctl_sys_info_handler,
+ },
+};
+
+static int __init sys_info_sysctl_init(void)
+{
+ register_sysctl_init("kernel", sys_info_sysctls);
+ return 0;
+}
+subsys_initcall(sys_info_sysctl_init);
#endif
-void sys_info(unsigned long si_mask)
+static void __sys_info(unsigned long si_mask)
{
if (si_mask & SYS_INFO_TASKS)
show_state();
@@ -135,3 +159,8 @@ void sys_info(unsigned long si_mask)
if (si_mask & SYS_INFO_BLOCKED_TASKS)
show_state_filter(TASK_UNINTERRUPTIBLE);
}
+
+void sys_info(unsigned long si_mask)
+{
+ __sys_info(si_mask ? : kernel_si_mask);
+}
--
2.43.5
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-13 11:10 ` [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
@ 2025-11-14 15:36 ` Petr Mladek
2025-11-16 7:16 ` Feng Tang
2025-11-16 7:58 ` Lance Yang
1 sibling, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2025-11-14 15:36 UTC
To: Feng Tang
Cc: Andrew Morton, Lance Yang, Jonathan Corbet, paulmck,
Steven Rostedt, linux-kernel
On Thu 2025-11-13 19:10:37, Feng Tang wrote:
> When task-hung happens, developers may need different kinds of system
> information (call-stacks, memory info, locks, etc.) to help debugging.
>
> Add 'hung_task_sys_info' sysctl knob to take human readable string like
> "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> requested information will be dumped. (refer kernel/sys_info.c for more
> details).
>
> Meanwhile, the newly introduced sys_info() call is used to unify some
> existing info-dumping knobs.
>
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
It would have been better to split the refactoring (moving some logic
from check_hung_task()) into a separate patch.
But the result looks good. Feel free to use:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
* Re: [PATCH v2 3/4] watchdog: add sys_info sysctls to dump sys info on system lockup
2025-11-13 11:10 ` [PATCH v2 3/4] watchdog: add sys_info sysctls to dump sys info on system lockup Feng Tang
@ 2025-11-14 15:44 ` Petr Mladek
0 siblings, 0 replies; 18+ messages in thread
From: Petr Mladek @ 2025-11-14 15:44 UTC
To: Feng Tang
Cc: Andrew Morton, Lance Yang, Jonathan Corbet, paulmck,
Steven Rostedt, linux-kernel
On Thu 2025-11-13 19:10:38, Feng Tang wrote:
> When soft/hard lockup happens, developers may need different kinds of
> system information (call-stacks, memory info, locks, etc.) to help
> debugging.
>
> Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs to
> take human readable string like "tasks,mem,timers,locks,ftrace,...",
> and when system lockup happens, all requested information will be
> printed out. (refer kernel/sys_info.c for more details).
>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Looks good to me. Feel free to use:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-14 15:36 ` Petr Mladek
@ 2025-11-16 7:16 ` Feng Tang
0 siblings, 0 replies; 18+ messages in thread
From: Feng Tang @ 2025-11-16 7:16 UTC
To: Petr Mladek
Cc: Andrew Morton, Lance Yang, Jonathan Corbet, paulmck,
Steven Rostedt, linux-kernel
On Fri, Nov 14, 2025 at 04:36:39PM +0100, Petr Mladek wrote:
> On Thu 2025-11-13 19:10:37, Feng Tang wrote:
> > When task-hung happens, developers may need different kinds of system
> > information (call-stacks, memory info, locks, etc.) to help debugging.
> >
> > Add 'hung_task_sys_info' sysctl knob to take human readable string like
> > "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> > requested information will be dumped. (refer kernel/sys_info.c for more
> > details).
> >
> > Meanwhile, the newly introduced sys_info() call is used to unify some
> > existing info-dumping knobs.
> >
> > Suggested-by: Petr Mladek <pmladek@suse.com>
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
>
> It would have been better to split the refactoring (moving some logic
> from check_hung_task()) into a separate patch.
Yes, it would be cleaner to have a functional patch and a cleanup one.
I will pay more attention to this in the future.
> But the result looks good. Feel free to use:
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
Thank you!
- Feng
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-13 11:10 ` [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
2025-11-14 15:36 ` Petr Mladek
@ 2025-11-16 7:58 ` Lance Yang
2025-11-16 9:11 ` Feng Tang
1 sibling, 1 reply; 18+ messages in thread
From: Lance Yang @ 2025-11-16 7:58 UTC
To: Feng Tang
Cc: Petr Mladek, Andrew Morton, Steven Rostedt, Lance Yang,
linux-kernel, Jonathan Corbet, paulmck, lirongqing, leonylgao
On 2025/11/13 19:10, Feng Tang wrote:
> When task-hung happens, developers may need different kinds of system
> information (call-stacks, memory info, locks, etc.) to help debugging.
>
> Add 'hung_task_sys_info' sysctl knob to take human readable string like
> "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> requested information will be dumped. (refer kernel/sys_info.c for more
> details).
>
> Meanwhile, the newly introduced sys_info() call is used to unify some
> existing info-dumping knobs.
>
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 5 ++
> kernel/hung_task.c | 62 +++++++++++++--------
> 2 files changed, 43 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index a397eeccaea7..45b4408dad31 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
[...]
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 5ac0e66a1361..5b3a7785d3a2 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -24,6 +24,7 @@
> #include <linux/sched/sysctl.h>
> #include <linux/hung_task.h>
> #include <linux/rwsem.h>
> +#include <linux/sys_info.h>
>
> #include <trace/events/sched.h>
>
> @@ -59,12 +60,17 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
> static int __read_mostly sysctl_hung_task_warnings = 10;
>
> static int __read_mostly did_panic;
> -static bool hung_task_show_lock;
> static bool hung_task_call_panic;
> -static bool hung_task_show_all_bt;
>
> static struct task_struct *watchdog_task;
>
> +/*
> + * A bitmask to control what kinds of system info to be printed when
> + * a hung task is detected, it could be task, memory, lock etc. Refer
> + * include/linux/sys_info.h for detailed bit definition.
> + */
> +static unsigned long hung_task_si_mask;
> +
> #ifdef CONFIG_SMP
> /*
> * Should we dump all CPUs backtraces in a hung task event?
> @@ -217,11 +223,8 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
> }
> #endif
>
> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> - unsigned long prev_detect_count)
> +static void check_hung_task(struct task_struct *t, unsigned long timeout)
> {
> - unsigned long total_hung_task;
> -
> if (!task_is_hung(t, timeout))
> return;
>
> @@ -231,20 +234,13 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
> */
> sysctl_hung_task_detect_count++;
>
> - total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> trace_sched_process_hang(t);
>
> - if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> - console_verbose();
> - hung_task_show_lock = true;
> - hung_task_call_panic = true;
> - }
> -
> /*
> * Ok, the task did not get scheduled for more than 2 minutes,
> * complain:
> */
> - if (sysctl_hung_task_warnings || hung_task_call_panic) {
> + if (sysctl_hung_task_warnings) {
It seems like the behavior changes when sysctl_hung_task_warnings is
0 but a panic is about to be triggered ...
Looking at the history:
1) Commit ("hung_task: ignore hung_task_warnings when hung_task_panic
is enabled")[1] ensured that hung task information is always dumped
when a panic is configured, even if the warning counter is exhausted.
2) Later, commit ("hung_task: panic when there are more than N hung
tasks at the same time")[2] refined the logic to trigger a panic based
on the number of hung tasks found in a single scan.
To stay consistent with the established behavior, I think we should
continue to dump the information for hung tasks as long as
sysctl_hung_task_panic is enabled :)
[1] https://lore.kernel.org/all/20240613033159.3446265-1-leonylgao@gmail.com
[2] https://lore.kernel.org/all/20251015063615.2632-1-lirongqing@baidu.com
[...]
Cheers,
Lance
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-16 7:58 ` Lance Yang
@ 2025-11-16 9:11 ` Feng Tang
2025-11-16 13:22 ` Lance Yang
0 siblings, 1 reply; 18+ messages in thread
From: Feng Tang @ 2025-11-16 9:11 UTC
To: Lance Yang
Cc: Petr Mladek, Andrew Morton, Steven Rostedt, Lance Yang,
linux-kernel, Jonathan Corbet, paulmck, lirongqing, leonylgao
On Sun, Nov 16, 2025 at 03:58:32PM +0800, Lance Yang wrote:
>
>
> On 2025/11/13 19:10, Feng Tang wrote:
> > When task-hung happens, developers may need different kinds of system
> > information (call-stacks, memory info, locks, etc.) to help debugging.
> >
> > Add 'hung_task_sys_info' sysctl knob to take human readable string like
> > "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> > requested information will be dumped. (refer kernel/sys_info.c for more
> > details).
> >
> > Meanwhile, the newly introduced sys_info() call is used to unify some
> > existing info-dumping knobs.
> >
> > Suggested-by: Petr Mladek <pmladek@suse.com>
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
> > Documentation/admin-guide/sysctl/kernel.rst | 5 ++
> > kernel/hung_task.c | 62 +++++++++++++--------
> > 2 files changed, 43 insertions(+), 24 deletions(-)
> > * Ok, the task did not get scheduled for more than 2 minutes,
> > * complain:
> > */
> > - if (sysctl_hung_task_warnings || hung_task_call_panic) {
> > + if (sysctl_hung_task_warnings) {
>
> It seems like the behavior changes when sysctl_hung_task_warnings is
> 0 but a panic is about to be triggered ...
>
> Looking at the history:
>
> 1) Commit ("hung_task: ignore hung_task_warnings when hung_task_panic
> is enabled")[1] ensured that hung task information is always dumped
> when a panic is configured, even if the warning counter is exhausted.
>
> 2) Later, commit ("hung_task: panic when there are more than N hung
> tasks at the same time")[2] refined the logic to trigger a panic based
> on the number of hung tasks found in a single scan.
>
> To stay consistent with the established behavior, I think we should
> continue to dump the information for hung tasks as long as
> sysctl_hung_task_panic is enabled :)
>
> [1] https://lore.kernel.org/all/20240613033159.3446265-1-leonylgao@gmail.com
> [2] https://lore.kernel.org/all/20251015063615.2632-1-lirongqing@baidu.com
> [...]
Aha, Petr asked a similar question during his review. Thanks for the catch!
How about following fixup patch to restore that part of logic?
Thanks,
Feng
---
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 5b3a7785d3a2..d2254c91450b 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -223,8 +223,11 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout)
+static void check_hung_task(struct task_struct *t, unsigned long timeout,
+		unsigned long prev_detect_count)
 {
+	unsigned long total_hung_task;
+
 	if (!task_is_hung(t, timeout))
 		return;
 
@@ -234,13 +237,19 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	 */
 	sysctl_hung_task_detect_count++;
 
+	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
+	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+		console_verbose();
+		hung_task_call_panic = true;
+	}
+
 	/*
 	 * Ok, the task did not get scheduled for more than 2 minutes,
 	 * complain:
 	 */
-	if (sysctl_hung_task_warnings) {
+	if (sysctl_hung_task_warnings || hung_task_call_panic) {
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
@@ -295,7 +304,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 {
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
-	unsigned long total_hung_task;
 	struct task_struct *g, *t;
 	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
 	int need_warning = sysctl_hung_task_warnings;
@@ -320,20 +328,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout);
+		check_hung_task(t, timeout, prev_detect_count);
 	}
  unlock:
 	rcu_read_unlock();
 
-	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
-	if (!total_hung_task)
+	if (!(sysctl_hung_task_detect_count - prev_detect_count))
 		return;
 
-	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
-		console_verbose();
-		hung_task_call_panic = true;
-	}
-
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-16 9:11 ` Feng Tang
@ 2025-11-16 13:22 ` Lance Yang
2025-11-16 14:13 ` Feng Tang
0 siblings, 1 reply; 18+ messages in thread
From: Lance Yang @ 2025-11-16 13:22 UTC (permalink / raw)
To: Feng Tang
Cc: Petr Mladek, Andrew Morton, Steven Rostedt, Lance Yang,
linux-kernel, Jonathan Corbet, paulmck, lirongqing, leonylgao
On 2025/11/16 17:11, Feng Tang wrote:
> On Sun, Nov 16, 2025 at 03:58:32PM +0800, Lance Yang wrote:
>>
>>
>> On 2025/11/13 19:10, Feng Tang wrote:
>>> When task-hung happens, developers may need different kinds of system
>>> information (call-stacks, memory info, locks, etc.) to help debugging.
>>>
>>> Add 'hung_task_sys_info' sysctl knob to take human readable string like
>>> "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
>>> requested information will be dumped. (refer kernel/sys_info.c for more
>>> details).
>>>
>>> Meanwhile, the newly introduced sys_info() call is used to unify some
>>> existing info-dumping knobs.
>>>
>>> Suggested-by: Petr Mladek <pmladek@suse.com>
>>> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
>>> ---
>>> Documentation/admin-guide/sysctl/kernel.rst | 5 ++
>>> kernel/hung_task.c | 62 +++++++++++++--------
>>> 2 files changed, 43 insertions(+), 24 deletions(-)
>>> * Ok, the task did not get scheduled for more than 2 minutes,
>>> * complain:
>>> */
>>> - if (sysctl_hung_task_warnings || hung_task_call_panic) {
>>> + if (sysctl_hung_task_warnings) {
>>
>> It seems like the behavior changes when sysctl_hung_task_warnings is
>> 0 but a panic is about to be triggered ...
>>
>> Looking at the history:
>>
>> 1) Commit ("hung_task: ignore hung_task_warnings when hung_task_panic
>> is enabled")[1] ensured that hung task information is always dumped
>> when a panic is configured, even if the warning counter is exhausted.
>>
>> 2) Later, commit ("hung_task: panic when there are more than N hung
>> tasks at the same time")[2] refined the logic to trigger a panic based
>> on the number of hung tasks found in a single scan.
>>
>> To stay consistent with the established behavior, I think we should
>> continue to dump the information for hung tasks as long as
>> sysctl_hung_task_panic is enabled :)
>>
>> [1] https://lore.kernel.org/all/20240613033159.3446265-1-leonylgao@gmail.com
>> [2] https://lore.kernel.org/all/20251015063615.2632-1-lirongqing@baidu.com
>> [...]
>
> Aha, Petr asked similar question during his review. Thanks for the catch!
>
> How about following fixup patch to restore that part of logic?
>
> Thanks,
> Feng
>
> ---
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 5b3a7785d3a2..d2254c91450b 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -223,8 +223,11 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
> }
> #endif
>
> -static void check_hung_task(struct task_struct *t, unsigned long timeout)
> +static void check_hung_task(struct task_struct *t, unsigned long timeout,
> + unsigned long prev_detect_count)
> {
> + unsigned long total_hung_task;
> +
> if (!task_is_hung(t, timeout))
> return;
>
> @@ -234,13 +237,19 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> */
> sysctl_hung_task_detect_count++;
>
> + total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> trace_sched_process_hang(t);
>
> + if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> + console_verbose();
> + hung_task_call_panic = true;
> + }
> +
> /*
> * Ok, the task did not get scheduled for more than 2 minutes,
> * complain:
> */
> - if (sysctl_hung_task_warnings) {
> + if (sysctl_hung_task_warnings || hung_task_call_panic) {
> if (sysctl_hung_task_warnings > 0)
> sysctl_hung_task_warnings--;
> pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> @@ -295,7 +304,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> {
> int max_count = sysctl_hung_task_check_count;
> unsigned long last_break = jiffies;
> - unsigned long total_hung_task;
> struct task_struct *g, *t;
> unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> int need_warning = sysctl_hung_task_warnings;
> @@ -320,20 +328,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> last_break = jiffies;
> }
>
> - check_hung_task(t, timeout);
> + check_hung_task(t, timeout, prev_detect_count);
> }
> unlock:
> rcu_read_unlock();
>
> - total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> - if (!total_hung_task)
> + if (!(sysctl_hung_task_detect_count - prev_detect_count))
> return;
>
> - if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> - console_verbose();
> - hung_task_call_panic = true;
> - }
> -
> if (need_warning || hung_task_call_panic) {
> si_mask |= SYS_INFO_LOCKS;
Looks good to me now! I assume a v3 is expected; can you
post a new version?
Cheers,
Lance
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-16 13:22 ` Lance Yang
@ 2025-11-16 14:13 ` Feng Tang
2025-11-17 17:53 ` Andrew Morton
0 siblings, 1 reply; 18+ messages in thread
From: Feng Tang @ 2025-11-16 14:13 UTC (permalink / raw)
To: Lance Yang, Andrew Morton
Cc: Petr Mladek, Steven Rostedt, Lance Yang, linux-kernel,
Jonathan Corbet, paulmck, lirongqing, leonylgao
On Sun, Nov 16, 2025 at 09:22:43PM +0800, Lance Yang wrote:
> > > Looking at the history:
> > >
> > > 1) Commit ("hung_task: ignore hung_task_warnings when hung_task_panic
> > > is enabled")[1] ensured that hung task information is always dumped
> > > when a panic is configured, even if the warning counter is exhausted.
> > >
> > > 2) Later, commit ("hung_task: panic when there are more than N hung
> > > tasks at the same time")[2] refined the logic to trigger a panic based
> > > on the number of hung tasks found in a single scan.
> > >
> > > To stay consistent with the established behavior, I think we should
> > > continue to dump the information for hung tasks as long as
> > > sysctl_hung_task_panic is enabled :)
> > >
> > > [1] https://lore.kernel.org/all/20240613033159.3446265-1-leonylgao@gmail.com
> > > [2] https://lore.kernel.org/all/20251015063615.2632-1-lirongqing@baidu.com
> > > [...]
> >
> > Aha, Petr asked similar question during his review. Thanks for the catch!
> >
> > How about following fixup patch to restore that part of logic?
> >
> > Thanks,
> > Feng
> >
> > ---
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index 5b3a7785d3a2..d2254c91450b 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -223,8 +223,11 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
> > }
> > #endif
> > -static void check_hung_task(struct task_struct *t, unsigned long timeout)
> > +static void check_hung_task(struct task_struct *t, unsigned long timeout,
> > + unsigned long prev_detect_count)
> > {
> > + unsigned long total_hung_task;
> > +
> > if (!task_is_hung(t, timeout))
> > return;
> > @@ -234,13 +237,19 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> > */
> > sysctl_hung_task_detect_count++;
> > + total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> > trace_sched_process_hang(t);
> > + if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> > + console_verbose();
> > + hung_task_call_panic = true;
> > + }
> > +
> > /*
> > * Ok, the task did not get scheduled for more than 2 minutes,
> > * complain:
> > */
> > - if (sysctl_hung_task_warnings) {
> > + if (sysctl_hung_task_warnings || hung_task_call_panic) {
> > if (sysctl_hung_task_warnings > 0)
> > sysctl_hung_task_warnings--;
> > pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> > @@ -295,7 +304,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> > {
> > int max_count = sysctl_hung_task_check_count;
> > unsigned long last_break = jiffies;
> > - unsigned long total_hung_task;
> > struct task_struct *g, *t;
> > unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> > int need_warning = sysctl_hung_task_warnings;
> > @@ -320,20 +328,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> > last_break = jiffies;
> > }
> > - check_hung_task(t, timeout);
> > + check_hung_task(t, timeout, prev_detect_count);
> > }
> > unlock:
> > rcu_read_unlock();
> > - total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> > - if (!total_hung_task)
> > + if (!(sysctl_hung_task_detect_count - prev_detect_count))
> > return;
> > - if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> > - console_verbose();
> > - hung_task_call_panic = true;
> > - }
> > -
> > if (need_warning || hung_task_call_panic) {
> > si_mask |= SYS_INFO_LOCKS;
>
> Looks good to me now! I assume v3 would be expected, can you
> post a new version?
Andrew has taken the patchset into the -mm tree.

Andrew, which way do you prefer? Should I send a v3 patch for hung-task,
or would you pick up the fixup patch and squash it into the original
0002 patch?

Anyway, I have made a squashed v3 patch below.

Thanks,
Feng
---
From f90a60dae2440c89da7151fb1ddac022b872fb69 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@linux.alibaba.com>
Date: Wed, 5 Nov 2025 19:30:36 +0800
Subject: [PATCH v3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung

When task-hung happens, developers may need different kinds of system
information (call-stacks, memory info, locks, etc.) to help debugging.

Add 'hung_task_sys_info' sysctl knob to take a human-readable string like
"tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
requested information will be dumped (refer to kernel/sys_info.c for more
details).

Meanwhile, the newly introduced sys_info() call is used to unify some
existing info-dumping knobs.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
Changelog:
v3:
* restore hung_task_call_panic logic (Lance)
v2:
* code cleanup for si_mask setup (Petr)

 Documentation/admin-guide/sysctl/kernel.rst |  5 +++
 kernel/hung_task.c                          | 40 ++++++++++++++-------
 2 files changed, 33 insertions(+), 12 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index a397eeccaea7..45b4408dad31 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -422,6 +422,11 @@ the system boot.
 
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
+hung_task_sys_info
+==================
+A comma-separated list of extra system information to be dumped when
+a hung task is detected, for example, "tasks,mem,timers,locks,...".
+Refer to the 'panic_sys_info' section below for more details.
 
 hung_task_timeout_secs
 ======================
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 5ac0e66a1361..d2254c91450b 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -24,6 +24,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/hung_task.h>
 #include <linux/rwsem.h>
+#include <linux/sys_info.h>
 
 #include <trace/events/sched.h>
 
@@ -59,12 +60,17 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
 static int __read_mostly sysctl_hung_task_warnings = 10;
 
 static int __read_mostly did_panic;
-static bool hung_task_show_lock;
 static bool hung_task_call_panic;
-static bool hung_task_show_all_bt;
 
 static struct task_struct *watchdog_task;
 
+/*
+ * A bitmask to control what kinds of system info to be printed when
+ * a hung task is detected, it could be task, memory, lock etc. Refer to
+ * include/linux/sys_info.h for detailed bit definition.
+ */
+static unsigned long hung_task_si_mask;
+
 #ifdef CONFIG_SMP
 /*
  * Should we dump all CPUs backtraces in a hung task event?
@@ -236,7 +242,6 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 
 	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
 		console_verbose();
-		hung_task_show_lock = true;
 		hung_task_call_panic = true;
 	}
 
@@ -259,10 +264,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 			" disables this message.\n");
 		sched_show_task(t);
 		debug_show_blocker(t, timeout);
-		hung_task_show_lock = true;
 
-		if (sysctl_hung_task_all_cpu_backtrace)
-			hung_task_show_all_bt = true;
 		if (!sysctl_hung_task_warnings)
 			pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
 	}
@@ -304,6 +306,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
 	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+	int need_warning = sysctl_hung_task_warnings;
+	unsigned long si_mask = hung_task_si_mask;
 
 	/*
 	 * If the system crashed already then all bets are off,
@@ -312,7 +316,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (test_taint(TAINT_DIE) || did_panic)
 		return;
 
-	hung_task_show_lock = false;
+
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
 
@@ -328,14 +332,19 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	}
  unlock:
 	rcu_read_unlock();
-	if (hung_task_show_lock)
-		debug_show_all_locks();
 
-	if (hung_task_show_all_bt) {
-		hung_task_show_all_bt = false;
-		trigger_all_cpu_backtrace();
+	if (!(sysctl_hung_task_detect_count - prev_detect_count))
+		return;
+
+	if (need_warning || hung_task_call_panic) {
+		si_mask |= SYS_INFO_LOCKS;
+
+		if (sysctl_hung_task_all_cpu_backtrace)
+			si_mask |= SYS_INFO_ALL_BT;
 	}
 
+	sys_info(si_mask);
+
 	if (hung_task_call_panic)
 		panic("hung_task: blocked tasks");
 }
@@ -434,6 +443,13 @@ static const struct ctl_table hung_task_sysctls[] = {
 		.mode		= 0444,
 		.proc_handler	= proc_doulongvec_minmax,
 	},
+	{
+		.procname	= "hung_task_sys_info",
+		.data		= &hung_task_si_mask,
+		.maxlen		= sizeof(hung_task_si_mask),
+		.mode		= 0644,
+		.proc_handler	= sysctl_sys_info_handler,
+	},
 };
 
 static void __init hung_task_sysctl_init(void)
--
2.43.5
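The end-of-scan logic in the v3 patch above can be illustrated with a small user-space sketch: a comma-separated string such as "tasks,mem,locks" is turned into a bitmask, and the scan then ORs in the lock bit (and, optionally, an all-CPU-backtrace bit) when warnings are still enabled or a panic is pending. Everything here is hypothetical: the SI_* bit values, parse_si_mask() and scan_si_mask() are illustrative stand-ins only. The real bit definitions live in include/linux/sys_info.h and the real parsing in kernel/sys_info.c, which may treat unknown names differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical bit values for illustration only; the real ones are in
 * include/linux/sys_info.h and may differ.
 */
#define SI_TASKS	(1UL << 0)
#define SI_MEM		(1UL << 1)
#define SI_TIMERS	(1UL << 2)
#define SI_LOCKS	(1UL << 3)
#define SI_FTRACE	(1UL << 4)
#define SI_ALL_BT	(1UL << 5)	/* no user-visible name in this sketch */

struct si_name {
	const char *name;
	unsigned long bit;
};

/* Names taken from the "tasks,mem,timers,locks,ftrace" examples above. */
static const struct si_name si_names[] = {
	{ "tasks", SI_TASKS }, { "mem", SI_MEM }, { "timers", SI_TIMERS },
	{ "locks", SI_LOCKS }, { "ftrace", SI_FTRACE },
};

/* Hypothetical parser: "tasks,mem" -> mask; unknown names are ignored. */
static unsigned long parse_si_mask(const char *str)
{
	unsigned long mask = 0;
	const char *p = str;

	assert(str != NULL);
	while (*p) {
		size_t len = strcspn(p, ",");	/* length of current item */

		for (size_t i = 0; i < sizeof(si_names) / sizeof(si_names[0]); i++) {
			if (strlen(si_names[i].name) == len &&
			    strncmp(p, si_names[i].name, len) == 0)
				mask |= si_names[i].bit;
		}
		p += len;
		if (*p == ',')
			p++;			/* skip the separator */
	}
	return mask;
}

/*
 * Mirrors the v3 end-of-scan block, assuming at least one hung task was
 * found: start from the user's mask and force extra bits when needed.
 */
static unsigned long scan_si_mask(unsigned long user_mask, bool need_warning,
				  bool call_panic, bool all_cpu_backtrace)
{
	unsigned long si_mask = user_mask;	/* local copy, sysctl untouched */

	if (need_warning || call_panic) {
		si_mask |= SI_LOCKS;
		if (all_cpu_backtrace)
			si_mask |= SI_ALL_BT;
	}
	return si_mask;
}
```

Copying the sysctl value into a local si_mask per scan, as the patch does, keeps the forced lock and backtrace bits from leaking back into hung_task_si_mask.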
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-16 14:13 ` Feng Tang
@ 2025-11-17 17:53 ` Andrew Morton
2025-11-18 2:26 ` Feng Tang
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Andrew Morton @ 2025-11-17 17:53 UTC (permalink / raw)
To: Feng Tang
Cc: Lance Yang, Petr Mladek, Steven Rostedt, Lance Yang, linux-kernel,
Jonathan Corbet, paulmck, lirongqing, leonylgao
On Sun, 16 Nov 2025 22:13:58 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
> > > if (need_warning || hung_task_call_panic) {
> > > si_mask |= SYS_INFO_LOCKS;
> >
> > Looks good to me now! I assume v3 would be expected, can you
> > post a new version?
>
> Andrew has taken the patchset to -mm tree.
>
> Andrew, which way do you prefer? I send a v3 patch for hung-task or you
> pickup the fixup patch and squash it into the orginal 0002 patch?
>
> Anyway, I make a squshed version v3 patch below.
I prefer little fixup patches, generally. So people can see what
changed and don't feel they should re-review everything.
I queued the below, thanks.
From: Feng Tang <feng.tang@linux.alibaba.com>
Subject: hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
Date: Wed, 5 Nov 2025 19:30:36 +0800

maintain consistency with established behavior, per Lance and Petr

Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/hung_task.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
--- a/kernel/hung_task.c~hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
+++ a/kernel/hung_task.c
@@ -223,8 +223,11 @@ static inline void debug_show_blocker(st
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout)
+static void check_hung_task(struct task_struct *t, unsigned long timeout,
+		unsigned long prev_detect_count)
 {
+	unsigned long total_hung_task;
+
 	if (!task_is_hung(t, timeout))
 		return;
 
@@ -234,13 +237,19 @@ static void check_hung_task(struct task_
 	 */
 	sysctl_hung_task_detect_count++;
 
+	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
+	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+		console_verbose();
+		hung_task_call_panic = true;
+	}
+
 	/*
 	 * Ok, the task did not get scheduled for more than 2 minutes,
 	 * complain:
 	 */
-	if (sysctl_hung_task_warnings) {
+	if (sysctl_hung_task_warnings || hung_task_call_panic) {
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
@@ -295,7 +304,6 @@ static void check_hung_uninterruptible_t
 {
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
-	unsigned long total_hung_task;
 	struct task_struct *g, *t;
 	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
 	int need_warning = sysctl_hung_task_warnings;
@@ -320,20 +328,14 @@ static void check_hung_uninterruptible_t
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout);
+		check_hung_task(t, timeout, prev_detect_count);
 	}
  unlock:
 	rcu_read_unlock();
 
-	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
-	if (!total_hung_task)
+	if (!(sysctl_hung_task_detect_count - prev_detect_count))
 		return;
 
-	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
-		console_verbose();
-		hung_task_call_panic = true;
-	}
-
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;
 
_
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-17 17:53 ` Andrew Morton
@ 2025-11-18 2:26 ` Feng Tang
2025-11-18 6:06 ` Lance Yang
2025-11-18 15:20 ` Petr Mladek
2 siblings, 0 replies; 18+ messages in thread
From: Feng Tang @ 2025-11-18 2:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Lance Yang, Petr Mladek, Steven Rostedt, Lance Yang, linux-kernel,
Jonathan Corbet, paulmck, lirongqing, leonylgao
On Mon, Nov 17, 2025 at 09:53:52AM -0800, Andrew Morton wrote:
> On Sun, 16 Nov 2025 22:13:58 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
>
> > > > if (need_warning || hung_task_call_panic) {
> > > > si_mask |= SYS_INFO_LOCKS;
> > >
> > > Looks good to me now! I assume v3 would be expected, can you
> > > post a new version?
> >
> > Andrew has taken the patchset to -mm tree.
> >
> > Andrew, which way do you prefer? I send a v3 patch for hung-task or you
> > pickup the fixup patch and squash it into the orginal 0002 patch?
> >
> > Anyway, I make a squshed version v3 patch below.
>
> I prefer little fixup patches, generally. So people can see what
> changed and don't feel they should re-review everything.
I see now.
> I queued the below, thanks.
Thank you! I just ran some tests with the latest mm tree and they all passed.
- Feng
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-17 17:53 ` Andrew Morton
2025-11-18 2:26 ` Feng Tang
@ 2025-11-18 6:06 ` Lance Yang
2025-11-18 15:20 ` Petr Mladek
2 siblings, 0 replies; 18+ messages in thread
From: Lance Yang @ 2025-11-18 6:06 UTC (permalink / raw)
To: Andrew Morton, Feng Tang
Cc: Petr Mladek, Steven Rostedt, Lance Yang, linux-kernel,
Jonathan Corbet, paulmck, lirongqing, leonylgao
On 2025/11/18 01:53, Andrew Morton wrote:
> On Sun, 16 Nov 2025 22:13:58 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
>
>>>> if (need_warning || hung_task_call_panic) {
>>>> si_mask |= SYS_INFO_LOCKS;
>>>
>>> Looks good to me now! I assume v3 would be expected, can you
>>> post a new version?
>>
>> Andrew has taken the patchset to -mm tree.
>>
>> Andrew, which way do you prefer? I send a v3 patch for hung-task or you
>> pickup the fixup patch and squash it into the orginal 0002 patch?
>>
>> Anyway, I make a squshed version v3 patch below.
>
> I prefer little fixup patches, generally. So people can see what
> changed and don't feel they should re-review everything.
>
> I queued the below, thanks.
Thanks!
>
>
> From: Feng Tang <feng.tang@linux.alibaba.com>
> Subject: hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
> Date: Wed, 5 Nov 2025 19:30:36 +0800
>
> maintain consistecy established behavior, per Lance and Petr
>
> Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Lance Yang <ioworker0@gmail.com>
> Cc: "Paul E . McKenney" <paulmck@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
With this fix, patch #02 looks good to me!
Reviewed-by: Lance Yang <lance.yang@linux.dev>
>
> kernel/hung_task.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> --- a/kernel/hung_task.c~hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
> +++ a/kernel/hung_task.c
> @@ -223,8 +223,11 @@ static inline void debug_show_blocker(st
> }
> #endif
>
> -static void check_hung_task(struct task_struct *t, unsigned long timeout)
> +static void check_hung_task(struct task_struct *t, unsigned long timeout,
> + unsigned long prev_detect_count)
> {
> + unsigned long total_hung_task;
> +
> if (!task_is_hung(t, timeout))
> return;
>
> @@ -234,13 +237,19 @@ static void check_hung_task(struct task_
> */
> sysctl_hung_task_detect_count++;
>
> + total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> trace_sched_process_hang(t);
>
> + if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> + console_verbose();
> + hung_task_call_panic = true;
> + }
> +
> /*
> * Ok, the task did not get scheduled for more than 2 minutes,
> * complain:
> */
> - if (sysctl_hung_task_warnings) {
> + if (sysctl_hung_task_warnings || hung_task_call_panic) {
> if (sysctl_hung_task_warnings > 0)
> sysctl_hung_task_warnings--;
> pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> @@ -295,7 +304,6 @@ static void check_hung_uninterruptible_t
> {
> int max_count = sysctl_hung_task_check_count;
> unsigned long last_break = jiffies;
> - unsigned long total_hung_task;
> struct task_struct *g, *t;
> unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> int need_warning = sysctl_hung_task_warnings;
> @@ -320,20 +328,14 @@ static void check_hung_uninterruptible_t
> last_break = jiffies;
> }
>
> - check_hung_task(t, timeout);
> + check_hung_task(t, timeout, prev_detect_count);
> }
> unlock:
> rcu_read_unlock();
>
> - total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> - if (!total_hung_task)
> + if (!(sysctl_hung_task_detect_count - prev_detect_count))
> return;
>
> - if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> - console_verbose();
> - hung_task_call_panic = true;
> - }
> -
> if (need_warning || hung_task_call_panic) {
> si_mask |= SYS_INFO_LOCKS;
>
> _
>
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-17 17:53 ` Andrew Morton
2025-11-18 2:26 ` Feng Tang
2025-11-18 6:06 ` Lance Yang
@ 2025-11-18 15:20 ` Petr Mladek
2025-11-18 17:57 ` Lance Yang
2 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2025-11-18 15:20 UTC (permalink / raw)
To: Andrew Morton
Cc: Feng Tang, Lance Yang, Steven Rostedt, Lance Yang, linux-kernel,
Jonathan Corbet, paulmck, lirongqing, leonylgao
On Mon 2025-11-17 09:53:52, Andrew Morton wrote:
> On Sun, 16 Nov 2025 22:13:58 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
>
> > > > if (need_warning || hung_task_call_panic) {
> > > > si_mask |= SYS_INFO_LOCKS;
> > >
> > > Looks good to me now! I assume v3 would be expected, can you
> > > post a new version?
> >
> > Andrew has taken the patchset to -mm tree.
> >
> > Andrew, which way do you prefer? I send a v3 patch for hung-task or you
> > pickup the fixup patch and squash it into the orginal 0002 patch?
> >
> > Anyway, I make a squshed version v3 patch below.
>
> I prefer little fixup patches, generally. So people can see what
> changed and don't feel they should re-review everything.
>
> I queued the below, thanks.
>
> From: Feng Tang <feng.tang@linux.alibaba.com>
> Subject: hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
> Date: Wed, 5 Nov 2025 19:30:36 +0800
>
> maintain consistecy established behavior, per Lance and Petr
>
> Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Lance Yang <ioworker0@gmail.com>
> Cc: "Paul E . McKenney" <paulmck@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Thanks a lot for catching and fixing the regression caused
by this patchset. The patch looks good.
See a comment below.
> --- a/kernel/hung_task.c~hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
> +++ a/kernel/hung_task.c
> @@ -223,8 +223,11 @@ static inline void debug_show_blocker(st
> }
> #endif
>
> -static void check_hung_task(struct task_struct *t, unsigned long timeout)
> +static void check_hung_task(struct task_struct *t, unsigned long timeout,
> + unsigned long prev_detect_count)
> {
> + unsigned long total_hung_task;
> +
> if (!task_is_hung(t, timeout))
> return;
>
> @@ -234,13 +237,19 @@ static void check_hung_task(struct task_
> */
> sysctl_hung_task_detect_count++;
>
> + total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> trace_sched_process_hang(t);
>
> + if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> + console_verbose();
> + hung_task_call_panic = true;
> + }
> +
> /*
> * Ok, the task did not get scheduled for more than 2 minutes,
> * complain:
> */
> - if (sysctl_hung_task_warnings) {
> + if (sysctl_hung_task_warnings || hung_task_call_panic) {
> if (sysctl_hung_task_warnings > 0)
> sysctl_hung_task_warnings--;
> pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
This restores the behavior after the commit 9544f9e6947f6508
("hung_task: panic when there are more than N hung tasks at
the same time"). It is better than nothing.
Well, the behavior is still not ideal. It would be better if we
printed backtraces from _all_ "hung" tasks before panicking. But once
the hung_task_warnings budget is exhausted, it prints backtraces again
only after the sysctl_hung_task_panic limit is reached.
I mean, for example, let's have:
+ sysctl_hung_task_warnings = 2;
+ sysctl_hung_task_panic = 5;
+ and detect 6 hung tasks.
The code will report the 1st and 2nd hung tasks. It will skip the 3rd
and 4th because sysctl_hung_task_warnings reached 0. It will report the
5th and 6th tasks because (total_hung_task >= 5).
It is better than nothing. But it might be confusing.
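The example above can be checked with a small user-space model of the per-task decision after the fixup. This is a sketch, not kernel code: simulate_scan() and its parameters are hypothetical stand-ins for sysctl_hung_task_warnings, sysctl_hung_task_panic and the per-scan total_hung_task count, and the model assumes every task visited in the scan is hung.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of check_hung_task() after the fixup.
 * Every task visited in the scan is assumed to be hung.
 *   warnings:        stands in for sysctl_hung_task_warnings
 *                    (negative means "unlimited").
 *   panic_threshold: stands in for sysctl_hung_task_panic (0 = off).
 * The 1-based indices of reported tasks are stored in reported[]
 * (up to max_reported entries); the return value is the number of
 * tasks that would have been reported.
 */
static int simulate_scan(int warnings, unsigned long panic_threshold,
			 int nr_hung, int *reported, int max_reported)
{
	bool call_panic = false;
	int nr_reported = 0;

	assert(reported != NULL || max_reported == 0);

	for (int i = 1; i <= nr_hung; i++) {
		/* total_hung_task = detect_count - prev_detect_count */
		unsigned long total_hung_task = (unsigned long)i;

		/* The threshold check now runs before the reporting gate. */
		if (panic_threshold && total_hung_task >= panic_threshold)
			call_panic = true;

		/* Report while warnings remain, or once a panic is pending. */
		if (warnings || call_panic) {
			if (warnings > 0)
				warnings--;
			if (nr_reported < max_reported)
				reported[nr_reported] = i;
			nr_reported++;
		}
	}
	return nr_reported;
}
```

With the numbers above, simulate_scan(2, 5, 6, r, 8) returns 4 and fills r with 1, 2, 5, 6, so tasks 3 and 4 are the ones silently skipped. With the warning budget already at 0 and a threshold of 1 (the case Lance raised), every hung task is reported.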
I am not sure how to fix it. A minimalist solution would be to print
a warning. Something like:
	if (sysctl_hung_task_panic > 1 &&
	    (total_hung_task == sysctl_hung_task_panic) &&
	    !sysctl_hung_task_warnings) {
		pr_err("INFO: %d blocked tasks might have been skipped because the hung_task_warnings limit was reached\n",
		       sysctl_hung_task_panic - 1);
	}
Or we could print the "total_hung_task" counter somewhere, for
example,
pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
total_hung_task, ...
Or we could restart the for_each_process_thread() cycle and make sure
that all hung tasks will get reported.
Or we could ignore it until anyone complains.
Best Regards,
Petr
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-18 15:20 ` Petr Mladek
@ 2025-11-18 17:57 ` Lance Yang
2025-11-19 12:31 ` Petr Mladek
0 siblings, 1 reply; 18+ messages in thread
From: Lance Yang @ 2025-11-18 17:57 UTC (permalink / raw)
To: Petr Mladek, Andrew Morton
Cc: Feng Tang, Steven Rostedt, Lance Yang, linux-kernel,
Jonathan Corbet, paulmck, lirongqing, leonylgao
On 2025/11/18 23:20, Petr Mladek wrote:
> On Mon 2025-11-17 09:53:52, Andrew Morton wrote:
>> On Sun, 16 Nov 2025 22:13:58 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
>>
>>>>> if (need_warning || hung_task_call_panic) {
>>>>> si_mask |= SYS_INFO_LOCKS;
>>>>
>>>> Looks good to me now! I assume v3 would be expected, can you
>>>> post a new version?
>>>
>>> Andrew has taken the patchset to -mm tree.
>>>
>>> Andrew, which way do you prefer? I send a v3 patch for hung-task or you
>>> pickup the fixup patch and squash it into the orginal 0002 patch?
>>>
>>> Anyway, I make a squshed version v3 patch below.
>>
>> I prefer little fixup patches, generally. So people can see what
>> changed and don't feel they should re-review everything.
>>
>> I queued the below, thanks.
>>
>> From: Feng Tang <feng.tang@linux.alibaba.com>
>> Subject: hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
>> Date: Wed, 5 Nov 2025 19:30:36 +0800
>>
>> maintain consistecy established behavior, per Lance and Petr
>>
>> Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local
>> Suggested-by: Petr Mladek <pmladek@suse.com>
>> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
>> Cc: Jonathan Corbet <corbet@lwn.net>
>> Cc: Lance Yang <ioworker0@gmail.com>
>> Cc: "Paul E . McKenney" <paulmck@kernel.org>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> Thanks a lot for catching and fixing the regression caused
> by this patchset. The patch looks good.
>
> See a comment below.
>
>> --- a/kernel/hung_task.c~hung_task-add-hung_task_sys_info-sysctl-to-dump-sys-info-on-task-hung-fix
>> +++ a/kernel/hung_task.c
>> @@ -223,8 +223,11 @@ static inline void debug_show_blocker(st
>> }
>> #endif
>>
>> -static void check_hung_task(struct task_struct *t, unsigned long timeout)
>> +static void check_hung_task(struct task_struct *t, unsigned long timeout,
>> + unsigned long prev_detect_count)
>> {
>> + unsigned long total_hung_task;
>> +
>> if (!task_is_hung(t, timeout))
>> return;
>>
>> @@ -234,13 +237,19 @@ static void check_hung_task(struct task_
>> */
>> sysctl_hung_task_detect_count++;
>>
>> + total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>> trace_sched_process_hang(t);
>>
>> + if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
>> + console_verbose();
>> + hung_task_call_panic = true;
>> + }
>> +
>> /*
>> * Ok, the task did not get scheduled for more than 2 minutes,
>> * complain:
>> */
>> - if (sysctl_hung_task_warnings) {
>> + if (sysctl_hung_task_warnings || hung_task_call_panic) {
>> if (sysctl_hung_task_warnings > 0)
>> sysctl_hung_task_warnings--;
>> pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
>
> This restores the behavior after the commit 9544f9e6947f6508
> ("hung_task: panic when there are more than N hung tasks at
> the same time"). It is better than nothing.
>
> Well, the behavior is still not ideal. It would be better if
> we printed backtraces from _all_ "hung" tasks before panicking.
> But it prints the backtraces only when sysctl_hung_task_panic
> limit is reached.
>
> I mean, for example, let's have:
>
> + sysctl_hung_task_warnings = 2;
> + sysctl_hung_task_panic = 5;
> + and detect 6 hung tasks.
>
> The code will report 1st and 2nd hung tasks. It will skip 3rd and 4th
> because sysctl_hung_task_warnings reached 0. It will report 5th and
> 6th tasks because (total_hung_task >= 5).
>
> It is better than nothing. But it might be confusing.
Right, I can see how it might be confusing.
IMHO, sysctl_hung_task_warnings is a user-configured limit on verbosity.
It makes sense that reports are suppressed after the limit is exhausted,
except when the sysctl_hung_task_panic threshold is reached ;)
>
> I am not sure how to fix it. A minimalist solution would be to print
> a warning. Something like:
>
> if (sysctl_hung_task_panic > 1 &&
> (total_hung_task == sysctl_hung_task_panic) &&
> !sysctl_hung_task_warnings) {
> pr_err("INFO: %d blocked tasks might have been skipped because reached hung_task_warnings limit\n",
> sysctl_hung_task_panic - 1);
> }
>
> Or we could print the "total_hung_task" counter somewhere, for
> example,
>
> pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
> total_hung_task, ...
>
> Or we could restart the for_each_process_thread() cycle and make sure
> that all hung tasks will get reported.
>
> Or we could ignore it until anyone complains.
It looks like we already inform the user when that happens. When
sysctl_hung_task_warnings is finally decremented to zero, the code prints:
```
	if (!sysctl_hung_task_warnings)
		pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
```
Given that this explicit warning is already in place, perhaps the current
behavior is sufficient and clear enough?
Thanks,
Lance
* Re: [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
2025-11-18 17:57 ` Lance Yang
@ 2025-11-19 12:31 ` Petr Mladek
0 siblings, 0 replies; 18+ messages in thread
From: Petr Mladek @ 2025-11-19 12:31 UTC (permalink / raw)
To: Lance Yang
Cc: Andrew Morton, Feng Tang, Steven Rostedt, Lance Yang,
linux-kernel, Jonathan Corbet, paulmck, lirongqing, leonylgao
On Wed 2025-11-19 01:57:36, Lance Yang wrote:
> On 2025/11/18 23:20, Petr Mladek wrote:
> > Well, the behavior is still not ideal. It would be better if
> > we printed backtraces from _all_ "hung" tasks before panicking.
> > But it prints the backtraces only when sysctl_hung_task_panic
> > limit is reached.
> >
> > I mean, for example, let's have:
> >
> > + sysctl_hung_task_warnings = 2;
> > + sysctl_hung_task_panic = 5;
> > + and detect 6 hung tasks.
> >
> > The code will report 1st and 2nd hung tasks. It will skip 3rd and 4th
> > because sysctl_hung_task_warnings reached 0. It will report 5th and
> > 6th tasks because (total_hung_task >= 5).
> >
> > It is better than nothing. But it might be confusing.
>
> Right, I can see how it might be confusing.
>
> IMHO, sysctl_hung_task_warnings is a user-configured limit on verbosity.
> It makes sense that reports are suppressed after the limit is exhausted,
> except when the sysctl_hung_task_panic threshold is reached ;)
>
> > I am not sure how to fix it. A minimalist solution would be to print
> > a warning. Something like:
> >
> > if (sysctl_hung_task_panic > 1 &&
> > (total_hung_task == sysctl_hung_task_panic) &&
> > !sysctl_hung_task_warnings) {
> > pr_err("INFO: %d blocked tasks might have been skipped because reached hung_task_warnings limit\n",
> > sysctl_hung_task_panic - 1);
> > }
> >
> > Or we could print the "total_hung_task" counter somewhere, for
> > example,
> >
> > pr_err("INFO[%lu]: task %s:%d blocked for more than %ld seconds.\n",
> > total_hung_task, ...
> >
> > Or we could restart the for_each_process_thread() cycle and make sure
> > that all hung tasks will get reported.
> >
> > Or we could ignore it until anyone complains.
>
> It looks like we already inform the user when that happens. When
> sysctl_hung_task_warnings is finally decremented to zero, the code prints:
>
> ```
> 	if (!sysctl_hung_task_warnings)
> 		pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
> ```
>
> Given that this explicit warning is already in place, perhaps the current
> behavior is sufficient and clear enough?
The warning might get lost, or it might be printed a long time before
the critical stall, so people might miss it.
But you are right. There is a warning. And my worries are rather
theoretical. Let's keep the code simple until anyone complains.
Best Regards,
Petr
end of thread
Thread overview: 18+ messages
2025-11-13 11:10 [PATCH v2 0/4] Enable hung_task and lockup cases to dump system info on demand Feng Tang
2025-11-13 11:10 ` [PATCH v2 1/4] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
2025-11-13 11:10 ` [PATCH v2 2/4] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
2025-11-14 15:36 ` Petr Mladek
2025-11-16 7:16 ` Feng Tang
2025-11-16 7:58 ` Lance Yang
2025-11-16 9:11 ` Feng Tang
2025-11-16 13:22 ` Lance Yang
2025-11-16 14:13 ` Feng Tang
2025-11-17 17:53 ` Andrew Morton
2025-11-18 2:26 ` Feng Tang
2025-11-18 6:06 ` Lance Yang
2025-11-18 15:20 ` Petr Mladek
2025-11-18 17:57 ` Lance Yang
2025-11-19 12:31 ` Petr Mladek
2025-11-13 11:10 ` [PATCH v2 3/4] watchdog: add sys_info sysctls to dump sys info on system lockup Feng Tang
2025-11-14 15:44 ` Petr Mladek
2025-11-13 11:10 ` [PATCH v2 4/4] sys_info: add a default kernel sys_info mask Feng Tang