public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/3] Enable hung_task and lockup cases to dump system info on demand
@ 2025-11-06  2:30 Feng Tang
  2025-11-06  2:30 ` [PATCH 1/3] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Feng Tang @ 2025-11-06  2:30 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Lance Yang, paulmck, Steven Rostedt,
	linux-kernel
  Cc: Feng Tang

When working on kernel stability issues, panics, hung tasks and soft/hard
lockups are frequently encountered. To debug them, users may need a lot
of system information from that moment, such as task call stacks, lock
info, memory info, ftrace dumps, etc.

The panic case already uses sys_info() for this purpose, and has a
'panic_sys_info' sysctl (also settable on the kernel cmdline) that takes
a human-readable string like "tasks,mem,timers,locks,ftrace,..." to
control what kinds of information are dumped. The same would also help
to debug task-hung and lockup cases.
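To make the string-to-mask idea concrete, here is a minimal user-space sketch of how a comma-separated list such as "tasks,mem,locks" can be turned into a bitmask. The SI_* bit values and si_parse() are hypothetical stand-ins, not the definitions in include/linux/sys_info.h or the parser in kernel/sys_info.c; the names are the ones from the panic_sys_info table touched by patch 1. This sketch silently skips unknown names, which may not match what the kernel handler does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values for illustration; not include/linux/sys_info.h. */
#define SI_TASKS          (1UL << 0)
#define SI_MEM            (1UL << 1)
#define SI_TIMERS         (1UL << 2)
#define SI_LOCKS          (1UL << 3)
#define SI_FTRACE         (1UL << 4)
#define SI_ALL_BT         (1UL << 5)
#define SI_BLOCKED_TASKS  (1UL << 6)

struct si_name {
	const char *name;
	unsigned long bit;
};

/* Names as listed in the panic_sys_info section of kernel.rst. */
static const struct si_name si_names[] = {
	{ "tasks",         SI_TASKS },
	{ "mem",           SI_MEM },
	{ "timers",        SI_TIMERS },
	{ "locks",         SI_LOCKS },
	{ "ftrace",        SI_FTRACE },
	{ "all_bt",        SI_ALL_BT },
	{ "blocked_tasks", SI_BLOCKED_TASKS },
};

/* Turn "tasks,mem,locks" into a bitmask; unknown or empty items are skipped. */
unsigned long si_parse(const char *str)
{
	unsigned long mask = 0;
	const char *p = str;

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of the current item */

		for (size_t i = 0; i < sizeof(si_names) / sizeof(si_names[0]); i++) {
			if (strlen(si_names[i].name) == len &&
			    strncmp(p, si_names[i].name, len) == 0) {
				mask |= si_names[i].bit;
				break;
			}
		}
		p += len;
		if (*p == ',')
			p++;			/* skip the separator */
	}
	return mask;
}
```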

So this patchset introduces a similar sys_info sysctl interface for the
task-hung and lockup cases.

Please note that this is mainly for debugging, and the info dumping
could be intrusive, e.g. dumping call stacks for all tasks when the
system has a huge number of tasks; the same goes for the ftrace dump
(we may add tracing_stop() and tracing_start() around it).

Locally, these have been used in our bug chasing for stability issues
and proved helpful.

Andrew suggested a global sys_info knob, and one thought for this is
to have something like this in sys_info.c:

	unsigned long global_si_mask;

	void sys_info(unsigned long si_mask)
	{
		if (!si_mask)
			__sys_info(global_si_mask);
		else
			__sys_info(si_mask);
	}

to let the caller decide whether to use its own option or the global one.
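A user-space sketch of that proposed fallback, assuming a zero per-caller mask means "use the global default". __sys_info() is replaced by a stub that only records the mask it was given; global_si_mask follows the proposal above, and last_dumped_mask is an illustrative name.

```c
/*
 * User-space sketch of the proposed global fallback. The dumper is a stub
 * that only records the mask it received.
 */
static unsigned long last_dumped_mask;

static void __sys_info(unsigned long si_mask)
{
	last_dumped_mask = si_mask;	/* a real dumper would print info here */
}

/* System-wide default mask (hypothetical; would be set via a global knob). */
unsigned long global_si_mask;

/* A zero mask from the caller means "use the global default". */
void sys_info(unsigned long si_mask)
{
	if (!si_mask)
		__sys_info(global_si_mask);
	else
		__sys_info(si_mask);
}
```

One consequence of this scheme: because 0 doubles as the "fall back" value, a caller whose own knob is empty cannot opt out of the global mask.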

Please help to review, thanks!

Feng Tang (3):
  docs: panic: correct some sys_ifo names in sysctl doc
  hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  watchdog: add lockup_sys_info sysctl to dump sys info on system lockup

 Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++--
 kernel/hung_task.c                          | 39 +++++++++++++++------
 kernel/watchdog.c                           | 21 ++++++++++-
 3 files changed, 60 insertions(+), 14 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] docs: panic: correct some sys_ifo names in sysctl doc
  2025-11-06  2:30 [PATCH 0/3] Enable hung_task and lockup cases to dump system info on demand Feng Tang
@ 2025-11-06  2:30 ` Feng Tang
  2025-11-10 16:52   ` Petr Mladek
  2025-11-06  2:30 ` [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
  2025-11-06  2:30 ` [PATCH 3/3] watchdog: add lockup_sys_info sysctl to dump sys info on system lockup Feng Tang
  2 siblings, 1 reply; 15+ messages in thread
From: Feng Tang @ 2025-11-06  2:30 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Lance Yang, paulmck, Steven Rostedt,
	linux-kernel
  Cc: Feng Tang

Some sys_info names were not updated during patch iterations; the
correct names are defined in kernel/sys_info.c.

Fixes: d747755917bf ("panic: add 'panic_sys_info' sysctl to take human readable string parameter")
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 Documentation/admin-guide/sysctl/kernel.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0065a55bc09e..a397eeccaea7 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -911,8 +911,8 @@ to 'panic_print'. Possible values are:
 =============   ===================================================
 tasks           print all tasks info
 mem             print system memory info
-timer           print timers info
-lock            print locks info if CONFIG_LOCKDEP is on
+timers          print timers info
+locks           print locks info if CONFIG_LOCKDEP is on
 ftrace          print ftrace buffer
 all_bt          print all CPUs backtrace (if available in the arch)
 blocked_tasks   print only tasks in uninterruptible (blocked) state
-- 
2.43.5



* [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  2025-11-06  2:30 [PATCH 0/3] Enable hung_task and lockup cases to dump system info on demand Feng Tang
  2025-11-06  2:30 ` [PATCH 1/3] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
@ 2025-11-06  2:30 ` Feng Tang
  2025-11-06  3:28   ` Lance Yang
  2025-11-10 17:55   ` Petr Mladek
  2025-11-06  2:30 ` [PATCH 3/3] watchdog: add lockup_sys_info sysctl to dump sys info on system lockup Feng Tang
  2 siblings, 2 replies; 15+ messages in thread
From: Feng Tang @ 2025-11-06  2:30 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Lance Yang, paulmck, Steven Rostedt,
	linux-kernel
  Cc: Feng Tang

When a task hang happens, developers may need different kinds of system
information (call stacks, memory info, locks, etc.) to help debugging.

Add a 'hung_task_sys_info' sysctl knob that takes a human-readable string
like "tasks,mem,timers,locks,ftrace,...". When a task hang happens, all
requested information will be dumped (refer to kernel/sys_info.c for more
details).

Meanwhile, the newly introduced sys_info() call is used to unify some
existing info-dumping knobs.

Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  5 +++
 kernel/hung_task.c                          | 39 +++++++++++++++------
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index a397eeccaea7..45b4408dad31 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -422,6 +422,11 @@ the system boot.
 
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
+hung_task_sys_info
+==================
+A comma separated list of extra system information to be dumped when
+hung task is detected, for example, "tasks,mem,timers,locks,...".
+Refer 'panic_sys_info' section below for more details.
 
 hung_task_timeout_secs
 ======================
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 84b4b049faa5..102be5a8e75a 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -24,6 +24,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/hung_task.h>
 #include <linux/rwsem.h>
+#include <linux/sys_info.h>
 
 #include <trace/events/sched.h>
 
@@ -60,12 +61,23 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
 static int __read_mostly sysctl_hung_task_warnings = 10;
 
 static int __read_mostly did_panic;
-static bool hung_task_show_lock;
 static bool hung_task_call_panic;
-static bool hung_task_show_all_bt;
 
 static struct task_struct *watchdog_task;
 
+/*
+ * A bitmask to control what kinds of system info to be printed when
+ * a hung task is detected, it could be task, memory, lock etc. Refer
+ * include/linux/sys_info.h for detailed bit definition.
+ */
+static unsigned long hung_task_si_mask;
+
+/*
+ * There are several sysctl knobs, and this serves as the runtime
+ * effective sys_info knob
+ */
+static unsigned long cur_si_mask;
+
 #ifdef CONFIG_SMP
 /*
  * Should we dump all CPUs backtraces in a hung task event?
@@ -235,9 +247,10 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
+	cur_si_mask = hung_task_si_mask;
 	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
 		console_verbose();
-		hung_task_show_lock = true;
+		cur_si_mask |= SYS_INFO_LOCKS;
 		hung_task_call_panic = true;
 	}
 
@@ -260,10 +273,10 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 			" disables this message.\n");
 		sched_show_task(t);
 		debug_show_blocker(t, timeout);
-		hung_task_show_lock = true;
+		cur_si_mask |= SYS_INFO_LOCKS;
 
 		if (sysctl_hung_task_all_cpu_backtrace)
-			hung_task_show_all_bt = true;
+			cur_si_mask |= SYS_INFO_ALL_BT;
 		if (!sysctl_hung_task_warnings)
 			pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
 	}
@@ -313,7 +326,6 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (test_taint(TAINT_DIE) || did_panic)
 		return;
 
-	hung_task_show_lock = false;
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
 
@@ -329,12 +341,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	}
  unlock:
 	rcu_read_unlock();
-	if (hung_task_show_lock)
-		debug_show_all_locks();
 
-	if (hung_task_show_all_bt) {
-		hung_task_show_all_bt = false;
-		trigger_all_cpu_backtrace();
+	if (unlikely(cur_si_mask)) {
+		sys_info(cur_si_mask);
+		cur_si_mask = 0;
 	}
 
 	if (hung_task_call_panic)
@@ -435,6 +445,13 @@ static const struct ctl_table hung_task_sysctls[] = {
 		.mode		= 0444,
 		.proc_handler	= proc_doulongvec_minmax,
 	},
+	{
+		.procname	= "hung_task_sys_info",
+		.data		= &hung_task_si_mask,
+		.maxlen         = sizeof(hung_task_si_mask),
+		.mode		= 0644,
+		.proc_handler	= sysctl_sys_info_handler,
+	},
 };
 
 static void __init hung_task_sysctl_init(void)
-- 
2.43.5



* [PATCH 3/3] watchdog: add lockup_sys_info sysctl to dump sys info on system lockup
  2025-11-06  2:30 [PATCH 0/3] Enable hung_task and lockup cases to dump system info on demand Feng Tang
  2025-11-06  2:30 ` [PATCH 1/3] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
  2025-11-06  2:30 ` [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
@ 2025-11-06  2:30 ` Feng Tang
  2025-11-11 13:26   ` Petr Mladek
  2 siblings, 1 reply; 15+ messages in thread
From: Feng Tang @ 2025-11-06  2:30 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Lance Yang, paulmck, Steven Rostedt,
	linux-kernel
  Cc: Feng Tang

When a soft/hard lockup happens, developers may need different kinds of
system information (call stacks, memory info, locks, etc.) to help debugging.

Add a 'lockup_sys_info' sysctl knob that takes a human-readable string like
"tasks,mem,timers,locks,ftrace,...". When a system lockup happens, all
requested information will be dumped (refer to kernel/sys_info.c for more
details).

Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  5 +++++
 kernel/watchdog.c                           | 21 ++++++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 45b4408dad31..4e39e661d5ab 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -582,6 +582,11 @@ if leaking kernel pointer values to unprivileged users is a concern.
 When ``kptr_restrict`` is set to 2, kernel pointers printed using
 %pK will be replaced with 0s regardless of privileges.
 
+lockup_sys_info
+==================
+A comma separated list of extra system information to be dumped when
+soft/hard lockup is detected, for example, "tasks,mem,timers,locks,...".
+Refer 'panic_sys_info' section below for more details.
 
 modprobe
 ========
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 659f5844393c..18d8f2a32318 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -25,6 +25,7 @@
 #include <linux/stop_machine.h>
 #include <linux/sysctl.h>
 #include <linux/tick.h>
+#include <linux/sys_info.h>
 
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
@@ -53,6 +54,13 @@ static int __read_mostly watchdog_hardlockup_available;
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 
+/*
+ * A bitmask to control what kinds of system info to be printed when
+ * system lockup is detected, it could be task, memory, lock etc. Refer
+ * include/linux/sys_info.h for detailed bit definition.
+ */
+static unsigned long lockup_si_mask;
+
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 
 # ifdef CONFIG_SMP
@@ -240,6 +248,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 				clear_bit_unlock(0, &hard_lockup_nmi_warn);
 		}
 
+		sys_info(lockup_si_mask);
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
@@ -746,9 +755,11 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	unsigned long touch_ts, period_ts, now;
 	struct pt_regs *regs = get_irq_regs();
 	int duration;
-	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
+	int softlockup_all_cpu_backtrace;
 	unsigned long flags;
 
+	softlockup_all_cpu_backtrace = (lockup_si_mask & SYS_INFO_ALL_BT) ?
+					1 : sysctl_softlockup_all_cpu_backtrace;
 	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
@@ -846,6 +857,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		}
 
 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
+		sys_info(lockup_si_mask & ~SYS_INFO_ALL_BT);
 		if (softlockup_panic)
 			panic("softlockup: hung tasks");
 	}
@@ -1178,6 +1190,13 @@ static const struct ctl_table watchdog_sysctls[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_watchdog_cpumask,
 	},
+	{
+		.procname	= "lockup_sys_info",
+		.data		= &lockup_si_mask,
+		.maxlen         = sizeof(lockup_si_mask),
+		.mode		= 0644,
+		.proc_handler	= sysctl_sys_info_handler,
+	},
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 	{
 		.procname       = "soft_watchdog",
-- 
2.43.5



* Re: [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  2025-11-06  2:30 ` [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
@ 2025-11-06  3:28   ` Lance Yang
  2025-11-06  4:48     ` Feng Tang
  2025-11-10 17:55   ` Petr Mladek
  1 sibling, 1 reply; 15+ messages in thread
From: Lance Yang @ 2025-11-06  3:28 UTC (permalink / raw)
  To: Feng Tang
  Cc: paulmck, linux-kernel, Andrew Morton, Steven Rostedt, Lance Yang,
	Petr Mladek



On 2025/11/6 10:30, Feng Tang wrote:
> When task-hung happens, developers may need different kinds of system
> information (call-stacks, memory info, locks, etc.) to help debugging.
> 
> Add 'hung_task_sys_info' sysctl knob to take human readable string like
> "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> requested information will be dumped. (refer kernel/sys_info.c for more
> details).
> 
> Meanwhile, the newly introduced sys_info() call is used to unify some
> existing info-dumping knobs.

Thanks! Just one nit below.

> 
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
>   Documentation/admin-guide/sysctl/kernel.rst |  5 +++
>   kernel/hung_task.c                          | 39 +++++++++++++++------
>   2 files changed, 33 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index a397eeccaea7..45b4408dad31 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -422,6 +422,11 @@ the system boot.
>   
>   This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>   
> +hung_task_sys_info
> +==================
> +A comma separated list of extra system information to be dumped when
> +hung task is detected, for example, "tasks,mem,timers,locks,...".
> +Refer 'panic_sys_info' section below for more details.
>   
>   hung_task_timeout_secs
>   ======================
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 84b4b049faa5..102be5a8e75a 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -24,6 +24,7 @@
>   #include <linux/sched/sysctl.h>
>   #include <linux/hung_task.h>
>   #include <linux/rwsem.h>
> +#include <linux/sys_info.h>
>   
>   #include <trace/events/sched.h>
>   
> @@ -60,12 +61,23 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
>   static int __read_mostly sysctl_hung_task_warnings = 10;
>   
>   static int __read_mostly did_panic;
> -static bool hung_task_show_lock;
>   static bool hung_task_call_panic;
> -static bool hung_task_show_all_bt;
>   
>   static struct task_struct *watchdog_task;
>   
> +/*
> + * A bitmask to control what kinds of system info to be printed when
> + * a hung task is detected, it could be task, memory, lock etc. Refer
> + * include/linux/sys_info.h for detailed bit definition.
> + */
> +static unsigned long hung_task_si_mask;
> +
> +/*
> + * There are several sysctl knobs, and this serves as the runtime
> + * effective sys_info knob
> + */

Nit: let's make the comment for cur_si_mask even more explicit.
+/*
+ * The effective sys_info mask for the current detection cycle. It
+ * aggregates the base hung_task_si_mask and any flags triggered
+ * by other conditions within this cycle. It is cleared after use.
+ */
> +static unsigned long cur_si_mask;

That makes its lifecycle (aggregate, use, and clear) super obvious ;)

With that, LGTM!

Reviewed-by: Lance Yang <lance.yang@linux.dev>

[...]
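A minimal user-space model of that aggregate, use, and clear lifecycle, assuming hypothetical SI_* bit values and a stub in place of sys_info(). note_hung_task() and finish_scan() are illustrative names, not functions in kernel/hung_task.c. The sketch ORs the base mask into cur_si_mask instead of assigning it, so bits gathered for an earlier hung task in the same scan are kept.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real ones are in include/linux/sys_info.h. */
#define SI_LOCKS  (1UL << 3)
#define SI_ALL_BT (1UL << 5)

static unsigned long hung_task_si_mask;	/* user-configured base mask */
static unsigned long cur_si_mask;		/* effective mask for this scan */
static long hung_task_warnings = 10;	/* nonzero: reports left (negative: unlimited) */
static bool all_cpu_backtrace;		/* like kernel.hung_task_all_cpu_backtrace */
static unsigned long dumped;		/* what the stub last "dumped" */

static void sys_info_stub(unsigned long mask)
{
	dumped = mask;
}

/* Aggregate: called once for every hung task found during the scan. */
static void note_hung_task(void)
{
	cur_si_mask |= hung_task_si_mask;
	if (hung_task_warnings) {
		if (hung_task_warnings > 0)
			hung_task_warnings--;
		cur_si_mask |= SI_LOCKS;
		if (all_cpu_backtrace)
			cur_si_mask |= SI_ALL_BT;
	}
}

/* Use and clear: called once after the scan has finished. */
static void finish_scan(void)
{
	if (cur_si_mask) {
		sys_info_stub(cur_si_mask);
		cur_si_mask = 0;
	}
}
```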


* Re: [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  2025-11-06  3:28   ` Lance Yang
@ 2025-11-06  4:48     ` Feng Tang
  0 siblings, 0 replies; 15+ messages in thread
From: Feng Tang @ 2025-11-06  4:48 UTC (permalink / raw)
  To: Lance Yang
  Cc: paulmck, linux-kernel, Andrew Morton, Steven Rostedt, Lance Yang,
	Petr Mladek

On Thu, Nov 06, 2025 at 11:28:12AM +0800, Lance Yang wrote:
> 
> 
> On 2025/11/6 10:30, Feng Tang wrote:
> > When task-hung happens, developers may need different kinds of system
> > information (call-stacks, memory info, locks, etc.) to help debugging.
> > 
> > Add 'hung_task_sys_info' sysctl knob to take human readable string like
> > "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> > requested information will be dumped. (refer kernel/sys_info.c for more
> > details).
> > 
> > Meanwhile, the newly introduced sys_info() call is used to unify some
> > existing info-dumping knobs.
> 
> Thanks! Just one nit below.
> 
> > 
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
> >   Documentation/admin-guide/sysctl/kernel.rst |  5 +++
> >   kernel/hung_task.c                          | 39 +++++++++++++++------
> >   2 files changed, 33 insertions(+), 11 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index a397eeccaea7..45b4408dad31 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -422,6 +422,11 @@ the system boot.
> >   This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
> > +hung_task_sys_info
> > +==================
> > +A comma separated list of extra system information to be dumped when
> > +hung task is detected, for example, "tasks,mem,timers,locks,...".
> > +Refer 'panic_sys_info' section below for more details.
> >   hung_task_timeout_secs
> >   ======================
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index 84b4b049faa5..102be5a8e75a 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -24,6 +24,7 @@
> >   #include <linux/sched/sysctl.h>
> >   #include <linux/hung_task.h>
> >   #include <linux/rwsem.h>
> > +#include <linux/sys_info.h>
> >   #include <trace/events/sched.h>
> > @@ -60,12 +61,23 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
> >   static int __read_mostly sysctl_hung_task_warnings = 10;
> >   static int __read_mostly did_panic;
> > -static bool hung_task_show_lock;
> >   static bool hung_task_call_panic;
> > -static bool hung_task_show_all_bt;
> >   static struct task_struct *watchdog_task;
> > +/*
> > + * A bitmask to control what kinds of system info to be printed when
> > + * a hung task is detected, it could be task, memory, lock etc. Refer
> > + * include/linux/sys_info.h for detailed bit definition.
> > + */
> > +static unsigned long hung_task_si_mask;
> > +
> > +/*
> > + * There are several sysctl knobs, and this serves as the runtime
> > + * effective sys_info knob
> > + */
> 
> Nit: let's make the comment for cur_si_mask even more explicit.
> +/*
> + * The effective sys_info mask for the current detection cycle. It
> + * aggregates the base hung_task_si_mask and any flags triggered
> + * by other conditions within this cycle. It is cleared after use.
> + */
> > +static unsigned long cur_si_mask;
> 
> That makes its lifecycle (aggregate, use, and clear) super obvious ;)

Yep. Thanks for the improvement! Will take it.

> With that, LGTM!
> 
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
Thanks!

- Feng


* Re: [PATCH 1/3] docs: panic: correct some sys_ifo names in sysctl doc
  2025-11-06  2:30 ` [PATCH 1/3] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
@ 2025-11-10 16:52   ` Petr Mladek
  2025-11-11 14:09     ` Feng Tang
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2025-11-10 16:52 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

On Thu 2025-11-06 10:30:30, Feng Tang wrote:
> Some sys_info names wered forgotten to change in patch iterations, while
> the right names are defined in kernel/sys_info.c.
> 
> Fixes: d747755917bf ("panic: add 'panic_sys_info' sysctl to take human readable string parameter")
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>

LGTM, feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr


* Re: [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  2025-11-06  2:30 ` [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
  2025-11-06  3:28   ` Lance Yang
@ 2025-11-10 17:55   ` Petr Mladek
  2025-11-11 13:37     ` Feng Tang
  2025-11-12 11:25     ` Feng Tang
  1 sibling, 2 replies; 15+ messages in thread
From: Petr Mladek @ 2025-11-10 17:55 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

On Thu 2025-11-06 10:30:31, Feng Tang wrote:
> When task-hung happens, developers may need different kinds of system
> information (call-stacks, memory info, locks, etc.) to help debugging.
> 
> Add 'hung_task_sys_info' sysctl knob to take human readable string like
> "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> requested information will be dumped. (refer kernel/sys_info.c for more
> details).
> 
> Meanwhile, the newly introduced sys_info() call is used to unify some
> existing info-dumping knobs.
> 
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -60,12 +61,23 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
>  static int __read_mostly sysctl_hung_task_warnings = 10;
>  
>  static int __read_mostly did_panic;
> -static bool hung_task_show_lock;
>  static bool hung_task_call_panic;
> -static bool hung_task_show_all_bt;
>  
>  static struct task_struct *watchdog_task;
>  
> +/*
> + * A bitmask to control what kinds of system info to be printed when
> + * a hung task is detected, it could be task, memory, lock etc. Refer
> + * include/linux/sys_info.h for detailed bit definition.
> + */
> +static unsigned long hung_task_si_mask;
> +
> +/*
> + * There are several sysctl knobs, and this serves as the runtime
> + * effective sys_info knob
> + */
> +static unsigned long cur_si_mask;

It seems that this variable is used to pass information between
check_hung_task() and check_hung_uninterruptible_tasks().

And "hung_task_show_lock" and "hung_task_show_all_bt" had the same
purpose.

If I get it correctly, we could move these decisions to
check_hung_uninterruptible_tasks() and avoid the global
variable.

I think that it even makes the code a bit cleaner.

Something like this on top of this patch:

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 5f0275b2c742..c2a0dfce1e56 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -71,12 +71,6 @@ static struct task_struct *watchdog_task;
  */
 static unsigned long hung_task_si_mask;
 
-/*
- * There are several sysctl knobs, and this serves as the runtime
- * effective sys_info knob
- */
-static unsigned long cur_si_mask;
-
 #ifdef CONFIG_SMP
 /*
  * Should we dump all CPUs backtraces in a hung task event?
@@ -229,11 +223,8 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout,
-		unsigned long prev_detect_count)
+static void check_hung_task(struct task_struct *t, unsigned long timeout)
 {
-	unsigned long total_hung_task;
-
 	if (!task_is_hung(t, timeout))
 		return;
 
@@ -243,16 +234,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 	 */
 	sysctl_hung_task_detect_count++;
 
-	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
-	cur_si_mask = hung_task_si_mask;
-	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
-		console_verbose();
-		cur_si_mask |= SYS_INFO_LOCKS;
-		hung_task_call_panic = true;
-	}
-
 	/*
 	 * Ok, the task did not get scheduled for more than 2 minutes,
 	 * complain:
@@ -272,10 +255,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 			" disables this message.\n");
 		sched_show_task(t);
 		debug_show_blocker(t, timeout);
-		cur_si_mask |= SYS_INFO_LOCKS;
 
-		if (sysctl_hung_task_all_cpu_backtrace)
-			cur_si_mask |= SYS_INFO_ALL_BT;
 		if (!sysctl_hung_task_warnings)
 			pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
 	}
@@ -315,8 +295,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 {
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
+	unsigned long total_hung_task;
 	struct task_struct *g, *t;
 	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+	unsigned long si_mask;
 
 	/*
 	 * If the system crashed already then all bets are off,
@@ -325,6 +307,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (test_taint(TAINT_DIE) || did_panic)
 		return;
 
+	si_mask = hung_task_si_mask;
+	if (sysctl_hung_task_warnings || hung_task_call_panic) {
+		si_mask |= SYS_INFO_LOCKS;
+
+		if (sysctl_hung_task_all_cpu_backtrace)
+			si_mask |= SYS_INFO_ALL_BT;
+	}
+
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
 
@@ -336,16 +326,20 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout, prev_detect_count);
+		check_hung_task(t, timeout);
 	}
  unlock:
 	rcu_read_unlock();
 
-	if (unlikely(cur_si_mask)) {
-		sys_info(cur_si_mask);
-		cur_si_mask = 0;
+	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
+	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+		console_verbose();
+		hung_task_call_panic = true;
 	}
 
+	if (unlikely(si_mask))
+		sys_info(si_mask);
+
 	if (hung_task_call_panic)
 		panic("hung_task: blocked tasks");
 }

What do you think?

Hmm, maybe, we might still need to pass "prev_detect_count" and
keep "console_verbose()" in check_hung_task().

Best Regards,
Petr
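Petr's suggestion can also be read as a single decision made once per scan, after the task walk. The sketch below is one way to express it under stated assumptions: found is the number of hung tasks seen in this scan, warnings_at_start is the value of kernel.hung_task_warnings before the scan (nonzero means at least one report gets printed), and nothing is dumped when the scan found no hung task. hung_scan_decide(), struct scan_result, and the SI_* values are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real ones are in include/linux/sys_info.h. */
#define SI_LOCKS  (1UL << 3)
#define SI_ALL_BT (1UL << 5)

struct scan_result {
	unsigned long si_mask;	/* mask to hand to sys_info(); 0 means dump nothing */
	bool panic;		/* whether to panic after dumping */
};

/*
 * Decide once, after the task walk, what to dump for this scan.
 * found:             hung tasks detected in this scan
 * warnings_at_start: kernel.hung_task_warnings before the scan
 * panic_threshold:   kernel.hung_task_panic (0 disables panicking)
 */
struct scan_result hung_scan_decide(unsigned long base_mask,
				    unsigned long found,
				    long warnings_at_start,
				    unsigned long panic_threshold,
				    bool all_cpu_bt)
{
	struct scan_result r = { 0, false };

	if (!found)
		return r;	/* no hung task in this scan: nothing to dump */

	r.si_mask = base_mask;
	r.panic = panic_threshold && found >= panic_threshold;

	/* At least one report was printed: add locks and, optionally, backtraces. */
	if (warnings_at_start || r.panic) {
		r.si_mask |= SI_LOCKS;
		if (all_cpu_bt)
			r.si_mask |= SI_ALL_BT;
	}
	return r;
}
```

Keeping the decision local to the scan is what allows the global cur_si_mask to go away.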


* Re: [PATCH 3/3] watchdog: add lockup_sys_info sysctl to dump sys info on system lockup
  2025-11-06  2:30 ` [PATCH 3/3] watchdog: add lockup_sys_info sysctl to dump sys info on system lockup Feng Tang
@ 2025-11-11 13:26   ` Petr Mladek
  2025-11-11 14:09     ` Feng Tang
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2025-11-11 13:26 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

On Thu 2025-11-06 10:30:32, Feng Tang wrote:
> When soft/hard lockup happens, developers may need different kinds of
> system information (call-stacks, memory info, locks, etc.) to help debugging.
> 
> Add 'lockup_sys_info' sysctl knob to take human readable string like
> "tasks,mem,timers,locks,ftrace,...", and when system lockup happens, all
> requested information will be dumped. (refer kernel/sys_info.c for more
> details).
> 
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -53,6 +54,13 @@ static int __read_mostly watchdog_hardlockup_available;
>  struct cpumask watchdog_cpumask __read_mostly;
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>  
> +/*
> + * A bitmask to control what kinds of system info to be printed when
> + * system lockup is detected, it could be task, memory, lock etc. Refer
> + * include/linux/sys_info.h for detailed bit definition.
> + */
> +static unsigned long lockup_si_mask;
> +
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  
>  # ifdef CONFIG_SMP
> @@ -240,6 +248,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  				clear_bit_unlock(0, &hard_lockup_nmi_warn);
>  		}

The code right above prints backtraces from all CPUs when
sysctl_hardlockup_all_cpu_backtrace is set.

> +		sys_info(lockup_si_mask);

And sys_info() could print them again when the SYS_INFO_ALL_BT
bit is set. The hard lockup detector should use the same
trick as the softlockup detector in watchdog_timer_fn().

>  		if (hardlockup_panic)
>  			nmi_panic(regs, "Hard LOCKUP");
>  
> @@ -746,9 +755,11 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  	unsigned long touch_ts, period_ts, now;
>  	struct pt_regs *regs = get_irq_regs();
>  	int duration;
> -	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
> +	int softlockup_all_cpu_backtrace;
>  	unsigned long flags;
>  
> +	softlockup_all_cpu_backtrace = (lockup_si_mask & SYS_INFO_ALL_BT) ?
> +					1 : sysctl_softlockup_all_cpu_backtrace;
>  	if (!watchdog_enabled)
>  		return HRTIMER_NORESTART;
>  
> @@ -846,6 +857,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  		}
>  
>  		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> +		sys_info(lockup_si_mask & ~SYS_INFO_ALL_BT);
>  		if (softlockup_panic)
>  			panic("softlockup: hung tasks");
>  	}
> @@ -1178,6 +1190,13 @@ static const struct ctl_table watchdog_sysctls[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_watchdog_cpumask,
>  	},
> +	{
> +		.procname	= "lockup_sys_info",
> +		.data		= &lockup_si_mask,
> +		.maxlen         = sizeof(lockup_si_mask),
> +		.mode		= 0644,
> +		.proc_handler	= sysctl_sys_info_handler,
> +	},

There already exists:

	+ hardlockup_all_cpu_backtrace
	+ hardlockup_panic
	+ softlockup_all_cpu_backtrace
	+ softlockup_panic

IMHO, it would make sense to introduce separate knobs:

	+ hardlockup_sys_info
	+ softlockup_sys_info


Best Regards,
Petr
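The trick used by the softlockup path in patch 3 (folding SYS_INFO_ALL_BT into the existing all-CPU-backtrace flag and passing only the remaining bits to sys_info()) can be sketched as a small helper that the hard lockup path could share, so all-CPU backtraces are printed once. plan_lockup_dump(), struct lockup_dump_plan, and the SI_* values are illustrative names, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real ones are in include/linux/sys_info.h. */
#define SI_MEM    (1UL << 1)
#define SI_ALL_BT (1UL << 5)

struct lockup_dump_plan {
	bool all_cpu_backtrace;	/* print all-CPU backtraces once, via the existing path */
	unsigned long si_mask;	/* remaining bits for sys_info(), ALL_BT stripped */
};

struct lockup_dump_plan plan_lockup_dump(unsigned long lockup_si_mask,
					 bool sysctl_all_cpu_backtrace)
{
	struct lockup_dump_plan p;

	/* Either knob asks for backtraces; honor it through one code path. */
	p.all_cpu_backtrace = sysctl_all_cpu_backtrace ||
			      (lockup_si_mask & SI_ALL_BT);
	/* Never let sys_info() print the backtraces a second time. */
	p.si_mask = lockup_si_mask & ~SI_ALL_BT;
	return p;
}
```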


* Re: [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  2025-11-10 17:55   ` Petr Mladek
@ 2025-11-11 13:37     ` Feng Tang
  2025-11-12 11:25     ` Feng Tang
  1 sibling, 0 replies; 15+ messages in thread
From: Feng Tang @ 2025-11-11 13:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

Hi Petr,

On Mon, Nov 10, 2025 at 06:55:57PM +0100, Petr Mladek wrote:
> On Thu 2025-11-06 10:30:31, Feng Tang wrote:
> > When task-hung happens, developers may need different kinds of system
> > information (call-stacks, memory info, locks, etc.) to help debugging.
> > 
> > Add 'hung_task_sys_info' sysctl knob to take human readable string like
> > "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
> > requested information will be dumped. (refer kernel/sys_info.c for more
> > details).
> > 
> > Meanwhile, the newly introduced sys_info() call is used to unify some
> > existing info-dumping knobs.
> > 
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -60,12 +61,23 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
> >  static int __read_mostly sysctl_hung_task_warnings = 10;
> >  
> >  static int __read_mostly did_panic;
> > -static bool hung_task_show_lock;
> >  static bool hung_task_call_panic;
> > -static bool hung_task_show_all_bt;
> >  
> >  static struct task_struct *watchdog_task;
> >  
> > +/*
> > + * A bitmask to control what kinds of system info to be printed when
> > + * a hung task is detected, it could be task, memory, lock etc. Refer
> > + * include/linux/sys_info.h for detailed bit definition.
> > + */
> > +static unsigned long hung_task_si_mask;
> > +
> > +/*
> > + * There are several sysctl knobs, and this serves as the runtime
> > + * effective sys_info knob
> > + */
> > +static unsigned long cur_si_mask;
> 
> It seems that this variable is used to pass information between
> check_hung_task() and check_hung_uninterruptible_tasks().
> 
> And "hung_task_show_lock" and "hung_task_show_all_bt" had the same
> purpose.
> 
> If I get it correctly, we could move these decisions to
> check_hung_uninterruptible_tasks() and avoid the global
> variable.
> 
> I think that it even makes the code a bit cleaner.
> 
> Something like this on top of this patch:
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 5f0275b2c742..c2a0dfce1e56 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -71,12 +71,6 @@ static struct task_struct *watchdog_task;
>   */
>  static unsigned long hung_task_si_mask;
>  
> -/*
> - * There are several sysctl knobs, and this serves as the runtime
> - * effective sys_info knob
> - */
> -static unsigned long cur_si_mask;
> -
>  #ifdef CONFIG_SMP
>  /*
>   * Should we dump all CPUs backtraces in a hung task event?
> @@ -229,11 +223,8 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
>  }
>  #endif
>  
> -static void check_hung_task(struct task_struct *t, unsigned long timeout,
> -		unsigned long prev_detect_count)
> +static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  {
> -	unsigned long total_hung_task;
> -
>  	if (!task_is_hung(t, timeout))
>  		return;
>  
> @@ -243,16 +234,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
>  	 */
>  	sysctl_hung_task_detect_count++;
>  
> -	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
>  	trace_sched_process_hang(t);
>  
> -	cur_si_mask = hung_task_si_mask;
> -	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> -		console_verbose();
> -		cur_si_mask |= SYS_INFO_LOCKS;
> -		hung_task_call_panic = true;
> -	}
> -
>  	/*
>  	 * Ok, the task did not get scheduled for more than 2 minutes,
>  	 * complain:
> @@ -272,10 +255,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
>  			" disables this message.\n");
>  		sched_show_task(t);
>  		debug_show_blocker(t, timeout);
> -		cur_si_mask |= SYS_INFO_LOCKS;
>  
> -		if (sysctl_hung_task_all_cpu_backtrace)
> -			cur_si_mask |= SYS_INFO_ALL_BT;
>  		if (!sysctl_hung_task_warnings)
>  			pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
>  	}
> @@ -315,8 +295,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  {
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
> +	unsigned long total_hung_task;
>  	struct task_struct *g, *t;
>  	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> +	unsigned long si_mask;
>  
>  	/*
>  	 * If the system crashed already then all bets are off,
> @@ -325,6 +307,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	if (test_taint(TAINT_DIE) || did_panic)
>  		return;
>  
> +	si_mask = hung_task_si_mask;
> +	if (sysctl_hung_task_warnings || hung_task_call_panic) {
> +		si_mask |= SYS_INFO_LOCKS;
> +
> +		if (sysctl_hung_task_all_cpu_backtrace)
> +			si_mask |= SYS_INFO_ALL_BT;
> +	}
> +
>  	rcu_read_lock();
>  	for_each_process_thread(g, t) {
>  
> @@ -336,16 +326,20 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  			last_break = jiffies;
>  		}
>  
> -		check_hung_task(t, timeout, prev_detect_count);
> +		check_hung_task(t, timeout);
>  	}
>   unlock:
>  	rcu_read_unlock();
>  
> -	if (unlikely(cur_si_mask)) {
> -		sys_info(cur_si_mask);
> -		cur_si_mask = 0;
> +	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> +	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> +		console_verbose();
> +		hung_task_call_panic = true;
>  	}
>  
> +	if (unlikely(si_mask))
> +		sys_info(si_mask);
> +
>  	if (hung_task_call_panic)
>  		panic("hung_task: blocked tasks");
>  }
> 
> What do you think?

The cleanup looks great to me.

> Hmm, maybe, we might still need to pass "prev_detect_count" and
> keep "console_verbose()" in check_hung_task().

I think moving console_verbose() here is fine, as the messages printed
in check_hung_task() are mostly pr_err() and pr_info() already.

Thanks,
Feng

> 
> Best Regards,
> Petr


* Re: [PATCH 3/3] watchdog: add lockup_sys_info sysctl to dump sys info on system lockup
  2025-11-11 13:26   ` Petr Mladek
@ 2025-11-11 14:09     ` Feng Tang
  0 siblings, 0 replies; 15+ messages in thread
From: Feng Tang @ 2025-11-11 14:09 UTC
  To: Petr Mladek
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

On Tue, Nov 11, 2025 at 02:26:05PM +0100, Petr Mladek wrote:
> On Thu 2025-11-06 10:30:32, Feng Tang wrote:
> > When soft/hard lockup happens, developers may need different kinds of
> > system information (call-stacks, memory info, locks, etc.) to help debugging.
> > 
> > Add 'lockup_sys_info' sysctl knob to take human readable string like
> > "tasks,mem,timers,locks,ftrace,...", and when system lockup happens, all
> > requested information will be dumped. (refer kernel/sys_info.c for more
> > details).
> > 
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -53,6 +54,13 @@ static int __read_mostly watchdog_hardlockup_available;
> >  struct cpumask watchdog_cpumask __read_mostly;
> >  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> >  
> > +/*
> > + * A bitmask to control what kinds of system info to be printed when
> > + * system lockup is detected, it could be task, memory, lock etc. Refer
> > + * include/linux/sys_info.h for detailed bit definition.
> > + */
> > +static unsigned long lockup_si_mask;
> > +
> >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> >  
> >  # ifdef CONFIG_SMP
> > @@ -240,6 +248,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >  				clear_bit_unlock(0, &hard_lockup_nmi_warn);
> >  		}
> 
> The code right above prints backtraces from all CPUs when
> sysctl_hardlockup_all_cpu_backtrace is set.
> 
> > +		sys_info(lockup_si_mask);
> 
> And sys_info() could print it again when SYS_INFO_ALL_BT
> bit is set. The hard lockup detector should use the same
> trick as the softlockup detector in watchdog_timer_fn().

Yes, I missed that. Thanks for catching it!
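
A userspace model of the de-duplication Petr describes, as done for the soft lockup in watchdog_timer_fn(): the SYS_INFO_ALL_BT bit is folded into the detector's own all-CPU backtrace decision, then cleared from the mask handed to sys_info(). The bit values, `model_sys_info()`, `report_lockup()` and the `all_bt_dumps` counter are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>

/* Stand-ins for SYS_INFO_LOCKS and SYS_INFO_ALL_BT; values are arbitrary. */
#define SI_LOCKS  (1UL << 0)
#define SI_ALL_BT (1UL << 1)

/* How many times all-CPU backtraces were printed in this model. */
static int all_bt_dumps;

/* Hypothetical model of sys_info(): only the ALL_BT bit matters here. */
static void model_sys_info(unsigned long mask)
{
	if (mask & SI_ALL_BT)
		all_bt_dumps++;
}

/*
 * Model of one lockup report. 'mask' plays the role of lockup_si_mask and
 * 'sysctl_all_bt' the role of sysctl_{hard,soft}lockup_all_cpu_backtrace.
 */
static void report_lockup(unsigned long mask, int sysctl_all_bt)
{
	/* Fold the sys_info bit into the detector's own decision ... */
	int all_cpu_backtrace = (mask & SI_ALL_BT) ? 1 : sysctl_all_bt;

	if (all_cpu_backtrace)
		all_bt_dumps++;        /* the detector's existing backtrace path */

	/* ... and strip it so sys_info() does not print it a second time. */
	model_sys_info(mask & ~SI_ALL_BT);
}
```

With both the old knob and the new bit set, backtraces are still printed only once.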

> >  		if (hardlockup_panic)
> >  			nmi_panic(regs, "Hard LOCKUP");
> >  
> > @@ -746,9 +755,11 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  	unsigned long touch_ts, period_ts, now;
> >  	struct pt_regs *regs = get_irq_regs();
> >  	int duration;
> > -	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
> > +	int softlockup_all_cpu_backtrace;
> >  	unsigned long flags;
> >  
> > +	softlockup_all_cpu_backtrace = (lockup_si_mask & SYS_INFO_ALL_BT) ?
> > +					1 : sysctl_softlockup_all_cpu_backtrace;
> >  	if (!watchdog_enabled)
> >  		return HRTIMER_NORESTART;
> >  
> > @@ -846,6 +857,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  		}
> >  
> >  		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> > +		sys_info(lockup_si_mask & ~SYS_INFO_ALL_BT);
> >  		if (softlockup_panic)
> >  			panic("softlockup: hung tasks");
> >  	}
> > @@ -1178,6 +1190,13 @@ static const struct ctl_table watchdog_sysctls[] = {
> >  		.mode		= 0644,
> >  		.proc_handler	= proc_watchdog_cpumask,
> >  	},
> > +	{
> > +		.procname	= "lockup_sys_info",
> > +		.data		= &lockup_si_mask,
> > +		.maxlen         = sizeof(lockup_si_mask),
> > +		.mode		= 0644,
> > +		.proc_handler	= sysctl_sys_info_handler,
> > +	},
> 
> There already exists:
> 
> 	+ hardlockup_all_cpu_backtrace
> 	+ hardlockup_panic
> 	+ softlockup_all_cpu_backtrace
> 	+ softlockup_panic
> 
> IMHO, it would make sense to introduce separate:
> 
> 	+ hardlockup_sys_info
> 	+ softlockup_sys_info

Makes sense to me, will do.

Thanks,
Feng


* Re: [PATCH 1/3] docs: panic: correct some sys_ifo names in sysctl doc
  2025-11-10 16:52   ` Petr Mladek
@ 2025-11-11 14:09     ` Feng Tang
  0 siblings, 0 replies; 15+ messages in thread
From: Feng Tang @ 2025-11-11 14:09 UTC
  To: Petr Mladek
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

On Mon, Nov 10, 2025 at 05:52:35PM +0100, Petr Mladek wrote:
> On Thu 2025-11-06 10:30:30, Feng Tang wrote:
> > Some sys_info names were not updated during patch iterations; the
> > correct names are defined in kernel/sys_info.c.
> > 
> > Fixes: d747755917bf ("panic: add 'panic_sys_info' sysctl to take human readable string parameter")
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> 
> LGTM, feel free to use:
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>

Thank you!

- Feng


* Re: [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  2025-11-10 17:55   ` Petr Mladek
  2025-11-11 13:37     ` Feng Tang
@ 2025-11-12 11:25     ` Feng Tang
  2025-11-12 14:44       ` Petr Mladek
  1 sibling, 1 reply; 15+ messages in thread
From: Feng Tang @ 2025-11-12 11:25 UTC
  To: Petr Mladek
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

On Mon, Nov 10, 2025 at 06:55:57PM +0100, Petr Mladek wrote:
> On Thu 2025-11-06 10:30:31, Feng Tang wrote:
[...]
> @@ -315,8 +295,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  {
>  	int max_count = sysctl_hung_task_check_count;
>  	unsigned long last_break = jiffies;
> +	unsigned long total_hung_task;
>  	struct task_struct *g, *t;
>  	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> +	unsigned long si_mask;
>  
>  	/*
>  	 * If the system crashed already then all bets are off,
> @@ -325,6 +307,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	if (test_taint(TAINT_DIE) || did_panic)
>  		return;
>  
> +	si_mask = hung_task_si_mask;
> +	if (sysctl_hung_task_warnings || hung_task_call_panic) {
> +		si_mask |= SYS_INFO_LOCKS;
> +
> +		if (sysctl_hung_task_all_cpu_backtrace)
> +			si_mask |= SYS_INFO_ALL_BT;
> +	}

This probably needs to be moved after the loop that calls
check_hung_task().

> +
>  	rcu_read_lock();
>  	for_each_process_thread(g, t) {
>  
> @@ -336,16 +326,20 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  			last_break = jiffies;
>  		}
>  
> -		check_hung_task(t, timeout, prev_detect_count);
> +		check_hung_task(t, timeout);
>  	}
>   unlock:
>  	rcu_read_unlock();
>  
> -	if (unlikely(cur_si_mask)) {
> -		sys_info(cur_si_mask);
> -		cur_si_mask = 0;
> +	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
> +	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
> +		console_verbose();
> +		hung_task_call_panic = true;
>  	}
>  
> +	if (unlikely(si_mask))
> +		sys_info(si_mask);
> +
>  	if (hung_task_call_panic)
>  		panic("hung_task: blocked tasks");
>  }
[...]

Thanks,
Feng


* Re: [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  2025-11-12 11:25     ` Feng Tang
@ 2025-11-12 14:44       ` Petr Mladek
  2025-11-13  2:56         ` Feng Tang
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2025-11-12 14:44 UTC
  To: Feng Tang
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

On Wed 2025-11-12 19:25:27, Feng Tang wrote:
> On Mon, Nov 10, 2025 at 06:55:57PM +0100, Petr Mladek wrote:
> > On Thu 2025-11-06 10:30:31, Feng Tang wrote:
> [...]
> > @@ -315,8 +295,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> >  {
> >  	int max_count = sysctl_hung_task_check_count;
> >  	unsigned long last_break = jiffies;
> > +	unsigned long total_hung_task;
> >  	struct task_struct *g, *t;
> >  	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> > +	unsigned long si_mask;
> >  
> >  	/*
> >  	 * If the system crashed already then all bets are off,
> > @@ -325,6 +307,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> >  	if (test_taint(TAINT_DIE) || did_panic)
> >  		return;
> >  
> > +	si_mask = hung_task_si_mask;
> > +	if (sysctl_hung_task_warnings || hung_task_call_panic) {
> > +		si_mask |= SYS_INFO_LOCKS;
> > +
> > +		if (sysctl_hung_task_all_cpu_backtrace)
> > +			si_mask |= SYS_INFO_ALL_BT;
> > +	}
> 
> This probably needs to be moved after the loop that calls
> check_hung_task().

I did it on purpose because "sysctl_hung_task_warnings" might get
decremented down to "0" in the loop below. But IMHO, we need to print
the information if it was non-zero at the beginning.

It might be worth adding a comment explaining why it has to be done
before the loop.
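
A userspace model of why the counter has to be read before the loop, assuming a simplified `report_hung()` in place of the warning branch of check_hung_task(); `report_hung()` and `want_extra_info()` are hypothetical names. With `sysctl_hung_task_warnings` set to 1, a single hung task consumes the last warning, so a check placed after the loop would skip the extra dump for exactly the report that was printed.

```c
#include <stdbool.h>

/* Models the sysctl; -1 means "unlimited warnings". */
static int sysctl_hung_task_warnings = 1;

/* Models the warning branch of check_hung_task(): each report
 * consumes one warning until the budget reaches zero. */
static void report_hung(void)
{
	if (sysctl_hung_task_warnings) {
		if (sysctl_hung_task_warnings > 0)
			sysctl_hung_task_warnings--;
		/* pr_err(), sched_show_task(), ... would go here */
	}
}

/*
 * Returns whether the lock/backtrace bits would be added to the mask.
 * 'snapshot' selects reading the counter before the loop (true) or
 * after it (false).
 */
static bool want_extra_info(int nr_hung, bool snapshot)
{
	int need_warning = sysctl_hung_task_warnings;   /* read before the loop */

	for (int i = 0; i < nr_hung; i++)
		report_hung();

	return snapshot ? need_warning != 0 : sysctl_hung_task_warnings != 0;
}
```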

> > +
> >  	rcu_read_lock();
> >  	for_each_process_thread(g, t) {
> >  

Best Regards,
Petr


* Re: [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung
  2025-11-12 14:44       ` Petr Mladek
@ 2025-11-13  2:56         ` Feng Tang
  0 siblings, 0 replies; 15+ messages in thread
From: Feng Tang @ 2025-11-13  2:56 UTC
  To: Petr Mladek
  Cc: Andrew Morton, Lance Yang, paulmck, Steven Rostedt, linux-kernel

On Wed, Nov 12, 2025 at 03:44:09PM +0100, Petr Mladek wrote:
> On Wed 2025-11-12 19:25:27, Feng Tang wrote:
> > On Mon, Nov 10, 2025 at 06:55:57PM +0100, Petr Mladek wrote:
> > > On Thu 2025-11-06 10:30:31, Feng Tang wrote:
> > [...]
> > > @@ -315,8 +295,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> > >  {
> > >  	int max_count = sysctl_hung_task_check_count;
> > >  	unsigned long last_break = jiffies;
> > > +	unsigned long total_hung_task;
> > >  	struct task_struct *g, *t;
> > >  	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
> > > +	unsigned long si_mask;
> > >  
> > >  	/*
> > >  	 * If the system crashed already then all bets are off,
> > > @@ -325,6 +307,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> > >  	if (test_taint(TAINT_DIE) || did_panic)
> > >  		return;
> > >  
> > > +	si_mask = hung_task_si_mask;
> > > +	if (sysctl_hung_task_warnings || hung_task_call_panic) {
> > > +		si_mask |= SYS_INFO_LOCKS;
> > > +
> > > +		if (sysctl_hung_task_all_cpu_backtrace)
> > > +			si_mask |= SYS_INFO_ALL_BT;
> > > +	}
> > 
> > This probably needs to be moved after the loop that calls
> > check_hung_task().
> 
> I did it on purpose because "sysctl_hung_task_warnings" might get
> decremented down to "0" in the loop below. But IMHO, we need to print
> the information if it was non-zero at the beginning.
> 
> It might be worth adding a comment explaining why it has to be done
> before the loop.
 
I see your point. Yes, that could happen and should be handled.

My concerns were:
1. 'hung_task_call_panic' is only set once the checking loop has run,
   so it should be tested after the loop.
2. When 'sysctl_hung_task_warnings' is not 0 (the likely case),
	si_mask |= SYS_INFO_LOCKS
   makes sys_info() get called with a non-zero mask on every check, even
   when no hung task was found, and the same happens when
   'hung_task_si_mask' is pre-set. I just ran a simple hung task test
   and can confirm this.

So I made some small changes on top of your patch; please help
review.

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 5f0275b2c742..5b3a7785d3a2 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -71,12 +71,6 @@ static struct task_struct *watchdog_task;
  */
 static unsigned long hung_task_si_mask;
 
-/*
- * There are several sysctl knobs, and this serves as the runtime
- * effective sys_info knob
- */
-static unsigned long cur_si_mask;
-
 #ifdef CONFIG_SMP
 /*
  * Should we dump all CPUs backtraces in a hung task event?
@@ -229,11 +223,8 @@ static inline void debug_show_blocker(struct task_struct *task, unsigned long ti
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout,
-		unsigned long prev_detect_count)
+static void check_hung_task(struct task_struct *t, unsigned long timeout)
 {
-	unsigned long total_hung_task;
-
 	if (!task_is_hung(t, timeout))
 		return;
 
@@ -243,21 +234,13 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 	 */
 	sysctl_hung_task_detect_count++;
 
-	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
-	cur_si_mask = hung_task_si_mask;
-	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
-		console_verbose();
-		cur_si_mask |= SYS_INFO_LOCKS;
-		hung_task_call_panic = true;
-	}
-
 	/*
 	 * Ok, the task did not get scheduled for more than 2 minutes,
 	 * complain:
 	 */
-	if (sysctl_hung_task_warnings || hung_task_call_panic) {
+	if (sysctl_hung_task_warnings) {
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
@@ -272,10 +255,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout,
 			" disables this message.\n");
 		sched_show_task(t);
 		debug_show_blocker(t, timeout);
-		cur_si_mask |= SYS_INFO_LOCKS;
 
-		if (sysctl_hung_task_all_cpu_backtrace)
-			cur_si_mask |= SYS_INFO_ALL_BT;
 		if (!sysctl_hung_task_warnings)
 			pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
 	}
@@ -315,8 +295,11 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 {
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
+	unsigned long total_hung_task;
 	struct task_struct *g, *t;
 	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+	int need_warning = sysctl_hung_task_warnings;
+	unsigned long si_mask = hung_task_si_mask;
 
 	/*
 	 * If the system crashed already then all bets are off,
@@ -325,6 +308,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (test_taint(TAINT_DIE) || did_panic)
 		return;
 
+
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
 
@@ -336,16 +320,29 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout, prev_detect_count);
+		check_hung_task(t, timeout);
 	}
  unlock:
 	rcu_read_unlock();
 
-	if (unlikely(cur_si_mask)) {
-		sys_info(cur_si_mask);
-		cur_si_mask = 0;
+	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
+	if (!total_hung_task)
+		return;
+
+	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+		console_verbose();
+		hung_task_call_panic = true;
 	}
 
+	if (need_warning || hung_task_call_panic) {
+		si_mask |= SYS_INFO_LOCKS;
+
+		if (sysctl_hung_task_all_cpu_backtrace)
+			si_mask |= SYS_INFO_ALL_BT;
+	}
+
+	sys_info(si_mask);
+
 	if (hung_task_call_panic)
 		panic("hung_task: blocked tasks");
 }
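
A userspace model of the decision logic at the end of the revised check_hung_uninterruptible_tasks(), assuming stand-in bit values; `finish_scan()` and `struct scan_result` are hypothetical names. It shows the two behaviors discussed above: a scan that finds no hung task returns before sys_info(), and the lock/backtrace bits are added only when warnings were enabled at the start of the scan or the panic threshold is reached.

```c
#include <stdbool.h>

/* Stand-ins for SYS_INFO_LOCKS and SYS_INFO_ALL_BT; values are arbitrary. */
#define SI_LOCKS  (1UL << 0)
#define SI_ALL_BT (1UL << 1)

struct scan_result {
	bool call_sys_info;   /* would sys_info() be called? */
	unsigned long mask;   /* with which mask */
	bool panic;           /* would the scan end in panic()? */
};

/*
 * need_warning: sysctl_hung_task_warnings snapshotted before the loop.
 * total_hung:   number of tasks found hung during this scan.
 */
static struct scan_result finish_scan(unsigned long si_mask, int need_warning,
				      unsigned long total_hung,
				      unsigned long panic_threshold,
				      bool all_cpu_backtrace)
{
	struct scan_result r = { false, 0, false };

	/* Nothing hung: no dump at all, even if si_mask is pre-set. */
	if (!total_hung)
		return r;

	/* The kernel also calls console_verbose() at this point. */
	if (panic_threshold && total_hung >= panic_threshold)
		r.panic = true;

	if (need_warning || r.panic) {
		si_mask |= SI_LOCKS;
		if (all_cpu_backtrace)
			si_mask |= SI_ALL_BT;
	}

	r.call_sys_info = true;
	r.mask = si_mask;
	return r;
}
```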

Thanks,
Feng


end of thread, latest message 2025-11-13  2:56 UTC

Thread overview: 15+ messages
2025-11-06  2:30 [PATCH 0/3] Enable hung_task and lockup cases to dump system info on demand Feng Tang
2025-11-06  2:30 ` [PATCH 1/3] docs: panic: correct some sys_ifo names in sysctl doc Feng Tang
2025-11-10 16:52   ` Petr Mladek
2025-11-11 14:09     ` Feng Tang
2025-11-06  2:30 ` [PATCH 2/3] hung_task: Add hung_task_sys_info sysctl to dump sys info on task-hung Feng Tang
2025-11-06  3:28   ` Lance Yang
2025-11-06  4:48     ` Feng Tang
2025-11-10 17:55   ` Petr Mladek
2025-11-11 13:37     ` Feng Tang
2025-11-12 11:25     ` Feng Tang
2025-11-12 14:44       ` Petr Mladek
2025-11-13  2:56         ` Feng Tang
2025-11-06  2:30 ` [PATCH 3/3] watchdog: add lockup_sys_info sysctl to dump sys info on system lockup Feng Tang
2025-11-11 13:26   ` Petr Mladek
2025-11-11 14:09     ` Feng Tang
