[PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v1 0/3]  generalize panic_print's dump function to be used by other kernel parts
@ 2025-05-11  8:52 Feng Tang
  2025-05-11  8:52 ` [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info Feng Tang
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Feng Tang @ 2025-05-11  8:52 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Steven Rostedt, Lance Yang,
	linux-kernel
  Cc: mhiramat, llong, Feng Tang

When working on kernel stability issues, panic, task-hung and 
software/hardware lockup are frequently met. And to debug them, user
may need lots of system information at that time, like task call stacks,
lock info, memory info etc. 

panic case already has panic_print_sys_info() for this purpose, and has
a 'panic_print' bitmask to control what kinds of information is needed,
which is also helpful to debug other task-hung and lockup cases.

So this patchset extract the function out, and make it usable for other
cases which also need system info for debugging. 

Locally these have been used in our bug chasing for stablility issues
and was helpful.

Please help to review, thanks!

- Feng

Changelog:

  Since RFC:
     * Don't print all cpu backtrace if 'sysctl_hung_task_all_cpu_backtracemay'
       is 'false' (Lance Yang)
     * Change the name of 2 new kernel control knob to have 'mask' inside, and
       add kernel document and code comments for them (Lance Yang)
     * Make the sys_show_info() support printk msg replay and all CPU backtrace. 

Feng Tang (3):
  kernel/panic: generalize panic_print's function to show sys info
  kernel/hung_task: add option to dump system info when hung task
    detected
  kernel/watchdog: add option to dump system info when system is locked
    up

 .../admin-guide/kernel-parameters.txt         | 10 ++++
 include/linux/panic.h                         | 11 +++++
 kernel/hung_task.c                            | 42 ++++++++++++-----
 kernel/panic.c                                | 47 ++++++++++---------
 kernel/watchdog.c                             | 20 ++++++++
 5 files changed, 95 insertions(+), 35 deletions(-)

-- 
2.39.5 (Apple Git-154)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-11  8:52 [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Feng Tang
@ 2025-05-11  8:52 ` Feng Tang
  2025-05-13  9:57   ` Petr Mladek
  2025-05-11  8:52 ` [PATCH v1 2/3] kernel/hung_task: add option to dump system info when hung task detected Feng Tang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Feng Tang @ 2025-05-11  8:52 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Steven Rostedt, Lance Yang,
	linux-kernel
  Cc: mhiramat, llong, Feng Tang

panic_print was introduced to help debugging kernel panic by dumping
different kinds of system information like tasks' call stack, memory,
ftrace buffer etc. Acutually this function could help debugging cases
like task-hung, soft/hard lockup too, where user may need the snapshot
of system info at that time.

Extract sys_show_info() function out to be used by other kernel parts
for debugging.

Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 include/linux/panic.h | 11 ++++++++++
 kernel/panic.c        | 47 +++++++++++++++++++++++--------------------
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/include/linux/panic.h b/include/linux/panic.h
index 2494d51707ef..bb1796b64381 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -16,6 +16,17 @@ extern void oops_enter(void);
 extern void oops_exit(void);
 extern bool oops_may_print(void);
 
+#define SYS_PRINT_TASK_INFO		0x00000001
+#define SYS_PRINT_MEM_INFO		0x00000002
+#define SYS_PRINT_TIMER_INFO		0x00000004
+#define SYS_PRINT_LOCK_INFO		0x00000008
+#define SYS_PRINT_FTRACE_INFO		0x00000010
+#define SYS_PRINT_ALL_PRINTK_MSG	0x00000020
+#define SYS_PRINT_ALL_CPU_BT		0x00000040
+#define SYS_PRINT_BLOCKED_TASKS		0x00000080
+
+extern void sys_show_info(unsigned long info_mask);
+
 extern bool panic_triggering_all_cpu_backtrace;
 extern int panic_timeout;
 extern unsigned long panic_print;
diff --git a/kernel/panic.c b/kernel/panic.c
index a3889f38153d..2542ae3702f9 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -69,14 +69,6 @@ bool panic_triggering_all_cpu_backtrace;
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
 
-#define PANIC_PRINT_TASK_INFO		0x00000001
-#define PANIC_PRINT_MEM_INFO		0x00000002
-#define PANIC_PRINT_TIMER_INFO		0x00000004
-#define PANIC_PRINT_LOCK_INFO		0x00000008
-#define PANIC_PRINT_FTRACE_INFO		0x00000010
-#define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
-#define PANIC_PRINT_ALL_CPU_BT		0x00000040
-#define PANIC_PRINT_BLOCKED_TASKS	0x00000080
 unsigned long panic_print;
 
 ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
@@ -208,33 +200,44 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
 }
 EXPORT_SYMBOL(nmi_panic);
 
-static void panic_print_sys_info(bool console_flush)
+void sys_show_info(unsigned long info_mask)
 {
-	if (console_flush) {
-		if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
-			console_flush_on_panic(CONSOLE_REPLAY_ALL);
-		return;
-	}
-
-	if (panic_print & PANIC_PRINT_TASK_INFO)
+	if (info_mask & SYS_PRINT_TASK_INFO)
 		show_state();
 
-	if (panic_print & PANIC_PRINT_MEM_INFO)
+	if (info_mask & SYS_PRINT_MEM_INFO)
 		show_mem();
 
-	if (panic_print & PANIC_PRINT_TIMER_INFO)
+	if (info_mask & SYS_PRINT_TIMER_INFO)
 		sysrq_timer_list_show();
 
-	if (panic_print & PANIC_PRINT_LOCK_INFO)
+	if (info_mask & SYS_PRINT_LOCK_INFO)
 		debug_show_all_locks();
 
-	if (panic_print & PANIC_PRINT_FTRACE_INFO)
+	if (info_mask & SYS_PRINT_FTRACE_INFO)
 		ftrace_dump(DUMP_ALL);
 
-	if (panic_print & PANIC_PRINT_BLOCKED_TASKS)
+	if (panic_print & SYS_PRINT_ALL_PRINTK_MSG)
+		console_flush_on_panic(CONSOLE_REPLAY_ALL);
+
+	if (info_mask & SYS_PRINT_ALL_CPU_BT)
+		trigger_all_cpu_backtrace();
+
+	if (info_mask & SYS_PRINT_BLOCKED_TASKS)
 		show_state_filter(TASK_UNINTERRUPTIBLE);
 }
 
+static void panic_print_sys_info(bool console_flush)
+{
+	if (console_flush) {
+		if (panic_print & SYS_PRINT_ALL_PRINTK_MSG)
+			console_flush_on_panic(CONSOLE_REPLAY_ALL);
+		return;
+	}
+
+	sys_show_info(panic_print & ~SYS_PRINT_ALL_PRINTK_MSG);
+}
+
 void check_panic_on_warn(const char *origin)
 {
 	unsigned int limit;
@@ -255,7 +258,7 @@ void check_panic_on_warn(const char *origin)
  */
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
-	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
+	if (panic_print & SYS_PRINT_ALL_CPU_BT) {
 		/* Temporary allow non-panic CPUs to write their backtraces. */
 		panic_triggering_all_cpu_backtrace = true;
 		trigger_all_cpu_backtrace();
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 2/3] kernel/hung_task: add option to dump system info when hung task detected
  2025-05-11  8:52 [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Feng Tang
  2025-05-11  8:52 ` [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info Feng Tang
@ 2025-05-11  8:52 ` Feng Tang
  2025-05-11  8:52 ` [PATCH v1 3/3] kernel/watchdog: add option to dump system info when system is locked up Feng Tang
  2025-05-12  1:46 ` [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Andrew Morton
  3 siblings, 0 replies; 22+ messages in thread
From: Feng Tang @ 2025-05-11  8:52 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Steven Rostedt, Lance Yang,
	linux-kernel
  Cc: mhiramat, llong, Feng Tang

Kernel panic code utilizes sys_show_info() to dump needed system
information to help debugging. Similarly, add this debug option for
task hung case, and 'hungtask_print_mask' is the control knob and a
bitmask to control what information should be printed out:

            bit 0: print all tasks info
            bit 1: print system memory info
            bit 2: print timer info
            bit 3: print locks info if CONFIG_LOCKDEP is on
            bit 4: print ftrace buffer
            bit 5: print all printk messages in buffer
            bit 6: print all CPUs backtrace (if available in the arch)
            bit 7: print only tasks in uninterruptible (blocked) state

Also simplify the code about dumping locks and triggering backtrace
for all CPUs by leveraging sys_show_info().

Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 kernel/hung_task.c                            | 42 +++++++++++++------
 2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d9fd26b95b34..d35d8101bee9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4488,6 +4488,11 @@
 			Use this option carefully, maybe worth to setup a
 			bigger log buffer with "log_buf_len" along with this.
 
+	hungtask_print_mask=
+			Bitmask for printing system info when hung task is detected.
+			Details of bits definition is the same as panic_print's
+			definition above.
+
 	parkbd.port=	[HW] Parallel port number the keyboard adapter is
 			connected to, default is 0.
 			Format: <parport#>
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index dc898ec93463..3907e3c6fefa 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -58,12 +58,29 @@ static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
 static int __read_mostly sysctl_hung_task_warnings = 10;
 
 static int __read_mostly did_panic;
-static bool hung_task_show_lock;
 static bool hung_task_call_panic;
-static bool hung_task_show_all_bt;
 
 static struct task_struct *watchdog_task;
 
+/*
+ * A bitmask to control what kinds of system info to be printed when a
+ * hung task is detected, it could be task, memory, lock etc. And the bit
+ * definition (from panic.h) is:
+ *
+ *	#define SYS_PRINT_TASK_INFO		0x00000001
+ *	#define SYS_PRINT_MEM_INFO		0x00000002
+ *	#define SYS_PRINT_TIMER_INFO		0x00000004
+ *	#define SYS_PRINT_LOCK_INFO		0x00000008
+ *	#define SYS_PRINT_FTRACE_INFO		0x00000010
+ *	#define SYS_PRINT_ALL_PRINTK_MSG	0x00000020
+ *	#define SYS_PRINT_ALL_CPU_BT		0x00000040
+ *	#define SYS_PRINT_BLOCKED_TASKS		0x00000080
+ */
+unsigned long hungtask_print_mask;
+core_param(hungtask_print_mask, hungtask_print_mask, ulong, 0644);
+
+static unsigned long cur_print_mask;
+
 #ifdef CONFIG_SMP
 /*
  * Should we dump all CPUs backtraces in a hung task event?
@@ -163,11 +180,16 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	 */
 	sysctl_hung_task_detect_count++;
 
+	cur_print_mask = hungtask_print_mask;
+
+	if (!sysctl_hung_task_all_cpu_backtrace)
+		cur_print_mask &= ~SYS_PRINT_ALL_CPU_BT;
+
 	trace_sched_process_hang(t);
 
 	if (sysctl_hung_task_panic) {
 		console_verbose();
-		hung_task_show_lock = true;
+		cur_print_mask |= SYS_PRINT_LOCK_INFO;
 		hung_task_call_panic = true;
 	}
 
@@ -190,10 +212,10 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 			" disables this message.\n");
 		sched_show_task(t);
 		debug_show_blocker(t);
-		hung_task_show_lock = true;
+		cur_print_mask |= SYS_PRINT_LOCK_INFO;
 
 		if (sysctl_hung_task_all_cpu_backtrace)
-			hung_task_show_all_bt = true;
+			cur_print_mask |= SYS_PRINT_ALL_CPU_BT;
 		if (!sysctl_hung_task_warnings)
 			pr_info("Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings\n");
 	}
@@ -242,7 +264,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	if (test_taint(TAINT_DIE) || did_panic)
 		return;
 
-	hung_task_show_lock = false;
+	cur_print_mask = 0;
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
 		unsigned int state;
@@ -266,14 +288,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	}
  unlock:
 	rcu_read_unlock();
-	if (hung_task_show_lock)
-		debug_show_all_locks();
-
-	if (hung_task_show_all_bt) {
-		hung_task_show_all_bt = false;
-		trigger_all_cpu_backtrace();
-	}
 
+	sys_show_info(cur_print_mask);
 	if (hung_task_call_panic)
 		panic("hung_task: blocked tasks");
 }
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 3/3] kernel/watchdog: add option to dump system info when system is locked up
  2025-05-11  8:52 [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Feng Tang
  2025-05-11  8:52 ` [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info Feng Tang
  2025-05-11  8:52 ` [PATCH v1 2/3] kernel/hung_task: add option to dump system info when hung task detected Feng Tang
@ 2025-05-11  8:52 ` Feng Tang
  2025-05-12  1:46 ` [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Andrew Morton
  3 siblings, 0 replies; 22+ messages in thread
From: Feng Tang @ 2025-05-11  8:52 UTC (permalink / raw)
  To: Andrew Morton, Petr Mladek, Steven Rostedt, Lance Yang,
	linux-kernel
  Cc: mhiramat, llong, Feng Tang

Kernel panic code utilizes sys_show_info() to dump needed system
information to help debugging. Similarly, add this debug option for
software/hardware lockup cases, and 'lockup_print_mask' is the control
knob and a bitmask to control what information should be printed out:

	bit 0: print all tasks info
	bit 1: print system memory info
	bit 2: print timer info
	bit 3: print locks info if CONFIG_LOCKDEP is on
	bit 4: print ftrace buffer
	bit 5: print all printk messages in buffer
	bit 6: print all CPUs backtrace (if available in the arch)
	bit 7: print only tasks in uninterruptible (blocked) state

Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 .../admin-guide/kernel-parameters.txt         | 11 +++++++---
 kernel/watchdog.c                             | 20 +++++++++++++++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d35d8101bee9..2b8bda2b5f0b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4489,9 +4489,14 @@
 			bigger log buffer with "log_buf_len" along with this.
 
 	hungtask_print_mask=
-			Bitmask for printing system info when hung task is detected.
-			Details of bits definition is the same as panic_print's
-			definition above.
+			Bitmask for printing system info when hung task is
+			detected. Details of bits definition is the same as
+			panic_print's definition above.
+
+	lockup_print_mask=
+			Bitmask for printing system info when software/hardware
+			system lockup is detected. Details of bits definition
+			is the same as panic_print's definition above.
 
 	parkbd.port=	[HW] Parallel port number the keyboard adapter is
 			connected to, default is 0.
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9fa2af9dbf2c..fb1b94929c3b 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -52,6 +52,23 @@ static int __read_mostly watchdog_hardlockup_available;
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 
+/*
+ * A bitmask to control what kinds of system info to be printed when a
+ * software/hardware lockup is detected, it could be task, memory, lock
+ * etc. And the bit definition (from panic.h) is:
+ *
+ *	#define SYS_PRINT_TASK_INFO		0x00000001
+ *	#define SYS_PRINT_MEM_INFO		0x00000002
+ *	#define SYS_PRINT_TIMER_INFO		0x00000004
+ *	#define SYS_PRINT_LOCK_INFO		0x00000008
+ *	#define SYS_PRINT_FTRACE_INFO		0x00000010
+ *	#define SYS_PRINT_ALL_PRINTK_MSG	0x00000020
+ *	#define SYS_PRINT_ALL_CPU_BT		0x00000040
+ *	#define SYS_PRINT_BLOCKED_TASKS		0x00000080
+ */
+unsigned long lockup_print_mask;
+core_param(lockup_print_mask, lockup_print_mask, ulong, 0644);
+
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 
 # ifdef CONFIG_SMP
@@ -212,6 +229,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 				clear_bit_unlock(0, &hard_lockup_nmi_warn);
 		}
 
+		sys_show_info(lockup_print_mask);
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
@@ -774,6 +792,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		}
 
 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
+
+		sys_show_info(lockup_print_mask);
 		if (softlockup_panic)
 			panic("softlockup: hung tasks");
 	}
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3]  generalize panic_print's dump function to be used by other kernel parts
  2025-05-11  8:52 [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Feng Tang
                   ` (2 preceding siblings ...)
  2025-05-11  8:52 ` [PATCH v1 3/3] kernel/watchdog: add option to dump system info when system is locked up Feng Tang
@ 2025-05-12  1:46 ` Andrew Morton
  2025-05-12  3:14   ` Feng Tang
  3 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2025-05-12  1:46 UTC (permalink / raw)
  To: Feng Tang
  Cc: Petr Mladek, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong

On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:

> When working on kernel stability issues, panic, task-hung and 
> software/hardware lockup are frequently met. And to debug them, user
> may need lots of system information at that time, like task call stacks,
> lock info, memory info etc. 
> 
> panic case already has panic_print_sys_info() for this purpose, and has
> a 'panic_print' bitmask to control what kinds of information is needed,
> which is also helpful to debug other task-hung and lockup cases.
> 
> So this patchset extract the function out, and make it usable for other
> cases which also need system info for debugging. 
> 
> Locally these have been used in our bug chasing for stablility issues
> and was helpful.

Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
quite poorly organized.  Some effort to clean up (and document!) all of
this sounds good.

My vote is to permit the display of every scrap of information we can
think of in all situations.  And then to permit users to select which of
that information is to be displayed under each situation.

As for this patchset - sounds good to me.  For now I'll await input
from reviewers.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3]  generalize panic_print's dump function to be used by other kernel parts
  2025-05-12  1:46 ` [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Andrew Morton
@ 2025-05-12  3:14   ` Feng Tang
  2025-05-12  8:23     ` Lance Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Feng Tang @ 2025-05-12  3:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong

Hi Andrew,

Thanks for the review!

On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
> On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
> 
> > When working on kernel stability issues, panic, task-hung and 
> > software/hardware lockup are frequently met. And to debug them, user
> > may need lots of system information at that time, like task call stacks,
> > lock info, memory info etc. 
> > 
> > panic case already has panic_print_sys_info() for this purpose, and has
> > a 'panic_print' bitmask to control what kinds of information is needed,
> > which is also helpful to debug other task-hung and lockup cases.
> > 
> > So this patchset extract the function out, and make it usable for other
> > cases which also need system info for debugging. 
> > 
> > Locally these have been used in our bug chasing for stablility issues
> > and was helpful.
> 
> Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
> quite poorly organized.  Some effort to clean up (and document!) all of
> this sounds good.
> 
> My vote is to permit the display of every scrap of information we can
> think of in all situations.  And then to permit users to select which of
> that information is to be displayed under each situation.

Good point! Maybe one future todo is to add a gloabl system info dump
function with ONE global knob for selecting different kinds of information,
which could be embedded into some cases you mentioned above.

> As for this patchset - sounds good to me.  For now I'll await input
> from reviewers.

Thank you!

- Feng

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts
  2025-05-12  3:14   ` Feng Tang
@ 2025-05-12  8:23     ` Lance Yang
  2025-05-12  9:31       ` Feng Tang
  2025-05-13 13:27       ` Petr Mladek
  0 siblings, 2 replies; 22+ messages in thread
From: Lance Yang @ 2025-05-12  8:23 UTC (permalink / raw)
  To: Feng Tang, Andrew Morton
  Cc: Petr Mladek, Steven Rostedt, linux-kernel, mhiramat, llong



On 2025/5/12 11:14, Feng Tang wrote:
> Hi Andrew,
> 
> Thanks for the review!
> 
> On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
>> On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
>>
>>> When working on kernel stability issues, panic, task-hung and
>>> software/hardware lockup are frequently met. And to debug them, user
>>> may need lots of system information at that time, like task call stacks,
>>> lock info, memory info etc.
>>>
>>> panic case already has panic_print_sys_info() for this purpose, and has
>>> a 'panic_print' bitmask to control what kinds of information is needed,
>>> which is also helpful to debug other task-hung and lockup cases.
>>>
>>> So this patchset extract the function out, and make it usable for other
>>> cases which also need system info for debugging.
>>>
>>> Locally these have been used in our bug chasing for stablility issues
>>> and was helpful.
>>
>> Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
>> quite poorly organized.  Some effort to clean up (and document!) all of
>> this sounds good.
>>
>> My vote is to permit the display of every scrap of information we can
>> think of in all situations.  And then to permit users to select which of
>> that information is to be displayed under each situation.

Completely agreed. The tricky part is making a global knob that works for
all situations without breaking userspace, but it's a better system-wide
approach ;)

> 
> Good point! Maybe one future todo is to add a gloabl system info dump
> function with ONE global knob for selecting different kinds of information,
> which could be embedded into some cases you mentioned above.

IMHO, for features with their own knobs, we need:
a) The global knob (if enabled) turns on all related feature-level knobs,
b) while still allowing users to manually override individual knobs.

Something like:

If SYS_PRINT_ALL_CPU_BT (global knob) is on, it enables 
hung_task_all_cpu_backtrace
for hung-task situation automatically. But users can still disable it via
hung_task_all_cpu_backtrace.

Anyway, the global knob (when set) controls all feature-level knobs, but
they can override it if explicitly set ;)

Thanks,
Lance

> 
>> As for this patchset - sounds good to me.  For now I'll await input
>> from reviewers.
> 
> Thank you!
> 
> - Feng


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts
  2025-05-12  8:23     ` Lance Yang
@ 2025-05-12  9:31       ` Feng Tang
  2025-05-13 13:27       ` Petr Mladek
  1 sibling, 0 replies; 22+ messages in thread
From: Feng Tang @ 2025-05-12  9:31 UTC (permalink / raw)
  To: Lance Yang
  Cc: Andrew Morton, Petr Mladek, Steven Rostedt, linux-kernel,
	mhiramat, llong

On Mon, May 12, 2025 at 04:23:30PM +0800, Lance Yang wrote:
> 
> 
> On 2025/5/12 11:14, Feng Tang wrote:
> > Hi Andrew,
> > 
> > Thanks for the review!
> > 
> > On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
> > > On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
> > > 
> > > > When working on kernel stability issues, panic, task-hung and
> > > > software/hardware lockup are frequently met. And to debug them, user
> > > > may need lots of system information at that time, like task call stacks,
> > > > lock info, memory info etc.
> > > > 
> > > > panic case already has panic_print_sys_info() for this purpose, and has
> > > > a 'panic_print' bitmask to control what kinds of information is needed,
> > > > which is also helpful to debug other task-hung and lockup cases.
> > > > 
> > > > So this patchset extract the function out, and make it usable for other
> > > > cases which also need system info for debugging.
> > > > 
> > > > Locally these have been used in our bug chasing for stablility issues
> > > > and was helpful.
> > > 
> > > Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
> > > quite poorly organized.  Some effort to clean up (and document!) all of
> > > this sounds good.
> > > 
> > > My vote is to permit the display of every scrap of information we can
> > > think of in all situations.  And then to permit users to select which of
> > > that information is to be displayed under each situation.
> 
> Completely agreed. The tricky part is making a global knob that works for
> all situations without breaking userspace, but it's a better system-wide
> approach ;)
> 
> > 
> > Good point! Maybe one future todo is to add a gloabl system info dump
> > function with ONE global knob for selecting different kinds of information,
> > which could be embedded into some cases you mentioned above.
> 
> IMHO, for features with their own knobs, we need:
> a) The global knob (if enabled) turns on all related feature-level knobs,
> b) while still allowing users to manually override individual knobs.
> 
> Something like:
> 
> If SYS_PRINT_ALL_CPU_BT (global knob) is on, it enables
> hung_task_all_cpu_backtrace
> for hung-task situation automatically. But users can still disable it via
> hung_task_all_cpu_backtrace.
> 
> Anyway, the global knob (when set) controls all feature-level knobs, but
> they can override it if explicitly set ;)

Yes, it makes sense for parts which already has its own user space
control knob.

What I proposed is a todo mostly for other parts than panic/hungtask
in this patchset, as these parts have some special handling required,
like panic need to handle printk-replay for kexec case. 

Thanks,
Feng

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-11  8:52 ` [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info Feng Tang
@ 2025-05-13  9:57   ` Petr Mladek
  2025-05-13 13:23     ` Feng Tang
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2025-05-13  9:57 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong

On Sun 2025-05-11 16:52:52, Feng Tang wrote:
> panic_print was introduced to help debugging kernel panic by dumping
> different kinds of system information like tasks' call stack, memory,
> ftrace buffer etc. Acutually this function could help debugging cases
> like task-hung, soft/hard lockup too, where user may need the snapshot
> of system info at that time.
> 
> Extract sys_show_info() function out to be used by other kernel parts
> for debugging.
> 
> --- a/include/linux/panic.h
> +++ b/include/linux/panic.h
> @@ -16,6 +16,17 @@ extern void oops_enter(void);
>  extern void oops_exit(void);
>  extern bool oops_may_print(void);
>  
> +#define SYS_PRINT_TASK_INFO		0x00000001
> +#define SYS_PRINT_MEM_INFO		0x00000002
> +#define SYS_PRINT_TIMER_INFO		0x00000004
> +#define SYS_PRINT_LOCK_INFO		0x00000008
> +#define SYS_PRINT_FTRACE_INFO		0x00000010
> +#define SYS_PRINT_ALL_PRINTK_MSG	0x00000020

Please, remove this option from the generic interface. It is
controversial. In the current form, it makes sense to replay
the log only in panic() after all the other actions:

	console_verbose();
	bust_spinlocks(1);
	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
	printk_legacy_allow_panic_sync();
	console_unblank();
	debug_locks_off();
	console_flush_on_panic(CONSOLE_FLUSH_PENDING);

All these operations have some (side-)effect which increases
the chance to actually see the messages on the console.

I think that it was primary meant for graphics consoles. But there
it would need to be used together with printk_delay
(/proc/sys/kernel/printk_delay, sysctl printk_delay).

Note that it creates a hairy code. It is the only reason why we need to
call panic_print_sys_info() twice with "false" and "true"
parameter.

That said, I could imagine a generic use, for example, after forcing
ignore_loglevel or so. Otherwise, I do not see any advantage in
flushing the same messages again, for example, on serial or network
consoles where they are likely already stored. We could add this
later when there is a real-life demand.

> +#define SYS_PRINT_ALL_CPU_BT		0x00000040
> +#define SYS_PRINT_BLOCKED_TASKS		0x00000080

The generic approach might deserve a separate source file,
for example:

    include/linux/sys_info.h
    lib/sys_info.c

Also I always considered the bitmask as a horrible user interface.
It might be used internally. But it would be better to allow a human
readable parameter, for example:

	panic_sys_info=task,mem,timer,lock,ftrace,bt,all_bt,blocked_tasks

The console reply might be handled by a separate:

	panic_console_reply=1

And it would obsolete the existing "panic_print" which is an
ugly name and interface from my POV.

> +extern void sys_show_info(unsigned long info_mask);
> +
>  extern bool panic_triggering_all_cpu_backtrace;
>  extern int panic_timeout;
>  extern unsigned long panic_print;
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -208,33 +200,44 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
>  }
>  EXPORT_SYMBOL(nmi_panic);
>  
> -static void panic_print_sys_info(bool console_flush)
> +void sys_show_info(unsigned long info_mask)
>  {
> -	if (console_flush) {
> -		if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> -			console_flush_on_panic(CONSOLE_REPLAY_ALL);
> -		return;
> -	}
> -
> -	if (panic_print & PANIC_PRINT_TASK_INFO)
> +	if (info_mask & SYS_PRINT_TASK_INFO)
>  		show_state();
>  
> -	if (panic_print & PANIC_PRINT_MEM_INFO)
> +	if (info_mask & SYS_PRINT_MEM_INFO)
>  		show_mem();
>  
> -	if (panic_print & PANIC_PRINT_TIMER_INFO)
> +	if (info_mask & SYS_PRINT_TIMER_INFO)
>  		sysrq_timer_list_show();
>  
> -	if (panic_print & PANIC_PRINT_LOCK_INFO)
> +	if (info_mask & SYS_PRINT_LOCK_INFO)
>  		debug_show_all_locks();
>  
> -	if (panic_print & PANIC_PRINT_FTRACE_INFO)
> +	if (info_mask & SYS_PRINT_FTRACE_INFO)
>  		ftrace_dump(DUMP_ALL);
>  
> -	if (panic_print & PANIC_PRINT_BLOCKED_TASKS)
> +	if (panic_print & SYS_PRINT_ALL_PRINTK_MSG)
> +		console_flush_on_panic(CONSOLE_REPLAY_ALL);

I do not see any advantage in replaying the log at this stage.
It might make sense to replay the messages printed before
the critical situation. But it makes sense only when they
might be filtered/blocked before and can be seen now (unblanked
consoles, forced  ignore_loglevel, or so).

I would keep this in the bitmap for backward compactibility.
But I would ignore it my the generic print_sys_info().

And I would replace panic_print_sys_info(true) call with:

static void panic_console_replay(void)
{
	if (panic_print & SYS_PRINT_ALL_PRINTK_MSG)
		console_flush_on_panic(CONSOLE_REPLAY_ALL);
}

> +	if (info_mask & SYS_PRINT_ALL_CPU_BT)
> +		trigger_all_cpu_backtrace();
> +
> +	if (info_mask & SYS_PRINT_BLOCKED_TASKS)
>  		show_state_filter(TASK_UNINTERRUPTIBLE);
>  }

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-13  9:57   ` Petr Mladek
@ 2025-05-13 13:23     ` Feng Tang
  2025-05-15 10:32       ` Petr Mladek
  0 siblings, 1 reply; 22+ messages in thread
From: Feng Tang @ 2025-05-13 13:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong

Hi Petr,

Thanks for the review!

On Tue, May 13, 2025 at 11:57:19AM +0200, Petr Mladek wrote:
> On Sun 2025-05-11 16:52:52, Feng Tang wrote:
> > panic_print was introduced to help debugging kernel panic by dumping
> > different kinds of system information like tasks' call stack, memory,
> > ftrace buffer etc. Acutually this function could help debugging cases
> > like task-hung, soft/hard lockup too, where user may need the snapshot
> > of system info at that time.
> > 
> > Extract sys_show_info() function out to be used by other kernel parts
> > for debugging.
> > 
> > --- a/include/linux/panic.h
> > +++ b/include/linux/panic.h
> > @@ -16,6 +16,17 @@ extern void oops_enter(void);
> >  extern void oops_exit(void);
> >  extern bool oops_may_print(void);
> >  
> > +#define SYS_PRINT_TASK_INFO		0x00000001
> > +#define SYS_PRINT_MEM_INFO		0x00000002
> > +#define SYS_PRINT_TIMER_INFO		0x00000004
> > +#define SYS_PRINT_LOCK_INFO		0x00000008
> > +#define SYS_PRINT_FTRACE_INFO		0x00000010
> > +#define SYS_PRINT_ALL_PRINTK_MSG	0x00000020
> 
> Please, remove this option from the generic interface. It is
> controversial. In the current form, it makes sense to replay
> the log only in panic() after all the other actions:
> 
> 	console_verbose();
> 	bust_spinlocks(1);
> 	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
> 	printk_legacy_allow_panic_sync();
> 	console_unblank();
> 	debug_locks_off();
> 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);

OK, will do.

In my RFC version, I did reserved it for panic use only :) and
added the support to general case in v1 after some hesitation. 

https://lore.kernel.org/lkml/20250507104322.30700-2-feng.tang@linux.alibaba.com/

> 
> All these operations have some (side-)effect which increases
> the chance to actually see the messages on the console.
> 
> I think that it was primary meant for graphics consoles. But there
> it would need to be used together with printk_delay
> (/proc/sys/kernel/printk_delay, sysctl printk_delay).
> 
> Note that it creates a hairy code. It is the only reason why we need to
> call panic_print_sys_info() twice with "false" and "true"
> parameter.

Yes.

> 
> That said, I could imagine a generic use, for example, after forcing
> ignore_loglevel or so. Otherwise, I do not see any advantage in
> flushing the same messages again, for example, on serial or network
> consoles where they are likely already stored. We could add this
> later when there is a real-life demand.

You are right, one main usage is for some product which has only graphics
console and no serial one. And we also used for serial console sometimes,
when the console have a lot of user space message mixed with kernel ones,
and we'd like to see the full clean kernel messages.

> 
> > +#define SYS_PRINT_ALL_CPU_BT		0x00000040
> > +#define SYS_PRINT_BLOCKED_TASKS		0x00000080
> 
> The generic approach might deserve a separate source file,
> for example:
> 
>     include/linux/sys_info.h
>     lib/sys_info.c

Thanks for the suggestion! I'm really not good at naming. 

As panic.c is always built-in, I'll put sys_info.c as obj-y.

> Also I always considered the bitmask as a horrible user interface.
> It might be used internally. But it would be better to allow a human
> readable parameter, for example:
> 
> 	panic_sys_info=task,mem,timer,lock,ftrace,bt,all_bt,blocked_tasks

Yes, it's convenient for developers, as a cmdline parameter being parsed
at boot time.

But I think bitmask may be easier for runtime changing as a core parameter
under /proc/ or /sys/, or from sysctl interface. There are also some other
modules use debug bitmask controlling printking info for different
sub-components.

And we have similar control knobs for hung, lockup etc.

Or should I change the name from 'xxx_print_mask' to 'xxx_sysinfo_flag'
in patch 2/3 ?

> 
> The console reply might be handled by a separate:
> 
> 	panic_console_reply=1
> 
> And it would obsolete the existing "panic_print" which is an
> ugly name and interface from my POV.

Agree it's ugly :). But besides a kernel parameter,  'panic_print' is
also a sysctl interface, I'm afraid renaming it might break user ABI.

> > +extern void sys_show_info(unsigned long info_mask);
> > +
> >  extern bool panic_triggering_all_cpu_backtrace;
> >  extern int panic_timeout;
> >  extern unsigned long panic_print;
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -208,33 +200,44 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
> >  }
> >  EXPORT_SYMBOL(nmi_panic);
> >  
> > -static void panic_print_sys_info(bool console_flush)
> > +void sys_show_info(unsigned long info_mask)
> >  {
> > -	if (console_flush) {
> > -		if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> > -			console_flush_on_panic(CONSOLE_REPLAY_ALL);
> > -		return;
> > -	}
> > -
> > -	if (panic_print & PANIC_PRINT_TASK_INFO)
> > +	if (info_mask & SYS_PRINT_TASK_INFO)
> >  		show_state();
> >  
> > -	if (panic_print & PANIC_PRINT_MEM_INFO)
> > +	if (info_mask & SYS_PRINT_MEM_INFO)
> >  		show_mem();
> >  
> > -	if (panic_print & PANIC_PRINT_TIMER_INFO)
> > +	if (info_mask & SYS_PRINT_TIMER_INFO)
> >  		sysrq_timer_list_show();
> >  
> > -	if (panic_print & PANIC_PRINT_LOCK_INFO)
> > +	if (info_mask & SYS_PRINT_LOCK_INFO)
> >  		debug_show_all_locks();
> >  
> > -	if (panic_print & PANIC_PRINT_FTRACE_INFO)
> > +	if (info_mask & SYS_PRINT_FTRACE_INFO)
> >  		ftrace_dump(DUMP_ALL);
> >  
> > -	if (panic_print & PANIC_PRINT_BLOCKED_TASKS)
> > +	if (panic_print & SYS_PRINT_ALL_PRINTK_MSG)
> > +		console_flush_on_panic(CONSOLE_REPLAY_ALL);
> 
> I do not see any advantage in replaying the log at this stage.
> It might make sense to replay the messages printed before
> the critical situation. But it makes sense only when they
> might be filtered/blocked before and can be seen now (unblanked
> consoles, forced  ignore_loglevel, or so).
> 
> I would keep this in the bitmap for backward compactibility.
> But I would ignore it my the generic print_sys_info().

OK.

> And I would replace panic_print_sys_info(true) call with:
> 
> static void panic_console_replay(void)
> {
> 	if (panic_print & SYS_PRINT_ALL_PRINTK_MSG)
> 		console_flush_on_panic(CONSOLE_REPLAY_ALL);
> }

Nice cleanup! Will do. I'll fold this change with this patch.

Thanks,
Feng

> 
> > +	if (info_mask & SYS_PRINT_ALL_CPU_BT)
> > +		trigger_all_cpu_backtrace();
> > +
> > +	if (info_mask & SYS_PRINT_BLOCKED_TASKS)
> >  		show_state_filter(TASK_UNINTERRUPTIBLE);
> >  }
> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts
  2025-05-12  8:23     ` Lance Yang
  2025-05-12  9:31       ` Feng Tang
@ 2025-05-13 13:27       ` Petr Mladek
  2025-05-13 17:09         ` Paul E. McKenney
  2025-05-14  3:22         ` Feng Tang
  1 sibling, 2 replies; 22+ messages in thread
From: Petr Mladek @ 2025-05-13 13:27 UTC (permalink / raw)
  To: Lance Yang
  Cc: Feng Tang, Andrew Morton, Steven Rostedt, linux-kernel, mhiramat,
	llong, Paul E. McKenney, John Ogness, Sergey Senozhatsky,
	Tomasz Figa, Peter Zijlstra, Ingo Molnar, Mel Gorman,
	Thomas Gleixner, Michal Hocko, Tejun Heo, Douglas Anderson

On Mon 2025-05-12 16:23:30, Lance Yang wrote:
> 
> 
> On 2025/5/12 11:14, Feng Tang wrote:
> > Hi Andrew,
> > 
> > Thanks for the review!
> > 
> > On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
> > > On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
> > > 
> > > > When working on kernel stability issues, panic, task-hung and
> > > > software/hardware lockup are frequently met. And to debug them, user
> > > > may need lots of system information at that time, like task call stacks,
> > > > lock info, memory info etc.
> > > > 
> > > > panic case already has panic_print_sys_info() for this purpose, and has
> > > > a 'panic_print' bitmask to control what kinds of information is needed,
> > > > which is also helpful to debug other task-hung and lockup cases.
> > > > 
> > > > So this patchset extract the function out, and make it usable for other
> > > > cases which also need system info for debugging.
> > > > 
> > > > Locally these have been used in our bug chasing for stablility issues
> > > > and was helpful.
> > > 
> > > Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
> > > quite poorly organized.  Some effort to clean up (and document!) all of
> > > this sounds good.
> > > 
> > > My vote is to permit the display of every scrap of information we can
> > > think of in all situations.  And then to permit users to select which of
> > > that information is to be displayed under each situation.
> 
> Completely agreed. The tricky part is making a global knob that works for
> all situations without breaking userspace, but it's a better system-wide
> approach ;)
> 
> > 
> > Good point! Maybe one future todo is to add a gloabl system info dump
> > function with ONE global knob for selecting different kinds of information,
> > which could be embedded into some cases you mentioned above.
> 
> IMHO, for features with their own knobs, we need:
> a) The global knob (if enabled) turns on all related feature-level knobs,
> b) while still allowing users to manually override individual knobs.
> 
> Something like:
> 
> If SYS_PRINT_ALL_CPU_BT (global knob) is on, it enables
> hung_task_all_cpu_backtrace
> for hung-task situation automatically. But users can still disable it via
> hung_task_all_cpu_backtrace.

I am all for unifying the options for printing debug information
in various emergency situations. I am just not sure whether we really
want to do the same in all situations.

Some lockup detectors tries to be more clever, for example:

  + RCU stall detector prints backtraces only from CPUs which are
    involved in the stall, see print_other_cpu_stall().

  + Workqueues watchdog shows backtraces from tasks which are
    preventing forward progress, see show_cpu_pool_hog().

And stalls are about scheduling (disabled preemption, disabled IRQ,
deadlocks, too long uninterruptible sleep). OOM is about memory
usage. Oops is about an invalid memory access. WARNs() are
completely random stuff.

Also I am afraid of printing too much information when the system
is supposed to continue running. It would make sense to print it in
nbcon_cpu_emergency_enter()/exit() context which disables
preemption. And it might cause softlockups on its own.

Finally, I wonder whether ftrace_dump() might cause a livelock when ftrace
is adding new messages in parallel.

The situation is much easier during panic() because the system is
going to die() anyway, non-panic CPUs are stopped, ...

That said, I could understand that people might want to see as much
information as possible when the console is fast and the range of
possible problems is big.

Anyway, I have added few more people into Cc who are interested into
the various watchdogs.

And there is parallel initiative which tries to unify the loglevel or
somehow make the filtering easier, see
https://lore.kernel.org/r/20250424070436.2380215-1-senozhatsky@chromium.org

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts
  2025-05-13 13:27       ` Petr Mladek
@ 2025-05-13 17:09         ` Paul E. McKenney
  2025-05-14  3:33           ` Feng Tang
  2025-05-14  3:22         ` Feng Tang
  1 sibling, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2025-05-13 17:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Lance Yang, Feng Tang, Andrew Morton, Steven Rostedt,
	linux-kernel, mhiramat, llong, John Ogness, Sergey Senozhatsky,
	Tomasz Figa, Peter Zijlstra, Ingo Molnar, Mel Gorman,
	Thomas Gleixner, Michal Hocko, Tejun Heo, Douglas Anderson

On Tue, May 13, 2025 at 03:27:33PM +0200, Petr Mladek wrote:
> On Mon 2025-05-12 16:23:30, Lance Yang wrote:
> > 
> > 
> > On 2025/5/12 11:14, Feng Tang wrote:
> > > Hi Andrew,
> > > 
> > > Thanks for the review!
> > > 
> > > On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
> > > > On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
> > > > 
> > > > > When working on kernel stability issues, panic, task-hung and
> > > > > software/hardware lockup are frequently met. And to debug them, user
> > > > > may need lots of system information at that time, like task call stacks,
> > > > > lock info, memory info etc.
> > > > > 
> > > > > panic case already has panic_print_sys_info() for this purpose, and has
> > > > > a 'panic_print' bitmask to control what kinds of information is needed,
> > > > > which is also helpful to debug other task-hung and lockup cases.
> > > > > 
> > > > > So this patchset extract the function out, and make it usable for other
> > > > > cases which also need system info for debugging.
> > > > > 
> > > > > Locally these have been used in our bug chasing for stablility issues
> > > > > and was helpful.
> > > > 
> > > > Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
> > > > quite poorly organized.  Some effort to clean up (and document!) all of
> > > > this sounds good.
> > > > 
> > > > My vote is to permit the display of every scrap of information we can
> > > > think of in all situations.  And then to permit users to select which of
> > > > that information is to be displayed under each situation.
> > 
> > Completely agreed. The tricky part is making a global knob that works for
> > all situations without breaking userspace, but it's a better system-wide
> > approach ;)
> > 
> > > 
> > > Good point! Maybe one future todo is to add a gloabl system info dump
> > > function with ONE global knob for selecting different kinds of information,
> > > which could be embedded into some cases you mentioned above.
> > 
> > IMHO, for features with their own knobs, we need:
> > a) The global knob (if enabled) turns on all related feature-level knobs,
> > b) while still allowing users to manually override individual knobs.
> > 
> > Something like:
> > 
> > If SYS_PRINT_ALL_CPU_BT (global knob) is on, it enables
> > hung_task_all_cpu_backtrace
> > for hung-task situation automatically. But users can still disable it via
> > hung_task_all_cpu_backtrace.
> 
> I am all for unifying the options for printing debug information
> in various emergency situations. I am just not sure whether we really
> want to do the same in all situations.
> 
> Some lockup detectors tries to be more clever, for example:
> 
>   + RCU stall detector prints backtraces only from CPUs which are
>     involved in the stall, see print_other_cpu_stall().
> 
>   + Workqueues watchdog shows backtraces from tasks which are
>     preventing forward progress, see show_cpu_pool_hog().
> 
> And stalls are about scheduling (disabled preemption, disabled IRQ,
> deadlocks, too long uninterruptible sleep). OOM is about memory
> usage. Oops is about an invalid memory access. WARNs() are
> completely random stuff.
> 
> Also I am afraid of printing too much information when the system
> is supposed to continue running. It would make sense to print it in
> nbcon_cpu_emergency_enter()/exit() context which disables
> preemption. And it might cause softlockups on its own.

And we did do some of the cleverness that Petr points out because of
problems caused by flooding the console log.  We first ran into this
sort of thing on embedded systems with slow serial consoles (where 115K
baud is now way slow), but it also shows up in other environments, for
example, those committing large numbers of console logs to stable storage,
multiplexing large numbers of logs across networks that sometimes get
congested, and so on.

So I second the call for individual knobs, either in addition to or
instead of the global knob.

> Finally, I wonder whether ftrace_dump() might cause a livelock when ftrace
> is adding new messages in parallel.

It definitely can cause problems, and me learning this the hard way is
why rcutorture calls tracing_off() before calling ftrace_dump().

> The situation is much easier during panic() because the system is
> going to die() anyway, non-panic CPUs are stopped, ...
> 
> That said, I could understand that people might want to see as much
> information as possible when the console is fast and the range of
> possible problems is big.

No argument here.

							Thanx, Paul

> Anyway, I have added few more people into Cc who are interested into
> the various watchdogs.
> 
> And there is parallel initiative which tries to unify the loglevel or
> somehow make the filtering easier, see
> https://lore.kernel.org/r/20250424070436.2380215-1-senozhatsky@chromium.org
> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts
  2025-05-13 13:27       ` Petr Mladek
  2025-05-13 17:09         ` Paul E. McKenney
@ 2025-05-14  3:22         ` Feng Tang
  1 sibling, 0 replies; 22+ messages in thread
From: Feng Tang @ 2025-05-14  3:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Lance Yang, Andrew Morton, Steven Rostedt, linux-kernel, mhiramat,
	llong, Paul E. McKenney, John Ogness, Sergey Senozhatsky,
	Tomasz Figa, Peter Zijlstra, Ingo Molnar, Mel Gorman,
	Thomas Gleixner, Michal Hocko, Tejun Heo, Douglas Anderson

On Tue, May 13, 2025 at 03:27:33PM +0200, Petr Mladek wrote:
> On Mon 2025-05-12 16:23:30, Lance Yang wrote:
> > 
> > 
> > On 2025/5/12 11:14, Feng Tang wrote:
> > > Hi Andrew,
> > > 
> > > Thanks for the review!
> > > 
> > > On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
> > > > On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
> > > > 
> > > > > When working on kernel stability issues, panic, task-hung and
> > > > > software/hardware lockup are frequently met. And to debug them, user
> > > > > may need lots of system information at that time, like task call stacks,
> > > > > lock info, memory info etc.
> > > > > 
> > > > > panic case already has panic_print_sys_info() for this purpose, and has
> > > > > a 'panic_print' bitmask to control what kinds of information is needed,
> > > > > which is also helpful to debug other task-hung and lockup cases.
> > > > > 
> > > > > So this patchset extract the function out, and make it usable for other
> > > > > cases which also need system info for debugging.
> > > > > 
> > > > > Locally these have been used in our bug chasing for stablility issues
> > > > > and was helpful.
> > > > 
> > > > Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
> > > > quite poorly organized.  Some effort to clean up (and document!) all of
> > > > this sounds good.
> > > > 
> > > > My vote is to permit the display of every scrap of information we can
> > > > think of in all situations.  And then to permit users to select which of
> > > > that information is to be displayed under each situation.
> > 
> > Completely agreed. The tricky part is making a global knob that works for
> > all situations without breaking userspace, but it's a better system-wide
> > approach ;)
> > 
> > > 
> > > Good point! Maybe one future todo is to add a gloabl system info dump
> > > function with ONE global knob for selecting different kinds of information,
> > > which could be embedded into some cases you mentioned above.
> > 
> > IMHO, for features with their own knobs, we need:
> > a) The global knob (if enabled) turns on all related feature-level knobs,
> > b) while still allowing users to manually override individual knobs.
> > 
> > Something like:
> > 
> > If SYS_PRINT_ALL_CPU_BT (global knob) is on, it enables
> > hung_task_all_cpu_backtrace
> > for hung-task situation automatically. But users can still disable it via
> > hung_task_all_cpu_backtrace.
> 
> I am all for unifying the options for printing debug information
> in various emergency situations. I am just not sure whether we really
> want to do the same in all situations.

Yes, valid concern.

> Some lockup detectors tries to be more clever, for example:
> 
>   + RCU stall detector prints backtraces only from CPUs which are
>     involved in the stall, see print_other_cpu_stall().
> 
>   + Workqueues watchdog shows backtraces from tasks which are
>     preventing forward progress, see show_cpu_pool_hog().
> 
> And stalls are about scheduling (disabled preemption, disabled IRQ,
> deadlocks, too long uninterruptible sleep). OOM is about memory
> usage. Oops is about an invalid memory access. WARNs() are
> completely random stuff.

Agreed. I noticed RCU has special handling and I skipped "RCU stall"
case in this patchset on purpose :)

> Also I am afraid of printing too much information when the system
> is supposed to continue running. It would make sense to print it in
> nbcon_cpu_emergency_enter()/exit() context which disables
> preemption. And it might cause softlockups on its own.

Yes. And for the global knob, my thought is it's 0 (disabled) by
default, which equals doing nothing. And its user should be
experienced developers who knows precisely what information they
need, and set it runtime or by kernel command line. 

For 'panic_print' which we used frequently in bug chasing, we still
let it be 0 even in our debug version kernel, and only enable it
in debugging.

As for the patch set, I tried to not change existing behavior, and
just added option for user to get more info when needed.

> Finally, I wonder whether ftrace_dump() might cause a livelock when ftrace
> is adding new messages in parallel.

IIUC,  ftrace_dump_one() will turn off the tracing during dump, and
should be safe.

> The situation is much easier during panic() because the system is
> going to die() anyway, non-panic CPUs are stopped, ...

Yes.

> That said, I could understand that people might want to see as much
> information as possible when the console is fast and the range of
> possible problems is big.
 
I agree with you that more is not always better, it should be based
on real needs, case by case.

> Anyway, I have added few more people into Cc who are interested into
> the various watchdogs.
> 
> And there is parallel initiative which tries to unify the loglevel or
> somehow make the filtering easier, see
> https://lore.kernel.org/r/20250424070436.2380215-1-senozhatsky@chromium.org

Thanks for involving more people and sharing the link.

Thanks,
Feng

> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts
  2025-05-13 17:09         ` Paul E. McKenney
@ 2025-05-14  3:33           ` Feng Tang
  2025-05-14 13:55             ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Feng Tang @ 2025-05-14  3:33 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Petr Mladek, Lance Yang, Andrew Morton, Steven Rostedt,
	linux-kernel, mhiramat, llong, John Ogness, Sergey Senozhatsky,
	Tomasz Figa, Peter Zijlstra, Ingo Molnar, Mel Gorman,
	Thomas Gleixner, Michal Hocko, Tejun Heo, Douglas Anderson

On Tue, May 13, 2025 at 10:09:51AM -0700, Paul E. McKenney wrote:
> On Tue, May 13, 2025 at 03:27:33PM +0200, Petr Mladek wrote:
> > On Mon 2025-05-12 16:23:30, Lance Yang wrote:
> > > 
> > > 
> > > On 2025/5/12 11:14, Feng Tang wrote:
> > > > Hi Andrew,
> > > > 
> > > > Thanks for the review!
> > > > 
> > > > On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
> > > > > On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
> > > > > 
> > > > > > When working on kernel stability issues, panic, task-hung and
> > > > > > software/hardware lockup are frequently met. And to debug them, user
> > > > > > may need lots of system information at that time, like task call stacks,
> > > > > > lock info, memory info etc.
> > > > > > 
> > > > > > panic case already has panic_print_sys_info() for this purpose, and has
> > > > > > a 'panic_print' bitmask to control what kinds of information is needed,
> > > > > > which is also helpful to debug other task-hung and lockup cases.
> > > > > > 
> > > > > > So this patchset extract the function out, and make it usable for other
> > > > > > cases which also need system info for debugging.
> > > > > > 
> > > > > > Locally these have been used in our bug chasing for stablility issues
> > > > > > and was helpful.
> > > > > 
> > > > > Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
> > > > > quite poorly organized.  Some effort to clean up (and document!) all of
> > > > > this sounds good.
> > > > > 
> > > > > My vote is to permit the display of every scrap of information we can
> > > > > think of in all situations.  And then to permit users to select which of
> > > > > that information is to be displayed under each situation.
> > > 
> > > Completely agreed. The tricky part is making a global knob that works for
> > > all situations without breaking userspace, but it's a better system-wide
> > > approach ;)
> > > 
> > > > 
> > > > Good point! Maybe one future todo is to add a gloabl system info dump
> > > > function with ONE global knob for selecting different kinds of information,
> > > > which could be embedded into some cases you mentioned above.
> > > 
> > > IMHO, for features with their own knobs, we need:
> > > a) The global knob (if enabled) turns on all related feature-level knobs,
> > > b) while still allowing users to manually override individual knobs.
> > > 
> > > Something like:
> > > 
> > > If SYS_PRINT_ALL_CPU_BT (global knob) is on, it enables
> > > hung_task_all_cpu_backtrace
> > > for hung-task situation automatically. But users can still disable it via
> > > hung_task_all_cpu_backtrace.
> > 
> > I am all for unifying the options for printing debug information
> > in various emergency situations. I am just not sure whether we really
> > want to do the same in all situations.
> > 
> > Some lockup detectors tries to be more clever, for example:
> > 
> >   + RCU stall detector prints backtraces only from CPUs which are
> >     involved in the stall, see print_other_cpu_stall().
> > 
> >   + Workqueues watchdog shows backtraces from tasks which are
> >     preventing forward progress, see show_cpu_pool_hog().
> > 
> > And stalls are about scheduling (disabled preemption, disabled IRQ,
> > deadlocks, too long uninterruptible sleep). OOM is about memory
> > usage. Oops is about an invalid memory access. WARNs() are
> > completely random stuff.
> > 
> > Also I am afraid of printing too much information when the system
> > is supposed to continue running. It would make sense to print it in
> > nbcon_cpu_emergency_enter()/exit() context which disables
> > preemption. And it might cause softlockups on its own.
> 
> And we did do some of the cleverness that Petr points out because of
> problems caused by flooding the console log.  We first ran into this
> sort of thing on embedded systems with slow serial consoles (where 115K
> baud is now way slow), but it also shows up in other environments, for
> example, those committing large numbers of console logs to stable storage,
> multiplexing large numbers of logs across networks that sometimes get
> congested, and so on.
> 
> So I second the call for individual knobs, either in addition to or
> instead of the global knob.

Thanks for the detail elaboration! RCU stall case is also a main target
in the stability issues I have worked on, besides panic/taskhung/lockup.
I noticed it has its own mature handling, and dare not to touch it in
this patchset :)

Thanks,
Feng

> 
> > Finally, I wonder whether ftrace_dump() might cause a livelock when ftrace
> > is adding new messages in parallel.
> 
> It definitely can cause problems, and me learning this the hard way is
> why rcutorture calls tracing_off() before calling ftrace_dump().
> 
> > The situation is much easier during panic() because the system is
> > going to die() anyway, non-panic CPUs are stopped, ...
> > 
> > That said, I could understand that people might want to see as much
> > information as possible when the console is fast and the range of
> > possible problems is big.
> 
> No argument here.
> 
> 							Thanx, Paul
> 
> > Anyway, I have added few more people into Cc who are interested into
> > the various watchdogs.
> > 
> > And there is parallel initiative which tries to unify the loglevel or
> > somehow make the filtering easier, see
> > https://lore.kernel.org/r/20250424070436.2380215-1-senozhatsky@chromium.org
> > 
> > Best Regards,
> > Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts
  2025-05-14  3:33           ` Feng Tang
@ 2025-05-14 13:55             ` Paul E. McKenney
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2025-05-14 13:55 UTC (permalink / raw)
  To: Feng Tang
  Cc: Petr Mladek, Lance Yang, Andrew Morton, Steven Rostedt,
	linux-kernel, mhiramat, llong, John Ogness, Sergey Senozhatsky,
	Tomasz Figa, Peter Zijlstra, Ingo Molnar, Mel Gorman,
	Thomas Gleixner, Michal Hocko, Tejun Heo, Douglas Anderson

On Wed, May 14, 2025 at 11:33:23AM +0800, Feng Tang wrote:
> On Tue, May 13, 2025 at 10:09:51AM -0700, Paul E. McKenney wrote:
> > On Tue, May 13, 2025 at 03:27:33PM +0200, Petr Mladek wrote:
> > > On Mon 2025-05-12 16:23:30, Lance Yang wrote:
> > > > 
> > > > 
> > > > On 2025/5/12 11:14, Feng Tang wrote:
> > > > > Hi Andrew,
> > > > > 
> > > > > Thanks for the review!
> > > > > 
> > > > > On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
> > > > > > On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@linux.alibaba.com> wrote:
> > > > > > 
> > > > > > > When working on kernel stability issues, panic, task-hung and
> > > > > > > software/hardware lockup are frequently met. And to debug them, user
> > > > > > > may need lots of system information at that time, like task call stacks,
> > > > > > > lock info, memory info etc.
> > > > > > > 
> > > > > > > panic case already has panic_print_sys_info() for this purpose, and has
> > > > > > > a 'panic_print' bitmask to control what kinds of information is needed,
> > > > > > > which is also helpful to debug other task-hung and lockup cases.
> > > > > > > 
> > > > > > > So this patchset extract the function out, and make it usable for other
> > > > > > > cases which also need system info for debugging.
> > > > > > > 
> > > > > > > Locally these have been used in our bug chasing for stablility issues
> > > > > > > and was helpful.
> > > > > > 
> > > > > > Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
> > > > > > quite poorly organized.  Some effort to clean up (and document!) all of
> > > > > > this sounds good.
> > > > > > 
> > > > > > My vote is to permit the display of every scrap of information we can
> > > > > > think of in all situations.  And then to permit users to select which of
> > > > > > that information is to be displayed under each situation.
> > > > 
> > > > Completely agreed. The tricky part is making a global knob that works for
> > > > all situations without breaking userspace, but it's a better system-wide
> > > > approach ;)
> > > > 
> > > > > 
> > > > > Good point! Maybe one future todo is to add a gloabl system info dump
> > > > > function with ONE global knob for selecting different kinds of information,
> > > > > which could be embedded into some cases you mentioned above.
> > > > 
> > > > IMHO, for features with their own knobs, we need:
> > > > a) The global knob (if enabled) turns on all related feature-level knobs,
> > > > b) while still allowing users to manually override individual knobs.
> > > > 
> > > > Something like:
> > > > 
> > > > If SYS_PRINT_ALL_CPU_BT (global knob) is on, it enables
> > > > hung_task_all_cpu_backtrace
> > > > for hung-task situation automatically. But users can still disable it via
> > > > hung_task_all_cpu_backtrace.
> > > 
> > > I am all for unifying the options for printing debug information
> > > in various emergency situations. I am just not sure whether we really
> > > want to do the same in all situations.
> > > 
> > > Some lockup detectors tries to be more clever, for example:
> > > 
> > >   + RCU stall detector prints backtraces only from CPUs which are
> > >     involved in the stall, see print_other_cpu_stall().
> > > 
> > >   + Workqueues watchdog shows backtraces from tasks which are
> > >     preventing forward progress, see show_cpu_pool_hog().
> > > 
> > > And stalls are about scheduling (disabled preemption, disabled IRQ,
> > > deadlocks, too long uninterruptible sleep). OOM is about memory
> > > usage. Oops is about an invalid memory access. WARNs() are
> > > completely random stuff.
> > > 
> > > Also I am afraid of printing too much information when the system
> > > is supposed to continue running. It would make sense to print it in
> > > nbcon_cpu_emergency_enter()/exit() context which disables
> > > preemption. And it might cause softlockups on its own.
> > 
> > And we did do some of the cleverness that Petr points out because of
> > problems caused by flooding the console log.  We first ran into this
> > sort of thing on embedded systems with slow serial consoles (where 115K
> > baud is now way slow), but it also shows up in other environments, for
> > example, those committing large numbers of console logs to stable storage,
> > multiplexing large numbers of logs across networks that sometimes get
> > congested, and so on.
> > 
> > So I second the call for individual knobs, either in addition to or
> > instead of the global knob.
> 
> Thanks for the detail elaboration! RCU stall case is also a main target
> in the stability issues I have worked on, besides panic/taskhung/lockup.
> I noticed it has its own mature handling, and dare not to touch it in
> this patchset :)

Uhhh...

Please do dare to touch it, as that is the only way that it can possibly
improve.  Just please also be very careful *how* you touch it.  ;-)

							Thanx, Paul

> Thanks,
> Feng
> 
> > 
> > > Finally, I wonder whether ftrace_dump() might cause a livelock when ftrace
> > > is adding new messages in parallel.
> > 
> > It definitely can cause problems, and me learning this the hard way is
> > why rcutorture calls tracing_off() before calling ftrace_dump().
> > 
> > > The situation is much easier during panic() because the system is
> > > going to die() anyway, non-panic CPUs are stopped, ...
> > > 
> > > That said, I could understand that people might want to see as much
> > > information as possible when the console is fast and the range of
> > > possible problems is big.
> > 
> > No argument here.
> > 
> > 							Thanx, Paul
> > 
> > > Anyway, I have added few more people into Cc who are interested into
> > > the various watchdogs.
> > > 
> > > And there is parallel initiative which tries to unify the loglevel or
> > > somehow make the filtering easier, see
> > > https://lore.kernel.org/r/20250424070436.2380215-1-senozhatsky@chromium.org
> > > 
> > > Best Regards,
> > > Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-13 13:23     ` Feng Tang
@ 2025-05-15 10:32       ` Petr Mladek
  2025-05-15 11:28         ` Lance Yang
  2025-05-16  5:38         ` Feng Tang
  0 siblings, 2 replies; 22+ messages in thread
From: Petr Mladek @ 2025-05-15 10:32 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong

On Tue 2025-05-13 21:23:25, Feng Tang wrote:
> Hi Petr,
> 
> Thanks for the review!
> 
> On Tue, May 13, 2025 at 11:57:19AM +0200, Petr Mladek wrote:
> > On Sun 2025-05-11 16:52:52, Feng Tang wrote:
> > > panic_print was introduced to help debugging kernel panic by dumping
> > > different kinds of system information like tasks' call stack, memory,
> > > ftrace buffer etc. Acutually this function could help debugging cases
> > > like task-hung, soft/hard lockup too, where user may need the snapshot
> > > of system info at that time.
> > > 
> > The generic approach might deserve a separate source file,
> > for example:
> > 
> >     include/linux/sys_info.h
> >     lib/sys_info.c
> 
> Thanks for the suggestion! I'm really not good at naming. 
> 
> As panic.c is always built-in, I'll put sys_info.c as obj-y.

Makes sense.

> > Also I always considered the bitmask as a horrible user interface.
> > It might be used internally. But it would be better to allow a human
> > readable parameter, for example:
> > 
> > 	panic_sys_info=task,mem,timer,lock,ftrace,bt,all_bt,blocked_tasks
> 
> Yes, it's convenient for developers, as a cmdline parameter being parsed
> at boot time.
> 
> But I think bitmask may be easier for runtime changing as a core parameter
> under /proc/ or /sys/, or from sysctl interface. There are also some other
> modules use debug bitmask controlling printking info for different
> sub-components.

Good to know. Could you please give me a pointer to some other modules
using the bitmask? I believe that they exist but I can't find any.
I wonder how common it is...

Anyway, I personally find words/names easier to use. For example,
I like the following interfaces:

  #> cat /sys/power/pm_test
  [none] core processors platform devices freezer

  #> cat /sys/kernel/debug/tracing/available_tracers
  blk function_graph wakeup_dl wakeup_rt wakeup function nop

  #> cat /proc/sys/kernel/seccomp/actions_avail
  kill_process kill_thread trap errno user_notif trace log allow
  # cat /proc/sys/kernel/seccomp/actions_logged
  kill_process kill_thread trap errno user_notif trace log

> And we have similar control knobs for hung, lockup etc.
> 
> Or should I change the name from 'xxx_print_mask' to 'xxx_sysinfo_flag'
> in patch 2/3 ?
> 
> > 
> > The console reply might be handled by a separate:
> > 
> > 	panic_console_reply=1
> > 
> > And it would obsolete the existing "panic_print" which is an
> > ugly name and interface from my POV.
> 
> Agree it's ugly :). But besides a kernel parameter,  'panic_print' is
> also a sysctl interface, I'm afraid renaming it might break user ABI.

A solution would be to keep it and create "panic_sys_info="
with the human readable parameters in parallel. They would
store the request in the same bitmap.

We could print a message that "panic_print" has been obsoleted
by "panic_sys_info" when people use it.

Both parameters would override the current bitmap. So the later
used parameter or procfs/sysfs write would win.

Note:

One question is whether to use sysctl or module parameters.

An advantage of sysctl is the "systcl" userspace tool. Some people
might like it. But the API is very old and a bit cumbersome for
implementing.

The sysfs, aka include/linux/moduleparam.h, API looks cleaner to me.
But the parameters are hidden in the /sys/... jungle ;-)

I would slightly prefer "sysctl" because these parameters are easier
to find.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-15 10:32       ` Petr Mladek
@ 2025-05-15 11:28         ` Lance Yang
  2025-05-16  5:38         ` Feng Tang
  1 sibling, 0 replies; 22+ messages in thread
From: Lance Yang @ 2025-05-15 11:28 UTC (permalink / raw)
  To: Petr Mladek, Feng Tang
  Cc: Andrew Morton, Steven Rostedt, linux-kernel, mhiramat, llong



On 2025/5/15 18:32, Petr Mladek wrote:
> On Tue 2025-05-13 21:23:25, Feng Tang wrote:
>> Hi Petr,
>>
>> Thanks for the review!
>>
>> On Tue, May 13, 2025 at 11:57:19AM +0200, Petr Mladek wrote:
>>> On Sun 2025-05-11 16:52:52, Feng Tang wrote:
>>>> panic_print was introduced to help debugging kernel panic by dumping
>>>> different kinds of system information like tasks' call stack, memory,
>>>> ftrace buffer etc. Acutually this function could help debugging cases
>>>> like task-hung, soft/hard lockup too, where user may need the snapshot
>>>> of system info at that time.
>>>>
>>> The generic approach might deserve a separate source file,
>>> for example:
>>>
>>>      include/linux/sys_info.h
>>>      lib/sys_info.c
>>
>> Thanks for the suggestion! I'm really not good at naming.
>>
>> As panic.c is always built-in, I'll put sys_info.c as obj-y.
> 
> Makes sense.
> 
>>> Also I always considered the bitmask as a horrible user interface.
>>> It might be used internally. But it would be better to allow a human
>>> readable parameter, for example:
>>>
>>> 	panic_sys_info=task,mem,timer,lock,ftrace,bt,all_bt,blocked_tasks
>>
>> Yes, it's convenient for developers, as a cmdline parameter being parsed
>> at boot time.
>>
>> But I think bitmask may be easier for runtime changing as a core parameter
>> under /proc/ or /sys/, or from sysctl interface. There are also some other
>> modules use debug bitmask controlling printking info for different
>> sub-components.
> 
> Good to know. Could you please give me a pointer to some other modules
> using the bitmask? I believe that they exist but I can't find any.
> I wonder how common it is...
> 
> Anyway, I personally find words/names easier to use. For example,
> I like the following interfaces:
> 
>    #> cat /sys/power/pm_test
>    [none] core processors platform devices freezer
> 
>    #> cat /sys/kernel/debug/tracing/available_tracers
>    blk function_graph wakeup_dl wakeup_rt wakeup function nop
> 
>    #> cat /proc/sys/kernel/seccomp/actions_avail
>    kill_process kill_thread trap errno user_notif trace log allow
>    # cat /proc/sys/kernel/seccomp/actions_logged
>    kill_process kill_thread trap errno user_notif trace log

Clean and straightforward! I prefer this too ;p

> 
>> And we have similar control knobs for hung, lockup etc.
>>
>> Or should I change the name from 'xxx_print_mask' to 'xxx_sysinfo_flag'
>> in patch 2/3 ?
>>
>>>
>>> The console reply might be handled by a separate:
>>>
>>> 	panic_console_reply=1
>>>
>>> And it would obsolete the existing "panic_print" which is an
>>> ugly name and interface from my POV.
>>
>> Agree it's ugly :). But besides a kernel parameter,  'panic_print' is
>> also a sysctl interface, I'm afraid renaming it might break user ABI.
> 
> A solution would be to keep it and create "panic_sys_info="
> with the human readable parameters in parallel. They would
> store the request in the same bitmap.

Nice! Seems like a well-balanced solution ;)

> 
> We could print a message that "panic_print" has been obsoleted
> by "panic_sys_info" when people use it.

Indeed, we should add a deprecation warning. Perhaps the message could say:

"'panic_print=' is deprecated and will be removed. Use 'panic_sys_info=' 
instead."

e.g. pr_warn_ratelimited("block device autoloading is deprecated and will be
removed.\n") in blkdev_get_no_open().

> 
> Both parameters would override the current bitmap. So the later
> used parameter or procfs/sysfs write would win.

Makes sense to me. BTW, this override behavior needs to be documented
in kernel-doc.

> 
> Note:
> 
> One question is whether to use sysctl or module parameters.
> 
> An advantage of sysctl is the "systcl" userspace tool. Some people
> might like it. But the API is very old and a bit cumbersome for
> implementing.
> 
> The sysfs, aka include/linux/moduleparam.h, API looks cleaner to me.
> But the parameters are hidden in the /sys/... jungle ;-)
> 
> I would slightly prefer "sysctl" because these parameters are easier
> to find.

+1 for sysctl.

Thanks,
Lance

> 
> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-15 10:32       ` Petr Mladek
  2025-05-15 11:28         ` Lance Yang
@ 2025-05-16  5:38         ` Feng Tang
  2025-05-16 13:39           ` Lance Yang
  2025-05-21  5:31           ` Feng Tang
  1 sibling, 2 replies; 22+ messages in thread
From: Feng Tang @ 2025-05-16  5:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong, Paul E. McKenney

On Thu, May 15, 2025 at 12:32:04PM +0200, Petr Mladek wrote:
> On Tue 2025-05-13 21:23:25, Feng Tang wrote:
> > Hi Petr,
> > 
> > Thanks for the review!
> > 
> > On Tue, May 13, 2025 at 11:57:19AM +0200, Petr Mladek wrote:
> > > On Sun 2025-05-11 16:52:52, Feng Tang wrote:
> > > > panic_print was introduced to help debugging kernel panic by dumping
> > > > different kinds of system information like tasks' call stack, memory,
> > > > ftrace buffer etc. Acutually this function could help debugging cases
> > > > like task-hung, soft/hard lockup too, where user may need the snapshot
> > > > of system info at that time.
> > > > 
> > > The generic approach might deserve a separate source file,
> > > for example:
> > > 
> > >     include/linux/sys_info.h
> > >     lib/sys_info.c
> > 
> > Thanks for the suggestion! I'm really not good at naming. 
> > 
> > As panic.c is always built-in, I'll put sys_info.c as obj-y.
> 
> Makes sense.
> 
> > > Also I always considered the bitmask as a horrible user interface.
> > > It might be used internally. But it would be better to allow a human
> > > readable parameter, for example:
> > > 
> > > 	panic_sys_info=task,mem,timer,lock,ftrace,bt,all_bt,blocked_tasks
> > 
> > Yes, it's convenient for developers, as a cmdline parameter being parsed
> > at boot time.
> > 
> > But I think bitmask may be easier for runtime changing as a core parameter
> > under /proc/ or /sys/, or from sysctl interface. There are also some other
> > modules use debug bitmask controlling printking info for different
> > sub-components.
> 
> Good to know. Could you please give me a pointer to some other modules
> using the bitmask? I believe that they exist but I can't find any.
> I wonder how common it is...

Definitely not common :) I only know one: ACPI, which has 2 debug knobs,
'debug_layer' is a bigmap to control which sub-component's msg to be
dumped, and 'debug_level'.

> Anyway, I personally find words/names easier to use.

True. For me, I have some notes sticked on my monitor: one about characters
for /proc/sysrq-trigger, one for panic_print, one for struct page's flag :)

> For example,
> I like the following interfaces:
> 
>   #> cat /sys/power/pm_test
>   [none] core processors platform devices freezer
> 
>   #> cat /sys/kernel/debug/tracing/available_tracers
>   blk function_graph wakeup_dl wakeup_rt wakeup function nop
> 
>   #> cat /proc/sys/kernel/seccomp/actions_avail
>   kill_process kill_thread trap errno user_notif trace log allow
>   # cat /proc/sys/kernel/seccomp/actions_logged
>   kill_process kill_thread trap errno user_notif trace log

Thanks for the info, will check them.

> > And we have similar control knobs for hung, lockup etc.
> > 
> > Or should I change the name from 'xxx_print_mask' to 'xxx_sysinfo_flag'
> > in patch 2/3 ?
> > 
> > > 
> > > The console reply might be handled by a separate:
> > > 
> > > 	panic_console_reply=1
> > > 
> > > And it would obsolete the existing "panic_print" which is an
> > > ugly name and interface from my POV.
> > 
> > Agree it's ugly :). But besides a kernel parameter,  'panic_print' is
> > also a sysctl interface, I'm afraid renaming it might break user ABI.
> 
> A solution would be to keep it and create "panic_sys_info="
> with the human readable parameters in parallel. They would
> store the request in the same bitmap.
> 
> We could print a message that "panic_print" has been obsoleted
> by "panic_sys_info" when people use it.
> 
> Both parameters would override the current bitmap. So the later
> used parameter or procfs/sysfs write would win.

Reasonalbe.

> Note:
> 
> One question is whether to use sysctl or module parameters.
> 
> An advantage of sysctl is the "systcl" userspace tool. Some people
> might like it. But the API is very old and a bit cumbersome for
> implementing.
> 
> The sysfs, aka include/linux/moduleparam.h, API looks cleaner to me.
> But the parameters are hidden in the /sys/... jungle ;-)
> 
> I would slightly prefer "sysctl" because these parameters are easier
> to find.

I will think about the string parsing in sys_info.c, and in the backend,
a bitmap is still needed to save the parsing result, and as the parameter
for sys_show_info().

Also if we go 'sysctl' way, in the future, some exising interface like
'sysctl_hung_task_all_cpu_backtrace' could be deprecated and integrated
into the 'hung_task_sys_info'? 

Thanks,
Feng

> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-16  5:38         ` Feng Tang
@ 2025-05-16 13:39           ` Lance Yang
  2025-05-21  5:31           ` Feng Tang
  1 sibling, 0 replies; 22+ messages in thread
From: Lance Yang @ 2025-05-16 13:39 UTC (permalink / raw)
  To: Feng Tang, Petr Mladek
  Cc: Andrew Morton, Steven Rostedt, linux-kernel, mhiramat, llong,
	Paul E. McKenney



On 2025/5/16 13:38, Feng Tang wrote:
> On Thu, May 15, 2025 at 12:32:04PM +0200, Petr Mladek wrote:
>> On Tue 2025-05-13 21:23:25, Feng Tang wrote:
>>> Hi Petr,
>>>
>>> Thanks for the review!
>>>
>>> On Tue, May 13, 2025 at 11:57:19AM +0200, Petr Mladek wrote:
>>>> On Sun 2025-05-11 16:52:52, Feng Tang wrote:
>>>>> panic_print was introduced to help debugging kernel panic by dumping
>>>>> different kinds of system information like tasks' call stack, memory,
>>>>> ftrace buffer etc. Acutually this function could help debugging cases
>>>>> like task-hung, soft/hard lockup too, where user may need the snapshot
>>>>> of system info at that time.
>>>>>
>>>> The generic approach might deserve a separate source file,
>>>> for example:
>>>>
>>>>      include/linux/sys_info.h
>>>>      lib/sys_info.c
>>>
>>> Thanks for the suggestion! I'm really not good at naming.
>>>
>>> As panic.c is always built-in, I'll put sys_info.c as obj-y.
>>
>> Makes sense.
>>
>>>> Also I always considered the bitmask as a horrible user interface.
>>>> It might be used internally. But it would be better to allow a human
>>>> readable parameter, for example:
>>>>
>>>> 	panic_sys_info=task,mem,timer,lock,ftrace,bt,all_bt,blocked_tasks
>>>
>>> Yes, it's convenient for developers, as a cmdline parameter being parsed
>>> at boot time.
>>>
>>> But I think bitmask may be easier for runtime changing as a core parameter
>>> under /proc/ or /sys/, or from sysctl interface. There are also some other
>>> modules use debug bitmask controlling printking info for different
>>> sub-components.
>>
>> Good to know. Could you please give me a pointer to some other modules
>> using the bitmask? I believe that they exist but I can't find any.
>> I wonder how common it is...
> 
> Definitely not common :) I only know one: ACPI, which has 2 debug knobs,
> 'debug_layer' is a bigmap to control which sub-component's msg to be
> dumped, and 'debug_level'.
> 
>> Anyway, I personally find words/names easier to use.
> 
> True. For me, I have some notes sticked on my monitor: one about characters
> for /proc/sysrq-trigger, one for panic_print, one for struct page's flag :)
> 
>> For example,
>> I like the following interfaces:
>>
>>    #> cat /sys/power/pm_test
>>    [none] core processors platform devices freezer
>>
>>    #> cat /sys/kernel/debug/tracing/available_tracers
>>    blk function_graph wakeup_dl wakeup_rt wakeup function nop
>>
>>    #> cat /proc/sys/kernel/seccomp/actions_avail
>>    kill_process kill_thread trap errno user_notif trace log allow
>>    # cat /proc/sys/kernel/seccomp/actions_logged
>>    kill_process kill_thread trap errno user_notif trace log
> 
> Thanks for the info, will check them.
> 
>>> And we have similar control knobs for hung, lockup etc.
>>>
>>> Or should I change the name from 'xxx_print_mask' to 'xxx_sysinfo_flag'
>>> in patch 2/3 ?
>>>
>>>>
>>>> The console reply might be handled by a separate:
>>>>
>>>> 	panic_console_reply=1
>>>>
>>>> And it would obsolete the existing "panic_print" which is an
>>>> ugly name and interface from my POV.
>>>
>>> Agree it's ugly :). But besides a kernel parameter,  'panic_print' is
>>> also a sysctl interface, I'm afraid renaming it might break user ABI.
>>
>> A solution would be to keep it and create "panic_sys_info="
>> with the human readable parameters in parallel. They would
>> store the request in the same bitmap.
>>
>> We could print a message that "panic_print" has been obsoleted
>> by "panic_sys_info" when people use it.
>>
>> Both parameters would override the current bitmap. So the later
>> used parameter or procfs/sysfs write would win.
> 
> Reasonalbe.
> 
>> Note:
>>
>> One question is whether to use sysctl or module parameters.
>>
>> An advantage of sysctl is the "systcl" userspace tool. Some people
>> might like it. But the API is very old and a bit cumbersome for
>> implementing.
>>
>> The sysfs, aka include/linux/moduleparam.h, API looks cleaner to me.
>> But the parameters are hidden in the /sys/... jungle ;-)
>>
>> I would slightly prefer "sysctl" because these parameters are easier
>> to find.
> 
> I will think about the string parsing in sys_info.c, and in the backend,
> a bitmap is still needed to save the parsing result, and as the parameter
> for sys_show_info().
> 
> Also if we go 'sysctl' way, in the future, some exising interface like
> 'sysctl_hung_task_all_cpu_backtrace' could be deprecated and integrated
> into the 'hung_task_sys_info'?

Works for me. Let's go with 'sysctl' approach, and 
'hung_task_all_cpu_backtrace'
could be deprecated, though it won't be removed anytime soon. IMHO,
'hung_task_sys_info' is more organized and easier to expand - plus these
parameters are much easier to find as Petr said ;)

Thanks,
Lance

> 
> Thanks,
> Feng
> 
>> Best Regards,
>> Petr


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-16  5:38         ` Feng Tang
  2025-05-16 13:39           ` Lance Yang
@ 2025-05-21  5:31           ` Feng Tang
  2025-06-10 15:44             ` Petr Mladek
  1 sibling, 1 reply; 22+ messages in thread
From: Feng Tang @ 2025-05-21  5:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong, Paul E. McKenney

On Fri, May 16, 2025 at 01:38:02PM +0800, Feng Tang wrote:
> On Thu, May 15, 2025 at 12:32:04PM +0200, Petr Mladek wrote:
[...]
> > > > The console reply might be handled by a separate:
> > > > 
> > > > 	panic_console_reply=1
> > > > 
> > > > And it would obsolete the existing "panic_print" which is an
> > > > ugly name and interface from my POV.
> > > 
> > > Agree it's ugly :). But besides a kernel parameter,  'panic_print' is
> > > also a sysctl interface, I'm afraid renaming it might break user ABI.
> > 
> > A solution would be to keep it and create "panic_sys_info="
> > with the human readable parameters in parallel. They would
> > store the request in the same bitmap.
> > 
> > We could print a message that "panic_print" has been obsoleted
> > by "panic_sys_info" when people use it.
> > 
> > Both parameters would override the current bitmap. So the later
> > used parameter or procfs/sysfs write would win.
> 
> Reasonalbe.
> 
> > Note:
> > 
> > One question is whether to use sysctl or module parameters.
> > 
> > An advantage of sysctl is the "systcl" userspace tool. Some people
> > might like it. But the API is very old and a bit cumbersome for
> > implementing.
> > 
> > The sysfs, aka include/linux/moduleparam.h, API looks cleaner to me.
> > But the parameters are hidden in the /sys/... jungle ;-)
> > 
> > I would slightly prefer "sysctl" because these parameters are easier
> > to find.
> 
> I will think about the string parsing in sys_info.c, and in the backend,
> a bitmap is still needed to save the parsing result, and as the parameter
> for sys_show_info().

Hi Petr

I tried further this way, and with below patch on top of current 1/3
patch, the 'panic_sys_info' sysctl interface basically works, as parsing
user-input, and save it in 'panic_print' bitmap. 

It has  one problem that it doesn't support the string parsing as a the
kernel command line parameter (auto-derived from sysctl interface), I'm
not sure if we should add a __setup() or early_param() for it, or it's
fine?

Thanks,
Feng

---
diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
index 79bf4a942e5f..d6d55646e25a 100644
--- a/include/linux/sys_info.h
+++ b/include/linux/sys_info.h
@@ -17,4 +17,8 @@
 
 extern void sys_show_info(unsigned long info_mask);
 
+struct ctl_table;
+extern int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write,
+					  void *buffer, size_t *lenp,
+					  loff_t *ppos);
 #endif	/* _LINUX_SYS_INFO_H */
diff --git a/kernel/panic.c b/kernel/panic.c
index 3d9cf8063242..8ca9b30f0fe4 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -88,6 +88,13 @@ static const struct ctl_table kern_panic_table[] = {
 		.extra2         = SYSCTL_ONE,
 	},
 #endif
+	{
+		.procname	= "panic_sys_info",
+		.data		= &panic_print,
+		.maxlen         = sizeof(panic_print),
+		.mode		= 0644,
+		.proc_handler	= sysctl_sys_info_handler,
+	},
 	{
 		.procname       = "warn_limit",
 		.data           = &warn_limit,
diff --git a/lib/sys_info.c b/lib/sys_info.c
index 4090b2e0515e..27de6f0d0a4d 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -4,6 +4,121 @@
 #include <linux/console.h>
 #include <linux/nmi.h>
 
+struct sys_info_name {
+	unsigned long bit;
+	const char *name;
+};
+
+static const char sys_info_avail[] = " tasks mem timer lock ftrace all_bt blocked_tasks ";
+
+static const struct sys_info_name  si_names[] = {
+	{ SYS_SHOW_TASK_INFO,	"tasks" },
+	{ SYS_SHOW_MEM_INFO,	"mem" },
+	{ SYS_SHOW_TIMER_INFO,	"timer" },
+	{ SYS_SHOW_LOCK_INFO,	"lock" },
+	{ SYS_SHOW_FTRACE_INFO, "ftrace" },
+	{ SYS_SHOW_ALL_CPU_BT,	"all_bt" },
+	{ SYS_SHOW_BLOCKED_TASKS, "blocked_tasks" },
+};
+
+
+/* Expecting string like "xxx_sys_info=tasks,mem,timer,lock" */
+static int write_handler(const struct ctl_table *ro_table, void *buffer,
+				size_t *lenp, loff_t *ppos)
+{
+	char names[sizeof(sys_info_avail)];
+	char *buf, *name;
+	struct ctl_table table;
+	unsigned long *si_flag;
+	int i, len, ret;
+
+	si_flag = ro_table->data;
+
+	/* Clear it first */
+	*si_flag = 0;
+
+	table = *ro_table;
+	table.data = names;
+	table.maxlen = sizeof(names);
+	ret = proc_dostring(&table, 1, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	buf = names;
+	while ((name = strsep(&buf, ",")) && *name) {
+		for (i = 0; i < ARRAY_SIZE(si_names); i++) {
+			if (!strcmp(name, si_names[i].name))
+				*si_flag |= si_names[i].bit;
+		}
+	}
+
+	return 0;
+}
+
+int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write,
+					  void *buffer, size_t *lenp,
+					  loff_t *ppos)
+{
+	char names[sizeof(sys_info_avail) + 1];
+	char *buf, *name;
+	struct ctl_table table;
+	unsigned long *si_flag;
+	int i, ret, len;
+
+	si_flag = ro_table->data;
+
+	if (write) {
+		/* Clear it first */
+		*si_flag = 0;
+
+		table = *ro_table;
+		table.data = names;
+		table.maxlen = sizeof(names);
+		ret = proc_dostring(&table, 1, buffer, lenp, ppos);
+		if (ret)
+			return ret;
+
+		/* Expecting string like "xxx_sys_info=tasks,mem,timer,lock" */
+		buf = names;
+		while ((name = strsep(&buf, ",")) && *name) {
+			for (i = 0; i < ARRAY_SIZE(si_names); i++) {
+				if (!strcmp(name, si_names[i].name))
+					*si_flag |= si_names[i].bit;
+			}
+		}
+
+		return 0;
+	} else {
+		bool first = true;
+
+		memset(names, 0, sizeof(names));
+
+		buf = names;
+		for (i = 0; i < ARRAY_SIZE(si_names); i++) {
+			if (*si_flag & si_names[i].bit) {
+
+				if (first) {
+					first = false;
+				} else {
+					*buf = ',';
+					buf++;
+				}
+
+				len = strlen(si_names[i].name);
+				strncpy(buf, si_names[i].name, len);
+				buf += len;
+			}
+
+		}
+		*buf = '\0';
+
+		table = *ro_table;
+		table.data = names;
+		table.maxlen = sizeof(names);
+		return proc_dostring(&table, 0, buffer, lenp, ppos);
+	}
+}
+
 void sys_show_info(unsigned long info_flag)
 {
 	if (info_flag & SYS_SHOW_TASK_INFO)



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-05-21  5:31           ` Feng Tang
@ 2025-06-10 15:44             ` Petr Mladek
  2025-06-11 13:43               ` Feng Tang
  0 siblings, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2025-06-10 15:44 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong, Paul E. McKenney

On Wed 2025-05-21 13:31:20, Feng Tang wrote:
> On Fri, May 16, 2025 at 01:38:02PM +0800, Feng Tang wrote:
> > On Thu, May 15, 2025 at 12:32:04PM +0200, Petr Mladek wrote:
> [...]
> > > > > The console reply might be handled by a separate:
> > > > > 
> > > > > 	panic_console_reply=1
> > > > > 
> > > > > And it would obsolete the existing "panic_print" which is an
> > > > > ugly name and interface from my POV.
> > > > 
> > > > Agree it's ugly :). But besides a kernel parameter,  'panic_print' is
> > > > also a sysctl interface, I'm afraid renaming it might break user ABI.
> > > 
> > > A solution would be to keep it and create "panic_sys_info="
> > > with the human readable parameters in parallel. They would
> > > store the request in the same bitmap.
> > > 
> > > We could print a message that "panic_print" has been obsoleted
> > > by "panic_sys_info" when people use it.
> > > 
> > > Both parameters would override the current bitmap. So the later
> > > used parameter or procfs/sysfs write would win.
> > 
> > Reasonalbe.
> > 
> > > Note:
> > > 
> > > One question is whether to use sysctl or module parameters.
> > > 
> > > An advantage of sysctl is the "systcl" userspace tool. Some people
> > > might like it. But the API is very old and a bit cumbersome for
> > > implementing.
> > > 
> > > The sysfs, aka include/linux/moduleparam.h, API looks cleaner to me.
> > > But the parameters are hidden in the /sys/... jungle ;-)
> > > 
> > > I would slightly prefer "sysctl" because these parameters are easier
> > > to find.
> > 
> > I will think about the string parsing in sys_info.c, and in the backend,
> > a bitmap is still needed to save the parsing result, and as the parameter
> > for sys_show_info().
> 
> Hi Petr
> 
> I tried further this way, and with below patch on top of current 1/3
> patch, the 'panic_sys_info' sysctl interface basically works, as parsing
> user-input, and save it in 'panic_print' bitmap. 

It does not apply. It seems that it depends on another change which
crated lib/sys_info.c...

> It has  one problem that it doesn't support the string parsing as a the
> kernel command line parameter (auto-derived from sysctl interface), I'm
> not sure if we should add a __setup() or early_param() for it, or it's
> fine?

Ah, I was not aware of this. We need to make it working from the
command line, definitely. I would go with __setup() for now. We could
always switch it to early_param() when anyone requires it.

See some more comments, below.

> ---
> diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
> index 79bf4a942e5f..d6d55646e25a 100644
> --- a/include/linux/sys_info.h
> +++ b/include/linux/sys_info.h
> @@ -17,4 +17,8 @@
>  
>  extern void sys_show_info(unsigned long info_mask);
>  
> +struct ctl_table;
> +extern int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write,
> +					  void *buffer, size_t *lenp,
> +					  loff_t *ppos);
>  #endif	/* _LINUX_SYS_INFO_H */
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 3d9cf8063242..8ca9b30f0fe4 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -88,6 +88,13 @@ static const struct ctl_table kern_panic_table[] = {
>  		.extra2         = SYSCTL_ONE,
>  	},
>  #endif
> +	{
> +		.procname	= "panic_sys_info",
> +		.data		= &panic_print,
> +		.maxlen         = sizeof(panic_print),
> +		.mode		= 0644,
> +		.proc_handler	= sysctl_sys_info_handler,
> +	},
>  	{
>  		.procname       = "warn_limit",
>  		.data           = &warn_limit,
> diff --git a/lib/sys_info.c b/lib/sys_info.c
> index 4090b2e0515e..27de6f0d0a4d 100644
> --- a/lib/sys_info.c
> +++ b/lib/sys_info.c
> @@ -4,6 +4,121 @@
>  #include <linux/console.h>
>  #include <linux/nmi.h>
>  
> +struct sys_info_name {
> +	unsigned long bit;
> +	const char *name;
> +};
> +
> +static const char sys_info_avail[] = " tasks mem timer lock ftrace all_bt blocked_tasks ";

It is a bit confusing to have it space-separated when the parameter
is comma-separated. Also I am not sure why there is the leading and
trailing space.

I would expect:

static const char sys_info_avail[] = "tasks,mem,timer,lock,ftrace,all_bt,blocked_tasks";

> +static const struct sys_info_name  si_names[] = {
> +	{ SYS_SHOW_TASK_INFO,	"tasks" },
> +	{ SYS_SHOW_MEM_INFO,	"mem" },
> +	{ SYS_SHOW_TIMER_INFO,	"timer" },
> +	{ SYS_SHOW_LOCK_INFO,	"lock" },
> +	{ SYS_SHOW_FTRACE_INFO, "ftrace" },
> +	{ SYS_SHOW_ALL_CPU_BT,	"all_bt" },
> +	{ SYS_SHOW_BLOCKED_TASKS, "blocked_tasks" },
> +};

I guess that this is just an RFC. Anyway, I would expect that
SYS_SHOW_* values would be defined in sys_info.h.

> +
> +/* Expecting string like "xxx_sys_info=tasks,mem,timer,lock" */
> +static int write_handler(const struct ctl_table *ro_table, void *buffer,
> +				size_t *lenp, loff_t *ppos)
> +{
> +	char names[sizeof(sys_info_avail)];
> +	char *buf, *name;
> +	struct ctl_table table;
> +	unsigned long *si_flag;
> +	int i, len, ret;
> +
> +	si_flag = ro_table->data;
> +
> +	/* Clear it first */
> +	*si_flag = 0;
> +
> +	table = *ro_table;
> +	table.data = names;
> +	table.maxlen = sizeof(names);
> +	ret = proc_dostring(&table, 1, buffer, lenp, ppos);
> +	if (ret)
> +		return ret;
> +
> +	buf = names;
> +	while ((name = strsep(&buf, ",")) && *name) {
> +		for (i = 0; i < ARRAY_SIZE(si_names); i++) {
> +			if (!strcmp(name, si_names[i].name))
> +				*si_flag |= si_names[i].bit;
> +		}
> +	}
> +
> +	return 0;
> +}

The above function is defined but not used. The same code is
copy&pasted in the if (write) section below.

I think that we would need a helper function which could be used
in both sysctl_sys_info_handler() and in the __setup() callback.

Something like:

static unsigned long sys_info_parse_flags(char *str)
{
	unsigned long si_bits = 0;
	char *s, *name;
	int i;

	s = str;
	while ((name = strsep(&s, ",")) && *name) {
		for (i = 0; i < ARRAY_SIZE(si_names); i++) {
			if (!strcmp(name, si_names[i].name)) {
				*si_bits |= si_names[i].bit;
				break;
			}
		}
	}

	return si_bits;
}

> +
> +int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write,
> +					  void *buffer, size_t *lenp,
> +					  loff_t *ppos)
> +{
> +	char names[sizeof(sys_info_avail) + 1];
> +	char *buf, *name;
> +	struct ctl_table table;
> +	unsigned long *si_flag;

Nit: I would call this "si_bits_global" to make it more clear that
     this is pointer to the global bitmask.


> +	int i, ret, len;
> +
> +	si_flag = ro_table->data;
> +
> +	if (write) {
> +		/* Clear it first */
> +		*si_flag = 0;

There is no synchronization against readers. IMHO, it is not worth it.
But we should at least update the global value only once.

We should define a local variable, e.g.

		unsigned long si_bits;

and do the following:

> +		table = *ro_table;
> +		table.data = names;
> +		table.maxlen = sizeof(names);
> +		ret = proc_dostring(&table, 1, buffer, lenp, ppos);

I would pass the "write" parameter here instead of the hard-coded "1".
Do we know that it should be exactly '1'?

> +		if (ret)
> +			return ret;

		si_bits = sys_info_parse_param(flags);
		/*
		 * The access to the global value is not synchronized.
		 * Update it at once at least.
		 */
		WRITE_ONCE(*si_bits_global, si_bits);

> +		/* Expecting string like "xxx_sys_info=tasks,mem,timer,lock" */
> +		buf = names;
> +		while ((name = strsep(&buf, ",")) && *name) {
> +			for (i = 0; i < ARRAY_SIZE(si_names); i++) {
> +				if (!strcmp(name, si_names[i].name))
> +					*si_flag |= si_names[i].bit;
> +			}
> +		}
> +
> +		return 0;
> +	} else {
> +		bool first = true;
> +
> +		memset(names, 0, sizeof(names));

I guess that you took this from read_actions_logged().

It looks too paranoid to me. I do not see it anywhere else.
IMHO, if the proc_dostring() does not stop at the trailing '\0'
then most interfaces would leak data.

> +		buf = names;
> +		for (i = 0; i < ARRAY_SIZE(si_names); i++) {
> +			if (*si_flag & si_names[i].bit) {
> +
> +				if (first) {
> +					first = false;
> +				} else {
> +					*buf = ',';
> +					buf++;
> +				}
> +
> +				len = strlen(si_names[i].name);
> +				strncpy(buf, si_names[i].name, len);
> +				buf += len;
> +			}
> +
> +		}
> +		*buf = '\0';

IMHO, always adding this trailing '\0' should be enough.

> +		table = *ro_table;
> +		table.data = names;
> +		table.maxlen = sizeof(names);
> +		return proc_dostring(&table, 0, buffer, lenp, ppos);

I would pass the "write" parameter here instead of the hard coded 0.
But it is a matter of taste.

> +	}
> +}
> +
>  void sys_show_info(unsigned long info_flag)
>  {
>  	if (info_flag & SYS_SHOW_TASK_INFO)

Best Regards,
Petr

PS: I am sorry for the late reply. Too many things have accumulated
over the few last weeks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info
  2025-06-10 15:44             ` Petr Mladek
@ 2025-06-11 13:43               ` Feng Tang
  0 siblings, 0 replies; 22+ messages in thread
From: Feng Tang @ 2025-06-11 13:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Steven Rostedt, Lance Yang, linux-kernel, mhiramat,
	llong, Paul E. McKenney

On Tue, Jun 10, 2025 at 05:44:34PM +0200, Petr Mladek wrote:
[...]
> > > > Note:
> > > > 
> > > > One question is whether to use sysctl or module parameters.
> > > > 
> > > > An advantage of sysctl is the "systcl" userspace tool. Some people
> > > > might like it. But the API is very old and a bit cumbersome for
> > > > implementing.
> > > > 
> > > > The sysfs, aka include/linux/moduleparam.h, API looks cleaner to me.
> > > > But the parameters are hidden in the /sys/... jungle ;-)
> > > > 
> > > > I would slightly prefer "sysctl" because these parameters are easier
> > > > to find.
> > > 
> > > I will think about the string parsing in sys_info.c, and in the backend,
> > > a bitmap is still needed to save the parsing result, and as the parameter
> > > for sys_show_info().
> > 
> > Hi Petr
> > 
> > I tried further this way, and with below patch on top of current 1/3
> > patch, the 'panic_sys_info' sysctl interface basically works, as parsing
> > user-input, and save it in 'panic_print' bitmap. 
> 
> It does not apply. It seems that it depends on another change which
> crated lib/sys_info.c...

My bad. It could be another my local change which follows your suggestion
of using a panic_console_replay() cleanup.

> > It has  one problem that it doesn't support the string parsing as a the
> > kernel command line parameter (auto-derived from sysctl interface), I'm
> > not sure if we should add a __setup() or early_param() for it, or it's
> > fine?
> 
> Ah, I was not aware of this. We need to make it working from the
> command line, definitely. I would go with __setup() for now. We could
> always switch it to early_param() when anyone requires it.

OK.

> See some more comments, below.
> 
> > ---
> > diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
> > index 79bf4a942e5f..d6d55646e25a 100644
> > --- a/include/linux/sys_info.h
> > +++ b/include/linux/sys_info.h
> > @@ -17,4 +17,8 @@
> >  
> >  extern void sys_show_info(unsigned long info_mask);
> >  
> > +struct ctl_table;
> > +extern int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write,
> > +					  void *buffer, size_t *lenp,
> > +					  loff_t *ppos);
> >  #endif	/* _LINUX_SYS_INFO_H */
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 3d9cf8063242..8ca9b30f0fe4 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -88,6 +88,13 @@ static const struct ctl_table kern_panic_table[] = {
> >  		.extra2         = SYSCTL_ONE,
> >  	},
> >  #endif
> > +	{
> > +		.procname	= "panic_sys_info",
> > +		.data		= &panic_print,
> > +		.maxlen         = sizeof(panic_print),
> > +		.mode		= 0644,
> > +		.proc_handler	= sysctl_sys_info_handler,
> > +	},
> >  	{
> >  		.procname       = "warn_limit",
> >  		.data           = &warn_limit,
> > diff --git a/lib/sys_info.c b/lib/sys_info.c
> > index 4090b2e0515e..27de6f0d0a4d 100644
> > --- a/lib/sys_info.c
> > +++ b/lib/sys_info.c
> > @@ -4,6 +4,121 @@
> >  #include <linux/console.h>
> >  #include <linux/nmi.h>
> >  
> > +struct sys_info_name {
> > +	unsigned long bit;
> > +	const char *name;
> > +};
> > +
> > +static const char sys_info_avail[] = " tasks mem timer lock ftrace all_bt blocked_tasks ";
> 
> It is a bit confusing to have it space-separated when the parameter
> is comma-separated.

Aha, right.

> Also I am not sure why there is the leading and
> trailing space.

I tried to give more space to avoid 'char names[strlen()+1]' 

> I would expect:
> 
> static const char sys_info_avail[] = "tasks,mem,timer,lock,ftrace,all_bt,blocked_tasks";

OK.

> > +static const struct sys_info_name  si_names[] = {
> > +	{ SYS_SHOW_TASK_INFO,	"tasks" },
> > +	{ SYS_SHOW_MEM_INFO,	"mem" },
> > +	{ SYS_SHOW_TIMER_INFO,	"timer" },
> > +	{ SYS_SHOW_LOCK_INFO,	"lock" },
> > +	{ SYS_SHOW_FTRACE_INFO, "ftrace" },
> > +	{ SYS_SHOW_ALL_CPU_BT,	"all_bt" },
> > +	{ SYS_SHOW_BLOCKED_TASKS, "blocked_tasks" },
> > +};
> 
> I guess that this is just an RFC. Anyway, I would expect that
> SYS_SHOW_* values would be defined in sys_info.h.

Yes, in 0001 patch, they are defined in sys_info.h   

> > +
> > +/* Expecting string like "xxx_sys_info=tasks,mem,timer,lock" */
> > +static int write_handler(const struct ctl_table *ro_table, void *buffer,
> > +				size_t *lenp, loff_t *ppos)
> > +{
> > +	char names[sizeof(sys_info_avail)];
> > +	char *buf, *name;
> > +	struct ctl_table table;
> > +	unsigned long *si_flag;
> > +	int i, len, ret;
> > +
> > +	si_flag = ro_table->data;
> > +
> > +	/* Clear it first */
> > +	*si_flag = 0;
> > +
> > +	table = *ro_table;
> > +	table.data = names;
> > +	table.maxlen = sizeof(names);
> > +	ret = proc_dostring(&table, 1, buffer, lenp, ppos);
> > +	if (ret)
> > +		return ret;
> > +
> > +	buf = names;
> > +	while ((name = strsep(&buf, ",")) && *name) {
> > +		for (i = 0; i < ARRAY_SIZE(si_names); i++) {
> > +			if (!strcmp(name, si_names[i].name))
> > +				*si_flag |= si_names[i].bit;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> 
> The above function is defined but not used. The same code is
> copy&pasted in the if (write) section below.

I forgot to remove it in code clean :)

> I think that we would need a helper function which could be used
> in both sysctl_sys_info_handler() and in the __setup() callback.
> 
> Something like:
> 
> static unsigned long sys_info_parse_flags(char *str)
> {
> 	unsigned long si_bits = 0;
> 	char *s, *name;
> 	int i;
> 
> 	s = str;
> 	while ((name = strsep(&s, ",")) && *name) {
> 		for (i = 0; i < ARRAY_SIZE(si_names); i++) {
> 			if (!strcmp(name, si_names[i].name)) {
> 				*si_bits |= si_names[i].bit;
> 				break;

Good catch! thanks

> 			}
> 		}
> 	}
> 
> 	return si_bits;
> }

Will do.

> > +
> > +int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write,
> > +					  void *buffer, size_t *lenp,
> > +					  loff_t *ppos)
> > +{
> > +	char names[sizeof(sys_info_avail) + 1];
> > +	char *buf, *name;
> > +	struct ctl_table table;
> > +	unsigned long *si_flag;
> 
> Nit: I would call this "si_bits_global" to make it more clear that
>      this is pointer to the global bitmask.
 
Makes sense.

> 
> > +	int i, ret, len;
> > +
> > +	si_flag = ro_table->data;
> > +
> > +	if (write) {
> > +		/* Clear it first */
> > +		*si_flag = 0;
> 
> There is no synchronization against readers. IMHO, it is not worth it.
> But we should at least update the global value only once.
> 
> We should define a local variable, e.g.
> 
> 		unsigned long si_bits;
> 
> and do the following:

Will do.

> > +		table = *ro_table;
> > +		table.data = names;
> > +		table.maxlen = sizeof(names);
> > +		ret = proc_dostring(&table, 1, buffer, lenp, ppos);
> 
> I would pass the "write" parameter here instead of the hard-coded "1".
> Do we know that it should be exactly '1'?
> 
> > +		if (ret)
> > +			return ret;
> 
> 		si_bits = sys_info_parse_param(flags);
> 		/*
> 		 * The access to the global value is not synchronized.
> 		 * Update it at once at least.
> 		 */
> 		WRITE_ONCE(*si_bits_global, si_bits);

Thanks for the suggestion.


> > +		/* Expecting string like "xxx_sys_info=tasks,mem,timer,lock" */
> > +		buf = names;
> > +		while ((name = strsep(&buf, ",")) && *name) {
> > +			for (i = 0; i < ARRAY_SIZE(si_names); i++) {
> > +				if (!strcmp(name, si_names[i].name))
> > +					*si_flag |= si_names[i].bit;
> > +			}
> > +		}
> > +
> > +		return 0;
> > +	} else {
> > +		bool first = true;
> > +
> > +		memset(names, 0, sizeof(names));
> 
> I guess that you took this from read_actions_logged().

Yes, I referred the seccomp.c in many places.

> It looks too paranoid to me. I do not see it anywhere else.
> IMHO, if the proc_dostring() does not stop at the trailing '\0'
> then most interfaces would leak data.
> 
> > +		buf = names;
> > +		for (i = 0; i < ARRAY_SIZE(si_names); i++) {
> > +			if (*si_flag & si_names[i].bit) {
> > +
> > +				if (first) {
> > +					first = false;
> > +				} else {
> > +					*buf = ',';
> > +					buf++;
> > +				}
> > +
> > +				len = strlen(si_names[i].name);
> > +				strncpy(buf, si_names[i].name, len);
> > +				buf += len;
> > +			}
> > +
> > +		}
> > +		*buf = '\0';
> 
> IMHO, always adding this trailing '\0' should be enough.

OK.

> 
> > +		table = *ro_table;
> > +		table.data = names;
> > +		table.maxlen = sizeof(names);
> > +		return proc_dostring(&table, 0, buffer, lenp, ppos);
> 
> I would pass the "write" parameter here instead of the hard coded 0.
> But it is a matter of taste.

I think it's obviously better :). will change.

> 
> > +	}
> > +}
> > +
> >  void sys_show_info(unsigned long info_flag)
> >  {
> >  	if (info_flag & SYS_SHOW_TASK_INFO)
> 
> Best Regards,
> Petr
> 
> PS: I am sorry for the late reply. Too many things have accumulated
> over the few last weeks.

No problem! I really appreciate your review and suggestions, like
your help on reviewing my early 'panic_print' patches years ago.

Thanks,
Feng


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-06-11 13:43 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-11  8:52 [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Feng Tang
2025-05-11  8:52 ` [PATCH v1 1/3] kernel/panic: generalize panic_print's function to show sys info Feng Tang
2025-05-13  9:57   ` Petr Mladek
2025-05-13 13:23     ` Feng Tang
2025-05-15 10:32       ` Petr Mladek
2025-05-15 11:28         ` Lance Yang
2025-05-16  5:38         ` Feng Tang
2025-05-16 13:39           ` Lance Yang
2025-05-21  5:31           ` Feng Tang
2025-06-10 15:44             ` Petr Mladek
2025-06-11 13:43               ` Feng Tang
2025-05-11  8:52 ` [PATCH v1 2/3] kernel/hung_task: add option to dump system info when hung task detected Feng Tang
2025-05-11  8:52 ` [PATCH v1 3/3] kernel/watchdog: add option to dump system info when system is locked up Feng Tang
2025-05-12  1:46 ` [PATCH v1 0/3] generalize panic_print's dump function to be used by other kernel parts Andrew Morton
2025-05-12  3:14   ` Feng Tang
2025-05-12  8:23     ` Lance Yang
2025-05-12  9:31       ` Feng Tang
2025-05-13 13:27       ` Petr Mladek
2025-05-13 17:09         ` Paul E. McKenney
2025-05-14  3:33           ` Feng Tang
2025-05-14 13:55             ` Paul E. McKenney
2025-05-14  3:22         ` Feng Tang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).