[PATCH v3 0/4] sys_info: prevent duplicate backtraces


* [PATCH v3 0/4] sys_info: prevent duplicate backtraces
@ 2026-06-25 15:25 Bradley Morgan
  2026-06-25 15:25 ` [PATCH v3 1/4] sys_info: add helper for callers that print some sys_info on their own Bradley Morgan
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Bradley Morgan @ 2026-06-25 15:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable, Bradley Morgan

Some callers handle SYS_INFO_ALL_BT themselves before calling sys_info().
When they strip that bit, an all_bt-only mask becomes zero and sys_info(0)
falls back to kernel_si_mask, potentially duplicating output.

This series adds sys_info_with_filter() to filter specific bits without
triggering the kernel_si_mask fallback.
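
The fallback problem described above can be modelled in plain userspace C. This is only a sketch: the functions return the mask that would be dumped instead of printing anything, the SI_* bit values are illustrative assumptions, and the names old_effective_mask(), filtered_effective_mask() and example_checks() are hypothetical. The filter logic follows the sys_info_with_filter() hunk in patch 1/4.

```c
#include <assert.h>

/* Illustrative bit values; treat them as assumptions, not as the header's. */
#define SI_TASKS   0x01UL /* stand-in for SYS_INFO_TASKS */
#define SI_ALL_BT  0x40UL /* stand-in for SYS_INFO_ALL_BT */

/* Old behaviour: an empty call-specific mask selects the global default. */
unsigned long old_effective_mask(unsigned long si_mask,
				 unsigned long kernel_si_mask)
{
	return si_mask ? si_mask : kernel_si_mask;
}

/* Model of sys_info_with_filter() as posted in patch 1/4. */
unsigned long filtered_effective_mask(unsigned long si_mask,
				      unsigned long si_ignore_mask,
				      unsigned long kernel_si_mask)
{
	unsigned long dump_mask = si_mask & ~si_ignore_mask;

	/* The caller asked only for bits it already handled: dump nothing. */
	if (si_mask && !dump_mask)
		return 0;

	/* The ignored bits are stripped from the global fallback as well. */
	return (dump_mask ? dump_mask : kernel_si_mask) & ~si_ignore_mask;
}

/* Walks through the all_bt-only case from the cover letter. */
void example_checks(void)
{
	unsigned long global = SI_TASKS | SI_ALL_BT;

	/* Old call site: stripping ALL_BT leaves 0, so the global mask wins. */
	assert(old_effective_mask(SI_ALL_BT & ~SI_ALL_BT, global) == global);

	/* New call site: an explicit all_bt-only mask requests nothing more. */
	assert(filtered_effective_mask(SI_ALL_BT, SI_ALL_BT, global) == 0);
}
```

With an empty call-specific mask the model still falls back to the global mask, minus the ignored bits, which matches the "Filter applied at __sys_info() level" note in the changelog.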

Changes since v2:
- Use sys_info_with_filter() instead of sys_info_without_all_bt() per
  Petr's suggestion
- Filter applied at __sys_info() level to handle kernel_si_mask correctly
- Added panic.c conversion

Bradley Morgan (4):
  sys_info: add helper for callers that print some sys_info on their own
  watchdog: use sys_info_with_filter() to avoid duplicate backtraces
  powerpc/watchdog: use sys_info_with_filter() to avoid duplicate
    backtraces
  panic: use sys_info_with_filter() to avoid duplicate backtraces

 arch/powerpc/kernel/watchdog.c | 12 ++++++++----
 include/linux/sys_info.h       |  1 +
 kernel/panic.c                 |  2 +-
 kernel/watchdog.c              | 12 ++++++++----
 lib/sys_info.c                 | 20 ++++++++++++++++++--
 5 files changed, 36 insertions(+), 11 deletions(-)

-- 
2.53.0



* [PATCH v3 1/4] sys_info: add helper for callers that print some sys_info on their own
  2026-06-25 15:25 [PATCH v3 0/4] sys_info: prevent duplicate backtraces Bradley Morgan
@ 2026-06-25 15:25 ` Bradley Morgan
  2026-06-25 15:25 ` [PATCH v3 2/4] watchdog: use sys_info_with_filter() to avoid duplicate backtraces Bradley Morgan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Bradley Morgan @ 2026-06-25 15:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable, Bradley Morgan

Some callers print some sys_info on their own before calling sys_info().

Add a helper that lets such callers avoid duplicated output.

It is a bit tricky because kernel_si_mask should be used only
when the call-specific si_mask is empty. But the duplicated
output must be prevented there as well.

Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
Cc: stable@vger.kernel.org
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Bradley Morgan <include@grrlz.net>
---
 include/linux/sys_info.h |  1 +
 lib/sys_info.c           | 20 ++++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
index a5bc3ea3d44b..f1c2552ca3d1 100644
--- a/include/linux/sys_info.h
+++ b/include/linux/sys_info.h
@@ -18,6 +18,7 @@
 #define SYS_INFO_BLOCKED_TASKS		0x00000080
 
 void sys_info(unsigned long si_mask);
+void sys_info_with_filter(unsigned long si_mask, unsigned long si_ignore_mask);
 unsigned long sys_info_parse_param(char *str);
 
 #ifdef CONFIG_SYSCTL
diff --git a/lib/sys_info.c b/lib/sys_info.c
index f32a06ec9ed4..d411fee10415 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -136,8 +136,10 @@ static int __init sys_info_sysctl_init(void)
 subsys_initcall(sys_info_sysctl_init);
 #endif
 
-static void __sys_info(unsigned long si_mask)
+static void __sys_info(unsigned long si_mask, unsigned long si_ignore_mask)
 {
+	si_mask &= ~si_ignore_mask;
+
 	if (si_mask & SYS_INFO_TASKS)
 		show_state();
 
@@ -160,7 +162,21 @@ static void __sys_info(unsigned long si_mask)
 		show_state_filter(TASK_UNINTERRUPTIBLE);
 }
 
+void sys_info_with_filter(unsigned long si_mask, unsigned long si_ignore_mask)
+{
+	unsigned long dump_mask = si_mask & ~si_ignore_mask;
+
+	/*
+	 * Do not fall back to kernel_si_mask when the caller context
+	 * required only the ignored information.
+	 */
+	if (si_mask && !dump_mask)
+		return;
+
+	__sys_info(dump_mask ? : kernel_si_mask, si_ignore_mask);
+}
+
 void sys_info(unsigned long si_mask)
 {
-	__sys_info(si_mask ? : kernel_si_mask);
+	sys_info_with_filter(si_mask, 0);
 }
-- 
2.53.0



* [PATCH v3 2/4] watchdog: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-25 15:25 [PATCH v3 0/4] sys_info: prevent duplicate backtraces Bradley Morgan
  2026-06-25 15:25 ` [PATCH v3 1/4] sys_info: add helper for callers that print some sys_info on their own Bradley Morgan
@ 2026-06-25 15:25 ` Bradley Morgan
  2026-06-25 15:25 ` [PATCH v3 3/4] powerpc/watchdog: " Bradley Morgan
  2026-06-25 15:25 ` [PATCH v3 4/4] panic: " Bradley Morgan
  3 siblings, 0 replies; 16+ messages in thread
From: Bradley Morgan @ 2026-06-25 15:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable, Bradley Morgan

The watchdog prints all CPU backtraces itself. When the watchdog mask
contains only SYS_INFO_ALL_BT, stripping that bit leaves zero and
sys_info(0) falls back to kernel_sys_info.

Use sys_info_with_filter() so an explicit all_bt mask does not request
the global default.

Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
Cc: stable@vger.kernel.org
Signed-off-by: Bradley Morgan <include@grrlz.net>
---
 kernel/watchdog.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 87dd5e0f6968..ff284593cb90 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -208,6 +208,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
 	int hardlockup_all_cpu_backtrace;
 	unsigned int this_cpu;
+	unsigned long si_mask;
 	unsigned long flags;
 
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
@@ -216,7 +217,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 		return;
 	}
 
-	hardlockup_all_cpu_backtrace = (hardlockup_si_mask & SYS_INFO_ALL_BT) ?
+	si_mask = READ_ONCE(hardlockup_si_mask);
+	hardlockup_all_cpu_backtrace = (si_mask & SYS_INFO_ALL_BT) ?
 					1 : sysctl_hardlockup_all_cpu_backtrace;
 	/*
 	 * Check for a hardlockup by making sure the CPU's timer
@@ -286,7 +288,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 			clear_bit_unlock(0, &hard_lockup_nmi_warn);
 	}
 
-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
+	sys_info_with_filter(si_mask, SYS_INFO_ALL_BT);
 	if (hardlockup_panic)
 		nmi_panic(regs, "Hard LOCKUP");
 
@@ -798,6 +800,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	struct pt_regs *regs = get_irq_regs();
 	int softlockup_all_cpu_backtrace;
 	int duration, thresh_count;
+	unsigned long si_mask;
 	unsigned long flags;
 
 	if (!watchdog_enabled)
@@ -809,7 +812,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	if (panic_in_progress())
 		return HRTIMER_NORESTART;
 
-	softlockup_all_cpu_backtrace = (softlockup_si_mask & SYS_INFO_ALL_BT) ?
+	si_mask = READ_ONCE(softlockup_si_mask);
+	softlockup_all_cpu_backtrace = (si_mask & SYS_INFO_ALL_BT) ?
 					1 : sysctl_softlockup_all_cpu_backtrace;
 
 	watchdog_hardlockup_kick();
@@ -900,7 +904,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		}
 
 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
-		sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
+		sys_info_with_filter(si_mask, SYS_INFO_ALL_BT);
 		thresh_count = duration / get_softlockup_thresh();
 
 		if (softlockup_panic && thresh_count >= softlockup_panic)
-- 
2.53.0



* [PATCH v3 3/4] powerpc/watchdog: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-25 15:25 [PATCH v3 0/4] sys_info: prevent duplicate backtraces Bradley Morgan
  2026-06-25 15:25 ` [PATCH v3 1/4] sys_info: add helper for callers that print some sys_info on their own Bradley Morgan
  2026-06-25 15:25 ` [PATCH v3 2/4] watchdog: use sys_info_with_filter() to avoid duplicate backtraces Bradley Morgan
@ 2026-06-25 15:25 ` Bradley Morgan
  2026-06-26  9:42   ` Petr Mladek
  2026-06-25 15:25 ` [PATCH v3 4/4] panic: " Bradley Morgan
  3 siblings, 1 reply; 16+ messages in thread
From: Bradley Morgan @ 2026-06-25 15:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable, Bradley Morgan

The powerpc watchdog prints all CPU backtraces itself. When the watchdog
mask contains only SYS_INFO_ALL_BT, stripping that bit leaves zero and
sys_info(0) falls back to kernel_sys_info.

Use sys_info_with_filter() so an explicit all_bt mask does not request
the global default.

Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
Cc: stable@vger.kernel.org
Signed-off-by: Bradley Morgan <include@grrlz.net>
---
 arch/powerpc/kernel/watchdog.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c40c69368476..d3a9c6da962d 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -201,6 +201,7 @@ static bool set_cpu_stuck(int cpu)
 static void watchdog_smp_panic(int cpu)
 {
 	static cpumask_t wd_smp_cpus_ipi; // protected by reporting
+	unsigned long si_mask;
 	unsigned long flags;
 	u64 tb, last_reset;
 	int c;
@@ -236,8 +237,9 @@ static void watchdog_smp_panic(int cpu)
 	pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld (%lldms ago)\n",
 		 cpu, tb, last_reset, tb_to_ns(tb - last_reset) / 1000000);
 
+	si_mask = READ_ONCE(hardlockup_si_mask);
 	if (sysctl_hardlockup_all_cpu_backtrace ||
-	    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
+	    (si_mask & SYS_INFO_ALL_BT)) {
 		trigger_allbutcpu_cpu_backtrace(cpu);
 		cpumask_clear(&wd_smp_cpus_ipi);
 	} else {
@@ -251,7 +253,7 @@ static void watchdog_smp_panic(int cpu)
 		}
 	}
 
-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
+	sys_info_with_filter(si_mask, SYS_INFO_ALL_BT);
 	if (hardlockup_panic)
 		nmi_panic(NULL, "Hard LOCKUP");
 
@@ -371,6 +373,7 @@ static void watchdog_timer_interrupt(int cpu)
 
 DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
 {
+	unsigned long si_mask;
 	unsigned long flags;
 	int cpu = raw_smp_processor_id();
 	u64 tb;
@@ -418,11 +421,12 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
 
 		xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
 
+		si_mask = READ_ONCE(hardlockup_si_mask);
 		if (sysctl_hardlockup_all_cpu_backtrace ||
-		    (hardlockup_si_mask & SYS_INFO_ALL_BT))
+		    (si_mask & SYS_INFO_ALL_BT))
 			trigger_allbutcpu_cpu_backtrace(cpu);
 
-		sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
+		sys_info_with_filter(si_mask, SYS_INFO_ALL_BT);
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
-- 
2.53.0



* [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-25 15:25 [PATCH v3 0/4] sys_info: prevent duplicate backtraces Bradley Morgan
                   ` (2 preceding siblings ...)
  2026-06-25 15:25 ` [PATCH v3 3/4] powerpc/watchdog: " Bradley Morgan
@ 2026-06-25 15:25 ` Bradley Morgan
  2026-06-26 10:23   ` Petr Mladek
  3 siblings, 1 reply; 16+ messages in thread
From: Bradley Morgan @ 2026-06-25 15:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable, Bradley Morgan

panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
other CPUs. Do not ask sys_info() to handle that bit again later in the
panic path.

Use sys_info_with_filter() so panic_print=all_bt does not request more
output after the CPUs are stopped.

Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
Cc: stable@vger.kernel.org
Signed-off-by: Bradley Morgan <include@grrlz.net>
---
 kernel/panic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 213725b612aa..eb842823df61 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-	sys_info(panic_print);
+	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
 
 	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
 
-- 
2.53.0



* Re: [PATCH v3 3/4] powerpc/watchdog: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-25 15:25 ` [PATCH v3 3/4] powerpc/watchdog: " Bradley Morgan
@ 2026-06-26  9:42   ` Petr Mladek
  0 siblings, 0 replies; 16+ messages in thread
From: Petr Mladek @ 2026-06-26  9:42 UTC (permalink / raw)
  To: Bradley Morgan
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On Thu 2026-06-25 15:25:57, Bradley Morgan wrote:
> The powerpc watchdog prints all CPU backtraces itself. When the watchdog
> mask contains only SYS_INFO_ALL_BT, stripping that bit leaves zero and
> sys_info(0) falls back to kernel_sys_info.
> 
> Use sys_info_with_filter() so an explicit all_bt mask does not request
> the global default.
> 
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -418,11 +421,12 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>  
>  		xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
>  
> +		si_mask = READ_ONCE(hardlockup_si_mask);
>  		if (sysctl_hardlockup_all_cpu_backtrace ||
> -		    (hardlockup_si_mask & SYS_INFO_ALL_BT))
> +		    (si_mask & SYS_INFO_ALL_BT))
>  			trigger_allbutcpu_cpu_backtrace(cpu);
>  
> -		sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
> +		sys_info_with_filter(si_mask, SYS_INFO_ALL_BT);
>  		if (hardlockup_panic)
>  			nmi_panic(regs, "Hard LOCKUP");

I thought more about it and it is even more complicated.

Even if we prevent the duplicated output with sys_info_with_filter()
here, nmi_panic() might still trigger it once again.

We could say that this patch is a step in the right direction and
fix the other problem later. But I am not sure. We might need
a completely different approach and this is just a step aside.

And there is another problem in the panic() code. I am going to
comment on it in the 4th patch.

Best Regards,
Petr



* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-25 15:25 ` [PATCH v3 4/4] panic: " Bradley Morgan
@ 2026-06-26 10:23   ` Petr Mladek
  2026-06-26 10:27     ` Bradley Morgan
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Petr Mladek @ 2026-06-26 10:23 UTC (permalink / raw)
  To: Bradley Morgan
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
> panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
> other CPUs. Do not ask sys_info() to handle that bit again later in the
> panic path.
> 
> Use sys_info_with_filter() so panic_print=all_bt does not request more
> output after the CPUs are stopped.
> 
> Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bradley Morgan <include@grrlz.net>
> ---
>  kernel/panic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 213725b612aa..eb842823df61 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
>  	 */
>  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>  
> -	sys_info(panic_print);
> +	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);

Hmm, this prevents printing backtraces from all CPUs completely.
But what if they were not printed?

They might be printed by:

static void panic_other_cpus_shutdown(bool crash_kexec)
{
	if (panic_print & SYS_INFO_ALL_BT)
		panic_trigger_all_cpu_backtrace();

[...]
}

But it checks only "panic_print" variable. It won't do anything
when (panic_print == 0).

In this case, we might still want to print the backtraces when
SYS_INFO_ALL_BT is set in kernel_si_mask.
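
The gap described above can be reproduced with a small userspace model. It is a sketch under stated assumptions: the early stage mirrors the panic_other_cpus_shutdown() check quoted above, the late stage mirrors sys_info_with_filter(panic_print, SYS_INFO_ALL_BT) from patches 1/4 and 4/4, the SI_ALL_BT value is illustrative, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SI_ALL_BT 0x40UL /* illustrative stand-in for SYS_INFO_ALL_BT */

/* Early stage: only the panic-specific mask is consulted. */
bool early_stage_prints_all_bt(unsigned long panic_print)
{
	return (panic_print & SI_ALL_BT) != 0;
}

/* Late stage: the filter removes ALL_BT from every mask it considers. */
bool late_stage_prints_all_bt(unsigned long panic_print,
			      unsigned long kernel_si_mask)
{
	unsigned long ignore = SI_ALL_BT;
	unsigned long dump_mask = panic_print & ~ignore;
	unsigned long effective;

	if (panic_print && !dump_mask)
		return false;

	effective = (dump_mask ? dump_mask : kernel_si_mask) & ~ignore;
	return (effective & SI_ALL_BT) != 0; /* false for any input */
}

/* True when either stage would print the all-CPU backtraces. */
bool all_bt_printed_anywhere(unsigned long panic_print,
			     unsigned long kernel_si_mask)
{
	return early_stage_prints_all_bt(panic_print) ||
	       late_stage_prints_all_bt(panic_print, kernel_si_mask);
}
```

In this model, all_bt_printed_anywhere(0, SI_ALL_BT) is false: the early stage ignores the global mask and the late stage filters the bit out, so the backtraces requested through the global mask are never printed.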

>  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);

Of course, we might fix panic_other_cpus_shutdown() to check also
kernel_si_mask.

But it all becomes very hairy. We have several levels:

   + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace

   + watchdog-specific si_info preferences, e.g. hardlockup_si_mask

   + panic-specific si_info: panic_print

   + universal fallback for any layer: kernel_si_mask

Now, we try to check all these variables back and forth to
trigger all backtraces or to avoid triggering them.
And it clearly does not work well and the code is more and more
hairy.

I am thinking about another approach. The word "waterfall" comes to my mind.
Instead of checking all the settings back and forth, let's process
each setting one by one and just remember what has been done and
skip this in the next level.

All the si_info actions seem to dump a global system state.
So, it would make sense to remember the state in a global variable
even when it might be modified by more CPUs in parallel.
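
A minimal userspace sketch of the "waterfall" idea follows. The helper names sys_info_done(), sys_info_reset() and sys_info_is_done() are borrowed from the proof-of-concept posted later in this thread; their bodies, the si_done_mask variable, the sys_info_model() function and the SI_* values are assumptions made for illustration, not the kernel implementation. Like the proposal, the sketch does no locking.

```c
#include <assert.h>
#include <stdbool.h>

#define SI_TASKS   0x01UL /* illustrative stand-in for SYS_INFO_TASKS */
#define SI_ALL_BT  0x40UL /* illustrative stand-in for SYS_INFO_ALL_BT */

/* Hypothetical global record of what has already been printed. */
static unsigned long si_done_mask;

void sys_info_done(unsigned long si_mask)
{
	si_done_mask |= si_mask;
}

void sys_info_reset(void)
{
	si_done_mask = 0;
}

/* True when every bit in si_mask has already been printed. */
bool sys_info_is_done(unsigned long si_mask)
{
	return (si_done_mask & si_mask) == si_mask;
}

/*
 * Model of one level of the waterfall: pick the mask for this level,
 * skip what earlier levels already printed, and record the rest.
 * Returns the bits this call would dump.
 */
unsigned long sys_info_model(unsigned long si_mask,
			     unsigned long kernel_si_mask)
{
	unsigned long todo = si_mask ? si_mask : kernel_si_mask;

	todo &= ~si_done_mask;
	sys_info_done(todo);
	return todo;
}
```

Because the fallback choice is made before the done bits are removed, a level whose own mask is already fully printed dumps nothing instead of falling back to the global mask. A caller that prints backtraces itself would call sys_info_done(SI_ALL_BT) first and sys_info_reset() once the event is over.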

I am going to think more about it.

Please, do not send v4 until the discussion settles!

Best Regards,
Petr


* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 10:23   ` Petr Mladek
@ 2026-06-26 10:27     ` Bradley Morgan
  2026-06-26 12:06     ` Feng Tang
  2026-06-26 12:14     ` Petr Mladek
  2 siblings, 0 replies; 16+ messages in thread
From: Bradley Morgan @ 2026-06-26 10:27 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On June 26, 2026 11:23:48 AM GMT+01:00, Petr Mladek <pmladek@suse.com>
wrote:
>On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>> panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
>> other CPUs. Do not ask sys_info() to handle that bit again later in the
>> panic path.
>> 
>> Use sys_info_with_filter() so panic_print=all_bt does not request more
>> output after the CPUs are stopped.
>> 
>> Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Bradley Morgan <include@grrlz.net>
>> ---
>>  kernel/panic.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index 213725b612aa..eb842823df61 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
>>  	 */
>>  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>>  
>> -	sys_info(panic_print);
>> +	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
>
>Hmm, this prevents printing backtraces from all CPUs completely.
>But what if they were not printed?
>
>They might be printed by:
>
>static void panic_other_cpus_shutdown(bool crash_kexec)
>{
>	if (panic_print & SYS_INFO_ALL_BT)
>		panic_trigger_all_cpu_backtrace();
>
>[...]
>}
>
>But it checks only "panic_print" variable. It won't do anything
>when (panic_print == 0).
>
>In this case, we might still want to print the backtraces when
>SYS_INFO_ALL_BT is set in kernel_si_info.
>
>>  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>
>Of course, we might fix panic_other_cpus_shutdown() to check also
>kernel_si_info.
>
>But it all becomes very hairy. We have several levels:
>
>   + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>
>   + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
>
>   + panic-specific si_info: panic_print
>
>   + universal fallback for any layer: kernel_si_info
>
>Now, we try to check all these variables back and forth to
>trigger all backtraces or to avoid triggering them.
>And it clearly does not work well and the code is more and more
>hairy.
>
>I think about another approach. The word "waterfall" comes to my mind.
>Instead of checking all the settings back and forth, let's process
>each setting one by one and just remember what has been done and
>skip this in the next level.
>
>All the si_info actions seems to dump a global system state.
>So, it would make sense to remember the state in a global variable
>even when it might be modified by more CPUs in parallel.

Not a bad idea.

>I am going to think more about it.
>
>Please, do not send v4 until the discussion settles!

I'll hold off on v4.

Once the discussion has settled, could I have your suggested patch?
If I find any issues, I'll fix them.

>Best Regards,
>Petr
>

Thanks!



* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 10:23   ` Petr Mladek
  2026-06-26 10:27     ` Bradley Morgan
@ 2026-06-26 12:06     ` Feng Tang
  2026-06-26 12:14     ` Petr Mladek
  2 siblings, 0 replies; 16+ messages in thread
From: Feng Tang @ 2026-06-26 12:06 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Bradley Morgan, Andrew Morton, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On Fri, Jun 26, 2026 at 12:23:48PM +0200, Petr Mladek wrote:
> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
> > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
> > other CPUs. Do not ask sys_info() to handle that bit again later in the
> > panic path.
> > 
> > Use sys_info_with_filter() so panic_print=all_bt does not request more
> > output after the CPUs are stopped.
> > 
> > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Bradley Morgan <include@grrlz.net>
> > ---
> >  kernel/panic.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 213725b612aa..eb842823df61 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
> >  	 */
> >  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> >  
> > -	sys_info(panic_print);
> > +	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
> 
> Hmm, this prevents printing backtraces from all CPUs completely.
> But what if they were not printed?
> 
> They might be printed by:
> 
> static void panic_other_cpus_shutdown(bool crash_kexec)
> {
> 	if (panic_print & SYS_INFO_ALL_BT)
> 		panic_trigger_all_cpu_backtrace();
> 
> [...]
> }
> 
> But it checks only "panic_print" variable. It won't do anything
> when (panic_print == 0).
> 
> In this case, we might still want to print the backtraces when
> SYS_INFO_ALL_BT is set in kernel_si_info.

Yep.

> 
> >  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
> 
> Of course, we might fix panic_other_cpus_shutdown() to check also
> kernel_si_info.
> 
> But it all becomes very hairy. We have several levels:
> 
>    + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
> 
>    + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
> 
>    + panic-specific si_info: panic_print
> 
>    + universal fallback for any layer: kernel_si_info
> 
> Now, we try to check all these variables back and forth to
> trigger all backtraces or to avoid triggering them.
> And it clearly does not work well and the code is more and more
> hairy.

Agree :)
 
> I think about another approach. The word "waterfall" comes to my mind.
> Instead of checking all the settings back and forth, let's process
> each setting one by one and just remember what has been done and
> skip this in the next level.

When initially reviewing V2's 4th patch, I thought about the
'panic_this_cpu_backtrace_printed', but it's a local variable which
records the state.

> All the si_info actions seems to dump a global system state.
> So, it would make sense to remember the state in a global variable
> even when it might be modified by more CPUs in parallel.

IIUC, the panic case is kind of special, as it has to split the
'sys_info()' work across different stages. Can we merge the masks at the
start of vpanic() with:

	panic_print = panic_print ?: kernel_si_mask;

as an add-on patch?
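
A rough userspace model of this suggestion, assuming the merged mask is then used by both panic stages. The function names and SI_* values are hypothetical, and the standard conditional operator is used in place of the GNU "?:" shorthand so the sketch builds with any C compiler.

```c
#include <assert.h>
#include <stdbool.h>

#define SI_TASKS   0x01UL /* illustrative stand-in for SYS_INFO_TASKS */
#define SI_ALL_BT  0x40UL /* illustrative stand-in for SYS_INFO_ALL_BT */

/* Resolve the fallback once, before either stage looks at the mask. */
unsigned long resolve_panic_print(unsigned long panic_print,
				  unsigned long kernel_si_mask)
{
	return panic_print ? panic_print : kernel_si_mask;
}

/* Early stage: would the all-CPU backtraces be triggered? */
bool early_stage_all_bt(unsigned long effective)
{
	return (effective & SI_ALL_BT) != 0;
}

/* Late stage: everything except the bit the early stage handled. */
unsigned long late_stage_mask(unsigned long effective)
{
	return effective & ~SI_ALL_BT;
}
```

With panic_print == 0 and a global mask of SI_TASKS | SI_ALL_BT, the resolved mask makes the early stage trigger the backtraces and leaves only SI_TASKS for the late stage, so each item is handled once. Whether overwriting panic_print itself is acceptable is left open in the thread.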

Thanks,
Feng

> I am going to think more about it.
> 
> Please, do not send v4 until the discussion settles!
> 
> Best Regards,
> Petr



* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 10:23   ` Petr Mladek
  2026-06-26 10:27     ` Bradley Morgan
  2026-06-26 12:06     ` Feng Tang
@ 2026-06-26 12:14     ` Petr Mladek
  2026-06-26 12:17       ` Bradley Morgan
  2 siblings, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2026-06-26 12:14 UTC (permalink / raw)
  To: Bradley Morgan
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
> > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
> > other CPUs. Do not ask sys_info() to handle that bit again later in the
> > panic path.
> > 
> > Use sys_info_with_filter() so panic_print=all_bt does not request more
> > output after the CPUs are stopped.
> > 
> > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Bradley Morgan <include@grrlz.net>
> > ---
> >  kernel/panic.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 213725b612aa..eb842823df61 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
> >  	 */
> >  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> >  
> > -	sys_info(panic_print);
> > +	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
> 
> Hmm, this prevents printing backtraces from all CPUs completely.
> But what if they were not printed?
> 
> They might be printed by:
> 
> static void panic_other_cpus_shutdown(bool crash_kexec)
> {
> 	if (panic_print & SYS_INFO_ALL_BT)
> 		panic_trigger_all_cpu_backtrace();
> 
> [...]
> }
> 
> But it checks only "panic_print" variable. It won't do anything
> when (panic_print == 0).
> 
> In this case, we might still want to print the backtraces when
> SYS_INFO_ALL_BT is set in kernel_si_info.
> 
> >  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
> 
> Of course, we might fix panic_other_cpus_shutdown() to check also
> kernel_si_info.
> 
> But it all becomes very hairy. We have several levels:
> 
>    + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
> 
>    + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
> 
>    + panic-specific si_info: panic_print
> 
>    + universal fallback for any layer: kernel_si_info
> 
> Now, we try to check all these variables back and forth to
> trigger all backtraces or to avoid triggering them.
> And it clearly does not work well and the code is more and more
> hairy.
> 
> I think about another approach. The word "waterfall" comes to my mind.
> Instead of checking all the settings back and forth, let's process
> each setting one by one and just remember what has been done and
> skip this in the next level.
> 
> All the si_info actions seems to dump a global system state.
> So, it would make sense to remember the state in a global variable
> even when it might be modified by more CPUs in parallel.
> 
> I am going to think more about it.

I have created a POC using Gemini. I haven't tested it.
But it looks acceptable. And the logic seems to be more
straightforward.

One drawback is that it requires adding the _reset()
call for all sys_info() callers. It is fine in principle
but it might complicate back-porting because all changes
have to be done in one patch.

But honestly, this is a nice-to-have fix. Most people could
live happily without it.

From 3c66436d9978030845a96bfaedd6b914536e2ac4 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Fri, 26 Jun 2026 13:55:41 +0200
Subject: [POC] sys_info: Introduce state-tracking APIs to prevent duplicate
 backtraces

In watchdog, panic, and hung task detection scenarios, sys_info() can
be called multiple times or alongside direct backtrace triggers like
trigger_allbutcpu_cpu_backtrace(). This results in identical backtraces
being dumped repeatedly from all CPUs, cluttering the kernel log and
delaying or obscuring critical debug details.

Introduce a state tracking bitmask and associated helpers:
- sys_info_done(mask): Marks specific sys_info bits as already printed.
- sys_info_reset(): Resets the tracking state.
- sys_info_is_done(mask): Checks if all bits in the mask have been printed.

Update sys_info() to automatically filter out already printed bits
using this state. Integrate these APIs with the generic hardlockup
and softlockup watchdogs, the PowerPC watchdog, the hung task detector,
and the panic core. This ensures that each piece of system information
and backtrace output is printed at most once per lockup/panic event,
and the state is reset cleanly when a lockup does not trigger a panic.

Races between sys_info() callers are ignored. It should be acceptable
because the output from various watchdogs has never been synchronized.
And panic() never returns.

Assisted-by: gemini-1.5-flash
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 arch/powerpc/kernel/watchdog.c | 13 ++++++++++---
 include/linux/sys_info.h       |  3 +++
 kernel/hung_task.c             |  2 ++
 kernel/panic.c                 |  4 +++-
 kernel/watchdog.c              | 10 ++++++++--
 lib/sys_info.c                 | 30 +++++++++++++++++++++++++++++-
 6 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c40c69368476..0eab7894b9dc 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -239,6 +239,7 @@ static void watchdog_smp_panic(int cpu)
 	if (sysctl_hardlockup_all_cpu_backtrace ||
 	    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
 		trigger_allbutcpu_cpu_backtrace(cpu);
+		sys_info_done(SYS_INFO_ALL_BT);
 		cpumask_clear(&wd_smp_cpus_ipi);
 	} else {
 		/*
@@ -251,10 +252,12 @@ static void watchdog_smp_panic(int cpu)
 		}
 	}
 
-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
+	sys_info(hardlockup_si_mask);
 	if (hardlockup_panic)
 		nmi_panic(NULL, "Hard LOCKUP");
 
+	sys_info_reset();
+
 	wd_end_reporting();
 
 	return;
@@ -419,13 +422,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
 		xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
 
 		if (sysctl_hardlockup_all_cpu_backtrace ||
-		    (hardlockup_si_mask & SYS_INFO_ALL_BT))
+		    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
 			trigger_allbutcpu_cpu_backtrace(cpu);
+			sys_info_done(SYS_INFO_ALL_BT);
+		}
 
-		sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
+		sys_info(hardlockup_si_mask);
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
+		sys_info_reset();
+
 		wd_end_reporting();
 	}
 	/*
diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
index a5bc3ea3d44b..ad43548c75dd 100644
--- a/include/linux/sys_info.h
+++ b/include/linux/sys_info.h
@@ -18,6 +18,9 @@
 #define SYS_INFO_BLOCKED_TASKS		0x00000080
 
 void sys_info(unsigned long si_mask);
+void sys_info_done(unsigned long si_mask);
+void sys_info_reset(void);
+bool sys_info_is_done(unsigned long si_mask);
 unsigned long sys_info_parse_param(char *str);
 
 #ifdef CONFIG_SYSCTL
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 6fcc94ce4ca9..dbb6a27770f5 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -354,6 +354,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 
 	if (hung_task_call_panic)
 		panic("hung_task: blocked tasks");
+
+	sys_info_reset();
 }
 
 static long hung_timeout_jiffies(unsigned long last_checked,
diff --git a/kernel/panic.c b/kernel/panic.c
index 213725b612aa..86ce17f03da2 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -550,8 +550,10 @@ static void panic_trigger_all_cpu_backtrace(void)
  */
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
-	if (panic_print & SYS_INFO_ALL_BT)
+	if ((panic_print & SYS_INFO_ALL_BT) && !sys_info_is_done(SYS_INFO_ALL_BT)) {
 		panic_trigger_all_cpu_backtrace();
+		sys_info_done(SYS_INFO_ALL_BT);
+	}
 
 	/*
 	 * Note that smp_send_stop() is the usual SMP shutdown function,
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 87dd5e0f6968..f431087c68a7 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -282,14 +282,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 
 	if (hardlockup_all_cpu_backtrace) {
 		trigger_allbutcpu_cpu_backtrace(cpu);
+		sys_info_done(SYS_INFO_ALL_BT);
 		if (!hardlockup_panic)
 			clear_bit_unlock(0, &hard_lockup_nmi_warn);
 	}
 
-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
+	sys_info(hardlockup_si_mask);
 	if (hardlockup_panic)
 		nmi_panic(regs, "Hard LOCKUP");
 
+	sys_info_reset();
+
 	per_cpu(watchdog_hardlockup_warned, cpu) = true;
 }
 
@@ -895,16 +898,19 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 
 		if (softlockup_all_cpu_backtrace) {
 			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
+			sys_info_done(SYS_INFO_ALL_BT);
 			if (!softlockup_panic)
 				clear_bit_unlock(0, &soft_lockup_nmi_warn);
 		}
 
 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
-		sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
+		sys_info(softlockup_si_mask);
 		thresh_count = duration / get_softlockup_thresh();
 
 		if (softlockup_panic && thresh_count >= softlockup_panic)
 			panic("softlockup: hung tasks");
+
+		sys_info_reset();
 	}
 
 	return HRTIMER_RESTART;
diff --git a/lib/sys_info.c b/lib/sys_info.c
index f32a06ec9ed4..f8e6176fae75 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -160,7 +160,35 @@ static void __sys_info(unsigned long si_mask)
 		show_state_filter(TASK_UNINTERRUPTIBLE);
 }
 
+static unsigned long sys_info_done_mask;
+
+void sys_info_done(unsigned long si_mask)
+{
+	sys_info_done_mask |= si_mask;
+}
+
+void sys_info_reset(void)
+{
+	sys_info_done_mask = 0;
+}
+
+bool sys_info_is_done(unsigned long si_mask)
+{
+	return (sys_info_done_mask & si_mask) == si_mask;
+}
+
 void sys_info(unsigned long si_mask)
 {
-	__sys_info(si_mask ? : kernel_si_mask);
+	unsigned long mask;
+
+	if (si_mask)
+		mask = si_mask & ~sys_info_done_mask;
+	else
+		mask = kernel_si_mask & ~sys_info_done_mask;
+
+	if (!mask)
+		return;
+
+	__sys_info(mask);
+	sys_info_done(mask);
 }
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 12:14     ` Petr Mladek
@ 2026-06-26 12:17       ` Bradley Morgan
  2026-06-26 12:32         ` Bradley Morgan
  0 siblings, 1 reply; 16+ messages in thread
From: Bradley Morgan @ 2026-06-26 12:17 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <pmladek@suse.com>
wrote:
>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>> > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
>> > other CPUs. Do not ask sys_info() to handle that bit again later in the
>> > panic path.
>> > 
>> > Use sys_info_with_filter() so panic_print=all_bt does not request more
>> > output after the CPUs are stopped.
>> > 
>> > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
>> > Cc: stable@vger.kernel.org
>> > Signed-off-by: Bradley Morgan <include@grrlz.net>
>> > ---
>> >  kernel/panic.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > 
>> > diff --git a/kernel/panic.c b/kernel/panic.c
>> > index 213725b612aa..eb842823df61 100644
>> > --- a/kernel/panic.c
>> > +++ b/kernel/panic.c
>> > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
>> >  	 */
>> >  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >  
>> > -	sys_info(panic_print);
>> > +	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
>> 
>> Hmm, this prevents printing backtraces from all CPUs completely.
>> But what if they were not printed?
>> 
>> They might be printed by:
>> 
>> static void panic_other_cpus_shutdown(bool crash_kexec)
>> {
>> 	if (panic_print & SYS_INFO_ALL_BT)
>> 		panic_trigger_all_cpu_backtrace();
>> 
>> [...]
>> }
>> 
>> But it checks only "panic_print" variable. It won't do anything
>> when (panic_print == 0).
>> 
>> In this case, we might still want to print the backtraces when
>> SYS_INFO_ALL_BT is set in kernel_si_mask.
>> 
>> >  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>> 
>> Of course, we might fix panic_other_cpus_shutdown() to also check
>> kernel_si_mask.
>> 
>> But it all becomes very hairy. We have several levels:
>> 
>>    + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>> 
>>    + watchdog-specific sys_info preferences, e.g. hardlockup_si_mask
>> 
>>    + panic-specific sys_info: panic_print
>> 
>>    + universal fallback for any layer: kernel_si_mask
>> 
>> Now, we try to check all these variables back and forth to
>> trigger all backtraces or to avoid triggering them.
>> It clearly does not work well and the code is getting more and
>> more hairy.
>> 
>> I am thinking about another approach. The word "waterfall" comes to mind.
>> Instead of checking all the settings back and forth, let's process
>> each setting one by one, remember what has been done, and
>> skip it at the next level.
>> 
>> All the sys_info actions seem to dump global system state.
>> So, it would make sense to remember the state in a global variable
>> even when it might be modified by more CPUs in parallel.
>> 
>> I am going to think more about it.
>
>I have created a POC using Gemini. I haven't tested it.
>But it looks acceptable. And the logic seems to be more
>straightforward.
>
>One drawback is that it requires adding the _reset()
>call for all sys_info() callers. It is fine in principle
>but it might complicate back-porting because all changes
>have to be done in one patch.
>
>But honestly, this is a nice-to-have fix. Most people could
>live happily without it.
>
>From 3c66436d9978030845a96bfaedd6b914536e2ac4 Mon Sep 17 00:00:00 2001
>From: Petr Mladek <pmladek@suse.com>
>Date: Fri, 26 Jun 2026 13:55:41 +0200
>Subject: [POC] sys_info: Introduce state-tracking APIs to prevent duplicate
> backtraces
>
>In watchdog, panic, and hung task detection scenarios, sys_info() can
>be called multiple times or alongside direct backtrace triggers like
>trigger_allbutcpu_cpu_backtrace(). This results in identical backtraces
>being dumped repeatedly from all CPUs, cluttering the kernel log and
>delaying or obscuring critical debug details.
>
>Introduce a state tracking bitmask and associated helpers:
>- sys_info_done(mask): Marks specific sys_info bits as already printed.
>- sys_info_reset(): Resets the tracking state.
>- sys_info_is_done(mask): Checks if all bits in the mask have been printed.
>
>Update sys_info() to automatically filter out already printed bits
>using this state. Integrate these APIs with the generic hardlockup
>and softlockup watchdogs, the PowerPC watchdog, the hung task detector,
>and the panic core. This ensures that each piece of system information
>and backtrace output is printed at most once per lockup/panic event,
>and the state is reset cleanly when a lockup does not trigger a panic.
>
>Races between sys_info() callers are ignored. It should be acceptable
>because the output from various watchdogs has never been synchronized.
>And panic() never returns.
>
>Assisted-by: gemini-1.5-flash ?

Why not use gemini 3.5 flash?

I can try if you want. 

Could I have the prompt you used? :)

>Signed-off-by: Petr Mladek <pmladek@suse.com>
>---
> arch/powerpc/kernel/watchdog.c | 13 ++++++++++---
> include/linux/sys_info.h       |  3 +++
> kernel/hung_task.c             |  2 ++
> kernel/panic.c                 |  4 +++-
> kernel/watchdog.c              | 10 ++++++++--
> lib/sys_info.c                 | 30 +++++++++++++++++++++++++++++-
> 6 files changed, 55 insertions(+), 7 deletions(-)
>
>diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>index c40c69368476..0eab7894b9dc 100644
>--- a/arch/powerpc/kernel/watchdog.c
>+++ b/arch/powerpc/kernel/watchdog.c
>@@ -239,6 +239,7 @@ static void watchdog_smp_panic(int cpu)
> 	if (sysctl_hardlockup_all_cpu_backtrace ||
> 	    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
> 		trigger_allbutcpu_cpu_backtrace(cpu);
>+		sys_info_done(SYS_INFO_ALL_BT);
> 		cpumask_clear(&wd_smp_cpus_ipi);
> 	} else {
> 		/*
>@@ -251,10 +252,12 @@ static void watchdog_smp_panic(int cpu)
> 		}
> 	}
> 
>-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>+	sys_info(hardlockup_si_mask);
> 	if (hardlockup_panic)
> 		nmi_panic(NULL, "Hard LOCKUP");
> 
>+	sys_info_reset();
>+
> 	wd_end_reporting();
> 
> 	return;
>@@ -419,13 +422,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
> 		xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
> 
> 		if (sysctl_hardlockup_all_cpu_backtrace ||
>-		    (hardlockup_si_mask & SYS_INFO_ALL_BT))
>+		    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
> 			trigger_allbutcpu_cpu_backtrace(cpu);
>+			sys_info_done(SYS_INFO_ALL_BT);
>+		}
> 
>-		sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>+		sys_info(hardlockup_si_mask);
> 		if (hardlockup_panic)
> 			nmi_panic(regs, "Hard LOCKUP");
> 
>+		sys_info_reset();
>+
> 		wd_end_reporting();
> 	}
> 	/*
>diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
>index a5bc3ea3d44b..ad43548c75dd 100644
>--- a/include/linux/sys_info.h
>+++ b/include/linux/sys_info.h
>@@ -18,6 +18,9 @@
> #define SYS_INFO_BLOCKED_TASKS		0x00000080
> 
> void sys_info(unsigned long si_mask);
>+void sys_info_done(unsigned long si_mask);
>+void sys_info_reset(void);
>+bool sys_info_is_done(unsigned long si_mask);
> unsigned long sys_info_parse_param(char *str);
> 
> #ifdef CONFIG_SYSCTL
>diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>index 6fcc94ce4ca9..dbb6a27770f5 100644
>--- a/kernel/hung_task.c
>+++ b/kernel/hung_task.c
>@@ -354,6 +354,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> 
> 	if (hung_task_call_panic)
> 		panic("hung_task: blocked tasks");
>+
>+	sys_info_reset();
> }
> 
> static long hung_timeout_jiffies(unsigned long last_checked,
>diff --git a/kernel/panic.c b/kernel/panic.c
>index 213725b612aa..86ce17f03da2 100644
>--- a/kernel/panic.c
>+++ b/kernel/panic.c
>@@ -550,8 +550,10 @@ static void panic_trigger_all_cpu_backtrace(void)
>  */
> static void panic_other_cpus_shutdown(bool crash_kexec)
> {
>-	if (panic_print & SYS_INFO_ALL_BT)
>+	if ((panic_print & SYS_INFO_ALL_BT) && !sys_info_is_done(SYS_INFO_ALL_BT)) {
> 		panic_trigger_all_cpu_backtrace();
>+		sys_info_done(SYS_INFO_ALL_BT);
>+	}
> 
> 	/*
> 	 * Note that smp_send_stop() is the usual SMP shutdown function,
>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>index 87dd5e0f6968..f431087c68a7 100644
>--- a/kernel/watchdog.c
>+++ b/kernel/watchdog.c
>@@ -282,14 +282,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> 
> 	if (hardlockup_all_cpu_backtrace) {
> 		trigger_allbutcpu_cpu_backtrace(cpu);
>+		sys_info_done(SYS_INFO_ALL_BT);
> 		if (!hardlockup_panic)
> 			clear_bit_unlock(0, &hard_lockup_nmi_warn);
> 	}
> 
>-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>+	sys_info(hardlockup_si_mask);
> 	if (hardlockup_panic)
> 		nmi_panic(regs, "Hard LOCKUP");
> 
>+	sys_info_reset();
>+
> 	per_cpu(watchdog_hardlockup_warned, cpu) = true;
> }
> 
>@@ -895,16 +898,19 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> 
> 		if (softlockup_all_cpu_backtrace) {
> 			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
>+			sys_info_done(SYS_INFO_ALL_BT);
> 			if (!softlockup_panic)
> 				clear_bit_unlock(0, &soft_lockup_nmi_warn);
> 		}
> 
> 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>-		sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
>+		sys_info(softlockup_si_mask);
> 		thresh_count = duration / get_softlockup_thresh();
> 
> 		if (softlockup_panic && thresh_count >= softlockup_panic)
> 			panic("softlockup: hung tasks");
>+
>+		sys_info_reset();
> 	}
> 
> 	return HRTIMER_RESTART;
>diff --git a/lib/sys_info.c b/lib/sys_info.c
>index f32a06ec9ed4..f8e6176fae75 100644
>--- a/lib/sys_info.c
>+++ b/lib/sys_info.c
>@@ -160,7 +160,35 @@ static void __sys_info(unsigned long si_mask)
> 		show_state_filter(TASK_UNINTERRUPTIBLE);
> }
> 
>+static unsigned long sys_info_done_mask;
>+
>+void sys_info_done(unsigned long si_mask)
>+{
>+	sys_info_done_mask |= si_mask;
>+}
>+
>+void sys_info_reset(void)
>+{
>+	sys_info_done_mask = 0;
>+}
>+
>+bool sys_info_is_done(unsigned long si_mask)
>+{
>+	return (sys_info_done_mask & si_mask) == si_mask;
>+}
>+
> void sys_info(unsigned long si_mask)
> {
>-	__sys_info(si_mask ? : kernel_si_mask);
>+	unsigned long mask;
>+
>+	if (si_mask)
>+		mask = si_mask & ~sys_info_done_mask;
>+	else
>+		mask = kernel_si_mask & ~sys_info_done_mask;
>+
>+	if (!mask)
>+		return;
>+
>+	__sys_info(mask);
>+	sys_info_done(mask);
> }
>

Thanks!


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 12:17       ` Bradley Morgan
@ 2026-06-26 12:32         ` Bradley Morgan
  2026-06-26 14:26           ` Petr Mladek
  0 siblings, 1 reply; 16+ messages in thread
From: Bradley Morgan @ 2026-06-26 12:32 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <include@grrlz.net>
wrote:
>On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <pmladek@suse.com>
>wrote:
>>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>>> > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
>>> > other CPUs. Do not ask sys_info() to handle that bit again later in the
>>> > panic path.
>>> > 
>>> > Use sys_info_with_filter() so panic_print=all_bt does not request more
>>> > output after the CPUs are stopped.
>>> > 
>>> > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
>>> > Cc: stable@vger.kernel.org
>>> > Signed-off-by: Bradley Morgan <include@grrlz.net>
>>> > ---
>>> >  kernel/panic.c | 2 +-
>>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>> > 
>>> > diff --git a/kernel/panic.c b/kernel/panic.c
>>> > index 213725b612aa..eb842823df61 100644
>>> > --- a/kernel/panic.c
>>> > +++ b/kernel/panic.c
>>> > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
>>> >  	 */
>>> >  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>>> >  
>>> > -	sys_info(panic_print);
>>> > +	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
>>> 
>>> Hmm, this prevents printing backtraces from all CPUs completely.
>>> But what if they were not printed?
>>> 
>>> They might be printed by:
>>> 
>>> static void panic_other_cpus_shutdown(bool crash_kexec)
>>> {
>>> 	if (panic_print & SYS_INFO_ALL_BT)
>>> 		panic_trigger_all_cpu_backtrace();
>>> 
>>> [...]
>>> }
>>> 
>>> But it checks only "panic_print" variable. It won't do anything
>>> when (panic_print == 0).
>>> 
>>> In this case, we might still want to print the backtraces when
>>> SYS_INFO_ALL_BT is set in kernel_si_mask.
>>> 
>>> >  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>>> 
>>> Of course, we might fix panic_other_cpus_shutdown() to also check
>>> kernel_si_mask.
>>> 
>>> But it all becomes very hairy. We have several levels:
>>> 
>>>    + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>>> 
>>>    + watchdog-specific sys_info preferences, e.g. hardlockup_si_mask
>>> 
>>>    + panic-specific sys_info: panic_print
>>> 
>>>    + universal fallback for any layer: kernel_si_mask
>>> 
>>> Now, we try to check all these variables back and forth to
>>> trigger all backtraces or to avoid triggering them.
>>> It clearly does not work well and the code is getting more and
>>> more hairy.
>>> 
>>> I am thinking about another approach. The word "waterfall" comes to mind.
>>> Instead of checking all the settings back and forth, let's process
>>> each setting one by one, remember what has been done, and
>>> skip it at the next level.
>>> 
>>> All the sys_info actions seem to dump global system state.
>>> So, it would make sense to remember the state in a global variable
>>> even when it might be modified by more CPUs in parallel.
>>> 
>>> I am going to think more about it.
>>
>>I have created a POC using Gemini. I haven't tested it.
>>But it looks acceptable. And the logic seems to be more
>>straightforward.
>>
>>One drawback is that it requires adding the _reset()
>>call for all sys_info() callers. It is fine in principle
>>but it might complicate back-porting because all changes
>>have to be done in one patch.
>>
>>But honestly, this is a nice to have fix. Most people could
>>live happily without it.
>>
>>From 3c66436d9978030845a96bfaedd6b914536e2ac4 Mon Sep 17 00:00:00 2001
>>From: Petr Mladek <pmladek@suse.com>
>>Date: Fri, 26 Jun 2026 13:55:41 +0200
>>Subject: [POC] sys_info: Introduce state-tracking APIs to prevent duplicate
>> backtraces
>>
>>In watchdog, panic, and hung task detection scenarios, sys_info() can
>>be called multiple times or alongside direct backtrace triggers like
>>trigger_allbutcpu_cpu_backtrace(). This results in identical backtraces
>>being dumped repeatedly from all CPUs, cluttering the kernel log and
>>delaying or obscuring critical debug details.
>>
>>Introduce a state tracking bitmask and associated helpers:
>>- sys_info_done(mask): Marks specific sys_info bits as already printed.
>>- sys_info_reset(): Resets the tracking state.
>>- sys_info_is_done(mask): Checks if all bits in the mask have been printed.
>>
>>Update sys_info() to automatically filter out already printed bits
>>using this state. Integrate these APIs with the generic hardlockup
>>and softlockup watchdogs, the PowerPC watchdog, the hung task detector,
>>and the panic core. This ensures that each piece of system information
>>and backtrace output is printed at most once per lockup/panic event,
>>and the state is reset cleanly when a lockup does not trigger a panic.
>>
>>Races between sys_info() callers are ignored. It should be acceptable
>>because the output from various watchdogs has never been synchronized.
>>And panic() never returns.
>>
>>Assisted-by: gemini-1.5-flash ?
>
>Why not use gemini 3.5 flash?
>
>I can try if you want. 
>
>Could I have the prompt you used? :)
>
>>Signed-off-by: Petr Mladek <pmladek@suse.com>
>>---
>> arch/powerpc/kernel/watchdog.c | 13 ++++++++++---
>> include/linux/sys_info.h       |  3 +++
>> kernel/hung_task.c             |  2 ++
>> kernel/panic.c                 |  4 +++-
>> kernel/watchdog.c              | 10 ++++++++--
>> lib/sys_info.c                 | 30 +++++++++++++++++++++++++++++-
>> 6 files changed, 55 insertions(+), 7 deletions(-)
>>
>>diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>>index c40c69368476..0eab7894b9dc 100644
>>--- a/arch/powerpc/kernel/watchdog.c
>>+++ b/arch/powerpc/kernel/watchdog.c
>>@@ -239,6 +239,7 @@ static void watchdog_smp_panic(int cpu)
>> 	if (sysctl_hardlockup_all_cpu_backtrace ||
>> 	    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>> 		trigger_allbutcpu_cpu_backtrace(cpu);
>>+		sys_info_done(SYS_INFO_ALL_BT);
>> 		cpumask_clear(&wd_smp_cpus_ipi);
>> 	} else {
>> 		/*
>>@@ -251,10 +252,12 @@ static void watchdog_smp_panic(int cpu)
>> 		}
>> 	}
>> 
>>-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+	sys_info(hardlockup_si_mask);
>> 	if (hardlockup_panic)
>> 		nmi_panic(NULL, "Hard LOCKUP");
>> 
>>+	sys_info_reset();
>>+
>> 	wd_end_reporting();
>> 
>> 	return;
>>@@ -419,13 +422,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>> 		xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
>> 
>> 		if (sysctl_hardlockup_all_cpu_backtrace ||
>>-		    (hardlockup_si_mask & SYS_INFO_ALL_BT))
>>+		    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>> 			trigger_allbutcpu_cpu_backtrace(cpu);
>>+			sys_info_done(SYS_INFO_ALL_BT);
>>+		}
>> 
>>-		sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+		sys_info(hardlockup_si_mask);
>> 		if (hardlockup_panic)
>> 			nmi_panic(regs, "Hard LOCKUP");
>> 
>>+		sys_info_reset();
>>+
>> 		wd_end_reporting();
>> 	}
>> 	/*
>>diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
>>index a5bc3ea3d44b..ad43548c75dd 100644
>>--- a/include/linux/sys_info.h
>>+++ b/include/linux/sys_info.h
>>@@ -18,6 +18,9 @@
>> #define SYS_INFO_BLOCKED_TASKS		0x00000080
>> 
>> void sys_info(unsigned long si_mask);
>>+void sys_info_done(unsigned long si_mask);
>>+void sys_info_reset(void);
>>+bool sys_info_is_done(unsigned long si_mask);
>> unsigned long sys_info_parse_param(char *str);
>> 
>> #ifdef CONFIG_SYSCTL
>>diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>index 6fcc94ce4ca9..dbb6a27770f5 100644
>>--- a/kernel/hung_task.c
>>+++ b/kernel/hung_task.c
>>@@ -354,6 +354,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>> 
>> 	if (hung_task_call_panic)
>> 		panic("hung_task: blocked tasks");
>>+
>>+	sys_info_reset();
>> }
>> 
>> static long hung_timeout_jiffies(unsigned long last_checked,
>>diff --git a/kernel/panic.c b/kernel/panic.c
>>index 213725b612aa..86ce17f03da2 100644
>>--- a/kernel/panic.c
>>+++ b/kernel/panic.c
>>@@ -550,8 +550,10 @@ static void panic_trigger_all_cpu_backtrace(void)
>>  */
>> static void panic_other_cpus_shutdown(bool crash_kexec)
>> {
>>-	if (panic_print & SYS_INFO_ALL_BT)
>>+	if ((panic_print & SYS_INFO_ALL_BT) && !sys_info_is_done(SYS_INFO_ALL_BT)) {
>> 		panic_trigger_all_cpu_backtrace();
>>+		sys_info_done(SYS_INFO_ALL_BT);
>>+	}
>> 
>> 	/*
>> 	 * Note that smp_send_stop() is the usual SMP shutdown function,
>>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>index 87dd5e0f6968..f431087c68a7 100644
>>--- a/kernel/watchdog.c
>>+++ b/kernel/watchdog.c
>>@@ -282,14 +282,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>> 
>> 	if (hardlockup_all_cpu_backtrace) {
>> 		trigger_allbutcpu_cpu_backtrace(cpu);
>>+		sys_info_done(SYS_INFO_ALL_BT);
>> 		if (!hardlockup_panic)
>> 			clear_bit_unlock(0, &hard_lockup_nmi_warn);
>> 	}
>> 
>>-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+	sys_info(hardlockup_si_mask);
>> 	if (hardlockup_panic)
>> 		nmi_panic(regs, "Hard LOCKUP");
>> 
>>+	sys_info_reset();
>>+
>> 	per_cpu(watchdog_hardlockup_warned, cpu) = true;
>> }
>> 
>>@@ -895,16 +898,19 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>> 
>> 		if (softlockup_all_cpu_backtrace) {
>> 			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
>>+			sys_info_done(SYS_INFO_ALL_BT);
>> 			if (!softlockup_panic)
>> 				clear_bit_unlock(0, &soft_lockup_nmi_warn);
>> 		}
>> 
>> 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>>-		sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+		sys_info(softlockup_si_mask);
>> 		thresh_count = duration / get_softlockup_thresh();
>> 
>> 		if (softlockup_panic && thresh_count >= softlockup_panic)
>> 			panic("softlockup: hung tasks");
>>+
>>+		sys_info_reset();
>> 	}
>> 
>> 	return HRTIMER_RESTART;
>>diff --git a/lib/sys_info.c b/lib/sys_info.c
>>index f32a06ec9ed4..f8e6176fae75 100644
>>--- a/lib/sys_info.c
>>+++ b/lib/sys_info.c
>>@@ -160,7 +160,35 @@ static void __sys_info(unsigned long si_mask)
>> 		show_state_filter(TASK_UNINTERRUPTIBLE);
>> }
>> 
>>+static unsigned long sys_info_done_mask;
>>+
>>+void sys_info_done(unsigned long si_mask)
>>+{
>>+	sys_info_done_mask |= si_mask;
>>+}
>>+
>>+void sys_info_reset(void)
>>+{
>>+	sys_info_done_mask = 0;
>>+}
>>+
>>+bool sys_info_is_done(unsigned long si_mask)
>>+{
>>+	return (sys_info_done_mask & si_mask) == si_mask;
>>+}
>>+
>> void sys_info(unsigned long si_mask)
>> {
>>-	__sys_info(si_mask ? : kernel_si_mask);
>>+	unsigned long mask;
>>+
>>+	if (si_mask)
>>+		mask = si_mask & ~sys_info_done_mask;
>>+	else
>>+		mask = kernel_si_mask & ~sys_info_done_mask;
>>+
>>+	if (!mask)
>>+		return;
>>+
>>+	__sys_info(mask);
>>+	sys_info_done(mask);
>> }
>>
>
>Thanks!


Hmm.. new idea 

kernel/dump_filter.c ?

What this file could do is to handle a generic lockup state machine
so any subsystem can log what it already dumped?


I know it may add bloat, but it's better than cramming fixes in.

What do you guys think? Maybe we could start an RFC for this?

Thanks!


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 12:32         ` Bradley Morgan
@ 2026-06-26 14:26           ` Petr Mladek
  2026-06-26 14:35             ` Bradley Morgan
  0 siblings, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2026-06-26 14:26 UTC (permalink / raw)
  To: Bradley Morgan
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On Fri 2026-06-26 13:32:38, Bradley Morgan wrote:
> On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <include@grrlz.net>
> wrote:
> >On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <pmladek@suse.com>
> >wrote:
> >>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
> >>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
> >>> But it all becomes very hairy. We have several levels:
> >>> 
> >>>    + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
> >>> 
> >>>    + watchdog-specific sys_info preferences, e.g. hardlockup_si_mask
> >>> 
> >>>    + panic-specific sys_info: panic_print
> >>> 
> >>>    + universal fallback for any layer: kernel_si_mask
> >>> 
> >>> Now, we try to check all these variables back and forth to
> >>> trigger all backtraces or to avoid triggering them.
> >>> It clearly does not work well and the code is getting more and
> >>> more hairy.
> >>> 
> >>> I am thinking about another approach. The word "waterfall" comes to mind.
> >>> Instead of checking all the settings back and forth, let's process
> >>> each setting one by one, remember what has been done, and
> >>> skip it at the next level.
> >>> 
> >>> All the sys_info actions seem to dump global system state.
> >>> So, it would make sense to remember the state in a global variable
> >>> even when it might be modified by more CPUs in parallel.
> >>> 
> Hmm.. new idea 
> 
> kernel/dump_filter.c ?
> 
> What this file could do is to handle a generic lockup state machine
> so any subsystem can log what it already dumped?
> 
> I know it may add bloat, but it's better than cramming fixes in.

I am not sure what exactly you would like to achieve but it sounds
a bit scary ;-)

Anyway, we definitely should not synchronize the watchdog reports
against each other. They run in incompatible contexts
(task vs interrupt vs NMI). Also, we should not add any locking
because they usually print something when the system is already
in trouble.

Also I think that it is not worth preventing duplicated backtraces
or reports from a single CPU. IMHO, it is not a big problem
in practice.

So, we are down to large reports, like backtraces from all CPUs,
timers, locks, ... which are handled by sys_info(). So, I think
that it should be enough to handle this inside the sys_info() API.

I do not want to say that my proposal was the best solution.
I am sure that there are better ones. But we need to consider
the gain vs. complexity.

Honestly, I am already a bit scared by the complexity which
the sys_info() API has added. And it is hard to imagine that
adding another API would make it easier. But I might be wrong.

Instead, it might make sense to integrate the conflicting
subsystem-specific calls under the sys_info() API.
I mean that, for example watchdog_hardlockup_check() won't
call trigger_allbutcpu_cpu_backtrace() directly but
it would call it via sys_info() API so that sys_info()
could keep track of it. Something like:

void sys_info_allbutcpu_bt(int cpu)
{
	trigger_allbutcpu_cpu_backtrace(cpu);
	/*
	 * The caller likely printed backtrace of the given @cpu
	 * on its own. Prevent duplicate backtraces from all
	 * CPUs with potential next sys_info() call.
	 */
	sys_info_done(SYS_INFO_ALL_BT);
}

But I am not sure if it is really easier to follow
than calling sys_info_done() from the watchdog code.
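
To make the intended behavior concrete, here is a minimal userspace
model of the "waterfall" bookkeeping. It is only a sketch, not kernel
code: every model_* name is hypothetical, the MODEL_ALL_BT bit value is
assumed for illustration, and the counters stand in for the real dumps.
It shows a wrapper marking the all-CPU backtraces as done, a later
sys_info()-style call skipping them, and a reset allowing them again
for the next event.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bits: only 0x80 for blocked tasks appears in the posted diff. */
#define MODEL_ALL_BT		0x01UL	/* assumed value, for illustration only */
#define MODEL_BLOCKED_TASKS	0x80UL

unsigned long model_kernel_si_mask;	/* stands in for kernel_si_mask */
unsigned long model_done_mask;		/* stands in for sys_info_done_mask */
unsigned int model_all_bt_dumps;	/* how often all-CPU backtraces were dumped */
unsigned int model_blocked_dumps;	/* how often blocked tasks were dumped */

void model_done(unsigned long mask) { model_done_mask |= mask; }
void model_reset(void) { model_done_mask = 0; }
bool model_is_done(unsigned long mask) { return (model_done_mask & mask) == mask; }

/* Same flow as the POC sys_info(): choose a mask, drop done bits, dump, record. */
void model_sys_info(unsigned long si_mask)
{
	unsigned long mask = si_mask ? si_mask : model_kernel_si_mask;

	mask &= ~model_done_mask;
	if (!mask)
		return;

	if (mask & MODEL_ALL_BT)
		model_all_bt_dumps++;
	if (mask & MODEL_BLOCKED_TASKS)
		model_blocked_dumps++;
	model_done(mask);
}

/* Stand-in for the sys_info_allbutcpu_bt() wrapper sketched above. */
void model_allbutcpu_bt(int cpu)
{
	(void)cpu;			/* the real call would skip this CPU */
	model_all_bt_dumps++;
	model_done(MODEL_ALL_BT);
}

/* One lockup event that does not panic, followed by a fresh event. */
void model_demo(void)
{
	model_all_bt_dumps = 0;
	model_blocked_dumps = 0;
	model_reset();
	model_kernel_si_mask = MODEL_ALL_BT;

	model_allbutcpu_bt(0);
	model_sys_info(MODEL_ALL_BT | MODEL_BLOCKED_TASKS);
	model_sys_info(0);		/* fallback mask is already done */
	assert(model_all_bt_dumps == 1 && model_blocked_dumps == 1);

	model_reset();
	model_sys_info(0);		/* new event: backtraces are dumped again */
	assert(model_all_bt_dumps == 2);
}
```

Note that in this model a non-zero mask whose bits are all done simply
returns; it never falls back to the global mask. The model deliberately
uses plain, unsynchronized variables, matching the POC's decision to
ignore races between callers.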

Some watchdogs try to optimize the output and print backtraces
only from CPUs which are relevant for the given lockup.
We should keep the logic for selecting the set of CPUs
in the watchdog code. We just need to find an elegant way to
make sys_info() aware of it, or at least of the more massive
reports.

Anyway, I would prefer to keep it simple until we see some problems
in practice.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 14:26           ` Petr Mladek
@ 2026-06-26 14:35             ` Bradley Morgan
  2026-06-26 14:47               ` Petr Mladek
  0 siblings, 1 reply; 16+ messages in thread
From: Bradley Morgan @ 2026-06-26 14:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On June 26, 2026 3:26:11 PM GMT+01:00, Petr Mladek <pmladek@suse.com>
wrote:
>On Fri 2026-06-26 13:32:38, Bradley Morgan wrote:
>> On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <include@grrlz.net>
>> wrote:
>> >On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <pmladek@suse.com>
>> >wrote:
>> >>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>> >>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>> >>> But it all becomes very hairy. We have several levels:
>> >>> 
>> >>>    + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>> >>> 
>> >>>    + watchdog-specific sys_info preferences, e.g. hardlockup_si_mask
>> >>> 
>> >>>    + panic-specific sys_info: panic_print
>> >>> 
>> >>>    + universal fallback for any layer: kernel_si_mask
>> >>> 
>> >>> Now, we try to check all these variables back and forth to
>> >>> trigger all backtraces or to avoid triggering them.
>> >>> It clearly does not work well and the code is getting more and
>> >>> more hairy.
>> >>> 
>> >>> I am thinking about another approach. The word "waterfall" comes to mind.
>> >>> Instead of checking all the settings back and forth, let's process
>> >>> each setting one by one, remember what has been done, and
>> >>> skip it at the next level.
>> >>> 
>> >>> All the sys_info actions seem to dump global system state.
>> >>> So, it would make sense to remember the state in a global variable
>> >>> even when it might be modified by more CPUs in parallel.
>> >>> 
>> Hmm.. new idea 
>> 
>> kernel/dump_filter.c ?
>> 
>> What this file could do is to handle a generic lockup state machine
>> so any subsystem can log what it already dumped?
>> 
>> I know it may bloat, but it's better than cramming fixes in.
>
>I am not sure what exactly you would like to achieve but it sounds
>a bit scary ;-)
>
>Anyway, we should not synchronize the watchdog reports against
>each other, definitely. They are running in non-compatible contexts
>(task vs interrupt vs NMI). Also we should not add any locking
>because they usually print something when the system has enough
>troubles.
>
>Also I think that it is not worth preventing duplicated backtraces
>or reports from a single CPU. IMHO, it is not a big problem
>in practice.
>
>So, we are down to large reports, like backtraces from all CPUs,
>timers, locks, ... which are handled by sys_info(). So, I think
>that it should be enough to handle this inside the sys_info() API.
>
>I do not want to say that my proposal was the best solution.
>I am sure that there are better ones. But we need to consider
>the gain vs. complexity.
>
>Honestly, I am already a bit scared by the complexity which
>we added with the sys_info() API. And it is hard to imagine that
>adding another API would make it easier. But I might be wrong.
>
>Instead, it might make sense to integrate the conflicting
>subsystem-specific calls under the sys_info() API.
>I mean that, for example watchdog_hardlockup_check() won't
>call trigger_allbutcpu_cpu_backtrace() directly but
>it would call it via sys_info() API so that sys_info()
>could keep track of it. Something like:
>
>void sys_info_allbutcpu_bt(int cpu)
>{
>	trigger_allbutcpu_cpu_backtrace(cpu);
>	/*
>	 * The caller likely printed backtrace of the given @cpu
>	 * on its own. Prevent duplicate backtraces from all
>	 * CPUs with potential next sys_info() call.
>	 */
>	sys_info_done(SYS_INFO_ALL_BT);
>}
>
>But I am not sure if it is really easier to follow
>than calling sys_info_done() from the watchdog code.
>
>Some watchdogs try to optimize the output and print backtraces
>only from CPUs which are relevant for the given lockup.
>We should keep the logic for selecting the set of CPUs
>in the watchdog code. We just need to solve how to elegantly
>make sys_info() aware of it or at least about the more massive
>reports.
>
>Anyway, I would prefer to keep it simple until we see some problems
>in practice.
>
>Best Regards,
>Petr
>


I understand that creating a new file sounds scary.

I was a bit vague about what I wanted, and I'm sorry.

The reason I'd suggest a new file is that if any subsystem
bypasses sys_info() to log a lockup, it completely misses
the filter and the dump is duplicated.

My file would act as a generic lockless state machine that any
subsystem can update regardless of how they dump logs.
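
A minimal userspace sketch of what such a lockless "already dumped"
tracker might look like. Everything here is an assumption made for
illustration: the dump_filter_* function names and the DUMP_* bit values
are hypothetical and do not exist in the kernel, and C11 atomics stand in
for the kernel's own atomic primitives.

```c
#include <stdatomic.h>

/* Hypothetical report bits; the real SYS_INFO_* flags are defined elsewhere. */
#define DUMP_ALL_BT (1UL << 0)
#define DUMP_TIMERS (1UL << 1)
#define DUMP_LOCKS  (1UL << 2)

/* Global record of reports that have already been printed. */
static atomic_ulong dump_done_mask;

/* A subsystem that printed some reports on its own records them here. */
void dump_filter_mark_done(unsigned long bits)
{
	atomic_fetch_or(&dump_done_mask, bits);
}

/*
 * Atomically claim the requested reports and return only the ones
 * that nobody has printed yet. A single atomic OR needs no lock.
 */
unsigned long dump_filter_claim(unsigned long requested)
{
	unsigned long old = atomic_fetch_or(&dump_done_mask, requested);

	return requested & ~old;
}

/* Forget everything, e.g. before handling the next, unrelated event. */
void dump_filter_reset(void)
{
	atomic_store(&dump_done_mask, 0);
}
```

In this sketch a watchdog that already triggered backtraces itself would
call dump_filter_mark_done(DUMP_ALL_BT), and a later generic dump path
would print only the bits returned by dump_filter_claim(). When the done
mask should be cleared is left open here.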

If you have any questions, feel absolutely free to ask! :)

Discussion is a way to make everyone happy!

Thanks!



* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 14:35             ` Bradley Morgan
@ 2026-06-26 14:47               ` Petr Mladek
  2026-06-26 14:58                 ` Bradley Morgan
  0 siblings, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2026-06-26 14:47 UTC (permalink / raw)
  To: Bradley Morgan
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On Fri 2026-06-26 15:35:19, Bradley Morgan wrote:
> I understand that creating a new file sounds scary.
> 
> I was a bit vague about what I wanted, and I'm sorry.
> 
> The reason I'd suggest a new file is that if any subsystem
> bypasses sys_info() to log a lockup, it completely misses
> the filter and the dump is duplicated.
> 
> My file would act as a generic lockless state machine that any
> subsystem can update regardless of how they dump logs.
> 
> If you have any questions, feel absolutely free to ask! :)
> 
> Discussion is a way to make everyone happy!

Honestly, I am more and more wondering whether you are a real person
or an AI bot.

Best Regards,
Petr



* Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
  2026-06-26 14:47               ` Petr Mladek
@ 2026-06-26 14:58                 ` Bradley Morgan
  0 siblings, 0 replies; 16+ messages in thread
From: Bradley Morgan @ 2026-06-26 14:58 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Feng Tang, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Madhavan Srinivasan, Douglas Anderson,
	linux-kernel, linuxppc-dev, stable

On June 26, 2026 3:47:12 PM GMT+01:00, Petr Mladek <pmladek@suse.com>
wrote:
>On Fri 2026-06-26 15:35:19, Bradley Morgan wrote:
>> I understand that creating a new file sounds scary.
>> 
>> I was a bit vague about what I wanted, and I'm sorry.
>> 
>> The reason I'd suggest a new file is that if any subsystem
>> bypasses sys_info() to log a lockup, it completely misses
>> the filter and the dump is duplicated.
>> 
>> My file would act as a generic lockless state machine that any
>> subsystem can update regardless of how they dump logs.
>> 
>> If you have any questions, feel absolutely free to ask! :)
>> 
>> Discussion is a way to make everyone happy!
>
>Honestly, I am more and more wondering whether you are a real person
>or an AI bot.

Sigh..

I can verify myself through a video call if you don't believe I am human :)

I suggested a new file because an AI said it would be a good idea.

I asked it what I should do, and it told me to create a new file.

I knew it was slightly over-engineered, but I was a bit stressed,
and I wanted some sort of new API which would be less buggy, IMHO.

I should've told you that I used AI to come up with the whole new-file idea.

Really sorry, Petr..

>Best Regards,
>Petr
>

Thanks!



end of thread, other threads:[~2026-06-26 14:59 UTC | newest]

Thread overview: 16+ messages
2026-06-25 15:25 [PATCH v3 0/4] sys_info: prevent duplicate backtraces Bradley Morgan
2026-06-25 15:25 ` [PATCH v3 1/4] sys_info: add helper for callers that print some sys_info on their own Bradley Morgan
2026-06-25 15:25 ` [PATCH v3 2/4] watchdog: use sys_info_with_filter() to avoid duplicate backtraces Bradley Morgan
2026-06-25 15:25 ` [PATCH v3 3/4] powerpc/watchdog: " Bradley Morgan
2026-06-26  9:42   ` Petr Mladek
2026-06-25 15:25 ` [PATCH v3 4/4] panic: " Bradley Morgan
2026-06-26 10:23   ` Petr Mladek
2026-06-26 10:27     ` Bradley Morgan
2026-06-26 12:06     ` Feng Tang
2026-06-26 12:14     ` Petr Mladek
2026-06-26 12:17       ` Bradley Morgan
2026-06-26 12:32         ` Bradley Morgan
2026-06-26 14:26           ` Petr Mladek
2026-06-26 14:35             ` Bradley Morgan
2026-06-26 14:47               ` Petr Mladek
2026-06-26 14:58                 ` Bradley Morgan
