All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Bradley Morgan <include@grrlz.net>
Cc: Feng Tang <feng.tang@linux.alibaba.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <chleroy@kernel.org>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Douglas Anderson <dianders@chromium.org>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
Date: Thu, 2 Jul 2026 11:09:41 +0200	[thread overview]
Message-ID: <akYq1YaCpZ0b4SBS@pathway.suse.cz> (raw)
In-Reply-To: <E482A23D-4E1C-42C0-9D07-83C6CDFD1546@grrlz.net>

On Mon 2026-06-29 13:54:18, Bradley Morgan wrote:
> On 29 June 2026 12:40:52 BST, Feng Tang <feng.tang@linux.alibaba.com>
> wrote:
> >On Fri, Jun 26, 2026 at 02:14:14PM +0200, Petr Mladek wrote:
> >> On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
> >> > On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
> >> > > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping
> >the
> >> > > other CPUs. Do not ask sys_info() to handle that bit again later in
> >the
> >> > > panic path.
> >> > > 
> >> > > Use sys_info_with_filter() so panic_print=all_bt does not request
> >more
> >> > > output after the CPUs are stopped.
> >> > > 
> >> > > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys
> >info on system lockup")
> >> > > Cc: stable@vger.kernel.org
> >> > > Signed-off-by: Bradley Morgan <include@grrlz.net>
> >> > > ---
> >> > >  kernel/panic.c | 2 +-
> >> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> > > 
> >> > > diff --git a/kernel/panic.c b/kernel/panic.c
> >> > > index 213725b612aa..eb842823df61 100644
> >> > > --- a/kernel/panic.c
> >> > > +++ b/kernel/panic.c
> >> > > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
> >> > >  	 */
> >> > >  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> >> > >  
> >> > > -	sys_info(panic_print);
> >> > > +	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
> >> > 
> >> > Hmm, this prevents printing backtraces from all CPUs completely.
> >> > But what if they were not printed?
> >> > 
> >> > They might be printed by:
> >> > 
> >> > static void panic_other_cpus_shutdown(bool crash_kexec)
> >> > {
> >> > 	if (panic_print & SYS_INFO_ALL_BT)
> >> > 		panic_trigger_all_cpu_backtrace();
> >> > 
> >> > [...]
> >> > }
> >> > 
> >> > But it checks only "panic_print" variable. It won't do anything
> >> > when (panic_print == 0).
> >> > 
> >> > In this case, we might still want to print the backraces when
> >> > SYS_INFO_ALL_BT is set in kernel_si_info.
> >> > 
> >> > >  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
> >> > 
> >> > Of course, we might fix panic_other_cpus_shutdown() to check also
> >> > kernel_si_info.
> >> > 
> >> > But it all becomes very hairy. We have several levels:
> >> > 
> >> >    + watchdog-all_bt-specific option, e.g.
> >sysctl_hardlockup_all_cpu_backtrace
> >> > 
> >> >    + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
> >> > 
> >> >    + panic-specific si_info: panic_print
> >> > 
> >> >    + universal fallback for any layer: kernel_si_info
> >> > 
> >> > Now, we try to check all these variables back and forth to
> >> > trigger all backtraces or to avoid triggering them.
> >> > And it clearly does not work well and the code is more and more
> >> > hairy.
> >> > 
> >> > I think about another approach. The word "waterfall" comes to my mind.
> >> > Instead of checking all the settings back and forth, let's process
> >> > each setting one by one and just remember what has been done and
> >> > skip this in the next level.
> >> > 
> >> > All the si_info actions seems to dump a global system state.
> >> > So, it would make sense to remember the state in a global variable
> >> > even when it might be modified by more CPUs in parallel.
> >> > 
> >> > I am going to think more about it.
> >> 
> >> I have created a POC using Gemini. I haven't tested it.
> >> But it looks acceptable. And the logic seems to be more
> >> straightforward.
> >> 
> >> One drawback is that it requires adding the _reset()
> >> call for all sys_info() callers. It is fine in principle
> >> but it might complicate back-porting because all changes
> >> have to be done in one patch.
> >> 
> >> But honestly, this is a nice to have fix. Most people could
> >> live happily without it.
> >> 
> >> From 3c66436d9978030845a96bfaedd6b914536e2ac4 Mon Sep 17 00:00:00 2001
> >> From: Petr Mladek <pmladek@suse.com>
> >> Date: Fri, 26 Jun 2026 13:55:41 +0200
> >> Subject: [POC] sys_info: Introduce state-tracking APIs to prevent
> >duplicate
> >>  backtraces
> >> 
> >> In watchdog, panic, and hung task detection scenarios, sys_info() can
> >> be called multiple times or alongside direct backtrace triggers like
> >> trigger_allbutcpu_cpu_backtrace(). This results in identical backtraces
> >> being dumped repeatedly from all CPUs, cluttering the kernel log and
> >> delaying or obscuring critical debug details.
> >> 
> >> Introduce a state tracking bitmask and associated helpers:
> >> - sys_info_done(mask): Marks specific sys_info bits as already printed.
> >> - sys_info_reset(): Resets the tracking state.
> >> - sys_info_is_done(mask): Checks if all bits in the mask have been
> >printed.
> >> 
> >> Update sys_info() to automatically filter out already printed bits
> >> using this state. Integrate these APIs with the generic hardlockup
> >> and softlockup watchdogs, the PowerPC watchdog, the hung task detector,
> >> and the panic core. This ensures that each piece of system information
> >> and backtrace output is printed at most once per lockup/panic event,
> >> and the state is reset cleanly when a lockup does not trigger a panic.
> >> 
> >> Races between sys_info() callers are ignored. It should be acceptable
> >> because the output from various watchdogs has never been synchronized.
> >> And panic() never returns.
> >> 
> >> Assisted-by: gemini-1.5-flash
> >> Signed-off-by: Petr Mladek <pmladek@suse.com>
> >
> >Yep. There are cases that people want panic on task-hung or sw/hw lockup,
> >and this could remove much duplication of sys info dump, thanks!
> >
> >Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
> 
> Thanks,
> 
> im feeling a new file to do all the force panic jazz, but putting tape
> on sys_info.c isn't bd either.

I wonder how to move forward with this.

Honestly, I am not sure what exactly you mean by creating another
API for tracking the reports so I could not judge it. Feel free
to sent some POC.

Otherwise, I would go with my proposal to remember the printed states
by the sys_info API. I am not sure whether I should send a proper
patch or you would like to somehow improve it.

Best Regards,
Petr

  reply	other threads:[~2026-07-02  9:09 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 15:25 [PATCH v3 0/4] sys_info: prevent duplicate backtraces Bradley Morgan
2026-06-25 15:25 ` [PATCH v3 1/4] sys_info: add helper for callers that print some sys_info on their own Bradley Morgan
2026-06-25 15:25 ` [PATCH v3 2/4] watchdog: use sys_info_with_filter() to avoid duplicate backtraces Bradley Morgan
2026-06-25 15:25 ` [PATCH v3 3/4] powerpc/watchdog: " Bradley Morgan
2026-06-26  9:42   ` Petr Mladek
2026-06-25 15:25 ` [PATCH v3 4/4] panic: " Bradley Morgan
2026-06-26 10:23   ` Petr Mladek
2026-06-26 10:27     ` Bradley Morgan
2026-06-26 12:06     ` Feng Tang
2026-06-26 12:14     ` Petr Mladek
2026-06-26 12:17       ` Bradley Morgan
2026-06-26 12:32         ` Bradley Morgan
2026-06-26 14:26           ` Petr Mladek
2026-06-26 14:35             ` Bradley Morgan
2026-06-26 14:47               ` Petr Mladek
2026-06-26 14:58                 ` Bradley Morgan
2026-06-29 11:40       ` Feng Tang
2026-06-29 12:54         ` Bradley Morgan
2026-07-02  9:09           ` Petr Mladek [this message]
2026-07-02 18:13             ` Bradley Morgan
2026-07-02 18:22               ` Bradley Morgan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=akYq1YaCpZ0b4SBS@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=chleroy@kernel.org \
    --cc=dianders@chromium.org \
    --cc=feng.tang@linux.alibaba.com \
    --cc=include@grrlz.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maddy@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.