From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
Andrew Morton <akpm@linux-foundation.org>,
Feng Tang <feng.tang@linux.alibaba.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] panic: remove redundant panic-cpu backtrace
Date: Tue, 12 Aug 2025 15:03:35 +0200 [thread overview]
Message-ID: <aJs7p_UjPIfb_XYd@pathway> (raw)
In-Reply-To: <84pldghkho.fsf@jogness.linutronix.de>
On Thu 2025-07-31 09:51:07, John Ogness wrote:
> On 2025-07-31, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> > On (25/07/31 09:15), John Ogness wrote:
> >> On 2025-07-31, Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> >> > SYS_INFO_ALL_CPU_BT sends NMI backtrace request to
> >> > all CPUs, which dumps an extra backtrace on panic CPU.
> >>
> >> Isn't this only true if CONFIG_DEBUG_BUGVERBOSE=y?
> >
> > Are you referring to vpanic()->dump_stack()?
>
> Yes.
>
> > Another way to get backtrace on panic CPU is via BUG(), which routes
> > through die()->__die_body(), which prints registers, stack trace,
> > and so on, before it calls into panic(). This might be x86 specific,
> > though.
>
> So in that case you see 2 stack traces if CONFIG_DEBUG_BUGVERBOSE=y?
>
> >> Also, the information is not the same. trigger_all_cpu_backtrace() will
> >> also dump the registers. For CONFIG_DEBUG_BUGVERBOSE=y on the panic CPU,
> >> only the stack is dumped.
IMHO, this is actually not true, see the following code:
void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
int exclude_cpu,
void (*raise)(cpumask_t *mask))
{
[...]
/*
* Don't try to send an NMI to this cpu; it may work on some
* architectures, but on others it may not, and we'll get
* information at least as useful just by doing a dump_stack() here.
* Note that nmi_cpu_backtrace(NULL) will clear the cpu bit.
*/
if (cpumask_test_cpu(this_cpu, to_cpumask(backtrace_mask)))
nmi_cpu_backtrace(NULL);
[...]
}
, where
bool nmi_cpu_backtrace(struct pt_regs *regs)
{
[...]
if (regs)
show_regs(regs);
else
dump_stack();
[...]
}
So, I think that the following patch should not make it worse:
diff --git a/kernel/panic.c b/kernel/panic.c
index 72fcbb5a071b..dfbfe1ce7bfc 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -67,6 +67,7 @@ static unsigned int warn_limit __read_mostly;
static bool panic_console_replay;
bool panic_triggering_all_cpu_backtrace;
+bool panic_this_cpu_backtrace_printed;
int panic_timeout = CONFIG_PANIC_TIMEOUT;
EXPORT_SYMBOL_GPL(panic_timeout);
@@ -328,6 +329,22 @@ void check_panic_on_warn(const char *origin)
origin, limit);
}
+static void panic_trigger_all_cpu_backtraces(void)
+{
+ /* Temporary allow non-panic CPUs to write their backtraces. */
+ panic_triggering_all_cpu_backtrace = true;
+
+ if (panic_this_cpu_backtrace_printed) {
+ int this_cpu = raw_smp_processor_id();
+
+ trigger_allbutcpu_cpu_backtrace(this_cpu);
+ } else {
+ trigger_all_cpu_backtrace();
+ }
+
+ panic_triggering_all_cpu_backtrace = false;
+}
+
/*
* Helper that triggers the NMI backtrace (if set in panic_print)
* and then performs the secondary CPUs shutdown - we cannot have
@@ -335,12 +352,8 @@ void check_panic_on_warn(const char *origin)
*/
static void panic_other_cpus_shutdown(bool crash_kexec)
{
- if (panic_print & SYS_INFO_ALL_CPU_BT) {
- /* Temporary allow non-panic CPUs to write their backtraces. */
- panic_triggering_all_cpu_backtrace = true;
- trigger_all_cpu_backtrace();
- panic_triggering_all_cpu_backtrace = false;
- }
+ if (panic_print & SYS_INFO_ALL_CPU_BT)
+ panic_trigger_all_cpu_backtraces();
/*
* Note that smp_send_stop() is the usual SMP shutdown function,
@@ -422,13 +435,15 @@ void vpanic(const char *fmt, va_list args)
buf[len - 1] = '\0';
pr_emerg("Kernel panic - not syncing: %s\n", buf);
-#ifdef CONFIG_DEBUG_BUGVERBOSE
/*
* Avoid nested stack-dumping if a panic occurs during oops processing
*/
- if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
+ if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
+ panic_this_cpu_backtrace_printed = true;
+ } else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
dump_stack();
-#endif
+ panic_this_cpu_backtrace_printed = true;
+ }
/*
* If kgdb is enabled, give it a chance to run before we stop all
> > Hmm, it's getting complicated, probably isn't worth it then.
>
> I think it is worth cleaning up, but it probably won't be such a simple
> fix. All call paths of redundant stack trace printing should be
> identified and then we can decide on a clean solution.
I feel that the check
+ if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
is kind of a hack. It would be nice to make it cleaner. But I am not
sure how complicated it would be.
Anyway, I think that storing the information, whether the backtrace
was printed or not, into a global variable, is a step in the right
direction.
Best Regards,
Petr
next prev parent reply other threads:[~2025-08-12 13:03 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-31 3:02 [PATCH] panic: remove redundant panic-cpu backtrace Sergey Senozhatsky
2025-07-31 7:09 ` John Ogness
2025-07-31 7:32 ` Sergey Senozhatsky
2025-07-31 7:45 ` John Ogness
2025-07-31 9:51 ` Sergey Senozhatsky
2025-08-12 13:03 ` Petr Mladek [this message]
2025-08-13 6:36 ` Sergey Senozhatsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aJs7p_UjPIfb_XYd@pathway \
--to=pmladek@suse.com \
--cc=akpm@linux-foundation.org \
--cc=feng.tang@linux.alibaba.com \
--cc=john.ogness@linutronix.de \
--cc=linux-kernel@vger.kernel.org \
--cc=senozhatsky@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.