linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic
@ 2021-11-10 19:01 Hari Bathini
  2021-11-11  6:14 ` Michael Ellerman
  0 siblings, 1 reply; 5+ messages in thread
From: Hari Bathini @ 2021-11-10 19:01 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: Hari Bathini, sourabhjain, mahesh, npiggin

In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would show only the panic'ing CPU as online. Sample output of
crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f222d7 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d38417e ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To avoid vmcore from showing only one CPU as online in panic path,
skip marking non panic'ing CPUs as offline while stopping them
during fadump crash.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/smp.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c23ee842c4c3..20555d5d5966 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -61,6 +61,7 @@
 #include <asm/cpu_has_feature.h>
 #include <asm/ftrace.h>
 #include <asm/kup.h>
+#include <asm/fadump.h>
 
 #ifdef DEBUG
 #include <asm/udbg.h>
@@ -626,7 +627,8 @@ static void nmi_stop_this_cpu(struct pt_regs *regs)
 	/*
 	 * IRQs are already hard disabled by the smp_handle_nmi_ipi.
 	 */
-	set_cpu_online(smp_processor_id(), false);
+	if (!(oops_in_progress && should_fadump_crash()))
+		set_cpu_online(smp_processor_id(), false);
 
 	spin_begin();
 	while (1)
@@ -650,7 +652,8 @@ static void stop_this_cpu(void *dummy)
 	 * to know other CPUs are offline before it breaks locks to flush
 	 * printk buffers, in case we panic()ed while holding the lock.
 	 */
-	set_cpu_online(smp_processor_id(), false);
+	if (!(oops_in_progress && should_fadump_crash()))
+		set_cpu_online(smp_processor_id(), false);
 
 	spin_begin();
 	while (1)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-11-17  6:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-10 19:01 [PATCH] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic Hari Bathini
2021-11-11  6:14 ` Michael Ellerman
2021-11-11 12:06   ` Hari Bathini
2021-11-11 13:20     ` Nicholas Piggin
2021-11-17  6:09       ` Hari Bathini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).