linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v3 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
@ 2021-12-07 10:37 Hari Bathini
  2021-12-07 10:37 ` [PATCH v3 2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic Hari Bathini
  2021-12-15  0:24 ` [PATCH v3 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option Michael Ellerman
  0 siblings, 2 replies; 3+ messages in thread
From: Hari Bathini @ 2021-12-07 10:37 UTC (permalink / raw)
  To: mpe, linuxppc-dev, npiggin
  Cc: Hari Bathini, mahesh, sourabhjain, kernel test robot

Kdump can be triggered after the panic notifiers since commit f06e5153f4ae2
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
after panic_notifers") introduced the crash_kexec_post_notifiers option.
But using this option means that smp_send_stop(), which marks all other
CPUs as offline, gets called before kdump is triggered. As a result,
kdump routines fail to save the other CPUs' registers. To fix this, a
kdump-friendly crash_smp_send_stop() function was introduced with kernel
commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path"). Override this kdump-friendly weak
function to handle the crash_kexec_post_notifiers option appropriately
on powerpc.
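
For readers less familiar with the panic path, the ordering described
above can be modelled roughly as follows. This is a userspace sketch,
not kernel code: the enum, panic_path() and the step names are
hypothetical, and only the relative order of the steps is meant to
reflect how panic() behaves with and without the
crash_kexec_post_notifiers option, as far as I understand it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names; they only label the real kernel calls. */
enum step {
	STEP_KDUMP,               /* __crash_kexec() */
	STEP_SMP_SEND_STOP,       /* smp_send_stop(): offlines other CPUs */
	STEP_CRASH_SMP_SEND_STOP, /* crash_smp_send_stop(): kdump friendly */
	STEP_NOTIFIERS,           /* panic notifier chain */
};

/*
 * Record the order of steps taken for a given setting of the option.
 * 'out' must have room for at least 3 entries. Returns the step count.
 */
static size_t panic_path(bool post_notifiers, enum step *out)
{
	size_t n = 0;

	assert(out != NULL);

	if (!post_notifiers) {
		/* With a crash kernel loaded, this step does not return. */
		out[n++] = STEP_KDUMP;
		out[n++] = STEP_SMP_SEND_STOP;
	} else {
		/* Other CPUs are stopped without being marked offline. */
		out[n++] = STEP_CRASH_SMP_SEND_STOP;
	}

	out[n++] = STEP_NOTIFIERS;

	if (post_notifiers)
		out[n++] = STEP_KDUMP;

	return n;
}
```

In this model, setting the option moves the kdump step after the
notifiers, which is why the stop that precedes the notifiers must not
mark the other CPUs offline.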

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com]
Reported-by: kernel test robot <lkp@intel.com>
---
 arch/powerpc/kernel/smp.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c23ee842c4c3..2d33c167b438 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -620,6 +620,36 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 }
 #endif
 
+#ifdef CONFIG_NMI_IPI
+static void crash_stop_this_cpu(struct pt_regs *regs)
+#else
+static void crash_stop_this_cpu(void *dummy)
+#endif
+{
+	/*
+	 * Just busy wait here and avoid marking CPU as offline to ensure
+	 * register data is captured appropriately.
+	 */
+	while (1)
+		cpu_relax();
+}
+
+void crash_smp_send_stop(void)
+{
+	static bool stopped = false;
+
+	if (stopped)
+		return;
+
+	stopped = true;
+
+#ifdef CONFIG_NMI_IPI
+	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
+#else
+	smp_call_function(crash_stop_this_cpu, NULL, 0);
+#endif /* CONFIG_NMI_IPI */
+}
+
 #ifdef CONFIG_NMI_IPI
 static void nmi_stop_this_cpu(struct pt_regs *regs)
 {
-- 
2.33.1



* [PATCH v3 2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic
  2021-12-07 10:37 [PATCH v3 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option Hari Bathini
@ 2021-12-07 10:37 ` Hari Bathini
  2021-12-15  0:24 ` [PATCH v3 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option Michael Ellerman
  1 sibling, 0 replies; 3+ messages in thread
From: Hari Bathini @ 2021-12-07 10:37 UTC (permalink / raw)
  To: mpe, linuxppc-dev, npiggin; +Cc: Hari Bathini, mahesh, sourabhjain

In the panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panicking CPU. Commit 8389b37dffdc
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, register data in a vmcore captured
with fadump via the panic path is not processed for any CPU other than
the panicking CPU. Sample output of crash-utility with such a vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
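
The two masks in the output above can be read bit by bit: bit N set
means CPU N is in the mask. A minimal sketch of that arithmetic follows;
the helper names are hypothetical and only the first word of each mask
is considered.

```c
#include <assert.h>
#include <stdint.h>

/* Count the CPUs whose bit is set in one 64-bit word of a cpumask. */
static unsigned int cpus_in_word(uint64_t word)
{
	unsigned int n = 0;

	while (word) {
		word &= word - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Return 1 if bit 'cpu' is set in the word, 0 otherwise. */
static int cpu_in_word(uint64_t word, unsigned int cpu)
{
	assert(cpu < 64);
	return (int)((word >> cpu) & 1);
}
```

Applied to the values shown, 0x2 has a single bit set (CPU 1, the
panicking CPU), while 0xff has eight bits set (CPUs 0 to 7). That is,
eight CPUs were active but only one is still marked online in the
vmcore.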

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crashes
    due to an exception.

  - The above did change when commit 8341f2f222d7 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    dereferencing a NULL pointer to force a kernel crash. But then
    commit de6e5d38417e ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline, until kernel
    commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure that post-processing of register data for all other CPUs
happens as intended, let the panic() function take the crash-friendly
path (that is, crash_smp_send_stop()) with the help of the
crash_kexec_post_notifiers option. Also, as register data for all CPUs
is captured by firmware, skip IPI callbacks here for fadump, to avoid
any complications in finding the right backtraces.
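
The resulting control flow of crash_smp_send_stop() can be sketched in
userspace as below. This is an illustration only: fadump_active stands
in for should_fadump_crash(), ipi_rounds stands in for actually sending
the stop IPI, and both names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool fadump_active;	/* stand-in for should_fadump_crash() */
static int ipi_rounds;		/* times the stop IPI would have been sent */

static void model_crash_smp_send_stop(void)
{
	static bool stopped;

	/* fadump: firmware captures all CPU registers, so send no IPI. */
	if (fadump_active)
		return;

	/* kdump: stop the other CPUs once, however often this is called. */
	if (stopped)
		return;

	stopped = true;
	ipi_rounds++;

	/* The guard above means at most one round is ever sent. */
	assert(ipi_rounds <= 1);
}
```

With fadump_active set, any number of calls leaves ipi_rounds at zero.
With it clear, repeated calls raise ipi_rounds to one and no further.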

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/kernel/fadump.c |  8 ++++++++
 arch/powerpc/kernel/smp.c    | 10 ++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b7ceb041743c..60f5fc14aa23 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1641,6 +1641,14 @@ int __init setup_fadump(void)
 	else if (fw_dump.reserve_dump_area_size)
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 
+	/*
+	 * In case of panic, fadump is triggered via ppc_panic_event()
+	 * panic notifier. Setting crash_kexec_post_notifiers to 'true'
+	 * lets panic() function take crash friendly path before panic
+	 * notifiers are invoked.
+	 */
+	crash_kexec_post_notifiers = true;
+
 	return 1;
 }
 subsys_initcall(setup_fadump);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 2d33c167b438..10fb01837e6b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -61,6 +61,7 @@
 #include <asm/cpu_has_feature.h>
 #include <asm/ftrace.h>
 #include <asm/kup.h>
+#include <asm/fadump.h>
 
 #ifdef DEBUG
 #include <asm/udbg.h>
@@ -638,6 +639,15 @@ void crash_smp_send_stop(void)
 {
 	static bool stopped = false;
 
+	/*
+	 * In case of fadump, register data for all CPUs is captured by f/w
+	 * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
+	 * this rtas call to avoid tricky post processing of those CPUs'
+	 * backtraces.
+	 */
+	if (should_fadump_crash())
+		return;
+
 	if (stopped)
 		return;
 
-- 
2.33.1



* Re: [PATCH v3 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
  2021-12-07 10:37 [PATCH v3 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option Hari Bathini
  2021-12-07 10:37 ` [PATCH v3 2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic Hari Bathini
@ 2021-12-15  0:24 ` Michael Ellerman
  1 sibling, 0 replies; 3+ messages in thread
From: Michael Ellerman @ 2021-12-15  0:24 UTC (permalink / raw)
  To: mpe, linuxppc-dev, Hari Bathini, npiggin
  Cc: mahesh, sourabhjain, kernel test robot

On Tue, 7 Dec 2021 16:07:18 +0530, Hari Bathini wrote:
> Kdump can be triggered after panic_notifers since commit f06e5153f4ae2
> ("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
> after panic_notifers") introduced crash_kexec_post_notifiers option.
> But using this option would mean smp_send_stop(), that marks all other
> CPUs as offline, gets called before kdump is triggered. As a result,
> kdump routines fail to save other CPUs' registers. To fix this, kdump
> friendly crash_smp_send_stop() function was introduced with kernel
> commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump
> friendly version in panic path"). Override this kdump friendly weak
> function to handle crash_kexec_post_notifiers option appropriately
> on powerpc.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
      https://git.kernel.org/powerpc/c/219572d2fc4135b5ce65c735d881787d48b10e71
[2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic
      https://git.kernel.org/powerpc/c/06e629c25daa519be620a8c17359ae8fc7a2e903

cheers
