* [PATCH v2 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
@ 2021-11-25 18:09 Hari Bathini
From: Hari Bathini @ 2021-11-25 18:09 UTC (permalink / raw)
To: mpe, linuxppc-dev, npiggin; +Cc: Hari Bathini, mahesh, sourabhjain
Kdump can be triggered after panic notifiers since commit f06e5153f4ae2
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
after panic_notifers") introduced the crash_kexec_post_notifiers option.
But with this option, smp_send_stop(), which marks all other CPUs as
offline, gets called before kdump is triggered. As a result, kdump
routines fail to save other CPUs' registers. To fix this, a kdump
friendly crash_smp_send_stop() function was introduced with kernel
commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path"). Override this kdump friendly weak
function to handle crash_kexec_post_notifiers option appropriately
on powerpc.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
* New patch to handle the case where kdump is triggered after
panic notifiers.
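
For illustration only, the effect described in the commit message can be modelled in user space. This is a toy sketch, not kernel code: `online_mask`, `NR_CPUS` and the `model_*` helpers are hypothetical stand-ins, and it assumes a dump routine that only looks at CPUs still marked online.

```c
#include <stdint.h>

#define NR_CPUS 8	/* assumed CPU count for the toy model */

/* Toy stand-in for the online mask: bit n set means CPU n is online. */
static uint64_t online_mask = 0xff;

/* Models smp_send_stop(): every CPU except 'self' ends up marked offline. */
static void model_smp_send_stop(int self)
{
	online_mask &= UINT64_C(1) << self;
}

/* Models the crash friendly path: other CPUs spin but stay online. */
static void model_crash_smp_send_stop(int self)
{
	(void)self;	/* mask is intentionally left untouched */
}

/* Counts the CPUs a dump routine walking the online mask would cover. */
static int cpus_with_register_data(void)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (online_mask & (UINT64_C(1) << cpu))
			n++;
	return n;
}
```

With CPU 1 as the panic'ing CPU, the crash friendly model leaves all 8 CPUs visible, while the smp_send_stop() model leaves only CPU 1.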
arch/powerpc/kernel/smp.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c23ee842c4c3..d34e6b67684c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -620,6 +620,32 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
}
#endif
+static void crash_stop_this_cpu(struct pt_regs *regs)
+{
+ /*
+ * Just busy wait here and avoid marking CPU as offline to ensure
+ * register data of all these CPUs is captured appropriately.
+ */
+ while (1)
+ cpu_relax();
+}
+
+void crash_smp_send_stop(void)
+{
+ static bool stopped = false;
+
+ if (stopped)
+ return;
+
+ stopped = true;
+
+#ifdef CONFIG_NMI_IPI
+ smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
+#else
+ smp_call_function(crash_stop_this_cpu, NULL, 0);
+#endif /* CONFIG_NMI_IPI */
+}
+
#ifdef CONFIG_NMI_IPI
static void nmi_stop_this_cpu(struct pt_regs *regs)
{
--
2.33.1
* [PATCH v2 2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic
@ 2021-11-25 18:09 ` Hari Bathini
From: Hari Bathini @ 2021-11-25 18:09 UTC (permalink / raw)
To: mpe, linuxppc-dev, npiggin; +Cc: Hari Bathini, mahesh, sourabhjain
In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, a vmcore captured with fadump via the
panic path would have register data processed only for the panic'ing
CPU. Sample output of crash-utility with such a vmcore:
# crash vmlinux vmcore
...
KERNEL: vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 1
DATE: Wed Nov 10 09:56:34 EST 2021
UPTIME: 00:00:42
LOAD AVERAGE: 2.27, 0.69, 0.24
TASKS: 183
NODENAME: XXXXXXXXX
RELEASE: 5.15.0+
VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
MACHINE: ppc64le (2500 Mhz)
MEMORY: 8 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 3394
COMMAND: "bash"
TASK: c0000000150a5f80 [THREAD_INFO: c0000000150a5f80]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash> p -x __cpu_online_mask
__cpu_online_mask = $1 = {
bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
}
crash>
crash>
crash> p -x __cpu_active_mask
__cpu_active_mask = $2 = {
bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
}
crash>
While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:
- In general, the bulk of the vmcores analyzed were from crashes
due to exceptions.
- The above did change since commit 8341f2f222d7 ("sysrq: Use
panic() to force a crash") started using panic() instead of
dereferencing a NULL pointer to force a kernel crash. But then
commit de6e5d38417e ("powerpc: smp_send_stop do not offline
stopped CPUs") stopped marking CPUs as offline till kernel
commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
reverted that change.
To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
Changes in v2:
* Let panic() take the crash friendly path when fadump is enabled
using the crash_kexec_post_notifiers option.
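
As a rough aid for reviewers, the ordering that crash_kexec_post_notifiers selects in panic() can be sketched as a user-space model. This is simplified from a reading of kernel/panic.c around v5.15 and is an assumption, not part of this patch; `trace`, `step()` and `model_panic()` are hypothetical names, and each step only records its name instead of doing any work.

```c
#include <stdbool.h>
#include <string.h>

static char trace[128];			/* records the order of steps taken */
static bool crash_kexec_post_notifiers;	/* models the kernel option */

/* Appends one step name to the trace, separated by " > ". */
static void step(const char *name)
{
	if (trace[0])
		strcat(trace, " > ");
	strcat(trace, name);
}

/* Simplified model of the ordering inside panic(). */
static void model_panic(void)
{
	trace[0] = '\0';

	if (!crash_kexec_post_notifiers) {
		step("crash_kexec");		/* kdump, if loaded, takes over here */
		step("smp_send_stop");		/* other CPUs get marked offline */
	} else {
		step("crash_smp_send_stop");	/* crash friendly stop */
	}

	step("panic_notifiers");		/* fadump is triggered from a notifier */

	if (crash_kexec_post_notifiers)
		step("crash_kexec");
}
```

In this model, setting the option to true means the panic notifiers run after crash_smp_send_stop() rather than after smp_send_stop(), which is the path this patch wants fadump to take.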
arch/powerpc/kernel/fadump.c | 8 ++++++++
arch/powerpc/kernel/smp.c | 10 ++++++++++
2 files changed, 18 insertions(+)
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b7ceb041743c..60f5fc14aa23 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1641,6 +1641,14 @@ int __init setup_fadump(void)
else if (fw_dump.reserve_dump_area_size)
fw_dump.ops->fadump_init_mem_struct(&fw_dump);
+ /*
+ * In case of panic, fadump is triggered via ppc_panic_event()
+ * panic notifier. Setting crash_kexec_post_notifiers to 'true'
+ * lets panic() function take crash friendly path before panic
+ * notifiers are invoked.
+ */
+ crash_kexec_post_notifiers = true;
+
return 1;
}
subsys_initcall(setup_fadump);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index d34e6b67684c..00a52b6e3888 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -61,6 +61,7 @@
#include <asm/cpu_has_feature.h>
#include <asm/ftrace.h>
#include <asm/kup.h>
+#include <asm/fadump.h>
#ifdef DEBUG
#include <asm/udbg.h>
@@ -634,6 +635,15 @@ void crash_smp_send_stop(void)
{
static bool stopped = false;
+ /*
+ * In case of fadump, register data for all CPUs is captured by f/w
+ * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
+ * this rtas call to avoid tricky post processing of those CPUs'
+ * backtraces.
+ */
+ if (should_fadump_crash())
+ return;
+
if (stopped)
return;
--
2.33.1
* Re: [PATCH v2 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
@ 2021-11-26 23:22 ` kernel test robot
From: kernel test robot @ 2021-11-26 23:22 UTC (permalink / raw)
To: Hari Bathini, mpe, linuxppc-dev, npiggin
Cc: sourabhjain, kbuild-all, mahesh, Hari Bathini
Hi Hari,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.16-rc2 next-20211126]
[cannot apply to mpe/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Hari-Bathini/powerpc-handle-kdump-appropriately-with-crash_kexec_post_notifiers-option/20211126-021120
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-iss476-smp_defconfig (https://download.01.org/0day-ci/archive/20211127/202111270740.T4QBMa4L-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/112b5fcac650e78c2130b7f43ef66d965e69623e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Hari-Bathini/powerpc-handle-kdump-appropriately-with-crash_kexec_post_notifiers-option/20211126-021120
git checkout 112b5fcac650e78c2130b7f43ef66d965e69623e
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arch/powerpc/kernel/smp.c: In function 'crash_smp_send_stop':
>> arch/powerpc/kernel/smp.c:645:27: error: passing argument 1 of 'smp_call_function' from incompatible pointer type [-Werror=incompatible-pointer-types]
645 | smp_call_function(crash_stop_this_cpu, NULL, 0);
| ^~~~~~~~~~~~~~~~~~~
| |
| void (*)(struct pt_regs *)
In file included from include/linux/lockdep.h:14,
from include/linux/rcupdate.h:29,
from include/linux/rculist.h:11,
from include/linux/pid.h:5,
from include/linux/sched.h:14,
from include/linux/sched/mm.h:7,
from arch/powerpc/kernel/smp.c:18:
include/linux/smp.h:149:40: note: expected 'smp_call_func_t' {aka 'void (*)(void *)'} but argument is of type 'void (*)(struct pt_regs *)'
149 | void smp_call_function(smp_call_func_t func, void *info, int wait);
| ~~~~~~~~~~~~~~~~^~~~
cc1: all warnings being treated as errors
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for HOTPLUG_CPU
Depends on SMP && (PPC_PSERIES || PPC_PMAC || PPC_POWERNV || FSL_SOC_BOOKE
Selected by
- PM_SLEEP_SMP && SMP && (ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE && PM_SLEEP
vim +/smp_call_function +645 arch/powerpc/kernel/smp.c
632
633 void crash_smp_send_stop(void)
634 {
635 static bool stopped = false;
636
637 if (stopped)
638 return;
639
640 stopped = true;
641
642 #ifdef CONFIG_NMI_IPI
643 smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
644 #else
> 645 smp_call_function(crash_stop_this_cpu, NULL, 0);
646 #endif /* CONFIG_NMI_IPI */
647 }
648
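
The error above arises because smp_call_function() expects a `void (*)(void *)` callback, while crash_stop_this_cpu() takes a `struct pt_regs *`. One possible way to address it, sketched here as a standalone user-space model, is a thin wrapper with the expected signature. The wrapper name `crash_stop_this_cpu_ipi` and the `mock_smp_call_function()` helper are hypothetical, the handler body is reduced to a counter instead of the busy-wait loop, and this is not necessarily how a later revision of the patch resolves the problem.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct pt_regs; the fields do not matter here. */
struct pt_regs {
	unsigned long nip;
};

/* Same shape as the kernel's smp_call_func_t. */
typedef void (*smp_call_func_t)(void *info);

static int stop_calls;	/* counts handler invocations in this model */

/* Handler with the NMI IPI signature used in the patch (body simplified). */
static void crash_stop_this_cpu(struct pt_regs *regs)
{
	(void)regs;
	stop_calls++;
}

/* Hypothetical wrapper matching the smp_call_function() callback type. */
static void crash_stop_this_cpu_ipi(void *dummy)
{
	(void)dummy;
	crash_stop_this_cpu(NULL);
}

/* Mock of smp_call_function(): invokes func once for a single pretend CPU. */
static void mock_smp_call_function(smp_call_func_t func, void *info, int wait)
{
	(void)wait;
	func(info);
}
```

Passing `crash_stop_this_cpu_ipi` to the mock compiles without a pointer type mismatch, whereas passing `crash_stop_this_cpu` directly reproduces the diagnostic reported above.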
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org