* [PATCH v2 2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic
2021-11-25 18:09 [PATCH v2 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option Hari Bathini
@ 2021-11-25 18:09 ` Hari Bathini
2021-11-26 23:22 ` [PATCH v2 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option kernel test robot
From: Hari Bathini @ 2021-11-25 18:09 UTC (permalink / raw)
To: mpe, linuxppc-dev, npiggin; +Cc: Hari Bathini, mahesh, sourabhjain
In the panic path, fadump is triggered via a panic notifier function.
Before the panic notifier functions are called, smp_send_stop() is
called, which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
("powerpc: stop_this_cpu: remove the cpu from the online map.") and,
later, commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, a vmcore captured with fadump via the
panic path has register data processed only for the panic'ing CPU; the
register data of all other CPUs is skipped. Sample output of
crash-utility with such a vmcore:
# crash vmlinux vmcore
...
      KERNEL: vmlinux
    DUMPFILE: vmcore [PARTIAL DUMP]
        CPUS: 1
        DATE: Wed Nov 10 09:56:34 EST 2021
      UPTIME: 00:00:42
LOAD AVERAGE: 2.27, 0.69, 0.24
       TASKS: 183
    NODENAME: XXXXXXXXX
     RELEASE: 5.15.0+
     VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
     MACHINE: ppc64le (2500 Mhz)
      MEMORY: 8 GB
       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
         PID: 3394
     COMMAND: "bash"
        TASK: c0000000150a5f80 [THREAD_INFO: c0000000150a5f80]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)
crash> p -x __cpu_online_mask
__cpu_online_mask = $1 = {
bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
}
crash>
crash>
crash> p -x __cpu_active_mask
__cpu_active_mask = $2 = {
bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
}
crash>
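The two masks above can be decoded by hand: bit n of the first word in `bits` stands for CPU n. So 0x2 (binary 10) means only CPU 1 is online, matching the `CPUS: 1` line, while 0xff means CPUs 0 to 7 are still active. A minimal user-space sketch of that decoding, assuming `unsigned long` mask words as in the kernel's cpumask (64 bits on ppc64); the helper `mask_to_cpus()` is hypothetical and not a kernel or crash-utility API:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits per mask word; 64 on ppc64, where unsigned long is 64 bits wide. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: list the CPU numbers whose bits are set in a
 * cpumask-style array of words. Bit n of word w stands for CPU
 * w * BITS_PER_WORD + n. Stores at most max CPU numbers in cpus[]
 * and returns how many were stored.
 */
static size_t mask_to_cpus(const unsigned long *bits, size_t nwords,
			   unsigned int *cpus, size_t max)
{
	size_t count = 0;

	assert(bits != NULL && cpus != NULL);
	for (size_t w = 0; w < nwords; w++) {
		for (size_t n = 0; n < BITS_PER_WORD; n++) {
			if ((bits[w] >> n) & 1UL) {
				if (count == max)
					return count;
				cpus[count++] = (unsigned int)(w * BITS_PER_WORD + n);
			}
		}
	}
	return count;
}
```

Applied to the nine-word masks printed above, `{0x2, 0x0, ...}` yields the single CPU 1 and `{0xff, 0x0, ...}` yields CPUs 0 through 7.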
While this has been the case since fadump was introduced, the issue
was not identified, probably for two reasons:
- In general, the bulk of the vmcores analyzed came from crashes
  caused by an exception rather than by an explicit panic().
- This did change once commit 8341f2f222d7 ("sysrq: Use panic() to
  force a crash") started using panic() instead of dereferencing a
  NULL pointer to force a kernel crash. But then commit de6e5d38417e
  ("powerpc: smp_send_stop do not offline stopped CPUs") stopped
  marking CPUs as offline, until commit bab26238bbd4 ("powerpc:
  Offline CPU in stop_this_cpu()") reverted that change.
To ensure that the register data of all other CPUs is post-processed
as intended, let the panic() function take the crash-friendly path
(i.e. crash_smp_send_stop()) by enabling the crash_kexec_post_notifiers
option. Also, since firmware captures the register data of all CPUs,
skip the IPI callbacks in crash_smp_send_stop() for fadump, to avoid
complications in finding the right backtraces.
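The ordering this change relies on can be illustrated with a small user-space model. It is a simplified, hypothetical sketch and not kernel code: it assumes eight CPUs with CPU 1 panicking, that smp_send_stop() clears the stopped CPUs from the online mask (as the commits above do), and that the register data fadump later post-processes is limited to CPUs still in that mask when the panic notifier runs (modelled as a simple count). The function names mirror the kernel ones for readability, but their bodies are stand-ins, and panic() is reduced to its crash_kexec_post_notifiers branch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 8   /* assumed CPU count, matching 0xff above */

/* Model state: bit n set means CPU n is in the online mask. */
static unsigned long online_mask;
static unsigned int panic_cpu;
static bool fadump_registered = true;    /* assumed: fadump is enabled */
static bool crash_kexec_post_notifiers;  /* the patch sets this in setup_fadump() */

/* Stand-in for smp_send_stop(): stopped CPUs are marked offline. */
static void smp_send_stop(void)
{
	online_mask &= 1UL << panic_cpu;
}

/* Stand-in for crash_smp_send_stop() with this patch applied. */
static void crash_smp_send_stop(void)
{
	if (fadump_registered)
		return;   /* firmware captures all CPUs; no IPIs are sent */
	/* Without fadump, other CPUs would be stopped (a no-op in this model). */
}

/* Stand-in for the fadump panic notifier: CPUs whose data is processed. */
static int fadump_panic_notifier(void)
{
	int count = 0;

	for (unsigned int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		if (online_mask & (1UL << cpu))
			count++;
	return count;
}

/* Simplified panic(): only the crash_kexec_post_notifiers branch. */
static int model_panic(void)
{
	assert(panic_cpu < NR_MODEL_CPUS);
	if (!crash_kexec_post_notifiers)
		smp_send_stop();        /* kdump attempt omitted in this model */
	else
		crash_smp_send_stop();
	return fadump_panic_notifier();
}
```

With crash_kexec_post_notifiers left false, the model reports one CPU and leaves the mask at 0x2, as in the crash output above; with it set to true, all eight CPUs remain in the mask when the notifier runs.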
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
Changes in v2:
* Let panic() take the crash-friendly path when fadump is enabled,
  using the crash_kexec_post_notifiers option.
arch/powerpc/kernel/fadump.c | 8 ++++++++
arch/powerpc/kernel/smp.c | 10 ++++++++++
2 files changed, 18 insertions(+)
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b7ceb041743c..60f5fc14aa23 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1641,6 +1641,14 @@ int __init setup_fadump(void)
 	else if (fw_dump.reserve_dump_area_size)
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 
+	/*
+	 * In case of panic, fadump is triggered via ppc_panic_event()
+	 * panic notifier. Setting crash_kexec_post_notifiers to 'true'
+	 * lets panic() function take crash friendly path before panic
+	 * notifiers are invoked.
+	 */
+	crash_kexec_post_notifiers = true;
+
 	return 1;
 }
 subsys_initcall(setup_fadump);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index d34e6b67684c..00a52b6e3888 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -61,6 +61,7 @@
 #include <asm/cpu_has_feature.h>
 #include <asm/ftrace.h>
 #include <asm/kup.h>
+#include <asm/fadump.h>
 
 #ifdef DEBUG
 #include <asm/udbg.h>
@@ -634,6 +635,15 @@ void crash_smp_send_stop(void)
 {
 	static bool stopped = false;
 
+	/*
+	 * In case of fadump, register data for all CPUs is captured by f/w
+	 * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
+	 * this rtas call to avoid tricky post processing of those CPUs'
+	 * backtraces.
+	 */
+	if (should_fadump_crash())
+		return;
+
 	if (stopped)
 		return;
 
--
2.33.1
* Re: [PATCH v2 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
2021-11-25 18:09 [PATCH v2 1/2] powerpc: handle kdump appropriately with crash_kexec_post_notifiers option Hari Bathini
2021-11-25 18:09 ` [PATCH v2 2/2] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic Hari Bathini
@ 2021-11-26 23:22 ` kernel test robot
From: kernel test robot @ 2021-11-26 23:22 UTC (permalink / raw)
To: Hari Bathini, mpe, linuxppc-dev, npiggin
Cc: sourabhjain, kbuild-all, mahesh, Hari Bathini
Hi Hari,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.16-rc2 next-20211126]
[cannot apply to mpe/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Hari-Bathini/powerpc-handle-kdump-appropriately-with-crash_kexec_post_notifiers-option/20211126-021120
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-iss476-smp_defconfig (https://download.01.org/0day-ci/archive/20211127/202111270740.T4QBMa4L-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/112b5fcac650e78c2130b7f43ef66d965e69623e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Hari-Bathini/powerpc-handle-kdump-appropriately-with-crash_kexec_post_notifiers-option/20211126-021120
git checkout 112b5fcac650e78c2130b7f43ef66d965e69623e
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arch/powerpc/kernel/smp.c: In function 'crash_smp_send_stop':
>> arch/powerpc/kernel/smp.c:645:27: error: passing argument 1 of 'smp_call_function' from incompatible pointer type [-Werror=incompatible-pointer-types]
  645 |         smp_call_function(crash_stop_this_cpu, NULL, 0);
      |                           ^~~~~~~~~~~~~~~~~~~
      |                           |
      |                           void (*)(struct pt_regs *)
In file included from include/linux/lockdep.h:14,
                 from include/linux/rcupdate.h:29,
                 from include/linux/rculist.h:11,
                 from include/linux/pid.h:5,
                 from include/linux/sched.h:14,
                 from include/linux/sched/mm.h:7,
                 from arch/powerpc/kernel/smp.c:18:
include/linux/smp.h:149:40: note: expected 'smp_call_func_t' {aka 'void (*)(void *)'} but argument is of type 'void (*)(struct pt_regs *)'
  149 | void smp_call_function(smp_call_func_t func, void *info, int wait);
      |                        ~~~~~~~~~~~~~~~~^~~~
cc1: all warnings being treated as errors
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for HOTPLUG_CPU
Depends on SMP && (PPC_PSERIES || PPC_PMAC || PPC_POWERNV || FSL_SOC_BOOKE
Selected by
- PM_SLEEP_SMP && SMP && (ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE && PM_SLEEP
vim +/smp_call_function +645 arch/powerpc/kernel/smp.c
   632	
   633	void crash_smp_send_stop(void)
   634	{
   635		static bool stopped = false;
   636	
   637		if (stopped)
   638			return;
   639	
   640		stopped = true;
   641	
   642	#ifdef CONFIG_NMI_IPI
   643		smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
   644	#else
 > 645		smp_call_function(crash_stop_this_cpu, NULL, 0);
   646	#endif /* CONFIG_NMI_IPI */
   647	}
   648	
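The error is a callback type mismatch: on configs without CONFIG_NMI_IPI, crash_stop_this_cpu() is handed to smp_call_function(), which takes an smp_call_func_t, i.e. void (*)(void *), while the function is declared to take a struct pt_regs *. One possible way to resolve it, shown here only as a hypothetical sketch and not as the revised patch, is to declare the callback with whichever signature the selected IPI mechanism expects, under the same #ifdef. The program is a self-contained user-space model: struct pt_regs, the two IPI stubs and the NMI_IPI_ALL_OTHERS value are simplified stand-ins for the kernel definitions (assumptions), and the callback counts invocations instead of spinning.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins mimicking kernel types and prototypes (assumed, simplified). */
struct pt_regs { unsigned long gpr[32]; };
typedef void (*smp_call_func_t)(void *info);

static int stop_calls;   /* counts callback runs; the kernel would spin */

/* Give the callback the type each IPI mechanism expects. */
#ifdef CONFIG_NMI_IPI
static void crash_stop_this_cpu(struct pt_regs *regs)
#else
static void crash_stop_this_cpu(void *dummy)
#endif
{
	/* A kernel version would busy-wait here; the model just counts. */
	stop_calls++;
}

#ifdef CONFIG_NMI_IPI
#define NMI_IPI_ALL_OTHERS -2   /* assumed value, model only */
/* Stub: the NMI IPI callback type is void (*)(struct pt_regs *). */
static int smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *),
			    unsigned long long delay_us)
{
	(void)cpu;
	(void)delay_us;
	fn(NULL);
	return 1;
}
#else
/* Stub: the real smp_call_function() runs func on all other CPUs. */
static void smp_call_function(smp_call_func_t func, void *info, int wait)
{
	(void)wait;
	func(info);
}
#endif

static void crash_smp_send_stop(void)
{
#ifdef CONFIG_NMI_IPI
	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
#else
	smp_call_function(crash_stop_this_cpu, NULL, 0);
#endif
}
```

Built with or without -DCONFIG_NMI_IPI, both branches now type-check, so -Werror=incompatible-pointer-types no longer fires. A small void * wrapper that calls the pt_regs variant would be an alternative with the same effect.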
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org