public inbox for linux-kernel@vger.kernel.org
* [PATCH] lkdtm/bugs: add test for panic() with stuck secondary CPUs
@ 2023-08-31 10:10 Mark Rutland
  2023-08-31 12:45 ` Sumit Garg
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Mark Rutland @ 2023-08-31 10:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: dianders, keescook, mark.rutland, sumit.garg, swboyd

Upon a panic() the kernel will use either smp_send_stop() or
crash_smp_send_stop() to attempt to stop secondary CPUs via an IPI,
which may or may not be an NMI. Generally it's preferable that this is an
NMI so that CPUs can be stopped in as many situations as possible, but
it's not always possible to provide an NMI, and there are cases where
CPUs may be unable to handle the NMI regardless.

This patch adds a test for panic() where all other CPUs are stuck with
interrupts disabled, which can be used to check whether the kernel
gracefully handles CPUs failing to respond to a stop, and whether NMI
stops work.

For example, on arm64 *without* an NMI, this results in:

| # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry PANIC_STOP_IRQOFF
| Kernel panic - not syncing: panic stop irqoff test
| CPU: 2 PID: 24 Comm: migration/2 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
| Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
| Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
| Call trace:
|  dump_backtrace+0x94/0xec
|  show_stack+0x18/0x24
|  dump_stack_lvl+0x74/0xc0
|  dump_stack+0x18/0x24
|  panic+0x358/0x3e8
|  lkdtm_PANIC+0x0/0x18
|  multi_cpu_stop+0x9c/0x1a0
|  cpu_stopper_thread+0x84/0x118
|  smpboot_thread_fn+0x224/0x248
|  kthread+0x114/0x118
|  ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| SMP: failed to stop secondary CPUs 0-3
| Kernel Offset: 0x401cf3490000 from 0xffff800080000000
| PHYS_OFFSET: 0x40000000
| CPU features: 0x00000000,68c167a1,cce6773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: panic stop irqoff test ]---

On arm64 *with* an NMI, this results in:

| # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
| lkdtm: Performing direct entry PANIC_STOP_IRQOFF
| Kernel panic - not syncing: panic stop irqoff test
| CPU: 1 PID: 19 Comm: migration/1 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
| Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
| Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
| Call trace:
|  dump_backtrace+0x94/0xec
|  show_stack+0x18/0x24
|  dump_stack_lvl+0x74/0xc0
|  dump_stack+0x18/0x24
|  panic+0x358/0x3e8
|  lkdtm_PANIC+0x0/0x18
|  multi_cpu_stop+0x9c/0x1a0
|  cpu_stopper_thread+0x84/0x118
|  smpboot_thread_fn+0x224/0x248
|  kthread+0x114/0x118
|  ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x55a9c0bc0000 from 0xffff800080000000
| PHYS_OFFSET: 0x40000000
| CPU features: 0x00000000,68c167a1,fce6773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: panic stop irqoff test ]---

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
---
 drivers/misc/lkdtm/bugs.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

I've tested this with the arm64 NMI IPI patches:

  https://lore.kernel.org/linux-arm-kernel/20230830191314.1618136-1-dianders@chromium.org/

Specifically, with the patch that uses an NMI for IPI_CPU_STOP and
IPI_CPU_CRASH_STOP:

  https://lore.kernel.org/linux-arm-kernel/20230830121115.v12.5.Ifadbfd45b22c52edcb499034dd4783d096343260@changeid/

Mark.
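
For readers who want to experiment with the rendezvous outside the kernel,
the "last arriver triggers, everyone else spins" logic of
panic_stop_irqoff_fn() in the patch below can be approximated in userspace.
This is only a rough sketch under stated assumptions, not the kernel
implementation: struct rendezvous, rendezvous_fn(), run_rendezvous() and
MAX_THREADS are hypothetical names, pthreads stand in for CPUs, sched_yield()
stands in for cpu_relax(), and a "done" flag replaces panic() so that the
spinning threads can exit and be joined.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 64	/* arbitrary cap for this sketch */

struct rendezvous {
	atomic_int arrived;	/* counterpart of the patch's atomic_t v */
	atomic_int triggered;	/* threads that saw themselves as last */
	atomic_bool done;	/* releases the spinners (no kernel equivalent) */
	int nthreads;		/* stand-in for num_online_cpus() */
};

static void *rendezvous_fn(void *arg)
{
	struct rendezvous *r = arg;

	/* Only the final arriver observes the full count. */
	if (atomic_fetch_add(&r->arrived, 1) + 1 == r->nthreads) {
		/* The kernel test calls panic() at this point. */
		atomic_fetch_add(&r->triggered, 1);
		atomic_store(&r->done, true);
		return NULL;
	}

	/* Earlier arrivers spin, like the stuck CPUs in the patch. */
	while (!atomic_load(&r->done))
		sched_yield();
	return NULL;
}

/* Returns how many threads took the "last arriver" branch, or -1 on error. */
int run_rendezvous(int nthreads)
{
	pthread_t tids[MAX_THREADS];
	struct rendezvous r;
	int started, i, ok = 1;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return -1;

	atomic_init(&r.arrived, 0);
	atomic_init(&r.triggered, 0);
	atomic_init(&r.done, false);
	r.nthreads = nthreads;

	for (started = 0; started < nthreads; started++) {
		if (pthread_create(&tids[started], NULL, rendezvous_fn, &r)) {
			/* Release any spinners so they can be joined. */
			atomic_store(&r.done, true);
			ok = 0;
			break;
		}
	}

	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	return ok ? atomic_load(&r.triggered) : -1;
}
```

Calling run_rendezvous(4) should report that exactly one thread took the
trigger branch, mirroring the single panic() in the kernel test. Unlike the
kernel version, nothing here disables interrupts, so the sketch illustrates
the counting logic only.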

diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
index 3c95600ab2f71..368da8b83cd1c 100644
--- a/drivers/misc/lkdtm/bugs.c
+++ b/drivers/misc/lkdtm/bugs.c
@@ -6,12 +6,14 @@
  * test source files.
  */
 #include "lkdtm.h"
+#include <linux/cpu.h>
 #include <linux/list.h>
 #include <linux/sched.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/task_stack.h>
-#include <linux/uaccess.h>
 #include <linux/slab.h>
+#include <linux/stop_machine.h>
+#include <linux/uaccess.h>
 
 #if IS_ENABLED(CONFIG_X86_32) && !IS_ENABLED(CONFIG_UML)
 #include <asm/desc.h>
@@ -73,6 +75,30 @@ static void lkdtm_PANIC(void)
 	panic("dumptest");
 }
 
+static int panic_stop_irqoff_fn(void *arg)
+{
+	atomic_t *v = arg;
+
+	/*
+	 * Trigger the panic after all other CPUs have entered this function,
+	 * so that they are guaranteed to have IRQs disabled.
+	 */
+	if (atomic_inc_return(v) == num_online_cpus())
+		panic("panic stop irqoff test");
+
+	for (;;)
+		cpu_relax();
+}
+
+static void lkdtm_PANIC_STOP_IRQOFF(void)
+{
+	atomic_t v = ATOMIC_INIT(0);
+
+	cpus_read_lock();
+	stop_machine(panic_stop_irqoff_fn, &v, cpu_online_mask);
+	cpus_read_unlock();
+}
+
 static void lkdtm_BUG(void)
 {
 	BUG();
@@ -598,6 +624,7 @@ static noinline void lkdtm_CORRUPT_PAC(void)
 
 static struct crashtype crashtypes[] = {
 	CRASHTYPE(PANIC),
+	CRASHTYPE(PANIC_STOP_IRQOFF),
 	CRASHTYPE(BUG),
 	CRASHTYPE(WARNING),
 	CRASHTYPE(WARNING_MESSAGE),
-- 
2.30.2



end of thread, other threads:[~2023-09-21 17:50 UTC | newest]

Thread overview: 7+ messages
2023-08-31 10:10 [PATCH] lkdtm/bugs: add test for panic() with stuck secondary CPUs Mark Rutland
2023-08-31 12:45 ` Sumit Garg
2023-08-31 13:07   ` Mark Rutland
2023-08-31 13:16     ` Sumit Garg
2023-08-31 16:16 ` Doug Anderson
2023-09-21 16:03   ` Mark Rutland
2023-08-31 19:15 ` Kees Cook
