public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Doug Anderson <dianders@chromium.org>
Cc: linux-kernel@vger.kernel.org, keescook@chromium.org,
	sumit.garg@linaro.org, swboyd@chromium.org
Subject: Re: [PATCH] lkdtm/bugs: add test for panic() with stuck secondary CPUs
Date: Thu, 21 Sep 2023 17:03:18 +0100	[thread overview]
Message-ID: <ZQxpRj2ZayFSIZSo@FVFF77S0Q05N> (raw)
In-Reply-To: <CAD=FV=Wb0TRE1AEr=f7f-iGWrnF=Wabdde5XJW2XNjMpy+uJGQ@mail.gmail.com>

On Thu, Aug 31, 2023 at 09:16:31AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Thu, Aug 31, 2023 at 3:10 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > Upon a panic() the kernel will use either smp_send_stop() or
> > crash_smp_send_stop() to attempt to stop secondary CPUs via an IPI,
> > which may or may not be an NMI. Generally it's preferable that this is an
> > NMI so that CPUs can be stopped in as many situations as possible, but
> > it's not always possible to provide an NMI, and there are cases where
> > CPUs may be unable to handle the NMI regardless.
> >
> > This patch adds a test for panic() where all other CPUs are stuck with
> > interrupts disabled, which can be used to check whether the kernel
> > gracefully handles CPUs failing to respond to a stop, and whe NMIs stops
> > work.
> >
> > For example, on arm64 *without* an NMI, this results in:
> >
> > | # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
> > | lkdtm: Performing direct entry PANIC_STOP_IRQOFF
> > | Kernel panic - not syncing: panic stop irqoff test
> > | CPU: 2 PID: 24 Comm: migration/2 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
> > | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > | Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
> > | Call trace:
> > |  dump_backtrace+0x94/0xec
> > |  show_stack+0x18/0x24
> > |  dump_stack_lvl+0x74/0xc0
> > |  dump_stack+0x18/0x24
> > |  panic+0x358/0x3e8
> > |  lkdtm_PANIC+0x0/0x18
> > |  multi_cpu_stop+0x9c/0x1a0
> > |  cpu_stopper_thread+0x84/0x118
> > |  smpboot_thread_fn+0x224/0x248
> > |  kthread+0x114/0x118
> > |  ret_from_fork+0x10/0x20
> > | SMP: stopping secondary CPUs
> > | SMP: failed to stop secondary CPUs 0-3
> > | Kernel Offset: 0x401cf3490000 from 0xffff800080000000
> > | PHYS_OFFSET: 0x40000000
> > | CPU features: 0x00000000,68c167a1,cce6773f
> > | Memory Limit: none
> > | ---[ end Kernel panic - not syncing: panic stop irqoff test ]---
> >
> > On arm64 *with* an NMI, this results in:
> >
> > | # echo PANIC_STOP_IRQOFF > /sys/kernel/debug/provoke-crash/DIRECT
> > | lkdtm: Performing direct entry PANIC_STOP_IRQOFF
> > | Kernel panic - not syncing: panic stop irqoff test
> > | CPU: 1 PID: 19 Comm: migration/1 Not tainted 6.5.0-rc3-00077-ge6c782389895-dirty #4
> > | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > | Stopper: multi_cpu_stop+0x0/0x1a0 <- stop_machine_cpuslocked+0x158/0x1a4
> > | Call trace:
> > |  dump_backtrace+0x94/0xec
> > |  show_stack+0x18/0x24
> > |  dump_stack_lvl+0x74/0xc0
> > |  dump_stack+0x18/0x24
> > |  panic+0x358/0x3e8
> > |  lkdtm_PANIC+0x0/0x18
> > |  multi_cpu_stop+0x9c/0x1a0
> > |  cpu_stopper_thread+0x84/0x118
> > |  smpboot_thread_fn+0x224/0x248
> > |  kthread+0x114/0x118
> > |  ret_from_fork+0x10/0x20
> > | SMP: stopping secondary CPUs
> > | Kernel Offset: 0x55a9c0bc0000 from 0xffff800080000000
> > | PHYS_OFFSET: 0x40000000
> > | CPU features: 0x00000000,68c167a1,fce6773f
> > | Memory Limit: none
> > | ---[ end Kernel panic - not syncing: panic stop irqoff test ]---
> 
> Since Sumit didn't notice it and I had to use a "diff" tool to help me
> spot the difference in your two examples, maybe telegraph for the
> reader of the commit message that the important thing is the lack of
> "SMP: failed to stop secondary CPUs" in your example.

Sure; I've added explicit "note" lines to v2 for that.

> > +static void lkdtm_PANIC_STOP_IRQOFF(void)
> > +{
> > +       atomic_t v = ATOMIC_INIT(0);
> > +
> > +       cpus_read_lock();
> > +       stop_machine(panic_stop_irqoff_fn, &v, cpu_online_mask);
> 
> It broke my brain just a little bit that stop_machine() isn't backed
> by "IPI_CPU_STOP" since the point of this was to test the patch
> modifying how "IPI_CPU_STOP" worked. ;-) Assuming I didn't mess up, I
> guess stop_machine() is just used to get all the CPUs running with
> interrupts disabled and then the last CPU does a panic. The _panic_
> then tests "IPI_CPU_STOP" / "IPI_CPU_CRASH_STOP".

Yup; I had hoped that the comment within panic_stop_irqoff_fn() was sufficient
for that:

	/*
	 * Trigger the panic after all other CPUs have entered this function,
	 * so that they are guaranteed to have IRQs disabled.
	 */

I've made that a bit more explicit now:

	/*
	 * As stop_machine() disables interrupts, all CPUs within this function
	 * have interrupts disabled and cannot take a regular IPI.
	 *
	 * The last CPU which enters here will trigger a panic, and as all CPUs
	 * cannot take a regular IPI, we'll only be able to stop secondaries if
	 * smp_send_stop() or crash_smp_send_stop() uses an NMI.
	 */ 

> Probably this is obvious to everyone but me, so no action is needed
> unless you agree that it's confusing and can think of a good comment /
> commit message addition to help.

Hopefully the above is good enough?

I agree the stop_machine() naming is also somewhat confusing here, but I can't
do much about that.

> In any case, after Sumit's feedback is addressed:
> 
> Reviewed-by: Douglas Anderson <dianders@chromium.org>

Thanks!

Mark.

  reply	other threads:[~2023-09-21 17:50 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-31 10:10 [PATCH] lkdtm/bugs: add test for panic() with stuck secondary CPUs Mark Rutland
2023-08-31 12:45 ` Sumit Garg
2023-08-31 13:07   ` Mark Rutland
2023-08-31 13:16     ` Sumit Garg
2023-08-31 16:16 ` Doug Anderson
2023-09-21 16:03   ` Mark Rutland [this message]
2023-08-31 19:15 ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQxpRj2ZayFSIZSo@FVFF77S0Q05N \
    --to=mark.rutland@arm.com \
    --cc=dianders@chromium.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sumit.garg@linaro.org \
    --cc=swboyd@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox