public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Sean Christopherson <seanjc@google.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org,
	"Guilherme G . Piccoli" <gpiccoli@igalia.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Subject: Re: [PATCH 1/2] x86/crash: Disable virt in core NMI crash handler to avoid double list_add
Date: Thu, 12 May 2022 12:51:08 +0200	[thread overview]
Message-ID: <87wnervxb7.ffs@tglx> (raw)
In-Reply-To: <20220511234332.3654455-2-seanjc@google.com>

Sean,

On Wed, May 11 2022 at 23:43, Sean Christopherson wrote:
> @@ -840,6 +858,20 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
>  	unsigned long msecs;
>  	local_irq_disable();
>  
> +	/*
> +	 * Invoking multiple callbacks is not currently supported, registering
> +	 * the NMI handler twice will cause a list_add() double add BUG().
> +	 * The exception is the "nop" handler in the emergency reboot path,
> +	 * which can run after e.g. kdump's shootdown.  Do nothing if the crash
> +	 * handler has already run, i.e. has already prepared other CPUs, the
> +	 * reboot path doesn't have any work of its to do, it just needs to
> +	 * ensure all CPUs have prepared for reboot.

This is confusing at best. The double list add is just one part of the
problem, which would be trivial enough to fix.

The real point is that after the first shoot down all other CPUs are
stuck in crash_nmi_callback() and won't respond to the second NMI.

So trying to run this twice is completely pointless and guaranteed to
run into the timeout.

> +	 */
> +	if (shootdown_callback) {
> +		WARN_ON_ONCE(callback != nmi_shootdown_nop);
> +		return;

Instead of playing games with the callback pointer, I prefer to make
this all explicit. Delta patch below.

Thanks,

        tglx
---
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -528,10 +528,7 @@ static inline void kb_wait(void)
 	}
 }
 
-static void nmi_shootdown_nop(int cpu, struct pt_regs *regs)
-{
-	/* Nothing to do, the NMI shootdown handler disables virtualization. */
-}
+static inline void nmi_shootdown_cpus_on_restart(void);
 
 /* Use NMIs as IPIs to tell all CPUs to disable virtualization */
 static void emergency_vmx_disable_all(void)
@@ -554,7 +551,7 @@ static void emergency_vmx_disable_all(vo
 		__cpu_emergency_vmxoff();
 
 		/* Halt and exit VMX root operation on the other CPUs. */
-		nmi_shootdown_cpus(nmi_shootdown_nop);
+		nmi_shootdown_cpus_on_restart();
 	}
 }
 
@@ -829,7 +826,8 @@ static int crash_nmi_callback(unsigned i
 		return NMI_HANDLED;
 	local_irq_disable();
 
-	shootdown_callback(cpu, regs);
+	if (shootdown_callback)
+		shootdown_callback(cpu, regs);
 
 	/*
 	 * Prepare the CPU for reboot _after_ invoking the callback so that the
@@ -846,31 +844,30 @@ static int crash_nmi_callback(unsigned i
 	return NMI_HANDLED;
 }
 
-/*
- * Halt all other CPUs, calling the specified function on each of them
+/**
+ * nmi_shootdown_cpus - Stop other CPUs via NMI
+ * @callback:	Optional callback to be invoked from the NMI handler
  *
- * This function can be used to halt all other CPUs on crash
- * or emergency reboot time. The function passed as parameter
- * will be called inside a NMI handler on all CPUs.
+ * The NMI handler on the remote CPUs invokes @callback, if not
+ * NULL, first and then disables virtualization to ensure that
+ * INIT is recognized during reboot.
+ *
+ * nmi_shootdown_cpus() can only be invoked once. After the first
+ * invocation all other CPUs are stuck in crash_nmi_callback() and
+ * cannot respond to a second NMI.
  */
 void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
 	unsigned long msecs;
+
 	local_irq_disable();
 
 	/*
-	 * Invoking multiple callbacks is not currently supported, registering
-	 * the NMI handler twice will cause a list_add() double add BUG().
-	 * The exception is the "nop" handler in the emergency reboot path,
-	 * which can run after e.g. kdump's shootdown.  Do nothing if the crash
-	 * handler has already run, i.e. has already prepared other CPUs, the
-	 * reboot path doesn't have any work of its to do, it just needs to
-	 * ensure all CPUs have prepared for reboot.
+	 * Aside of being pointless this would register the NMI handler
+	 * twice causing list corruption.
 	 */
-	if (shootdown_callback) {
-		WARN_ON_ONCE(callback != nmi_shootdown_nop);
+	if (WARN_ON_ONCE(crash_ipi_issued))
 		return;
-	}
 
 	/* Make a note of crashing cpu. Will be used in NMI callback. */
 	crashing_cpu = safe_smp_processor_id();
@@ -902,6 +899,12 @@ void nmi_shootdown_cpus(nmi_shootdown_cb
 	/* Leave the nmi callback set */
 }
 
+static inline void nmi_shootdown_cpus_on_restart(void)
+{
+	if (!crash_ipi_issued)
+		nmi_shootdown_cpus(NULL);
+}
+
 /*
  * Check if the crash dumping IPI got issued and if so, call its callback
  * directly. This function is used when we have already been in NMI handler.
@@ -929,6 +932,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb
 	/* No other CPUs to shoot down */
 }
 
+static inline void nmi_shootdown_cpus_on_restart(void) { }
+
 void run_crash_ipi_callback(struct pt_regs *regs)
 {
 }

  parent reply	other threads:[~2022-05-12 10:51 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-11 23:43 [PATCH 0/2] x86/crash: Fix double list_add nmi_shootdown bug Sean Christopherson
2022-05-11 23:43 ` [PATCH 1/2] x86/crash: Disable virt in core NMI crash handler to avoid double list_add Sean Christopherson
2022-05-12  9:14   ` Vitaly Kuznetsov
2022-05-12 10:51   ` Thomas Gleixner [this message]
2022-05-12 14:14     ` Sean Christopherson
2022-05-12 14:35       ` Sean Christopherson
2022-05-12 15:48       ` Thomas Gleixner
2022-05-11 23:43 ` [PATCH 2/2] x86/reboot: Disable virtualization in an emergency if SVM is supported Sean Christopherson
2022-05-12  8:37   ` Vitaly Kuznetsov
2022-05-12 10:57   ` Thomas Gleixner
2022-05-12 14:39     ` Sean Christopherson
2022-05-12 15:47       ` Thomas Gleixner
2022-05-13 11:10 ` [PATCH] x86/nmi: Make register_nmi_handler() more robust Thomas Gleixner
2022-05-15 11:37   ` Thomas Gleixner
2022-05-15 11:39   ` [PATCH V2] " Thomas Gleixner
2022-05-17  7:34 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wnervxb7.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=gpiccoli@igalia.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox