* [PATCH v3 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown
2022-11-14 23:34 [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
@ 2022-11-14 23:34 ` Sean Christopherson
2022-11-14 23:34 ` [PATCH v3 2/3] x86/reboot: Disable virtualization in an emergency if SVM is supported Sean Christopherson
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-11-14 23:34 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
Cc: H. Peter Anvin, linux-kernel, Guilherme G . Piccoli,
Vitaly Kuznetsov, Paolo Bonzini, Tom Lendacky,
Sean Christopherson
Disable virtualization in crash_nmi_callback() and rework the
emergency_vmx_disable_all() path to do an NMI shootdown if and only if a
shootdown has not already occurred. NMI crash shootdown fundamentally
can't support multiple invocations as responding CPUs are deliberately
put into halt state without unblocking NMIs. But the emergency reboot
path doesn't have any work of its own; it simply cares about disabling
virtualization, i.e. so long as a shootdown occurred, emergency reboot
doesn't care who initiated the shootdown, or when.
If "crash_kexec_post_notifiers" is specified on the kernel command line,
panic() will invoke crash_smp_send_stop() and result in a second call to
nmi_shootdown_cpus() during native_machine_emergency_restart().
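The "at most one shootdown" invariant can be sketched outside the kernel as a
guarded entry point (a hypothetical user-space illustration with made-up names,
not the kernel's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative sketch: once a shootdown has been issued, the responding
 * CPUs are halted with NMIs still blocked, so a second caller must bail
 * out instead of re-arming the NMI machinery. */
static bool crash_ipi_issued;

static bool nmi_shootdown_once(void)
{
	if (crash_ipi_issued) {
		/* The kernel WARNs here (WARN_ON_ONCE); we just report and refuse. */
		fprintf(stderr, "shootdown already done, skipping\n");
		return false;
	}
	crash_ipi_issued = true;
	/* ... register the NMI handler and send the NMI IPIs exactly once ... */
	return true;
}
```

With this shape, the second nmi_shootdown_cpus() call triggered by
"crash_kexec_post_notifiers" becomes a harmless no-op instead of a hang.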
Invoke the callback _before_ disabling virtualization, as the current
VMCS needs to be cleared before doing VMXOFF. Note, this results in a
subtle change in ordering between disabling virtualization and stopping
Intel PT on the responding CPUs. While VMX and Intel PT do interact,
VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one
another, which is all that matters when panicking.
Harden nmi_shootdown_cpus() against multiple invocations to try and
capture any such kernel bugs via a WARN instead of hanging the system
during a crash/dump, e.g. prior to the recent hardening of
register_nmi_handler(), re-registering the NMI handler would trigger a
double list_add() and hang the system if CONFIG_BUG_ON_DATA_CORRUPTION=y.
list_add double add: new=ffffffff82220800, prev=ffffffff8221cfe8, next=ffffffff82220800.
WARNING: CPU: 2 PID: 1319 at lib/list_debug.c:29 __list_add_valid+0x67/0x70
Call Trace:
__register_nmi_handler+0xcf/0x130
nmi_shootdown_cpus+0x39/0x90
native_machine_emergency_restart+0x1c9/0x1d0
panic+0x237/0x29b
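The double list_add failure mode above can be reproduced in miniature with a
simplified re-implementation of the kernel's circular-list insertion (modeled
on <linux/list.h>, but not the kernel's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified circular doubly-linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Mirrors the idea behind __list_add_valid(): a double add of the same
 * node shows up as new == next (or new == prev), which is exactly the
 * "list_add double add" splat quoted above. */
static bool list_add_valid(struct list_head *new,
			   struct list_head *prev, struct list_head *next)
{
	return new != prev && new != next && prev->next == next;
}

static bool list_add_checked(struct list_head *new, struct list_head *head)
{
	if (!list_add_valid(new, head, head->next))
		return false;	/* kernel BUG()s here with CONFIG_BUG_ON_DATA_CORRUPTION=y */
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
	return true;
}
```

Adding the same node twice without the check would leave the node pointing at
itself, hence the corruption when the NMI handler is registered a second time.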
Extract the disabling logic to a common helper to deduplicate code, and
to prepare for doing the shootdown in the emergency reboot path if SVM
is supported.
Note, prior to commit ed72736183c4 ("x86/reboot: Force all cpus to exit
VMX root if VMX is supported"), nmi_shootdown_cpus() was subtly protected
against a second invocation by a cpu_vmx_enabled() check as the kdump
handler would disable VMX if it ran first.
Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported")
Cc: stable@vger.kernel.org
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
arch/x86/include/asm/reboot.h | 1 +
arch/x86/kernel/crash.c | 16 +--------
arch/x86/kernel/reboot.c | 66 ++++++++++++++++++++++++++++-------
3 files changed, 56 insertions(+), 27 deletions(-)
diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 04c17be9b5fd..8f2da36435a6 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,6 +25,7 @@ void __noreturn machine_real_restart(unsigned int type);
#define MRR_BIOS 0
#define MRR_APM 1
+void cpu_crash_disable_virtualization(void);
typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
void nmi_panic_self_stop(struct pt_regs *regs);
void nmi_shootdown_cpus(nmi_shootdown_cb callback);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9730c88530fc..6257981ed837 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -81,15 +81,6 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
*/
cpu_crash_vmclear_loaded_vmcss();
- /* Disable VMX or SVM if needed.
- *
- * We need to disable virtualization on all CPUs.
- * Having VMX or SVM enabled on any CPU may break rebooting
- * after the kdump kernel has finished its task.
- */
- cpu_emergency_vmxoff();
- cpu_emergency_svm_disable();
-
/*
* Disable Intel PT to stop its logging
*/
@@ -148,12 +139,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
*/
cpu_crash_vmclear_loaded_vmcss();
- /* Booting kdump kernel with VMX or SVM enabled won't work,
- * because (among other limitations) we can't disable paging
- * with the virt flags.
- */
- cpu_emergency_vmxoff();
- cpu_emergency_svm_disable();
+ cpu_crash_disable_virtualization();
/*
* Disable Intel PT to stop its logging
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index c3636ea4aa71..f2655b78d73c 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -528,10 +528,7 @@ static inline void kb_wait(void)
}
}
-static void vmxoff_nmi(int cpu, struct pt_regs *regs)
-{
- cpu_emergency_vmxoff();
-}
+static inline void nmi_shootdown_cpus_on_restart(void);
/* Use NMIs as IPIs to tell all CPUs to disable virtualization */
static void emergency_vmx_disable_all(void)
@@ -554,7 +551,7 @@ static void emergency_vmx_disable_all(void)
__cpu_emergency_vmxoff();
/* Halt and exit VMX root operation on the other CPUs. */
- nmi_shootdown_cpus(vmxoff_nmi);
+ nmi_shootdown_cpus_on_restart();
}
}
@@ -802,6 +799,18 @@ static nmi_shootdown_cb shootdown_callback;
static atomic_t waiting_for_crash_ipi;
static int crash_ipi_issued;
+void cpu_crash_disable_virtualization(void)
+{
+ /*
+ * Disable virtualization, i.e. VMX or SVM, so that INIT is recognized
+ * during reboot. VMX blocks INIT if the CPU is post-VMXON, and SVM
+ * blocks INIT if GIF=0. Note, STGI #UDs if SVM isn't enabled, so it's
+ * easier to just disable SVM unconditionally.
+ */
+ cpu_emergency_vmxoff();
+ cpu_emergency_svm_disable();
+}
+
static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
{
int cpu;
@@ -817,7 +826,14 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
return NMI_HANDLED;
local_irq_disable();
- shootdown_callback(cpu, regs);
+ if (shootdown_callback)
+ shootdown_callback(cpu, regs);
+
+ /*
+ * Prepare the CPU for reboot _after_ invoking the callback so that the
+ * callback can safely use virtualization instructions, e.g. VMCLEAR.
+ */
+ cpu_crash_disable_virtualization();
atomic_dec(&waiting_for_crash_ipi);
/* Assume hlt works */
@@ -828,18 +844,32 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
return NMI_HANDLED;
}
-/*
- * Halt all other CPUs, calling the specified function on each of them
+/**
+ * nmi_shootdown_cpus - Stop other CPUs via NMI
+ * @callback: Optional callback to be invoked from the NMI handler
*
- * This function can be used to halt all other CPUs on crash
- * or emergency reboot time. The function passed as parameter
- * will be called inside a NMI handler on all CPUs.
+ * The NMI handler on the remote CPUs invokes @callback, if not
+ * NULL, first and then disables virtualization to ensure that
+ * INIT is recognized during reboot.
+ *
+ * nmi_shootdown_cpus() can only be invoked once. After the first
+ * invocation all other CPUs are stuck in crash_nmi_callback() and
+ * cannot respond to a second NMI.
*/
void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;
+
local_irq_disable();
+ /*
+ * Avoid certain doom if a shootdown already occurred; re-registering
+ * the NMI handler will cause list corruption, modifying the callback
+ * will do who knows what, etc...
+ */
+ if (WARN_ON_ONCE(crash_ipi_issued))
+ return;
+
/* Make a note of crashing cpu. Will be used in NMI callback. */
crashing_cpu = safe_smp_processor_id();
@@ -867,7 +897,17 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
msecs--;
}
- /* Leave the nmi callback set */
+ /*
+ * Leave the nmi callback set, shootdown is a one-time thing. Clearing
+ * the callback could result in a NULL pointer dereference if a CPU
+ * (finally) responds after the timeout expires.
+ */
+}
+
+static inline void nmi_shootdown_cpus_on_restart(void)
+{
+ if (!crash_ipi_issued)
+ nmi_shootdown_cpus(NULL);
}
/*
@@ -897,6 +937,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
/* No other CPUs to shoot down */
}
+static inline void nmi_shootdown_cpus_on_restart(void) { }
+
void run_crash_ipi_callback(struct pt_regs *regs)
{
}
--
2.38.1.431.g37b22c650d-goog
* [PATCH v3 2/3] x86/reboot: Disable virtualization in an emergency if SVM is supported
2022-11-14 23:34 [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
2022-11-14 23:34 ` [PATCH v3 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown Sean Christopherson
@ 2022-11-14 23:34 ` Sean Christopherson
2022-11-14 23:34 ` [PATCH v3 3/3] x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller Sean Christopherson
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-11-14 23:34 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
Cc: H. Peter Anvin, linux-kernel, Guilherme G . Piccoli,
Vitaly Kuznetsov, Paolo Bonzini, Tom Lendacky,
Sean Christopherson
Disable SVM on all CPUs via NMI shootdown during an emergency reboot.
Like VMX, SVM can block INIT and thus prevent bringing up other CPUs via
INIT-SIPI-SIPI.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kernel/reboot.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f2655b78d73c..973a1b2c61bf 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -530,27 +530,26 @@ static inline void kb_wait(void)
static inline void nmi_shootdown_cpus_on_restart(void);
-/* Use NMIs as IPIs to tell all CPUs to disable virtualization */
-static void emergency_vmx_disable_all(void)
+static void emergency_reboot_disable_virtualization(void)
{
/* Just make sure we won't change CPUs while doing this */
local_irq_disable();
/*
- * Disable VMX on all CPUs before rebooting, otherwise we risk hanging
- * the machine, because the CPU blocks INIT when it's in VMX root.
+ * Disable virtualization on all CPUs before rebooting to avoid hanging
+ * the system, as VMX and SVM block INIT when running in the host.
*
* We can't take any locks and we may be on an inconsistent state, so
- * use NMIs as IPIs to tell the other CPUs to exit VMX root and halt.
+ * use NMIs as IPIs to tell the other CPUs to disable VMX/SVM and halt.
*
- * Do the NMI shootdown even if VMX if off on _this_ CPU, as that
- * doesn't prevent a different CPU from being in VMX root operation.
+ * Do the NMI shootdown even if virtualization is off on _this_ CPU, as
+ * other CPUs may have virtualization enabled.
*/
- if (cpu_has_vmx()) {
- /* Safely force _this_ CPU out of VMX root operation. */
- __cpu_emergency_vmxoff();
+ if (cpu_has_vmx() || cpu_has_svm(NULL)) {
+ /* Safely force _this_ CPU out of VMX/SVM operation. */
+ cpu_crash_disable_virtualization();
- /* Halt and exit VMX root operation on the other CPUs. */
+ /* Disable VMX/SVM and halt on other CPUs. */
nmi_shootdown_cpus_on_restart();
}
}
@@ -587,7 +586,7 @@ static void native_machine_emergency_restart(void)
unsigned short mode;
if (reboot_emergency)
- emergency_vmx_disable_all();
+ emergency_reboot_disable_virtualization();
tboot_shutdown(TB_SHUTDOWN_REBOOT);
--
2.38.1.431.g37b22c650d-goog
* [PATCH v3 3/3] x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller
2022-11-14 23:34 [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
2022-11-14 23:34 ` [PATCH v3 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown Sean Christopherson
2022-11-14 23:34 ` [PATCH v3 2/3] x86/reboot: Disable virtualization in an emergency if SVM is supported Sean Christopherson
@ 2022-11-14 23:34 ` Sean Christopherson
2022-11-15 0:32 ` [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug Andrew Cooper
2022-11-15 15:33 ` Guilherme G. Piccoli
4 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-11-14 23:34 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
Cc: H. Peter Anvin, linux-kernel, Guilherme G . Piccoli,
Vitaly Kuznetsov, Paolo Bonzini, Tom Lendacky,
Sean Christopherson
Fold __cpu_emergency_vmxoff() into cpu_emergency_vmxoff(), its sole
remaining caller, to discourage crash/emergency code from handling VMX
but not SVM. The behavior of VMX and SVM is so similar that it's highly
unlikely that there will be a scenario where VMX needs to be disabled,
but SVM can be left enabled. In other words, during a crash and/or
emergency reboot, funnel all virtualization teardown sequences through
cpu_crash_disable_virtualization().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/virtext.h | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 8757078d4442..a9414ef5269e 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -60,22 +60,12 @@ static inline int cpu_vmx_enabled(void)
return __read_cr4() & X86_CR4_VMXE;
}
-/** Disable VMX if it is enabled on the current CPU
- *
- * You shouldn't call this if cpu_has_vmx() returns 0.
- */
-static inline void __cpu_emergency_vmxoff(void)
-{
- if (cpu_vmx_enabled())
- cpu_vmxoff();
-}
-
/** Disable VMX if it is supported and enabled on the current CPU
*/
static inline void cpu_emergency_vmxoff(void)
{
- if (cpu_has_vmx())
- __cpu_emergency_vmxoff();
+ if (cpu_has_vmx() && cpu_vmx_enabled())
+ cpu_vmxoff();
}
--
2.38.1.431.g37b22c650d-goog
* Re: [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug
2022-11-14 23:34 [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
` (2 preceding siblings ...)
2022-11-14 23:34 ` [PATCH v3 3/3] x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller Sean Christopherson
@ 2022-11-15 0:32 ` Andrew Cooper
2022-11-15 16:56 ` Sean Christopherson
2022-11-15 15:33 ` Guilherme G. Piccoli
4 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2022-11-15 0:32 UTC (permalink / raw)
To: Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org
Cc: H. Peter Anvin, linux-kernel@vger.kernel.org,
Guilherme G . Piccoli, Vitaly Kuznetsov, Paolo Bonzini,
Tom Lendacky, Andrew Cooper
On 14/11/2022 23:34, Sean Christopherson wrote:
> Tom,
>
> I Cc'd you this time around because the APM doesn't explicitly state that
> GIF is set when EFER.SVME is disabled. KVM's nSVM emulation does set GIF
> in this case, but it's not clear whether or not KVM is making up behavior.
> If clearing EFER.SVME doesn't set GIF, then patch 1 needs to be modified
> to try STGI before clearing EFER.SVME, e.g. if a crash is initiated from
> KVM between CLGI and STGI. Responding CPUs are "safe" because GIF=0 also
> blocks NMIs, but the initiating CPU might leave GIF=0 when jumping into
> the new kernel.
GIF exists independently of EFER.SVME.
It also gets set by the SKINIT instruction, which is why there is an
asymmetry in the #UD conditions of STGI and CLGI.
STGI is also intended to be used by the DLME once critical
initialisation is done, hence why it's dependent on EFER.SVME || SKINIT.
~Andrew
* Re: [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug
2022-11-15 0:32 ` [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug Andrew Cooper
@ 2022-11-15 16:56 ` Sean Christopherson
2022-11-15 18:34 ` Tom Lendacky
2022-11-15 19:58 ` Andrew Cooper
0 siblings, 2 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-11-15 16:56 UTC (permalink / raw)
To: Andrew Cooper
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H. Peter Anvin, linux-kernel@vger.kernel.org,
Guilherme G . Piccoli, Vitaly Kuznetsov, Paolo Bonzini,
Tom Lendacky
On Tue, Nov 15, 2022, Andrew Cooper wrote:
> On 14/11/2022 23:34, Sean Christopherson wrote:
> > Tom,
> >
> > I Cc'd you this time around because the APM doesn't explicitly state that
> > GIF is set when EFER.SVME is disabled. KVM's nSVM emulation does set GIF
> > in this case, but it's not clear whether or not KVM is making up behavior.
> > If clearing EFER.SVME doesn't set GIF, then patch 1 needs to be modified
> > to try STGI before clearing EFER.SVME, e.g. if a crash is initiated from
> > KVM between CLGI and STGI. Responding CPUs are "safe" because GIF=0 also
> > blocks NMIs, but the initiating CPU might leave GIF=0 when jumping into
> > the new kernel.
>
> GIF exists independently of EFER.SVME.
>
> It is also gets set by the SKINIT instruction, which is why there is an
> asymmetry on the #UD conditions of STGI and CLGI.
>
> STGI is also intended to be used by the DLME once critical
> initialisation is done, hence why it's dependent on EFER.SVME || SKINIT.
Gah, stupid APM. The pseudocode states "EFER.SVME || CPUID.SKINIT", but the
description and the comment both say that SVM must be enabled to execute SKINIT,
which made me hope that disabling SVM would reset everything.
This instruction generates a #UD exception if SVM is not enabled. See “Enabling
SVM” in AMD64 Architecture Programmer’s Manual Volume 2: System Instructions,
order# 24593.
...
IF ((EFER.SVME == 0) && !(CPUID 8000_0001.ECX[SKINIT]) || (!PROTECTED_MODE))
EXCEPTION [#UD] // This instruction can only be executed
// in protected mode with SVM enabled
^^^^^^^^^^^
* Re: [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug
2022-11-15 16:56 ` Sean Christopherson
@ 2022-11-15 18:34 ` Tom Lendacky
2022-11-15 19:58 ` Andrew Cooper
1 sibling, 0 replies; 10+ messages in thread
From: Tom Lendacky @ 2022-11-15 18:34 UTC (permalink / raw)
To: Sean Christopherson, Andrew Cooper
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H. Peter Anvin, linux-kernel@vger.kernel.org,
Guilherme G . Piccoli, Vitaly Kuznetsov, Paolo Bonzini
On 11/15/22 10:56, Sean Christopherson wrote:
> On Tue, Nov 15, 2022, Andrew Cooper wrote:
>> On 14/11/2022 23:34, Sean Christopherson wrote:
>>> Tom,
>>>
>>> I Cc'd you this time around because the APM doesn't explicitly state that
>>> GIF is set when EFER.SVME is disabled. KVM's nSVM emulation does set GIF
>>> in this case, but it's not clear whether or not KVM is making up behavior.
>>> If clearing EFER.SVME doesn't set GIF, then patch 1 needs to be modified
>>> to try STGI before clearing EFER.SVME, e.g. if a crash is initiated from
>>> KVM between CLGI and STGI. Responding CPUs are "safe" because GIF=0 also
>>> blocks NMIs, but the initiating CPU might leave GIF=0 when jumping into
>>> the new kernel.
>>
>> GIF exists independently of EFER.SVME.
>>
>> It is also gets set by the SKINIT instruction, which is why there is an
>> asymmetry on the #UD conditions of STGI and CLGI.
>>
>> STGI is also intended to be used by the DLME once critical
>> initialisation is done, hence why it's dependent on EFER.SVME || SKINIT.
>
> Gah, stupid APM. The pseudocode states "EFER.SVME || CPUID.SKINIT", but the
> description and the comment both say that SVM must be enabled to execute SKINIT,
> which made me hope that disabling SVM would reset everything.
>
> This instruction generates a #UD exception if SVM is not enabled. See “Enabling
> SVM” in AMD64 Architecture Programmer’s Manual Volume 2: System Instructions,
> order# 24593.
>
> ...
>
> IF ((EFER.SVME == 0) && !(CPUID 8000_0001.ECX[SKINIT]) || (!PROTECTED_MODE))
> EXCEPTION [#UD] // This instruction can only be executed
> // in protected mode with SVM enabled
> ^^^^^^^^^^^
I'll see if I can't get someone to update the pseudo-code comment here.
But, to your original question, as far as I understand, GIF is not
automatically set if SVME is cleared from EFER.
Thanks,
Tom
* Re: [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug
2022-11-15 16:56 ` Sean Christopherson
2022-11-15 18:34 ` Tom Lendacky
@ 2022-11-15 19:58 ` Andrew Cooper
1 sibling, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2022-11-15 19:58 UTC (permalink / raw)
To: Sean Christopherson
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H. Peter Anvin, linux-kernel@vger.kernel.org,
Guilherme G . Piccoli, Vitaly Kuznetsov, Paolo Bonzini,
Tom Lendacky, Andrew Cooper
On 15/11/2022 16:56, Sean Christopherson wrote:
> On Tue, Nov 15, 2022, Andrew Cooper wrote:
>> On 14/11/2022 23:34, Sean Christopherson wrote:
>>> Tom,
>>>
>>> I Cc'd you this time around because the APM doesn't explicitly state that
>>> GIF is set when EFER.SVME is disabled. KVM's nSVM emulation does set GIF
>>> in this case, but it's not clear whether or not KVM is making up behavior.
>>> If clearing EFER.SVME doesn't set GIF, then patch 1 needs to be modified
>>> to try STGI before clearing EFER.SVME, e.g. if a crash is initiated from
>>> KVM between CLGI and STGI. Responding CPUs are "safe" because GIF=0 also
>>> blocks NMIs, but the initiating CPU might leave GIF=0 when jumping into
>>> the new kernel.
>> GIF exists independently of EFER.SVME.
>>
>> It is also gets set by the SKINIT instruction, which is why there is an
>> asymmetry on the #UD conditions of STGI and CLGI.
>>
>> STGI is also intended to be used by the DLME once critical
>> initialisation is done, hence why it's dependent on EFER.SVME || SKINIT.
> Gah, stupid APM. The pseudocode states "EFER.SVME || CPUID.SKINIT", but the
> description and the comment both say that SVM must be enabled to execute SKINIT,
> which made me hope that disabling SVM would reset everything.
>
> This instruction generates a #UD exception if SVM is not enabled. See “Enabling
> SVM” in AMD64 Architecture Programmer’s Manual Volume 2: System Instructions,
> order# 24593.
>
> ...
>
> IF ((EFER.SVME == 0) && !(CPUID 8000_0001.ECX[SKINIT]) || (!PROTECTED_MODE))
> EXCEPTION [#UD] // This instruction can only be executed
> // in protected mode with SVM enabled
> ^^^^^^^^^^^
/sigh - and there needs to be one extra set of brackets to remove the
ambiguity in that pseudocode.
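The ambiguity is just bracketing: with C-style precedence, && binds tighter
than ||, and the two possible readings of the APM pseudocode disagree for some
inputs. A quick check, using 0/1 flags purely for illustration (reading A
matches the "EFER.SVME || SKINIT gates STGI" behavior described above; this is
not a claim about which reading any particular CPU implements):

```c
#include <assert.h>

/* Reading A: ((EFER.SVME == 0) && !SKINIT) || !PROTECTED_MODE
 * (what C precedence gives if the pseudocode is read literally) */
static int stgi_uds_reading_a(int svme, int skinit, int prot)
{
	return ((svme == 0) && !skinit) || !prot;
}

/* Reading B: (EFER.SVME == 0) && (!SKINIT || !PROTECTED_MODE) */
static int stgi_uds_reading_b(int svme, int skinit, int prot)
{
	return (svme == 0) && (!skinit || !prot);
}
```

For example, with EFER.SVME=1, SKINIT supported, and protected mode off,
reading A says #UD while reading B says no #UD, hence the need for the extra
brackets.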
I think it's tied up with the complexity of the VMM LOCK feature. Just
as with Intel and the SMX inside/outside VMX bits in
MSR_FEATURE_CONTROL, there's a requirement stated in the PPR/BKDG that
firmware offers the option to enable or disable SVM, where disabling
actually sets the lock field.
VM_CR.SVM_LOCK can only be configured when EFER.SVME is clear, after
which SKINIT (and STGI) for DTRM is still usable, while general SVM isn't.
In this case, the CPU is operating with EFER.SVME=0 and a working(ish) GIF.
I find it curious that both Intel and AMD have VMM+DTRM technologies that
overlap (feature-wise), appeared at the same time in silicon, *and* have
slightly weird ways of force-disabling the VMM while keeping DTRM
operational. Funny how these two features have played out.
~Andrew
* Re: [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug
2022-11-14 23:34 [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
` (3 preceding siblings ...)
2022-11-15 0:32 ` [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug Andrew Cooper
@ 2022-11-15 15:33 ` Guilherme G. Piccoli
2022-11-15 19:36 ` Sean Christopherson
4 siblings, 1 reply; 10+ messages in thread
From: Guilherme G. Piccoli @ 2022-11-15 15:33 UTC (permalink / raw)
To: Sean Christopherson, x86
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Dave Hansen,
Borislav Petkov, linux-kernel, Vitaly Kuznetsov, Paolo Bonzini,
Tom Lendacky
On 14/11/2022 20:34, Sean Christopherson wrote:
> [...]
> v3:
> - Re-collect Guilherme's Tested-by.
> - Tweak comment in patch 1 to reference STGI instead of CLGI.
> - Celebrate this series' half-birthday.
Heheh
Thanks a lot for persisting with this, Sean, much appreciated! I'm
surprised at how long it's taking to get these _fixes_ merged into the
kernel, hence your effort is very valuable =)
Cheers,
Guilherme
* Re: [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug
2022-11-15 15:33 ` Guilherme G. Piccoli
@ 2022-11-15 19:36 ` Sean Christopherson
0 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-11-15 19:36 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: x86, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Dave Hansen,
Borislav Petkov, linux-kernel, Vitaly Kuznetsov, Paolo Bonzini,
Tom Lendacky
On Tue, Nov 15, 2022, Guilherme G. Piccoli wrote:
> On 14/11/2022 20:34, Sean Christopherson wrote:
> > [...]
> > v3:
> > - Re-collect Guilherme's Tested-by.
> > - Tweak comment in patch 1 to reference STGI instead of CLGI.
> > - Celebrate this series' half-birthday.
>
> Heheh
>
> Thanks a lot for persisting with this Sean, much appreciated! I'm
> surprised on how long is taking to get these _fixes_ merged in the
> kernel, hence your effort is very valuable =)
Well, to be fair, the fixes aren't perfect. Aside from the GIF thing, patch 2
breaks CONFIG_SMP=n.
I think there's another bug lurking too. The emergency reboot path doesn't
VMCLEAR VMCSes. AFAIK, Intel doesn't guarantee the VMCS caches are purged on
INIT, so if the reboot doesn't actually RESET CPUs, the new kernel could observe
memory corruption due to an old VMCS getting written back.
Argh, and I missed sysvec_reboot() + smp_stop_nmi_callback() for SVM support.
And slightly longer term, this entire mess can be cleaned up. Once KVM's handling
of VMX/SVM initialization sucks less[*], all of the disabling logic can be moved
into KVM callbacks and the kernel can stop speculatively trying to disable VMX/SVM.
I'll send a v4 to fix all of the suspected bugs, and then work on another series to
clean up the callbacks, which will have dependencies on both the kvm_init() rework
and this series.
[*] https://lore.kernel.org/all/20221102231911.3107438-1-seanjc@google.com