[PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly

public inbox for linux-kernel@vger.kernel.org

* [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly
@ 2026-02-09 19:51 Yosry Ahmed
  2026-02-09 19:51 ` [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2 Yosry Ahmed
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-09 19:51 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed

Add more graceful handling of L2 clearing EFER.SVME without L1
interception, which is architecturally undefined. Shut down L1 instead of
running it with corrupted L2 state, and add a test to verify the new
behavior.

I did not CC stable on patch 1 because it's not technically a KVM bug,
but it would be nice to have it backported. Leaving the decision to
Sean.

Yosry Ahmed (2):
  KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
  KVM: selftests: Add a test for L2 clearing EFER.SVME without intercept

 arch/x86/kvm/svm/svm.c                        | 11 ++++
 tools/testing/selftests/kvm/Makefile.kvm      |  1 +
 .../kvm/x86/svm_nested_clear_efer_svme.c      | 55 +++++++++++++++++++
 3 files changed, 67 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c


base-commit: e944fe2c09f405a2e2d147145c9b470084bc4c9a
-- 
2.53.0.rc2.204.g2597b5adb4-goog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
  2026-02-09 19:51 [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Yosry Ahmed
@ 2026-02-09 19:51 ` Yosry Ahmed
  2026-02-26 16:36   ` Yosry Ahmed
  2026-02-09 19:51 ` [PATCH v2 2/2] KVM: selftests: Add a test for L2 clearing EFER.SVME without intercept Yosry Ahmed
  2026-03-05 17:08 ` [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Sean Christopherson
  2 siblings, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-09 19:51 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed

KVM tracks when EFER.SVME is set and cleared to initialize and tear down
nested state. However, it doesn't differentiate if EFER.SVME is getting
toggled in L1 or L2+. If L2 clears EFER.SVME, and L1 does not intercept
the EFER write, KVM exits guest mode and tears down nested state while
L2 is running, executing L1 without injecting a proper #VMEXIT.

According to the APM:

    The effect of turning off EFER.SVME while a guest is running is
    undefined; therefore, the VMM should always prevent guests from
    writing EFER.

Since the behavior is architecturally undefined, KVM gets to choose what
to do. Inject a triple fault into L1 as a more graceful option than
running L1 with corrupted state.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/svm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5f0136dbdde6..ccd73a3be3f9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
 	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
 		if (!(efer & EFER_SVME)) {
+			/*
+			 * Architecturally, clearing EFER.SVME while a guest is
+			 * running yields undefined behavior, i.e. KVM can do
+			 * literally anything.  Force the vCPU back into L1 as
+			 * that is the safest option for KVM, but synthesize a
+			 * triple fault (for L1!) so that KVM at least doesn't
+			 * run random L2 code in the context of L1.
+			 */
+			if (is_guest_mode(vcpu))
+				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
 			svm_leave_nested(vcpu);
 			/* #GP intercept is still needed for vmware backdoor */
 			if (!enable_vmware_backdoor)
-- 
2.53.0.rc2.204.g2597b5adb4-goog
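For readers following the logic rather than the diff, here is a minimal user-space model of the patched SVME-clear branch in svm_set_efer(). It is not KVM code. `struct toy_vcpu`, `toy_vmrun()`, `toy_leave_nested()` and `toy_set_efer()` are hypothetical stand-ins that keep only the state the decision depends on: EFER, whether L2 is active, and a pending triple-fault request. The only architectural facts it relies on are that EFER.SVME is bit 12 of EFER and that VMRUN requires it to be set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_SVME (1ULL << 12) /* EFER.SVME is bit 12 of the EFER MSR */

/* Hypothetical toy vCPU: only the state the SVME-clear decision uses. */
struct toy_vcpu {
	uint64_t efer;
	bool guest_mode;       /* stands in for is_guest_mode(): L2 active */
	bool triple_fault_req; /* stands in for a pending KVM_REQ_TRIPLE_FAULT */
};

/* L1 executes VMRUN; real hardware raises #UD if EFER.SVME is clear. */
static void toy_vmrun(struct toy_vcpu *v)
{
	assert(v->efer & EFER_SVME);
	v->guest_mode = true;
}

/* Stand-in for svm_leave_nested(): drop L2 and return to L1. */
static void toy_leave_nested(struct toy_vcpu *v)
{
	v->guest_mode = false;
}

/* Models the SVME-clear branch of svm_set_efer() after patch 1. */
static void toy_set_efer(struct toy_vcpu *v, uint64_t efer)
{
	uint64_t old_efer = v->efer;

	v->efer = efer;
	if ((old_efer & EFER_SVME) != (efer & EFER_SVME) &&
	    !(efer & EFER_SVME)) {
		/*
		 * Undefined per the APM: shut down L1 rather than resume it
		 * with L2's state. The request is made before leaving nested,
		 * while guest_mode still records that L2 was active.
		 */
		if (v->guest_mode)
			v->triple_fault_req = true;
		toy_leave_nested(v);
	}
}
```

When L1 clears SVME outside of L2, the model takes the same branch but requests no shutdown, which matches the pre-patch teardown behavior.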



* [PATCH v2 2/2] KVM: selftests: Add a test for L2 clearing EFER.SVME without intercept
  2026-02-09 19:51 [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Yosry Ahmed
  2026-02-09 19:51 ` [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2 Yosry Ahmed
@ 2026-02-09 19:51 ` Yosry Ahmed
  2026-03-05 17:08 ` [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Sean Christopherson
  2 siblings, 0 replies; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-09 19:51 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed

Add a test that verifies KVM's newly introduced behavior of synthesizing
a triple fault in L1 if L2 clears EFER.SVME without an L1 interception
(which is architecturally undefined).

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 tools/testing/selftests/kvm/Makefile.kvm      |  1 +
 .../kvm/x86/svm_nested_clear_efer_svme.c      | 55 +++++++++++++++++++
 2 files changed, 56 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 58eee0474db6..89ba5d6ee741 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -110,6 +110,7 @@ TEST_GEN_PROGS_x86 += x86/state_test
 TEST_GEN_PROGS_x86 += x86/vmx_preemption_timer_test
 TEST_GEN_PROGS_x86 += x86/svm_vmcall_test
 TEST_GEN_PROGS_x86 += x86/svm_int_ctl_test
+TEST_GEN_PROGS_x86 += x86/svm_nested_clear_efer_svme
 TEST_GEN_PROGS_x86 += x86/svm_nested_shutdown_test
 TEST_GEN_PROGS_x86 += x86/svm_nested_soft_inject_test
 TEST_GEN_PROGS_x86 += x86/tsc_scaling_sync
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c b/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c
new file mode 100644
index 000000000000..a521a9eed061
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2026, Google LLC.
+ */
+#include "kvm_util.h"
+#include "vmx.h"
+#include "svm_util.h"
+#include "kselftest.h"
+
+
+#define L2_GUEST_STACK_SIZE 64
+
+static void l2_guest_code(void)
+{
+	unsigned long efer = rdmsr(MSR_EFER);
+
+	/* generic_svm_setup() starts L2 with EFER.SVME set */
+	GUEST_ASSERT(efer & EFER_SVME);
+	wrmsr(MSR_EFER, efer & ~EFER_SVME);
+
+	/* Unreachable, L1 should be shut down */
+	GUEST_ASSERT(0);
+}
+
+static void l1_guest_code(struct svm_test_data *svm)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+	generic_svm_setup(svm, l2_guest_code,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+	run_guest(svm->vmcb, svm->vmcb_gpa);
+
+	/* Unreachable, L1 should be shut down */
+	GUEST_ASSERT(0);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	vm_vaddr_t nested_gva = 0;
+
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
+
+	vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+
+	vcpu_alloc_svm(vm, &nested_gva);
+	vcpu_args_set(vcpu, 1, nested_gva);
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SHUTDOWN);
+
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.53.0.rc2.204.g2597b5adb4-goog



* Re: [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
  2026-02-09 19:51 ` [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2 Yosry Ahmed
@ 2026-02-26 16:36   ` Yosry Ahmed
  2026-02-26 18:20     ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-26 16:36 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel

On Mon, Feb 09, 2026 at 07:51:41PM +0000, Yosry Ahmed wrote:
> KVM tracks when EFER.SVME is set and cleared to initialize and tear down
> nested state. However, it doesn't differentiate if EFER.SVME is getting
> toggled in L1 or L2+. If L2 clears EFER.SVME, and L1 does not intercept
> the EFER write, KVM exits guest mode and tears down nested state while
> L2 is running, executing L1 without injecting a proper #VMEXIT.
> 
> According to the APM:
> 
>     The effect of turning off EFER.SVME while a guest is running is
>     undefined; therefore, the VMM should always prevent guests from
>     writing EFER.
> 
> Since the behavior is architecturally undefined, KVM gets to choose what
> to do. Inject a triple fault into L1 as a more graceful option than
> running L1 with corrupted state.
> 
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/svm.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5f0136dbdde6..ccd73a3be3f9 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>  
>  	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
>  		if (!(efer & EFER_SVME)) {
> +			/*
> +			 * Architecturally, clearing EFER.SVME while a guest is
> +			 * running yields undefined behavior, i.e. KVM can do
> +			 * literally anything.  Force the vCPU back into L1 as
> +			 * that is the safest option for KVM, but synthesize a
> +			 * triple fault (for L1!) so that KVM at least doesn't
> +			 * run random L2 code in the context of L1.
> +			 */
> +			if (is_guest_mode(vcpu))
> +				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> +

Sigh, I think this is not correct in all cases:

1. If userspace restores a vCPU with EFER.SVME=0 to a vCPU with
EFER.SVME=1 (e.g. restoring the state of a vCPU that isn't running L2
onto a vCPU that is running L2). Typically KVM_SET_SREGS is done before
KVM_SET_NESTED_STATE, so we may set EFER.SVME = 0 before leaving guest mode.

2. On vCPU reset, we clear EFER. Hmm, this one is seemingly okay tho,
looking at kvm_vcpu_reset(), we leave nested first:

	/*
	 * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
	 * possible to INIT the vCPU while L2 is active.  Force the vCPU back
	 * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
	 * bits), i.e. virtualization is disabled.
	 */
	if (is_guest_mode(vcpu))
		kvm_leave_nested(vcpu);

	...

	kvm_x86_call(set_efer)(vcpu, 0);

So I think the only problematic case is (1). We can probably fix this by
plumbing host_initiated through set_efer? This is getting more
complicated than I would have liked..
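The ordering problem in case (1), and the reason case (2) is safe, can be reproduced with a toy model of the v2 logic. This is a sketch under stated assumptions. `toy_restore()` stands in for userspace calling KVM_SET_SREGS and then KVM_SET_NESTED_STATE. `toy_vcpu_reset()` stands in for the relevant part of kvm_vcpu_reset(). None of these names exist in KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_SVME (1ULL << 12)

/* Hypothetical toy vCPU, not struct kvm_vcpu. */
struct toy_vcpu {
	uint64_t efer;
	bool guest_mode;       /* is_guest_mode() */
	bool triple_fault_req; /* pending KVM_REQ_TRIPLE_FAULT */
};

/* v2 logic: any SVME clear while in guest mode requests a triple fault. */
static void toy_set_efer(struct toy_vcpu *v, uint64_t efer)
{
	bool clearing_svme = (v->efer & EFER_SVME) && !(efer & EFER_SVME);

	v->efer = efer;
	if (clearing_svme) {
		if (v->guest_mode)
			v->triple_fault_req = true;
		v->guest_mode = false; /* svm_leave_nested() */
	}
}

/* Case (2): reset leaves nested first, so the EFER write sees L1. */
static void toy_vcpu_reset(struct toy_vcpu *v)
{
	v->guest_mode = false; /* kvm_leave_nested() */
	toy_set_efer(v, 0);    /* set_efer(vcpu, 0) */
}

/*
 * Case (1): userspace restores saved state onto a vCPU that is still in
 * L2. Sregs (including EFER) are applied before nested state, so the
 * EFER write runs while guest_mode is still set from the old state.
 */
static void toy_restore(struct toy_vcpu *v, uint64_t saved_efer,
			bool saved_guest_mode)
{
	toy_set_efer(v, saved_efer);      /* KVM_SET_SREGS */
	v->guest_mode = saved_guest_mode; /* KVM_SET_NESTED_STATE */
}
```

In the restore case, the pending request is a false positive. Nothing the guest did caused it; userspace was simply stuffing state.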


>  			svm_leave_nested(vcpu);
>  			/* #GP intercept is still needed for vmware backdoor */
>  			if (!enable_vmware_backdoor)
> -- 
> 2.53.0.rc2.204.g2597b5adb4-goog
> 


* Re: [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
  2026-02-26 16:36   ` Yosry Ahmed
@ 2026-02-26 18:20     ` Sean Christopherson
  2026-02-27 20:03       ` Yosry Ahmed
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2026-02-26 18:20 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Yosry Ahmed, Paolo Bonzini, kvm, linux-kernel

On Thu, Feb 26, 2026, Yosry Ahmed wrote:
> On Mon, Feb 09, 2026 at 07:51:41PM +0000, Yosry Ahmed wrote:
> > KVM tracks when EFER.SVME is set and cleared to initialize and tear down
> > nested state. However, it doesn't differentiate if EFER.SVME is getting
> > toggled in L1 or L2+. If L2 clears EFER.SVME, and L1 does not intercept
> > the EFER write, KVM exits guest mode and tears down nested state while
> > L2 is running, executing L1 without injecting a proper #VMEXIT.
> > 
> > According to the APM:
> > 
> >     The effect of turning off EFER.SVME while a guest is running is
> >     undefined; therefore, the VMM should always prevent guests from
> >     writing EFER.
> > 
> > Since the behavior is architecturally undefined, KVM gets to choose what
> > to do. Inject a triple fault into L1 as a more graceful option than
> > running L1 with corrupted state.
> > 
> > Co-developed-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/svm/svm.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 5f0136dbdde6..ccd73a3be3f9 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> >  
> >  	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
> >  		if (!(efer & EFER_SVME)) {
> > +			/*
> > +			 * Architecturally, clearing EFER.SVME while a guest is
> > +			 * running yields undefined behavior, i.e. KVM can do
> > +			 * literally anything.  Force the vCPU back into L1 as
> > +			 * that is the safest option for KVM, but synthesize a
> > +			 * triple fault (for L1!) so that KVM at least doesn't
> > +			 * run random L2 code in the context of L1.
> > +			 */
> > +			if (is_guest_mode(vcpu))
> > +				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> > +
> 
> Sigh, I think this is not correct in all cases:
> 
> 1. If userspace restores a vCPU with EFER.SVME=0 to a vCPU with
> EFER.SVME=1 (e.g. restoring a vCPU running to a vCPU running L2).
> Typically KVM_SET_SREGS is done before KVM_SET_NESTED_STATE, so we may
> set EFER.SVME = 0 before leaving guest mode.
> 
> 2. On vCPU reset, we clear EFER. Hmm, this one is seemingly okay tho,
> looking at kvm_vcpu_reset(), we leave nested first:
> 
> 	/*
> 	 * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
> 	 * possible to INIT the vCPU while L2 is active.  Force the vCPU back
> 	 * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
> 	 * bits), i.e. virtualization is disabled.
> 	 */
> 	if (is_guest_mode(vcpu))
> 		kvm_leave_nested(vcpu);
> 
> 	...
> 
> 	kvm_x86_call(set_efer)(vcpu, 0);
> 
> So I think the only problematic case is (1). We can probably fix this by
> plumbing host_initiated through set_efer? This is getting more
> complicated than I would have liked..

What if we instead hook WRMSR interception?  A little fugly (well, more than a
little), but I think it would minimize the chances of a false-positive.  The
biggest potential flaw I see is that this will incorrectly triple fault if KVM
synthesizes a #VMEXIT while emulating the WRMSR.  But that really shouldn't
happen, because even a #GP=>#VMEXIT needs to be queued but not synthesized until
the emulation sequence completes (any other behavior would risk confusing KVM).

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..1d8d9960df20 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3119,10 +3119,27 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 static int msr_interception(struct kvm_vcpu *vcpu)
 {
-       if (to_svm(vcpu)->vmcb->control.exit_info_1)
-               return kvm_emulate_wrmsr(vcpu);
-       else
+       bool efer_l2 = is_guest_mode(vcpu) && kvm_rcx_read(vcpu) == MSR_EFER;
+       int r;
+
+       if (!to_svm(vcpu)->vmcb->control.exit_info_1)
                return kvm_emulate_rdmsr(vcpu);
+
+       r = kvm_emulate_wrmsr(vcpu);
+
+       /*
+        * If EFER.SVME is cleared while the vCPU is in L2, KVM forces the vCPU
+        * back into L1 as that is the safest option for KVM.  Architecturally,
+        * clearing EFER.SVME while a guest is running yields undefined behavior,
+        * i.e. KVM can do literally anything.  Synthesize a shutdown (for L1!)
+        * if EFER.SVME was cleared on a guest WRMSR (to avoid false positives
+        * on userspace restoring state), so that KVM at least doesn't run
+        * random L2 code in the context of L1.
+        */
+       if (r && efer_l2 && !is_guest_mode(vcpu))
+               kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
+       return r;
 }
 
 static int interrupt_window_interception(struct kvm_vcpu *vcpu)

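Here is a rough model of the WRMSR-interception alternative, again using hypothetical user-space stand-ins rather than KVM code. It mirrors the shape of the proposal. The model notes whether an L2 EFER write is being emulated, lets the EFER update tear down nested state, and then requests the shutdown only if guest mode was actually left. The last function models any other EFER-loading path that does not pass through the intercept handler. The thread later names RSM as such a path, and the model assumes nothing beyond "it bypasses the hook."

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_EFER  0xc0000080u
#define EFER_SVME (1ULL << 12)

/* Hypothetical toy vCPU, not struct kvm_vcpu. */
struct toy_vcpu {
	uint64_t efer;
	bool guest_mode;
	bool triple_fault_req;
};

/* set_efer() in this variant: tears down nested state, no shutdown. */
static void toy_set_efer(struct toy_vcpu *v, uint64_t efer)
{
	bool clearing_svme = (v->efer & EFER_SVME) && !(efer & EFER_SVME);

	v->efer = efer;
	if (clearing_svme)
		v->guest_mode = false; /* svm_leave_nested() */
}

/* Models the proposed msr_interception() for a guest WRMSR. */
static int toy_wrmsr_intercept(struct toy_vcpu *v, uint32_t msr,
			       uint64_t data)
{
	bool efer_l2 = v->guest_mode && msr == MSR_EFER;
	int r = 1; /* 1 == handled in the kernel, keep running */

	if (msr == MSR_EFER)
		toy_set_efer(v, data); /* other MSRs are ignored here */

	if (r && efer_l2 && !v->guest_mode)
		v->triple_fault_req = true;
	return r;
}

/* Some other EFER-loading path (e.g. RSM) that bypasses the hook. */
static void toy_other_efer_load(struct toy_vcpu *v, uint64_t efer)
{
	toy_set_efer(v, efer);
}
```

The second scenario is the gap: nested state is torn down, but no shutdown is requested.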

* Re: [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
  2026-02-26 18:20     ` Sean Christopherson
@ 2026-02-27 20:03       ` Yosry Ahmed
  2026-02-28  0:41         ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-27 20:03 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Yosry Ahmed, Paolo Bonzini, kvm, linux-kernel

> > > @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> > >
> > >     if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
> > >             if (!(efer & EFER_SVME)) {
> > > +                   /*
> > > +                    * Architecturally, clearing EFER.SVME while a guest is
> > > +                    * running yields undefined behavior, i.e. KVM can do
> > > +                    * literally anything.  Force the vCPU back into L1 as
> > > +                    * that is the safest option for KVM, but synthesize a
> > > +                    * triple fault (for L1!) so that KVM at least doesn't
> > > +                    * run random L2 code in the context of L1.
> > > +                    */
> > > +                   if (is_guest_mode(vcpu))
> > > +                           kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> > > +
> >
> > Sigh, I think this is not correct in all cases:
> >
> > 1. If userspace restores a vCPU with EFER.SVME=0 to a vCPU with
> > EFER.SVME=1 (e.g. restoring a vCPU running to a vCPU running L2).
> > Typically KVM_SET_SREGS is done before KVM_SET_NESTED_STATE, so we may
> > set EFER.SVME = 0 before leaving guest mode.
> >
> > 2. On vCPU reset, we clear EFER. Hmm, this one is seemingly okay tho,
> > looking at kvm_vcpu_reset(), we leave nested first:
> >
> >       /*
> >        * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
> >        * possible to INIT the vCPU while L2 is active.  Force the vCPU back
> >        * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
> >        * bits), i.e. virtualization is disabled.
> >        */
> >       if (is_guest_mode(vcpu))
> >               kvm_leave_nested(vcpu);
> >
> >       ...
> >
> >       kvm_x86_call(set_efer)(vcpu, 0);
> >
> > So I think the only problematic case is (1). We can probably fix this by
> > plumbing host_initiated through set_efer? This is getting more
> > complicated than I would have liked..
>
> What if we instead hook WRMSR interception?  A little fugly (well, more than a
> little), but I think it would minimize the chances of a false-positive.  The
> biggest potential flaw I see is that this will incorrectly triple fault if KVM
> synthesizes a #VMEXIT while emulating the WRMSR.  But that really shouldn't
> happen, because even a #GP=>#VMEXIT needs to be queued but not synthesized until
> the emulation sequence completes (any other behavior would risk confusing KVM).

What if we key off vcpu->wants_to_run?

It's less protection against false positives from things like
kvm_vcpu_reset() if it didn't leave nested before clearing EFER, but
more protection against the #VMEXIT case you mentioned. Also should be
much lower on the fugliness scale imo.


* Re: [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
  2026-02-27 20:03       ` Yosry Ahmed
@ 2026-02-28  0:41         ` Sean Christopherson
  2026-02-28  0:46           ` Yosry Ahmed
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2026-02-28  0:41 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Yosry Ahmed, Paolo Bonzini, kvm, linux-kernel

On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> > > > @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> > > >
> > > >     if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
> > > >             if (!(efer & EFER_SVME)) {
> > > > +                   /*
> > > > +                    * Architecturally, clearing EFER.SVME while a guest is
> > > > +                    * running yields undefined behavior, i.e. KVM can do
> > > > +                    * literally anything.  Force the vCPU back into L1 as
> > > > +                    * that is the safest option for KVM, but synthesize a
> > > > +                    * triple fault (for L1!) so that KVM at least doesn't
> > > > +                    * run random L2 code in the context of L1.
> > > > +                    */
> > > > +                   if (is_guest_mode(vcpu))
> > > > +                           kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> > > > +
> > >
> > > Sigh, I think this is not correct in all cases:
> > >
> > > 1. If userspace restores a vCPU with EFER.SVME=0 to a vCPU with
> > > EFER.SVME=1 (e.g. restoring a vCPU running to a vCPU running L2).
> > > Typically KVM_SET_SREGS is done before KVM_SET_NESTED_STATE, so we may
> > > set EFER.SVME = 0 before leaving guest mode.
> > >
> > > 2. On vCPU reset, we clear EFER. Hmm, this one is seemingly okay tho,
> > > looking at kvm_vcpu_reset(), we leave nested first:
> > >
> > >       /*
> > >        * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
> > >        * possible to INIT the vCPU while L2 is active.  Force the vCPU back
> > >        * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
> > >        * bits), i.e. virtualization is disabled.
> > >        */
> > >       if (is_guest_mode(vcpu))
> > >               kvm_leave_nested(vcpu);
> > >
> > >       ...
> > >
> > >       kvm_x86_call(set_efer)(vcpu, 0);
> > >
> > > So I think the only problematic case is (1). We can probably fix this by
> > > plumbing host_initiated through set_efer? This is getting more
> > > complicated than I would have liked..
> >
> > What if we instead hook WRMSR interception?  A little fugly (well, more than a
> > little), but I think it would minimize the chances of a false-positive.  The
> > biggest potential flaw I see is that this will incorrectly triple fault if KVM
> > synthesizes a #VMEXIT while emulating the WRMSR.  But that really shouldn't
> > happen, because even a #GP=>#VMEXIT needs to be queued but not synthesized until
> > the emulation sequence completes (any other behavior would risk confusing KVM).
> 
> What if we key off vcpu->wants_to_run?

That crossed my mind too.

> It's less protection against false positives from things like
> kvm_vcpu_reset() if it didn't leave nested before clearing EFER, but
> more protection against the #VMEXIT case you mentioned. Also should be
> much lower on the fugliness scale imo.

Yeah, I had pretty much the exact same thought process and assessment.  I suggested
the WRMSR approach because I'm not sure how I feel about using wants_to_run for
functional behavior.  But after realizing that hooking WRMSR won't handle RSM,
I'm solidly against my WRMSR idea.

Honestly, I'm leaning slightly towards dropping this patch entirely since it's
not a bug fix.  But I'm definitely not completely against it either.  So what if
we throw it in, but plan on reverting if there are any more problems (that aren't
obviously due to goofs elsewhere in KVM).

Is this what you were thinking?

---
 arch/x86/kvm/svm/svm.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1b31b033d79b..3e48e9c1c955 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -216,6 +216,19 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
 	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
 		if (!(efer & EFER_SVME)) {
+			/*
+			 * Architecturally, clearing EFER.SVME while a guest is
+			 * running yields undefined behavior, i.e. KVM can do
+			 * literally anything.  Force the vCPU back into L1 as
+			 * that is the safest option for KVM, but synthesize a
+			 * triple fault (for L1!) so that KVM at least doesn't
+			 * run random L2 code in the context of L1.  Do so if
+			 * and only if the vCPU is actively running, e.g. to
+			 * avoid false positives if userspace is stuffing state.
+			 */
+			if (is_guest_mode(vcpu) && vcpu->wants_to_run)
+				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
 			svm_leave_nested(vcpu);
 			/* #GP intercept is still needed for vmware backdoor */
 			if (!enable_vmware_backdoor)

base-commit: 95deaec3557dced322e2540bfa426e60e5373d46
--
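The wants_to_run gate can be modeled the same way. The sketch assumes that wants_to_run is true only while the KVM_RUN ioctl is executing the vCPU. Under that assumption, an EFER write reached from a guest WRMSR sees the flag set, while one coming from a userspace state-setting ioctl does not. All names are hypothetical stand-ins, not KVM's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_SVME (1ULL << 12)

/* Hypothetical toy vCPU, not struct kvm_vcpu. */
struct toy_vcpu {
	uint64_t efer;
	bool guest_mode;       /* is_guest_mode() */
	bool wants_to_run;     /* assumed: true only inside KVM_RUN */
	bool triple_fault_req; /* pending KVM_REQ_TRIPLE_FAULT */
};

/* Models the SVME-clear branch with the wants_to_run check added. */
static void toy_set_efer(struct toy_vcpu *v, uint64_t efer)
{
	bool clearing_svme = (v->efer & EFER_SVME) && !(efer & EFER_SVME);

	v->efer = efer;
	if (clearing_svme) {
		if (v->guest_mode && v->wants_to_run)
			v->triple_fault_req = true;
		v->guest_mode = false; /* svm_leave_nested() */
	}
}

/* An EFER write reached while the vCPU is being run (e.g. L2's WRMSR). */
static void toy_efer_write_in_run(struct toy_vcpu *v, uint64_t efer)
{
	v->wants_to_run = true;
	toy_set_efer(v, efer);
	v->wants_to_run = false;
}

/* Userspace stuffing state (e.g. KVM_SET_SREGS) outside of KVM_RUN. */
static void toy_efer_write_from_ioctl(struct toy_vcpu *v, uint64_t efer)
{
	toy_set_efer(v, efer);
}
```

The trade-off discussed above is visible here too. Any EFER clear reached from inside KVM_RUN while in guest mode is treated as the guest's doing. An in-run path that cleared EFER without first leaving nested would therefore still be flagged.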


* Re: [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
  2026-02-28  0:41         ` Sean Christopherson
@ 2026-02-28  0:46           ` Yosry Ahmed
  2026-03-02 22:48             ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-28  0:46 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Yosry Ahmed, Paolo Bonzini, kvm, linux-kernel

> > What if we key off vcpu->wants_to_run?
>
> That crossed my mind too.
>
> > It's less protection against false positives from things like
> > kvm_vcpu_reset() if it didn't leave nested before clearing EFER, but
> > more protection against the #VMEXIT case you mentioned. Also should be
> > much lower on the fugliness scale imo.
>
> Yeah, I had pretty much the exact same thought process and assessment.  I suggested
> the WRMSR approach because I'm not sure how I feel about using wants_to_run for
> functional behavior.  But after realizing that hooking WRMSR won't handle RSM,
> I'm solidly against my WRMSR idea.
>
> Honestly, I'm leaning slightly towards dropping this patch entirely since it's
> not a bug fix.  But I'm definitely not completely against it either.  So what if
> we throw it in, but plan on reverting if there are any more problems (that aren't
> obviously due to goofs elsewhere in KVM).

I am okay with that.

>
> Is this what you were thinking?

Yeah, exactly.

>
> ---
>  arch/x86/kvm/svm/svm.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 1b31b033d79b..3e48e9c1c955 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -216,6 +216,19 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>
>         if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
>                 if (!(efer & EFER_SVME)) {
> +                       /*
> +                        * Architecturally, clearing EFER.SVME while a guest is
> +                        * running yields undefined behavior, i.e. KVM can do
> +                        * literally anything.  Force the vCPU back into L1 as
> +                        * that is the safest option for KVM, but synthesize a
> +                        * triple fault (for L1!) so that KVM at least doesn't
> +                        * run random L2 code in the context of L1.  Do so if
> +                        * and only if the vCPU is actively running, e.g. to
> +                        * avoid false positives if userspace is stuffing state.
> +                        */
> +                       if (is_guest_mode(vcpu) && vcpu->wants_to_run)
> +                               kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> +
>                         svm_leave_nested(vcpu);
>                         /* #GP intercept is still needed for vmware backdoor */
>                         if (!enable_vmware_backdoor)
>
> base-commit: 95deaec3557dced322e2540bfa426e60e5373d46
> --


* Re: [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
  2026-02-28  0:46           ` Yosry Ahmed
@ 2026-03-02 22:48             ` Sean Christopherson
  0 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2026-03-02 22:48 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Yosry Ahmed, Paolo Bonzini, kvm, linux-kernel

On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> > > What if we key off vcpu->wants_to_run?
> >
> > That crossed my mind too.
> >
> > > It's less protection against false positives from things like
> > > kvm_vcpu_reset() if it didn't leave nested before clearing EFER, but
> > > more protection against the #VMEXIT case you mentioned. Also should be
> > > much lower on the fugliness scale imo.
> >
> > Yeah, I had pretty much the exact same thought process and assessment.  I suggested
> > the WRMSR approach because I'm not sure how I feel about using wants_to_run for
> > functional behavior.  But after realizing that hooking WRMSR won't handle RSM,
> > I'm solidly against my WRMSR idea.
> >
> > Honestly, I'm leaning slightly towards dropping this patch entirely since it's
> > not a bug fix.  But I'm definitely not completely against it either.  So what if
> > we throw it in, but plan on reverting if there are any more problems (that aren't
> > obviously due to goofs elsewhere in KVM).
> 
> I am okay with that.
> 
> >
> > Is this what you were thinking?
> 
> Yeah, exactly.

Nice.  No need for a v3, I'll fixup when applying (it might be a while before
this gets any "thanks", as I want to land it behind all of the stable@ fixes).


* Re: [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly
  2026-02-09 19:51 [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Yosry Ahmed
  2026-02-09 19:51 ` [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2 Yosry Ahmed
  2026-02-09 19:51 ` [PATCH v2 2/2] KVM: selftests: Add a test for L2 clearing EFER.SVME without intercept Yosry Ahmed
@ 2026-03-05 17:08 ` Sean Christopherson
  2 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2026-03-05 17:08 UTC (permalink / raw)
  To: Sean Christopherson, Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel

On Mon, 09 Feb 2026 19:51:40 +0000, Yosry Ahmed wrote:
> Add more graceful handling of L2 clearing EFER.SVME without L1
> interception, which is architecturally undefined. Shut down L1 instead of
> running it with corrupted L2 state, and add a test to verify the new
> behavior.
> 
> I did not CC stable on patch 1 because it's not technically a KVM bug,
> but it would be nice to have it backported. Leaving the decision to
> Sean.
> 
> [...]

Applied to kvm-x86 nested, with the discussed fixup.  Thanks!

[1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
      https://github.com/kvm-x86/linux/commit/cdc69269b18a
[2/2] KVM: selftests: Add a test for L2 clearing EFER.SVME without intercept
      https://github.com/kvm-x86/linux/commit/3900e56eb184

--
https://github.com/kvm-x86/linux/tree/next


Thread overview: 10+ messages
2026-02-09 19:51 [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Yosry Ahmed
2026-02-09 19:51 ` [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2 Yosry Ahmed
2026-02-26 16:36   ` Yosry Ahmed
2026-02-26 18:20     ` Sean Christopherson
2026-02-27 20:03       ` Yosry Ahmed
2026-02-28  0:41         ` Sean Christopherson
2026-02-28  0:46           ` Yosry Ahmed
2026-03-02 22:48             ` Sean Christopherson
2026-02-09 19:51 ` [PATCH v2 2/2] KVM: selftests: Add a test for L2 clearing EFER.SVME without intercept Yosry Ahmed
2026-03-05 17:08 ` [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Sean Christopherson
