All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry@kernel.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
Date: Thu, 26 Feb 2026 10:20:27 -0800	[thread overview]
Message-ID: <aaCO62eQiZX5pvSk@google.com> (raw)
In-Reply-To: <txfn2izdpaavep6yrcujlxkqrqf2gwk2ccb6dplwcfnsstdnie@lgx74e27nus7>

On Thu, Feb 26, 2026, Yosry Ahmed wrote:
> On Mon, Feb 09, 2026 at 07:51:41PM +0000, Yosry Ahmed wrote:
> > KVM tracks when EFER.SVME is set and cleared to initialize and tear down
> > nested state. However, it doesn't differentiate if EFER.SVME is getting
> > toggled in L1 or L2+. If L2 clears EFER.SVME, and L1 does not intercept
> > the EFER write, KVM exits guest mode and tears down nested state while
> > L2 is running, executing L1 without injecting a proper #VMEXIT.
> > 
> > According to the APM:
> > 
> >     The effect of turning off EFER.SVME while a guest is running is
> >     undefined; therefore, the VMM should always prevent guests from
> >     writing EFER.
> > 
> > Since the behavior is architecturally undefined, KVM gets to choose what
> > to do. Inject a triple fault into L1 as a more graceful option that
> > running L1 with corrupted state.
> > 
> > Co-developed-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/svm/svm.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 5f0136dbdde6..ccd73a3be3f9 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> >  
> >  	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
> >  		if (!(efer & EFER_SVME)) {
> > +			/*
> > +			 * Architecturally, clearing EFER.SVME while a guest is
> > +			 * running yields undefined behavior, i.e. KVM can do
> > +			 * literally anything.  Force the vCPU back into L1 as
> > +			 * that is the safest option for KVM, but synthesize a
> > +			 * triple fault (for L1!) so that KVM at least doesn't
> > +			 * run random L2 code in the context of L1.
> > +			 */
> > +			if (is_guest_mode(vcpu))
> > +				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> > +
> 
> Sigh, I think this is not correct in all cases:
> 
> 1. If userspace restores a vCPU with EFER.SVME=0 to a vCPU with
> EFER.SVME=1 (e.g. restoring a vCPU running to a vCPU running L2).
> Typically KVM_SET_SREGS is done before KVM_SET_NESTED_STATE, so we may
> set EFER.SVME = 0 before leaving guest mode.
> 
> 2. On vCPU reset, we clear EFER. Hmm, this one is seemingly okay tho,
> looking at kvm_vcpu_reset(), we leave nested first:
> 
> 	/*
> 	 * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
> 	 * possible to INIT the vCPU while L2 is active.  Force the vCPU back
> 	 * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
> 	 * bits), i.e. virtualization is disabled.
> 	 */
> 	if (is_guest_mode(vcpu))
> 		kvm_leave_nested(vcpu);
> 
> 	...
> 
> 	kvm_x86_call(set_efer)(vcpu, 0);
> 
> So I think the only problematic case is (1). We can probably fix this by
> plumbing host_initiated through set_efer? This is getting more
> complicated than I would have liked..

What if we instead hook WRMSR interception?  A little fugly (well, more than a
little), but I think it would minimize the chances of a false-positive.  The
biggest potential flaw I see is that this will incorrectly triple fault if KVM
synthesizes a #VMEXIT while emulating the WRMSR.  But that really shouldn't
happen, because even a #GP=>#VMEXIT needs to be queued but not synthesized until
the emulation sequence completes (any other behavior would risk confusing KVM).

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..1d8d9960df20 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3119,10 +3119,28 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 static int msr_interception(struct kvm_vcpu *vcpu)
 {
-       if (to_svm(vcpu)->vmcb->control.exit_info_1)
-               return kvm_emulate_wrmsr(vcpu);
-       else
+       bool efer_l2 = is_guest_mode(vcpu) && kvm_rcx_read(vcpu) == MSR_EFER;
+       int r;
+
+       if (!to_svm(vcpu)->vmcb->control.exit_info_1)
                return kvm_emulate_rdmsr(vcpu);
+
+       r = kvm_emulate_wrmsr(vcpu);
+
+       /*
+        * If EFER.SVME is cleared while the vCPU is in L2, KVM forces the vCPU
+        * back into L1 as that is the safest option for KVM.  Architecturally,
+        * clearing EFER.SVME while a guest is running yields undefined behavior,
+        * i.e. KVM can do literally anything.  Synthesize a shutdown (for L1!)
+        * if EFER.SVME was cleared on a guest WRMSR (to avoid false positives
+        * on userspace restoring state), so that so that KVM at least doesn't
+        * run random L2 code in the
+        * context of L1.
+        */
+       if (r && efer_l2 && !is_guest_mode(vcpu))
+               kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
+       return r;
 }
 
 static int interrupt_window_interception(struct kvm_vcpu *vcpu)

  reply	other threads:[~2026-02-26 18:20 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-09 19:51 [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Yosry Ahmed
2026-02-09 19:51 ` [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2 Yosry Ahmed
2026-02-26 16:36   ` Yosry Ahmed
2026-02-26 18:20     ` Sean Christopherson [this message]
2026-02-27 20:03       ` Yosry Ahmed
2026-02-28  0:41         ` Sean Christopherson
2026-02-28  0:46           ` Yosry Ahmed
2026-03-02 22:48             ` Sean Christopherson
2026-02-09 19:51 ` [PATCH v2 2/2] KVM: selftests: Add a test for L2 clearing EFER.SVME without intercept Yosry Ahmed
2026-03-05 17:08 ` [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaCO62eQiZX5pvSk@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=yosry.ahmed@linux.dev \
    --cc=yosry@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.