public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry@kernel.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2
Date: Thu, 26 Feb 2026 10:20:27 -0800	[thread overview]
Message-ID: <aaCO62eQiZX5pvSk@google.com> (raw)
In-Reply-To: <txfn2izdpaavep6yrcujlxkqrqf2gwk2ccb6dplwcfnsstdnie@lgx74e27nus7>

On Thu, Feb 26, 2026, Yosry Ahmed wrote:
> On Mon, Feb 09, 2026 at 07:51:41PM +0000, Yosry Ahmed wrote:
> > KVM tracks when EFER.SVME is set and cleared to initialize and tear down
> > nested state. However, it doesn't differentiate if EFER.SVME is getting
> > toggled in L1 or L2+. If L2 clears EFER.SVME, and L1 does not intercept
> > the EFER write, KVM exits guest mode and tears down nested state while
> > L2 is running, executing L1 without injecting a proper #VMEXIT.
> > 
> > According to the APM:
> > 
> >     The effect of turning off EFER.SVME while a guest is running is
> >     undefined; therefore, the VMM should always prevent guests from
> >     writing EFER.
> > 
> > Since the behavior is architecturally undefined, KVM gets to choose what
> > to do. Inject a triple fault into L1 as a more graceful option that
> > running L1 with corrupted state.
> > 
> > Co-developed-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/svm/svm.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 5f0136dbdde6..ccd73a3be3f9 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> >  
> >  	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
> >  		if (!(efer & EFER_SVME)) {
> > +			/*
> > +			 * Architecturally, clearing EFER.SVME while a guest is
> > +			 * running yields undefined behavior, i.e. KVM can do
> > +			 * literally anything.  Force the vCPU back into L1 as
> > +			 * that is the safest option for KVM, but synthesize a
> > +			 * triple fault (for L1!) so that KVM at least doesn't
> > +			 * run random L2 code in the context of L1.
> > +			 */
> > +			if (is_guest_mode(vcpu))
> > +				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> > +
> 
> Sigh, I think this is not correct in all cases:
> 
> 1. If userspace restores a vCPU with EFER.SVME=0 to a vCPU with
> EFER.SVME=1 (e.g. restoring a vCPU running to a vCPU running L2).
> Typically KVM_SET_SREGS is done before KVM_SET_NESTED_STATE, so we may
> set EFER.SVME = 0 before leaving guest mode.
> 
> 2. On vCPU reset, we clear EFER. Hmm, this one is seemingly okay tho,
> looking at kvm_vcpu_reset(), we leave nested first:
> 
> 	/*
> 	 * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
> 	 * possible to INIT the vCPU while L2 is active.  Force the vCPU back
> 	 * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
> 	 * bits), i.e. virtualization is disabled.
> 	 */
> 	if (is_guest_mode(vcpu))
> 		kvm_leave_nested(vcpu);
> 
> 	...
> 
> 	kvm_x86_call(set_efer)(vcpu, 0);
> 
> So I think the only problematic case is (1). We can probably fix this by
> plumbing host_initiated through set_efer? This is getting more
> complicated than I would have liked..

What if we instead hook WRMSR interception?  A little fugly (well, more than a
little), but I think it would minimize the chances of a false-positive.  The
biggest potential flaw I see is that this will incorrectly triple fault if KVM
synthesizes a #VMEXIT while emulating the WRMSR.  But that really shouldn't
happen, because even a #GP=>#VMEXIT needs to be queued but not synthesized until
the emulation sequence completes (any other behavior would risk confusing KVM).

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..1d8d9960df20 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3119,10 +3119,28 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 static int msr_interception(struct kvm_vcpu *vcpu)
 {
-       if (to_svm(vcpu)->vmcb->control.exit_info_1)
-               return kvm_emulate_wrmsr(vcpu);
-       else
+       bool efer_l2 = is_guest_mode(vcpu) && kvm_rcx_read(vcpu) == MSR_EFER;
+       int r;
+
+       if (!to_svm(vcpu)->vmcb->control.exit_info_1)
                return kvm_emulate_rdmsr(vcpu);
+
+       r = kvm_emulate_wrmsr(vcpu);
+
+       /*
+        * If EFER.SVME is cleared while the vCPU is in L2, KVM forces the vCPU
+        * back into L1 as that is the safest option for KVM.  Architecturally,
+        * clearing EFER.SVME while a guest is running yields undefined behavior,
+        * i.e. KVM can do literally anything.  Synthesize a shutdown (for L1!)
+        * if EFER.SVME was cleared on a guest WRMSR (to avoid false positives
+        * on userspace restoring state), so that so that KVM at least doesn't
+        * run random L2 code in the
+        * context of L1.
+        */
+       if (r && efer_l2 && !is_guest_mode(vcpu))
+               kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
+       return r;
 }
 
 static int interrupt_window_interception(struct kvm_vcpu *vcpu)

  reply	other threads:[~2026-02-26 18:20 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-09 19:51 [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Yosry Ahmed
2026-02-09 19:51 ` [PATCH v2 1/2] KVM: SVM: Triple fault L1 on unintercepted EFER.SVME clear by L2 Yosry Ahmed
2026-02-26 16:36   ` Yosry Ahmed
2026-02-26 18:20     ` Sean Christopherson [this message]
2026-02-27 20:03       ` Yosry Ahmed
2026-02-28  0:41         ` Sean Christopherson
2026-02-28  0:46           ` Yosry Ahmed
2026-03-02 22:48             ` Sean Christopherson
2026-02-09 19:51 ` [PATCH v2 2/2] KVM: selftests: Add a test for L2 clearing EFER.SVME without intercept Yosry Ahmed
2026-03-05 17:08 ` [PATCH v2 0/2] KVM: nSVM: Handle L2 clearing EFER.SVME properly Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaCO62eQiZX5pvSk@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=yosry.ahmed@linux.dev \
    --cc=yosry@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox