Re: "KVM: x86/mmu: Overhaul TDP MMU zapping and flushing" breaks SVM on Hyper-V

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Tianyu Lan <ltykernel@gmail.com>,
	"Michael Kelley (LINUX)" <mikelley@microsoft.com>
Subject: Re: "KVM: x86/mmu: Overhaul TDP MMU zapping and flushing" breaks SVM on Hyper-V
Date: Mon, 13 Feb 2023 19:11:55 +0000	[thread overview]
Message-ID: <Y+qLe42h9ZPRINrG@google.com> (raw)
In-Reply-To: <35ff8f48-2677-78ea-b5f3-329c75ce65c9@redhat.com>

On Mon, Feb 13, 2023, Paolo Bonzini wrote:
> On 2/13/23 18:38, Sean Christopherson wrote:
> > On Fri, Feb 10, 2023, Jeremi Piotrowski wrote:
> > > Hi Paolo/Sean,
> > > 
> > > We've noticed that changes introduced in "KVM: x86/mmu: Overhaul TDP MMU
> > > zapping and flushing" conflict with a nested Hyper-V enlightenment that is
> > > always enabled on AMD CPUs (HV_X64_NESTED_ENLIGHTENED_TLB). The scenario that
> > > is affected is L0 Hyper-V + L1 KVM on AMD,
> > 
> > Do you see issues with Intel and HV_X64_NESTED_GUEST_MAPPING_FLUSH?  IIUC, on the
> > KVM side, that setup is equivalent to HV_X64_NESTED_ENLIGHTENED_TLB.
> 
> My reading of the spec[1] is that HV_X64_NESTED_ENLIGHTENED_TLB will cause
> svm_flush_tlb_current to behave (in Intel parlance) as an INVVPID rather
> than an INVEPT.

Oh!  Good catch!  Yeah, that'll be a problem.  Copy-pasting the relevant snippet
so future me doesn't have to reread the spec:

  If the nested hypervisor opts into the enlightenment, ASID invalidations just
  flush TLB entires derived from first level address translation (i.e. the
  virtual address space).

Specifically, the "missing" flushes when a root's (nCR3) refcount goes to zero
are expected because KVM relies on flushing via svm_flush_tlb_current() when the
old, stale root might be reused.  That would lead to consuming stale entries when
reusing a previously freed root.

  int kvm_mmu_load(struct kvm_vcpu *vcpu)
  {
	int r;

	...

	/*
	 * Flush any TLB entries for the new root, the provenance of the root
	 * is unknown.  Even if KVM ensures there are no stale TLB entries
	 * for a freed root, in theory another hypervisor could have left
	 * stale entries.  Flushing on alloc also allows KVM to skip the TLB
	 * flush when freeing a root (see kvm_tdp_mmu_put_root()).
	 */
	static_call(kvm_x86_flush_tlb_current)(vcpu);
  out:
	return r;
  }

> So svm_flush_tlb_current has to be changed to also add a
> call to HvCallFlushGuestPhysicalAddressSpace.  I'm not sure if that's a good
> idea though.

That's not strictly necessary, e.g. flushes from kvm_invalidate_pcid() and
kvm_post_set_cr4() don't need to effect a full flush.  I believe the virtual
address flush is also sufficient for avic_activate_vmcb().  Nested (from KVM's
perspective, i.e. running L3) can just be mutually exclusive with
HV_X64_NESTED_ENLIGHTENED_TLB.

That just leaves kvm_mmu_new_pgd()'s force_flush_and_sync_on_reuse and the
aforementioned kvm_mmu_load().

That said, the above cases where a virtual address flush is sufficient are
rare operations when using NPT, so adding a new KVM_REQ_TLB_FLUSH_ROOT or
whatever probably isn't worth doing.

> First, that's a TLB shootdown rather than just a local thing;
> flush_tlb_current is supposed to be relatively cheap, and there would be a
> lot of them because of the unconditional calls to
> nested_svm_transition_tlb_flush on vmentry/vmexit.

This isn't a nested scenario for KVM though.  

> Second, while the nCR3 matches across virtual processors for SVM, the (nCR3,
> ASID) pair does not, so it doesn't even make much sense to do a TLB
> shootdown.
> 
> Depending on the performance results of adding the hypercall to
> svm_flush_tlb_current, the fix could indeed be to just disable usage of
> HV_X64_NESTED_ENLIGHTENED_TLB.

Minus making nested SVM (L3) mutually exclusive, I believe this will do the trick:

---
 arch/x86/kvm/kvm_onhyperv.c | 9 +++++++++
 arch/x86/kvm/kvm_onhyperv.h | 4 ++++
 arch/x86/kvm/svm/svm.c      | 3 +++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
index 482d6639ef88..e03e9296c1cf 100644
--- a/arch/x86/kvm/kvm_onhyperv.c
+++ b/arch/x86/kvm/kvm_onhyperv.c
@@ -107,3 +107,12 @@ void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 	}
 }
 EXPORT_SYMBOL_GPL(hv_track_root_tdp);
+
+void hv_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops.tlb_remote_flush != hv_remote_flush_tlb)
+		return;
+
+	WARN_ON_ONCE(hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa));
+}
+EXPORT_SYMBOL_GPL(hv_flush_tlb_current);
diff --git a/arch/x86/kvm/kvm_onhyperv.h b/arch/x86/kvm/kvm_onhyperv.h
index 287e98ef9df3..30789dfd3544 100644
--- a/arch/x86/kvm/kvm_onhyperv.h
+++ b/arch/x86/kvm/kvm_onhyperv.h
@@ -11,10 +11,14 @@ int hv_remote_flush_tlb_with_range(struct kvm *kvm,
 		struct kvm_tlb_range *range);
 int hv_remote_flush_tlb(struct kvm *kvm);
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
+void hv_flush_tlb_current(struct kvm_vcpu *vcpu);
 #else /* !CONFIG_HYPERV */
 static inline void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
 }
+static inline void hv_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+}
 #endif /* !CONFIG_HYPERV */
 
 #endif
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b43775490074..bfc71dfa8482 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3733,6 +3733,9 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	/* blah blah blah */
+	hv_flush_tlb_current(vcpu);
+
 	/*
 	 * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
 	 * A TLB flush for the current ASID flushes both "host" and "guest" TLB

base-commit: 9fa259abdb42051e5ab4cbf0bc0cd21adcf95a4f
--

next prev parent reply	other threads:[~2023-02-13 19:12 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-10 18:17 "KVM: x86/mmu: Overhaul TDP MMU zapping and flushing" breaks SVM on Hyper-V Jeremi Piotrowski
2023-02-10 18:45 ` Sean Christopherson
2023-02-13 12:44   ` Jeremi Piotrowski
2023-02-13 12:50     ` Paolo Bonzini
2023-02-13 18:05       ` Jeremi Piotrowski
2023-02-13 18:26         ` Paolo Bonzini
2023-02-13 17:38 ` Sean Christopherson
2023-02-13 17:49   ` Jeremi Piotrowski
2023-02-13 18:11   ` Paolo Bonzini
2023-02-13 19:11     ` Sean Christopherson [this message]
2023-02-13 19:56       ` Paolo Bonzini
2023-02-14 20:27         ` Jeremi Piotrowski
2023-02-15 22:16           ` Sean Christopherson
2023-02-16 14:40             ` Jeremi Piotrowski
2023-02-24 16:17               ` Jeremi Piotrowski
2023-02-24 16:26                 ` Paolo Bonzini

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:482d6639ef8 dfblob:e03e9296c1c dfblob:287e98ef9df
dfblob:30789dfd354 dfblob:b4377549007 dfblob:bfc71dfa848 )
 OR (
bs:"Re: "
bs:"KVM: x86/mmu: Overhaul TDP MMU zapping and flushing"
bs:" breaks SVM on Hyper-V" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y+qLe42h9ZPRINrG@google.com \
    --to=seanjc@google.com \
    --cc=jpiotrowski@linux.microsoft.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ltykernel@gmail.com \
    --cc=mikelley@microsoft.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.