From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
pbonzini@redhat.com, chao.gao@intel.com, kai.huang@intel.com,
robert.hoo.linux@gmail.com, yuan.yao@linux.intel.com
Subject: Re: [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU
Date: Tue, 5 Sep 2023 15:31:59 -0700 [thread overview]
Message-ID: <ZPesX2xp6rGZsxlE@google.com> (raw)
In-Reply-To: <ZPWHtUh9SZDc4EoN@yzhao56-desk.sh.intel.com>
On Mon, Sep 04, 2023, Yan Zhao wrote:
> On Fri, Aug 25, 2023 at 02:34:30PM -0700, Sean Christopherson wrote:
> > On Fri, Jul 14, 2023, Yan Zhao wrote:
> > > Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding for
> > > read in TDP MMU and allow zapping of non-leaf SPTEs of level <= 1G.
> > > TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic() guarded
> > > by RCU lock.
> > >
> > > GFN zap can be super slow if mmu_lock is held for write when there are
> > > contentions. In worst cases, huge cpu cycles are spent on yielding GFN by
> > > GFN, i.e. the loop of "check and flush tlb -> drop rcu lock ->
> > > drop mmu_lock -> cpu_relax() -> take mmu_lock -> take rcu lock" are entered
> > > for every GFN.
> > > Contentions can either from concurrent zaps holding mmu_lock for write or
> > > from tdp_mmu_map() holding mmu_lock for read.
> >
> > The lock contention should go away with a pre-check[*], correct? That's a more
> Yes, I think so, though I don't have time to verify it yet.
>
> > complete solution too, in that it also avoids lock contention for the shadow MMU,
> > which presumably suffers the same problem (I don't see anything that would prevent
> > it from yielding).
> >
> > If we do want to zap with mmu_lock held for read, I think we should convert
> > kvm_tdp_mmu_zap_leafs() and all its callers to run under read, because unless I'm
> > missing something, the rules are the same regardless of _why_ KVM is zapping, e.g.
> > the zap needs to be protected by mmu_invalidate_in_progress, which ensures no other
> > tasks will race to install SPTEs that are supposed to be zapped.
> Yes. I did't do that to the unmap path was only because I don't want to make a
> big code change.
> The write lock in kvm_unmap_gfn_range() path is taken in arch-agnostic code,
> which is not easy to change, right?
Yeah. The lock itself isn't bad, especially if we can convert all mmu_nofitier
hooks, e.g. we already have KVM_MMU_LOCK(), adding a variant for mmu_notifiers
would be quite easy.
The bigger problem would be kvm_mmu_invalidate_{begin,end}() and getting the
memory ordering right, especially if there are multiple mmu_notifier events in
flight.
But I was actually thinking of a cheesier approach: drop and reacquire mmu_lock
when zapping, e.g. without the necessary changes in tdp_mmu_zap_leafs():
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 735c976913c2..c89a2511789b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -882,9 +882,15 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
{
struct kvm_mmu_page *root;
+ write_unlock(&kvm->mmu_lock);
+ read_lock(&kvm->mmu_lock);
+
for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
+ read_unlock(&kvm->mmu_lock);
+ write_lock(&kvm->mmu_lock);
+
return flush;
}
vCPUs would still get blocked, but for a smaller duration, and the lock contention
between vCPUs and the zapping task would mostly go away.
> > If you post a version of this patch that converts kvm_tdp_mmu_zap_leafs(), please
> > post it as a standalone patch. At a glance it doesn't have any dependencies on the
> > MTRR changes, and I don't want this type of changed buried at the end of a series
> > that is for a fairly niche setup. This needs a lot of scrutiny to make sure zapping
> > under read really is safe
> Given the pre-check patch should work, do you think it's still worthwhile to do
> this convertion?
I do think it would be a net positive, though I don't know that it's worth your
time without a concrete use cases. My gut instinct could be wrong, so I wouldn't
want to take on the risk of running with mmu_lock held for read without hard
performance numbers to justify the change.
next prev parent reply other threads:[~2023-09-05 22:32 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-14 6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
2023-07-14 6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
2023-10-07 7:00 ` Like Xu
2023-10-09 19:52 ` Sean Christopherson
2023-10-09 21:27 ` Sean Christopherson
2023-10-09 21:36 ` Sean Christopherson
2023-10-10 3:46 ` Yan Zhao
2023-10-11 0:08 ` Sean Christopherson
2023-10-11 1:47 ` Yan Zhao
2023-07-14 6:50 ` [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault() Yan Zhao
2023-07-14 6:51 ` [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles Yan Zhao
2023-07-14 6:51 ` [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr Yan Zhao
2023-07-14 6:52 ` [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops Yan Zhao
2023-07-14 6:52 ` [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling Yan Zhao
2023-07-14 6:53 ` [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Yan Zhao
2023-08-25 21:43 ` Sean Christopherson
2023-09-04 7:41 ` Yan Zhao
2023-07-14 6:53 ` [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored Yan Zhao
2023-08-25 21:46 ` Sean Christopherson
2023-09-04 7:46 ` Yan Zhao
2023-07-14 6:54 ` [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn " Yan Zhao
2023-08-25 22:47 ` Sean Christopherson
2023-09-04 8:24 ` Yan Zhao
2023-07-14 6:55 ` [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap " Yan Zhao
2023-08-25 23:13 ` Sean Christopherson
2023-09-04 8:37 ` Yan Zhao
2023-07-14 6:56 ` [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range " Yan Zhao
2023-08-25 23:15 ` Sean Christopherson
2023-09-04 8:39 ` Yan Zhao
2023-07-14 6:56 ` [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Yan Zhao
2023-08-25 21:34 ` Sean Christopherson
2023-09-04 7:31 ` Yan Zhao
2023-09-05 22:31 ` Sean Christopherson [this message]
2023-09-06 0:50 ` Yan Zhao
2023-08-25 23:17 ` [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
2023-09-04 8:48 ` Yan Zhao
2023-10-05 1:29 ` Sean Christopherson
2023-10-05 2:19 ` Huang, Kai
2023-10-05 2:28 ` Sean Christopherson
2023-10-06 0:50 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZPesX2xp6rGZsxlE@google.com \
--to=seanjc@google.com \
--cc=chao.gao@intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=robert.hoo.linux@gmail.com \
--cc=yan.y.zhao@intel.com \
--cc=yuan.yao@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.