From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, chao.gao@intel.com, kai.huang@intel.com,
	robert.hoo.linux@gmail.com, yuan.yao@linux.intel.com
Subject: Re: [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU
Date: Tue, 5 Sep 2023 15:31:59 -0700	[thread overview]
Message-ID: <ZPesX2xp6rGZsxlE@google.com> (raw)
In-Reply-To: <ZPWHtUh9SZDc4EoN@yzhao56-desk.sh.intel.com>

On Mon, Sep 04, 2023, Yan Zhao wrote:
> On Fri, Aug 25, 2023 at 02:34:30PM -0700, Sean Christopherson wrote:
> > On Fri, Jul 14, 2023, Yan Zhao wrote:
> > > Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding for
> > > read in TDP MMU and allow zapping of non-leaf SPTEs of level <= 1G.
> > > TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic() guarded
> > > by RCU lock.
> > > 
> > > GFN zap can be super slow if mmu_lock is held for write when there is
> > > contention. In the worst case, huge numbers of cpu cycles are spent yielding GFN
> > > by GFN, i.e. the loop of "check and flush tlb -> drop rcu lock ->
> > > drop mmu_lock -> cpu_relax() -> take mmu_lock -> take rcu lock" is entered
> > > for every GFN.
> > > Contention can come either from concurrent zaps holding mmu_lock for write or
> > > from tdp_mmu_map() holding mmu_lock for read.
> > 
> > The lock contention should go away with a pre-check[*], correct?  That's a more
> Yes, I think so, though I don't have time to verify it yet.
> 
> > complete solution too, in that it also avoids lock contention for the shadow MMU,
> > which presumably suffers the same problem (I don't see anything that would prevent
> > it from yielding).
> > 
> > If we do want to zap with mmu_lock held for read, I think we should convert
> > kvm_tdp_mmu_zap_leafs() and all its callers to run under read, because unless I'm
> > missing something, the rules are the same regardless of _why_ KVM is zapping, e.g.
> > the zap needs to be protected by mmu_invalidate_in_progress, which ensures no other
> > tasks will race to install SPTEs that are supposed to be zapped.
> Yes. I didn't do that for the unmap path only because I didn't want to make a
> big code change.
> The write lock in kvm_unmap_gfn_range() path is taken in arch-agnostic code,
> which is not easy to change, right?

Yeah.  The lock itself isn't bad, especially if we can convert all mmu_notifier
hooks, e.g. we already have KVM_MMU_LOCK(), adding a variant for mmu_notifiers
would be quite easy.
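
E.g. something like this, following the existing KVM_MMU_LOCK() pattern
(the _NOTIFIER_ macro names below are invented for illustration, not actual
kernel code, and the interesting part is that the rwlock flavor could take
mmu_lock for read):

```c
/* Hypothetical variant of KVM_MMU_LOCK() for mmu_notifier hooks; the
 * names are illustrative only.  With the TDP MMU's rwlock, notifier
 * zaps could run under read; legacy configs keep the spinlock.
 */
#ifdef KVM_HAVE_MMU_RWLOCK
#define KVM_MMU_NOTIFIER_LOCK(kvm)	read_lock(&(kvm)->mmu_lock)
#define KVM_MMU_NOTIFIER_UNLOCK(kvm)	read_unlock(&(kvm)->mmu_lock)
#else
#define KVM_MMU_NOTIFIER_LOCK(kvm)	spin_lock(&(kvm)->mmu_lock)
#define KVM_MMU_NOTIFIER_UNLOCK(kvm)	spin_unlock(&(kvm)->mmu_lock)
#endif
```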

The bigger problem would be kvm_mmu_invalidate_{begin,end}() and getting the
memory ordering right, especially if there are multiple mmu_notifier events in
flight.
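
To make that ordering concern concrete, here's a stripped-down userspace
model of the invalidation window.  The field names echo
mmu_invalidate_in_progress and mmu_invalidate_seq, but this is an
illustrative sketch, not the kernel's implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of KVM's invalidation window (userspace sketch). */
struct inv_state {
	atomic_int  in_progress;	/* notifier events in flight */
	atomic_long seq;		/* bumped on begin and on end */
};

static void invalidate_begin(struct inv_state *s)
{
	atomic_fetch_add(&s->in_progress, 1);
	atomic_fetch_add(&s->seq, 1);	/* publish: window is open */
}

static void invalidate_end(struct inv_state *s)
{
	atomic_fetch_add(&s->seq, 1);	/* publish: window closed */
	atomic_fetch_sub(&s->in_progress, 1);
}

/*
 * A fault handler samples seq, builds its SPTE, then rechecks before
 * installing: if an invalidation began (or completed) in between, the
 * SPTE may map zapped memory and the fault must be retried.
 */
static bool fault_must_retry(struct inv_state *s, long sampled_seq)
{
	return atomic_load(&s->in_progress) != 0 ||
	       atomic_load(&s->seq) != sampled_seq;
}
```

Because in_progress is a count, it stays non-zero until every overlapping
event has called invalidate_end(), which is exactly the bookkeeping that
multiple in-flight notifier events make easy to get wrong.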

But I was actually thinking of a cheesier approach: drop and reacquire mmu_lock
when zapping, e.g. without the necessary changes in tdp_mmu_zap_leafs():

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 735c976913c2..c89a2511789b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -882,9 +882,15 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
 {
        struct kvm_mmu_page *root;
 
+       write_unlock(&kvm->mmu_lock);
+       read_lock(&kvm->mmu_lock);
+
        for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
                flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
 
+       read_unlock(&kvm->mmu_lock);
+       write_lock(&kvm->mmu_lock);
+
        return flush;
 }

vCPUs would still get blocked, but for a smaller duration, and the lock contention
between vCPUs and the zapping task would mostly go away.

> > If you post a version of this patch that converts kvm_tdp_mmu_zap_leafs(), please
> > post it as a standalone patch.  At a glance it doesn't have any dependencies on the
> > MTRR changes, and I don't want this type of changed buried at the end of a series
> > that is for a fairly niche setup.  This needs a lot of scrutiny to make sure zapping
> > under read really is safe
> Given that the pre-check patch should work, do you think it's still worthwhile
> to do this conversion?

I do think it would be a net positive, though I don't know that it's worth your
time without a concrete use case.  My gut instinct could be wrong, so I wouldn't
want to take on the risk of running with mmu_lock held for read without hard
performance numbers to justify the change.


Thread overview: 40+ messages
2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
2023-07-14  6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
2023-10-07  7:00   ` Like Xu
2023-10-09 19:52     ` Sean Christopherson
2023-10-09 21:27       ` Sean Christopherson
2023-10-09 21:36         ` Sean Christopherson
2023-10-10  3:46         ` Yan Zhao
2023-10-11  0:08           ` Sean Christopherson
2023-10-11  1:47             ` Yan Zhao
2023-07-14  6:50 ` [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault() Yan Zhao
2023-07-14  6:51 ` [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles Yan Zhao
2023-07-14  6:51 ` [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr Yan Zhao
2023-07-14  6:52 ` [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops Yan Zhao
2023-07-14  6:52 ` [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling Yan Zhao
2023-07-14  6:53 ` [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Yan Zhao
2023-08-25 21:43   ` Sean Christopherson
2023-09-04  7:41     ` Yan Zhao
2023-07-14  6:53 ` [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored Yan Zhao
2023-08-25 21:46   ` Sean Christopherson
2023-09-04  7:46     ` Yan Zhao
2023-07-14  6:54 ` [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn " Yan Zhao
2023-08-25 22:47   ` Sean Christopherson
2023-09-04  8:24     ` Yan Zhao
2023-07-14  6:55 ` [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap " Yan Zhao
2023-08-25 23:13   ` Sean Christopherson
2023-09-04  8:37     ` Yan Zhao
2023-07-14  6:56 ` [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range " Yan Zhao
2023-08-25 23:15   ` Sean Christopherson
2023-09-04  8:39     ` Yan Zhao
2023-07-14  6:56 ` [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Yan Zhao
2023-08-25 21:34   ` Sean Christopherson
2023-09-04  7:31     ` Yan Zhao
2023-09-05 22:31       ` Sean Christopherson [this message]
2023-09-06  0:50         ` Yan Zhao
2023-08-25 23:17 ` [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
2023-09-04  8:48   ` Yan Zhao
2023-10-05  1:29 ` Sean Christopherson
2023-10-05  2:19   ` Huang, Kai
2023-10-05  2:28     ` Sean Christopherson
2023-10-06  0:50       ` Sean Christopherson
