All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	David Hildenbrand <david@redhat.com>,
	David Matlack <dmatlack@google.com>,
	Ben Gardon <bgardon@google.com>,
	Mingwei Zhang <mizhang@google.com>
Subject: Re: [PATCH v4 24/30] KVM: x86/mmu: Zap defunct roots via asynchronous worker
Date: Thu, 3 Mar 2022 22:08:08 +0000	[thread overview]
Message-ID: <YiE8SH4Sn1zzRZwe@google.com> (raw)
In-Reply-To: <20220303193842.370645-25-pbonzini@redhat.com>

On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> Zap defunct roots, a.k.a. roots that have been invalidated after their
> last reference was initially dropped, asynchronously via the system work
> queue instead of forcing the work upon the unfortunate task that happened
> to drop the last reference.
> 
> If a vCPU task drops the last reference, the vCPU is effectively blocked
> by the host for the entire duration of the zap.  If the root being zapped
> happens be fully populated with 4kb leaf SPTEs, e.g. due to dirty logging
> being active, the zap can take several hundred seconds.  Unsurprisingly,
> most guests are unhappy if a vCPU disappears for hundreds of seconds.
> 
> E.g. running a synthetic selftest that triggers a vCPU root zap with
> ~64tb of guest memory and 4kb SPTEs blocks the vCPU for 900+ seconds.
> Offloading the zap to a worker drops the block time to <100ms.
> 
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Ben Gardon <bgardon@google.com>
> Message-Id: <20220226001546.360188-23-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index e24a1bff9218..2456f880508d 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -170,13 +170,24 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>  	 */
>  	if (!kvm_tdp_root_mark_invalid(root)) {
>  		refcount_set(&root->tdp_mmu_root_count, 1);
> -		tdp_mmu_zap_root(kvm, root, shared);
>  
>  		/*
> -		 * Give back the reference that was added back above.  We now
> +		 * If the struct kvm is alive, we might as well zap the root
> +		 * in a worker.  The worker takes ownership of the reference we
> +		 * just added to root and is flushed before the struct kvm dies.

Not a fan of the "we might as well zap the root in a worker", IMO we should require
going forward that invalidated, reachable TDP MMU roots are always zapped in a worker

> +		 */
> +		if (likely(refcount_read(&kvm->users_count))) {
> +			tdp_mmu_schedule_zap_root(kvm, root);

Regarding the need for kvm_tdp_mmu_invalidate_all_roots() to guard against
re-queueing a root for zapping, this is the point where it becomes functionally
problematic.  When "fast zap" was the only user of tdp_mmu_schedule_zap_root(),
re-queueing was benign as the work struct was guaranteed to not be currently
queued.  But this code runs outside of slots_lock, and so a root that was "put"
but hasn't finished zapping can be observed and re-queued by the "fast zap.

I think it makes sense to create a rule/invariant that an invalidated TDP MMU root
_must_ be zapped via the work queue.  Then 

I.e. do this as fixup:

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 40bf861b622a..cff4f2102a63 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1019,8 +1019,9 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
  * of invalidated roots; the list is effectively the list of work items in
  * the workqueue.
  *
- * Skip roots that are already queued for zapping, flushing the work queue will
- * ensure invalidated roots are zapped regardless of when they were queued.
+ * Skip roots that are already invalid and thus queued for zapping, flushing
+ * the work queue will ensure invalid roots are zapped regardless of when they
+ * were queued.
  *
  * Because mmu_lock is held for write, it should be impossible to observe a
  * root with zero refcount,* i.e. the list of roots cannot be stale.
@@ -1034,13 +1035,12 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)

        lockdep_assert_held_write(&kvm->mmu_lock);
        list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
-               if (root->tdp_mmu_async_data)
+               if (kvm_tdp_root_mark_invalid(root))
                        continue;

                if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
                        continue;

-               root->role.invalid = true;
                tdp_mmu_schedule_zap_root(kvm, root);
        }
 }

> +			return;
> +		}
> +
> +		/*
> +		 * The struct kvm is being destroyed, zap synchronously and give
> +		 * back immediately the reference that was added above.  We now
>  		 * know that the root is invalid, so go ahead and free it if
>  		 * no one has taken a reference in the meanwhile.
>  		 */
> +		tdp_mmu_zap_root(kvm, root, shared);
>  		if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
>  			return;
>  	}
> -- 
> 2.31.1
> 
>

  reply	other threads:[~2022-03-03 22:08 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-03 19:38 [PATCH v4 00/30] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 01/30] KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 02/30] KVM: x86/mmu: Fix wrong/misleading comments in TDP MMU fast zap Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 03/30] KVM: x86/mmu: Formalize TDP MMU's (unintended?) deferred TLB flush logic Paolo Bonzini
2022-03-03 23:39   ` Mingwei Zhang
2022-03-03 19:38 ` [PATCH v4 04/30] KVM: x86/mmu: Document that zapping invalidated roots doesn't need to flush Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 05/30] KVM: x86/mmu: Require mmu_lock be held for write in unyielding root iter Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 06/30] KVM: x86/mmu: only perform eager page splitting on valid roots Paolo Bonzini
2022-03-03 20:03   ` Sean Christopherson
2022-03-03 19:38 ` [PATCH v4 07/30] KVM: x86/mmu: do not allow readers to acquire references to invalid roots Paolo Bonzini
2022-03-03 20:12   ` Sean Christopherson
2022-03-03 19:38 ` [PATCH v4 08/30] KVM: x86/mmu: Check for !leaf=>leaf, not PFN change, in TDP MMU SP removal Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 09/30] KVM: x86/mmu: Batch TLB flushes from TDP MMU for MMU notifier change_spte Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 10/30] KVM: x86/mmu: Drop RCU after processing each root in MMU notifier hooks Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 11/30] KVM: x86/mmu: Add helpers to read/write TDP MMU SPTEs and document RCU Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 12/30] KVM: x86/mmu: WARN if old _or_ new SPTE is REMOVED in non-atomic path Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 13/30] KVM: x86/mmu: Refactor low-level TDP MMU set SPTE helper to take raw values Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 14/30] KVM: x86/mmu: Zap only the target TDP MMU shadow page in NX recovery Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 15/30] KVM: x86/mmu: Skip remote TLB flush when zapping all of TDP MMU Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 16/30] KVM: x86/mmu: Add dedicated helper to zap TDP MMU root shadow page Paolo Bonzini
2022-03-04  0:07   ` Mingwei Zhang
2022-03-03 19:38 ` [PATCH v4 17/30] KVM: x86/mmu: Require mmu_lock be held for write to zap TDP MMU range Paolo Bonzini
2022-03-04  0:14   ` Mingwei Zhang
2022-03-03 19:38 ` [PATCH v4 18/30] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range() Paolo Bonzini
2022-03-04  1:16   ` Mingwei Zhang
2022-03-04 16:11     ` Sean Christopherson
2022-03-04 18:00       ` Mingwei Zhang
2022-03-04 18:42         ` Sean Christopherson
2022-03-11 15:09   ` Vitaly Kuznetsov
2022-03-13 18:40   ` Mingwei Zhang
2022-03-25 15:13     ` Sean Christopherson
2022-03-26 18:10       ` Mingwei Zhang
2022-03-28 15:06         ` Sean Christopherson
2022-03-03 19:38 ` [PATCH v4 19/30] KVM: x86/mmu: Do remote TLB flush before dropping RCU in TDP MMU resched Paolo Bonzini
2022-03-04  1:19   ` Mingwei Zhang
2022-03-03 19:38 ` [PATCH v4 20/30] KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 21/30] KVM: x86/mmu: Zap invalidated roots via asynchronous worker Paolo Bonzini
2022-03-03 20:54   ` Sean Christopherson
2022-03-03 21:06     ` Sean Christopherson
2022-03-03 21:20   ` Sean Christopherson
2022-03-03 21:32     ` Sean Christopherson
2022-03-04  6:48       ` Paolo Bonzini
2022-03-04 16:02         ` Sean Christopherson
2022-03-04 18:11           ` Paolo Bonzini
2022-03-05  0:34             ` Sean Christopherson
2022-03-05 19:53               ` Paolo Bonzini
2022-03-08 21:29                 ` Sean Christopherson
2022-03-11 17:50                   ` Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 22/30] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 23/30] KVM: x86/mmu: Zap roots in two passes to avoid inducing RCU stalls Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 24/30] KVM: x86/mmu: Zap defunct roots via asynchronous worker Paolo Bonzini
2022-03-03 22:08   ` Sean Christopherson [this message]
2022-03-03 19:38 ` [PATCH v4 25/30] KVM: x86/mmu: Check for a REMOVED leaf SPTE before making the SPTE Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 26/30] KVM: x86/mmu: WARN on any attempt to atomically update REMOVED SPTE Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 27/30] KVM: selftests: Move raw KVM_SET_USER_MEMORY_REGION helper to utils Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 28/30] KVM: selftests: Split out helper to allocate guest mem via memfd Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 29/30] KVM: selftests: Define cpu_relax() helpers for s390 and x86 Paolo Bonzini
2022-03-03 19:38 ` [PATCH v4 30/30] KVM: selftests: Add test to populate a VM with the max possible guest mem Paolo Bonzini
2022-03-08 14:47   ` Paolo Bonzini
2022-03-08 15:36     ` Christian Borntraeger
2022-03-08 21:09     ` Sean Christopherson
2022-03-08 17:25 ` [PATCH v4 00/30] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YiE8SH4Sn1zzRZwe@google.com \
    --to=seanjc@google.com \
    --cc=bgardon@google.com \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.