From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	David Hildenbrand <david@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Matlack <dmatlack@google.com>,
	Ben Gardon <bgardon@google.com>,
	Mingwei Zhang <mizhang@google.com>
Subject: Re: [PATCH v3 20/28] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root
Date: Tue, 1 Mar 2022 19:43:19 +0000
Message-ID: <Yh53V23gSJ6jphnS@google.com>
In-Reply-To: <28276890-c90c-e9a9-3cab-15264617ef5a@redhat.com>

On Tue, Mar 01, 2022, Paolo Bonzini wrote:
> On 2/26/22 01:15, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 3031b42c27a6..b838cfa984ad 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -91,21 +91,66 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
> >   	WARN_ON(!root->tdp_mmu_page);
> > -	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> > -	list_del_rcu(&root->link);
> > -	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> > +	/*
> > +	 * Ensure root->role.invalid is read after the refcount reaches zero to
> > +	 * avoid zapping the root multiple times, e.g. if a different task
> > +	 * acquires a reference (after the root was marked invalid) and puts
> > +	 * the last reference, all while holding mmu_lock for read.  Pairs
> > +	 * with the smp_mb__before_atomic() below.
> > +	 */
> > +	smp_mb__after_atomic();
> > +
> > +	/*
> > +	 * Free the root if it's already invalid.  Invalid roots must be zapped
> > +	 * before their last reference is put, i.e. there's no work to be done,
> > +	 * and all roots must be invalidated (see below) before they're freed.
> > +	 * Re-zapping invalid roots would put KVM into an infinite loop (again,
> > +	 * see below).
> > +	 */
> > +	if (root->role.invalid) {
> > +		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> > +		list_del_rcu(&root->link);
> > +		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> > +
> > +		call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Invalidate the root to prevent it from being reused by a vCPU, and
> > +	 * so that KVM doesn't re-zap the root when its last reference is put
> > +	 * again (see above).
> > +	 */
> > +	root->role.invalid = true;
> > +
> > +	/*
> > +	 * Ensure role.invalid is visible if a concurrent reader acquires a
> > +	 * reference after the root's refcount is reset.  Pairs with the
> > +	 * smp_mb__after_atomic() above.
> > +	 */
> > +	smp_mb__before_atomic();
> 
> I have reviewed the series and I only have very minor comments... but this
> part is beyond me.  The lavish comments don't explain what is an
> optimization and what is a requirement, 

Ah, they're all requirements, but the invalid part also optimizes the case where
a root was marked invalid before its last reference was ever put.

What I really meant to refer to by "zapping" was the entire sequence of restoring
the refcount to '1', zapping the root, and recursively re-dropping that ref.  Avoiding
that "zap" is a requirement, otherwise KVM would get stuck in an infinite loop.
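
To make that sequence concrete, here is a minimal single-threaded userspace
sketch in C11. It is not the KVM code: struct model_root and the model_*()
helpers are hypothetical stand-ins, zapping is reduced to a counter, freeing to
a flag, and locking, RCU and memory ordering are not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a TDP MMU root (illustration only). */
struct model_root {
	atomic_int refcount;
	atomic_bool invalid;
	int zap_count;	/* number of times the "zap" sequence ran */
	bool freed;	/* stand-in for list_del_rcu() + call_rcu() */
};

static void model_root_init(struct model_root *root)
{
	atomic_init(&root->refcount, 1);
	atomic_init(&root->invalid, false);
	root->zap_count = 0;
	root->freed = false;
}

/* Drop one reference; returns true if it was the last one. */
static bool model_dec_and_test(struct model_root *root)
{
	return atomic_fetch_sub(&root->refcount, 1) == 1;
}

static void model_put_root(struct model_root *root)
{
	/* Putting a reference to an already freed root would be a bug. */
	assert(!root->freed);

	if (!model_dec_and_test(root))
		return;

	/*
	 * Already invalid: the root was zapped before this final put, so
	 * there is no work left and the root can simply be freed.
	 */
	if (atomic_load(&root->invalid)) {
		root->freed = true;
		return;
	}

	/*
	 * First time the count reaches zero: invalidate the root, then run
	 * the whole "zap" sequence, i.e. restore the refcount to 1, zap,
	 * and recursively re-drop that reference.
	 */
	atomic_store(&root->invalid, true);
	atomic_store(&root->refcount, 1);
	root->zap_count++;		/* stand-in for zapping the root */
	model_put_root(root);		/* takes the "already invalid" path */
}
```

In this model, dropping the role.invalid check would make the recursive put
restore the count and recurse again with no way out, which is the infinite
loop described above.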

> and after spending quite some time I wonder if all this should just be
> 
>         if (refcount_dec_not_one(&root->tdp_mmu_root_count))
>                 return;
> 
> 	if (!xchg(&root->role.invalid, true)) {

The refcount being '1' means there's another task currently using the root; marking
the root invalid would make checks on the root's validity non-deterministic
for the other task.

> 	 	tdp_mmu_zap_root(kvm, root, shared);
> 
> 		/*
> 		 * Do not assume the refcount is still 1: because
> 		 * tdp_mmu_zap_root can yield, a different task
> 		 * might have grabbed a reference to this root.
> 		 */
> 	        if (refcount_dec_not_one(&root->tdp_mmu_root_count))

This is wrong; _this_ task can't drop a reference taken by the other task.

>         	        return;
> 	}
> 
> 	/*
> 	 * The root is invalid, and its reference count has reached
> 	 * zero.  It must have been zapped either in the "if" above or
> 	 * by someone else, and we're definitely the last thread to see
> 	 * it apart from RCU-protected page table walks.
> 	 */
> 	refcount_set(&root->tdp_mmu_root_count, 0);

Not sure what you intended here, KVM should never force a refcount to '0'.

> 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> 	list_del_rcu(&root->link);
> 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> 
> 	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
> 
> (Yay for xchg's implicit memory barriers)

xchg() is a very good idea.  The smp_mb_*() stuff was carried over from the previous
version where this sequence set another flag in addition to role.invalid.

Is this less funky (untested)?

	/*
	 * Invalidate the root to prevent it from being reused by a vCPU while
	 * the root is being zapped, i.e. to allow yielding while zapping the
	 * root (see below).
	 *
	 * Free the root if it's already invalid.  Invalid roots must be zapped
	 * before their last reference is put, i.e. there's no work to be done,
	 * and all roots must be invalidated before they're freed (this code).
	 * Re-zapping invalid roots would put KVM into an infinite loop.
	 *
	 * Note, xchg() provides an implicit barrier to ensure role.invalid is
	 * visible if a concurrent reader acquires a reference after the root's
	 * refcount is reset.
	 */
	if (xchg(&root->role.invalid, true)) {
		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
		list_del_rcu(&root->link);
		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);

		call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
		return;
	}
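
As a rough userspace illustration of why a single exchange can replace the
separate store plus smp_mb_*() pair, here is a small C11 sketch. The names
struct xchg_root and model_mark_invalid() are hypothetical, and
atomic_exchange() with its default sequentially consistent ordering is only an
approximation of the kernel's fully ordered xchg(), not the same primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the root; only the invalid flag is modeled. */
struct xchg_root {
	atomic_bool invalid;
};

/*
 * Mark the root invalid and report whether it was already invalid.
 *
 * The exchange is a single atomic read-modify-write, so exactly one caller
 * observes "false" and becomes responsible for zapping; every later caller
 * observes "true" and takes the free path.  The default memory order is
 * seq_cst, so no extra barrier is written here.
 */
static bool model_mark_invalid(struct xchg_root *root)
{
	return atomic_exchange(&root->invalid, true);
}
```

A caller would branch on the return value the same way the snippet above
branches on xchg(): true means the root was already invalid and can be freed,
false means this caller just invalidated it and still has to zap.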


