All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH] KVM: x86/mmu: Zap only TDP MMU leafs in zap range and mmu_notifier unmap
Date: Mon, 28 Mar 2022 10:40:17 +0200	[thread overview]
Message-ID: <87lewuo4ge.fsf@redhat.com> (raw)
In-Reply-To: <20220325230348.2587437-1-seanjc@google.com>

Sean Christopherson <seanjc@google.com> writes:

> Re-introduce zapping only leaf SPTEs in kvm_zap_gfn_range() and
> kvm_tdp_mmu_unmap_gfn_range(), this time without losing a pending TLB
> flush when processing multiple roots (including nested TDP shadow roots).
> Dropping the TLB flush resulted in random crashes when running Hyper-V
> Server 2019 in a guest with KSM enabled in the host (or any source of
> mmu_notifier invalidations, KSM is just the easiest to force).
>
> This effectively revert commits 873dd122172f8cce329113cfb0dfe3d2344d80c0
> and fcb93eb6d09dd302cbef22bd95a5858af75e4156, and thus restores commit
> cf3e26427c08ad9015956293ab389004ac6a338e, plus this delta on top:
>
> bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
>         struct kvm_mmu_page *root;
>
>         for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
> -               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false);
> +               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
>
>         return flush;
>  }
>

I confirm this fixes the issue I was seeing, thanks!

Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>

> Cc: Ben Gardon <bgardon@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c     |  4 +--
>  arch/x86/kvm/mmu/tdp_mmu.c | 57 +++++++++++---------------------------
>  arch/x86/kvm/mmu/tdp_mmu.h |  8 +-----
>  3 files changed, 19 insertions(+), 50 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1361eb4599b4..a7cb877f3784 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5842,8 +5842,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>  
>  	if (is_tdp_mmu_enabled(kvm)) {
>  		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
> -			flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
> -							  gfn_end, flush);
> +			flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start,
> +						      gfn_end, true, flush);
>  	}
>  
>  	if (flush)
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index b3b6426725d4..c4333efb9e9c 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -906,10 +906,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>  }
>  
>  /*
> - * Tears down the mappings for the range of gfns, [start, end), and frees the
> - * non-root pages mapping GFNs strictly within that range. Returns true if
> - * SPTEs have been cleared and a TLB flush is needed before releasing the
> - * MMU lock.
> + * Zap leafs SPTEs for the range of gfns, [start, end). Returns true if SPTEs
> + * have been cleared and a TLB flush is needed before releasing the MMU lock.
>   *
>   * If can_yield is true, will release the MMU lock and reschedule if the
>   * scheduler needs the CPU or there is contention on the MMU lock. If this
> @@ -917,44 +915,25 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>   * the caller must ensure it does not supply too large a GFN range, or the
>   * operation can cause a soft lockup.
>   */
> -static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> -			  gfn_t start, gfn_t end, bool can_yield, bool flush)
> +static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> +			      gfn_t start, gfn_t end, bool can_yield, bool flush)
>  {
> -	bool zap_all = (start == 0 && end >= tdp_mmu_max_gfn_host());
>  	struct tdp_iter iter;
>  
> -	/*
> -	 * No need to try to step down in the iterator when zapping all SPTEs,
> -	 * zapping the top-level non-leaf SPTEs will recurse on their children.
> -	 * Do not do it above the 1GB level, to avoid making tdp_mmu_set_spte's
> -	 * recursion too expensive and allow yielding.
> -	 */
> -	int min_level = zap_all ? PG_LEVEL_1G : PG_LEVEL_4K;
> -
>  	end = min(end, tdp_mmu_max_gfn_host());
>  
>  	lockdep_assert_held_write(&kvm->mmu_lock);
>  
>  	rcu_read_lock();
>  
> -	for_each_tdp_pte_min_level(iter, root, min_level, start, end) {
> +	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
>  		if (can_yield &&
>  		    tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
>  			flush = false;
>  			continue;
>  		}
>  
> -		if (!is_shadow_present_pte(iter.old_spte))
> -			continue;
> -
> -		/*
> -		 * If this is a non-last-level SPTE that covers a larger range
> -		 * than should be zapped, continue, and zap the mappings at a
> -		 * lower level, except when zapping all SPTEs.
> -		 */
> -		if (!zap_all &&
> -		    (iter.gfn < start ||
> -		     iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) &&
> +		if (!is_shadow_present_pte(iter.old_spte) ||
>  		    !is_last_spte(iter.old_spte, iter.level))
>  			continue;
>  
> @@ -962,17 +941,13 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>  		flush = true;
>  	}
>  
> +	rcu_read_unlock();
> +
>  	/*
> -	 * Need to flush before releasing RCU.  TODO: do it only if intermediate
> -	 * page tables were zapped; there is no need to flush under RCU protection
> -	 * if no 'struct kvm_mmu_page' is freed.
> +	 * Because this flow zaps _only_ leaf SPTEs, the caller doesn't need
> +	 * to provide RCU protection as no 'struct kvm_mmu_page' will be freed.
>  	 */
> -	if (flush)
> -		kvm_flush_remote_tlbs_with_address(kvm, start, end - start);
> -
> -	rcu_read_unlock();
> -
> -	return false;
> +	return flush;
>  }
>  
>  /*
> @@ -981,13 +956,13 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>   * SPTEs have been cleared and a TLB flush is needed before releasing the
>   * MMU lock.
>   */
> -bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
> -				 gfn_t end, bool can_yield, bool flush)
> +bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
> +			   bool can_yield, bool flush)
>  {
>  	struct kvm_mmu_page *root;
>  
>  	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
> -		flush = zap_gfn_range(kvm, root, start, end, can_yield, flush);
> +		flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
>  
>  	return flush;
>  }
> @@ -1235,8 +1210,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
>  				 bool flush)
>  {
> -	return __kvm_tdp_mmu_zap_gfn_range(kvm, range->slot->as_id, range->start,
> -					   range->end, range->may_block, flush);
> +	return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start,
> +				     range->end, range->may_block, flush);
>  }
>  
>  typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 5e5ef2576c81..54bc8118c40a 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -15,14 +15,8 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
>  void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>  			  bool shared);
>  
> -bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
> +bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start,
>  				 gfn_t end, bool can_yield, bool flush);
> -static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
> -					     gfn_t start, gfn_t end, bool flush)
> -{
> -	return __kvm_tdp_mmu_zap_gfn_range(kvm, as_id, start, end, true, flush);
> -}
> -
>  bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
>  void kvm_tdp_mmu_zap_all(struct kvm *kvm);
>  void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
>
> base-commit: 19164ad08bf668bca4f4bfbaacaa0a47c1b737a6

-- 
Vitaly


  reply	other threads:[~2022-03-28  8:40 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-25 23:03 [PATCH] KVM: x86/mmu: Zap only TDP MMU leafs in zap range and mmu_notifier unmap Sean Christopherson
2022-03-28  8:40 ` Vitaly Kuznetsov [this message]
2022-03-28 14:52   ` Sean Christopherson
2022-03-29 17:03 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lewuo4ge.fsf@redhat.com \
    --to=vkuznets@redhat.com \
    --cc=bgardon@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.