Re: [PATCH 6/6] KVM: MMU: fast zap all shadow pages

kvm.vger.kernel.org archive mirror

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Gleb Natapov <gleb@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH 6/6] KVM: MMU: fast zap all shadow pages
Date: Mon, 18 Mar 2013 17:46:01 -0300
Message-ID: <20130318204601.GA16208@amt.cnet>
In-Reply-To: <514007A0.1040400@linux.vnet.ibm.com>

On Wed, Mar 13, 2013 at 12:59:12PM +0800, Xiao Guangrong wrote:
> The current kvm_mmu_zap_all is really slow: it holds mmu-lock while it
> walks and zaps all shadow pages one by one, and it also needs to zap every
> guest page's rmap and every shadow page's parent spte list. Things get
> worse as the guest uses more memory or more vcpus, which hurts
> scalability.
> 
> Since all shadow pages will be zapped, we can zap the mmu-cache and rmap
> directly so that vcpus will fault on the new mmu-cache; after that, the
> memory used by the old mmu-cache can be freed directly.
> 
> The root shadow pages are a special case: they are currently in use by
> vcpus, so we cannot free them directly. Instead, we zap the root shadow
> pages and re-add them to the new mmu-cache.
> 
> After this patch, kvm_mmu_zap_all can be 113% faster than before.
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 56 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e326099..536d9ce 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4186,18 +4186,68 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
> 
>  void kvm_mmu_zap_all(struct kvm *kvm)
>  {
> -	struct kvm_mmu_page *sp, *node;
> +	LIST_HEAD(root_mmu_pages);
>  	LIST_HEAD(invalid_list);
> +	struct list_head pte_list_descs;
> +	struct kvm_mmu_cache *cache = &kvm->arch.mmu_cache;
> +	struct kvm_mmu_page *sp, *node;
> +	struct pte_list_desc *desc, *ndesc;
> +	int root_sp = 0;
> 
>  	spin_lock(&kvm->mmu_lock);
> +
>  restart:
> -	list_for_each_entry_safe(sp, node,
> -	      &kvm->arch.mmu_cache.active_mmu_pages, link)
> -		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> -			goto restart;
> +	/*
> +	 * The root shadow pages are in use by vcpus and cannot be
> +	 * removed directly, so we filter them out and re-add them to
> +	 * the new mmu cache.
> +	 */
> +	list_for_each_entry_safe(sp, node, &cache->active_mmu_pages, link)
> +		if (sp->root_count) {
> +			int ret;
> +
> +			root_sp++;
> +			ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> +			list_move(&sp->link, &root_mmu_pages);
> +			if (ret)
> +				goto restart;
> +		}
> +
> +	list_splice(&cache->active_mmu_pages, &invalid_list);
> +	list_replace(&cache->pte_list_descs, &pte_list_descs);
> +
> +	/*
> +	 * Reset the mmu cache so that vcpus will later fault on the new
> +	 * mmu cache.
> +	 */
> +	memset(cache, 0, sizeof(*cache));
> +	kvm_mmu_init(kvm);

Xiao,

I suppose zeroing kvm_mmu_cache can be avoided if the links are removed
in prepare_zap_page. So perhaps:

- spin_lock(mmu_lock)
- for each page
	- zero sp->spt[], remove page from linked lists
- flush remote TLB (batched)
- spin_unlock(mmu_lock)
- free data (which is safe because freeing has its own serialization)
- spin_lock(mmu_lock)
- account for the pages freed
- spin_unlock(mmu_lock)

(or if you think of some other way to not have the mmu_cache zeroing step).

Note that the "account for pages freed" step comes after the pages are
actually freed. As discussed with Takuya, letting page freeing and
freed-page accounting get out of sync across mmu_lock is potentially
problematic: kvm->arch.n_used_mmu_pages and friends would no longer
reflect reality, which can cause problems for SLAB freeing and page
allocation throttling.


Thread overview: 15+ messages
2013-03-13  4:55 [PATCH 0/6] KVM: MMU: fast zap all shadow pages Xiao Guangrong
2013-03-13  4:55 ` [PATCH 1/6] KVM: MMU: move mmu related members into a separate struct Xiao Guangrong
2013-03-13  4:56 ` [PATCH 2/6] KVM: MMU: introduce mmu_cache->pte_list_descs Xiao Guangrong
2013-03-13  4:57 ` [PATCH 3/6] KVM: x86: introduce memslot_set_lpage_disallowed Xiao Guangrong
2013-03-13  4:57 ` [PATCH 4/6] KVM: x86: introduce kvm_clear_all_gfn_page_info Xiao Guangrong
2013-03-13  4:58 ` [PATCH 5/6] KVM: MMU: delete shadow page from hash list in kvm_mmu_prepare_zap_page Xiao Guangrong
2013-03-13  4:59 ` [PATCH 6/6] KVM: MMU: fast zap all shadow pages Xiao Guangrong
2013-03-14  1:07   ` Marcelo Tosatti
2013-03-14  1:35     ` Marcelo Tosatti
2013-03-14  4:42       ` Xiao Guangrong
2013-03-18 20:46   ` Marcelo Tosatti [this message]
2013-03-19  3:06     ` Xiao Guangrong
2013-03-19 14:40       ` Marcelo Tosatti
2013-03-19 15:37         ` Xiao Guangrong
2013-03-19 22:37           ` Marcelo Tosatti
