Re: [PATCH 6/6] KVM: MMU: fast zap all shadow pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH 6/6] KVM: MMU: fast zap all shadow pages
Date: Tue, 19 Mar 2013 11:06:35 +0800	[thread overview]
Message-ID: <5147D63B.4000400@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130318204601.GA16208@amt.cnet>

On 03/19/2013 04:46 AM, Marcelo Tosatti wrote:
> On Wed, Mar 13, 2013 at 12:59:12PM +0800, Xiao Guangrong wrote:
>> The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
>> walk and zap all shadow pages one by one, also it need to zap all guest
>> page's rmap and all shadow page's parent spte list. Particularly, things
>> become worse if guest uses more memory or vcpus. It is not good for
>> scalability.
>>
>> Since all shadow page will be zapped, we can directly zap the mmu-cache
>> and rmap so that vcpu will fault on the new mmu-cache, after that, we can
>> directly free the memory used by old mmu-cache.
>>
>> The root shadow page is little especial since they are currently used by
>> vcpus, we can not directly free them. So, we zap the root shadow pages and
>> re-add them into the new mmu-cache.
>>
>> After this patch, kvm_mmu_zap_all can be faster 113% than before
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++-----
>>  1 files changed, 56 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index e326099..536d9ce 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -4186,18 +4186,68 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
>>
>>  void kvm_mmu_zap_all(struct kvm *kvm)
>>  {
>> -	struct kvm_mmu_page *sp, *node;
>> +	LIST_HEAD(root_mmu_pages);
>>  	LIST_HEAD(invalid_list);
>> +	struct list_head pte_list_descs;
>> +	struct kvm_mmu_cache *cache = &kvm->arch.mmu_cache;
>> +	struct kvm_mmu_page *sp, *node;
>> +	struct pte_list_desc *desc, *ndesc;
>> +	int root_sp = 0;
>>
>>  	spin_lock(&kvm->mmu_lock);
>> +
>>  restart:
>> -	list_for_each_entry_safe(sp, node,
>> -	      &kvm->arch.mmu_cache.active_mmu_pages, link)
>> -		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
>> -			goto restart;
>> +	/*
>> +	 * The root shadow pages are being used on vcpus that can not
>> +	 * directly removed, we filter them out and re-add them to the
>> +	 * new mmu cache.
>> +	 */
>> +	list_for_each_entry_safe(sp, node, &cache->active_mmu_pages, link)
>> +		if (sp->root_count) {
>> +			int ret;
>> +
>> +			root_sp++;
>> +			ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
>> +			list_move(&sp->link, &root_mmu_pages);
>> +			if (ret)
>> +				goto restart;
>> +		}
>> +
>> +	list_splice(&cache->active_mmu_pages, &invalid_list);
>> +	list_replace(&cache->pte_list_descs, &pte_list_descs);
>> +
>> +	/*
>> +	 * Reset the mmu cache so that later vcpu will fault on the new
>> +	 * mmu cache.
>> +	 */
>> +	memset(cache, 0, sizeof(*cache));
>> +	kvm_mmu_init(kvm);
> 
> Xiao,
> 
> I suppose zeroing of kvm_mmu_cache can be avoided, if the links are
> removed at prepare_zap_page. So perhaps

The purpose of zeroing of kvm_mmu_cache is resetting the hashtable and
some count numbers.
[.n_request_mmu_pages and .n_max_mmu_pages should not be changed, i will
fix this].

> 
> - spin_lock(mmu_lock)
> - for each page
> 	- zero sp->spt[], remove page from linked lists

sizeof(mmu_cache) is:
(1 << 10) * sizeof (hlist_head) + 4 * sizeof(unsigned int) = 2^13 + 16
and it is constant. In your way, for every sp, we need to zap:
512 entries + a hash-node = 2^12 + 8
especially the workload depends on the size of guest memory.
Why you think this way is better?

> - flush remote TLB (batched)
> - spin_unlock(mmu_lock)
> - free data (which is safe because freeing has its own serialization)

We should free the root sp in mmu-lock like my patch.

> - spin_lock(mmu_lock)
> - account for the pages freed
> - spin_unlock(mmu_lock)

The count numbers are still inconsistent if other thread hold mmu-lock between
zero shadow page and recount.

Marcelo, i really confused what is the benefit in this way but i might
completely misunderstand it.

next prev parent reply	other threads:[~2013-03-19  3:06 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-13  4:55 [PATCH 0/6] KVM: MMU: fast zap all shadow pages Xiao Guangrong
2013-03-13  4:55 ` [PATCH 1/6] KVM: MMU: move mmu related members into a separate struct Xiao Guangrong
2013-03-13  4:56 ` [PATCH 2/6] KVM: MMU: introduce mmu_cache->pte_list_descs Xiao Guangrong
2013-03-13  4:57 ` [PATCH 3/6] KVM: x86: introduce memslot_set_lpage_disallowed Xiao Guangrong
2013-03-13  4:57 ` [PATCH 4/6] KVM: x86: introduce kvm_clear_all_gfn_page_info Xiao Guangrong
2013-03-13  4:58 ` [PATCH 5/6] KVM: MMU: delete shadow page from hash list in kvm_mmu_prepare_zap_page Xiao Guangrong
2013-03-13  4:59 ` [PATCH 6/6] KVM: MMU: fast zap all shadow pages Xiao Guangrong
2013-03-14  1:07   ` Marcelo Tosatti
2013-03-14  1:35     ` Marcelo Tosatti
2013-03-14  4:42       ` Xiao Guangrong
2013-03-18 20:46   ` Marcelo Tosatti
2013-03-19  3:06     ` Xiao Guangrong [this message]
2013-03-19 14:40       ` Marcelo Tosatti
2013-03-19 15:37         ` Xiao Guangrong
2013-03-19 22:37           ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5147D63B.4000400@linux.vnet.ibm.com \
    --to=xiaoguangrong@linux.vnet.ibm.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.