public inbox for kvm@vger.kernel.org
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com, avi.kivity@gmail.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v3 12/15] KVM: MMU: fast invalid all shadow pages
Date: Thu, 18 Apr 2013 23:20:57 +0800	[thread overview]
Message-ID: <51700F59.6080707@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130418132924.GB29705@amt.cnet>

On 04/18/2013 09:29 PM, Marcelo Tosatti wrote:
> On Thu, Apr 18, 2013 at 10:03:06AM -0300, Marcelo Tosatti wrote:
>> On Thu, Apr 18, 2013 at 12:00:16PM +0800, Xiao Guangrong wrote:
>>>>
>>>> What is the justification for this? 
>>>
>>> We want the rmap of the memslot being deleted to be remove-only; that is
>>> needed for unmapping the rmap outside of mmu-lock.
>>>
>>> ======
>>> 1) do not corrupt the rmap
>>> 2) keep pte-list-descs available
>>> 3) keep shadow page available
>>>
>>> Resolve 1):
>>> we make the invalid rmap remove-only, which means we only delete and
>>> clear sptes from the rmap; no new sptes can be added to it.
>>> This is reasonable since kvm cannot do address translation on an invalid
>>> rmap (gfn_to_pfn fails on an invalid memslot) and the sptes on an invalid
>>> rmap cannot be reused (they belong to an invalid shadow page).
>>> ======
>>>
>>> clear_flush_young / test_young / change_pte of mmu-notify can rewrite the
>>> rmap with a present spte (P bit set), so we should unmap the rmap in
>>> these handlers.
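To make the remove-only property concrete, here is a minimal userspace model
(the structure and function names are illustrative stand-ins, not KVM's real
pte-list code): once the memslot is marked invalid, adding an spte to its rmap
fails, while deleting and clearing existing entries remains possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a remove-only rmap: a singly linked list of
 * sptes plus an "invalid" flag set when the memslot is being deleted. */
struct rmap_entry {
	unsigned long spte;
	struct rmap_entry *next;
};

struct rmap {
	struct rmap_entry *head;
	bool invalid;		/* set when the memslot is going away */
};

/* Adding fails on an invalid rmap: in real KVM, gfn_to_pfn fails on an
 * invalid memslot, so no new translation can reach this list anyway. */
static bool rmap_add(struct rmap *r, unsigned long spte)
{
	struct rmap_entry *e;

	if (r->invalid)
		return false;

	e = malloc(sizeof(*e));
	e->spte = spte;
	e->next = r->head;
	r->head = e;
	return true;
}

/* Removal stays legal in both states, so vcpus and mmu-notify handlers
 * can keep clearing sptes while the slot is torn down. */
static bool rmap_remove(struct rmap *r, unsigned long spte)
{
	struct rmap_entry **p = &r->head;

	while (*p) {
		if ((*p)->spte == spte) {
			struct rmap_entry *dead = *p;
			*p = dead->next;
			free(dead);
			return true;
		}
		p = &(*p)->next;
	}
	return false;
}
```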
>>>
>>>>
>>>>> +
>>>>> +	/*
>>>>> +	 * To ensure that all vcpus and mmu-notify are not clearing
>>>>> +	 * spte and rmap entry.
>>>>> +	 */
>>>>> +	synchronize_srcu_expedited(&kvm->srcu);
>>>>> +}
>>>>> +
>>>>>  #ifdef MMU_DEBUG
>>>>>  static int is_empty_shadow_page(u64 *spt)
>>>>>  {
>>>>> @@ -2219,6 +2283,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
>>>>>  	__clear_sp_write_flooding_count(sp);
>>>>>  }
>>>>>  
>>>>> +static bool is_valid_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>>>>> +{
>>>>> +	return likely(sp->mmu_valid_gen == kvm->arch.mmu_valid_gen);
>>>>> +}
>>>>> +
>>>>>  static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>  					     gfn_t gfn,
>>>>>  					     gva_t gaddr,
>>>>> @@ -2245,6 +2314,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>  		role.quadrant = quadrant;
>>>>>  	}
>>>>>  	for_each_gfn_sp(vcpu->kvm, sp, gfn) {
>>>>> +		if (!is_valid_sp(vcpu->kvm, sp))
>>>>> +			continue;
>>>>> +
>>>>>  		if (!need_sync && sp->unsync)
>>>>>  			need_sync = true;
>>>>>  
>>>>> @@ -2281,6 +2353,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>  
>>>>>  		account_shadowed(vcpu->kvm, gfn);
>>>>>  	}
>>>>> +	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
>>>>>  	init_shadow_page_table(sp);
>>>>>  	trace_kvm_mmu_get_page(sp, true);
>>>>>  	return sp;
>>>>> @@ -2451,8 +2524,12 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>>>>  	ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>>>>>  	kvm_mmu_page_unlink_children(kvm, sp);
>>>>
>>>> The rmaps[] arrays linked to !is_valid_sp() shadow pages should not be
>>>> accessed (as they have been freed already).
>>>>
>>>> I suppose the is_valid_sp() conditional below should be moved earlier,
>>>> before kvm_mmu_unlink_parents or any other rmap access.
>>>>
>>>> This is fine: the !is_valid_sp() shadow pages are only reachable
>>>> by SLAB and the hypervisor itself.
>>>
>>> Unfortunately we can not do this. :(
>>>
>>> The sptes in a shadow page can be linked into many slots' rmaps. If an
>>> spte is linked into the rmap of the memslot being deleted, that is ok;
>>> otherwise, the rmap of a still-used memslot misses the update.
>>>
>>> For example, slot 0 is being deleted, sp->spte[0] is linked on slot[0].rmap,
>>> and sp->spte[1] is linked on slot[1].rmap. If we do not access the rmaps of
>>> this 'sp', the already-freed spte[1] stays linked on slot[1].rmap.
>>>
>>> We could let kvm update the rmap for sp->spte[1] and not unlink sp->spte[0].
>>> That is not allowed either, since mmu-notify can access the invalid rmap
>>> before the memslot is destroyed: mmu-notify would then find an already-freed
>>> spte on the rmap, or page Accessed/Dirty state would be mis-tracked (if
>>> mmu-notify were made to skip the invalid rmap).
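A tiny userspace model of that cross-slot situation (the structures are
simplified stand-ins, not KVM's real ones, and the single-entry "rmap" is
purely illustrative): zapping a shadow page must unlink its sptes from every
slot's rmap before the page memory is reused, because different sptes of the
same page can sit on different slots' rmaps.

```c
#include <assert.h>
#include <stddef.h>

/* One shadow page owning two sptes, each linked into a different
 * memslot's rmap.  If the page were freed without walking both rmaps,
 * the surviving slot would keep a dangling pointer into freed memory. */
struct shadow_page {
	unsigned long spte[2];
};

struct slot {
	unsigned long *rmap;	/* single-entry rmap, for the sketch only */
};

static void slot_link(struct slot *s, unsigned long *sptep)
{
	s->rmap = sptep;
}

/* Unlink every spte belonging to 'sp' from every slot's rmap; only
 * after this walk is it safe to free the shadow page. */
static void unlink_sp(struct shadow_page *sp, struct slot *slots, int n)
{
	for (int i = 0; i < n; i++)
		if (slots[i].rmap >= sp->spte && slots[i].rmap < sp->spte + 2)
			slots[i].rmap = NULL;
}
```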
>>
>> Why not release all rmaps?
>>
>> Subject: [PATCH v2 3/7] KVM: x86: introduce kvm_clear_all_gfn_page_info
>>
>> This function is used to reset the rmaps and page info of all guest page
>> which will be used in later patch
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> 
> Which you have later in patchset.

The patch you mentioned is the old one (v2); now it only resets lpage-info,
excluding the rmap:

======
[PATCH v3 11/15] KVM: MMU: introduce kvm_clear_all_lpage_info

This function is used to reset the large page info of all guest page
which will be used in later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
======

We can not release all the rmaps. If we did, ->invalidate_page and
->invalidate_range_start could not find any spte using the host page;
that means Accessed/Dirty tracking for the host page would be lost
(kvm_set_pfn_accessed and kvm_set_pfn_dirty would not be called properly).

Furthermore, when we drop an invalid-gen spte, we would call
kvm_set_pfn_accessed/kvm_set_pfn_dirty for an already-freed host page, since
mmu-notify can not find the spte via the rmap.
(We could skip drop-spte for an invalid-gen sp, but A/D for the host page
would then be missed.)

That is why I introduced unmap_invalid_rmap out of mmu-lock.
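To illustrate the A/D argument (this is a simplified stand-in for what
kvm_set_pfn_accessed/kvm_set_pfn_dirty do in real KVM, with illustrative bit
positions): dropping an spte must propagate its Accessed/Dirty bits to the
host page, and the only way to reach the spte is through the rmap, so the
rmap must be walked before it is released.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative spte A/D bit positions (x86 PTE bits 5 and 6). */
#define PT_ACCESSED (1ul << 5)
#define PT_DIRTY    (1ul << 6)

struct host_page {
	bool accessed;
	bool dirty;
};

/* Dropping an spte transfers its Accessed/Dirty state to the host
 * page.  If the rmap were freed first, no path would reach the spte
 * and these bits would be silently lost. */
static void drop_spte(unsigned long spte, struct host_page *hp)
{
	if (spte & PT_ACCESSED)
		hp->accessed = true;
	if (spte & PT_DIRTY)
		hp->dirty = true;
}
```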

> 
> So, what is the justification for the zap root + generation number increase 
> to work on a per memslot basis, given that
> 
>         /*
>          * If memory slot is created, or moved, we need to clear all
>          * mmio sptes.
>          */
>         if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
>                 kvm_mmu_zap_mmio_sptes(kvm);
>                 kvm_reload_remote_mmus(kvm);
>         }
> 
> Is going to be dealt with generation number on mmio spte idea?

Yes. Actually, this patchset (v3) is based on two other patchsets:

======
This patchset is based on my previous two patchset:
[PATCH 0/2] KVM: x86: avoid potential soft lockup and unneeded mmu reload
(https://lkml.org/lkml/2013/4/1/2)

[PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes
(https://lkml.org/lkml/2013/4/1/134)
======

We did that in [PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes.

> 
> Note that at the moment all shadow pages are zapped on deletion / move,
> and there is no performance complaint for those cases.

Yes, zap-mmio is only needed for MEMSLOT_CREATE.

> 
> In fact, for what case is generation number on mmio spte optimizes for?
> The cases are where slots are deleted/moved/created on a live guest
> are:
> 
> - Legacy VGA mode operation where VGA slots are created/deleted. Zapping
> all shadow pages is not a performance issue in that case.
> - Device hotplug (not performance critical).
> - Remapping of PCI memory regions (not a performance issue).
> - Memory hotplug (not a performance issue).
> 
> These are all rare events in which there is no particular concern about
> rebuilding shadow pages to resume cruise speed operation.
> 
> So from this POV (please correct if not accurate) avoiding problems
> with huge number of shadow pages is all thats being asked for.
> 
> Which is handled nicely by zap roots + sp gen number increase.

So, can we use "zap roots + sp gen number" instead of the current zap-mmio-sp?
If yes, I totally agree with you.
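The scheme under discussion can be sketched in userspace like this (it mirrors
the is_valid_sp()/mmu_valid_gen idea from the patch above, but is a model, not
the kernel code): invalidating all shadow pages is a single O(1) generation
bump, and pages created before the bump fail the validity check and can be
skipped or reclaimed lazily.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the per-VM and per-shadow-page generation numbers. */
struct kvm_arch {
	unsigned long mmu_valid_gen;
};

struct kvm_mmu_page {
	unsigned long mmu_valid_gen;	/* copied from kvm at creation */
};

/* A page is valid only while its generation matches the VM's. */
static bool is_valid_sp(struct kvm_arch *a, struct kvm_mmu_page *sp)
{
	return sp->mmu_valid_gen == a->mmu_valid_gen;
}

/* "Zap all": just bump the generation; no shadow page is touched. */
static void kvm_mmu_invalidate_all_pages(struct kvm_arch *a)
{
	a->mmu_valid_gen++;
}
```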

