Re: [PATCH v3 12/15] KVM: MMU: fast invalid all shadow pages

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com, avi.kivity@gmail.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v3 12/15] KVM: MMU: fast invalid all shadow pages
Date: Thu, 18 Apr 2013 23:20:57 +0800
Message-ID: <51700F59.6080707@linux.vnet.ibm.com>
In-Reply-To: <20130418132924.GB29705@amt.cnet>

On 04/18/2013 09:29 PM, Marcelo Tosatti wrote:
> On Thu, Apr 18, 2013 at 10:03:06AM -0300, Marcelo Tosatti wrote:
>> On Thu, Apr 18, 2013 at 12:00:16PM +0800, Xiao Guangrong wrote:
>>>>
>>>> What is the justification for this? 
>>>
>>> We want the rmap of the memslot being deleted to be remove-only; that is
>>> needed for unmapping the rmap outside of mmu-lock.
>>>
>>> ======
>>> 1) do not corrupt the rmap
>>> 2) keep pte-list-descs available
>>> 3) keep shadow page available
>>>
>>> Resolve 1):
>>> we make the invalid rmap remove-only, which means we only delete and
>>> clear sptes from the rmap; no new sptes can be added to it.
>>> This is reasonable since kvm can not do address translation on an invalid rmap
>>> (gfn_to_pfn fails on an invalid memslot) and the sptes on an invalid rmap can
>>> not be reused (they belong to invalid shadow pages).
>>> ======
>>>
>>> The clear_flush_young / test_young / change_pte handlers of mmu-notify can
>>> rewrite the rmap with a present spte (P bit set), so we should unmap the rmap
>>> in these handlers.
>>>
>>>>
>>>>> +
>>>>> +	/*
>>>>> +	 * To ensure that all vcpus and mmu-notify are not clearing
>>>>> +	 * spte and rmap entry.
>>>>> +	 */
>>>>> +	synchronize_srcu_expedited(&kvm->srcu);
>>>>> +}
>>>>> +
>>>>>  #ifdef MMU_DEBUG
>>>>>  static int is_empty_shadow_page(u64 *spt)
>>>>>  {
>>>>> @@ -2219,6 +2283,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
>>>>>  	__clear_sp_write_flooding_count(sp);
>>>>>  }
>>>>>  
>>>>> +static bool is_valid_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>>>>> +{
>>>>> +	return likely(sp->mmu_valid_gen == kvm->arch.mmu_valid_gen);
>>>>> +}
>>>>> +
>>>>>  static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>  					     gfn_t gfn,
>>>>>  					     gva_t gaddr,
>>>>> @@ -2245,6 +2314,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>  		role.quadrant = quadrant;
>>>>>  	}
>>>>>  	for_each_gfn_sp(vcpu->kvm, sp, gfn) {
>>>>> +		if (!is_valid_sp(vcpu->kvm, sp))
>>>>> +			continue;
>>>>> +
>>>>>  		if (!need_sync && sp->unsync)
>>>>>  			need_sync = true;
>>>>>  
>>>>> @@ -2281,6 +2353,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>  
>>>>>  		account_shadowed(vcpu->kvm, gfn);
>>>>>  	}
>>>>> +	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
>>>>>  	init_shadow_page_table(sp);
>>>>>  	trace_kvm_mmu_get_page(sp, true);
>>>>>  	return sp;
>>>>> @@ -2451,8 +2524,12 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>>>>  	ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>>>>>  	kvm_mmu_page_unlink_children(kvm, sp);
>>>>
>>>> The rmaps[] arrays linked to !is_valid_sp() shadow pages should not be
>>>> accessed (as they have been freed already).
>>>>
>>>> I suppose the is_valid_sp() conditional below should be moved earlier,
>>>> before kvm_mmu_unlink_parents or any other rmap access.
>>>>
>>>> This is fine: the !is_valid_sp() shadow pages are only reachable
>>>> by SLAB and the hypervisor itself.
>>>
>>> Unfortunately we can not do this. :(
>>>
>>> The sptes in a shadow page can be linked to many slots. If an spte is linked
>>> to the rmap of the memslot being deleted, that is fine; otherwise, the rmap of
>>> a memslot still in use is not updated correctly.
>>>
>>> For example, slot 0 is being deleted, sp->spte[0] is linked on slot[0].rmap,
>>> sp->spte[1] is linked on slot[1].rmap. If we do not access rmap of this 'sp',
>>> the already-freed spte[1] is still linked on slot[1].rmap.
>>>
>>> We could let kvm update the rmap for sp->spte[1] and not unlink sp->spte[0].
>>> That is not allowed either, since mmu-notify can access the invalid rmap before
>>> the memslot is destroyed: mmu-notify would then find an already-freed spte on
>>> the rmap, or the page's Accessed/Dirty state would be mistracked (if we made
>>> mmu-notify skip the invalid rmap).
>>
>> Why not release all rmaps?
>>
>> Subject: [PATCH v2 3/7] KVM: x86: introduce kvm_clear_all_gfn_page_info
>>
>> This function is used to reset the rmaps and page info of all guest page
>> which will be used in later patch
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> 
> Which you have later in patchset.

The patch you mentioned is the old version (v2). In v3 it only resets the
lpage-info and leaves the rmaps alone:

======
[PATCH v3 11/15] KVM: MMU: introduce kvm_clear_all_lpage_info

This function is used to reset the large page info of all guest page
which will be used in later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
======

We can not release all rmaps. If we did, ->invalidate_page and
->invalidate_range_start could not find any spte that uses the host page,
which means the Accessed/Dirty state of the host page would not be tracked
(kvm_set_pfn_accessed and kvm_set_pfn_dirty would not be called properly).

Furthermore, when we drop an invalid-gen spte, we would call
kvm_set_pfn_accessed/kvm_set_pfn_dirty for an already-freed host page, since
mmu-notify can not find the spte through the rmap.
(We could skip drop-spte for the invalid-gen sp, but then A/D for the host page
could be missed.)

That is why I introduced unmap_invalid_rmap, which runs outside of mmu-lock.
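
To make the Accessed/Dirty argument concrete, here is a minimal user-space
sketch. It is not kernel code: every toy_* name is hypothetical, and the
"rmap" is just a small fixed array. It only assumes what is described above,
namely that the invalidate path finds sptes through the rmap and propagates
their dirty state to the host page, so an rmap that was released wholesale
leaves the host page looking clean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RMAP_MAX 4

/* Hypothetical stand-in for a shadow pte: only the bits we care about. */
struct toy_spte {
	bool present;
	bool dirty;
};

/* Hypothetical stand-in for a reverse map: host page -> sptes mapping it. */
struct toy_rmap {
	struct toy_spte *sptes[TOY_RMAP_MAX];
	size_t count;
};

struct toy_host_page {
	bool dirty;
};

/* Link an spte to the rmap; returns false when the toy array is full. */
static bool toy_rmap_add(struct toy_rmap *rmap, struct toy_spte *spte)
{
	assert(rmap != NULL && spte != NULL);
	if (rmap->count == TOY_RMAP_MAX)
		return false;
	rmap->sptes[rmap->count++] = spte;
	return true;
}

/*
 * Models the invalidate path: walk the rmap, propagate each spte's dirty
 * bit to the host page, then clear the spte. Returns how many sptes were
 * found through the rmap.
 */
static size_t toy_invalidate_page(struct toy_rmap *rmap,
				  struct toy_host_page *page)
{
	size_t dropped = 0;

	for (size_t i = 0; i < rmap->count; i++) {
		struct toy_spte *spte = rmap->sptes[i];

		if (spte->dirty)
			page->dirty = true;
		spte->present = false;
		spte->dirty = false;
		dropped++;
	}
	rmap->count = 0;
	return dropped;
}

/* Models "release all rmaps": the links are forgotten, sptes untouched. */
static void toy_release_rmap(struct toy_rmap *rmap)
{
	rmap->count = 0;
}
```

In this model, calling toy_release_rmap() before toy_invalidate_page() leaves
a dirty spte present while the host page is still marked clean, which is the
lost-tracking case described above.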

> 
> So, what is the justification for the zap root + generation number increase 
> to work on a per memslot basis, given that
> 
>         /*
>          * If memory slot is created, or moved, we need to clear all
>          * mmio sptes.
>          */
>         if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
>                 kvm_mmu_zap_mmio_sptes(kvm);
>                 kvm_reload_remote_mmus(kvm);
>         }
> 
> Is going to be dealt with generation number on mmio spte idea?

Yes. Actually, this patchset (v3) is based on two other patchsets:

======
This patchset is based on my previous two patchset:
[PATCH 0/2] KVM: x86: avoid potential soft lockup and unneeded mmu reload
(https://lkml.org/lkml/2013/4/1/2)

[PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes
(https://lkml.org/lkml/2013/4/1/134)
======

We did that in [PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes.

> 
> Note at the moment all shadows pages are zapped on deletion / move,
> and there is no performance complaint for those cases.

Yes, zap-mmio is only needed for MEMSLOT_CREATE.

> 
> In fact, for what case is generation number on mmio spte optimizes for?
> The cases are where slots are deleted/moved/created on a live guest
> are:
> 
> - Legacy VGA mode operation where VGA slots are created/deleted. Zapping
> all shadow not a performance issue in that case.
> - Device hotplug (not performance critical).
> - Remapping of PCI memory regions (not a performance issue).
> - Memory hotplug (not a performance issue).
> 
> These are all rare events in which there is no particular concern about
> rebuilding shadow pages to resume cruise speed operation.
> 
> So from this POV (please correct if not accurate) avoiding problems
> with huge number of shadow pages is all thats being asked for.
> 
> Which is handled nicely by zap roots + sp gen number increase.

So, we can use "zap roots + sp gen number" instead of the current zap-mmio-sp?
If yes, I totally agree with you.
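
For reference, the "sp gen number" idea can be sketched in user-space C as
below. This is only an illustration under stated assumptions: the toy_* types
and functions are hypothetical stand-ins, and only the comparison in
toy_is_valid_sp() mirrors is_valid_sp() from the quoted diff. Bumping the
global generation makes every existing page stale in O(1); lookups skip stale
pages, and new pages are stamped with the current generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VM state holding the generation. */
struct toy_kvm {
	unsigned long mmu_valid_gen;
};

/* Hypothetical stand-in for a shadow page. */
struct toy_sp {
	unsigned long mmu_valid_gen;
	unsigned long gfn;
};

/* Same comparison as is_valid_sp() in the quoted diff. */
static bool toy_is_valid_sp(const struct toy_kvm *kvm, const struct toy_sp *sp)
{
	return sp->mmu_valid_gen == kvm->mmu_valid_gen;
}

/* Stamp a freshly set up page with the current generation. */
static void toy_init_sp(const struct toy_kvm *kvm, struct toy_sp *sp,
			unsigned long gfn)
{
	assert(kvm != NULL && sp != NULL);
	sp->gfn = gfn;
	sp->mmu_valid_gen = kvm->mmu_valid_gen;
}

/* Invalidate every existing page at once by bumping the generation. */
static void toy_invalidate_all(struct toy_kvm *kvm)
{
	kvm->mmu_valid_gen++;
}

/* Look up a page by gfn, skipping pages from an older generation. */
static struct toy_sp *toy_lookup(const struct toy_kvm *kvm,
				 struct toy_sp *pages, size_t n,
				 unsigned long gfn)
{
	for (size_t i = 0; i < n; i++) {
		if (!toy_is_valid_sp(kvm, &pages[i]))
			continue;
		if (pages[i].gfn == gfn)
			return &pages[i];
	}
	return NULL;
}
```

The sketch deliberately leaves out zapping the roots and reclaiming the stale
pages; it only shows why a stale page can never be picked up again by a lookup.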
