From: Gleb Natapov <gleb@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
avi.kivity@gmail.com, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, takuya.yoshikawa@gmail.com
Subject: Re: [PATCH v4 4/6] KVM: MMU: fast invalid all shadow pages
Date: Tue, 7 May 2013 17:56:08 +0300
Message-ID: <20130507145608.GB7899@redhat.com>
In-Reply-To: <20130507143329.GB6408@amt.cnet>
On Tue, May 07, 2013 at 11:33:29AM -0300, Marcelo Tosatti wrote:
> On Tue, May 07, 2013 at 01:00:51PM +0300, Gleb Natapov wrote:
> > On Tue, May 07, 2013 at 05:41:35PM +0800, Xiao Guangrong wrote:
> > > On 05/07/2013 04:58 PM, Gleb Natapov wrote:
> > > > On Tue, May 07, 2013 at 01:45:52AM +0800, Xiao Guangrong wrote:
> > > >> On 05/07/2013 01:24 AM, Gleb Natapov wrote:
> > > >>> On Mon, May 06, 2013 at 09:10:11PM +0800, Xiao Guangrong wrote:
> > > >>>> On 05/06/2013 08:36 PM, Gleb Natapov wrote:
> > > >>>>
> > > >>>>>>> Step 1) Fix kvm_mmu_zap_all's behaviour: introduce lockbreak via
> > > >>>>>>> spin_needbreak. Use generation numbers so that in case kvm_mmu_zap_all
> > > >>>>>>> releases mmu_lock and reacquires it again, only shadow pages
> > > >>>>>>> from the generation with which kvm_mmu_zap_all started are zapped (this
> > > >>>>>>> guarantees forward progress and eventual termination).
> > > >>>>>>>
> > > >>>>>>> kvm_mmu_zap_generation()
> > > >>>>>>>     spin_lock(mmu_lock)
> > > >>>>>>>     int generation = kvm->arch.mmu_generation;
> > > >>>>>>>
> > > >>>>>>>     for_each_shadow_page(sp) {
> > > >>>>>>>         if (sp->generation == generation)
> > > >>>>>>>             zap_page(sp)
> > > >>>>>>>         if (spin_needbreak(mmu_lock)) {
> > > >>>>>>>             kvm->arch.mmu_generation++;
> > > >>>>>>>             cond_resched_lock(mmu_lock);
> > > >>>>>>>         }
> > > >>>>>>>     }
> > > >>>>>>>
> > > >>>>>>> kvm_mmu_zap_all()
> > > >>>>>>>     spin_lock(mmu_lock)
> > > >>>>>>>     for_each_shadow_page(sp) {
> > > >>>>>>>         zap_page(sp)
> > > >>>>>>>         if (spin_needbreak(mmu_lock)) {
> > > >>>>>>>             cond_resched_lock(mmu_lock);
> > > >>>>>>>         }
> > > >>>>>>>     }
> > > >>>>>>>
> > > >>>>>>> Use kvm_mmu_zap_generation for kvm_arch_flush_shadow_memslot.
> > > >>>>>>> Use kvm_mmu_zap_all for kvm_mmu_notifier_release, kvm_destroy_vm.
> > > >>>>>>>
> > > >>>>>>> This addresses the main problem: excessively long hold times
> > > >>>>>>> of kvm_mmu_zap_all with very large guests.
> > > >>>>>>>
> > > >>>>>>> Do you see any problem with this logic? This was what i was thinking
> > > >>>>>>> we agreed.
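
(To make the proposal concrete, here is a minimal C sketch of that lock-break
loop; the kvm->arch.mmu_generation and sp->generation fields are hypothetical,
and the existing prepare/commit zap pair stands in for zap_page():)

static void kvm_mmu_zap_generation(struct kvm *kvm)
{
        struct kvm_mmu_page *sp, *node;
        LIST_HEAD(invalid_list);
        int generation;

        spin_lock(&kvm->mmu_lock);
        generation = kvm->arch.mmu_generation;
restart:
        list_for_each_entry_safe(sp, node,
                                 &kvm->arch.active_mmu_pages, link) {
                /* Only zap pages from the generation we started with. */
                if (sp->generation == generation &&
                    kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
                        goto restart;

                if (spin_needbreak(&kvm->mmu_lock)) {
                        /* Pages created from here on are newer. */
                        kvm->arch.mmu_generation++;
                        kvm_mmu_commit_zap_page(kvm, &invalid_list);
                        cond_resched_lock(&kvm->mmu_lock);
                        goto restart;
                }
        }
        kvm_mmu_commit_zap_page(kvm, &invalid_list);
        spin_unlock(&kvm->mmu_lock);
}

kvm_mmu_zap_all() would then be the same loop without the generation check.
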
> > > >>>>>>
> > > >>>>>> No. I understand it and it can work.
> > > >>>>>>
> > > >>>>>> Actually, it is similar to Gleb's idea of "zapping stale shadow pages
> > > >>>>>> (using the lock break technique)"; after some discussion, we thought
> > > >>>>>> "only zap the shadow pages that are reachable from the slot's rmap"
> > > >>>>>> was better, which is what this patchset does.
> > > >>>>>> (https://lkml.org/lkml/2013/4/23/73)
> > > >>>>>>
> > > >>>>> But this is not what the patch is doing. Close, but not the same :)
> > > >>>>
> > > >>>> Okay. :)
> > > >>>>
> > > >>>>> Instead of zapping the shadow pages reachable from the slot's rmap, the
> > > >>>>> patch does kvm_unmap_rmapp(), which drops all sptes without zapping shadow pages.
> > > >>>>> That is why you need special code to re-init lpage_info. What I proposed
> > > >>>>> was to call zap_page() on all shadow pages reachable from rmap. This
> > > >>>>> will take care of lpage_info counters. Does this make sense?
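
(A rough sketch of that, walking only the 4K rmap for brevity; page_header()
recovers the shadow page that owns a given spte, though, as the example below
shows, this misses pages that have no last-level mappings:)

static void zap_slot_rmap_pages(struct kvm *kvm,
                                struct kvm_memory_slot *slot)
{
        LIST_HEAD(invalid_list);
        gfn_t gfn;

        for (gfn = slot->base_gfn;
             gfn < slot->base_gfn + slot->npages; gfn++) {
                unsigned long *rmapp =
                        __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
                struct rmap_iterator iter;
                u64 *sptep;

                /*
                 * Zapping an sp unlinks its sptes from the rmap,
                 * so always restart from the head of the chain.
                 */
                while ((sptep = rmap_get_first(*rmapp, &iter))) {
                        struct kvm_mmu_page *sp = page_header(__pa(sptep));

                        kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
                }
        }
        kvm_mmu_commit_zap_page(kvm, &invalid_list);
}
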
> > > >>>>
> > > >>>> Unfortunately, no! We still need to take care of lpage_info. lpage_info
> > > >>>> is used to count the number of guest page tables in the memslot.
> > > >>>>
> > > >>>> For example, there is a memslot:
> > > >>>> memslot[0].base_gfn = 0, memslot[0].npages = 100,
> > > >>>>
> > > >>>> and there is a shadow page:
> > > >>>> sp->role.direct = 0, sp->role.level = 4, sp->gfn = 10.
> > > >>>>
> > > >>>> This sp is counted in memslot[0], but it cannot be found by walking
> > > >>>> memslot[0]->rmap, since there is no last-level mapping in this shadow page.
> > > >>>>
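
(For reference, this is roughly what that counting looks like; a slightly
abridged account_shadowed() from mmu.c of that era, where a shadow page for a
guest page table bumps write_count for every large-page range containing its
gfn:)

static void account_shadowed(struct kvm *kvm, gfn_t gfn)
{
        struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
        struct kvm_lpage_info *linfo;
        int i;

        /* Forbid large mappings in every range that contains gfn. */
        for (i = PT_DIRECTORY_LEVEL;
             i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
                linfo = lpage_info_slot(gfn, slot, i);
                linfo->write_count += 1;
        }
        kvm->arch.indirect_shadow_pages++;
}
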
> > > >>> Right, so what about walking mmu_page_hash for each gfn belonging to
> > > >>> the slot that is in the process of being removed, to find those?
> > > >>
> > > >> That will cost lots of time. The size of the hash table is 1 << 10. If
> > > >> the memslot has 4M of memory, it will walk all the entries; the cost is
> > > >> the same as walking the active_list (maybe a little more). And a memslot
> > > >> with 4M of memory is the normal case, I think.
> > > >>
> > > > Memslots will be much bigger with memory hotplug. Lock break should be
> > > > used while walking mmu_page_hash, obviously, but iterating over the
> > > > entire memslot gfn space to find the few gfns that may be there is still
> > > > suboptimal. We can keep a list of them in the memslot itself.
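
(To spell out what I have in mind there, a sketch; the slot_link and
shadow_pages fields are invented here, as is the helper name:)

/* New: every sp that is counted in a slot's lpage_info is also linked
 * into that slot, so it can be found without scanning the whole gfn
 * space or the hash table. */
struct kvm_mmu_page {
        /* existing fields elided */
        struct list_head slot_link;
};

struct kvm_arch_memory_slot {
        /* existing lpage_info fields elided */
        struct list_head shadow_pages;
};

/* Call this wherever the sp is accounted into lpage_info. */
static void track_shadowed_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
{
        struct kvm_memory_slot *slot = gfn_to_memslot(kvm, sp->gfn);

        list_add(&sp->slot_link, &slot->arch.shadow_pages);
}
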
> > >
> > > It sounds good to me.
> > >
> > > BTW, this approach looks more complex and uses more memory (a new
> > > list_head added to every shadow page), so why do you dislike clearing
> > > lpage_info? ;)
> > >
> > Looks a little bit hackish, but now that I see we do not have an easy way
> > to find all shadow pages counted in lpage_info, I am not entirely against
> > it. If you convince Marcelo that clearing lpage_info like that is a good
> > idea I may reconsider. I think, regardless of tracking lpage_info,
> > having a way to find all shadow pages that reference a memslot is a good
> > thing though.
> >
> > > >
> > > >> Another point is that lpage_info stops the mmu from using large pages.
> > > >> If we do not reset lpage_info, the mmu keeps using 4K pages until the
> > > >> invalid sp is zapped.
> > > >>
> > > > I do not think this is a big issue. If lpage_info prevented the use of
> > > > large pages for some memory ranges before we zapped the shadow pages,
> > > > it was probably for a reason, so a new shadow page will prevent large
> > > > pages from being created for the same memory ranges.
> > >
> > > Still worried, but I will try it if Marcelo does not have objections.
> > > Thanks a lot for your valuable suggestion, Gleb!
> > >
> > > Now, I am trying my best to catch Marcelo's idea of "zapping root
> > > pages", but......
> > >
> > Yes, I am missing what Marcelo means there too. We cannot free the
> > memslot until we unmap its rmap one way or the other.
>
> I do not understand what you are optimizing for, given the four possible
> cases we discussed at
>
> https://lkml.org/lkml/2013/4/18/280
>
We are optimizing mmu_lock hold time for all of those cases.
But you cannot just do "zap roots + sp generation number increase" on slot
deletion, because you need to transfer the accessed/dirty information from
the rmap that is going to be deleted to the actual pages before
kvm_set_memory_region() returns to its caller.
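
That transfer is what the regular spte-dropping path already does; roughly
(the helper name is invented, the masks and pfn helpers are the existing
mmu.c ones):

static void drop_spte_transfer_ad(u64 *sptep)
{
        u64 spte = *sptep;
        pfn_t pfn = spte_to_pfn(spte);

        mmu_spte_clear_no_track(sptep);
        /* Propagate accessed/dirty bits to the struct page before
         * the rmap that referenced it goes away. */
        if (!shadow_accessed_mask || (spte & shadow_accessed_mask))
                kvm_set_pfn_accessed(pfn);
        if (!shadow_dirty_mask || (spte & shadow_dirty_mask))
                kvm_set_pfn_dirty(pfn);
}
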
> That is, why a simple for_each_all_shadow_page(zap_page) is not sufficient.
With a lock break? It is. We tried to optimize that by zapping only the
pages that reference the memslot that is going to be deleted, and zapping
all the others later when recycling old sps, but if you think this is
premature optimization I am fine with it.
--
Gleb.