All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
Date: Wed, 17 Apr 2013 17:39:04 -0300	[thread overview]
Message-ID: <20130417203904.GA29493@amt.cnet> (raw)
In-Reply-To: <20130322191524.GY9382@redhat.com>

On Fri, Mar 22, 2013 at 09:15:24PM +0200, Gleb Natapov wrote:
> On Fri, Mar 22, 2013 at 08:37:33PM +0800, Xiao Guangrong wrote:
> > On 03/22/2013 08:12 PM, Gleb Natapov wrote:
> > > On Fri, Mar 22, 2013 at 08:03:04PM +0800, Xiao Guangrong wrote:
> > >> On 03/22/2013 07:47 PM, Gleb Natapov wrote:
> > >>> On Fri, Mar 22, 2013 at 07:39:24PM +0800, Xiao Guangrong wrote:
> > >>>> On 03/22/2013 07:28 PM, Gleb Natapov wrote:
> > >>>>> On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
> > >>>>>> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
> > >>>>>>
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> And then have codepaths that nuke shadow pages break from the spinlock,
> > >>>>>>>>
> > >>>>>>>> I think this is not needed any more. We can let mmu_notify use the generation
> > >>>>>>>> number to invalid all shadow pages, then we only need to free them after
> > >>>>>>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
> > >>>>>>>> we can directly free them.
> > >>>>>>>>
> > >>>>>>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
> > >>>>>>>>
> > >>>>>>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
> > >>>>>>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
> > >>>>>>>> possible soft lock-ups.
> > >>>>>>>>
> > >>>>>>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
> > >>>>>>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
> > >>>>>>>> create page tables again. zap-all-shadow-page need long time to be finished,
> > >>>>>>>> the worst case is, it can not completed forever on intensive vcpu and memory
> > >>>>>>>> usage.
> > >>>>>>>
> > >>>>>>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
> > >>>>>>> cases, where there is no detailed concern about performance. Such as
> > >>>>>>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
> > >>>>>>> most is that host remains unaffected (and that it finishes in a
> > >>>>>>> reasonable time).
> > >>>>>>
> > >>>>>> Okay. I agree with you, will give a try.
> > >>>>>>
> > >>>>>>>
> > >>>>>>>> I still think the right way to fix this kind of thing is optimization for
> > >>>>>>>> mmu-lock.
> > >>>>>>>
> > >>>>>>> And then for the cases where performance matters just increase a
> > >>>>>>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
> > >>>>>>>
> > >>>>>>> kvm_mmu_get_page() {
> > >>>>>>> 	sp = lookup_hash(gfn)
> > >>>>>>> 	if (sp->role = role) {
> > >>>>>>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> > >>>>>>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
> > >>>>>>> 			kvm_mmu_init_page(sp);
> > >>>>>>> 			proceed as if the page was just allocated
> > >>>>>>> 		}
> > >>>>>>> 	}
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> It makes the kvm_mmu_zap_all path even faster than you have now.
> > >>>>>>> I suppose this was your idea correct with the generation number correct?
> > >>>>>>
> > >>>>>> Wow, great minds think alike, this is exactly what i am doing. ;)
> > >>>>>>
> > >>>>> Not that I disagree with above code, but why not make mmu_gen_number to be
> > >>>>> part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
> > >>>>> limit is reached like we looks to be doing with role.invalid pages now.
> > >>>>
> > >>>> These pages can be reused after purge its entries and delete it from parents
> > >>>> list, it can reduce the pressure of memory allocator. Also, we can move it to
> > >>>> the head of active_list so that the pages with invalid_gen can be reclaimed first.
> > >>>>
> > >>> You mean tail of the active_list, since kvm_mmu_free_some_pages()
> > >>> removes pages from tail? Since pages with new mmu_gen_number will be put
> > >>
> > >> I mean purge the invalid-gen page first, then update its valid-gen to current-gen,
> > >> then move it to the head of active_list:
> > >>
> > >> kvm_mmu_get_page() {
> > >> 	sp = lookup_hash(gfn)
> > >> 	if (sp->role = role) {
> > >> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> > >> 			kvm_mmu_purge_page(sp); (no need for TLB flushes as its unreachable)
> > >> 			sp->mmu_gen_number = kvm->arch.mmu_gen_number;
> > >> 			@@@@@@ move sp to the head of active list @@@@@@
> > >> 		}
> > >> 	}
> > >> }
> > >>
> > >>
> > > And I am saying that if you make mmu_gen_number part of the role you do
> > > not need to change kvm_mmu_get_page() at all. It will just work.
> > 
> > Oh, i got what your said. But i want to reuse these page (without
> > free and re-allocate). What do you think about this?
> > 
> We did not do that for sp->role.invalid pages although we could do what
> is proposed above for them too (am I right?). If there is measurable
> advantage of reusing invalid pages in kvm_mmu_get_page() lets do it like
> that, but if not then less code is better.

The number of sp->role.invalid=1 pages is small (only shadow roots). It
can grow but is bounded to a handful. No improvement visible there.

The number of shadow pages with old mmu_gen_number is potentially large.

Returning all shadow pages to the allocator is problematic because it
takes a long time (therefore the suggestion to postpone it).

Spreading the work to free (or reuse) those shadow pages to individual
page fault instances alleviates the mmu_lock hold time issue without
significant reduction to post kvm_mmu_zap_all operation (which has to
rebuilt all pagetables anyway).

You prefer to modify SLAB allocator to aggressively free these stale
shadow pages rather than kvm_mmu_get_page to reuse them?


  reply	other threads:[~2013-04-17 20:59 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-20  8:30 [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages Xiao Guangrong
2013-03-20  8:30 ` [PATCH v2 1/7] KVM: MMU: introduce mmu_cache->pte_list_descs Xiao Guangrong
2013-03-20  8:30 ` [PATCH v2 2/7] KVM: x86: introduce memslot_set_lpage_disallowed Xiao Guangrong
2013-03-20  8:30 ` [PATCH v2 3/7] KVM: x86: introduce kvm_clear_all_gfn_page_info Xiao Guangrong
2013-03-20  8:30 ` [PATCH v2 4/7] KVM: MMU: delete shadow page from hash list in kvm_mmu_prepare_zap_page Xiao Guangrong
2013-03-21 13:14   ` Gleb Natapov
2013-03-22  2:16     ` Xiao Guangrong
2013-03-20  8:30 ` [PATCH v2 5/7] KVM: MMU: split kvm_mmu_prepare_zap_page Xiao Guangrong
2013-03-20  8:30 ` [PATCH v2 6/7] KVM: MMU: fast zap all shadow pages Xiao Guangrong
2013-03-20  8:30 ` [PATCH v2 7/7] KVM: MMU: drop unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all Xiao Guangrong
2013-03-21 22:21 ` [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages Marcelo Tosatti
2013-03-22  2:11   ` Xiao Guangrong
2013-03-22 10:01     ` Xiao Guangrong
2013-03-22 10:54     ` Marcelo Tosatti
2013-03-22 11:10       ` Xiao Guangrong
2013-03-22 11:28         ` Gleb Natapov
2013-03-22 11:39           ` Xiao Guangrong
2013-03-22 11:47             ` Gleb Natapov
2013-03-22 12:03               ` Xiao Guangrong
2013-03-22 12:12                 ` Gleb Natapov
2013-03-22 12:37                   ` Xiao Guangrong
2013-03-22 19:15                     ` Gleb Natapov
2013-04-17 20:39                       ` Marcelo Tosatti [this message]
2013-04-18  9:42                         ` Gleb Natapov
2013-04-18 14:01                           ` Marcelo Tosatti
2013-04-18 16:36                             ` Gleb Natapov
2013-04-18 17:34                               ` Marcelo Tosatti
  -- strict thread matches above, loose matches on Subject: below --
2013-03-20  8:29 Xiao Guangrong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130417203904.GA29493@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.