Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	gleb@redhat.com, avi.kivity@gmail.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread
Date: Tue, 8 Oct 2013 22:56:27 -0300	[thread overview]
Message-ID: <20131009015627.GA4816@amt.cnet> (raw)
In-Reply-To: <CAD6C3B4-DD55-47EB-9BC0-17867937AE2D@gmail.com>

On Tue, Oct 08, 2013 at 12:02:32PM +0800, Xiao Guangrong wrote:
> 
> Hi Marcelo,
> 
> On Oct 8, 2013, at 9:23 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> >> 
> >> +	if (kvm->arch.rcu_free_shadow_page) {
> >> +		kvm_mmu_isolate_pages(invalid_list);
> >> +		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
> >> +		list_del_init(invalid_list);
> >> +		call_rcu(&sp->rcu, free_pages_rcu);
> >> +		return;
> >> +	}
> > 
> > This is unbounded (there was a similar problem with early fast page fault
> > implementations):
> > 
> > From RCU/checklist.txt:
> > 
> > "        An especially important property of the synchronize_rcu()
> >        primitive is that it automatically self-limits: if grace periods
> >        are delayed for whatever reason, then the synchronize_rcu()
> >        primitive will correspondingly delay updates.  In contrast,
> >        code using call_rcu() should explicitly limit update rate in
> >        cases where grace periods are delayed, as failing to do so can
> >        result in excessive realtime latencies or even OOM conditions.
> > "
> 
> I understand what you are worrying about… Hmm, can it be avoided by
> just using kvm->arch.rcu_free_shadow_page in a small window? - Then
> there are slight chance that the page need to be freed by call_rcu.

The point that must be addressed is that you cannot allow an unlimited
number of sp's to be freed via call_rcu between two grace periods.

So something like:

- For every 17MB worth of shadow pages.
- Guarantee a grace period has passed.

If you control kvm->arch.rcu_free_shadow_page, you could periodically
verify how many MBs worth of shadow pages are in the queue for RCU
freeing and force grace period after a certain number.

> > Moreover, freeing pages differently depending on some state should 
> > be avoided.
> > 
> > Alternatives:
> > 
> > - Disable interrupts at write protect sites.
> 
> The write-protection can be triggered by KVM ioctl that is not in the VCPU
> context, if we do this, we also need to send IPI to the KVM thread when do
> TLB flush.

Yes. However for the case being measured, simultaneous page freeing by vcpus 
should be minimal (therefore not affecting the latency of GET_DIRTY_LOG).

> And we can not do much work while interrupt is disabled due to
> interrupt latency.
> 
> > - Rate limit the number of pages freed via call_rcu
> > per grace period.
> 
> Seems complex. :(
> 
> > - Some better alternative.
> 
> Gleb has a idea that uses RCU_DESTORY to protect the shadow page table
> and encodes the page-level into the spte (since we need to check if the spte
> is the last-spte. ).  How about this?

Pointer please? Why is DESTROY_SLAB_RCU any safer than call_rcu with
regards to limitation? (maybe it is).

> I planned to do it after this patchset merged, if you like it and if you think
> that "using kvm->arch.rcu_free_shadow_page in a small window" can not avoid
> the issue, i am happy to do it in the next version. :)

Unfortunately the window can be large (as it depends on the size of the
memslot), so it would be best if this problem can be addressed before 
merging. What is your idea for reducing rcu_free_shadow_page=1 window?

Thank you for the good work.

next prev parent reply	other threads:[~2013-10-09  1:57 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-05 10:29 [PATCH v2 00/15] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 01/15] KVM: MMU: fix the count of spte number Xiao Guangrong
2013-09-08 12:19   ` Gleb Natapov
2013-09-08 13:55     ` Xiao Guangrong
2013-09-08 14:01       ` Gleb Natapov
2013-09-08 14:24         ` Xiao Guangrong
2013-09-08 14:26           ` Gleb Natapov
2013-09-05 10:29 ` [PATCH v2 02/15] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-09-30 21:23   ` Marcelo Tosatti
2013-10-03  6:16     ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 03/15] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-09-30 22:39   ` Marcelo Tosatti
2013-10-03  6:29     ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 04/15] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 05/15] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
2013-09-30 23:05   ` Marcelo Tosatti
2013-10-03  6:46     ` Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 06/15] KVM: MMU: update spte and add it into rmap before dirty log Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 07/15] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 08/15] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 09/15] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-09-08 12:03   ` Xiao Guangrong
2013-09-16 12:42   ` Gleb Natapov
2013-09-16 13:52     ` Xiao Guangrong
2013-09-16 15:04       ` Gleb Natapov
2013-09-05 10:29 ` [PATCH v2 10/15] KVM: MMU: initialize the pointers in pte_list_desc properly Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 11/15] KVM: MMU: reintroduce kvm_mmu_isolate_page() Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-10-08  1:23   ` Marcelo Tosatti
2013-10-08  4:02     ` Xiao Guangrong
2013-10-09  1:56       ` Marcelo Tosatti [this message]
2013-10-09 10:45         ` Xiao Guangrong
2013-10-10  1:47           ` Marcelo Tosatti
2013-10-10 12:08             ` Gleb Natapov
2013-10-10 16:42               ` Marcelo Tosatti
2013-10-10 19:16                 ` Gleb Natapov
2013-10-10 21:03                   ` Marcelo Tosatti
2013-10-11  5:38                     ` Gleb Natapov
2013-10-11 20:30                       ` Marcelo Tosatti
2013-10-12  5:53                         ` Gleb Natapov
2013-10-14 19:29                           ` Marcelo Tosatti
2013-10-15  3:57                             ` Gleb Natapov
2013-10-15 22:21                               ` Marcelo Tosatti
2013-10-16  0:41                                 ` Xiao Guangrong
2013-10-16  9:12                                 ` Gleb Natapov
2013-10-16 20:43                                   ` Marcelo Tosatti
2013-09-05 10:29 ` [PATCH v2 13/15] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 14/15] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-09-05 10:29 ` [PATCH v2 15/15] KVM: MMU: use rcu functions to access the pointer Xiao Guangrong
2013-09-15 10:26 ` [PATCH v2 00/15] KVM: MMU: locklessly wirte-protect Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131009015627.GA4816@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=avi.kivity@gmail.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=xiaoguangrong.eric@gmail.com \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).