Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread

From: Gleb Natapov <gleb@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
	Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	avi.kivity@gmail.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread
Date: Tue, 15 Oct 2013 06:57:05 +0300
Message-ID: <20131015035705.GB30802@redhat.com>
In-Reply-To: <20131014192945.GA22655@amt.cnet>

On Mon, Oct 14, 2013 at 04:29:45PM -0300, Marcelo Tosatti wrote:
> On Sat, Oct 12, 2013 at 08:53:56AM +0300, Gleb Natapov wrote:
> > On Fri, Oct 11, 2013 at 05:30:17PM -0300, Marcelo Tosatti wrote:
> > > On Fri, Oct 11, 2013 at 08:38:31AM +0300, Gleb Natapov wrote:
> > > > > n_max_mmu_pages is not a suitable limit to throttle freeing of pages via
> > > > > > RCU (it's too large). If the free memory watermarks are smaller than
> > > > > n_max_mmu_pages for all guests, OOM is possible.
> > > > > 
> > > > Ah, yes. I am not saying n_max_mmu_pages will throttle RCU, just saying
> > > > that slab size will be bound, so hopefully shrinker will touch it
> > > > rarely.
> > > > 
> > > > > > > > and, in addition, page released to slab is immediately
> > > > > > > > available for allocation, no need to wait for grace period. 
> > > > > > > 
> > > > > > > See SLAB_DESTROY_BY_RCU comment at include/linux/slab.h.
> > > > > > > 
> > > > > > This comment is exactly what I was referring to in the code you quoted. Do
> > > > > > you see anything problematic in what comment describes?
> > > > > 
> > > > > "This delays freeing the SLAB page by a grace period, it does _NOT_
> > > > > delay object freeing." The page is not available for allocation.
> > > > By "page" I mean "spt page" which is a slab object. So "spt page"
> > > > > AKA slab object will be available for allocation immediately.
> > > 
> > > The object is reusable within that SLAB cache only, not the 
> > > entire system (therefore it does not prevent OOM condition).
> > > 
> > Since object is allocatable immediately by shadow paging code the number
> > > of SLAB objects is bound by n_max_mmu_pages. If there is not enough
> > memory for n_max_mmu_pages OOM condition can happen anyway since shadow
> > paging code will usually have exactly n_max_mmu_pages allocated.
> > 
> > > OK, perhaps it is useful to use SLAB_DESTROY_BY_RCU, but throttling 
> > > is still necessary, as described in the RCU documentation.
> > > 
> > I do not see what should be throttled if we use SLAB_DESTROY_BY_RCU. RCU
> > comes into play only when SLAB cache is shrunk and it happens far from
> > kvm code.
> 
> You are right.
> 
> Why is it safe to allow access, by the lockless page write protect
> side, to spt pointer for shadow page A that can change to a shadow page 
> pointer of shadow page B?
> 
> Write protect spte of any page at will? Or verify that in fact that's the
> shadow you want to write protect?
>
> Note that spte value might be the same for different shadow pages,
> so cmpxchg succeeding does not guarantee it's the same shadow page that
> has been protected.
> 
Two things can happen. If the spte that we accidentally write-protect is
some other last-level spte, this is benign: it will be unprotected on the
next fault. If the spte is not last level, this is a problem, and Xiao
proposes to fix it by encoding the spte level into the spte itself.
Another way to fix it is to handle, in KVM, the fault that is caused by
write-protected middle sptes: just unprotect them and go back to the
guest.
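
To make the first option concrete, here is a minimal userspace sketch of
the level-encoding idea. It is not taken from Xiao's patches: the bit
layout (SPTE_WRITABLE, SPTE_LEVEL_SHIFT), the helper names and the use of
C11 atomics in place of the kernel's cmpxchg64() are all assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration; NOT KVM's real spte format. */
#define SPTE_WRITABLE    (1ULL << 1)
#define SPTE_LEVEL_SHIFT 52                     /* assumed software-available bits */
#define SPTE_LEVEL_MASK  (7ULL << SPTE_LEVEL_SHIFT)
#define LAST_LEVEL       1

/* Build an spte that carries its own page-table level (1..4). */
static uint64_t make_spte(uint64_t addr_bits, int level, bool writable)
{
	uint64_t spte;

	assert(level >= 1 && level <= 4);
	spte = addr_bits & ~(SPTE_LEVEL_MASK | SPTE_WRITABLE);
	spte |= (uint64_t)level << SPTE_LEVEL_SHIFT;
	if (writable)
		spte |= SPTE_WRITABLE;
	return spte;
}

static int spte_level(uint64_t spte)
{
	return (int)((spte & SPTE_LEVEL_MASK) >> SPTE_LEVEL_SHIFT);
}

/*
 * Lockless write-protect. The shadow page behind sptep may have been
 * reused, so the level is taken from the value itself. Because the level
 * is part of the compared value, a successful cmpxchg implies the entry
 * was still a last-level spte at the moment it was modified.
 * Returns true only if the writable bit was cleared by this call.
 */
static bool write_protect_lockless(_Atomic uint64_t *sptep)
{
	uint64_t old = atomic_load(sptep);

	for (;;) {
		if (spte_level(old) != LAST_LEVEL)
			return false;   /* middle spte: leave it alone */
		if (!(old & SPTE_WRITABLE))
			return false;   /* already write-protected */
		if (atomic_compare_exchange_weak(sptep, &old,
						 old & ~SPTE_WRITABLE))
			return true;
		/* cmpxchg failed: old now holds the new value, re-check it */
	}
}
```

With this layout, write-protecting the wrong last-level spte is still
possible (and still benign, as above), but a middle spte is never
modified, because its encoded level does not match LAST_LEVEL.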

--
			Gleb.
