From: Gleb Natapov <gleb@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker
Date: Fri, 30 Aug 2013 14:38:05 +0300
Message-ID: <20130830113805.GA1844@redhat.com>
In-Reply-To: <521F2FF0.9060105@linux.vnet.ibm.com>
On Thu, Aug 29, 2013 at 07:26:40PM +0800, Xiao Guangrong wrote:
> On 08/29/2013 05:51 PM, Gleb Natapov wrote:
> > On Thu, Aug 29, 2013 at 05:31:42PM +0800, Xiao Guangrong wrote:
> >>> As Documentation/RCU/whatisRCU.txt says:
> >>>
> >>> As with rcu_assign_pointer(), an important function of
> >>> rcu_dereference() is to document which pointers are protected by
> >>> RCU, in particular, flagging a pointer that is subject to changing
> >>> at any time, including immediately after the rcu_dereference().
> >>> And, again like rcu_assign_pointer(), rcu_dereference() is
> >>> typically used indirectly, via the _rcu list-manipulation
> >>> primitives, such as list_for_each_entry_rcu().
> >>>
> >>> The documentation aspect of rcu_assign_pointer()/rcu_dereference() is
> >>> important. The code is complicated, so self-documentation will not hurt.
> >>> I want to see what is actually protected by rcu here. Freeing shadow
> >>> pages with call_rcu() further complicates matters: does it mean that
> >>> shadow pages are also protected by rcu?
> >>
> >> Yes, it stops the shadow page from being freed while we write-protect
> >> it.
> >>
> > Yeah, I got the trick. What I am saying is that we have a data structure
> > here protected by RCU, but we do not use RCU functions to access it...
>
> Yes, they are not used when inserting an spte into the rmap and getting the
> rmap from the entry... but do we need these functions to guarantee the ordering?
>
> The worst case is that we fetch the spte from the desc before the spte has
> been updated. We can safely skip such an spte, since the dirty bitmap will
> be set later; this is guaranteed by the barrier between mmu_spte_update()
> and mark_page_dirty(). The code is:
>
> set_spte():
>
> 	if (mmu_spte_update(sptep, spte))
> 		kvm_flush_remote_tlbs(vcpu->kvm);
>
> 	if (!remap) {
> 		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
> 			rmap_recycle(vcpu, sptep, gfn);
>
> 		if (level > PT_PAGE_TABLE_LEVEL)
> 			++vcpu->kvm->stat.lpages;
> 	}
>
> 	smp_wmb();
>
> 	if (pte_access & ACC_WRITE_MASK)
> 		mark_page_dirty(vcpu->kvm, gfn);
>
> So, I guess if we can guarantee the ordering ourselves, we do not need
> to call the rcu functions explicitly...
>
> But the memory barriers in the rcu functions are really light on x86 (a store
> cannot be reordered with another store), so I do not mind using them
> explicitly if you think that is safer. :)
>
I think the self-documentation aspect of using rcu functions is also
important.
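
For reference, the ordering argument above maps roughly onto C11 atomics: rcu_assign_pointer() behaves like a release store and rcu_dereference() like a dependency-ordered load (C11 compilers promote memory_order_consume to acquire, so acquire is used directly). This is a minimal userspace sketch, not the KVM pte_list code; struct desc, DESC_SLOTS, desc_init(), desc_publish(), desc_fetch() and desc_walk() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_SLOTS 3 /* hypothetical slot count, not KVM's PTE_LIST_EXT */

/* Hypothetical rmap descriptor: each slot holds a pointer to an spte. */
struct desc {
	_Atomic(uint64_t *) sptes[DESC_SLOTS];
	_Atomic(struct desc *) more;
};

static void desc_init(struct desc *d)
{
	for (int i = 0; i < DESC_SLOTS; i++)
		atomic_init(&d->sptes[i], NULL);
	atomic_init(&d->more, NULL);
}

/* Stand-in for rcu_assign_pointer(): the release store orders the
 * earlier write of *sptep before the pointer becomes visible. */
static void desc_publish(struct desc *d, int slot, uint64_t *sptep)
{
	assert(slot >= 0 && slot < DESC_SLOTS);
	atomic_store_explicit(&d->sptes[slot], sptep, memory_order_release);
}

/* Stand-in for rcu_dereference(). */
static uint64_t *desc_fetch(struct desc *d, int slot)
{
	return atomic_load_explicit(&d->sptes[slot], memory_order_acquire);
}

/* Lockless walk: copy each published spte value into out[] (at most
 * max entries). A slot that is still NULL is simply skipped. In this
 * sketch sptes are assumed not to change after publication. */
static int desc_walk(struct desc *d, uint64_t *out, int max)
{
	int n = 0;

	while (d) {
		for (int i = 0; i < DESC_SLOTS; i++) {
			uint64_t *sptep = desc_fetch(d, i);

			if (sptep && n < max)
				out[n++] = *sptep;
		}
		d = atomic_load_explicit(&d->more, memory_order_acquire);
	}
	return n;
}
```

Besides supplying the ordering, the annotated accessors mark in the source exactly which pointers readers may see change underneath them. On x86 the release store compiles to a plain store, so the explicit form costs nothing there.
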
> > BTW, why not allocate sp->spt from a SLAB_DESTROY_BY_RCU cache too? We
> > may occasionally write-protect a random spt if the page is freed and
> > reused for another spt, though. For a last-level spt that should not be a
> > problem, and for non-last levels we have the is_last_spte() check in
> > __rmap_write_protect_lockless(). Can it work?
>
> Yes, I also considered this approach. It can work if we handle
> is_last_spte() properly. Since sp->spt can be reused, we cannot get the
> mapping level from sp. We need to encode the mapping level into the spte
> so that cmpxchg can tell whether the page table has been moved to another
> mapping level.
Isn't one bit saying that the spte is the last one enough? IIRC we
have one more ignored bit to spare in the spte.
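
The one-spare-bit idea can be sketched as follows, assuming a hypothetical software-available bit (SPTE_LAST_HYP, bit 62 here) that is set only on last-level sptes; bit 1 is the x86 R/W bit. Because the marker lives in the spte itself, the cmpxchg that clears the writable bit compares against a value that was already checked for the marker, so a page table that was freed and reused at another level would make the exchange fail or the check reject it. This is userspace C11, not KVM code; make_spte() and wrprotect_last_spte() are hypothetical helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SPTE_WRITABLE (1ULL << 1)  /* x86 R/W bit */
#define SPTE_LAST_HYP (1ULL << 62) /* hypothetical spare ignored bit */

/* Build a hypothetical spte from a page-aligned address. */
static uint64_t make_spte(uint64_t addr, bool last, bool writable)
{
	assert((addr & 0xfff) == 0);
	return addr | (last ? SPTE_LAST_HYP : 0) |
	       (writable ? SPTE_WRITABLE : 0);
}

/* Clear the writable bit only if the spte is still marked last-level.
 * Returns true if this call write-protected the spte. */
static bool wrprotect_last_spte(_Atomic uint64_t *sptep)
{
	uint64_t old = atomic_load_explicit(sptep, memory_order_relaxed);

	for (;;) {
		/* Re-check on every attempt: the table may have been reused. */
		if (!(old & SPTE_LAST_HYP) || !(old & SPTE_WRITABLE))
			return false;
		if (atomic_compare_exchange_weak(sptep, &old,
						 old & ~SPTE_WRITABLE))
			return true;
		/* cmpxchg failed: 'old' now holds the current value; retry. */
	}
}
```

Keeping the check and the update in a single compare-and-exchange loop avoids having to consult sp->role at all, which is the property needed once sp->spt pages can be recycled.
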
> Could you allow me to make this optimization separately, after this
> patchset is merged?
>
If you think it will complicate the initial version I am fine with
postponing it for later.
--
Gleb.