linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker
Date: Thu, 29 Aug 2013 19:26:40 +0800	[thread overview]
Message-ID: <521F2FF0.9060105@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130829095120.GD22899@redhat.com>

On 08/29/2013 05:51 PM, Gleb Natapov wrote:
> On Thu, Aug 29, 2013 at 05:31:42PM +0800, Xiao Guangrong wrote:
>>> As Documentation/RCU/whatisRCU.txt says:
>>>
>>>         As with rcu_assign_pointer(), an important function of
>>>         rcu_dereference() is to document which pointers are protected by
>>>         RCU, in particular, flagging a pointer that is subject to changing
>>>         at any time, including immediately after the rcu_dereference().
>>>         And, again like rcu_assign_pointer(), rcu_dereference() is
>>>         typically used indirectly, via the _rcu list-manipulation
>>>         primitives, such as list_for_each_entry_rcu().
>>>
>>> The documentation aspect of rcu_assign_pointer()/rcu_dereference() is
>>> important. The code is complicated, so self documentation will not hurt.
>>> I want to see what is actually protected by rcu here. Freeing shadow
>>> pages with call_rcu() further complicates matters: does it mean that
>>> shadow pages are also protected by rcu? 
>>
>> Yes, it stops shadow page to be freed when we do write-protection on
>> it.
>>
> Yeah, I got the trick, what I am saying that we have a data structure
> here protected by RCU, but we do not use RCU functions to access it...

Yes, they are not used when insert a spte into rmap and get the rmap from
the entry... but do we need to use these functions to guarantee the order?

The worst case is, we fetch the spte from the desc but the spte is not
updated yet, we can happily skip this spte since it will set the
dirty-bitmap later, this is guaranteed by the barrier between mmu_spte_update()
and mark_page_dirty(), the code is:

set_spte():

	if (mmu_spte_update(sptep, spte))
		kvm_flush_remote_tlbs(vcpu->kvm);

	if (!remap) {
		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
			rmap_recycle(vcpu, sptep, gfn);

		if (level > PT_PAGE_TABLE_LEVEL)
			++vcpu->kvm->stat.lpages;
	}

	smp_wmb();

	if (pte_access & ACC_WRITE_MASK)
		mark_page_dirty(vcpu->kvm, gfn);

So, i guess if we can guaranteed the order by ourself, we do not need
to call the rcu functions explicitly...

But, the memory barres in the rcu functions are really light on x86 (store
can not be reordered with store), so i do not mind to explicitly use them
if you think this way is more safe. :)

> BTW why not allocate sp->spt from SLAB_DESTROY_BY_RCU cache too? We may
> switch write protection on a random spt occasionally if page is deleted
> and reused for another spt though. For last level spt it should not be a
> problem and for non last level we have is_last_spte() check in
> __rmap_write_protect_lockless(). Can it work?

Yes, i also considered this way. It can work if we handle is_last_spte()
properly. Since the sp->spte can be reused, we can not get the mapping
level from sp. We need to encode the mapping level into spte so that
cmpxhg can understand if the page table has been moved to another mapping
level. Could you allow me to make this optimization separately after this
patchset be merged?





  reply	other threads:[~2013-08-29 11:26 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-30 13:01 [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-07-30 13:01 ` [PATCH 01/12] KVM: MMU: remove unused parameter Xiao Guangrong
2013-08-29  7:22   ` Gleb Natapov
2013-07-30 13:02 ` [PATCH 02/12] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-07-30 13:02 ` [PATCH 03/12] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-08-02 14:55   ` Marcelo Tosatti
2013-08-02 15:42     ` Xiao Guangrong
2013-08-02 20:27       ` Marcelo Tosatti
2013-08-02 22:56         ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable Xiao Guangrong
2013-07-30 13:26   ` Paolo Bonzini
2013-07-31  7:25     ` Xiao Guangrong
2013-08-07  1:48   ` Marcelo Tosatti
2013-08-07  4:06     ` Xiao Guangrong
2013-08-08 15:06       ` Marcelo Tosatti
2013-08-08 16:26         ` Xiao Guangrong
2013-11-20  0:29       ` Marcelo Tosatti
2013-11-20  0:35         ` Marcelo Tosatti
2013-11-20 14:20         ` Xiao Guangrong
2013-11-20 19:47           ` Marcelo Tosatti
2013-11-21  4:26             ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 05/12] KVM: MMU: add spte into rmap before logging dirty page Xiao Guangrong
2013-07-30 13:27   ` Paolo Bonzini
2013-07-31  7:33     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 06/12] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-08-28  7:23   ` Gleb Natapov
2013-08-28  7:50     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 07/12] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-08-28  8:12   ` Gleb Natapov
2013-08-28  8:37     ` Xiao Guangrong
2013-08-28  8:58       ` Gleb Natapov
2013-08-28  9:19         ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 08/12] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-08-28  8:40   ` Gleb Natapov
2013-08-28  8:54     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-08-28  9:20   ` Gleb Natapov
2013-08-28  9:33     ` Xiao Guangrong
2013-08-28  9:46       ` Gleb Natapov
2013-08-28 10:13         ` Xiao Guangrong
2013-08-28 10:49           ` Gleb Natapov
2013-08-28 12:15             ` Xiao Guangrong
2013-08-28 13:36               ` Gleb Natapov
2013-08-29  6:50                 ` Xiao Guangrong
2013-08-29  9:08                   ` Gleb Natapov
2013-08-29  9:31                     ` Xiao Guangrong
2013-08-29  9:51                       ` Gleb Natapov
2013-08-29 11:26                         ` Xiao Guangrong [this message]
2013-08-30 11:38                           ` Gleb Natapov
2013-09-02  7:02                             ` Xiao Guangrong
2013-08-29  9:31                   ` Gleb Natapov
2013-08-29 11:33                     ` Xiao Guangrong
2013-08-29 12:02                       ` Xiao Guangrong
2013-08-30 11:44                         ` Gleb Natapov
2013-09-02  8:50                           ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-08-07 13:09   ` Takuya Yoshikawa
2013-08-07 13:19     ` Xiao Guangrong
2013-08-29  9:10   ` Gleb Natapov
2013-08-29  9:25     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 11/12] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-07-30 13:02 ` [PATCH 12/12] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-07-30 13:11 ` [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-08-03  5:09 ` Takuya Yoshikawa
2013-08-04 14:15   ` Xiao Guangrong
2013-08-29  7:16   ` Gleb Natapov
2013-08-06 13:16 ` Xiao Guangrong
2013-08-08 17:38   ` Paolo Bonzini
2013-08-09  4:51     ` Xiao Guangrong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=521F2FF0.9060105@linux.vnet.ibm.com \
    --to=xiaoguangrong@linux.vnet.ibm.com \
    --cc=avi.kivity@gmail.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).