Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker
Date: Mon, 02 Sep 2013 15:02:07 +0800	[thread overview]
Message-ID: <522437EF.8040002@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130830113805.GA1844@redhat.com>

On 08/30/2013 07:38 PM, Gleb Natapov wrote:
> On Thu, Aug 29, 2013 at 07:26:40PM +0800, Xiao Guangrong wrote:
>> On 08/29/2013 05:51 PM, Gleb Natapov wrote:
>>> On Thu, Aug 29, 2013 at 05:31:42PM +0800, Xiao Guangrong wrote:
>>>>> As Documentation/RCU/whatisRCU.txt says:
>>>>>
>>>>>         As with rcu_assign_pointer(), an important function of
>>>>>         rcu_dereference() is to document which pointers are protected by
>>>>>         RCU, in particular, flagging a pointer that is subject to changing
>>>>>         at any time, including immediately after the rcu_dereference().
>>>>>         And, again like rcu_assign_pointer(), rcu_dereference() is
>>>>>         typically used indirectly, via the _rcu list-manipulation
>>>>>         primitives, such as list_for_each_entry_rcu().
>>>>>
>>>>> The documentation aspect of rcu_assign_pointer()/rcu_dereference() is
>>>>> important. The code is complicated, so self documentation will not hurt.
>>>>> I want to see what is actually protected by rcu here. Freeing shadow
>>>>> pages with call_rcu() further complicates matters: does it mean that
>>>>> shadow pages are also protected by rcu? 
>>>>
>>>> Yes, it stops shadow page to be freed when we do write-protection on
>>>> it.
>>>>
>>> Yeah, I got the trick, what I am saying that we have a data structure
>>> here protected by RCU, but we do not use RCU functions to access it...
>>
>> Yes, they are not used when insert a spte into rmap and get the rmap from
>> the entry... but do we need to use these functions to guarantee the order?
>>
>> The worst case is, we fetch the spte from the desc but the spte is not
>> updated yet, we can happily skip this spte since it will set the
>> dirty-bitmap later, this is guaranteed by the barrier between mmu_spte_update()
>> and mark_page_dirty(), the code is:
>>
>> set_spte():
>>
>> 	if (mmu_spte_update(sptep, spte))
>> 		kvm_flush_remote_tlbs(vcpu->kvm);
>>
>> 	if (!remap) {
>> 		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
>> 			rmap_recycle(vcpu, sptep, gfn);
>>
>> 		if (level > PT_PAGE_TABLE_LEVEL)
>> 			++vcpu->kvm->stat.lpages;
>> 	}
>>
>> 	smp_wmb();
>>
>> 	if (pte_access & ACC_WRITE_MASK)
>> 		mark_page_dirty(vcpu->kvm, gfn);
>>
>> So, i guess if we can guaranteed the order by ourself, we do not need
>> to call the rcu functions explicitly...
>>
>> But, the memory barres in the rcu functions are really light on x86 (store
>> can not be reordered with store), so i do not mind to explicitly use them
>> if you think this way is more safe. :)
>>
> I think the self documentation aspect of using rcu function is also
> important.

Okay. I will use these rcu functions and measure them to see whether it'll
cause performance issue.

> 
>>> BTW why not allocate sp->spt from SLAB_DESTROY_BY_RCU cache too? We may
>>> switch write protection on a random spt occasionally if page is deleted
>>> and reused for another spt though. For last level spt it should not be a
>>> problem and for non last level we have is_last_spte() check in
>>> __rmap_write_protect_lockless(). Can it work?
>>
>> Yes, i also considered this way. It can work if we handle is_last_spte()
>> properly. Since the sp->spte can be reused, we can not get the mapping
>> level from sp. We need to encode the mapping level into spte so that
>> cmpxhg can understand if the page table has been moved to another mapping
>> level.
> Isn't one bit that says that spte is the last one enough? IIRC we
> have one more ignored bit to spare in spte.

Right. But i also want to use this way in fast_page_fault where mapping-level
is needed.

> 
>>         Could you allow me to make this optimization separately after this
>> patchset be merged?
>>
> If you think it will complicate the initial version I am fine with
> postponing it for later.

Thank you, Gleb!

next prev parent reply	other threads:[~2013-09-02  7:02 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-30 13:01 [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-07-30 13:01 ` [PATCH 01/12] KVM: MMU: remove unused parameter Xiao Guangrong
2013-08-29  7:22   ` Gleb Natapov
2013-07-30 13:02 ` [PATCH 02/12] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-07-30 13:02 ` [PATCH 03/12] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-08-02 14:55   ` Marcelo Tosatti
2013-08-02 15:42     ` Xiao Guangrong
2013-08-02 20:27       ` Marcelo Tosatti
2013-08-02 22:56         ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable Xiao Guangrong
2013-07-30 13:26   ` Paolo Bonzini
2013-07-31  7:25     ` Xiao Guangrong
2013-08-07  1:48   ` Marcelo Tosatti
2013-08-07  4:06     ` Xiao Guangrong
2013-08-08 15:06       ` Marcelo Tosatti
2013-08-08 16:26         ` Xiao Guangrong
2013-11-20  0:29       ` Marcelo Tosatti
2013-11-20  0:35         ` Marcelo Tosatti
2013-11-20 14:20         ` Xiao Guangrong
2013-11-20 19:47           ` Marcelo Tosatti
2013-11-21  4:26             ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 05/12] KVM: MMU: add spte into rmap before logging dirty page Xiao Guangrong
2013-07-30 13:27   ` Paolo Bonzini
2013-07-31  7:33     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 06/12] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-08-28  7:23   ` Gleb Natapov
2013-08-28  7:50     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 07/12] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-08-28  8:12   ` Gleb Natapov
2013-08-28  8:37     ` Xiao Guangrong
2013-08-28  8:58       ` Gleb Natapov
2013-08-28  9:19         ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 08/12] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-08-28  8:40   ` Gleb Natapov
2013-08-28  8:54     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-08-28  9:20   ` Gleb Natapov
2013-08-28  9:33     ` Xiao Guangrong
2013-08-28  9:46       ` Gleb Natapov
2013-08-28 10:13         ` Xiao Guangrong
2013-08-28 10:49           ` Gleb Natapov
2013-08-28 12:15             ` Xiao Guangrong
2013-08-28 13:36               ` Gleb Natapov
2013-08-29  6:50                 ` Xiao Guangrong
2013-08-29  9:08                   ` Gleb Natapov
2013-08-29  9:31                     ` Xiao Guangrong
2013-08-29  9:51                       ` Gleb Natapov
2013-08-29 11:26                         ` Xiao Guangrong
2013-08-30 11:38                           ` Gleb Natapov
2013-09-02  7:02                             ` Xiao Guangrong [this message]
2013-08-29  9:31                   ` Gleb Natapov
2013-08-29 11:33                     ` Xiao Guangrong
2013-08-29 12:02                       ` Xiao Guangrong
2013-08-30 11:44                         ` Gleb Natapov
2013-09-02  8:50                           ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-08-07 13:09   ` Takuya Yoshikawa
2013-08-07 13:19     ` Xiao Guangrong
2013-08-29  9:10   ` Gleb Natapov
2013-08-29  9:25     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 11/12] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-07-30 13:02 ` [PATCH 12/12] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-07-30 13:11 ` [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-08-03  5:09 ` Takuya Yoshikawa
2013-08-04 14:15   ` Xiao Guangrong
2013-08-29  7:16   ` Gleb Natapov
2013-08-06 13:16 ` Xiao Guangrong
2013-08-08 17:38   ` Paolo Bonzini
2013-08-09  4:51     ` Xiao Guangrong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=522437EF.8040002@linux.vnet.ibm.com \
    --to=xiaoguangrong@linux.vnet.ibm.com \
    --cc=avi.kivity@gmail.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).