From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker
Date: Thu, 29 Aug 2013 17:31:42 +0800 [thread overview]
Message-ID: <521F14FE.3070900@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130829090833.GA22899@redhat.com>
On 08/29/2013 05:08 PM, Gleb Natapov wrote:
> On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote:
>>>>> BTW I do not see
>>>>> rcu_assign_pointer()/rcu_dereference() in your patches which hints on
>>>>
>>>> IIUC, We can not directly use rcu_assign_pointer(), that is something like:
>>>> p = v to assign a pointer to a pointer. But in our case, we need:
>>>> *pte_list = (unsigned long)desc | 1;
>>> >From Documentation/RCU/whatisRCU.txt:
>>>
>>> The updater uses this function to assign a new value to an RCU-protected pointer.
>>>
>>> This is what we do, no? (assuming slot->arch.rmap[] is what rcu protects here)
>>> The fact that the value is not correct pointer should not matter.
>>>
>>
>> Okay. Will change that code to:
>>
>> +
>> +#define rcu_assign_head_desc(pte_list_p, value) \
>> + rcu_assign_pointer(*(unsigned long __rcu **)(pte_list_p), (unsigned long *)(value))
>> +
>> /*
>> * Pte mapping structures:
>> *
>> @@ -1006,14 +1010,7 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
>> desc->sptes[1] = spte;
>> desc_mark_nulls(pte_list, desc);
>>
>> - /*
>> - * Esure the old spte has been updated into desc, so
>> - * that the another side can not get the desc from pte_list
>> - * but miss the old spte.
>> - */
>> - smp_wmb();
>> -
>> - *pte_list = (unsigned long)desc | 1;
>> + rcu_assign_head_desc(pte_list, (unsigned long)desc | 1);
>>
>>>>
>>>> So i add the smp_wmb() by myself:
>>>> /*
>>>> * Esure the old spte has been updated into desc, so
>>>> * that the another side can not get the desc from pte_list
>>>> * but miss the old spte.
>>>> */
>>>> smp_wmb();
>>>>
>>>> *pte_list = (unsigned long)desc | 1;
>>>>
>>>> But i missed it when inserting a empty desc, in that case, we need the barrier
>>>> too since we should make desc->more visible before assign it to pte_list to
>>>> avoid the lookup side seeing the invalid "nulls".
>>>>
>>>> I also use own code instead of rcu_dereference():
>>>> pte_list_walk_lockless():
>>>> pte_list_value = ACCESS_ONCE(*pte_list);
>>>> if (!pte_list_value)
>>>> return;
>>>>
>>>> if (!(pte_list_value & 1))
>>>> return fn((u64 *)pte_list_value);
>>>>
>>>> /*
>>>> * fetch pte_list before read sptes in the desc, see the comments
>>>> * in pte_list_add().
>>>> *
>>>> * There is the data dependence since the desc is got from pte_list.
>>>> */
>>>> smp_read_barrier_depends();
>>>>
>>>> That part can be replaced by rcu_dereference().
>>>>
>>> Yes please, also see commit c87a124a5d5e8cf8e21c4363c3372bcaf53ea190 for
>>> kind of scary bugs we can get here.
>>
>> Right, it is likely trigger-able in our case, will fix it.
>>
>>>
>>>>> incorrect usage of RCU. I think any access to slab pointers will need to
>>>>> use those.
>>>>
>>>> Remove desc is not necessary i think since we do not mind to see the old
>>>> info. (hlist_nulls_del_rcu() does not use rcu_dereference() too)
>>>>
>>> May be a bug. I also noticed that rculist_nulls uses rcu_dereference()
>>
>> But list_del_rcu() does not use rcu_assign_pointer() too.
>>
> This also suspicious.
>
>>> to access ->next, but it does not use rcu_assign_pointer() pointer to
>>> assign it.
>>
>> You mean rcu_dereference() is used in hlist_nulls_for_each_entry_rcu()? I think
>> it's because we should validate the prefetched data before entry->next is
>> accessed, it is paired with the barrier in rcu_assign_pointer() when add a
>> new entry into the list. rcu_assign_pointer() make other fields in the entry
>> be visible before linking entry to the list. Otherwise, the lookup can access
>> that entry but get the invalid fields.
>>
>> After more thinking, I still think rcu_assign_pointer() is unneeded when a entry
>> is removed. The remove-API does not care the order between unlink the entry and
>> the changes to its fields. It is the caller's responsibility:
>> - in the case of rcuhlist, the caller uses call_rcu()/synchronize_rcu(), etc to
>> enforce all lookups exit and the later change on that entry is invisible to the
>> lookups.
>>
>> - In the case of rculist_nulls, it seems refcounter is used to guarantee the order
>> (see the example from Documentation/RCU/rculist_nulls.txt).
>>
>> - In our case, we allow the lookup to see the deleted desc even if it is in slab cache
>> or its is initialized or it is re-added.
>>
>> Your thought?
>>
>
> As Documentation/RCU/whatisRCU.txt says:
>
> As with rcu_assign_pointer(), an important function of
> rcu_dereference() is to document which pointers are protected by
> RCU, in particular, flagging a pointer that is subject to changing
> at any time, including immediately after the rcu_dereference().
> And, again like rcu_assign_pointer(), rcu_dereference() is
> typically used indirectly, via the _rcu list-manipulation
> primitives, such as list_for_each_entry_rcu().
>
> The documentation aspect of rcu_assign_pointer()/rcu_dereference() is
> important. The code is complicated, so self documentation will not hurt.
> I want to see what is actually protected by rcu here. Freeing shadow
> pages with call_rcu() further complicates matters: does it mean that
> shadow pages are also protected by rcu?
Yes, it stops shadow page to be freed when we do write-protection on
it.
next prev parent reply other threads:[~2013-08-29 9:31 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-30 13:01 [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-07-30 13:01 ` [PATCH 01/12] KVM: MMU: remove unused parameter Xiao Guangrong
2013-08-29 7:22 ` Gleb Natapov
2013-07-30 13:02 ` [PATCH 02/12] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-07-30 13:02 ` [PATCH 03/12] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-08-02 14:55 ` Marcelo Tosatti
2013-08-02 15:42 ` Xiao Guangrong
2013-08-02 20:27 ` Marcelo Tosatti
2013-08-02 22:56 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable Xiao Guangrong
2013-07-30 13:26 ` Paolo Bonzini
2013-07-31 7:25 ` Xiao Guangrong
2013-08-07 1:48 ` Marcelo Tosatti
2013-08-07 4:06 ` Xiao Guangrong
2013-08-08 15:06 ` Marcelo Tosatti
2013-08-08 16:26 ` Xiao Guangrong
2013-11-20 0:29 ` Marcelo Tosatti
2013-11-20 0:35 ` Marcelo Tosatti
2013-11-20 14:20 ` Xiao Guangrong
2013-11-20 19:47 ` Marcelo Tosatti
2013-11-21 4:26 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 05/12] KVM: MMU: add spte into rmap before logging dirty page Xiao Guangrong
2013-07-30 13:27 ` Paolo Bonzini
2013-07-31 7:33 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 06/12] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-08-28 7:23 ` Gleb Natapov
2013-08-28 7:50 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 07/12] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-08-28 8:12 ` Gleb Natapov
2013-08-28 8:37 ` Xiao Guangrong
2013-08-28 8:58 ` Gleb Natapov
2013-08-28 9:19 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 08/12] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-08-28 8:40 ` Gleb Natapov
2013-08-28 8:54 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-08-28 9:20 ` Gleb Natapov
2013-08-28 9:33 ` Xiao Guangrong
2013-08-28 9:46 ` Gleb Natapov
2013-08-28 10:13 ` Xiao Guangrong
2013-08-28 10:49 ` Gleb Natapov
2013-08-28 12:15 ` Xiao Guangrong
2013-08-28 13:36 ` Gleb Natapov
2013-08-29 6:50 ` Xiao Guangrong
2013-08-29 9:08 ` Gleb Natapov
2013-08-29 9:31 ` Xiao Guangrong [this message]
2013-08-29 9:51 ` Gleb Natapov
2013-08-29 11:26 ` Xiao Guangrong
2013-08-30 11:38 ` Gleb Natapov
2013-09-02 7:02 ` Xiao Guangrong
2013-08-29 9:31 ` Gleb Natapov
2013-08-29 11:33 ` Xiao Guangrong
2013-08-29 12:02 ` Xiao Guangrong
2013-08-30 11:44 ` Gleb Natapov
2013-09-02 8:50 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-08-07 13:09 ` Takuya Yoshikawa
2013-08-07 13:19 ` Xiao Guangrong
2013-08-29 9:10 ` Gleb Natapov
2013-08-29 9:25 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 11/12] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-07-30 13:02 ` [PATCH 12/12] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-07-30 13:11 ` [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-08-03 5:09 ` Takuya Yoshikawa
2013-08-04 14:15 ` Xiao Guangrong
2013-08-29 7:16 ` Gleb Natapov
2013-08-06 13:16 ` Xiao Guangrong
2013-08-08 17:38 ` Paolo Bonzini
2013-08-09 4:51 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=521F14FE.3070900@linux.vnet.ibm.com \
--to=xiaoguangrong@linux.vnet.ibm.com \
--cc=avi.kivity@gmail.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).