From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker
Date: Mon, 02 Sep 2013 16:50:18 +0800 [thread overview]
Message-ID: <5224514A.4040307@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130830114426.GB1844@redhat.com>
On 08/30/2013 07:44 PM, Gleb Natapov wrote:
> On Thu, Aug 29, 2013 at 08:02:30PM +0800, Xiao Guangrong wrote:
>> On 08/29/2013 07:33 PM, Xiao Guangrong wrote:
>>> On 08/29/2013 05:31 PM, Gleb Natapov wrote:
>>>> On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote:
>>>>> After more thinking, I still think rcu_assign_pointer() is unneeded when a entry
>>>>> is removed. The remove-API does not care the order between unlink the entry and
>>>>> the changes to its fields. It is the caller's responsibility:
>>>>> - in the case of rcuhlist, the caller uses call_rcu()/synchronize_rcu(), etc to
>>>>> enforce all lookups exit and the later change on that entry is invisible to the
>>>>> lookups.
>>>>>
>>>>> - In the case of rculist_nulls, it seems refcounter is used to guarantee the order
>>>>> (see the example from Documentation/RCU/rculist_nulls.txt).
>>>>>
>>>>> - In our case, we allow the lookup to see the deleted desc even if it is in slab cache
>>>>> or its is initialized or it is re-added.
>>>>>
>>>> BTW is it a good idea? We can access deleted desc while it is allocated
>>>> and initialized to zero by kmem_cache_zalloc(), are we sure we cannot
>>>> see partially initialized desc->sptes[] entry? On related note what about
>>>> 32 bit systems, they do not have atomic access to desc->sptes[].
>>
>> Ah... wait. desc is a array of pointers:
>>
>> struct pte_list_desc {
>> u64 *sptes[PTE_LIST_EXT];
>> struct pte_list_desc *more;
>> };
>>
> Yep, I misread it to be u64 bits and wondered why do we use u64 to store
> pointers.
>
>> assigning a pointer is aways aotomic, but we should carefully initialize it
>> as you said. I will introduce a constructor for desc slab cache which initialize
>> the struct like this:
>>
>> for (i = 0; i < PTE_LIST_EXT; i++)
>> desc->sptes[i] = NULL;
>>
>> It is okay.
>>
> I hope slab does not write anything into allocated memory internally if
> constructor is present.
If only constructor is present (no SLAB_DESTROY_BY_RCU), It'll temporarily
write the "poison" value into the memory then call the constructor to initialize
it again, e.g, in slab.c:
static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
gfp_t flags, void *objp, unsigned long caller)
{
if (cachep->flags & SLAB_POISON) {
......
poison_obj(cachep, objp, POISON_INUSE);
}
......
if (cachep->ctor && cachep->flags & SLAB_POISON)
cachep->ctor(objp);
}
But SLAB_DESTROY_BY_RCU can force the allocer to don't touch
the memory, this is true in our case.
> BTW do you know what happens when SLAB debug is enabled
> and SLAB_DESTROY_BY_RCU is set?
When SLAB debug is enabled, these 3 flags may be set:
#define SLAB_DEBUG_FREE 0x00000100UL /* DEBUG: Perform (expensive) checks on free */
#define SLAB_RED_ZONE 0x00000400UL /* DEBUG: Red zone objs in a cache */
#define SLAB_POISON 0x00000800UL /* DEBUG: Poison objects */
Only SLAB_POISON can write something into the memory, and ...
> Does poison value is written into freed
> object (freed to slab, but not yet to page allocator)?
SLAB_POISON is cleared if SLAB_DESTROY_BY_RCU is used.
- In slab, There is the code in __kmem_cache_create():
if (flags & SLAB_DESTROY_BY_RCU)
BUG_ON(flags & SLAB_POISON);
- In slub, the code is in calculate_sizes():
/*
* Determine if we can poison the object itself. If the user of
* the slab may touch the object after free or before allocation
* then we should never poison the object itself.
*/
if ((flags & SLAB_POISON) && !(flags & SLAB_DESTROY_BY_RCU) &&
!s->ctor)
s->flags |= __OBJECT_POISON;
else
s->flags &= ~__OBJECT_POISON;
- in slob, it seems it does not support SLAB DEBUG.
next prev parent reply other threads:[~2013-09-02 8:50 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-30 13:01 [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-07-30 13:01 ` [PATCH 01/12] KVM: MMU: remove unused parameter Xiao Guangrong
2013-08-29 7:22 ` Gleb Natapov
2013-07-30 13:02 ` [PATCH 02/12] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-07-30 13:02 ` [PATCH 03/12] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-08-02 14:55 ` Marcelo Tosatti
2013-08-02 15:42 ` Xiao Guangrong
2013-08-02 20:27 ` Marcelo Tosatti
2013-08-02 22:56 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable Xiao Guangrong
2013-07-30 13:26 ` Paolo Bonzini
2013-07-31 7:25 ` Xiao Guangrong
2013-08-07 1:48 ` Marcelo Tosatti
2013-08-07 4:06 ` Xiao Guangrong
2013-08-08 15:06 ` Marcelo Tosatti
2013-08-08 16:26 ` Xiao Guangrong
2013-11-20 0:29 ` Marcelo Tosatti
2013-11-20 0:35 ` Marcelo Tosatti
2013-11-20 14:20 ` Xiao Guangrong
2013-11-20 19:47 ` Marcelo Tosatti
2013-11-21 4:26 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 05/12] KVM: MMU: add spte into rmap before logging dirty page Xiao Guangrong
2013-07-30 13:27 ` Paolo Bonzini
2013-07-31 7:33 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 06/12] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-08-28 7:23 ` Gleb Natapov
2013-08-28 7:50 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 07/12] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-08-28 8:12 ` Gleb Natapov
2013-08-28 8:37 ` Xiao Guangrong
2013-08-28 8:58 ` Gleb Natapov
2013-08-28 9:19 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 08/12] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-08-28 8:40 ` Gleb Natapov
2013-08-28 8:54 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-08-28 9:20 ` Gleb Natapov
2013-08-28 9:33 ` Xiao Guangrong
2013-08-28 9:46 ` Gleb Natapov
2013-08-28 10:13 ` Xiao Guangrong
2013-08-28 10:49 ` Gleb Natapov
2013-08-28 12:15 ` Xiao Guangrong
2013-08-28 13:36 ` Gleb Natapov
2013-08-29 6:50 ` Xiao Guangrong
2013-08-29 9:08 ` Gleb Natapov
2013-08-29 9:31 ` Xiao Guangrong
2013-08-29 9:51 ` Gleb Natapov
2013-08-29 11:26 ` Xiao Guangrong
2013-08-30 11:38 ` Gleb Natapov
2013-09-02 7:02 ` Xiao Guangrong
2013-08-29 9:31 ` Gleb Natapov
2013-08-29 11:33 ` Xiao Guangrong
2013-08-29 12:02 ` Xiao Guangrong
2013-08-30 11:44 ` Gleb Natapov
2013-09-02 8:50 ` Xiao Guangrong [this message]
2013-07-30 13:02 ` [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-08-07 13:09 ` Takuya Yoshikawa
2013-08-07 13:19 ` Xiao Guangrong
2013-08-29 9:10 ` Gleb Natapov
2013-08-29 9:25 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 11/12] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-07-30 13:02 ` [PATCH 12/12] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-07-30 13:11 ` [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-08-03 5:09 ` Takuya Yoshikawa
2013-08-04 14:15 ` Xiao Guangrong
2013-08-29 7:16 ` Gleb Natapov
2013-08-06 13:16 ` Xiao Guangrong
2013-08-08 17:38 ` Paolo Bonzini
2013-08-09 4:51 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5224514A.4040307@linux.vnet.ibm.com \
--to=xiaoguangrong@linux.vnet.ibm.com \
--cc=avi.kivity@gmail.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.