From: Chengming Zhou <chengming.zhou@linux.dev>
To: "Christoph Lameter (Ampere)" <cl@linux.com>,
Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>,
Jianfeng Wang <jianfeng.w.wang@oracle.com>,
penberg@kernel.org, iamjoonsoo.kim@lge.com,
akpm@linux-foundation.org, roman.gushchin@linux.dev,
42.hyeyoo@gmail.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Chengming Zhou <zhouchengming@bytedance.com>
Subject: Re: [PATCH] slub: avoid scanning all partial slabs in get_slabinfo()
Date: Tue, 27 Feb 2024 17:30:31 +0800
Message-ID: <eec445a6-7024-40b6-9d4e-7fc2bc71cce7@linux.dev>
In-Reply-To: <036f2bb4-b086-2988-e46d-86d399405687@linux.com>
On 2024/2/27 01:38, Christoph Lameter (Ampere) wrote:
> On Fri, 23 Feb 2024, Vlastimil Babka wrote:
>
>> On 2/23/24 10:37, Chengming Zhou wrote:
>>> On 2024/2/23 17:24, Vlastimil Babka wrote:
>>>>
>>>>>>
>>>>>
>>>>> I think this is a better direction! We can use an RCU list if the slab can be freed by RCU.
>>>>
>>>> Often we remove a slab from the partial list for purposes other than freeing -
>>>> i.e. to become a cpu (partial) slab, and that can't be handled by an RCU
>>>> callback, nor can we wait a grace period in such situations.
>>>
>>> IMHO, only free_slab() needs to use call_rcu() to delay freeing the slab;
>>> other paths, like taking partial slabs from the node partial list, don't need
>>> to wait for an RCU grace period.
>>>
>>> All we want is to safely iterate over the node partial list without the lock, right?
>>
>> Yes, and for that there's the "list_head slab_list", which is in union with
>> "struct slab *next" and "int slabs" for the cpu partial list. So if we
>> remove a slab from the partial list and rewrite the list_head for the
>> partial list purposes, it will break the lockless iterators, right? We would
>> have to wait a grace period between unlinking the slab from partial list (so
>> no new iterators can reach it), and reusing the list_head (so we are sure
>> the existing iterators stopped looking at our slab).
>
> We could mark the state change (list ownership) in the slab metadata and then abort the scan if the state mismatches the list.
That seems feasible, maybe something like the code below?
But this approach would need all kmem_caches to have SLAB_TYPESAFE_BY_RCU, right?
I'm not sure that is acceptable, since it may cause random delays in freeing memory.
```
retry:
	rcu_read_lock();

	h = rcu_dereference(list_next_rcu(&n->partial));

	while (h != &n->partial) {
		slab = list_entry(h, struct slab, slab_list);

		/* Recheck slab with node list lock. */
		spin_lock_irqsave(&n->list_lock, flags);
		if (!slab_test_node_partial(slab)) {
			spin_unlock_irqrestore(&n->list_lock, flags);
			rcu_read_unlock();
			goto retry;
		}

		/* Count this slab's inuse. */

		/* Get the next pointer with node list lock. */
		h = rcu_dereference(list_next_rcu(h));
		spin_unlock_irqrestore(&n->list_lock, flags);
	}

	rcu_read_unlock();
```
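To make the control flow of the loop above easier to follow, here is a single-threaded userspace model. It is only a sketch: there is no RCU and no `n->list_lock`, the list is singly linked, and `struct slab_model`, `count_inuse()`, `steal_b_once()` and `run_demo()` are hypothetical names. A hook stands in for another CPU taking a slab off the partial list mid-scan. One thing the model makes explicit is that a restarted scan has to discard the partial sum from the aborted pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct slab_model {
	struct slab_model *next;	/* simplified singly linked partial list */
	bool on_node_partial;		/* models slab_test_node_partial() */
	int inuse;
};

/* Hypothetical hook standing in for a concurrent CPU; runs before each visit. */
typedef void (*race_hook)(struct slab_model *about_to_visit, int pass);

static int count_inuse(struct slab_model **head, race_hook hook, int *passes)
{
	struct slab_model *s;
	int pass = 0;
	int total;

retry:
	pass++;
	total = 0;	/* sums from an aborted pass are discarded */
	for (s = *head; s; s = s->next) {
		if (hook)
			hook(s, pass);
		if (!s->on_node_partial)
			goto retry;	/* state mismatch: slab left the list */
		total += s->inuse;
	}
	if (passes)
		*passes = pass;
	return total;
}

static struct slab_model a, b, c;
static struct slab_model *partial;

/* On the first pass, take 'b' off the list just as the scan reaches it. */
static void steal_b_once(struct slab_model *s, int pass)
{
	if (pass == 1 && s == &b) {
		a.next = &c;			/* unlink b; b.next is left intact */
		b.on_node_partial = false;
	}
}

/* Rebuilds the list a -> b -> c and runs one racing scan. */
static int run_demo(int *passes)
{
	c = (struct slab_model){ NULL, true, 5 };
	b = (struct slab_model){ &c, true, 3 };
	a = (struct slab_model){ &b, true, 2 };
	partial = &a;
	return count_inuse(&partial, steal_b_once, passes);
}

int main(void)
{
	int passes = 0;
	int total = run_demo(&passes);

	printf("total inuse=%d after %d passes\n", total, passes);
	return 0;
}
```

In this run the first pass aborts at `b`, and the second pass counts only `a` and `c`. The model says nothing about how often such restarts would happen on a busy node.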
>
>> Maybe there are more advanced RCU tricks, but this is my basic understanding
>> of how this works.
>
> This could get tricky, but we already do similar things with RCU slab objects/metadata, where we allow the reuse of the object before the RCU period expires, and there is an understanding that the users of those objects need to verify that the type of object matches expectations when looking for objects.
>