From: Chengming Zhou <chengming.zhou@linux.dev>
To: Johannes Weiner <hannes@cmpxchg.org>, Nhat Pham <nphamcs@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
akpm@linux-foundation.org, cerasuolodomenico@gmail.com,
yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
vitaly.wool@konsulko.com, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeelb@google.com,
muchun.song@linux.dev, chrisl@kernel.org, linux-mm@kvack.org,
kernel-team@meta.com, linux-kernel@vger.kernel.org,
cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, shuah@kernel.org
Subject: Re: [PATCH v8 1/6] list_lru: allows explicit memcg and NUMA node selection
Date: Mon, 4 Dec 2023 16:30:29 +0800 [thread overview]
Message-ID: <e3e319f5-9bcd-4c35-92e6-6fdb33eaa080@linux.dev> (raw)
In-Reply-To: <20231130203522.GC543908@cmpxchg.org>
On 2023/12/1 04:35, Johannes Weiner wrote:
> On Thu, Nov 30, 2023 at 12:07:41PM -0800, Nhat Pham wrote:
>> On Thu, Nov 30, 2023 at 11:57 AM Matthew Wilcox <willy@infradead.org> wrote:
>>>
>>> On Thu, Nov 30, 2023 at 11:40:18AM -0800, Nhat Pham wrote:
>>>> This patch changes list_lru interface so that the caller must explicitly
>>>> specify numa node and memcg when adding and removing objects. The old
>>>> list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and
>>>> list_lru_del_obj(), respectively.
>>>
>>> Wouldn't it be better to add list_lru_add_memcg() and
>>> list_lru_del_memcg() and have:
>>>
>>> +bool list_lru_del(struct list_lru *lru, struct list_head *item)
>>> +{
>>> + int nid = page_to_nid(virt_to_page(item));
>>> + struct mem_cgroup *memcg = list_lru_memcg_aware(lru) ?
>>> + mem_cgroup_from_slab_obj(item) : NULL;
>>> +
>>> + return list_lru_del_memcg(lru, item, nid, memcg);
>>> +}
>>>
>>> Seems like _most_ callers will want the original versions and only
>>> a few will want the explicit memcg/nid versions. No?
>>>
>>
>> I actually did something along that line in earlier iterations of this
>> patch series (albeit with poorer naming - __list_lru_add() instead of
>> list_lru_add_memcg()). The consensus after some back and forth was
>> that the original list_lru_add() was not a very good design (the
>> better one was this new version that allows for explicit numa/memcg
>> selection). So I agreed to fix it everywhere as a prep patch.
>>
>> I don't have strong opinions here to be completely honest, but I do
>> think this new API makes more sense (at the cost of quite a bit of
>> elbow grease to fix every callsites and extra reviewing).
>
> Maybe I can shed some light since I was pushing for doing it this way.
>
> The quiet assumption that 'struct list_head *item' is (embedded in) a
> slab object that is also charged to a cgroup is a bit much, given that
> nothing in the name or documentation of the function points to that.
>
> It bit us in the THP shrinker where that list head is embedded in a
> tailpage (virt_to_page(page) is fun to debug). And it caused some
> confusion in this case as well, where the zswap entry is a slab object
> but not charged (the entry descriptor is not attractive for cgroup
> accounting, only the backing memory it points to.)
Hi,
I have a question, maybe I missed something since I haven't read all
the earlier versions.
IIUC, the problem here is that "zswap_entry" has different memcg and node
than the "page", so I wonder if we can just charge "zswap_entry" to the
same memcg of the "page".
Like we can do these when allocating the "zswap_entry":
old_memcg = set_active_memcg(memcg)
kmem_cache_alloc_lru(zswap_entry_cache, lru, gfp)
set_active_memcg(old_memcg)
The good points are:
1. "zswap_entry" is charged to the memcg of "page", which is more sensible?
2. We can reuse the kmem_cache_alloc_lru() interface, which makes code simpler
since we don't need to manage list_lru_memcg by ourselves.
3. Maybe the new list_lru_add() and list_lru_del() are not needed anymore?
Since the "zswap_entry" is of the same memcg and node with the "page".
But don't know if THP shrinker still need it.
Thanks!
>
> Yes, for most users - at least right now - the current assumption is
> accurate. The thinking was just that if we do have to differentiate
> callers now anyway, we might as well make the interface a bit more
> self-documenting and harder to misuse going forward, even if it's a
> bit more churn now.
>
>
next prev parent reply other threads:[~2023-12-04 8:36 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-30 19:40 [PATCH v8 0/6] workload-specific and memory pressure-driven zswap writeback Nhat Pham
2023-11-30 19:40 ` [PATCH v8 1/6] list_lru: allows explicit memcg and NUMA node selection Nhat Pham
2023-11-30 19:57 ` Matthew Wilcox
2023-11-30 20:07 ` Nhat Pham
2023-11-30 20:35 ` Johannes Weiner
2023-12-04 8:30 ` Chengming Zhou [this message]
2023-12-04 17:48 ` Nhat Pham
2023-12-05 2:28 ` Chengming Zhou
2023-12-05 0:30 ` Chris Li
2023-12-05 17:17 ` Johannes Weiner
2023-11-30 19:40 ` [PATCH v8 2/6] memcontrol: implement mem_cgroup_tryget_online() Nhat Pham
2023-12-05 0:35 ` Chris Li
2023-12-05 1:39 ` Nhat Pham
2023-12-06 0:16 ` Chris Li
2023-12-06 1:30 ` Nhat Pham
2023-12-05 18:02 ` Yosry Ahmed
2023-12-05 19:55 ` Nhat Pham
2023-11-30 19:40 ` [PATCH v8 3/6] zswap: make shrinking memcg-aware Nhat Pham
2023-12-05 18:20 ` Yosry Ahmed
2023-12-05 18:49 ` Nhat Pham
2023-12-05 18:59 ` Yosry Ahmed
2023-12-05 19:09 ` Nhat Pham
2023-12-05 19:54 ` [PATCH v8 3/6] zswap: make shrinking memcg-aware (fix) Nhat Pham
2023-12-06 0:10 ` [PATCH v8 3/6] zswap: make shrinking memcg-aware Chris Li
2023-12-06 1:53 ` Nhat Pham
2023-12-06 3:03 ` Nhat Pham
2023-12-06 3:06 ` [PATCH v8 3/6] zswap: make shrinking memcg-aware (fix 2) Nhat Pham
2023-11-30 19:40 ` [PATCH v8 4/6] mm: memcg: add per-memcg zswap writeback stat Nhat Pham
2023-12-05 18:21 ` Yosry Ahmed
2023-12-05 18:56 ` Nhat Pham
2023-12-05 19:33 ` [PATCH v8 4/6] mm: memcg: add per-memcg zswap writeback stat (fix) Nhat Pham
2023-12-05 20:05 ` Yosry Ahmed
2023-12-08 0:25 ` Chris Li
2023-11-30 19:40 ` [PATCH v8 5/6] selftests: cgroup: update per-memcg zswap writeback selftest Nhat Pham
2023-12-08 0:43 ` Chris Li
2023-11-30 19:40 ` [PATCH v8 6/6] zswap: shrinks zswap pool based on memory pressure Nhat Pham
2023-12-06 5:51 ` Chengming Zhou
2023-12-06 5:59 ` Yosry Ahmed
2023-12-06 6:43 ` Chengming Zhou
2023-12-06 7:36 ` Yosry Ahmed
2023-12-06 7:39 ` Chengming Zhou
2023-12-06 16:56 ` Nhat Pham
2023-12-06 19:47 ` Nhat Pham
2023-12-06 21:13 ` Yosry Ahmed
2023-12-07 2:32 ` Chengming Zhou
2023-12-06 19:44 ` [PATCH v8 6/6] zswap: shrinks zswap pool based on memory pressure (fix) Nhat Pham
2023-11-30 21:19 ` [PATCH v8 0/6] workload-specific and memory pressure-driven zswap writeback Andrew Morton
2023-12-06 4:10 ` Bagas Sanjaya
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e3e319f5-9bcd-4c35-92e6-6fdb33eaa080@linux.dev \
--to=chengming.zhou@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=cerasuolodomenico@gmail.com \
--cc=cgroups@vger.kernel.org \
--cc=chrisl@kernel.org \
--cc=ddstreet@ieee.org \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=nphamcs@gmail.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
--cc=shuah@kernel.org \
--cc=sjenning@redhat.com \
--cc=vitaly.wool@konsulko.com \
--cc=willy@infradead.org \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.