From: Qi Zheng <qi.zheng@linux.dev>
To: Chen Ridong <chenridong@huaweicloud.com>,
hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, david@redhat.com,
lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com,
imran.f.khan@oracle.com, kamalesh.babulal@oracle.com,
axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
cgroups@vger.kernel.org, Muchun Song <songmuchun@bytedance.com>,
Qi Zheng <zhengqi.arch@bytedance.com>
Subject: Re: [PATCH v1 05/26] mm: memcontrol: allocate object cgroup for non-kmem case
Date: Fri, 21 Nov 2025 16:17:52 +0800
Message-ID: <b2a64a62-9cf2-4d81-bcd0-1af7e64a34db@linux.dev>
In-Reply-To: <f31661d8-21e4-4626-86bb-8a8daa5d47ef@huaweicloud.com>

On 11/21/25 11:58 AM, Chen Ridong wrote:
>
>
> On 2025/10/28 21:58, Qi Zheng wrote:
>> From: Muchun Song <songmuchun@bytedance.com>
>>
>> Pagecache pages are charged at allocation time and hold a reference
>> to the original memory cgroup until reclaimed. Depending on memory
>> pressure, page sharing patterns between different cgroups and cgroup
>> creation/destruction rates, many dying memory cgroups can be pinned
>> by pagecache pages, reducing page reclaim efficiency and wasting
>> memory. Converting LRU folios and most other raw memory cgroup pins
>> to the object cgroup direction can fix this long-living problem.
>>
>> As a result, the objcg infrastructure is no longer solely applicable
>> to the kmem case. In this patch, we extend the scope of the objcg
>> infrastructure beyond the kmem case, enabling LRU folios to reuse
>> it for folio charging purposes.
>>
>> It should be noted that LRU folios are not accounted for at the root
>> level, yet the folio->memcg_data points to the root_mem_cgroup. Hence,
>> the folio->memcg_data of LRU folios always points to a valid pointer.
>> However, the root_mem_cgroup does not possess an object cgroup.
>> Therefore, we also allocate an object cgroup for the root_mem_cgroup.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>> mm/memcontrol.c | 51 +++++++++++++++++++++++--------------------------
>> 1 file changed, 24 insertions(+), 27 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index d5257465c9d75..2afd7f99ca101 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -204,10 +204,10 @@ static struct obj_cgroup *obj_cgroup_alloc(void)
>> return objcg;
>> }
>>
>> -static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
>> - struct mem_cgroup *parent)
>> +static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
>> {
>> struct obj_cgroup *objcg, *iter;
>> + struct mem_cgroup *parent = parent_mem_cgroup(memcg);
>>
>> objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
>>
>> @@ -3302,30 +3302,17 @@ unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
>> return val;
>> }
>>
>> -static int memcg_online_kmem(struct mem_cgroup *memcg)
>> +static void memcg_online_kmem(struct mem_cgroup *memcg)
>> {
>> - struct obj_cgroup *objcg;
>> -
>> if (mem_cgroup_kmem_disabled())
>> - return 0;
>> + return;
>>
>> if (unlikely(mem_cgroup_is_root(memcg)))
>> - return 0;
>> -
>> - objcg = obj_cgroup_alloc();
>> - if (!objcg)
>> - return -ENOMEM;
>> -
>> - objcg->memcg = memcg;
>> - rcu_assign_pointer(memcg->objcg, objcg);
>> - obj_cgroup_get(objcg);
>> - memcg->orig_objcg = objcg;
>> + return;
>>
>> static_branch_enable(&memcg_kmem_online_key);
>>
>> memcg->kmemcg_id = memcg->id.id;
>> -
>> - return 0;
>> }
>>
>> static void memcg_offline_kmem(struct mem_cgroup *memcg)
>> @@ -3340,12 +3327,6 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
>>
>> parent = parent_mem_cgroup(memcg);
>> memcg_reparent_list_lrus(memcg, parent);
>> -
>> - /*
>> - * Objcg's reparenting must be after list_lru's, make sure list_lru
>> - * helpers won't use parent's list_lru until child is drained.
>> - */
>> - memcg_reparent_objcgs(memcg, parent);
>> }
>>
>> #ifdef CONFIG_CGROUP_WRITEBACK
>> @@ -3862,9 +3843,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>> static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
>> {
>> struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>> + struct obj_cgroup *objcg;
>>
>> - if (memcg_online_kmem(memcg))
>> - goto remove_id;
>> + memcg_online_kmem(memcg);
>>
>> /*
>> * A memcg must be visible for expand_shrinker_info()
>> @@ -3874,6 +3855,15 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
>> if (alloc_shrinker_info(memcg))
>> goto offline_kmem;
>>
>> + objcg = obj_cgroup_alloc();
>> + if (!objcg)
>> + goto free_shrinker;
>> +
>> + objcg->memcg = memcg;
>> + rcu_assign_pointer(memcg->objcg, objcg);
>> + obj_cgroup_get(objcg);
>> + memcg->orig_objcg = objcg;
>> +
>
> Will it be better to add a helper function like obj_cgroup_init()?
This part is simple and is only called from this one place, so a
helper function is probably unnecessary.
That said, it doesn't matter much; I'm fine with either approach.
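For illustration only, a helper along the lines suggested above could
look roughly like the sketch below. This is a userspace mock, not kernel
code: the struct layouts are reduced to the fields the hunk touches, the
plain integer refcount stands in for the kernel's reference counting, the
plain pointer store stands in for rcu_assign_pointer(), and the name
obj_cgroup_init() is only the name proposed in the review, not an
existing function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct mem_cgroup;

struct obj_cgroup {
	int refcnt;			/* assumption: simple counter, not the kernel refcount */
	struct mem_cgroup *memcg;
};

struct mem_cgroup {
	struct obj_cgroup *objcg;	/* RCU-protected pointer in the kernel */
	struct obj_cgroup *orig_objcg;
};

/* Mock allocator: the new objcg starts with one reference. */
static struct obj_cgroup *obj_cgroup_alloc(void)
{
	struct obj_cgroup *objcg = calloc(1, sizeof(*objcg));

	if (objcg)
		objcg->refcnt = 1;
	return objcg;
}

/* Mock of taking an extra reference. */
static void obj_cgroup_get(struct obj_cgroup *objcg)
{
	objcg->refcnt++;
}

/*
 * Hypothetical helper wrapping the four statements from the hunk.
 * Returns 0 on success or -ENOMEM if the allocation fails.
 */
static int obj_cgroup_init(struct mem_cgroup *memcg)
{
	struct obj_cgroup *objcg;

	assert(memcg);

	objcg = obj_cgroup_alloc();
	if (!objcg)
		return -ENOMEM;

	objcg->memcg = memcg;
	memcg->objcg = objcg;		/* rcu_assign_pointer() in the kernel */
	obj_cgroup_get(objcg);		/* extra reference held via orig_objcg */
	memcg->orig_objcg = objcg;

	return 0;
}
```

With such a helper, the call site in mem_cgroup_css_online() would
presumably reduce to a single check that jumps to free_shrinker on
failure. Whether that is worth a separate function for one caller is
the judgment call discussed above.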
>
>> if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled())
>> queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
>> FLUSH_TIME);
>> @@ -3896,9 +3886,10 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
>> xa_store(&mem_cgroup_ids, memcg->id.id, memcg, GFP_KERNEL);
>>
>> return 0;
>> +free_shrinker:
>> + free_shrinker_info(memcg);
>> offline_kmem:
>> memcg_offline_kmem(memcg);
>> -remove_id:
>> mem_cgroup_id_remove(memcg);
>> return -ENOMEM;
>> }
>> @@ -3916,6 +3907,12 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>>
>> memcg_offline_kmem(memcg);
>> reparent_deferred_split_queue(memcg);
>> + /*
>> + * The reparenting of objcg must be after the reparenting of the
>> + * list_lru and deferred_split_queue above, which ensures that they will
>> + * not mistakenly get the parent list_lru and deferred_split_queue.
>> + */
>> + memcg_reparent_objcgs(memcg);
>> reparent_shrinker_deferred(memcg);
>> wb_memcg_offline(memcg);
>> lru_gen_offline_memcg(memcg);
>