From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: Qi Zheng <qi.zheng@linux.dev>
Cc: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, david@kernel.org,
lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com,
yosry.ahmed@linux.dev, imran.f.khan@oracle.com,
kamalesh.babulal@oracle.com, axelrasmussen@google.com,
yuanchu@google.com, weixugc@google.com,
chenridong@huaweicloud.com, mkoutny@suse.com,
akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com,
apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com,
usamaarif642@gmail.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
Qi Zheng <zhengqi.arch@bytedance.com>
Subject: Re: [PATCH v6 26/33] mm: vmscan: prepare for reparenting MGLRU folios
Date: Mon, 23 Mar 2026 22:29:16 +0900
Message-ID: <acFALMLIvjP4i76U@hyeyoo>
In-Reply-To: <e75050354cdbc42221a04f7cf133292b61105548.1772711148.git.zhengqi.arch@bytedance.com>
On Thu, Mar 05, 2026 at 07:52:44PM +0800, Qi Zheng wrote:
> From: Qi Zheng <zhengqi.arch@bytedance.com>
>
> Similar to traditional LRU folios, in order to solve the dying memcg
> problem, we also need to reparent MGLRU folios to the parent memcg when
> the memcg goes offline.
>
> However, there are the following challenges:
>
> 1. Each lruvec has between MIN_NR_GENS and MAX_NR_GENS generations, the
> number of generations of the parent and child memcg may be different,
> so we cannot simply transfer MGLRU folios in the child memcg to the
> parent memcg as we did for traditional LRU folios.
> 2. The generation information is stored in folio->flags, but we cannot
> traverse these folios while holding the lru lock, otherwise it may
> cause softlockup.
> 3. In walk_update_folio(), the gen of folio and corresponding lru size
> may be updated, but the folio is not immediately moved to the
> corresponding lru list. Therefore, there may be folios of different
> generations on an LRU list.
> 4. In lru_gen_del_folio(), the generation to which the folio belongs is
> found based on the generation information in folio->flags, and the
> corresponding LRU size will be updated. Therefore, we need to update
> the lru size correctly during reparenting, otherwise the lru size may
> be updated incorrectly in lru_gen_del_folio().
>
> Finally, this patch chose a compromise method, which is to splice the lru
> list in the child memcg to the lru list of the same generation in the
> parent memcg during reparenting. And in order to ensure that the parent
> memcg has the same generation, we need to increase the generations in the
> parent memcg to the MAX_NR_GENS before reparenting.
>
> Of course, the same generation has different meanings in the parent and
> child memcg, this will cause confusion in the hot and cold information of
> folios. But other than that, this method is simple enough, the lru size
> is correct, and there is no need to consider some concurrency issues (such
> as lru_gen_del_folio()).
>
> To prepare for the above work, this commit implements the specific
> functions, which will be used during reparenting.
>
> Suggested-by: Harry Yoo <harry.yoo@oracle.com>
> Suggested-by: Imran Khan <imran.f.khan@oracle.com>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> Acked-by: Harry Yoo <harry.yoo@oracle.com>
> ---
> +/*
> + * Compared to traditional LRU, MGLRU faces the following challenges:
> + *
> + * 1. Each lruvec has between MIN_NR_GENS and MAX_NR_GENS generations, the
> + * number of generations of the parent and child memcg may be different,
> + * so we cannot simply transfer MGLRU folios in the child memcg to the
> + * parent memcg as we did for traditional LRU folios.
> + * 2. The generation information is stored in folio->flags, but we cannot
> + * traverse these folios while holding the lru lock, otherwise it may
> + * cause softlockup.
> + * 3. In walk_update_folio(), the gen of folio and corresponding lru size
> + * may be updated, but the folio is not immediately moved to the
> + * corresponding lru list. Therefore, there may be folios of different
> + * generations on an LRU list.
> + * 4. In lru_gen_del_folio(), the generation to which the folio belongs is
> + * found based on the generation information in folio->flags, and the
> + * corresponding LRU size will be updated. Therefore, we need to update
> + * the lru size correctly during reparenting, otherwise the lru size may
> + * be updated incorrectly in lru_gen_del_folio().
> + *
> + * Finally, we choose a compromise method, which is to splice the lru list in
> + * the child memcg to the lru list of the same generation in the parent memcg
> + * during reparenting.
> + *
> + * The same generation has different meanings in the parent and child memcg,
> + * so this compromise method will cause the LRU inversion problem. But as the
> + * system runs, this problem will be fixed automatically.
> + */
> +static void __lru_gen_reparent_memcg(struct lruvec *child_lruvec, struct lruvec *parent_lruvec,
> + int zone, int type)
> +{
> + struct lru_gen_folio *child_lrugen, *parent_lrugen;
> + enum lru_list lru = type * LRU_INACTIVE_FILE;
> + int i;
> +
> + child_lrugen = &child_lruvec->lrugen;
> + parent_lrugen = &parent_lruvec->lrugen;
> +
> + for (i = 0; i < get_nr_gens(child_lruvec, type); i++) {
> + int gen = lru_gen_from_seq(child_lrugen->max_seq - i);
> + long nr_pages = child_lrugen->nr_pages[gen][type][zone];
> + int child_lru_active = lru_gen_is_active(child_lruvec, gen) ? LRU_ACTIVE : 0;
> + int parent_lru_active = lru_gen_is_active(parent_lruvec, gen) ? LRU_ACTIVE : 0;
Not a correctness thing, but...
> + /* Assuming that child pages are colder than parent pages */
> + list_splice_init(&child_lrugen->folios[gen][type][zone],
> + &parent_lrugen->folios[gen][type][zone]);
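I think the other end (the tail) is where cold pages go in MGLRU, just like
in the traditional LRU, since lru_to_folio(head) returns the tail folio?
list_splice_init() puts the child's folios at the head of the parent's list,
so if the child's folios are meant to be treated as colder,
list_splice_tail_init() would match the comment.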
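To make the head-versus-tail point concrete, here is a minimal userspace
sketch. It is not kernel code: struct fake_folio, owner_at_tail() and the
two splice helpers are hypothetical stand-ins that only mimic how
list_splice_init(), list_splice_tail_init() and lru_to_folio() treat the
two ends of a list.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (assumption: same layout idea). */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

/* Append 'n' just before 'head', i.e. at the tail of the list. */
static void list_add_tail_(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Link the whole non-empty chain of 'list' between 'prev' and 'next'. */
static void splice_between(struct list_head *list,
			   struct list_head *prev, struct list_head *next)
{
	struct list_head *first = list->next, *last = list->prev;

	first->prev = prev;
	prev->next = first;
	last->next = next;
	next->prev = last;
}

/* Mimics list_splice_init(): 'list' lands right after 'head' (the head end). */
void splice_head_init(struct list_head *list, struct list_head *head)
{
	if (!list_is_empty(list)) {
		splice_between(list, head, head->next);
		init_list_head(list);
	}
}

/* Mimics list_splice_tail_init(): 'list' lands right before 'head' (the tail end). */
void splice_tail_init(struct list_head *list, struct list_head *head)
{
	if (!list_is_empty(list)) {
		splice_between(list, head->prev, head);
		init_list_head(list);
	}
}

/* Hypothetical folio stand-in; 'owner' records which memcg it came from. */
struct fake_folio { struct list_head lru; char owner; };

/* Mimics lru_to_folio(): the entry at head->prev, i.e. the tail of the list. */
static struct fake_folio *tail_folio(struct list_head *head)
{
	return (struct fake_folio *)((char *)head->prev -
				     offsetof(struct fake_folio, lru));
}

/* Build a parent list and a child list, splice, and report who sits at the tail. */
char owner_at_tail(void (*splice)(struct list_head *, struct list_head *))
{
	struct list_head parent, child;
	struct fake_folio p[2] = { { .owner = 'P' }, { .owner = 'P' } };
	struct fake_folio c[2] = { { .owner = 'C' }, { .owner = 'C' } };
	int i;

	init_list_head(&parent);
	init_list_head(&child);
	for (i = 0; i < 2; i++) {
		list_add_tail_(&p[i].lru, &parent);
		list_add_tail_(&c[i].lru, &child);
	}
	splice(&child, &parent);
	return tail_folio(&parent)->owner;
}
```

With the head-style splice, the entry seen from the tail is still one of the
parent's ('P'); with the tail-style splice it is one of the child's ('C').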
> + WRITE_ONCE(child_lrugen->nr_pages[gen][type][zone], 0);
> + WRITE_ONCE(parent_lrugen->nr_pages[gen][type][zone],
> + parent_lrugen->nr_pages[gen][type][zone] + nr_pages);
> +
> + if (lru_gen_is_active(child_lruvec, gen) != lru_gen_is_active(parent_lruvec, gen)) {
> + __update_lru_size(child_lruvec, lru + child_lru_active, zone, -nr_pages);
> + __update_lru_size(parent_lruvec, lru + parent_lru_active, zone, nr_pages);
> + }
> + }
> +}
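As a side note on the comment's remark that the same generation has
different meanings in the parent and child memcg, the sketch below shows
the arithmetic. It assumes the mapping used by lru_gen_from_seq(), i.e.
seq % MAX_NR_GENS; age_of_gen() is a hypothetical helper and the sequence
numbers in the note after it are made up for illustration.

```c
#define MAX_NR_GENS 4U	/* assumed to match the MGLRU definition */

/* Same arithmetic as lru_gen_from_seq(): map a sequence number to a slot. */
static int gen_from_seq(unsigned long seq)
{
	return (int)(seq % MAX_NR_GENS);
}

/*
 * Hypothetical helper: how old is slot 'gen' in a lruvec whose live
 * generations span [min_seq, max_seq]?  0 means youngest.  Returns -1
 * if no live generation maps to that slot.  Assumes min_seq <= max_seq.
 */
int age_of_gen(unsigned long min_seq, unsigned long max_seq, int gen)
{
	unsigned long seq;

	for (seq = max_seq; ; seq--) {
		if (gen_from_seq(seq) == gen)
			return (int)(max_seq - seq);
		if (seq == min_seq)
			break;
	}
	return -1;
}
```

For example, with a child spanning seq 4..7 and a parent spanning seq 7..10,
slot 3 is the youngest generation in the child (age 0) but the oldest one in
the parent (age 3), so splicing by slot index can place recently used child
folios among the parent's coldest ones.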
--
Cheers,
Harry / Hyeonggon