From: Muchun Song <muchun.song@linux.dev>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Dave Chinner <david@fromorbit.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Qi Zheng <qi.zheng@linux.dev>, Kairui Song <kasong@tencent.com>,
Meta kernel team <kernel-team@meta.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, Chris Mason <clm@fb.com>,
stable@vger.kernel.org
Subject: Re: [PATCH v2] mm/list_lru: drain before clearing xarray entry on reparent
Date: Tue, 2 Jun 2026 10:37:25 +0800 [thread overview]
Message-ID: <8CD76642-E809-4E81-9F12-DB110A9F958D@linux.dev> (raw)
In-Reply-To: <20260601161501.1444829-1-shakeel.butt@linux.dev>
> On Jun 2, 2026, at 00:15, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> memcg_reparent_list_lrus() clears the dying memcg's xarray entry with
> xas_store(&xas, NULL) before reparenting its per-node lists into the
> parent. This opens a window where a concurrent list_lru_del() arriving
> for the dying memcg sees xa_load() == NULL, walks to the parent in
> lock_list_lru_of_memcg(), takes the parent's per-node lock, and calls
> list_del_init() on an item still physically linked on the dying
> memcg's list.
>
> If another in-flight thread holds the dying memcg's per-node lock at
> the same moment (another list_lru_del, or a list_lru_walk_one running
> an isolate callback), both threads modify ->next/->prev pointers on the
> same physical list under different locks. Adjacent items can corrupt
> each other's links.
>
> Fix it by reversing the order: reparent each per-node list and mark the
> child's list lru dead and then clear the xarray entry. Any concurrent
> list_lru op that finds the still-set xarray entry either takes the dying
> memcg's per-node lock (synchronizing with the drain) or sees LONG_MIN
> and walks to the parent, where the items now live.
>
> Fixes: fb56fdf8b9a2 ("mm/list_lru: split the lock to per-cgroup scope")
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reported-by: Chris Mason <clm@fb.com>
> Cc: stable@vger.kernel.org
> ---
> Changes since v1:
> - Use xa_erase_irq() instead of xa_erase() (Sashiko & Claude).
> - Added comment on CSS_DYING check in memcg_list_lru_alloc avoiding a new mlru
> allocation.
>
> mm/list_lru.c | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index dd29bcf8eb5f..d454bce9a78e 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -473,26 +473,29 @@ void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *paren
> mutex_lock(&list_lrus_mutex);
> list_for_each_entry(lru, &memcg_list_lrus, list) {
> struct list_lru_memcg *mlru;
> - XA_STATE(xas, &lru->xa, memcg->kmemcg_id);
>
> /*
> - * Lock the Xarray to ensure no on going list_lru_memcg
> - * allocation and further allocation will see css_is_dying().
> + * css_is_dying() check in memcg_list_lru_alloc() avoids
> + * allocating a new mlru since CSS_DYING is already set for this
> + * memcg a rcu grace period ago.
I see. xas_lock_irqsave() in memcg_list_lru_alloc() functions as an RCU read lock.
Acked-by: Muchun Song <muchun.song@linux.dev>
Thanks.
> */
> - xas_lock_irq(&xas);
> - mlru = xas_store(&xas, NULL);
> - xas_unlock_irq(&xas);
> + mlru = xa_load(&lru->xa, memcg->kmemcg_id);
> if (!mlru)
> continue;
>
> /*
> - * With Xarray value set to NULL, holding the lru lock below
> - * prevents list_lru_{add,del,isolate} from touching the lru,
> - * safe to reparent.
> + * Reparent each per-node list and mark the child dead
> + * (LONG_MIN) before clearing xarray entry otherwise a
> + * concurrent list_lru_del() may corrupt the list if it arrives
> + * after xarray clear but before reparenting as
> + * lock_list_lru_of_memcg will acquire parent's lock while the
> + * item is still on child's list.
> */
> for_each_node(i)
> memcg_reparent_list_lru_one(lru, i, &mlru->node[i], parent);
>
> + xa_erase_irq(&lru->xa, memcg->kmemcg_id);
> +
> /*
> * Here all list_lrus corresponding to the cgroup are guaranteed
> * to remain empty, we can safely free this lru, any further
> --
> 2.53.0-Meta
>
prev parent reply other threads:[~2026-06-02 2:38 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-01 16:15 [PATCH v2] mm/list_lru: drain before clearing xarray entry on reparent Shakeel Butt
2026-06-01 17:50 ` Kairui Song
2026-06-02 2:37 ` Muchun Song [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8CD76642-E809-4E81-9F12-DB110A9F958D@linux.dev \
--to=muchun.song@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=clm@fb.com \
--cc=david@fromorbit.com \
--cc=hannes@cmpxchg.org \
--cc=kasong@tencent.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=qi.zheng@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.