From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,zhengqi.arch@bytedance.com,yuzhao@google.com,yuanchu@google.com,wjl.linux@gmail.com,weixugc@google.com,vbabka@suse.cz,surenb@google.com,shakeel.butt@linux.dev,rppt@kernel.org,mhocko@suse.com,lorenzo.stoakes@oracle.com,Liam.Howlett@oracle.com,laoar.shao@gmail.com,hannes@cmpxchg.org,david@kernel.org,axelrasmussen@google.com,21cnbao@gmail.com,lenohou@gmail.com,akpm@linux-foundation.org
Subject: + mm-mglru-fix-cgroup-oom-during-mglru-state-switching.patch added to mm-new branch
Date: Sat, 28 Feb 2026 10:58:32 -0800
Message-ID: <20260228185833.1737BC116D0@smtp.kernel.org>


The patch titled
     Subject: mm/mglru: fix cgroup OOM during MGLRU state switching
has been added to the -mm mm-new branch.  Its filename is
     mm-mglru-fix-cgroup-oom-during-mglru-state-switching.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-mglru-fix-cgroup-oom-during-mglru-state-switching.patch

This patch will later appear in the mm-new branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews.  Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.

The mm-new branch of mm.git is not included in linux-next.

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days.

------------------------------------------------------
From: Leno Hou <lenohou@gmail.com>
Subject: mm/mglru: fix cgroup OOM during MGLRU state switching
Date: Sun, 1 Mar 2026 00:10:08 +0800

When the Multi-Gen LRU (MGLRU) state is toggled dynamically, a race
condition exists between the state switching and the memory reclaim path. 
This can lead to unexpected cgroup OOM kills, even when plenty of
reclaimable memory is available.

*** Problem Description ***

The issue arises from a "reclaim vacuum" during the transition:

1. When disabling MGLRU, lru_gen_change_state() sets lrugen->enabled to
   false before the pages are drained from MGLRU lists back to
   traditional LRU lists.
2. Concurrent reclaimers in shrink_lruvec() see lrugen->enabled as false
   and skip the MGLRU path.
3. However, these pages might not have reached the traditional LRU lists
   yet, or the changes are not yet visible to all CPUs due to a lack of
   synchronization.
4. get_scan_count() subsequently finds traditional LRU lists empty,
   concludes there is no reclaimable memory, and triggers an OOM kill.

A similar race can occur during enablement, where the reclaimer sees the
new state but the MGLRU lists haven't been populated via fill_evictable()
yet.
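
The window described above can be illustrated with a small userspace
model.  This is only a sketch under stated assumptions: toy_lruvec and
the toy_* helpers are hypothetical names invented for illustration, page
counts stand in for the real folio lists, and no locking or concurrency
is modelled.  It shows that a reclaimer which trusts 'enabled' alone sees
no pages between the flag flip and the end of the drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an lruvec: page counts instead of real folio lists. */
struct toy_lruvec {
	bool enabled;         /* mirrors lrugen->enabled */
	size_t mglru_pages;   /* pages still on the MGLRU lists */
	size_t classic_pages; /* pages on the traditional LRU lists */
};

/* Pages a reclaimer can see when it picks a path from 'enabled' alone. */
static size_t toy_visible_pages(const struct toy_lruvec *v)
{
	return v->enabled ? v->mglru_pages : v->classic_pages;
}

/* Step 1 of disabling: the flag flips before any page has moved. */
static void toy_begin_disable(struct toy_lruvec *v)
{
	v->enabled = false;
}

/*
 * Move up to 'batch' pages from the MGLRU lists to the traditional
 * lists.  Returns true once the MGLRU lists are empty.
 */
static bool toy_drain_batch(struct toy_lruvec *v, size_t batch)
{
	size_t n = v->mglru_pages < batch ? v->mglru_pages : batch;

	assert(!v->enabled); /* draining only happens after the flag flip */
	v->mglru_pages -= n;
	v->classic_pages += n;
	return v->mglru_pages == 0;
}
```

Calling toy_begin_disable() on a model holding 1000 MGLRU pages makes
toy_visible_pages() drop to 0 until toy_drain_batch() has moved pages
across, which corresponds to the point where get_scan_count() would find
the traditional lists empty.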

*** Solution ***

Introduce a 'draining' state to bridge the gap during transitions:

- Use smp_store_release() and smp_load_acquire() so that updates to the
  'enabled' and 'draining' flags are published to, and observed by,
  reclaimers on other CPUs in a well-defined order.
- Modify shrink_lruvec() to allow a "joint reclaim" period. If an lruvec
  is in the 'draining' state, the reclaimer will attempt to scan MGLRU
  lists first, and then fall through to traditional LRU lists instead
  of returning early. This ensures that folios are visible to at least
  one reclaim path at any given time.
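
The two points above can be sketched in userspace C11, where
memory_order_release and memory_order_acquire are the closest
counterparts of smp_store_release() and smp_load_acquire().  This is a
rough model, not the kernel code: toy_lrugen, the toy_* helpers and the
SCAN_* bits are hypothetical names, and the explicit grouping
"(enabled || draining) && !root_reclaim" is an assumption about the
intended condition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SCAN_MGLRU   0x1u /* the MGLRU lists would be scanned */
#define SCAN_CLASSIC 0x2u /* the traditional LRU lists would be scanned */

/* Toy stand-in for the two flags kept in struct lru_gen_folio. */
struct toy_lrugen {
	atomic_bool enabled;
	atomic_bool draining;
};

static void toy_init(struct toy_lrugen *g, bool enabled)
{
	atomic_init(&g->enabled, enabled);
	atomic_init(&g->draining, false);
}

/* Start of a state switch: publish the new state, then mark the transition. */
static void toy_switch_begin(struct toy_lrugen *g, bool enable)
{
	atomic_store_explicit(&g->enabled, enable, memory_order_release);
	atomic_store_explicit(&g->draining, true, memory_order_release);
}

/* End of a state switch: all pages have reached their destination lists. */
static void toy_switch_end(struct toy_lrugen *g)
{
	atomic_store_explicit(&g->draining, false, memory_order_release);
}

/* Which reclaim paths one pass would take, as a mask of SCAN_* bits. */
static unsigned int toy_reclaim_paths(struct toy_lrugen *g, bool root_reclaim)
{
	bool enabled = atomic_load_explicit(&g->enabled, memory_order_acquire);
	bool draining = atomic_load_explicit(&g->draining, memory_order_acquire);
	unsigned int paths = 0;

	if ((enabled || draining) && !root_reclaim) {
		paths |= SCAN_MGLRU;
		if (!draining)
			return paths; /* steady state: MGLRU only */
	}
	/* Disabled, root reclaim, or mid-transition: scan traditional lists too. */
	return paths | SCAN_CLASSIC;
}
```

In this model a non-root reclaimer gets SCAN_MGLRU alone in the enabled
steady state, both bits between toy_switch_begin() and toy_switch_end(),
and SCAN_CLASSIC alone once the switch to disabled has completed.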

*** Reproduction ***

The issue was consistently reproduced on v6.1.157 and v6.18.3 using
a high-pressure memory cgroup (v1) environment.

Reproduction steps:
1. Create a 16GB memcg and populate it with 10GB file cache (5GB active)
   and 8GB active anonymous memory.
2. Toggle MGLRU state while performing new memory allocations to force
   direct reclaim.

Reproduction script:

#!/bin/bash
# Fixed reproduction for memcg OOM during MGLRU toggle
set -euo pipefail

MGLRU_FILE="/sys/kernel/mm/lru_gen/enabled"
CGROUP_PATH="/sys/fs/cgroup/memory/memcg_oom_test"

# Switch MGLRU dynamically in the background
switch_mglru() {
    # Declare and assign separately so a failing cat is not masked by 'local'
    local orig_val
    orig_val=$(cat "$MGLRU_FILE")
    if [[ "$orig_val" != "0x0000" ]]; then
        echo n > "$MGLRU_FILE" &
    else
        echo y > "$MGLRU_FILE" &
    fi
}

# Setup 16G memcg
mkdir -p "$CGROUP_PATH"
echo $((16 * 1024 * 1024 * 1024)) > "$CGROUP_PATH/memory.limit_in_bytes"
echo $$ > "$CGROUP_PATH/cgroup.procs"

# 1. Build memory pressure (File + Anon)
dd if=/dev/urandom of=/tmp/test_file bs=1M count=10240
dd if=/tmp/test_file of=/dev/null bs=1M # Warm up cache

stress-ng --vm 1 --vm-bytes 8G --vm-keep -t 600 &
sleep 5

# 2. Trigger switch and concurrent allocation
switch_mglru
stress-ng --vm 1 --vm-bytes 2G --vm-populate --timeout 5s || echo "OOM Triggered"

# Check OOM counter
grep -w oom_kill "$CGROUP_PATH/memory.oom_control"

Link: https://lkml.kernel.org/r/20260228161008.707-1-lenohou@gmail.com
Signed-off-by: Leno Hou <lenohou@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Jialing Wang <wjl.linux@gmail.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h |    2 ++
 mm/vmscan.c            |   14 +++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

--- a/include/linux/mmzone.h~mm-mglru-fix-cgroup-oom-during-mglru-state-switching
+++ a/include/linux/mmzone.h
@@ -577,6 +577,8 @@ struct lru_gen_folio {
 	atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
 	/* whether the multi-gen LRU is enabled */
 	bool enabled;
+	/* whether folios are moving between MGLRU and classic LRU lists */
+	bool draining;
 	/* the memcg generation this lru_gen_folio belongs to */
 	u8 gen;
 	/* the list segment this lru_gen_folio belongs to */
--- a/mm/vmscan.c~mm-mglru-fix-cgroup-oom-during-mglru-state-switching
+++ a/mm/vmscan.c
@@ -5305,7 +5305,8 @@ static void lru_gen_change_state(bool en
 			VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
 			VM_WARN_ON_ONCE(!state_is_valid(lruvec));
 
-			lruvec->lrugen.enabled = enabled;
+			smp_store_release(&lruvec->lrugen.enabled, enabled);
+			smp_store_release(&lruvec->lrugen.draining, true);
 
 			while (!(enabled ? fill_evictable(lruvec) : drain_evictable(lruvec))) {
 				lruvec_unlock_irq(lruvec);
@@ -5313,6 +5314,8 @@ static void lru_gen_change_state(bool en
 				lruvec_lock_irq(lruvec);
 			}
 
+			smp_store_release(&lruvec->lrugen.draining, false);
+
 			lruvec_unlock_irq(lruvec);
 		}
 
@@ -5889,10 +5892,15 @@ static void shrink_lruvec(struct lruvec
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	bool proportional_reclaim;
 	struct blk_plug plug;
+	bool lrugen_enabled = smp_load_acquire(&lruvec->lrugen.enabled);
+	bool lru_draining = smp_load_acquire(&lruvec->lrugen.draining);
 
-	if (lru_gen_enabled() && !root_reclaim(sc)) {
+	if ((lrugen_enabled || lru_draining) && !root_reclaim(sc)) {
 		lru_gen_shrink_lruvec(lruvec, sc);
-		return;
+
+		if (!lru_draining)
+			return;
+
 	}
 
 	get_scan_count(lruvec, sc, nr);
_

Patches currently in -mm which might be from lenohou@gmail.com are

mm-mglru-fix-cgroup-oom-during-mglru-state-switching.patch

