From: "Michal Koutný" <mkoutny@suse.com>
To: cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Cc: "Michal Koutný" <mkoutny@suse.com>,
	"Martin Doucha" <mdoucha@suse.cz>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: [RFC PATCH] memcontrol: Wait for draining of remote stocks to avoid OOM when charging
Date: Fri, 30 May 2025 17:18:57 +0200
Message-ID: <20250530151858.672391-1-mkoutny@suse.com>

The LTP test memcontrol03.c checks the behavior of memory.min protection
under relatively tight conditions -- the allocating task has only a 2MiB
margin below the test's memory.max.

Up to MEMCG_CHARGE_BATCH pages may be temporarily over-charged to
page_counters, but this alone should not lead to OOM because the
overcharged amount is returned when the stock is drained. Or is it?

I suspect this may cause trouble when a charge larger than
MEMCG_CHARGE_BATCH is preceded by a small charge:

  try_charge_memcg(memcg, ..., 1);
    // counter->usage += 64
    // local stock = 63
    // no OOM but counter->usage > counter->max
  // running on different CPU
  try_charge_memcg(memcg, ..., 65);
    // 4M in stock + 148M new charge, only 150M w/out hard protection to reclaim
    try_to_free_mem_cgroup_pages
      if (cpu == curcpu)
        drain_local_stock // this would be ok
      else
        schedule_work_on(cpu, &stock->work); // this is asynchronous
      // charging + (no more) reclaim is retried MAX_RECLAIM_RETRIES = 16 times
      // if the other CPUs' stocks aren't flushed by then, this may cause OOM
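
The sequence above can be replayed in a toy userspace model. This is only
a sketch under simplifying assumptions, not kernel code: all toy_* names
are hypothetical, reclaim is assumed to free nothing (everything else is
protected), and the wait_remote flag stands in for whether the remote
drain work has finished before the final retry.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS      2
#define TOY_CHARGE_BATCH 64UL	/* mirrors MEMCG_CHARGE_BATCH, in pages */

/* Hypothetical stand-in for a memcg with one page counter. */
struct toy_memcg {
	unsigned long usage;			/* pages charged to the counter */
	unsigned long max;			/* hard limit, in pages */
	unsigned long stock[TOY_NR_CPUS];	/* pre-charged pages cached per CPU */
};

/* Return one CPU's cached pages to the counter. */
static void toy_drain(struct toy_memcg *m, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(m->usage >= m->stock[cpu]);
	m->usage -= m->stock[cpu];
	m->stock[cpu] = 0;
}

/* Charge the counter only if the result stays within the limit. */
static bool toy_counter_try_charge(struct toy_memcg *m, unsigned long nr)
{
	if (m->usage + nr > m->max)
		return false;
	m->usage += nr;
	return true;
}

/*
 * Charge nr pages on behalf of cpu. Returns false where the real code
 * would give up and head for OOM. wait_remote == false models remote
 * drain work that has been scheduled but has not run yet.
 */
static bool toy_charge(struct toy_memcg *m, int cpu, unsigned long nr,
		       bool wait_remote)
{
	unsigned long batch = nr > TOY_CHARGE_BATCH ? nr : TOY_CHARGE_BATCH;
	int i;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	/* Fast path: serve the request from the local stock. */
	if (nr <= m->stock[cpu]) {
		m->stock[cpu] -= nr;
		return true;
	}

	/* Charge a whole batch and keep the surplus as local stock. */
	if (toy_counter_try_charge(m, batch)) {
		m->stock[cpu] += batch - nr;
		return true;
	}

	/* Retry with exactly the requested amount. */
	if (toy_counter_try_charge(m, nr))
		return true;

	/* The local stock is always drained; remote ones only if we wait. */
	for (i = 0; i < TOY_NR_CPUS; i++)
		if (i == cpu || wait_remote)
			toy_drain(m, i);

	return toy_counter_try_charge(m, nr);
}
```

With max = 128 pages, a 1-page charge on CPU 0 leaves usage at 64 and a
stock of 63. A following 65-page charge on CPU 1 fails in this model when
wait_remote is false and succeeds when it is true, although only 66 pages
are really in use.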

This effect is more pronounced on machines with 64k page size, where
MEMCG_CHARGE_BATCH is worth a whopping 4MiB (per CPU).
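
For reference, the arithmetic behind that figure as a small sketch. The
helper names are hypothetical, the batch of 64 pages is taken from the
example above, and the multi-CPU bound assumes each CPU's stock holds at
most one batch.

```c
#include <assert.h>

#define TOY_CHARGE_BATCH 64ULL	/* pages per batch, as in the example above */

/* Bytes that one full batch pins for a given page size. */
static unsigned long long charge_batch_bytes(unsigned long long page_size)
{
	assert(page_size > 0);
	return TOY_CHARGE_BATCH * page_size;
}

/* Upper bound on bytes cached in stocks, assuming one batch per CPU. */
static unsigned long long worst_case_stock_bytes(unsigned long long page_size,
						 unsigned int nr_cpus)
{
	return charge_batch_bytes(page_size) * nr_cpus;
}
```

charge_batch_bytes(65536) gives 4MiB, versus 256KiB for 4k pages, so with
64k pages a single nearly full remote stock already exceeds the 2MiB
margin of the test.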

Prevent the premature OOM by waiting for stock flushing (even) from remote
CPUs.

Link: https://lore.kernel.org/ltp/144b6bac-edba-470a-bf87-abf492d85ef5@suse.cz/
Reported-by: Martin Doucha <mdoucha@suse.cz>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Tested-by: Martin Doucha <mdoucha@suse.cz>
---
 mm/memcontrol-v1.h |  2 +-
 mm/memcontrol.c    | 15 ++++++++++-----
 2 files changed, 11 insertions(+), 6 deletions(-)

My reason(s) for RFC:
1) I'm not sure if there isn't a simpler way than flushing stocks over
   all CPUs (also the guard with gfpflags_allow_blocking() is there only
   for explicitness, in case the code was moved over).
2) It requires specific scheduling over CPUs, so it may not be so common
   and severe in practice.

diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb4160..3e57645d0c175 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -24,7 +24,7 @@
 
 unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
 
-void drain_all_stock(struct mem_cgroup *root_memcg);
+void drain_all_stock(struct mem_cgroup *root_memcg, bool sync);
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event);
 unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d4d65f25fecd..ddf905baab12d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1911,7 +1911,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  * Drains all per-CPU charge caches for given root_memcg resp. subtree
  * of the hierarchy under it.
  */
-void drain_all_stock(struct mem_cgroup *root_memcg)
+void drain_all_stock(struct mem_cgroup *root_memcg, bool sync)
 {
 	int cpu, curcpu;
 
@@ -1948,6 +1948,11 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 				schedule_work_on(cpu, &stock->work);
 		}
 	}
+	if (sync)
+		for_each_online_cpu(cpu) {
+			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
+			flush_work(&stock->work);
+		}
 	migrate_enable();
 	mutex_unlock(&percpu_charge_mutex);
 }
@@ -2307,7 +2312,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto retry;
 
 	if (!drained) {
-		drain_all_stock(mem_over_limit);
+		drain_all_stock(mem_over_limit, gfpflags_allow_blocking(gfp_mask));
 		drained = true;
 		goto retry;
 	}
@@ -3773,7 +3778,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	wb_memcg_offline(memcg);
 	lru_gen_offline_memcg(memcg);
 
-	drain_all_stock(memcg);
+	drain_all_stock(memcg, false);
 
 	mem_cgroup_id_put(memcg);
 }
@@ -4205,7 +4210,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 			break;
 
 		if (!drained) {
-			drain_all_stock(memcg);
+			drain_all_stock(memcg, false);
 			drained = true;
 			continue;
 		}
@@ -4253,7 +4258,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 			break;
 
 		if (!drained) {
-			drain_all_stock(memcg);
+			drain_all_stock(memcg, false);
 			drained = true;
 			continue;
 		}

base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
-- 
2.49.0

