public inbox for cgroups@vger.kernel.org
From: "Michal Koutný" <mkoutny-IBi9RG/b67k@public.gmane.org>
To: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Richard Palethorpe <rpalethorpe-IBi9RG/b67k@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Roman Gushchin
	<roman.gushchin-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>,
	Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
	Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>,
	"Matthew Wilcox (Oracle)"
	<willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Muchun Song <songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Yang Shi <shy828301-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Suren Baghdasaryan
	<surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Chris Down <chris-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org>
Subject: [RFC PATCH] mm: memcg: Do not count memory.low reclaim if it does not happen
Date: Tue, 22 Mar 2022 19:22:48 +0100
Message-ID: <20220322182248.29121-1-mkoutny@suse.com>

This was observed with the memcontrol selftest and a new LTP test, but it
can also be reproduced in a simplified setup of two siblings:

	`parent .low=50M
	  ` s1	.low=50M  .current=50M+ε
	  ` s2  .low=0M   .current=50M

The expectation is that s2/memory.events:low will be zero under an outer
reclaimer, since no protection should be given to cgroup s2 (even with
memory_recursiveprot).

However, this does not happen. The apparent reason is that when s1 is
considered for (proportional) reclaim, the scanned proportion is rounded
up to SWAP_CLUSTER_MAX and a slightly over-proportional amount is
reclaimed. Consequently, when the effective low value of s2 is
calculated, it observes the parent's protection left unclaimed by s1
(ε-SWAP_CLUSTER_MAX in theory) and effectively appropriates it.
The effect is a slightly regularized protection (workload dependent)
between siblings and a misreported MEMCG_LOW event when reclaiming s2
with this protection.

Fix the behavior by not reporting a breached memory.low in such
situations. (This also affects setups where all siblings have
memory.low=0: the parent's memory.events:low will still be non-zero when
the parent's memory.low is breached, but it will be reduced by the events
that originated in the children.)

Fixes: 8a931f801340 ("mm: memcontrol: recursive memory.low protection")
Reported-by: Richard Palethorpe <rpalethorpe-IBi9RG/b67k@public.gmane.org>
Link: https://lore.kernel.org/all/20220321101429.3703-1-rpalethorpe-IBi9RG/b67k@public.gmane.org/
Signed-off-by: Michal Koutný <mkoutny-IBi9RG/b67k@public.gmane.org>
---
 include/linux/memcontrol.h | 8 ++++----
 mm/vmscan.c                | 5 +++--
 2 files changed, 7 insertions(+), 6 deletions(-)

Why is this RFC?

1) It changes the number of events observed in parent/memory.events:low
   (especially for truly recursive configs where all children specify
   memory.low=0).
   IIUC from past discussions about the equality of all-zeros and
   all-infinities, those eagerly reported MEMCG_LOW events (in the latter
   case) were deemed to skew the stats [1].
2) The observed behavior slightly impacts the distribution of the parent's
   memory.low.
   A constructed example is a passive protected workload in s1 and an active
   one in s2 (active ~ counteracts the reclaim with allocations). It could
   strip protection from s1 one by one (one:=SWAP_CLUSTER_MAX/2^sc.priority).
   That may be considered either wrong (s1 should have been more protected)
   or correct (s2 deserves protection due to its activity).
   I don't have (didn't collect) data for this, so I think just masking the
   false events is sufficient (or independent).

[1] https://lore.kernel.org/r/20200221185839.GB70967-druUgvl0LCNAfugRpC6u6w@public.gmane.org

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0abbd685703b..99ac72e00bff 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -626,13 +626,13 @@ static inline bool mem_cgroup_supports_protection(struct mem_cgroup *memcg)
 
 }
 
-static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg, bool effective)
 {
 	if (!mem_cgroup_supports_protection(memcg))
 		return false;
 
-	return READ_ONCE(memcg->memory.elow) >=
-		page_counter_read(&memcg->memory);
+	return page_counter_read(&memcg->memory) <= (effective ?
+		READ_ONCE(memcg->memory.elow) :	READ_ONCE(memcg->memory.low));
 }
 
 static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
@@ -1177,7 +1177,7 @@ static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 {
 }
 
-static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg, bool effective)
 {
 	return false;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 59b14e0d696c..3bdb35d6bee6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3152,7 +3152,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 			 * If there is no reclaimable memory, OOM.
 			 */
 			continue;
-		} else if (mem_cgroup_below_low(memcg)) {
+		} else if (mem_cgroup_below_low(memcg, true)) {
 			/*
 			 * Soft protection.
 			 * Respect the protection only as long as
@@ -3163,7 +3163,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 				sc->memcg_low_skipped = 1;
 				continue;
 			}
-			memcg_memory_event(memcg, MEMCG_LOW);
+			if (mem_cgroup_below_low(memcg, false))
+				memcg_memory_event(memcg, MEMCG_LOW);
 		}
 
 		reclaimed = sc->nr_reclaimed;
-- 
2.35.1
