linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/3] add max arg to swappiness in memory.reclaim and lru_gen
@ 2025-04-09  7:06 Zhongkun He
  2025-04-09  7:06 ` [PATCH V3 1/3] mm: add swappiness=max arg to memory.reclaim for only anon reclaim Zhongkun He
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Zhongkun He @ 2025-04-09  7:06 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel, Zhongkun He

This patchset add max arg to swappiness in memory.reclaim and lru_gen
for anon only proactive memory reclaim.

With the patch 'commit <68cd9050d871> ("mm: add swappiness= arg to
memory.reclaim")', we can submit an additional swappiness=<val> argument
to memory.reclaim. It is very useful because we can dynamically adjust
the reclamation ratio based on the anonymous folios and file folios of
each cgroup. For example,when swappiness is set to 0, we only reclaim
from file folios. But we can not relciam memory just from anon folios.

This patch introduces a new macro, SWAPPINESS_ANON_ONLY, defined as
MAX_SWAPPINESS + 1, represent the max arg semantics. It specifically
indicates that reclamation should occur only from anonymous pages.

Patch 1 add swappiness=max arg to memory.reclaim
suggested-by: Yosry Ahmed

Patch 2 add max arg to lru_gen for proactie memory reclaim in MGLRU.
The MGLRU already supports reclaiming exclusively from anonymous pages.
This patch formalizes that behavior by introducing a max parameter to
represent the corresponding semantics, and explicitly uses the
SWAPPINESS_ANON_ONLY macro in the code to denote this logic.

Patch 3 add more comments for cache_trim_mode from
Johannes Weiner in [1].

[1]:
https://lore.kernel.org/all/20250314141833.GA1316033@cmpxchg.org/

Here is the previous discussion:
https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250318135330.3358345-1-hezhongkun.hzk@bytedance.com/

V3:
In MGLRU, add max swappiness arg to lru_gen for proactive memory reclaim. 
Add the use of SWAPPINESS_ANON_ONLY in place of 'MAX_SWAPPINESS + 1' to
improves code clarity and makes the intention more explicit.

Add more comments about cache_trim_mode.

V2:
Add max arg to swappiness as the mode of reclaim from anon memory only
in memory.reclaim.

Zhongkun He (3):
  mm: add swappiness=max arg to memory.reclaim for only anon reclaim
  mm: add max swappiness arg to lru_gen for anonymous memory only
  mm: vmscan: add more comments about cache_trim_mode

 Documentation/admin-guide/cgroup-v2.rst       |  3 ++
 Documentation/admin-guide/mm/multigen_lru.rst |  5 +--
 include/linux/swap.h                          |  4 +++
 mm/memcontrol.c                               |  5 +++
 mm/vmscan.c                                   | 36 ++++++++++++++-----
 5 files changed, 43 insertions(+), 10 deletions(-)

-- 
2.39.5


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V3 1/3] mm: add swappiness=max arg to memory.reclaim for only anon reclaim
  2025-04-09  7:06 [PATCH V3 0/3] add max arg to swappiness in memory.reclaim and lru_gen Zhongkun He
@ 2025-04-09  7:06 ` Zhongkun He
  2025-04-10  2:09   ` Andrew Morton
  2025-04-09  7:06 ` [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only Zhongkun He
  2025-04-09  7:06 ` [PATCH V3 3/3] mm: vmscan: add more comments about cache_trim_mode Zhongkun He
  2 siblings, 1 reply; 12+ messages in thread
From: Zhongkun He @ 2025-04-09  7:06 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel, Zhongkun He

With this patch 'commit <68cd9050d871> ("mm: add swappiness= arg to
memory.reclaim")', we can submit an additional swappiness=<val> argument
to memory.reclaim. It is very useful because we can dynamically adjust
the reclamation ratio based on the anonymous folios and file folios of
each cgroup. For example,when swappiness is set to 0, we only reclaim
from file folios.

However,we have also encountered a new issue: when swappiness is set to
the MAX_SWAPPINESS, it may still only reclaim file folios.

So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
proactive memory reclaim only reclaims from anonymous folios when
swappiness is set to max. The swappiness semantics from a user
perspective remain unchanged.

For example, something like this:

echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the rootcg with a swappiness setting of 'max' (a
new mode) regardless of the file folios. Users have a more comprehensive
view of the application's memory distribution because there are many
metrics available. For example, if we find that a certain cgroup has a
large number of inactive anon folios, we can reclaim only those and skip
file folios, because with the zram/zswap, the IO tradeoff that
cache_trim_mode or other file first logic is making doesn't hold -
file refaults will cause IO, whereas anon decompression will not.

With this patch, the swappiness argument of memory.reclaim has a new
mode 'max', means reclaiming just from anonymous folios both in traditional
LRU and MGLRU.

Here is the previous discussion:
https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250318135330.3358345-1-hezhongkun.hzk@bytedance.com/

Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 3 +++
 include/linux/swap.h                    | 4 ++++
 mm/memcontrol.c                         | 5 +++++
 mm/vmscan.c                             | 7 +++++++
 4 files changed, 19 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1a16ce68a4d7..472c01e0eb2c 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1348,6 +1348,9 @@ The following nested keys are defined.
 	same semantics as vm.swappiness applied to memcg reclaim with
 	all the existing limitations and potential future extensions.
 
+	The valid range for swappiness is [0-200, max], setting
+	swappiness=max exclusively reclaims anonymous memory.
+
   memory.peak
 	A read-write single value file which exists on non-root cgroups.
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index db46b25a65ae..f57c7e0012ba 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -414,6 +414,10 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
 #define MIN_SWAPPINESS 0
 #define MAX_SWAPPINESS 200
+
+/* Just recliam from anon folios in proactive memory reclaim */
+#define SWAPPINESS_ANON_ONLY (MAX_SWAPPINESS + 1)
+
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 421740f1bcdc..b0b3411dc0df 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4396,11 +4396,13 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 
 enum {
 	MEMORY_RECLAIM_SWAPPINESS = 0,
+	MEMORY_RECLAIM_SWAPPINESS_MAX,
 	MEMORY_RECLAIM_NULL,
 };
 
 static const match_table_t tokens = {
 	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+	{ MEMORY_RECLAIM_SWAPPINESS_MAX, "swappiness=max"},
 	{ MEMORY_RECLAIM_NULL, NULL },
 };
 
@@ -4434,6 +4436,9 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
 				return -EINVAL;
 			break;
+		case MEMORY_RECLAIM_SWAPPINESS_MAX:
+			swappiness = SWAPPINESS_ANON_ONLY;
+			break;
 		default:
 			return -EINVAL;
 		}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..c99a6a48d0bc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2503,6 +2503,13 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		goto out;
 	}
 
+	/* Proactive reclaim initiated by userspace for anonymous memory only */
+	if (swappiness == SWAPPINESS_ANON_ONLY) {
+		WARN_ON_ONCE(!sc->proactive);
+		scan_balance = SCAN_ANON;
+		goto out;
+	}
+
 	/*
 	 * Do not apply any pressure balancing cleverness when the
 	 * system is close to OOM, scan both anon and file equally
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only
  2025-04-09  7:06 [PATCH V3 0/3] add max arg to swappiness in memory.reclaim and lru_gen Zhongkun He
  2025-04-09  7:06 ` [PATCH V3 1/3] mm: add swappiness=max arg to memory.reclaim for only anon reclaim Zhongkun He
@ 2025-04-09  7:06 ` Zhongkun He
  2025-04-10  2:09   ` Andrew Morton
  2025-04-30  7:59   ` Dan Carpenter
  2025-04-09  7:06 ` [PATCH V3 3/3] mm: vmscan: add more comments about cache_trim_mode Zhongkun He
  2 siblings, 2 replies; 12+ messages in thread
From: Zhongkun He @ 2025-04-09  7:06 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel, Zhongkun He

The MGLRU already supports reclaiming only from anonymous memory
via the /sys/kernel/debug/lru_gen interface. Now, memory.reclaim
also supports the swappiness=max parameter to enable reclaiming
solely from anonymous memory. To unify the semantics of proactive
reclaiming from anonymous folios, the max parameter is introduced.

Additionally, the use of SWAPPINESS_ANON_ONLY in place of
'MAX_SWAPPINESS + 1' improves code clarity and makes the intention
more explicit.

Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
---
 Documentation/admin-guide/mm/multigen_lru.rst |  5 ++--
 mm/vmscan.c                                   | 26 ++++++++++++++-----
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst
index 33e068830497..9cb54b4ff5d9 100644
--- a/Documentation/admin-guide/mm/multigen_lru.rst
+++ b/Documentation/admin-guide/mm/multigen_lru.rst
@@ -151,8 +151,9 @@ generations less than or equal to ``min_gen_nr``.
 ``min_gen_nr`` should be less than ``max_gen_nr-1``, since
 ``max_gen_nr`` and ``max_gen_nr-1`` are not fully aged (equivalent to
 the active list) and therefore cannot be evicted. ``swappiness``
-overrides the default value in ``/proc/sys/vm/swappiness``.
-``nr_to_reclaim`` limits the number of pages to evict.
+overrides the default value in ``/proc/sys/vm/swappiness`` and the valid
+range is [0-200, max], with max being exclusively used for the reclamation
+of anonymous memory. ``nr_to_reclaim`` limits the number of pages to evict.
 
 A typical use case is that a job scheduler runs this command before it
 tries to land a new job on a server. If it fails to materialize enough
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c99a6a48d0bc..18a175752b57 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2697,8 +2697,11 @@ static bool should_clear_pmd_young(void)
 		READ_ONCE((lruvec)->lrugen.min_seq[LRU_GEN_FILE]),	\
 	}
 
+#define max_evictable_type(swappiness)						\
+	((swappiness) != SWAPPINESS_ANON_ONLY)
+
 #define evictable_min_seq(min_seq, swappiness)				\
-	min((min_seq)[!(swappiness)], (min_seq)[(swappiness) <= MAX_SWAPPINESS])
+	min((min_seq)[!(swappiness)], (min_seq)[max_evictable_type(swappiness)])
 
 #define for_each_gen_type_zone(gen, type, zone)				\
 	for ((gen) = 0; (gen) < MAX_NR_GENS; (gen)++)			\
@@ -2706,7 +2709,7 @@ static bool should_clear_pmd_young(void)
 			for ((zone) = 0; (zone) < MAX_NR_ZONES; (zone)++)
 
 #define for_each_evictable_type(type, swappiness)			\
-	for ((type) = !(swappiness); (type) <= ((swappiness) <= MAX_SWAPPINESS); (type)++)
+	for ((type) = !(swappiness); (type) <= max_evictable_type(swappiness); (type)++)
 
 #define get_memcg_gen(seq)	((seq) % MEMCG_NR_GENS)
 #define get_memcg_bin(bin)	((bin) % MEMCG_NR_BINS)
@@ -3857,7 +3860,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
 	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
 
-	if (type ? swappiness > MAX_SWAPPINESS : !swappiness)
+	if (type ? (swappiness == SWAPPINESS_ANON_ONLY) : !swappiness)
 		goto done;
 
 	/* prevent cold/hot inversion if the type is evictable */
@@ -5523,7 +5526,7 @@ static int run_cmd(char cmd, int memcg_id, int nid, unsigned long seq,
 
 	if (swappiness < MIN_SWAPPINESS)
 		swappiness = get_swappiness(lruvec, sc);
-	else if (swappiness > MAX_SWAPPINESS + 1)
+	else if (swappiness > SWAPPINESS_ANON_ONLY)
 		goto done;
 
 	switch (cmd) {
@@ -5580,7 +5583,7 @@ static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
 	while ((cur = strsep(&next, ",;\n"))) {
 		int n;
 		int end;
-		char cmd;
+		char cmd, swap_string[5];
 		unsigned int memcg_id;
 		unsigned int nid;
 		unsigned long seq;
@@ -5591,13 +5594,22 @@ static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
 		if (!*cur)
 			continue;
 
-		n = sscanf(cur, "%c %u %u %lu %n %u %n %lu %n", &cmd, &memcg_id, &nid,
-			   &seq, &end, &swappiness, &end, &opt, &end);
+		n = sscanf(cur, "%c %u %u %lu %n %4s %n %lu %n", &cmd, &memcg_id, &nid,
+			   &seq, &end, swap_string, &end, &opt, &end);
 		if (n < 4 || cur[end]) {
 			err = -EINVAL;
 			break;
 		}
 
+		/* set by userspace for anonymous memory only */
+		if (!strncmp("max", swap_string, sizeof("max"))) {
+			swappiness = SWAPPINESS_ANON_ONLY;
+		} else {
+			err = kstrtouint(swap_string, 0, &swappiness);
+			if (err)
+				break;
+		}
+
 		err = run_cmd(cmd, memcg_id, nid, seq, &sc, swappiness, opt);
 		if (err)
 			break;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V3 3/3] mm: vmscan: add more comments about cache_trim_mode
  2025-04-09  7:06 [PATCH V3 0/3] add max arg to swappiness in memory.reclaim and lru_gen Zhongkun He
  2025-04-09  7:06 ` [PATCH V3 1/3] mm: add swappiness=max arg to memory.reclaim for only anon reclaim Zhongkun He
  2025-04-09  7:06 ` [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only Zhongkun He
@ 2025-04-09  7:06 ` Zhongkun He
  2 siblings, 0 replies; 12+ messages in thread
From: Zhongkun He @ 2025-04-09  7:06 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel, Zhongkun He

Add more comments for cache_trim_mode, and the annotations
provided by Johannes Weiner in [1].

[1]:
https://lore.kernel.org/all/20250314141833.GA1316033@cmpxchg.org/

Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
---
 mm/vmscan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 18a175752b57..ffa8a7a97c8f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2530,7 +2530,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 
 	/*
 	 * If there is enough inactive page cache, we do not reclaim
-	 * anything from the anonymous working right now.
+	 * anything from the anonymous working right now to make sure
+         * a streaming file access pattern doesn't cause swapping.
 	 */
 	if (sc->cache_trim_mode) {
 		scan_balance = SCAN_FILE;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 1/3] mm: add swappiness=max arg to memory.reclaim for only anon reclaim
  2025-04-09  7:06 ` [PATCH V3 1/3] mm: add swappiness=max arg to memory.reclaim for only anon reclaim Zhongkun He
@ 2025-04-10  2:09   ` Andrew Morton
  2025-04-10  3:48     ` [External] " Zhongkun He
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2025-04-10  2:09 UTC (permalink / raw)
  To: Zhongkun He
  Cc: hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel

On Wed,  9 Apr 2025 15:06:18 +0800 Zhongkun He <hezhongkun.hzk@bytedance.com> wrote:

> With this patch 'commit <68cd9050d871> ("mm: add swappiness= arg to
> memory.reclaim")', we can submit an additional swappiness=<val> argument
> to memory.reclaim. It is very useful because we can dynamically adjust
> the reclamation ratio based on the anonymous folios and file folios of
> each cgroup. For example,when swappiness is set to 0, we only reclaim
> from file folios.
> 
> However,we have also encountered a new issue: when swappiness is set to
> the MAX_SWAPPINESS, it may still only reclaim file folios.
> 
> So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
> proactive memory reclaim only reclaims from anonymous folios when
> swappiness is set to max. The swappiness semantics from a user
> perspective remain unchanged.
> 
> For example, something like this:
> 
> echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim
> 
> will perform reclaim on the rootcg with a swappiness setting of 'max' (a
> new mode) regardless of the file folios. Users have a more comprehensive
> view of the application's memory distribution because there are many
> metrics available. For example, if we find that a certain cgroup has a
> large number of inactive anon folios, we can reclaim only those and skip
> file folios, because with the zram/zswap, the IO tradeoff that
> cache_trim_mode or other file first logic is making doesn't hold -
> file refaults will cause IO, whereas anon decompression will not.
> 
> With this patch, the swappiness argument of memory.reclaim has a new
> mode 'max', means reclaiming just from anonymous folios both in traditional
> LRU and MGLRU.
> 
> ...
>
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1348,6 +1348,9 @@ The following nested keys are defined.
>  	same semantics as vm.swappiness applied to memcg reclaim with
>  	all the existing limitations and potential future extensions.
>  
> +	The valid range for swappiness is [0-200, max], setting
> +	swappiness=max exclusively reclaims anonymous memory.

Being able to assign either a number or a string feels a bit weird. 
Usually we use something like "-1" for a hack like this.  eg,
NUMA_NO_NODE.

Perhaps we just shouldn't overload swappiness like this.  Add a new
tunable (swappiness_mode?) which can override the swappiness setting.

I guess it doesn't matter much.  We're used to adding messy interfaces ;)



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only
  2025-04-09  7:06 ` [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only Zhongkun He
@ 2025-04-10  2:09   ` Andrew Morton
  2025-04-10  4:50     ` [External] " Zhongkun He
  2025-04-30  7:59   ` Dan Carpenter
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2025-04-10  2:09 UTC (permalink / raw)
  To: Zhongkun He
  Cc: hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel

On Wed,  9 Apr 2025 15:06:19 +0800 Zhongkun He <hezhongkun.hzk@bytedance.com> wrote:

> The MGLRU

paging yuzhao?

> already supports reclaiming only from anonymous memory
> via the /sys/kernel/debug/lru_gen interface. Now, memory.reclaim
> also supports the swappiness=max parameter to enable reclaiming
> solely from anonymous memory. To unify the semantics of proactive
> reclaiming from anonymous folios, the max parameter is introduced.
> 
> Additionally, the use of SWAPPINESS_ANON_ONLY in place of
> 'MAX_SWAPPINESS + 1' improves code clarity and makes the intention
> more explicit.
> 
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2697,8 +2697,11 @@ static bool should_clear_pmd_young(void)
>  		READ_ONCE((lruvec)->lrugen.min_seq[LRU_GEN_FILE]),	\
>  	}
>  
> +#define max_evictable_type(swappiness)						\
> +	((swappiness) != SWAPPINESS_ANON_ONLY)
> +
>  #define evictable_min_seq(min_seq, swappiness)				\
> -	min((min_seq)[!(swappiness)], (min_seq)[(swappiness) <= MAX_SWAPPINESS])
> +	min((min_seq)[!(swappiness)], (min_seq)[max_evictable_type(swappiness)])

Why oh why did we implement these in cpp?

>  
> @@ -3857,7 +3860,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
>  	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
>  	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
>  
> -	if (type ? swappiness > MAX_SWAPPINESS : !swappiness)
> +	if (type ? (swappiness == SWAPPINESS_ANON_ONLY) : !swappiness)

This expression makes my brain bleed.

	if (type) {
		if (swappiness == SWAPPINESS_ANON_ONLY) {
			/*
			 * Nice comment explaining why we're doing this
			 */
			goto done;;
		}
	} else {
		if (!swappiness) {
			/*
			 * Nice comment explaining why we're doing this
			 */
			goto done;
		}
	}

or

	if (type && (swappiness == SWAPPINESS_ANON_ONLY)) {
		/*
		 * Nice comment explaining why we're doing this
		 */
		goto done;
	}

	if (!type && !swappiness) {
		/*
		 * Nice comment explaining why we're doing this
		 */
		goto done;
	}

It's much more verbose, but it has the huge advantage that it creates
locations where we can add comments which tell readers what's going on.
Which is pretty important, no?
	
>  		goto done;
>  
>  	/* prevent cold/hot inversion if the type is evictable */
> @@ -5523,7 +5526,7 @@ static int run_cmd(char cmd, int memcg_id, int nid, unsigned long seq,
>  
>  	if (swappiness < MIN_SWAPPINESS)
>  		swappiness = get_swappiness(lruvec, sc);
> -	else if (swappiness > MAX_SWAPPINESS + 1)
> +	else if (swappiness > SWAPPINESS_ANON_ONLY)
>  		goto done;
>  
>  	switch (cmd) {
> @@ -5580,7 +5583,7 @@ static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
>  	while ((cur = strsep(&next, ",;\n"))) {
>  		int n;
>  		int end;
> -		char cmd;
> +		char cmd, swap_string[5];
>  		unsigned int memcg_id;
>  		unsigned int nid;
>  		unsigned long seq;
> @@ -5591,13 +5594,22 @@ static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
>  		if (!*cur)
>  			continue;
>  
> -		n = sscanf(cur, "%c %u %u %lu %n %u %n %lu %n", &cmd, &memcg_id, &nid,
> -			   &seq, &end, &swappiness, &end, &opt, &end);
> +		n = sscanf(cur, "%c %u %u %lu %n %4s %n %lu %n", &cmd, &memcg_id, &nid,
> +			   &seq, &end, swap_string, &end, &opt, &end);

Permits userspace to easily overrun swap_string[].  OK, it's root-only,
but still, why permit this?

>  		if (n < 4 || cur[end]) {
>  			err = -EINVAL;
>  			break;
>  		}
>  
> +		/* set by userspace for anonymous memory only */
> +		if (!strncmp("max", swap_string, sizeof("max"))) {

Can sscanf() give us a non null-terminated string?

> +			swappiness = SWAPPINESS_ANON_ONLY;
> +		} else {
> +			err = kstrtouint(swap_string, 0, &swappiness);
> +			if (err)
> +				break;
> +		}
> +
>  		err = run_cmd(cmd, memcg_id, nid, seq, &sc, swappiness, opt);
>  		if (err)
>  			break;


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [External] Re: [PATCH V3 1/3] mm: add swappiness=max arg to memory.reclaim for only anon reclaim
  2025-04-10  2:09   ` Andrew Morton
@ 2025-04-10  3:48     ` Zhongkun He
  0 siblings, 0 replies; 12+ messages in thread
From: Zhongkun He @ 2025-04-10  3:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel

On Thu, Apr 10, 2025 at 10:09 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Wed,  9 Apr 2025 15:06:18 +0800 Zhongkun He <hezhongkun.hzk@bytedance.com> wrote:
>
> > With this patch 'commit <68cd9050d871> ("mm: add swappiness= arg to
> > memory.reclaim")', we can submit an additional swappiness=<val> argument
> > to memory.reclaim. It is very useful because we can dynamically adjust
> > the reclamation ratio based on the anonymous folios and file folios of
> > each cgroup. For example,when swappiness is set to 0, we only reclaim
> > from file folios.
> >
> > However,we have also encountered a new issue: when swappiness is set to
> > the MAX_SWAPPINESS, it may still only reclaim file folios.
> >
> > So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
> > proactive memory reclaim only reclaims from anonymous folios when
> > swappiness is set to max. The swappiness semantics from a user
> > perspective remain unchanged.
> >
> > For example, something like this:
> >
> > echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim
> >
> > will perform reclaim on the rootcg with a swappiness setting of 'max' (a
> > new mode) regardless of the file folios. Users have a more comprehensive
> > view of the application's memory distribution because there are many
> > metrics available. For example, if we find that a certain cgroup has a
> > large number of inactive anon folios, we can reclaim only those and skip
> > file folios, because with the zram/zswap, the IO tradeoff that
> > cache_trim_mode or other file first logic is making doesn't hold -
> > file refaults will cause IO, whereas anon decompression will not.
> >
> > With this patch, the swappiness argument of memory.reclaim has a new
> > mode 'max', means reclaiming just from anonymous folios both in traditional
> > LRU and MGLRU.
> >
> > ...
> >
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1348,6 +1348,9 @@ The following nested keys are defined.
> >       same semantics as vm.swappiness applied to memcg reclaim with
> >       all the existing limitations and potential future extensions.
> >
> > +     The valid range for swappiness is [0-200, max], setting
> > +     swappiness=max exclusively reclaims anonymous memory.
>
> Being able to assign either a number or a string feels a bit weird.
> Usually we use something like "-1" for a hack like this.  eg,
> NUMA_NO_NODE.
>
> Perhaps we just shouldn't overload swappiness like this.  Add a new
> tunable (swappiness_mode?) which can override the swappiness setting.
>
> I guess it doesn't matter much.  We're used to adding messy interfaces ;)
>

Hi Andrew, thanks for your review.

In the initial patch, I used 200 (the maximum value of swappiness) to represent
the semantics of reclaiming only from anonymous pages. However, someone
pointed out that the current usage needs some fine-tuning. Later discussions
suggested using max(swappiness=201) to represent this specific
semantic explicitly
in the code. It was then discovered that MGLRU already includes this logic(201),
so it's only necessary to make the intention of the code clearer.

So we can add the max to memory.reclaim and lru_gen.

More info please see:
https://lore.kernel.org/all/Z9Rs8ZtgkupXpFYn@google.com/

>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [External] Re: [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only
  2025-04-10  2:09   ` Andrew Morton
@ 2025-04-10  4:50     ` Zhongkun He
  0 siblings, 0 replies; 12+ messages in thread
From: Zhongkun He @ 2025-04-10  4:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel

On Thu, Apr 10, 2025 at 10:10 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Wed,  9 Apr 2025 15:06:19 +0800 Zhongkun He <hezhongkun.hzk@bytedance.com> wrote:
>
> > The MGLRU
>
> paging yuzhao?

I have cc yuzhao and look forward to the relay.

>
> > already supports reclaiming only from anonymous memory
> > via the /sys/kernel/debug/lru_gen interface. Now, memory.reclaim
> > also supports the swappiness=max parameter to enable reclaiming
> > solely from anonymous memory. To unify the semantics of proactive
> > reclaiming from anonymous folios, the max parameter is introduced.
> >
> > Additionally, the use of SWAPPINESS_ANON_ONLY in place of
> > 'MAX_SWAPPINESS + 1' improves code clarity and makes the intention
> > more explicit.
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2697,8 +2697,11 @@ static bool should_clear_pmd_young(void)
> >               READ_ONCE((lruvec)->lrugen.min_seq[LRU_GEN_FILE]),      \
> >       }
> >
> > +#define max_evictable_type(swappiness)                                               \
> > +     ((swappiness) != SWAPPINESS_ANON_ONLY)
> > +
> >  #define evictable_min_seq(min_seq, swappiness)                               \
> > -     min((min_seq)[!(swappiness)], (min_seq)[(swappiness) <= MAX_SWAPPINESS])
> > +     min((min_seq)[!(swappiness)], (min_seq)[max_evictable_type(swappiness)])
>
> Why oh why did we implement these in cpp?

Just want to make the code more clear.  Maybe we should do more like this

/* The range of swappiness is [0,1-200,201], 0 means file type only;
 * 1-200 anon and file type; 201 anon type only
 */
#define max_type(swappiness) ((swappiness) != SWAPPINESS_ANON_ONLY)
#define min_type(swappiness) !(swappiness)

#define evictable_min_seq(min_seq, swappiness)              \
    min((min_seq)[min_type(swappiness)], (min_seq)[max_type(swappiness)])

 #define for_each_evictable_type(type, swappiness)                      \
-       for ((type) = !(swappiness); (type) <= ((swappiness) <=
MAX_SWAPPINESS); (type)++)
+       for ((type) = min_type(swappiness) ; (type) <=
max_type(swappiness); (type)++)

>
> >
> > @@ -3857,7 +3860,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
> >       int hist = lru_hist_from_seq(lrugen->min_seq[type]);
> >       int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
> >
> > -     if (type ? swappiness > MAX_SWAPPINESS : !swappiness)
> > +     if (type ? (swappiness == SWAPPINESS_ANON_ONLY) : !swappiness)
>
> This expression makes my brain bleed.
>
>         if (type) {
>                 if (swappiness == SWAPPINESS_ANON_ONLY) {
>                         /*
>                          * Nice comment explaining why we're doing this
>                          */
>                         goto done;;
>                 }
>         } else {
>                 if (!swappiness) {
>                         /*
>                          * Nice comment explaining why we're doing this
>                          */
>                         goto done;
>                 }
>         }
>
> or
>
>         if (type && (swappiness == SWAPPINESS_ANON_ONLY)) {
>                 /*
>                  * Nice comment explaining why we're doing this
>                  */
>                 goto done;
>         }
>
>         if (!type && !swappiness) {
>                 /*
>                  * Nice comment explaining why we're doing this
>                  */
>                 goto done;
>         }
>
> It's much more verbose, but it has the huge advantage that it creates
> locations where we can add comments which tell readers what's going on.
> Which is pretty important, no?
>

Yes, I agree. Will do, thanks.

> >               goto done;
> >
> >       /* prevent cold/hot inversion if the type is evictable */
> > @@ -5523,7 +5526,7 @@ static int run_cmd(char cmd, int memcg_id, int nid, unsigned long seq,
> >
> >       if (swappiness < MIN_SWAPPINESS)
> >               swappiness = get_swappiness(lruvec, sc);
> > -     else if (swappiness > MAX_SWAPPINESS + 1)
> > +     else if (swappiness > SWAPPINESS_ANON_ONLY)
> >               goto done;
> >
> >       switch (cmd) {
> > @@ -5580,7 +5583,7 @@ static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
> >       while ((cur = strsep(&next, ",;\n"))) {
> >               int n;
> >               int end;
> > -             char cmd;
> > +             char cmd, swap_string[5];
> >               unsigned int memcg_id;
> >               unsigned int nid;
> >               unsigned long seq;
> > @@ -5591,13 +5594,22 @@ static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
> >               if (!*cur)
> >                       continue;
> >
> > -             n = sscanf(cur, "%c %u %u %lu %n %u %n %lu %n", &cmd, &memcg_id, &nid,
> > -                        &seq, &end, &swappiness, &end, &opt, &end);
> > +             n = sscanf(cur, "%c %u %u %lu %n %4s %n %lu %n", &cmd, &memcg_id, &nid,
> > +                        &seq, &end, swap_string, &end, &opt, &end);
>
> Permits userspace to easily overrun swap_string[].  OK, it's root-only,
> but still, why permit this?
>

IICC, the arg in sscanf is %4s meaning the length of the string will
not be allowed to overrun
the swap_string, thanks.

> >               if (n < 4 || cur[end]) {
> >                       err = -EINVAL;
> >                       break;
> >               }
> >
> > +             /* set by userspace for anonymous memory only */
> > +             if (!strncmp("max", swap_string, sizeof("max"))) {
>
> Can sscanf() give us a non null-terminated string?
>

No, the sscanf will add '\0' to the end, so the maximum number of
input characters is 4,
and the length of swap_string is 5, with one character reserved for
the null terminator '\0'
for sscanf.

Thanks for your time.

> > +                     swappiness = SWAPPINESS_ANON_ONLY;
> > +             } else {
> > +                     err = kstrtouint(swap_string, 0, &swappiness);
> > +                     if (err)
> > +                             break;
> > +             }
> > +
> >               err = run_cmd(cmd, memcg_id, nid, seq, &sc, swappiness, opt);
> >               if (err)
> >                       break;
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only
  2025-04-09  7:06 ` [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only Zhongkun He
  2025-04-10  2:09   ` Andrew Morton
@ 2025-04-30  7:59   ` Dan Carpenter
  2025-05-01  1:56     ` [External] " Zhongkun He
  1 sibling, 1 reply; 12+ messages in thread
From: Dan Carpenter @ 2025-04-30  7:59 UTC (permalink / raw)
  To: Zhongkun He
  Cc: akpm, hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel

On Wed, Apr 09, 2025 at 03:06:19PM +0800, Zhongkun He wrote:
> +		/* set by userspace for anonymous memory only */
> +		if (!strncmp("max", swap_string, sizeof("max"))) {

This pattern of strncmp("foo", str, sizeof("foo")) is exactly the same
as strcmp().  It doesn't provide any additional security.  The strncmp()
function is meant for matching string prefixes and it's a relatively
common bug to do this:

intended: if (strcmp(string, "prefix", sizeof("prefix") - 1) == 0) {
  actual: if (strcmp(string, "prefix", sizeof("prefix")) == 0) {

I have a static checker warning for these:
https://lore.kernel.org/all/30210ed77b40b4b6629de659cb56b9ec7832c447.1744452787.git.dan.carpenter@linaro.org/

If people deliberately misuse the function then it makes it trickier
to tell accidental mistakes from deliberate mistakes.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [External] Re: [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only
  2025-04-30  7:59   ` Dan Carpenter
@ 2025-05-01  1:56     ` Zhongkun He
  2025-05-02  6:58       ` Dan Carpenter
  0 siblings, 1 reply; 12+ messages in thread
From: Zhongkun He @ 2025-05-01  1:56 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: akpm, hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel

Hi Dan

On Wed, Apr 30, 2025 at 3:59 PM Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> On Wed, Apr 09, 2025 at 03:06:19PM +0800, Zhongkun He wrote:
> > +             /* set by userspace for anonymous memory only */
> > +             if (!strncmp("max", swap_string, sizeof("max"))) {
>
> This pattern of strncmp("foo", str, sizeof("foo")) is exactly the same
> as strcmp().  It doesn't provide any additional security.  The strncmp()
> function is meant for matching string prefixes and it's a relatively
> common bug to do this:
>
> intended: if (strcmp(string, "prefix", sizeof("prefix") - 1) == 0) {
>   actual: if (strcmp(string, "prefix", sizeof("prefix")) == 0) {
>

Yes, I understand the difference.

> I have a static checker warning for these:
> https://lore.kernel.org/all/30210ed77b40b4b6629de659cb56b9ec7832c447.1744452787.git.dan.carpenter@linaro.org/
>
> If people deliberately misuse the function then it makes it trickier
> to tell accidental mistakes from deliberate mistakes.
>

if (!strncmp("max", swap_string, sizeof("max"))) {

The length of swap_string is 5 because it's read using sscanf, which
will add the null terminator \0
at the end of the string. If we input max into the interface,
swap_string will contain max\0, which is
equivalent to the string "max". Since we only need to compare the
first few characters(There are other
possible inputs as well.) — effectively treating it as a prefix match
— I used strncmp.

Thank you for the reminder, Dan.

> regards,
> dan carpenter
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [External] Re: [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only
  2025-05-01  1:56     ` [External] " Zhongkun He
@ 2025-05-02  6:58       ` Dan Carpenter
  2025-05-07  3:27         ` Zhongkun He
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Carpenter @ 2025-05-02  6:58 UTC (permalink / raw)
  To: Zhongkun He
  Cc: akpm, hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel

On Thu, May 01, 2025 at 09:56:57AM +0800, Zhongkun He wrote:
> Hi Dan
> 
> On Wed, Apr 30, 2025 at 3:59 PM Dan Carpenter <dan.carpenter@linaro.org> wrote:
> >
> > On Wed, Apr 09, 2025 at 03:06:19PM +0800, Zhongkun He wrote:
> > > +             /* set by userspace for anonymous memory only */
> > > +             if (!strncmp("max", swap_string, sizeof("max"))) {
> >
> > This pattern of strncmp("foo", str, sizeof("foo")) is exactly the same
> > as strcmp().  It doesn't provide any additional security.  The strncmp()
> > function is meant for matching string prefixes and it's a relatively
> > common bug to do this:
> >
> > intended: if (strcmp(string, "prefix", sizeof("prefix") - 1) == 0) {
> >   actual: if (strcmp(string, "prefix", sizeof("prefix")) == 0) {
> >
> 
> Yes, I understand the difference.
> 
> > I have a static checker warning for these:
> > https://lore.kernel.org/all/30210ed77b40b4b6629de659cb56b9ec7832c447.1744452787.git.dan.carpenter@linaro.org/
> >
> > If people deliberately misuse the function then it makes it trickier
> > to tell accidental mistakes from deliberate mistakes.
> >
> 
> if (!strncmp("max", swap_string, sizeof("max"))) {
> 
> The length of swap_string is 5 because it's read using sscanf, which
> will add the null terminator \0
> at the end of the string. If we input max into the interface,
> swap_string will contain max\0, which is
> equivalent to the string "max". Since we only need to compare the
> first few characters(There are other
> possible inputs as well.) — effectively treating it as a prefix match
> — I used strncmp.

I'm a not sure I understand.  You say you are treating it as a "prefix
match", but sizeof("max") is 4 so this is not treated as a prefix.  Did
you mean to write strlen("max") which does not include the NUL
terminator?

regards,
dan carpenter



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [External] Re: [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only
  2025-05-02  6:58       ` Dan Carpenter
@ 2025-05-07  3:27         ` Zhongkun He
  0 siblings, 0 replies; 12+ messages in thread
From: Zhongkun He @ 2025-05-07  3:27 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: akpm, hannes, mhocko, yosry.ahmed, muchun.song, yuzhao, linux-mm,
	linux-kernel

On Fri, May 2, 2025 at 2:59 PM Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> On Thu, May 01, 2025 at 09:56:57AM +0800, Zhongkun He wrote:
> > Hi Dan
> >
> > On Wed, Apr 30, 2025 at 3:59 PM Dan Carpenter <dan.carpenter@linaro.org> wrote:
> > >
> > > On Wed, Apr 09, 2025 at 03:06:19PM +0800, Zhongkun He wrote:
> > > > +             /* set by userspace for anonymous memory only */
> > > > +             if (!strncmp("max", swap_string, sizeof("max"))) {
> > >
> > > This pattern of strncmp("foo", str, sizeof("foo")) is exactly the same
> > > as strcmp().  It doesn't provide any additional security.  The strncmp()
> > > function is meant for matching string prefixes and it's a relatively
> > > common bug to do this:
> > >
> > > intended: if (strcmp(string, "prefix", sizeof("prefix") - 1) == 0) {
> > >   actual: if (strcmp(string, "prefix", sizeof("prefix")) == 0) {
> > >
> >
> > Yes, I understand the difference.
> >
> > > I have a static checker warning for these:
> > > https://lore.kernel.org/all/30210ed77b40b4b6629de659cb56b9ec7832c447.1744452787.git.dan.carpenter@linaro.org/
> > >
> > > If people deliberately misuse the function then it makes it trickier
> > > to tell accidental mistakes from deliberate mistakes.
> > >
> >
> > if (!strncmp("max", swap_string, sizeof("max"))) {
> >
> > The length of swap_string is 5 because it's read using sscanf, which
> > will add the null terminator \0
> > at the end of the string. If we input max into the interface,
> > swap_string will contain max\0, which is
> > equivalent to the string "max". Since we only need to compare the
> > first few characters(There are other
> > possible inputs as well.) — effectively treating it as a prefix match
> > — I used strncmp.
>
> I'm a not sure I understand.  You say you are treating it as a "prefix
> match", but sizeof("max") is 4 so this is not treated as a prefix.  Did
> you mean to write strlen("max") which does not include the NUL
> terminator?
>

Hi Dan,  sorry for the late reply.

I agree with you that we should use strncmp for prefix matches
and I will update it later.

Thanks.

> regards,
> dan carpenter
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-05-07  3:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-09  7:06 [PATCH V3 0/3] add max arg to swappiness in memory.reclaim and lru_gen Zhongkun He
2025-04-09  7:06 ` [PATCH V3 1/3] mm: add swappiness=max arg to memory.reclaim for only anon reclaim Zhongkun He
2025-04-10  2:09   ` Andrew Morton
2025-04-10  3:48     ` [External] " Zhongkun He
2025-04-09  7:06 ` [PATCH V3 2/3] mm: add max swappiness arg to lru_gen for anonymous memory only Zhongkun He
2025-04-10  2:09   ` Andrew Morton
2025-04-10  4:50     ` [External] " Zhongkun He
2025-04-30  7:59   ` Dan Carpenter
2025-05-01  1:56     ` [External] " Zhongkun He
2025-05-02  6:58       ` Dan Carpenter
2025-05-07  3:27         ` Zhongkun He
2025-04-09  7:06 ` [PATCH V3 3/3] mm: vmscan: add more comments about cache_trim_mode Zhongkun He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).