public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap
@ 2026-03-11 11:05 Li Wang
  2026-03-11 11:05 ` [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap Li Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Li Wang @ 2026-03-11 11:05 UTC (permalink / raw)
  To: linux-kselftest, linux-kernel, akpm
  Cc: Johannes Weiner, Michal Hocko, Michal Koutný, Muchun Song,
	Nhat Pham, Tejun Heo, Roman Gushchin, Shakeel Butt, Yosry Ahmed

test_zswap currently checks only for zswap presence via /sys/module/zswap,
but does not account for the global runtime state in
/sys/module/zswap/parameters/enabled.

If zswap is configured but globally disabled, zswap cgroup tests may run in
an invalid environment and fail spuriously.

Add helpers to:
  - detect the runtime zswap enabled state,
  - enable zswap when it is initially disabled,
  - restore the original state after tests complete.

Skip the test when zswap state cannot be determined (e.g. unsupported or
unreadable), and keep existing behavior when zswap is already enabled.

This makes test_zswap more robust across systems where zswap is built but
disabled by default.

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
---
 tools/testing/selftests/cgroup/test_zswap.c | 77 ++++++++++++++++++++-
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 64ebc3f3f203..1c80f4af9683 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -594,18 +594,88 @@ static bool zswap_configured(void)
 	return access("/sys/module/zswap", F_OK) == 0;
 }
 
+static int zswap_enabled_state(void)
+{
+	char buf[16];
+	ssize_t n;
+
+	if (!zswap_configured())
+		return -1;
+
+	n = read_text("/sys/module/zswap/parameters/enabled", buf, sizeof(buf));
+	if (n < 0 || n == 0)
+		return -1;
+
+	switch (buf[0]) {
+	case 'Y':
+	case 'y':
+	case '1':
+		return 1;
+	case 'N':
+	case 'n':
+	case '0':
+		return 0;
+	default:
+		return -1;
+	}
+}
+
+static bool enable_zswap(void)
+{
+	int st;
+	char y[] = "Y\n";
+
+	st = zswap_enabled_state();
+	if (st == 1)
+		return true;
+	if (st < 0)
+		return false;
+
+	if (write_text("/sys/module/zswap/parameters/enabled", y, strlen(y)) >= 0) {
+		if (zswap_enabled_state() == 1)
+			return true;
+	}
+
+	ksft_print_msg("Failed to enable zswap\n");
+	return false;
+}
+
+static bool disable_zswap(void)
+{
+	int st;
+	char n[] = "N\n";
+
+	st = zswap_enabled_state();
+	if (st == 0)
+		return true;
+	if (st < 0)
+		return false;
+
+	if (write_text("/sys/module/zswap/parameters/enabled", n, strlen(n)) >= 0) {
+		if (zswap_enabled_state() == 0)
+			return true;
+	}
+
+	ksft_print_msg("Failed to disable zswap\n");
+	return false;
+}
+
 int main(int argc, char **argv)
 {
 	char root[PATH_MAX];
-	int i;
+	int i, orig_zswap_state;
 
 	ksft_print_header();
 	ksft_set_plan(ARRAY_SIZE(tests));
 	if (cg_find_unified_root(root, sizeof(root), NULL))
 		ksft_exit_skip("cgroup v2 isn't mounted\n");
 
-	if (!zswap_configured())
+	orig_zswap_state = zswap_enabled_state();
+
+	if (orig_zswap_state == -1)
 		ksft_exit_skip("zswap isn't configured\n");
+	else if (orig_zswap_state == 0 && !enable_zswap())
+		ksft_exit_skip("zswap is disabled and cannot be enabled\n");
 
 	/*
 	 * Check that memory controller is available:
@@ -632,5 +702,8 @@ int main(int argc, char **argv)
 		}
 	}
 
+	if (orig_zswap_state == 0)
+		disable_zswap();
+
 	ksft_finished();
 }
-- 
2.53.0



* [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap
  2026-03-11 11:05 [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Li Wang
@ 2026-03-11 11:05 ` Li Wang
  2026-03-11 18:50   ` Yosry Ahmed
  2026-03-11 11:05 ` [PATCH 3/5] selftests/cgroup: use runtime page size for zswpin check Li Wang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Li Wang @ 2026-03-11 11:05 UTC (permalink / raw)
  To: linux-kselftest, linux-kernel, akpm
  Cc: Johannes Weiner, Michal Hocko, Michal Koutný, Muchun Song,
	Nhat Pham, Tejun Heo, Roman Gushchin, Shakeel Butt, Yosry Ahmed

test_swapin_nozswap can hit OOM before reaching its assertions on some
setups. The test currently sets memory.max=8M and then allocates/reads
32M with memory.zswap.max=0, which may over-constrain reclaim and kill
the workload process.

Raise memory.max to 24M so the workload can make forward progress, and
lower the swap_peak expectation from 24M to 8M to keep the check robust
across environments.

The test intent is unchanged: verify that swapping happens while zswap
remains unused when memory.zswap.max=0.

=== Error Logs ===

  # ./test_zswap
  TAP version 13
  1..7
  ok 1 test_zswap_usage
  not ok 2 test_swapin_nozswap
  ...

  # dmesg
  [271641.879153] test_zswap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
  [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump: loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
  [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
  [271641.879173] Call Trace:
  [271641.879174] [c00000037540f730] [c00000000127ec44] dump_stack_lvl+0x88/0xc4 (unreliable)
  [271641.879184] [c00000037540f760] [c0000000005cc594] dump_header+0x5c/0x1e4
  [271641.879188] [c00000037540f7e0] [c0000000005cb464] oom_kill_process+0x324/0x3b0
  [271641.879192] [c00000037540f860] [c0000000005cbe48] out_of_memory+0x118/0x420
  [271641.879196] [c00000037540f8f0] [c00000000070d8ec] mem_cgroup_out_of_memory+0x18c/0x1b0
  [271641.879200] [c00000037540f990] [c000000000713888] try_charge_memcg+0x598/0x890
  [271641.879204] [c00000037540fa70] [c000000000713dbc] charge_memcg+0x5c/0x110
  [271641.879207] [c00000037540faa0] [c0000000007159f8] __mem_cgroup_charge+0x48/0x120
  [271641.879211] [c00000037540fae0] [c000000000641914] alloc_anon_folio+0x2b4/0x5a0
  [271641.879215] [c00000037540fb60] [c000000000641d58] do_anonymous_page+0x158/0x6b0
  [271641.879218] [c00000037540fbd0] [c000000000642f8c] __handle_mm_fault+0x4bc/0x910
  [271641.879221] [c00000037540fcf0] [c000000000643500] handle_mm_fault+0x120/0x3c0
  [271641.879224] [c00000037540fd40] [c00000000014bba0] ___do_page_fault+0x1c0/0x980
  [271641.879228] [c00000037540fdf0] [c00000000014c44c] hash__do_page_fault+0x2c/0xc0
  [271641.879232] [c00000037540fe20] [c0000000001565d8] do_hash_fault+0x128/0x1d0
  [271641.879236] [c00000037540fe50] [c000000000008be0] data_access_common_virt+0x210/0x220
  [271641.879548] Tasks state (memory values in pages):
  ...
  [271641.879550] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
  [271641.879555] [ 177372]     0 177372      571        0        0        0         0    51200       96             0 test_zswap
  [271641.879562] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
  [271641.879578] Memory cgroup out of memory: Killed process 177372 (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
---
 tools/testing/selftests/cgroup/test_zswap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 1c80f4af9683..ccfa6d42a3a3 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -168,7 +168,7 @@ static int test_swapin_nozswap(const char *root)
 		goto out;
 	if (cg_create(test_group))
 		goto out;
-	if (cg_write(test_group, "memory.max", "8M"))
+	if (cg_write(test_group, "memory.max", "24M"))
 		goto out;
 	if (cg_write(test_group, "memory.zswap.max", "0"))
 		goto out;
@@ -184,8 +184,8 @@ static int test_swapin_nozswap(const char *root)
 		goto out;
 	}
 
-	if (swap_peak < MB(24)) {
-		ksft_print_msg("at least 24MB of memory should be swapped out\n");
+	if (swap_peak < MB(8)) {
+		ksft_print_msg("at least 8MB of memory should be swapped out\n");
 		goto out;
 	}
 
-- 
2.53.0
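
The new swap_peak threshold can be read as simple arithmetic: a workload that touches 32M under a 24M memory.max has to keep roughly 32M - 24M = 8M outside resident memory at its peak, and with memory.zswap.max=0 that remainder cannot land in zswap. A minimal sketch of that lower bound follows. The MB() macro is assumed to match the one used by the cgroup selftests, and min_swapped_bytes() is a hypothetical helper that is not part of the patch.

```c
#include <stddef.h>

/* Assumed to match the MB() helper used by the cgroup selftests. */
#define MB(x) ((size_t)(x) << 20)

/*
 * Hypothetical helper: lower bound on the bytes that must leave resident
 * memory when a workload touches `alloc` bytes under a `limit`-byte
 * memory.max. Returns 0 when the allocation fits within the limit.
 */
static size_t min_swapped_bytes(size_t alloc, size_t limit)
{
	return alloc > limit ? alloc - limit : 0;
}
```

With the old values, min_swapped_bytes(MB(32), MB(8)) gives 24M, which lines up with the previous check; with the new limit the same formula gives 8M. This ignores kernel-side accounting details, so it is only a rough model of why the two numbers move together.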



* [PATCH 3/5] selftests/cgroup: use runtime page size for zswpin check
  2026-03-11 11:05 [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Li Wang
  2026-03-11 11:05 ` [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap Li Wang
@ 2026-03-11 11:05 ` Li Wang
  2026-03-11 18:56   ` Yosry Ahmed
  2026-03-11 11:05 ` [PATCH 4/5] selftest/cgroup: fix zswap test_no_invasive_cgroup_shrink on 64K pagesize system Li Wang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Li Wang @ 2026-03-11 11:05 UTC (permalink / raw)
  To: linux-kselftest, linux-kernel, akpm
  Cc: Johannes Weiner, Michal Hocko, Michal Koutný, Muchun Song,
	Nhat Pham, Tejun Heo, Roman Gushchin, Shakeel Butt, Yosry Ahmed

test_zswapin compares memory.stat:zswpin (counted in pages) against a
byte threshold converted with PAGE_SIZE. In cgroup selftests, PAGE_SIZE
is hardcoded to 4096, which makes the conversion wrong on systems with
non-4K base pages (e.g. 64K).

As a result, the test requires too many pages to pass and fails
spuriously even when zswap is working.

Use sysconf(_SC_PAGESIZE) for the zswpin threshold conversion so the
check matches the actual system page size.

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
---
 tools/testing/selftests/cgroup/test_zswap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index ccfa6d42a3a3..af21fa4f5b7b 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -235,7 +235,7 @@ static int test_zswapin(const char *root)
 		goto out;
 	}
 
-	if (zswpin < MB(24) / PAGE_SIZE) {
+	if (zswpin < MB(24) / sysconf(_SC_PAGESIZE)) {
 		ksft_print_msg("at least 24MB should be brought back from zswap\n");
 		goto out;
 	}
-- 
2.53.0



* [PATCH 4/5] selftest/cgroup: fix zswap test_no_invasive_cgroup_shrink on 64K pagesize system
  2026-03-11 11:05 [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Li Wang
  2026-03-11 11:05 ` [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap Li Wang
  2026-03-11 11:05 ` [PATCH 3/5] selftests/cgroup: use runtime page size for zswpin check Li Wang
@ 2026-03-11 11:05 ` Li Wang
  2026-03-11 19:01   ` Yosry Ahmed
  2026-03-11 11:05 ` [PATCH 5/5] selftest/cgroup: fix zswap attempt_writeback() " Li Wang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Li Wang @ 2026-03-11 11:05 UTC (permalink / raw)
  To: linux-kselftest, linux-kernel, akpm
  Cc: Johannes Weiner, Michal Hocko, Michal Koutný, Muchun Song,
	Nhat Pham, Tejun Heo, Roman Gushchin, Shakeel Butt, Yosry Ahmed

The test_no_invasive_cgroup_shrink and allocate_bytes use a hardcoded
stride of 4095 bytes when touching allocated pages. On systems with 64K
page size, this results in writing to the same page multiple times
instead of touching all pages, leading to insufficient memory pressure.

Additionally, the original memory limits and allocation sizes are too
small for 64K page size systems. With only 1M memory.max, there are
very few pages available, and a zswap.max of 10K may not provide enough
room to store even a single compressed page. This can cause OOM kills
or false positives due to insufficient zswap writeback being triggered.

Fix these issues by:
- Using sysconf(_SC_PAGESIZE) instead of the hardcoded 4095 stride in
  both allocate_bytes() and test_no_invasive_cgroup_shrink().
- Increasing memory.max to 32M for both wb_group and control_group to
  ensure enough pages are available regardless of page size.
- Increasing zswap.max from 10K to 64K and allocation sizes from 10M
  to 64M to reliably trigger zswap writeback on all configurations.

=== Error Log ===
  # getconf PAGESIZE
  65536

  # ./test_zswap
  TAP version 13
  ...
  ok 5 test_zswap_writeback_disabled
  ok 6 # SKIP test_no_kmem_bypass
  not ok 7 test_no_invasive_cgroup_shrink

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
---
 tools/testing/selftests/cgroup/test_zswap.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index af21fa4f5b7b..30d3fbf6b4fb 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -83,12 +83,13 @@ static int allocate_and_read_bytes(const char *cgroup, void *arg)
 
 static int allocate_bytes(const char *cgroup, void *arg)
 {
+	long pagesize = sysconf(_SC_PAGESIZE);
 	size_t size = (size_t)arg;
 	char *mem = (char *)malloc(size);
 
 	if (!mem)
 		return -1;
-	for (int i = 0; i < size; i += 4095)
+	for (int i = 0; i < size; i += pagesize)
 		mem[i] = 'a';
 	free(mem);
 	return 0;
@@ -415,34 +416,41 @@ static int test_zswap_writeback_disabled(const char *root)
 static int test_no_invasive_cgroup_shrink(const char *root)
 {
 	int ret = KSFT_FAIL;
-	size_t control_allocation_size = MB(10);
+	long pagesize = sysconf(_SC_PAGESIZE);
+	size_t control_allocation_size = MB(64);
 	char *control_allocation = NULL, *wb_group = NULL, *control_group = NULL;
 
 	wb_group = setup_test_group_1M(root, "per_memcg_wb_test1");
 	if (!wb_group)
 		return KSFT_FAIL;
-	if (cg_write(wb_group, "memory.zswap.max", "10K"))
+	if (cg_write(wb_group, "memory.zswap.max", "64K"))
+		goto out;
+	if (cg_write(wb_group, "memory.max", "32M"))
 		goto out;
+
 	control_group = setup_test_group_1M(root, "per_memcg_wb_test2");
 	if (!control_group)
 		goto out;
+	if (cg_write(control_group, "memory.max", "32M"))
+		goto out;
 
 	/* Push some test_group2 memory into zswap */
 	if (cg_enter_current(control_group))
 		goto out;
 	control_allocation = malloc(control_allocation_size);
-	for (int i = 0; i < control_allocation_size; i += 4095)
+	for (int i = 0; i < control_allocation_size; i += pagesize)
 		control_allocation[i] = 'a';
 	if (cg_read_key_long(control_group, "memory.stat", "zswapped") < 1)
 		goto out;
 
-	/* Allocate 10x memory.max to push wb_group memory into zswap and trigger wb */
-	if (cg_run(wb_group, allocate_bytes, (void *)MB(10)))
+	/* Allocate 2x memory.max to push wb_group memory into zswap and trigger wb */
+	if (cg_run(wb_group, allocate_bytes, (void *)MB(64)))
 		goto out;
 
 	/* Verify that only zswapped memory from gwb_group has been written back */
 	if (get_cg_wb_count(wb_group) > 0 && get_cg_wb_count(control_group) == 0)
 		ret = KSFT_PASS;
+
 out:
 	cg_enter_current(root);
 	if (control_group) {
-- 
2.53.0
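
One way to reason about the stride change is to model the touch loop outside the kernel. The sketch below counts how many writes a loop of the form `for (i = 0; i < size; i += stride)` issues and how many distinct pages those writes reach. Both helpers are hypothetical, MB() is assumed to match the selftests' macro, and the buffer is assumed to be page-aligned.

```c
#include <stddef.h>

/* Assumed to match the MB() helper used by the cgroup selftests. */
#define MB(x) ((size_t)(x) << 20)

/* Hypothetical helper: number of iterations of the touch loop. */
static size_t writes_issued(size_t size, size_t stride)
{
	return size ? (size - 1) / stride + 1 : 0;
}

/*
 * Hypothetical helper: number of distinct pages written by the touch
 * loop, assuming the buffer starts on a page boundary.
 */
static size_t pages_touched(size_t size, size_t stride, size_t pagesize)
{
	size_t count = 0;
	size_t last_page = (size_t)-1;

	for (size_t i = 0; i < size; i += stride) {
		size_t page = i / pagesize;

		/* Offsets only increase, so a new page index is a new page. */
		if (page != last_page) {
			count++;
			last_page = page;
		}
	}
	return count;
}
```

Under this model a 4095-byte stride over 10M on 64K pages issues 2561 writes that land on 160 pages, while a page-sized stride issues exactly 160 writes for the same 160 pages. In other words, the old stride still reaches every page and simply writes each one about 16 times, so the extra pressure in the patch presumably comes from the larger limits and allocation sizes. This is a reading of the arithmetic only, not of kernel behavior.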



* [PATCH 5/5] selftest/cgroup: fix zswap attempt_writeback() on 64K pagesize system
  2026-03-11 11:05 [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Li Wang
                   ` (2 preceding siblings ...)
  2026-03-11 11:05 ` [PATCH 4/5] selftest/cgroup: fix zswap test_no_invasive_cgroup_shrink on 64K pagesize system Li Wang
@ 2026-03-11 11:05 ` Li Wang
  2026-03-11 18:58   ` Yosry Ahmed
  2026-03-11 13:20 ` [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Michal Koutný
  2026-03-11 18:47 ` Yosry Ahmed
  5 siblings, 1 reply; 19+ messages in thread
From: Li Wang @ 2026-03-11 11:05 UTC (permalink / raw)
  To: linux-kselftest, linux-kernel, akpm
  Cc: Johannes Weiner, Michal Hocko, Michal Koutný, Muchun Song,
	Nhat Pham, Tejun Heo, Roman Gushchin, Shakeel Butt, Yosry Ahmed

In attempt_writeback(), a memsize of 4M only covers 64 pages on 64K
page size systems. When memory.reclaim is called, the kernel prefers
reclaiming clean file pages (binary, libc, linker, etc.) over swapping
anonymous pages. With only 64 pages of anonymous memory, the reclaim
target can be largely or entirely satisfied by dropping file pages,
resulting in very few or zero anonymous pages being pushed into zswap.

This causes zswap_usage to be extremely small or zero, making
zswap_usage/2 insufficient to create meaningful writeback pressure.
The test then fails because no writeback is triggered.

On 4K page size systems this is not an issue because 4M covers 1024
pages, and file pages are a small fraction of the reclaim target.

Scale memsize up to 64M on systems with page size larger than 4K, so
that enough anonymous pages are allocated to reliably populate zswap
and trigger writeback. The original 4M is kept for 4K page size systems
to avoid unnecessary memory usage and test runtime.

=== Error Log ===
  # uname -rm
  6.12.0-211.el10.ppc64le ppc64le

  # getconf PAGESIZE
  65536

  # ./test_zswap
  TAP version 13
  1..7
  ok 1 test_zswap_usage
  ok 2 test_swapin_nozswap
  ok 3 test_zswapin
  not ok 4 test_zswap_writeback_enabled
  ...

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
---
 tools/testing/selftests/cgroup/test_zswap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 30d3fbf6b4fb..2d065184eea4 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -264,7 +264,7 @@ static int test_zswapin(const char *root)
 static int attempt_writeback(const char *cgroup, void *arg)
 {
 	long pagesize = sysconf(_SC_PAGESIZE);
-	size_t memsize = MB(4);
+	size_t memsize = pagesize > 4096 ? MB(64) : MB(4);
 	char buf[pagesize];
 	long zswap_usage;
 	bool wb_enabled = *(bool *) arg;
-- 
2.53.0
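
The page counts quoted in the commit message can be checked with a small sketch. pick_memsize() mirrors the selection made in the patch, while pages_covered() is a hypothetical helper added only for illustration; MB() is assumed to match the selftests' macro.

```c
#include <stddef.h>

/* Assumed to match the MB() helper used by the cgroup selftests. */
#define MB(x) ((size_t)(x) << 20)

/* Mirrors the selection in the patch: 64M above 4K pages, else 4M. */
static size_t pick_memsize(long pagesize)
{
	return pagesize > 4096 ? MB(64) : MB(4);
}

/* Hypothetical helper: number of base pages covered by memsize bytes. */
static size_t pages_covered(size_t memsize, long pagesize)
{
	return memsize / (size_t)pagesize;
}
```

On 64K pages the scaled size covers 1024 pages, the same count that 4M covers on 4K pages. Intermediate page sizes such as 16K would get 64M as well and therefore cover more pages than either case, which is harmless for the test but worth knowing when estimating runtime.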



* Re: [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap
  2026-03-11 11:05 [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Li Wang
                   ` (3 preceding siblings ...)
  2026-03-11 11:05 ` [PATCH 5/5] selftest/cgroup: fix zswap attempt_writeback() " Li Wang
@ 2026-03-11 13:20 ` Michal Koutný
  2026-03-11 18:41   ` Yosry Ahmed
  2026-03-11 18:47 ` Yosry Ahmed
  5 siblings, 1 reply; 19+ messages in thread
From: Michal Koutný @ 2026-03-11 13:20 UTC (permalink / raw)
  To: Li Wang
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Muchun Song, Nhat Pham, Tejun Heo, Roman Gushchin,
	Shakeel Butt, Yosry Ahmed


Hi.

On Wed, Mar 11, 2026 at 07:05:19PM +0800, Li Wang <liwang@redhat.com> wrote:
> +static bool disable_zswap(void)
> +{
> +	int st;
> +	char n[] = "N\n";
> +
> +	st = zswap_enabled_state();
> +	if (st == 0)
> +		return true;
> +	if (st < 0)
> +		return false;
> +
> +	if (write_text("/sys/module/zswap/parameters/enabled", n, strlen(n)) >= 0) {
> +		if (zswap_enabled_state() == 0)
> +			return true;
> +	}
> +
> +	ksft_print_msg("Failed to disable zswap\n");

Hm, so this can fail?

> +	return false;
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	char root[PATH_MAX];
> -	int i;
> +	int i, orig_zswap_state;
>  
>  	ksft_print_header();
>  	ksft_set_plan(ARRAY_SIZE(tests));
>  	if (cg_find_unified_root(root, sizeof(root), NULL))
>  		ksft_exit_skip("cgroup v2 isn't mounted\n");
>  
> -	if (!zswap_configured())
> +	orig_zswap_state = zswap_enabled_state();
> +
> +	if (orig_zswap_state == -1)
>  		ksft_exit_skip("zswap isn't configured\n");
> +	else if (orig_zswap_state == 0 && !enable_zswap())
> +		ksft_exit_skip("zswap is disabled and cannot be enabled\n");

I would simply check the enablement state and skip if it's not
satisfactory (possibly printing an instructive message about what to tweak).
(To keep the test simple and leave it up to whoever is responsible for the environment.)

My 0.02€,
Michal



* Re: [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap
  2026-03-11 13:20 ` [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Michal Koutný
@ 2026-03-11 18:41   ` Yosry Ahmed
  0 siblings, 0 replies; 19+ messages in thread
From: Yosry Ahmed @ 2026-03-11 18:41 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Li Wang, linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Muchun Song, Nhat Pham, Tejun Heo, Roman Gushchin,
	Shakeel Butt

> >  int main(int argc, char **argv)
> >  {
> >       char root[PATH_MAX];
> > -     int i;
> > +     int i, orig_zswap_state;
> >
> >       ksft_print_header();
> >       ksft_set_plan(ARRAY_SIZE(tests));
> >       if (cg_find_unified_root(root, sizeof(root), NULL))
> >               ksft_exit_skip("cgroup v2 isn't mounted\n");
> >
> > -     if (!zswap_configured())
> > +     orig_zswap_state = zswap_enabled_state();
> > +
> > +     if (orig_zswap_state == -1)
> >               ksft_exit_skip("zswap isn't configured\n");
> > +     else if (orig_zswap_state == 0 && !enable_zswap())
> > +             ksft_exit_skip("zswap is disabled and cannot be enabled\n");
>
> I would simply check the enablement state and skip if it's not
> satisfactory (possibly printing an instructive message about what to tweak).
> (To keep the test simple and leave it up to whoever is responsible for the environment.)

+1


* Re: [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap
  2026-03-11 11:05 [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Li Wang
                   ` (4 preceding siblings ...)
  2026-03-11 13:20 ` [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Michal Koutný
@ 2026-03-11 18:47 ` Yosry Ahmed
  2026-03-12  1:41   ` Li Wang
  5 siblings, 1 reply; 19+ messages in thread
From: Yosry Ahmed @ 2026-03-11 18:47 UTC (permalink / raw)
  To: Li Wang
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
>
> test_zswap currently checks only for zswap presence via /sys/module/zswap,
> but does not account for the global runtime state in
> /sys/module/zswap/parameters/enabled.
>
> If zswap is configured but globally disabled, zswap cgroup tests may run in
> an invalid environment and fail spuriously.
>
> Add helpers to:
>   - detect the runtime zswap enabled state,
>   - enable zswap when it is initially disabled,
>   - restore the original state after tests complete.
>
> Skip the test when zswap state cannot be determined (e.g. unsupported or
> unreadable), and keep existing behavior when zswap is already enabled.
>
> This makes test_zswap more robust across systems where zswap is built but
> disabled by default.
>
> Signed-off-by: Li Wang <liwang@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Michal Koutný <mkoutny@suse.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Nhat Pham <nphamcs@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> ---
>  tools/testing/selftests/cgroup/test_zswap.c | 77 ++++++++++++++++++++-
>  1 file changed, 75 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
> index 64ebc3f3f203..1c80f4af9683 100644
> --- a/tools/testing/selftests/cgroup/test_zswap.c
> +++ b/tools/testing/selftests/cgroup/test_zswap.c
> @@ -594,18 +594,88 @@ static bool zswap_configured(void)
>         return access("/sys/module/zswap", F_OK) == 0;
>  }
>
> +static int zswap_enabled_state(void)

Just zswap_enabled() is good.

> +{
> +       char buf[16];
> +       ssize_t n;
> +
> +       if (!zswap_configured())
> +               return -1;

Return 0 here, zswap is disabled for all intents and purposes.

> +
> +       n = read_text("/sys/module/zswap/parameters/enabled", buf, sizeof(buf));
> +       if (n < 0 || n == 0)
> +               return -1;

if (read_text(..) <= 0)

> +
> +       switch (buf[0]) {
> +       case 'Y':
> +       case 'y':
> +       case '1':
> +               return 1;
> +       case 'N':
> +       case 'n':
> +       case '0':

Can a read really return any of these values? Or just Y/N?

> +               return 0;
> +       default:
> +               return -1;
> +       }
> +}
> +
[..]
>  int main(int argc, char **argv)
>  {
>         char root[PATH_MAX];
> -       int i;
> +       int i, orig_zswap_state;
>
>         ksft_print_header();
>         ksft_set_plan(ARRAY_SIZE(tests));
>         if (cg_find_unified_root(root, sizeof(root), NULL))
>                 ksft_exit_skip("cgroup v2 isn't mounted\n");
>
> -       if (!zswap_configured())
> +       orig_zswap_state = zswap_enabled_state();
> +
> +       if (orig_zswap_state == -1)
>                 ksft_exit_skip("zswap isn't configured\n");
> +       else if (orig_zswap_state == 0 && !enable_zswap())
> +               ksft_exit_skip("zswap is disabled and cannot be enabled\n");

As Michal mentioned, skip the test if zswap is not enabled.

Assuming zswap_enabled() only returns -1 if it fails to read the
module param (and zswap_configured() is true), then we should probably
fail instead of skip, as it means something is wrong with the test or
the module param.
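
The review comments above could be combined roughly as follows. This is only a sketch of one possible reading, not the code that was eventually posted: parse_zswap_enabled() and zswap_enabled_at() are hypothetical names, the directory argument stands in for /sys/module/zswap so the helper can be tried against a scratch directory, and only Y/N are accepted on the assumption that a bool module parameter reads back as one of those two characters.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical parser for the contents of the "enabled" parameter.
 * Returns 1 for "Y", 0 for "N", and -1 for empty or unexpected input.
 */
static int parse_zswap_enabled(const char *buf, ssize_t n)
{
	if (n <= 0)
		return -1;

	switch (buf[0]) {
	case 'Y':
		return 1;
	case 'N':
		return 0;
	default:
		return -1;
	}
}

/*
 * Hypothetical variant of the helper discussed in the review.
 * `dir` stands in for /sys/module/zswap.
 * Returns 1 if enabled, 0 if disabled or not configured, and -1 if the
 * parameter file should exist but cannot be read or parsed.
 */
static int zswap_enabled_at(const char *dir)
{
	char path[4096];
	char buf[16];
	size_t n;
	int len;
	FILE *f;

	if (access(dir, F_OK) != 0)
		return 0;	/* not configured: treat as disabled */

	len = snprintf(path, sizeof(path), "%s/parameters/enabled", dir);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);

	return parse_zswap_enabled(buf, (ssize_t)n);
}
```

In main(), a return of 0 would then map to ksft_exit_skip() with a hint about the enabled parameter, and -1 to a failure exit, leaving the global zswap state untouched by the test.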


* Re: [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap
  2026-03-11 11:05 ` [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap Li Wang
@ 2026-03-11 18:50   ` Yosry Ahmed
  2026-03-12  4:01     ` Li Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Yosry Ahmed @ 2026-03-11 18:50 UTC (permalink / raw)
  To: Li Wang
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
>
> test_swapin_nozswap can hit OOM before reaching its assertions on some
> setups. The test currently sets memory.max=8M and then allocates/reads
> 32M with memory.zswap.max=0, which may over-constrain reclaim and kill
> the workload process.
>
> Raise memory.max to 24M so the workload can make forward progress, and
> lower the swap_peak expectation from 24M to 8M to keep the check robust
> across environments.
>
> The test intent is unchanged: verify that swapping happens while zswap
> remains unused when memory.zswap.max=0.
>
> === Error Logs ===
>
>   # ./test_zswap
>   TAP version 13
>   1..7
>   ok 1 test_zswap_usage
>   not ok 2 test_swapin_nozswap
>   ...
>
>   # dmesg
>   [271641.879153] test_zswap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
>   [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump: loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
>   [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
>   [271641.879173] Call Trace:
>   [271641.879174] [c00000037540f730] [c00000000127ec44] dump_stack_lvl+0x88/0xc4 (unreliable)
>   [271641.879184] [c00000037540f760] [c0000000005cc594] dump_header+0x5c/0x1e4
>   [271641.879188] [c00000037540f7e0] [c0000000005cb464] oom_kill_process+0x324/0x3b0
>   [271641.879192] [c00000037540f860] [c0000000005cbe48] out_of_memory+0x118/0x420
>   [271641.879196] [c00000037540f8f0] [c00000000070d8ec] mem_cgroup_out_of_memory+0x18c/0x1b0
>   [271641.879200] [c00000037540f990] [c000000000713888] try_charge_memcg+0x598/0x890
>   [271641.879204] [c00000037540fa70] [c000000000713dbc] charge_memcg+0x5c/0x110
>   [271641.879207] [c00000037540faa0] [c0000000007159f8] __mem_cgroup_charge+0x48/0x120
>   [271641.879211] [c00000037540fae0] [c000000000641914] alloc_anon_folio+0x2b4/0x5a0
>   [271641.879215] [c00000037540fb60] [c000000000641d58] do_anonymous_page+0x158/0x6b0
>   [271641.879218] [c00000037540fbd0] [c000000000642f8c] __handle_mm_fault+0x4bc/0x910
>   [271641.879221] [c00000037540fcf0] [c000000000643500] handle_mm_fault+0x120/0x3c0
>   [271641.879224] [c00000037540fd40] [c00000000014bba0] ___do_page_fault+0x1c0/0x980
>   [271641.879228] [c00000037540fdf0] [c00000000014c44c] hash__do_page_fault+0x2c/0xc0
>   [271641.879232] [c00000037540fe20] [c0000000001565d8] do_hash_fault+0x128/0x1d0
>   [271641.879236] [c00000037540fe50] [c000000000008be0] data_access_common_virt+0x210/0x220
>   [271641.879548] Tasks state (memory values in pages):
>   ...
>   [271641.879550] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
>   [271641.879555] [ 177372]     0 177372      571        0        0        0         0    51200       96             0 test_zswap
>   [271641.879562] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
>   [271641.879578] Memory cgroup out of memory: Killed process 177372 (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0

Why are we getting an OOM kill when there's a swap device? Is the
device slow / not keeping up with reclaim pace?


* Re: [PATCH 3/5] selftests/cgroup: use runtime page size for zswpin check
  2026-03-11 11:05 ` [PATCH 3/5] selftests/cgroup: use runtime page size for zswpin check Li Wang
@ 2026-03-11 18:56   ` Yosry Ahmed
  2026-03-12  2:35     ` Li Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Yosry Ahmed @ 2026-03-11 18:56 UTC (permalink / raw)
  To: Li Wang
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
>
> test_zswapin compares memory.stat:zswpin (counted in pages) against a
> byte threshold converted with PAGE_SIZE. In cgroup selftests, PAGE_SIZE
> is hardcoded to 4096, which makes the conversion wrong on systems with
> non-4K base pages (e.g. 64K).

This should be fixed in cgroup_util.h, but I see how that will be
annoying as it's used to initialize a bunch of arrays. Looking a bit
closer it seems like for most of these code paths it's really just a
buffer size, not necessarily the system page size, so maybe we should
just rename it?

For this patch, we actually need the system page size at runtime so
this looks good:

Reviewed-by: Yosry Ahmed <yosry@kernel.org>


* Re: [PATCH 5/5] selftest/cgroup: fix zswap attempt_writeback() on 64K pagesize system
  2026-03-11 11:05 ` [PATCH 5/5] selftest/cgroup: fix zswap attempt_writeback() " Li Wang
@ 2026-03-11 18:58   ` Yosry Ahmed
  2026-03-12  2:38     ` Li Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Yosry Ahmed @ 2026-03-11 18:58 UTC (permalink / raw)
  To: Li Wang
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 4:06 AM Li Wang <liwang@redhat.com> wrote:
>
> In attempt_writeback(), a memsize of 4M only covers 64 pages on 64K
> page size systems. When memory.reclaim is called, the kernel prefers
> reclaiming clean file pages (binary, libc, linker, etc.) over swapping
> anonymous pages. With only 64 pages of anonymous memory, the reclaim
> target can be largely or entirely satisfied by dropping file pages,
> resulting in very few or zero anonymous pages being pushed into zswap.
>
> This causes zswap_usage to be extremely small or zero, making
> zswap_usage/2 insufficient to create meaningful writeback pressure.
> The test then fails because no writeback is triggered.
>
> On 4K page size systems this is not an issue because 4M covers 1024
> pages, and file pages are a small fraction of the reclaim target.
>
> Scale memsize up to 64M on systems with page size larger than 4K, so
> that enough anonymous pages are allocated to reliably populate zswap
> and trigger writeback. The original 4M is kept for 4K page size systems
> to avoid unnecessary memory usage and test runtime.
>
> === Error Log ===
>   # uname -rm
>   6.12.0-211.el10.ppc64le ppc64le
>
>   # getconf PAGESIZE
>   65536
>
>   # ./test_zswap
>   TAP version 13
>   1..7
>   ok 1 test_zswap_usage
>   ok 2 test_swapin_nozswap
>   ok 3 test_zswapin
>   not ok 4 test_zswap_writeback_enabled
>   ...
>
> Signed-off-by: Li Wang <liwang@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Michal Koutný <mkoutny@suse.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Nhat Pham <nphamcs@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> ---
>  tools/testing/selftests/cgroup/test_zswap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
> index 30d3fbf6b4fb..2d065184eea4 100644
> --- a/tools/testing/selftests/cgroup/test_zswap.c
> +++ b/tools/testing/selftests/cgroup/test_zswap.c
> @@ -264,7 +264,7 @@ static int test_zswapin(const char *root)
>  static int attempt_writeback(const char *cgroup, void *arg)
>  {
>         long pagesize = sysconf(_SC_PAGESIZE);
> -       size_t memsize = MB(4);
> +       size_t memsize = pagesize > 4096 ? MB(64): MB(4);

pagesize << 10 or pagesize * 1024?

>         char buf[pagesize];
>         long zswap_usage;
>         bool wb_enabled = *(bool *) arg;
> --
> 2.53.0
>


* Re: [PATCH 4/5] selftest/cgroup: fix zswap test_no_invasive_cgroup_shrink on 64K pagesize system
  2026-03-11 11:05 ` [PATCH 4/5] selftest/cgroup: fix zswap test_no_invasive_cgroup_shrink on 64K pagesize system Li Wang
@ 2026-03-11 19:01   ` Yosry Ahmed
  2026-03-12  2:36     ` Li Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Yosry Ahmed @ 2026-03-11 19:01 UTC (permalink / raw)
  To: Li Wang
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 4:06 AM Li Wang <liwang@redhat.com> wrote:
>
> The test_no_invasive_cgroup_shrink and allocate_bytes use a hardcoded
> stride of 4095 bytes when touching allocated pages. On systems with 64K
> page size, this results in writing to the same page multiple times
> instead of touching all pages, leading to insufficient memory pressure.
>
> Additionally, the original memory limits and allocation sizes are too
> small for 64K page size systems. With only 1M memory.max, there are
> very few pages available, and a zswap.max of 10K may not provide enough
> room to store even a single compressed page. This can cause OOM kills
> or false positives due to insufficient zswap writeback being triggered.
>
> Fix these issues by:
> - Using sysconf(_SC_PAGESIZE) instead of the hardcoded 4095 stride in
>   both allocate_bytes() and test_no_invasive_cgroup_shrink().

AFAICT there are other instances of hardcoded 4095 and 4096 values in
the test, do you mind having a separate patch that updates all of them
to the runtime value?


* Re: [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap
  2026-03-11 18:47 ` Yosry Ahmed
@ 2026-03-12  1:41   ` Li Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Li Wang @ 2026-03-12  1:41 UTC (permalink / raw)
  To: Yosry Ahmed, Michal Koutný
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Muchun Song, Nhat Pham, Tejun Heo, Roman Gushchin,
	Shakeel Butt

On Wed, Mar 11, 2026 at 11:47:22AM -0700, Yosry Ahmed wrote:
...
> > --- a/tools/testing/selftests/cgroup/test_zswap.c
> > +++ b/tools/testing/selftests/cgroup/test_zswap.c
> > @@ -594,18 +594,88 @@ static bool zswap_configured(void)
> >         return access("/sys/module/zswap", F_OK) == 0;
> >  }
> >
> > +static int zswap_enabled_state(void)
> 
> Just zswap_enabled() is good.

+1

> 
> > +{
> > +       char buf[16];
> > +       ssize_t n;
> > +
> > +       if (!zswap_configured())
> > +               return -1;
> 
> Return 0 here, zswap is disabled for all intents and purposes.
> 
> > +
> > +       n = read_text("/sys/module/zswap/parameters/enabled", buf, sizeof(buf));
> > +       if (n < 0 || n == 0)
> > +               return -1;
> 
> if (read_text(..) <= 0)
> 
> > +
> > +       switch (buf[0]) {
> > +       case 'Y':
> > +       case 'y':
> > +       case '1':
> > +               return 1;
> > +       case 'N':
> > +       case 'n':
> > +       case '0':
> 
> Can a read really return any of these values? Or just Y/N?

On my side it only returns 'Y' or 'N'; I accepted the other characters
just for safety and compatibility. Maybe we don't need them.

> >  int main(int argc, char **argv)
> >  {
> >         char root[PATH_MAX];
> > -       int i;
> > +       int i, orig_zswap_state;
> >
> >         ksft_print_header();
> >         ksft_set_plan(ARRAY_SIZE(tests));
> >         if (cg_find_unified_root(root, sizeof(root), NULL))
> >                 ksft_exit_skip("cgroup v2 isn't mounted\n");
> >
> > -       if (!zswap_configured())
> > +       orig_zswap_state = zswap_enabled_state();
> > +
> > +       if (orig_zswap_state == -1)
> >                 ksft_exit_skip("zswap isn't configured\n");
> > +       else if (orig_zswap_state == 0 && !enable_zswap())
> > +               ksft_exit_skip("zswap is disabled and cannot be enabled\n");
> 
> As Michal mentioned, skip the test if zswap is not enabled.

Sure, that would be simpler and we can remove the enable/disable_zswap() functions.

> Assuming zswap_enabled() only returns -1 if it fails to read the
> module param (and zswap_configured() is true), then we should probably
> fail instead of skip, as it means something is wrong with the test or
> the module param.

Absolutely right. Thanks for reviewing.
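
A minimal sketch of how the helper could look after this feedback, assuming
the parameter file only ever reads back 'Y' or 'N'. parse_zswap_enabled() is
a hypothetical name, and plain stdio is used instead of the selftest's
read_text() so that the snippet stands alone; it is not the V2 code.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: map the first byte of the parameter to a state. */
static int parse_zswap_enabled(const char *buf, size_t len)
{
	if (len == 0)
		return -1;
	if (buf[0] == 'Y')
		return 1;
	if (buf[0] == 'N')
		return 0;
	return -1;
}

/* 1: enabled, 0: disabled or not built, -1: parameter unreadable. */
static int zswap_enabled(void)
{
	char buf[16];
	size_t len;
	FILE *f;

	/* No zswap module directory: treat zswap as disabled. */
	if (access("/sys/module/zswap", F_OK) != 0)
		return 0;

	f = fopen("/sys/module/zswap/parameters/enabled", "r");
	if (!f)
		return -1;
	len = fread(buf, 1, sizeof(buf), f);
	fclose(f);

	return parse_zswap_enabled(buf, len);
}
```

With this shape, main() could skip on 0 and fail on -1, as suggested above.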

-- 
Regards,
Li Wang



* Re: [PATCH 3/5] selftests/cgroup: use runtime page size for zswpin check
  2026-03-11 18:56   ` Yosry Ahmed
@ 2026-03-12  2:35     ` Li Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Li Wang @ 2026-03-12  2:35 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 11:56:38AM -0700, Yosry Ahmed wrote:
> On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
> >
> > test_zswapin compares memory.stat:zswpin (counted in pages) against a
> > byte threshold converted with PAGE_SIZE. In cgroup selftests, PAGE_SIZE
> > is hardcoded to 4096, which makes the conversion wrong on systems with
> > non-4K base pages (e.g. 64K).
> 
> This should be fixed in cgroup_util.h, but I see how that will be
> annoying as it's used to initialize a bunch of arrays. Looking a bit
> closer it seems like for most of these code paths it's really just a
> buffer size, not necessarily the system page size, so maybe we should
> just rename it?

Agreed, maybe rename it to BUF_SIZE in a separate patch.

> For this patch, we actually need the system page size at runtime so
> this looks good:
> 
> Reviewed-by: Yosry Ahmed <yosry@kernel.org>

Thanks!
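
To illustrate why the runtime page size matters for this check, here is a
small sketch. zswpin_reached() and the 24M threshold in the note below are
assumptions for illustration only, not the selftest's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: zswpin is reported in pages, so the byte
 * threshold has to be divided by the page size the kernel uses.
 */
static bool zswpin_reached(long zswpin_pages, size_t threshold_bytes,
			   long pagesize)
{
	return zswpin_pages >= (long)(threshold_bytes / (size_t)pagesize);
}
```

With a 24M threshold, a 64K-page system needs 384 pages of zswpin, whereas
dividing by a hardcoded 4096 would demand 6144 pages. A caller would pass
sysconf(_SC_PAGESIZE) as the last argument.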

-- 
Regards,
Li Wang



* Re: [PATCH 4/5] selftest/cgroup: fix zswap test_no_invasive_cgroup_shrink on 64K pagesize system
  2026-03-11 19:01   ` Yosry Ahmed
@ 2026-03-12  2:36     ` Li Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Li Wang @ 2026-03-12  2:36 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 12:01:24PM -0700, Yosry Ahmed wrote:
> On Wed, Mar 11, 2026 at 4:06 AM Li Wang <liwang@redhat.com> wrote:
> >
> > The test_no_invasive_cgroup_shrink and allocate_bytes use a hardcoded
> > stride of 4095 bytes when touching allocated pages. On systems with 64K
> > page size, this results in writing to the same page multiple times
> > instead of touching all pages, leading to insufficient memory pressure.
> >
> > Additionally, the original memory limits and allocation sizes are too
> > small for 64K page size systems. With only 1M memory.max, there are
> > very few pages available, and a zswap.max of 10K may not provide enough
> > room to store even a single compressed page. This can cause OOM kills
> > or false positives due to insufficient zswap writeback being triggered.
> >
> > Fix these issues by:
> > - Using sysconf(_SC_PAGESIZE) instead of the hardcoded 4095 stride in
> >   both allocate_bytes() and test_no_invasive_cgroup_shrink().
> 
> AFAICT there are other instances of hardcoded 4095 and 4096 values in
> the test, do you mind having a separate patch that updates all of them
> to the runtime value?

Good point, I will do that in V2. Thanks!
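
For reference, a rough sketch of a page-size-aware touch loop, assuming one
write per base page is enough to fault each page in. touch_every_page() is a
hypothetical name, not a helper from the selftest.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write one byte per page, return pages touched. */
static size_t touch_every_page(char *mem, size_t size)
{
	size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	size_t touched = 0;

	/* Step by the runtime page size, not a hardcoded 4095/4096. */
	for (size_t i = 0; i < size; i += pagesize) {
		mem[i] = 'a';
		touched++;
	}
	return touched;
}
```

On a 64K-page system a fixed stride of 4095 would write to the same page
about 16 times; the loop above touches each page exactly once.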

-- 
Regards,
Li Wang



* Re: [PATCH 5/5] selftest/cgroup: fix zswap attempt_writeback() on 64K pagesize system
  2026-03-11 18:58   ` Yosry Ahmed
@ 2026-03-12  2:38     ` Li Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Li Wang @ 2026-03-12  2:38 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 11:58:34AM -0700, Yosry Ahmed wrote:
> On Wed, Mar 11, 2026 at 4:06 AM Li Wang <liwang@redhat.com> wrote:
> >
> > In attempt_writeback(), a memsize of 4M only covers 64 pages on 64K
> > page size systems. When memory.reclaim is called, the kernel prefers
> > reclaiming clean file pages (binary, libc, linker, etc.) over swapping
> > anonymous pages. With only 64 pages of anonymous memory, the reclaim
> > target can be largely or entirely satisfied by dropping file pages,
> > resulting in very few or zero anonymous pages being pushed into zswap.
> >
> > This causes zswap_usage to be extremely small or zero, making
> > zswap_usage/2 insufficient to create meaningful writeback pressure.
> > The test then fails because no writeback is triggered.
> >
> > On 4K page size systems this is not an issue because 4M covers 1024
> > pages, and file pages are a small fraction of the reclaim target.
> >
> > Scale memsize up to 64M on systems with page size larger than 4K, so
> > that enough anonymous pages are allocated to reliably populate zswap
> > and trigger writeback. The original 4M is kept for 4K page size systems
> > to avoid unnecessary memory usage and test runtime.
> >
> > === Error Log ===
> >   # uname -rm
> >   6.12.0-211.el10.ppc64le ppc64le
> >
> >   # getconf PAGESIZE
> >   65536
> >
> >   # ./test_zswap
> >   TAP version 13
> >   1..7
> >   ok 1 test_zswap_usage
> >   ok 2 test_swapin_nozswap
> >   ok 3 test_zswapin
> >   not ok 4 test_zswap_writeback_enabled
> >   ...
> >
> > Signed-off-by: Li Wang <liwang@redhat.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Michal Koutný <mkoutny@suse.com>
> > Cc: Muchun Song <muchun.song@linux.dev>
> > Cc: Nhat Pham <nphamcs@gmail.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  tools/testing/selftests/cgroup/test_zswap.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
> > index 30d3fbf6b4fb..2d065184eea4 100644
> > --- a/tools/testing/selftests/cgroup/test_zswap.c
> > +++ b/tools/testing/selftests/cgroup/test_zswap.c
> > @@ -264,7 +264,7 @@ static int test_zswapin(const char *root)
> >  static int attempt_writeback(const char *cgroup, void *arg)
> >  {
> >         long pagesize = sysconf(_SC_PAGESIZE);
> > -       size_t memsize = MB(4);
> > +       size_t memsize = pagesize > 4096 ? MB(64): MB(4);
> 
> pagesize << 10 or pagesize * 1024?

Sure, I will go with pagesize * 1024, which clearly shows the number of pages.
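
A quick sketch of that sizing, with writeback_memsize() as a hypothetical
name used only for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: always size the workload as 1024 base pages. */
static size_t writeback_memsize(long pagesize)
{
	return (size_t)pagesize * 1024;
}
```

This reproduces both values from the patch: 4M with 4K pages and 64M with
64K pages, while also covering other page sizes such as 16K.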

-- 
Regards,
Li Wang



* Re: [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap
  2026-03-11 18:50   ` Yosry Ahmed
@ 2026-03-12  4:01     ` Li Wang
  2026-03-12 17:09       ` Nhat Pham
  0 siblings, 1 reply; 19+ messages in thread
From: Li Wang @ 2026-03-12  4:01 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Nhat Pham,
	Tejun Heo, Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 11:50:05AM -0700, Yosry Ahmed wrote:
> On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
> >
> > test_swapin_nozswap can hit OOM before reaching its assertions on some
> > setups. The test currently sets memory.max=8M and then allocates/reads
> > 32M with memory.zswap.max=0, which may over-constrain reclaim and kill
> > the workload process.
> >
> > Raise memory.max to 24M so the workload can make forward progress, and
> > lower the swap_peak expectation from 24M to 8M to keep the check robust
> > across environments.
> >
> > The test intent is unchanged: verify that swapping happens while zswap
> > remains unused when memory.zswap.max=0.
> >
> > === Error Logs ===
> >
> >   # ./test_zswap
> >   TAP version 13
> >   1..7
> >   ok 1 test_zswap_usage
> >   not ok 2 test_swapin_nozswap
> >   ...
> >
> >   # dmesg
> >   [271641.879153] test_zswap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> >   [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump: loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
> >   [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
> >   [271641.879173] Call Trace:
> >   [271641.879174] [c00000037540f730] [c00000000127ec44] dump_stack_lvl+0x88/0xc4 (unreliable)
> >   [271641.879184] [c00000037540f760] [c0000000005cc594] dump_header+0x5c/0x1e4
> >   [271641.879188] [c00000037540f7e0] [c0000000005cb464] oom_kill_process+0x324/0x3b0
> >   [271641.879192] [c00000037540f860] [c0000000005cbe48] out_of_memory+0x118/0x420
> >   [271641.879196] [c00000037540f8f0] [c00000000070d8ec] mem_cgroup_out_of_memory+0x18c/0x1b0
> >   [271641.879200] [c00000037540f990] [c000000000713888] try_charge_memcg+0x598/0x890
> >   [271641.879204] [c00000037540fa70] [c000000000713dbc] charge_memcg+0x5c/0x110
> >   [271641.879207] [c00000037540faa0] [c0000000007159f8] __mem_cgroup_charge+0x48/0x120
> >   [271641.879211] [c00000037540fae0] [c000000000641914] alloc_anon_folio+0x2b4/0x5a0
> >   [271641.879215] [c00000037540fb60] [c000000000641d58] do_anonymous_page+0x158/0x6b0
> >   [271641.879218] [c00000037540fbd0] [c000000000642f8c] __handle_mm_fault+0x4bc/0x910
> >   [271641.879221] [c00000037540fcf0] [c000000000643500] handle_mm_fault+0x120/0x3c0
> >   [271641.879224] [c00000037540fd40] [c00000000014bba0] ___do_page_fault+0x1c0/0x980
> >   [271641.879228] [c00000037540fdf0] [c00000000014c44c] hash__do_page_fault+0x2c/0xc0
> >   [271641.879232] [c00000037540fe20] [c0000000001565d8] do_hash_fault+0x128/0x1d0
> >   [271641.879236] [c00000037540fe50] [c000000000008be0] data_access_common_virt+0x210/0x220
> >   [271641.879548] Tasks state (memory values in pages):
> >   ...
> >   [271641.879550] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> >   [271641.879555] [ 177372]     0 177372      571        0        0        0         0    51200       96             0 test_zswap
> >   [271641.879562] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
> >   [271641.879578] Memory cgroup out of memory: Killed process 177372 (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0
> 
> Why are we getting an OOM kill when there's a swap device? Is the
> device slow / not keeping up with reclaim pace?

This is a good question. The OOM is triggered very likely because memcg
reclaim can't make forward progress fast enough within the retry budget
of try_charge_memcg.

Looking at the OOM info, the system has 64K pages, so memory.max=8M gives
only 128 pages. At OOM time, RSS is 0 and swapents is only 96. Swap space
itself isn't full, the charge path simply gave up trying to reclaim.

The core issue, I guess, is that with memory.zswap.max=0, every page
reclaimed must go through the real block device. The charge path works
like this: a page fault fires, charge_memcg tries to charge 64K to the
cgroup, the cgroup is at its limit, so try_charge_memcg attempts direct
reclaim to free space. If the swap device can't drain pages fast enough,
the reclaim attempts within the retry loop fail to bring usage below
memory.max, and the kernel invokes OOM, even though swap space is
technically available.

Raising memory.max to 24M gives reclaim a much larger pool to work with,
so it can absorb I/O latency without exhausting its retry budget.
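
The page counts above can be checked with a small sketch.
pages_under_limit() is a hypothetical helper for illustration only.

```c
#include <stddef.h>

/* Number of whole base pages that fit under a memcg byte limit. */
static size_t pages_under_limit(size_t limit_bytes, size_t pagesize)
{
	return limit_bytes / pagesize;
}
```

For memory.max=8M this gives 128 pages at 64K versus 2048 pages at 4K;
raising the limit to 24M gives 384 pages at 64K.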

-- 
Regards,
Li Wang



* Re: [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap
  2026-03-12  4:01     ` Li Wang
@ 2026-03-12 17:09       ` Nhat Pham
  2026-03-13  2:59         ` Li Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Nhat Pham @ 2026-03-12 17:09 UTC (permalink / raw)
  To: Li Wang
  Cc: Yosry Ahmed, linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Tejun Heo,
	Roman Gushchin, Shakeel Butt

On Wed, Mar 11, 2026 at 9:01 PM Li Wang <liwang@redhat.com> wrote:
>
> On Wed, Mar 11, 2026 at 11:50:05AM -0700, Yosry Ahmed wrote:
> > On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
> > >
> > > test_swapin_nozswap can hit OOM before reaching its assertions on some
> > > setups. The test currently sets memory.max=8M and then allocates/reads
> > > 32M with memory.zswap.max=0, which may over-constrain reclaim and kill
> > > the workload process.
> > >
> > > Raise memory.max to 24M so the workload can make forward progress, and
> > > lower the swap_peak expectation from 24M to 8M to keep the check robust
> > > across environments.
> > >
> > > The test intent is unchanged: verify that swapping happens while zswap
> > > remains unused when memory.zswap.max=0.
> > >
> > > === Error Logs ===
> > >
> > >   # ./test_zswap
> > >   TAP version 13
> > >   1..7
> > >   ok 1 test_zswap_usage
> > >   not ok 2 test_swapin_nozswap
> > >   ...
> > >
> > >   # dmesg
> > >   [271641.879153] test_zswap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> > >   [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump: loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
> > >   [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
> > >   [271641.879173] Call Trace:
> > >   [271641.879174] [c00000037540f730] [c00000000127ec44] dump_stack_lvl+0x88/0xc4 (unreliable)
> > >   [271641.879184] [c00000037540f760] [c0000000005cc594] dump_header+0x5c/0x1e4
> > >   [271641.879188] [c00000037540f7e0] [c0000000005cb464] oom_kill_process+0x324/0x3b0
> > >   [271641.879192] [c00000037540f860] [c0000000005cbe48] out_of_memory+0x118/0x420
> > >   [271641.879196] [c00000037540f8f0] [c00000000070d8ec] mem_cgroup_out_of_memory+0x18c/0x1b0
> > >   [271641.879200] [c00000037540f990] [c000000000713888] try_charge_memcg+0x598/0x890
> > >   [271641.879204] [c00000037540fa70] [c000000000713dbc] charge_memcg+0x5c/0x110
> > >   [271641.879207] [c00000037540faa0] [c0000000007159f8] __mem_cgroup_charge+0x48/0x120
> > >   [271641.879211] [c00000037540fae0] [c000000000641914] alloc_anon_folio+0x2b4/0x5a0
> > >   [271641.879215] [c00000037540fb60] [c000000000641d58] do_anonymous_page+0x158/0x6b0
> > >   [271641.879218] [c00000037540fbd0] [c000000000642f8c] __handle_mm_fault+0x4bc/0x910
> > >   [271641.879221] [c00000037540fcf0] [c000000000643500] handle_mm_fault+0x120/0x3c0
> > >   [271641.879224] [c00000037540fd40] [c00000000014bba0] ___do_page_fault+0x1c0/0x980
> > >   [271641.879228] [c00000037540fdf0] [c00000000014c44c] hash__do_page_fault+0x2c/0xc0
> > >   [271641.879232] [c00000037540fe20] [c0000000001565d8] do_hash_fault+0x128/0x1d0
> > >   [271641.879236] [c00000037540fe50] [c000000000008be0] data_access_common_virt+0x210/0x220
> > >   [271641.879548] Tasks state (memory values in pages):
> > >   ...
> > >   [271641.879550] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > >   [271641.879555] [ 177372]     0 177372      571        0        0        0         0    51200       96             0 test_zswap
> > >   [271641.879562] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
> > >   [271641.879578] Memory cgroup out of memory: Killed process 177372 (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0
> >
> > Why are we getting an OOM kill when there's a swap device? Is the
> > device slow / not keeping up with reclaim pace?
>
> This is a good question. The OOM is triggered very likely because memcg
> reclaim can't make forward progress fast enough within the retry budget
> of try_charge_memcg.
>
> Looking at the OOM info, the system has 64K pages, so memory.max=8M gives
> only 128 pages. At OOM time, RSS is 0 and swapents is only 96. Swap space
> itself isn't full, the charge path simply gave up trying to reclaim.
>
> The core issue, I guess, is that with memory.zswap.max=0, every page
> reclaimed must go through the real block device. The charge path works
> like this: a page fault fires, charge_memcg tries to charge 64K to the
> cgroup, the cgroup is at its limit, so try_charge_memcg attempts direct
> reclaim to free space. If the swap device can't drain pages fast enough,
> the reclaim attempts within the retry loop fail to bring usage below
> memory.max, and the kernel invokes OOM, even though swap space is
> technically available.
>
> Raising memory.max to 24M gives reclaim a much larger pool to work with,
> so it can absorb I/O latency without exhausting its retry budget.

Hmmm, perhaps we should change all these constants to multiples of
base page size of a system?


* Re: [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap
  2026-03-12 17:09       ` Nhat Pham
@ 2026-03-13  2:59         ` Li Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Li Wang @ 2026-03-13  2:59 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Yosry Ahmed, linux-kselftest, linux-kernel, akpm, Johannes Weiner,
	Michal Hocko, Michal Koutný, Muchun Song, Tejun Heo,
	Roman Gushchin, Shakeel Butt

On Thu, Mar 12, 2026 at 10:09:10AM -0700, Nhat Pham wrote:
> On Wed, Mar 11, 2026 at 9:01 PM Li Wang <liwang@redhat.com> wrote:
> >
> > On Wed, Mar 11, 2026 at 11:50:05AM -0700, Yosry Ahmed wrote:
> > > On Wed, Mar 11, 2026 at 4:05 AM Li Wang <liwang@redhat.com> wrote:
> > > >
> > > > test_swapin_nozswap can hit OOM before reaching its assertions on some
> > > > setups. The test currently sets memory.max=8M and then allocates/reads
> > > > 32M with memory.zswap.max=0, which may over-constrain reclaim and kill
> > > > the workload process.
> > > >
> > > > Raise memory.max to 24M so the workload can make forward progress, and
> > > > lower the swap_peak expectation from 24M to 8M to keep the check robust
> > > > across environments.
> > > >
> > > > The test intent is unchanged: verify that swapping happens while zswap
> > > > remains unused when memory.zswap.max=0.
> > > >
> > > > === Error Logs ===
> > > >
> > > >   # ./test_zswap
> > > >   TAP version 13
> > > >   1..7
> > > >   ok 1 test_zswap_usage
> > > >   not ok 2 test_swapin_nozswap
> > > >   ...
> > > >
> > > >   # dmesg
> > > >   [271641.879153] test_zswap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> > > >   [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump: loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
> > > >   [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
> > > >   [271641.879173] Call Trace:
> > > >   [271641.879174] [c00000037540f730] [c00000000127ec44] dump_stack_lvl+0x88/0xc4 (unreliable)
> > > >   [271641.879184] [c00000037540f760] [c0000000005cc594] dump_header+0x5c/0x1e4
> > > >   [271641.879188] [c00000037540f7e0] [c0000000005cb464] oom_kill_process+0x324/0x3b0
> > > >   [271641.879192] [c00000037540f860] [c0000000005cbe48] out_of_memory+0x118/0x420
> > > >   [271641.879196] [c00000037540f8f0] [c00000000070d8ec] mem_cgroup_out_of_memory+0x18c/0x1b0
> > > >   [271641.879200] [c00000037540f990] [c000000000713888] try_charge_memcg+0x598/0x890
> > > >   [271641.879204] [c00000037540fa70] [c000000000713dbc] charge_memcg+0x5c/0x110
> > > >   [271641.879207] [c00000037540faa0] [c0000000007159f8] __mem_cgroup_charge+0x48/0x120
> > > >   [271641.879211] [c00000037540fae0] [c000000000641914] alloc_anon_folio+0x2b4/0x5a0
> > > >   [271641.879215] [c00000037540fb60] [c000000000641d58] do_anonymous_page+0x158/0x6b0
> > > >   [271641.879218] [c00000037540fbd0] [c000000000642f8c] __handle_mm_fault+0x4bc/0x910
> > > >   [271641.879221] [c00000037540fcf0] [c000000000643500] handle_mm_fault+0x120/0x3c0
> > > >   [271641.879224] [c00000037540fd40] [c00000000014bba0] ___do_page_fault+0x1c0/0x980
> > > >   [271641.879228] [c00000037540fdf0] [c00000000014c44c] hash__do_page_fault+0x2c/0xc0
> > > >   [271641.879232] [c00000037540fe20] [c0000000001565d8] do_hash_fault+0x128/0x1d0
> > > >   [271641.879236] [c00000037540fe50] [c000000000008be0] data_access_common_virt+0x210/0x220
> > > >   [271641.879548] Tasks state (memory values in pages):
> > > >   ...
> > > >   [271641.879550] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > > >   [271641.879555] [ 177372]     0 177372      571        0        0        0         0    51200       96             0 test_zswap
> > > >   [271641.879562] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
> > > >   [271641.879578] Memory cgroup out of memory: Killed process 177372 (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0
> > >
> > > Why are we getting an OOM kill when there's a swap device? Is the
> > > device slow / not keeping up with reclaim pace?
> >
> > This is a good question. The OOM is triggered very likely because memcg
> > reclaim can't make forward progress fast enough within the retry budget
> > of try_charge_memcg.
> >
> > Looking at the OOM info, the system has 64K pages, so memory.max=8M gives
> > only 128 pages. At OOM time, RSS is 0 and swapents is only 96. Swap space
> > itself isn't full, the charge path simply gave up trying to reclaim.
> >
> > The core issue, I guess, is that with memory.zswap.max=0, every page
> > reclaimed must go through the real block device. The charge path works
> > like this: a page fault fires, charge_memcg tries to charge 64K to the
> > cgroup, the cgroup is at its limit, so try_charge_memcg attempts direct
> > reclaim to free space. If the swap device can't drain pages fast enough,
> > the reclaim attempts within the retry loop fail to bring usage below
> > memory.max, and the kernel invokes OOM, even though swap space is
> > technically available.
> >
> > Raising memory.max to 24M gives reclaim a much larger pool to work with,
> > so it can absorb I/O latency without exhausting its retry budget.
> 
> Hmmm, perhaps we should change all these constants to multiples of
> base page size of a system?

Yeah, this may be better; let me try it in the next version.

-- 
Regards,
Li Wang



Thread overview: 19+ messages
2026-03-11 11:05 [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Li Wang
2026-03-11 11:05 ` [PATCH 2/5] selftests/cgroup: avoid OOM in test_swapin_nozswap Li Wang
2026-03-11 18:50   ` Yosry Ahmed
2026-03-12  4:01     ` Li Wang
2026-03-12 17:09       ` Nhat Pham
2026-03-13  2:59         ` Li Wang
2026-03-11 11:05 ` [PATCH 3/5] selftests/cgroup: use runtime page size for zswpin check Li Wang
2026-03-11 18:56   ` Yosry Ahmed
2026-03-12  2:35     ` Li Wang
2026-03-11 11:05 ` [PATCH 4/5] selftest/cgroup: fix zswap test_no_invasive_cgroup_shrink on 64K pagesize system Li Wang
2026-03-11 19:01   ` Yosry Ahmed
2026-03-12  2:36     ` Li Wang
2026-03-11 11:05 ` [PATCH 5/5] selftest/cgroup: fix zswap attempt_writeback() " Li Wang
2026-03-11 18:58   ` Yosry Ahmed
2026-03-12  2:38     ` Li Wang
2026-03-11 13:20 ` [PATCH 1/5] selftests/cgroup: detect and handle global zswap state in test_zswap Michal Koutný
2026-03-11 18:41   ` Yosry Ahmed
2026-03-11 18:47 ` Yosry Ahmed
2026-03-12  1:41   ` Li Wang
