* [PATCH v5 0/6] mm/zswap: Implement per-cgroup proactive writeback
@ 2026-06-29 11:20 Hao Jia
  2026-06-29 11:20 ` [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled Hao Jia
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Hao Jia @ 2026-06-29 11:20 UTC (permalink / raw)
  To: akpm, tj, hannes, shakeel.butt, mhocko, yosry, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin
  Cc: linux-mm, linux-kernel, linux-doc, Hao Jia

From: Hao Jia <jiahao1@lixiang.com>

Zswap currently writes back pages to backing swap reactively, triggered
either by the shrinker or by the pool reaching its size limit. Although
proactive memory reclaim can automatically write back a portion of zswap
pages via the shrinker, it cannot explicitly control the amount of
writeback for a specific memory cgroup. Moreover, proactive memory reclaim
may not always be triggered during a steady state.

In certain scenarios, it is desirable to trigger writeback in advance to
free up memory. For example, users may want to prepare for an upcoming
memory-intensive workload by flushing cold memory to the backing storage
when the system is relatively idle.

This patch series introduces a "source=zswap" key to the memory.reclaim
cgroup interface, allowing users to proactively write back cold compressed
data from zswap to the backing swap device. When specified, this key
bypasses standard memory reclaim and exclusively performs proactive zswap
writeback up to the requested budget. If omitted, the default reclaim
behavior remains unchanged.

Example usage:
  # Write back 10MB of compressed data from zswap to the backing swap
  echo "10M source=zswap" > memory.reclaim

Patch 1: Fix the global shrinker, which writes back nothing when memory
  cgroup is disabled.
Patch 2: Extend shrink_memcg() to support batch writeback, improving
  writeback efficiency in the shrink_worker() path, and update its return
  value semantics.
Patch 3: Extract a reusable writeback helper zswap_shrink_one_memcg() from
  shrink_worker().
Patch 4: Extend the memory.reclaim cgroup v2 interface with a new
  "source=zswap" key, allowing users to trigger proactive zswap
  writeback up to a requested budget.
Patch 5: Add the zswpwb_proactive_b stat to track the compressed bytes
  of proactive writeback for better monitoring and tuning.
Patch 6: Add tests for zswap proactive writeback.

v4->v5:
  - Add a new patch to fix the global shrinker writing back nothing when
    memory cgroup is disabled.
  - Simplify batch writeback logic in shrink_memcg() and improve comments.
  - Refactor the writeback retry helper out of shrink_worker().
  - Replace the "zswap_writeback_only" memory.reclaim key with a more
    extensible "source=zswap" key, leaving room for selecting other
    reclaim sources in the future.

v3->v4:
  - Drop the per-memcg cursor and keep the root cgroup cursor
    (zswap_next_shrink) logic intact.
  - Stick to using the zswap_writeback_only key, and change the proactive
    writeback size to use the compressed size.
  - Consolidate and reuse the logic between shrink_worker() and
    shrink_memcg(). Enable batch writeback in the shrink_worker() path,
    while maintaining a low writeback budget in the zswap_store() path.

v2->v3:
  - Align the return value of zswap_proactive_writeback() with
    memory.reclaim and update the documentation accordingly.
  - Resolve conflicts in test_zswap.c on the mm-unstable branch.
  - Enhance the zswap proactive writeback selftests to guard against
    potential future regressions.

v1->v2:
  - As suggested by Yosry and Nhat, extend the memory.reclaim cgroup v2
    interface with a "zswap_writeback_only" key instead of adding a new
    dedicated cgroup interface.
  - Update the zswap documentation and add selftests for proactive writeback.

[v4] https://lore.kernel.org/all/20260618044857.69439-1-jiahao.kernel@gmail.com
[v3] https://lore.kernel.org/all/20260526114601.67041-1-jiahao.kernel@gmail.com
[v2] https://lore.kernel.org/all/20260525122242.36127-1-jiahao.kernel@gmail.com
[v1] https://lore.kernel.org/all/20260511105149.75584-1-jiahao.kernel@gmail.com

Hao Jia (6):
  mm/zswap: Fix global shrinker when memory cgroup is disabled
  mm/zswap: Support batch writeback in shrink_memcg()
  mm/zswap: Extract a reusable writeback helper from shrink_worker()
  mm/zswap: Implement proactive writeback
  mm/zswap: Add per-memcg stat for proactive writeback
  selftests/cgroup: Add tests for zswap proactive writeback

 Documentation/admin-guide/cgroup-v2.rst     |  22 +-
 Documentation/admin-guide/mm/zswap.rst      |  11 +-
 include/linux/memcontrol.h                  |   1 +
 include/linux/zswap.h                       |   7 +
 mm/memcontrol.c                             |   3 +
 mm/vmscan.c                                 |  24 +-
 mm/zswap.c                                  | 239 +++++++++++++++-----
 tools/testing/selftests/cgroup/test_zswap.c | 161 ++++++++++++-
 8 files changed, 410 insertions(+), 58 deletions(-)

-- 
2.34.1



* [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled
  2026-06-29 11:20 [PATCH v5 0/6] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
@ 2026-06-29 11:20 ` Hao Jia
  2026-06-29 18:37   ` Nhat Pham
  2026-06-29 11:20 ` [PATCH v5 2/6] mm/zswap: Support batch writeback in shrink_memcg() Hao Jia
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Hao Jia @ 2026-06-29 11:20 UTC (permalink / raw)
  To: akpm, tj, hannes, shakeel.butt, mhocko, yosry, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin
  Cc: linux-mm, linux-kernel, linux-doc, Hao Jia, stable

From: Hao Jia <jiahao1@lixiang.com>

When memory cgroup is disabled, mem_cgroup_iter() always returns NULL.
As a result, the global shrinker shrink_worker() always takes the !memcg
branch and never reaches shrink_memcg(). After MAX_RECLAIM_RETRIES empty
walks, the worker gives up without having written back anything.

Fix this by skipping the !memcg branch when memory cgroup is disabled and
falling through to shrink the root memcg directly. Stop the loop once
shrink_memcg() reports -ENOENT, since the root LRU is the only target and
-ENOENT means it has been exhausted.
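The failure mode can be reproduced with a small userspace model of the shrink_worker() loop. This is a sketch, not kernel code. It assumes MAX_RECLAIM_RETRIES is 16, ignores the accept threshold, and treats each shrink_memcg() call as writing back one entry. The with_fix flag selects the behavior before or after this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RECLAIM_RETRIES 16	/* assumed value; see mm/internal.h */

/*
 * Userspace model (not kernel code) of shrink_worker() with memory cgroup
 * disabled, so the memcg iterator yields NULL on every step.
 * @root_entries is the number of entries on the root zswap LRU.
 * Returns the number of entries written back.
 */
static int model_shrink_worker(int root_entries, bool with_fix)
{
	int written = 0, failures = 0, attempts = 0;

	for (;;) {
		if (!with_fix) {
			/*
			 * Old logic: NULL always looks like "tree walk done".
			 * attempts is only bumped after shrink_memcg(), which
			 * is never reached, so every step counts as a failure.
			 */
			if (!attempts && ++failures == MAX_RECLAIM_RETRIES)
				break;
			attempts = 0;
			continue;
		}

		/* New logic: fall through and shrink the root LRU directly. */
		if (root_entries == 0)
			break;	/* -ENOENT on the only target: stop */
		root_entries--;
		written++;
	}
	return written;
}
```

With the old logic, model_shrink_worker(100, false) gives up after 16 iterations without writing anything back. With the fix, model_shrink_worker(100, true) drains all 100 entries.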

Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Cc: stable@vger.kernel.org
Reported-by: Yosry Ahmed <yosry@kernel.org>
Closes: https://lore.kernel.org/all/CAO9r8zPVzMKFbCixxD-qgtRrkFxWVrHiZZeLc=eyTPKPVQgX4g@mail.gmail.com
Signed-off-by: Hao Jia <jiahao1@lixiang.com>
---
 mm/zswap.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 761cd699e0a3..0f8f04f22888 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1356,7 +1356,12 @@ static void shrink_worker(struct work_struct *w)
 		} while (memcg && !mem_cgroup_tryget_online(memcg));
 		spin_unlock(&zswap_shrink_lock);
 
-		if (!memcg) {
+		/*
+		 * Reaching a NULL memcg means a full hierarchy pass completed.
+		 * Exclude the memcg-disabled case, where it is always NULL, and
+		 * fall through to shrink the root LRU directly.
+		 */
+		if (!memcg && !mem_cgroup_disabled()) {
 			/*
 			 * Continue shrinking without incrementing failures if
 			 * we found candidate memcgs in the last tree walk.
@@ -1378,8 +1383,15 @@ static void shrink_worker(struct work_struct *w)
 		 * with pages in zswap. Skip this without incrementing attempts
 		 * and failures.
 		 */
-		if (ret == -ENOENT)
+		if (ret == -ENOENT) {
+			/*
+			 * With memcg disabled the root LRU is the only target, so
+			 * we should abort if it has no writeback-candidate pages.
+			 */
+			if (mem_cgroup_disabled())
+				break;
 			continue;
+		}
 		++attempts;
 
 		if (ret && ++failures == MAX_RECLAIM_RETRIES)
-- 
2.34.1



* [PATCH v5 2/6] mm/zswap: Support batch writeback in shrink_memcg()
  2026-06-29 11:20 [PATCH v5 0/6] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
  2026-06-29 11:20 ` [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled Hao Jia
@ 2026-06-29 11:20 ` Hao Jia
  2026-06-30  0:21   ` Yosry Ahmed
  2026-06-29 11:20 ` [PATCH v5 3/6] mm/zswap: Extract a reusable writeback helper from shrink_worker() Hao Jia
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Hao Jia @ 2026-06-29 11:20 UTC (permalink / raw)
  To: akpm, tj, hannes, shakeel.butt, mhocko, yosry, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin
  Cc: linux-mm, linux-kernel, linux-doc, Hao Jia

From: Hao Jia <jiahao1@lixiang.com>

Currently, shrink_memcg() writes back at most one entry per node during
its traversal. This makes shrink_worker() inefficient, as it must
repeatedly re-enter shrink_memcg() to make any substantial progress.

To address this, extend shrink_memcg() and rewrite its LRU iteration logic
to support batch writeback. Introduce the nr_to_scan parameter to bound how
many pages are scanned per call. This enables batch writeback in the
shrink_worker() path, while maintaining a low scan budget in the
zswap_store() path.

Additionally, in preparation for proactive writeback, update the return
value semantics of shrink_memcg(). A non-negative value is the number of
compressed bytes actually written back; 0 means candidates existed but no
writeback succeeded. -ENOENT means @memcg had nothing to scan (writeback
disabled, zombie cgroup, or empty zswap LRUs).
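The budget accounting can be illustrated with a userspace model of the reworked loop. This is a sketch under stated assumptions, not the kernel code itself. The model_* names are hypothetical, and every entry is assumed to compress to a fixed ENTRY_BYTES. A node's walk can be made to hit an -EEXIST entry, which shows how the unused share of the budget flows on to the next node.

```c
#include <assert.h>
#include <errno.h>

#define ENTRY_BYTES 1000L	/* hypothetical compressed size of every entry */

/* One node's zswap LRU in this userspace model. */
struct model_node {
	unsigned long lru_len;	/* entries currently on the LRU */
	unsigned long stop_at;	/* 1-based visit that hits -EEXIST; 0 = never */
};

/*
 * Stand-in for list_lru_walk_one() + shrink_memcg_cb(): like list_lru,
 * decrement *nr_to_walk before visiting each entry, then either write
 * the entry back or stop the walk (LRU_STOP) on the -EEXIST entry.
 */
static void model_walk_one(struct model_node *n, unsigned long *nr_to_walk,
			   long *bytes_written)
{
	unsigned long visited = 0;

	while (*nr_to_walk && n->lru_len) {
		--*nr_to_walk;
		if (++visited == n->stop_at)
			return;			/* LRU_STOP: entry stays put */
		n->lru_len--;			/* written back and removed */
		*bytes_written += ENTRY_BYTES;
	}
}

/* Mirrors the budget bookkeeping of the reworked shrink_memcg(). */
static long model_shrink_memcg(struct model_node *nodes, int nr_nodes,
			       unsigned long nr_to_scan)
{
	unsigned long nr_remaining = nr_to_scan;
	long bytes_written = 0;

	assert(nr_nodes >= 0);
	for (int nid = 0; nid < nr_nodes; nid++) {
		/* Cap at the per-node LRU length: each entry is seen at most once. */
		unsigned long nr_to_walk = nr_remaining < nodes[nid].lru_len ?
					   nr_remaining : nodes[nid].lru_len;

		if (!nr_to_walk)
			continue;
		nr_remaining -= nr_to_walk;
		model_walk_one(&nodes[nid], &nr_to_walk, &bytes_written);
		nr_remaining += nr_to_walk;	/* return the unused share */
		if (!nr_remaining)
			break;
	}

	/* Nothing was scanned: every LRU was empty. */
	if (nr_remaining == nr_to_scan)
		return -ENOENT;
	return bytes_written;	/* 0: candidates existed, none written */
}
```

For example, take nr_to_scan = 4 with an -EEXIST entry at the head of node 0. Node 0 consumes one unit of budget and returns the other three, so node 1 writes back three entries.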

Suggested-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Hao Jia <jiahao1@lixiang.com>
---
 mm/zswap.c | 89 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 69 insertions(+), 20 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 0f8f04f22888..e2c2a3f1e061 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -160,6 +160,11 @@ struct zswap_pool {
 	char tfm_name[CRYPTO_MAX_ALG_NAME];
 };
 
+struct zswap_shrink_walk_arg {
+	unsigned long bytes_written;
+	bool encountered_page_in_swapcache;
+};
+
 /* Global LRU lists shared by all zswap pools. */
 static struct list_lru zswap_list_lru;
 
@@ -1089,8 +1094,9 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 				       void *arg)
 {
 	struct zswap_entry *entry = container_of(item, struct zswap_entry, lru);
-	bool *encountered_page_in_swapcache = (bool *)arg;
+	struct zswap_shrink_walk_arg *walk_arg = arg;
 	swp_entry_t swpentry;
+	unsigned int length;
 	enum lru_status ret = LRU_REMOVED_RETRY;
 	int writeback_result;
 
@@ -1133,10 +1139,11 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 
 	/*
 	 * Once the lru lock is dropped, the entry might get freed. The
-	 * swpentry is copied to the stack, and entry isn't deref'd again
-	 * until the entry is verified to still be alive in the tree.
+	 * needed fields are copied to the stack, and entry isn't deref'd
+	 * again until it is verified to still be alive in the tree.
 	 */
 	swpentry = entry->swpentry;
+	length = entry->length;
 
 	/*
 	 * It's safe to drop the lock here because we return either
@@ -1155,12 +1162,13 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 		 * into the warmer region. We should terminate shrinking (if we're in the dynamic
 		 * shrinker context).
 		 */
-		if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
+		if (writeback_result == -EEXIST) {
 			ret = LRU_STOP;
-			*encountered_page_in_swapcache = true;
+			walk_arg->encountered_page_in_swapcache = true;
 		}
 	} else {
 		zswap_written_back_pages++;
+		walk_arg->bytes_written += length;
 	}
 
 	return ret;
@@ -1169,8 +1177,11 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 		struct shrink_control *sc)
 {
+	struct zswap_shrink_walk_arg walk_arg = {
+		.bytes_written = 0,
+		.encountered_page_in_swapcache = false,
+	};
 	unsigned long shrink_ret;
-	bool encountered_page_in_swapcache = false;
 
 	if (!zswap_shrinker_enabled ||
 			!mem_cgroup_zswap_writeback_enabled(sc->memcg)) {
@@ -1179,9 +1190,9 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 	}
 
 	shrink_ret = list_lru_shrink_walk(&zswap_list_lru, sc, &shrink_memcg_cb,
-		&encountered_page_in_swapcache);
+		&walk_arg);
 
-	if (encountered_page_in_swapcache)
+	if (walk_arg.encountered_page_in_swapcache)
 		return SHRINK_STOP;
 
 	return shrink_ret ? shrink_ret : SHRINK_STOP;
@@ -1275,9 +1286,31 @@ static struct shrinker *zswap_alloc_shrinker(void)
 	return shrinker;
 }
 
-static int shrink_memcg(struct mem_cgroup *memcg)
+#define NR_ZSWAP_WB_BATCH	64UL
+
+/*
+ * Scan up to @nr_to_scan pages across the per-node zswap LRUs of @memcg
+ * and write back the reclaimable ones.
+ *
+ * Since the second-chance algorithm rotates referenced entries to the
+ * LRU tail, the per-node scan is capped at the current LRU length so
+ * each entry is scanned at most once per call. It is up to the caller
+ * to handle retries, deciding whether to scan another memcg to complete
+ * the full iteration, or to rescan the current memcg to drain its zswap
+ * entries.
+ *
+ * Return: The number of compressed bytes written back (>= 0), or -ENOENT
+ * if @memcg has writeback disabled, is a zombie cgroup, or has empty
+ * zswap LRUs.
+ */
+static long shrink_memcg(struct mem_cgroup *memcg, unsigned long nr_to_scan)
 {
-	int nid, shrunk = 0, scanned = 0;
+	struct zswap_shrink_walk_arg walk_arg = {
+		.bytes_written = 0,
+		.encountered_page_in_swapcache = false,
+	};
+	unsigned long nr_remaining = nr_to_scan;
+	int nid;
 
 	if (!mem_cgroup_zswap_writeback_enabled(memcg))
 		return -ENOENT;
@@ -1290,24 +1323,40 @@ static int shrink_memcg(struct mem_cgroup *memcg)
 		return -ENOENT;
 
 	for_each_node_state(nid, N_NORMAL_MEMORY) {
-		unsigned long nr_to_walk = 1;
+		unsigned long nr_to_walk;
 
-		shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
-					    &shrink_memcg_cb, NULL, &nr_to_walk);
-		scanned += 1 - nr_to_walk;
+		/*
+		 * Cap the scan at per-node LRU length so each entry is scanned
+		 * at most once per call.
+		 */
+		nr_to_walk = min(nr_remaining,
+				 list_lru_count_one(&zswap_list_lru, nid, memcg));
+		if (!nr_to_walk)
+			continue;
+
+		nr_remaining -= nr_to_walk;
+		list_lru_walk_one(&zswap_list_lru, nid, memcg, &shrink_memcg_cb,
+				  &walk_arg, &nr_to_walk);
+		/* Return the unused share of the budget to the pool. */
+		nr_remaining += nr_to_walk;
+
+		if (!nr_remaining)
+			break;
 	}
 
-	if (!scanned)
+	/* Nothing was scanned: every LRU under @memcg was empty. */
+	if (nr_remaining == nr_to_scan)
 		return -ENOENT;
 
-	return shrunk ? 0 : -EAGAIN;
+	return walk_arg.bytes_written;
 }
 
 static void shrink_worker(struct work_struct *w)
 {
 	struct mem_cgroup *memcg;
-	int ret, failures = 0, attempts = 0;
+	int failures = 0, attempts = 0;
 	unsigned long thr;
+	long ret;
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
@@ -1373,7 +1422,7 @@ static void shrink_worker(struct work_struct *w)
 			goto resched;
 		}
 
-		ret = shrink_memcg(memcg);
+		ret = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
 		/* drop the extra reference */
 		mem_cgroup_put(memcg);
 
@@ -1394,7 +1443,7 @@ static void shrink_worker(struct work_struct *w)
 		}
 		++attempts;
 
-		if (ret && ++failures == MAX_RECLAIM_RETRIES)
+		if (ret <= 0 && ++failures == MAX_RECLAIM_RETRIES)
 			break;
 resched:
 		cond_resched();
@@ -1504,7 +1553,7 @@ bool zswap_store(struct folio *folio)
 	objcg = get_obj_cgroup_from_folio(folio);
 	if (objcg && !obj_cgroup_may_zswap(objcg)) {
 		memcg = get_mem_cgroup_from_objcg(objcg);
-		if (shrink_memcg(memcg)) {
+		if (shrink_memcg(memcg, num_node_state(N_NORMAL_MEMORY)) <= 0) {
 			mem_cgroup_put(memcg);
 			goto put_objcg;
 		}
-- 
2.34.1



* [PATCH v5 3/6] mm/zswap: Extract a reusable writeback helper from shrink_worker()
  2026-06-29 11:20 [PATCH v5 0/6] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
  2026-06-29 11:20 ` [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled Hao Jia
  2026-06-29 11:20 ` [PATCH v5 2/6] mm/zswap: Support batch writeback in shrink_memcg() Hao Jia
@ 2026-06-29 11:20 ` Hao Jia
  2026-06-29 11:20 ` [PATCH v5 4/6] mm/zswap: Implement proactive writeback Hao Jia
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Hao Jia @ 2026-06-29 11:20 UTC (permalink / raw)
  To: akpm, tj, hannes, shakeel.butt, mhocko, yosry, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin
  Cc: linux-mm, linux-kernel, linux-doc, Hao Jia

From: Hao Jia <jiahao1@lixiang.com>

Extract a reusable writeback helper zswap_shrink_one_memcg() from
shrink_worker(). This helper will be reused by the upcoming proactive
writeback feature.

zswap_shrink_one_memcg() takes one step of a memcg-tree writeback walk
driven by the caller's iterator. As a result, shrink_worker() now only
needs to compute the accept threshold, drive the memcg iteration by
calling this helper for each step, and abort the walk when the helper
returns -EBUSY.
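A userspace model of the helper and its caller may make the retry bookkeeping easier to follow. This is a sketch, not kernel code. It assumes memcg is enabled and that MAX_RECLAIM_RETRIES is 16. It also replaces the memcg tree with a fixed list of per-memcg shrink_memcg() results, which the hypothetical model_walk() caller visits round-robin.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RECLAIM_RETRIES 16	/* assumed value; see mm/internal.h */

struct zswap_shrink_state {
	int scans;
	int failures;
};

/*
 * Userspace model of zswap_shrink_one_memcg() with memcg enabled.
 * @at_wrap: the iterator returned NULL, i.e. a full pass just ended.
 * @shrunk:  otherwise, what shrink_memcg() returned for the current
 *           memcg: compressed bytes written (>= 0) or -ENOENT.
 */
static long model_shrink_one(bool at_wrap, long shrunk,
			     struct zswap_shrink_state *s)
{
	if (at_wrap) {
		/* A pass that found no candidate memcg counts as a failure. */
		if (!s->scans && ++s->failures == MAX_RECLAIM_RETRIES)
			return -EBUSY;
		s->scans = 0;
		return 0;
	}
	if (shrunk == -ENOENT)
		return 0;		/* not a candidate: skip silently */
	s->scans++;
	if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
		return -EBUSY;
	return shrunk;
}

/*
 * Caller loop: walk a fixed list of per-memcg results round-robin,
 * wrapping after the last one, until -EBUSY or @budget bytes written.
 */
static long model_walk(const long *results, size_t nr_memcgs, long budget)
{
	struct zswap_shrink_state s = { 0, 0 };
	long written = 0;
	size_t pos = 0;

	assert(results != NULL && nr_memcgs > 0);
	while (written < budget) {
		bool at_wrap = (pos == nr_memcgs);
		long ret = model_shrink_one(at_wrap, at_wrap ? 0 : results[pos], &s);

		pos = at_wrap ? 0 : pos + 1;
		if (ret == -EBUSY)
			break;
		written += ret;
	}
	return written;
}
```

Note that failures is never reset during a walk. A memcg that keeps failing therefore eventually ends the walk even while others make progress. With results { 500, 0 } and a large budget, the model writes back 16 * 500 bytes before it stops.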

Suggested-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Hao Jia <jiahao1@lixiang.com>
---
 mm/zswap.c | 118 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 69 insertions(+), 49 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index e2c2a3f1e061..ba01bf0e44e9 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1351,12 +1351,70 @@ static long shrink_memcg(struct mem_cgroup *memcg, unsigned long nr_to_scan)
 	return walk_arg.bytes_written;
 }
 
+/* Track progress of a memcg-tree writeback walk. */
+struct zswap_shrink_state {
+	int scans;
+	int failures;
+};
+
+/*
+ * Take one step of a memcg-tree writeback walk driven by the caller's
+ * iterator, and fold the result into @s, the retry bookkeeping shared
+ * across steps. @memcg is the iterator's current memcg, or NULL once
+ * it has wrapped around after a full pass over the tree.
+ *
+ * The function returns -EBUSY to signal the caller to abort the walk after
+ * encountering either of the following MAX_RECLAIM_RETRIES times:
+ * - No writeback-candidate memcgs were found in a memcg tree walk.
+ * - Shrinking a writeback-candidate memcg failed.
+ *
+ * Return: The number of compressed bytes written back (>= 0), or -EBUSY
+ * when the caller should abort the walk.
+ */
+static long zswap_shrink_one_memcg(struct mem_cgroup *memcg,
+				   struct zswap_shrink_state *s)
+{
+	long shrunk;
+
+	/*
+	 * Reaching a NULL memcg means a full hierarchy pass completed.
+	 * Exclude the memcg-disabled case, where it is always NULL, and
+	 * fall through to shrink the root LRU directly.
+	 */
+	if (!memcg && !mem_cgroup_disabled()) {
+		/*
+		 * Continue shrinking without incrementing failures if we found
+		 * candidate memcgs in the last tree walk.
+		 */
+		if (!s->scans && ++s->failures == MAX_RECLAIM_RETRIES)
+			return -EBUSY;
+		s->scans = 0;
+		return 0;
+	}
+
+	shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
+
+	/*
+	 * There are no writeback-candidate pages in the memcg. With memcg
+	 * enabled this is not an issue as long as we can find another memcg
+	 * with pages in zswap, so skip without counting it as a candidate.
+	 * With memcg disabled the root LRU is the only target, so we should
+	 * abort if it has no writeback-candidate pages.
+	 */
+	if (shrunk == -ENOENT)
+		return mem_cgroup_disabled() ? -EBUSY : 0;
+	s->scans++;
+
+	if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
+		return -EBUSY;
+
+	return shrunk;
+}
+
 static void shrink_worker(struct work_struct *w)
 {
-	struct mem_cgroup *memcg;
-	int failures = 0, attempts = 0;
+	struct zswap_shrink_state s = {};
 	unsigned long thr;
-	long ret;
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
@@ -1367,11 +1425,6 @@ static void shrink_worker(struct work_struct *w)
 	 * writeback-disabled memcgs (memory.zswap.writeback=0) are not
 	 * candidates for shrinking.
 	 *
-	 * Shrinking will be aborted if we encounter the following
-	 * MAX_RECLAIM_RETRIES times:
-	 * - No writeback-candidate memcgs found in a memcg tree walk.
-	 * - Shrinking a writeback-candidate memcg failed.
-	 *
 	 * We save iteration cursor memcg into zswap_next_shrink,
 	 * which can be modified by the offline memcg cleaner
 	 * zswap_memcg_offline_cleanup().
@@ -1386,7 +1439,11 @@ static void shrink_worker(struct work_struct *w)
 	 * offline memcg left in zswap_next_shrink will hold the reference
 	 * until the next run of shrink_worker().
 	 */
-	do {
+	while (zswap_total_pages() > thr) {
+		struct mem_cgroup *memcg;
+		long ret;
+
+		cond_resched();
 		/*
 		 * Start shrinking from the next memcg after zswap_next_shrink.
 		 * When the offline cleaner has already advanced the cursor,
@@ -1405,49 +1462,12 @@ static void shrink_worker(struct work_struct *w)
 		} while (memcg && !mem_cgroup_tryget_online(memcg));
 		spin_unlock(&zswap_shrink_lock);
 
-		/*
-		 * Reaching a NULL memcg means a full hierarchy pass completed.
-		 * Exclude the memcg-disabled case, where it is always NULL, and
-		 * fall through to shrink the root LRU directly.
-		 */
-		if (!memcg && !mem_cgroup_disabled()) {
-			/*
-			 * Continue shrinking without incrementing failures if
-			 * we found candidate memcgs in the last tree walk.
-			 */
-			if (!attempts && ++failures == MAX_RECLAIM_RETRIES)
-				break;
-
-			attempts = 0;
-			goto resched;
-		}
-
-		ret = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
+		ret = zswap_shrink_one_memcg(memcg, &s);
 		/* drop the extra reference */
 		mem_cgroup_put(memcg);
-
-		/*
-		 * There are no writeback-candidate pages in the memcg.
-		 * This is not an issue as long as we can find another memcg
-		 * with pages in zswap. Skip this without incrementing attempts
-		 * and failures.
-		 */
-		if (ret == -ENOENT) {
-			/*
-			 * With memcg disabled the root LRU is the only target, so
-			 * we should abort if it has no writeback-candidate pages.
-			 */
-			if (mem_cgroup_disabled())
-				break;
-			continue;
-		}
-		++attempts;
-
-		if (ret <= 0 && ++failures == MAX_RECLAIM_RETRIES)
+		if (ret == -EBUSY)
 			break;
-resched:
-		cond_resched();
-	} while (zswap_total_pages() > thr);
+	}
 }
 
 /*********************************
-- 
2.34.1



* [PATCH v5 4/6] mm/zswap: Implement proactive writeback
  2026-06-29 11:20 [PATCH v5 0/6] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
                   ` (2 preceding siblings ...)
  2026-06-29 11:20 ` [PATCH v5 3/6] mm/zswap: Extract a reusable writeback helper from shrink_worker() Hao Jia
@ 2026-06-29 11:20 ` Hao Jia
  2026-06-30  0:15   ` Yosry Ahmed
  2026-06-29 11:20 ` [PATCH v5 5/6] mm/zswap: Add per-memcg stat for " Hao Jia
  2026-06-29 11:20 ` [PATCH v5 6/6] selftests/cgroup: Add tests for zswap " Hao Jia
  5 siblings, 1 reply; 15+ messages in thread
From: Hao Jia @ 2026-06-29 11:20 UTC (permalink / raw)
  To: akpm, tj, hannes, shakeel.butt, mhocko, yosry, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin
  Cc: linux-mm, linux-kernel, linux-doc, Hao Jia

From: Hao Jia <jiahao1@lixiang.com>

Zswap currently writes back pages to backing swap reactively, triggered
either by the shrinker or when the pool reaches its size limit. There is
no mechanism to control the amount of writeback for a specific memory
cgroup. However, users may want to proactively write back zswap pages,
e.g., to free up memory for other applications or to prepare for
memory-intensive workloads.

Introduce a "source=" key to the memory.reclaim cgroup interface,
currently accepting the single value "zswap". When set to "zswap", it
bypasses standard memory reclaim and exclusively performs proactive
zswap writeback up to the requested budget. If omitted, the default
reclaim behavior remains unchanged.

Example usage:
  # Write back 10MB of compressed data from zswap to the backing swap
  echo "10M source=zswap" > memory.reclaim

Note that the actual amount of compressed data written back may be less
than requested due to the zswap second-chance algorithm: referenced
entries are rotated on the LRU on the first encounter and only written
back on a second pass. If fewer bytes are written back than requested,
-EAGAIN is returned, matching the existing memory.reclaim semantics.
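The effect of the second-chance algorithm on a single pass can be shown with a toy model. This is a sketch, not zswap's actual LRU code. The hypothetical model_lru holds one referenced flag per entry, and each pass is capped at the current LRU length, as in shrink_memcg(). For simplicity, the budget is counted in entries rather than bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_LRU_MAX 64

/* Toy zswap LRU: one "referenced" flag per entry, index 0 = oldest. */
struct model_lru {
	bool referenced[MODEL_LRU_MAX];
	int len;
};

/*
 * One pass over the LRU, capped at its current length as in
 * shrink_memcg(). Referenced entries have their bit cleared and are
 * rotated to the tail (second chance); unreferenced entries are
 * written back and leave the LRU. Returns the entries written back.
 */
static int model_second_chance_pass(struct model_lru *lru, int budget)
{
	int to_scan = budget < lru->len ? budget : lru->len;
	int written = 0;

	assert(lru->len >= 0 && lru->len <= MODEL_LRU_MAX);
	while (to_scan-- > 0) {
		bool ref = lru->referenced[0];

		/* Pop the oldest entry off the head. */
		memmove(&lru->referenced[0], &lru->referenced[1],
			(size_t)(lru->len - 1) * sizeof(lru->referenced[0]));
		lru->len--;

		if (ref)
			lru->referenced[lru->len++] = false;	/* rotate */
		else
			written++;				/* write back */
	}
	return written;
}
```

If every entry is referenced, the first pass only clears the bits and writes back nothing. Those entries become eligible for writeback on the next pass.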

Internally, extend user_proactive_reclaim() to parse the new "source="
key and invoke the dedicated handler zswap_proactive_writeback() when it
is set to "zswap". This handler walks the target memcg subtree in a
round-robin fashion and writes back from each memcg's per-node zswap LRUs
in batches via zswap_shrink_one_memcg(). It accumulates the compressed
bytes written back until the requested budget is met, the walk gives up,
or a signal is pending.
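The parsing rules can be approximated in userspace. This is a simplified sketch, not the kernel's match_token()-based parser. The hypothetical model_memparse() handles only K/M/G suffixes, swappiness values are not range-checked, and the marker stored for swappiness=max is arbitrary. The sketch does mirror the two rules this patch adds: source= accepts only zswap, and source=zswap combined with swappiness is rejected with -EINVAL.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct reclaim_req {
	unsigned long long nr_bytes;
	int swappiness;		/* -1 when not given */
	bool zswap_only;	/* "source=zswap" was given */
};

/* Simplified memparse(): a number plus optional K/M/G (powers of 1024). */
static unsigned long long model_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (toupper((unsigned char)**end)) {
	case 'G':
		v <<= 10;
		/* fall through */
	case 'M':
		v <<= 10;
		/* fall through */
	case 'K':
		v <<= 10;
		(*end)++;
		break;
	}
	return v;
}

static int model_parse_reclaim(const char *input, struct reclaim_req *req)
{
	char buf[128], *end, *tok, *save;

	assert(input && req);
	if (strlen(input) >= sizeof(buf))
		return -EINVAL;
	strcpy(buf, input);

	req->swappiness = -1;
	req->zswap_only = false;
	req->nr_bytes = model_memparse(buf, &end);
	if (end == buf)
		return -EINVAL;			/* no leading size */

	for (tok = strtok_r(end, " ", &save); tok;
	     tok = strtok_r(NULL, " ", &save)) {
		if (!strcmp(tok, "source=zswap"))
			req->zswap_only = true;
		else if (!strcmp(tok, "swappiness=max"))
			req->swappiness = 1000;	/* arbitrary marker */
		else if (!strncmp(tok, "swappiness=", 11))
			req->swappiness = atoi(tok + 11);
		else
			return -EINVAL;		/* incl. source=<not zswap> */
	}

	/* source=zswap and swappiness are mutually exclusive. */
	if (req->zswap_only && req->swappiness != -1)
		return -EINVAL;
	return 0;
}
```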

Suggested-by: Yosry Ahmed <yosry@kernel.org>
Suggested-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Hao Jia <jiahao1@lixiang.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++++++++-
 Documentation/admin-guide/mm/zswap.rst  | 11 +++++-
 include/linux/zswap.h                   |  7 ++++
 mm/vmscan.c                             | 24 +++++++++++-
 mm/zswap.c                              | 50 +++++++++++++++++++++++++
 5 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 993446ab66d0..bbcc9695aa8d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1425,9 +1425,10 @@ PAGE_SIZE multiple when read back.
 
 The following nested keys are defined.
 
-	  ==========            ================================
+	  ====================  ==================================================
 	  swappiness            Swappiness value to reclaim with
-	  ==========            ================================
+	  source=zswap          Only perform proactive zswap writeback
+	  ====================  ==================================================
 
 	Specifying a swappiness value instructs the kernel to perform
 	the reclaim with that swappiness value. Note that this has the
@@ -1437,6 +1438,19 @@ The following nested keys are defined.
 	The valid range for swappiness is [0-200, max], setting
 	swappiness=max exclusively reclaims anonymous memory.
 
+	The source=zswap key skips ordinary memory reclaim and
+	writes back pages from zswap to the backing swap device until
+	the requested amount has been written or no further candidates
+	are found. This is useful to proactively offload cold compressed
+	data from the zswap pool to the swap device. It fails with
+	-EINVAL if memory.zswap.writeback is disabled, or if it is
+	combined with swappiness.
+
+	Example::
+
+	  # Write back up to 10MB of compressed data from zswap to the backing swap
+	  echo "10M source=zswap" > memory.reclaim
+
   memory.peak
 	A read-write single value file which exists on non-root cgroups.
 
diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 2464425c783d..b49b8c130389 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -131,7 +131,16 @@ User can enable it as follows::
   echo Y > /sys/module/zswap/parameters/shrinker_enabled
 
 This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
-selected.
+selected. Once enabled, the shrinker automatically writes back zswap pages to
+backing swap during memory reclaim.
+
+To explicitly trigger proactive zswap writeback for a specific memory cgroup
+without invoking standard page reclaim, write to its memory.reclaim file::
+
+	echo "10M source=zswap" > /sys/fs/cgroup/<cgroup-name>/memory.reclaim
+
+Both methods are subject to the ``memory.zswap.writeback`` control: setting it
+to 0 disables both shrinker-driven and proactive zswap writeback.
 
 A debugfs interface is provided for various statistic about pool size, number
 of pages stored, same-value filled pages and various counters for the reasons
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 30c193a1207e..e5f217759894 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -35,6 +35,7 @@ void zswap_lruvec_state_init(struct lruvec *lruvec);
 void zswap_folio_swapin(struct folio *folio);
 bool zswap_is_enabled(void);
 bool zswap_never_enabled(void);
+int zswap_proactive_writeback(struct mem_cgroup *memcg, u64 bytes_to_writeback);
 #else
 
 struct zswap_lruvec_state {};
@@ -69,6 +70,12 @@ static inline bool zswap_never_enabled(void)
 	return true;
 }
 
+static inline int zswap_proactive_writeback(struct mem_cgroup *memcg,
+					    u64 bytes_to_writeback)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _LINUX_ZSWAP_H */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 35c3bb15ae96..56ed7ff48ec9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -64,6 +64,7 @@
 
 #include <linux/swapops.h>
 #include <linux/sched/sysctl.h>
+#include <linux/zswap.h>
 
 #include "internal.h"
 #include "swap.h"
@@ -7855,11 +7856,13 @@ static unsigned long __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask,
 enum {
 	MEMORY_RECLAIM_SWAPPINESS = 0,
 	MEMORY_RECLAIM_SWAPPINESS_MAX,
+	MEMORY_RECLAIM_SOURCE,
 	MEMORY_RECLAIM_NULL,
 };
 static const match_table_t tokens = {
 	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
 	{ MEMORY_RECLAIM_SWAPPINESS_MAX, "swappiness=max"},
+	{ MEMORY_RECLAIM_SOURCE, "source=%s"},
 	{ MEMORY_RECLAIM_NULL, NULL },
 };
 
@@ -7869,9 +7872,12 @@ int user_proactive_reclaim(char *buf,
 	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
 	unsigned long nr_to_reclaim, nr_reclaimed = 0;
 	int swappiness = -1;
+	bool zswap_writeback_only = false;
 	char *old_buf, *start;
+	char source[16];
 	substring_t args[MAX_OPT_ARGS];
 	gfp_t gfp_mask = GFP_KERNEL;
+	u64 nr_bytes;
 
 	if (!buf || (!memcg && !pgdat) || (memcg && pgdat))
 		return -EINVAL;
@@ -7879,7 +7885,8 @@ int user_proactive_reclaim(char *buf,
 	buf = strstrip(buf);
 
 	old_buf = buf;
-	nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
+	nr_bytes = memparse(buf, &buf);
+	nr_to_reclaim = nr_bytes / PAGE_SIZE;
 	if (buf == old_buf)
 		return -EINVAL;
 
@@ -7899,11 +7906,26 @@ int user_proactive_reclaim(char *buf,
 		case MEMORY_RECLAIM_SWAPPINESS_MAX:
 			swappiness = SWAPPINESS_ANON_ONLY;
 			break;
+		case MEMORY_RECLAIM_SOURCE:
+			if (match_strlcpy(source, &args[0], sizeof(source)) >= sizeof(source))
+				return -EINVAL;
+			/* Only zswap is supported as a reclaim source for now. */
+			if (strcmp(source, "zswap"))
+				return -EINVAL;
+			zswap_writeback_only = true;
+			break;
 		default:
 			return -EINVAL;
 		}
 	}
 
+	if (zswap_writeback_only) {
+		/* source=zswap and swappiness are mutually exclusive. */
+		if (swappiness != -1)
+			return -EINVAL;
+		return zswap_proactive_writeback(memcg, nr_bytes);
+	}
+
 	while (nr_reclaimed < nr_to_reclaim) {
 		/* Will converge on zero, but reclaim enforces a minimum */
 		unsigned long batch_size = (nr_to_reclaim - nr_reclaimed) / 4;
diff --git a/mm/zswap.c b/mm/zswap.c
index ba01bf0e44e9..9cda96f05508 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1713,6 +1713,56 @@ int zswap_load(struct folio *folio)
 	return 0;
 }
 
+int zswap_proactive_writeback(struct mem_cgroup *memcg, u64 bytes_to_writeback)
+{
+	struct zswap_shrink_state s = {};
+	struct mem_cgroup *iter = NULL;
+	u64 bytes_written = 0;
+	int ret = 0;
+
+	if (!memcg)
+		return -EINVAL;
+	if (!mem_cgroup_zswap_writeback_enabled(memcg))
+		return -EINVAL;
+	if (!bytes_to_writeback)
+		return 0;
+
+	while (bytes_written < bytes_to_writeback) {
+		long shrunk;
+
+		cond_resched();
+
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+
+		/*
+		 * Use a local iterator to walk the memcg and its online descendants
+		 * in a round-robin manner. Upon exiting the loop, mem_cgroup_iter_break()
+		 * must be called to drop the iterator reference.
+		 */
+		do {
+			iter = mem_cgroup_iter(memcg, iter, NULL);
+		} while (iter && !mem_cgroup_tryget_online(iter));
+
+		shrunk = zswap_shrink_one_memcg(iter, &s);
+		if (shrunk > 0)
+			bytes_written += shrunk;
+
+		/* drop the extra reference taken by mem_cgroup_tryget_online() */
+		mem_cgroup_put(iter);
+
+		if (shrunk == -EBUSY) {
+			ret = -EAGAIN;
+			break;
+		}
+	}
+
+	mem_cgroup_iter_break(memcg, iter);
+	return ret;
+}
+
 void zswap_invalidate(swp_entry_t swp)
 {
 	pgoff_t offset = swp_offset(swp);
-- 
2.34.1

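The budget loop in zswap_proactive_writeback() above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code. proactive_writeback_model(), demo_shrink() and struct demo_pools are hypothetical names, and the memcg subtree is reduced to an array index walked round-robin. zswap_shrink_one_memcg() is not shown in this excerpt, so demo_shrink() simply assumes it reports -EBUSY once a full round of memcgs makes no progress. Signals, reference counting and the NULL the iterator returns at the end of each walk are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-memcg step: bytes written (>= 0) or -EBUSY to give up. */
typedef long (*shrink_fn)(size_t memcg_idx, void *state);

/* Model of the loop: round-robin over memcgs until the byte budget is met. */
static int proactive_writeback_model(size_t nr_memcgs, unsigned long long budget,
				     shrink_fn shrink, void *state,
				     unsigned long long *written_out)
{
	unsigned long long written = 0;
	size_t idx = 0;
	int ret = 0;

	while (written < budget) {
		long shrunk = shrink(idx, state);

		idx = (idx + 1) % nr_memcgs;	/* next memcg in the subtree */
		if (shrunk > 0)
			written += shrunk;
		if (shrunk == -EBUSY) {
			ret = -EAGAIN;		/* budget not met */
			break;
		}
	}
	*written_out = written;
	return ret;
}

/* Demo state: bytes still stored in each of three memcgs' zswap pools. */
struct demo_pools {
	unsigned long long left[3];
	int idle_calls;		/* consecutive calls that wrote nothing */
};

/* Writes up to 4 KiB per call; assumed to report -EBUSY after an idle round. */
static long demo_shrink(size_t idx, void *state)
{
	struct demo_pools *p = state;
	unsigned long long n = p->left[idx] < 4096 ? p->left[idx] : 4096;

	if (!n)
		return ++p->idle_calls >= 3 ? -EBUSY : 0;
	p->idle_calls = 0;
	p->left[idx] -= n;
	return (long)n;
}
```

With pools of 8192, 0 and 4096 bytes, a budget of 8192 is met on the third call (the empty memcg contributes nothing), while a larger follow-up request drains the remaining 4096 bytes and ends with -EAGAIN. A zero budget never enters the loop and returns 0.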


* [PATCH v5 5/6] mm/zswap: Add per-memcg stat for proactive writeback
  2026-06-29 11:20 [PATCH v5 0/6] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
                   ` (3 preceding siblings ...)
  2026-06-29 11:20 ` [PATCH v5 4/6] mm/zswap: Implement proactive writeback Hao Jia
@ 2026-06-29 11:20 ` Hao Jia
  2026-06-29 11:20 ` [PATCH v5 6/6] selftests/cgroup: Add tests for zswap " Hao Jia
  5 siblings, 0 replies; 15+ messages in thread
From: Hao Jia @ 2026-06-29 11:20 UTC
  To: akpm, tj, hannes, shakeel.butt, mhocko, yosry, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin
  Cc: linux-mm, linux-kernel, linux-doc, Hao Jia

From: Hao Jia <jiahao1@lixiang.com>

Add a new stat zswpwb_proactive_b to memory.stat. This counter is
incremented by entry->length during proactive writebacks triggered
via the source=zswap key in memory.reclaim. It tracks the
compressed size (in bytes) of pages proactively written back from
zswap to swap, allowing users to better monitor and tune the
proactive writeback mechanism.
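A user-space monitor reading this counter has to match the key as a whole token, because "zswpwb" is a prefix of "zswpwb_proactive_b". A naive substring search for "zswpwb" can land on the zswpwb_proactive_b line and parse the wrong value; the selftest in patch 6 appends a trailing space to its lookup keys for the same reason. A minimal sketch, assuming memory.stat's layout of one "key value" pair per line; stat_lookup() is a hypothetical helper, not part of the selftest library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value of @key from memory.stat-formatted text, or -1 if the
 * key is absent. Matching requires the key to be followed by a space, so
 * "zswpwb" does not match the "zswpwb_proactive_b" line.
 */
static long stat_lookup(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return strtol(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}
```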

Signed-off-by: Hao Jia <jiahao1@lixiang.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 4 ++++
 include/linux/memcontrol.h              | 1 +
 mm/memcontrol.c                         | 3 +++
 mm/zswap.c                              | 4 +++-
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index bbcc9695aa8d..e1f6a4729a65 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1748,6 +1748,10 @@ The following nested keys are defined.
 	  zswpwb
 		Number of pages written from zswap to swap.
 
+	  zswpwb_proactive_b
+		Bytes of compressed data proactively written back from
+		zswap to swap via the memory.reclaim source=zswap key.
+
 	  zswap_incomp
 		Number of incompressible pages currently stored in zswap
 		without compression. These pages could not be compressed to
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e1f46a0016fc..56580b264dc4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -40,6 +40,7 @@ enum memcg_stat_item {
 	MEMCG_ZSWAP_B,
 	MEMCG_ZSWAPPED,
 	MEMCG_ZSWAP_INCOMP,
+	MEMCG_ZSWPWB_PROACTIVE_B,
 	MEMCG_NR_STAT,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d20ffc827306..d81c34484bca 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -433,6 +433,7 @@ static const unsigned int memcg_stat_items[] = {
 	MEMCG_ZSWAP_B,
 	MEMCG_ZSWAPPED,
 	MEMCG_ZSWAP_INCOMP,
+	MEMCG_ZSWPWB_PROACTIVE_B,
 };
 
 #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
@@ -1558,6 +1559,7 @@ static const struct memory_stat memory_stats[] = {
 	{ "zswap",			MEMCG_ZSWAP_B			},
 	{ "zswapped",			MEMCG_ZSWAPPED			},
 	{ "zswap_incomp",		MEMCG_ZSWAP_INCOMP		},
+	{ "zswpwb_proactive_b",		MEMCG_ZSWPWB_PROACTIVE_B	},
 #endif
 	{ "file_mapped",		NR_FILE_MAPPED			},
 	{ "file_dirty",			NR_FILE_DIRTY			},
@@ -1614,6 +1616,7 @@ static int memcg_page_state_unit(int item)
 	switch (item) {
 	case MEMCG_PERCPU_B:
 	case MEMCG_ZSWAP_B:
+	case MEMCG_ZSWPWB_PROACTIVE_B:
 	case NR_SLAB_RECLAIMABLE_B:
 	case NR_SLAB_UNRECLAIMABLE_B:
 		return 1;
diff --git a/mm/zswap.c b/mm/zswap.c
index 9cda96f05508..d356c1739c68 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1747,8 +1747,10 @@ int zswap_proactive_writeback(struct mem_cgroup *memcg, u64 bytes_to_writeback)
 		} while (iter && !mem_cgroup_tryget_online(iter));
 
 		shrunk = zswap_shrink_one_memcg(iter, &s);
-		if (shrunk > 0)
+		if (shrunk > 0) {
 			bytes_written += shrunk;
+			mod_memcg_state(iter, MEMCG_ZSWPWB_PROACTIVE_B, shrunk);
+		}
 
 		/* drop the extra reference taken by mem_cgroup_tryget_online() */
 		mem_cgroup_put(iter);
-- 
2.34.1



* [PATCH v5 6/6] selftests/cgroup: Add tests for zswap proactive writeback
  2026-06-29 11:20 [PATCH v5 0/6] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
                   ` (4 preceding siblings ...)
  2026-06-29 11:20 ` [PATCH v5 5/6] mm/zswap: Add per-memcg stat for " Hao Jia
@ 2026-06-29 11:20 ` Hao Jia
  5 siblings, 0 replies; 15+ messages in thread
From: Hao Jia @ 2026-06-29 11:20 UTC
  To: akpm, tj, hannes, shakeel.butt, mhocko, yosry, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin
  Cc: linux-mm, linux-kernel, linux-doc, Hao Jia

From: Hao Jia <jiahao1@lixiang.com>

Add test_zswap_proactive_writeback() to cover the new memory.reclaim
"source=zswap" key. The test populates a memory cgroup's zswap pool,
triggers proactive writeback, and verifies the behavior by checking that
both zswpwb_proactive_b and zswpwb increase. Invalid input combinations
are also covered.

Extend test_zswap_writeback_one() to assert that the existing
non-proactive writeback path leaves zswpwb_proactive_b at zero.
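The inputs the new test expects to be rejected follow from the checks added to user_proactive_reclaim(): the size is mandatory, "zswap" is the only accepted source, and source=zswap cannot be combined with swappiness. A minimal user-space sketch of those rules follows. parse_reclaim() and parse_size() are hypothetical helpers; parse_size() handles only the K/M/G subset of the memparse() suffixes, and this model accepts any swappiness value, whereas the kernel validates it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse a size with an optional K/M/G suffix, in the style of memparse(). */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; (*end)++; break;
	}
	return v;
}

/* Hypothetical model of the memory.reclaim option checks. */
static int parse_reclaim(const char *input, bool *zswap_only,
			 unsigned long long *bytes)
{
	char buf[128], *p, *tok;
	bool has_swappiness = false;

	if (strlen(input) >= sizeof(buf))
		return -EINVAL;
	strcpy(buf, input);

	*bytes = parse_size(buf, &p);
	if (p == buf)
		return -EINVAL;			/* the size is mandatory */

	*zswap_only = false;
	for (tok = strtok(p, " "); tok; tok = strtok(NULL, " ")) {
		if (strncmp(tok, "swappiness=", 11) == 0)
			has_swappiness = true;
		else if (strcmp(tok, "source=zswap") == 0)
			*zswap_only = true;
		else
			return -EINVAL;		/* unknown key or other source */
	}
	/* source=zswap and swappiness are mutually exclusive. */
	if (*zswap_only && has_swappiness)
		return -EINVAL;
	return 0;
}
```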

Signed-off-by: Hao Jia <jiahao1@lixiang.com>
---
 tools/testing/selftests/cgroup/test_zswap.c | 161 +++++++++++++++++++-
 1 file changed, 160 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 49b36ee79160..dfd5f24b9d99 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -60,7 +60,12 @@ static int get_zswap_stored_pages(size_t *value)
 
 static long get_cg_wb_count(const char *cg)
 {
-	return cg_read_key_long(cg, "memory.stat", "zswpwb");
+	return cg_read_key_long(cg, "memory.stat", "zswpwb ");
+}
+
+static long get_cg_pwb_bytes(const char *cg)
+{
+	return cg_read_key_long(cg, "memory.stat", "zswpwb_proactive_b ");
 }
 
 static long get_zswpout(const char *cgroup)
@@ -355,6 +360,7 @@ static int attempt_writeback(const char *cgroup, void *arg)
 static int test_zswap_writeback_one(const char *cgroup, bool wb)
 {
 	long zswpwb_before, zswpwb_after;
+	long pwb_bytes;
 
 	zswpwb_before = get_cg_wb_count(cgroup);
 	if (zswpwb_before != 0) {
@@ -362,6 +368,12 @@ static int test_zswap_writeback_one(const char *cgroup, bool wb)
 		return -1;
 	}
 
+	pwb_bytes = get_cg_pwb_bytes(cgroup);
+	if (pwb_bytes > 0) {
+		ksft_print_msg("zswpwb_proactive_b_before = %ld instead of 0\n", pwb_bytes);
+		return -1;
+	}
+
 	if (cg_run(cgroup, attempt_writeback, (void *) &wb))
 		return -1;
 
@@ -379,6 +391,17 @@ static int test_zswap_writeback_one(const char *cgroup, bool wb)
 		return -1;
 	}
 
+	/*
+	 * attempt_writeback() does not use the proactive writeback path, so
+	 * zswpwb_proactive_b must stay at zero regardless of whether
+	 * writeback was enabled.
+	 */
+	pwb_bytes = get_cg_pwb_bytes(cgroup);
+	if (pwb_bytes > 0) {
+		ksft_print_msg("zswpwb_proactive_b_after is %ld, expected 0\n", pwb_bytes);
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -770,6 +793,141 @@ static int test_zswap_incompressible(const char *root)
 	return ret;
 }
 
+/*
+ * Trigger proactive zswap writeback with the following steps:
+ * 1. Allocate memory.
+ * 2. Push allocated memory into zswap.
+ * 3. Proactively write back zswap pages to swap
+ *    using "source=zswap".
+ */
+static int proactive_writeback_workload(const char *cgroup, void *arg)
+{
+	long pagesize = sysconf(_SC_PAGESIZE);
+	size_t memsize = pagesize * 1024;
+	char reclaim_cmd[64];
+	char buf[pagesize];
+	long zswap_usage;
+	int ret = -1;
+	int rc;
+	char *mem;
+
+	mem = (char *)malloc(memsize);
+	if (!mem)
+		return ret;
+
+	for (int i = 0; i < pagesize; i++)
+		buf[i] = i < pagesize / 2 ? (char)i : 0;
+	for (int i = 0; i < memsize; i += pagesize)
+		memcpy(&mem[i], buf, pagesize);
+
+	/* Evict allocated memory into zswap. */
+	if (cg_write_numeric(cgroup, "memory.reclaim", memsize)) {
+		ksft_print_msg("Failed to push pages into zswap\n");
+		goto out;
+	}
+
+	zswap_usage = cg_read_long(cgroup, "memory.zswap.current");
+	if (zswap_usage <= 0) {
+		ksft_print_msg("no zswap pool to write back\n");
+		goto out;
+	}
+
+	/* Trigger proactive zswap writeback. */
+	snprintf(reclaim_cmd, sizeof(reclaim_cmd), "%ld source=zswap", zswap_usage);
+	rc = cg_write(cgroup, "memory.reclaim", reclaim_cmd);
+	if (rc && rc != -EAGAIN) {
+		ksft_print_msg("proactive zswap writeback failed: %d\n", rc);
+		goto out;
+	}
+
+	ret = 0;
+out:
+	free(mem);
+	return ret;
+}
+
+static int check_writeback_invalid_inputs(const char *cgroup)
+{
+	static char * const bad_inputs[] = {
+		"source=zswap",
+		"1M source=zswap swappiness=60",
+		"1M swappiness=60 source=zswap",
+		"1M source=zswap swappiness=max",
+		"1M swappiness=max source=zswap",
+	};
+	int i, rc;
+
+	for (i = 0; i < ARRAY_SIZE(bad_inputs); i++) {
+		rc = cg_write(cgroup, "memory.reclaim", bad_inputs[i]);
+		if (rc != -EINVAL) {
+			ksft_print_msg("memory.reclaim '%s': returned %d, expected %d\n",
+				       bad_inputs[i], rc, -EINVAL);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+static int test_zswap_proactive_writeback(const char *root)
+{
+	long wb_before, wb_after;
+	long pwb_b_before, pwb_b_after;
+	long wb_delta, pwb_b_delta;
+	int ret = KSFT_FAIL;
+	char *test_group;
+
+	if (cg_read_strcmp(root, "memory.zswap.writeback", "1"))
+		return KSFT_SKIP;
+
+	test_group = cg_name(root, "zswap_proactive_test");
+	if (!test_group)
+		return KSFT_FAIL;
+	if (cg_create(test_group))
+		goto out;
+	/*
+	 * A missing zswpwb_proactive_b stat means the kernel lacks proactive
+	 * writeback support, so skip rather than fail.
+	 */
+	if (get_cg_pwb_bytes(test_group) < 0) {
+		ret = KSFT_SKIP;
+		goto out;
+	}
+	if (check_writeback_invalid_inputs(test_group))
+		goto out;
+
+	pwb_b_before = get_cg_pwb_bytes(test_group);
+	wb_before = get_cg_wb_count(test_group);
+	if (pwb_b_before < 0 || wb_before < 0)
+		goto out;
+
+	if (cg_run(test_group, proactive_writeback_workload, NULL))
+		goto out;
+
+	pwb_b_after = get_cg_pwb_bytes(test_group);
+	wb_after = get_cg_wb_count(test_group);
+	if (pwb_b_after < 0 || wb_after < 0)
+		goto out;
+
+	pwb_b_delta = pwb_b_after - pwb_b_before;
+	wb_delta = wb_after - wb_before;
+
+	if (pwb_b_delta <= 0) {
+		ksft_print_msg("zswpwb_proactive_b did not increase: delta=%ld\n",
+			       pwb_b_delta);
+		goto out;
+	}
+	if (wb_delta <= 0) {
+		ksft_print_msg("zswpwb did not increase: delta=%ld\n", wb_delta);
+		goto out;
+	}
+
+	ret = KSFT_PASS;
+out:
+	cg_destroy(test_group);
+	free(test_group);
+	return ret;
+}
+
 #define T(x) { x, #x }
 struct zswap_test {
 	int (*fn)(const char *root);
@@ -783,6 +941,7 @@ struct zswap_test {
 	T(test_no_kmem_bypass),
 	T(test_no_invasive_cgroup_shrink),
 	T(test_zswap_incompressible),
+	T(test_zswap_proactive_writeback),
 };
 #undef T
 
-- 
2.34.1



* Re: [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled
  2026-06-29 11:20 ` [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled Hao Jia
@ 2026-06-29 18:37   ` Nhat Pham
  2026-06-30 10:51     ` Hao Jia
  0 siblings, 1 reply; 15+ messages in thread
From: Nhat Pham @ 2026-06-29 18:37 UTC
  To: Hao Jia
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, yosry, mkoutny,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia, stable

On Mon, Jun 29, 2026 at 4:20 AM Hao Jia <jiahao.kernel@gmail.com> wrote:
>
> From: Hao Jia <jiahao1@lixiang.com>
>
> When memory cgroup is disabled, mem_cgroup_iter() always returns NULL.
> Therefore, the global shrinker shrink_worker() always takes the !memcg
> branch. After MAX_RECLAIM_RETRIES empty walks, the worker simply gives up,
> so it fails to write back anything.
>
> Therefore, when memory cgroup is disabled, fall through with the !memcg
> branch and shrink the root memcg directly. Stop the loop once
> shrink_memcg() reports -ENOENT, since the root LRU is the only target and
> -ENOENT means it has been exhausted.
>
> Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
> Cc: stable@vger.kernel.org
> Reported-by: Yosry Ahmed <yosry@kernel.org>
> Closes: https://lore.kernel.org/all/CAO9r8zPVzMKFbCixxD-qgtRrkFxWVrHiZZeLc=eyTPKPVQgX4g@mail.gmail.com
> Signed-off-by: Hao Jia <jiahao1@lixiang.com>

Ah good catch.



> ---
>  mm/zswap.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 761cd699e0a3..0f8f04f22888 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1356,7 +1356,12 @@ static void shrink_worker(struct work_struct *w)
>                 } while (memcg && !mem_cgroup_tryget_online(memcg));
>                 spin_unlock(&zswap_shrink_lock);
>
> -               if (!memcg) {
> +               /*
> +                * Reaching a NULL memcg means a full hierarchy pass completed.
> +                * Exclude the memcg-disabled case, where it is always NULL, and
> +                * fall through to shrink the root LRU directly.
> +                */
> +               if (!memcg && !mem_cgroup_disabled()) {
>                         /*
>                          * Continue shrinking without incrementing failures if
>                          * we found candidate memcgs in the last tree walk.

nit: I wonder if we can just merge this comment with the new comment
you just added.

> @@ -1378,8 +1383,15 @@ static void shrink_worker(struct work_struct *w)
>                  * with pages in zswap. Skip this without incrementing attempts
>                  * and failures.
>                  */
> -               if (ret == -ENOENT)
> +               if (ret == -ENOENT) {
> +                       /*
> +                        * With memcg disabled the root LRU is the only target, so
> +                        * we should abort if it has no writeback-candidate pages.
> +                        */
> +                       if (mem_cgroup_disabled())
> +                               break;

Hmm do we need to do this? Consider a system with cgroup enabled but
with just one cgroup (root?). The behavior would just be trying that
cgroup for MAX_RECLAIM_RETRIES failure attempts, correct?

In that case, we don't need to do this check, and we would get the
same behavior. The loop would terminate after MAX_RECLAIM_RETRIES :)

Could you fact-check me? :)
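One way to check this question is a small user-space model of the loop's failure accounting. It is built only from the branches quoted above plus two assumptions: the -ENOENT branch skips to the next iteration without touching attempts or failures (as its comment says), and MAX_RECLAIM_RETRIES is 16. model_shrink_worker() and ITER_CAP are hypothetical names, and the pool is assumed to stay above the accept threshold throughout. In this model, the single-root memcg-enabled case terminates only because every completed tree walk yields a NULL memcg that feeds the failure counter; with memcg disabled there is no such NULL, so without the extra break the loop never gives up.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RECLAIM_RETRIES	16	/* assumed value */
#define ITER_CAP		100000	/* hypothetical: treated as "never stops" */

/*
 * Simplified shrink_worker() model: the only target is the root LRU, which
 * has no writeback candidates, so shrink_memcg() always returns -ENOENT.
 *
 * memcg_enabled:   the round-robin cursor alternates root, NULL, root, ...
 *                  Otherwise mem_cgroup_iter() always returns NULL and the
 *                  patched worker shrinks the root LRU directly.
 * break_on_enoent: models the new mem_cgroup_disabled() break.
 *
 * Returns the iteration at which the worker gives up, or ITER_CAP.
 */
static int model_shrink_worker(bool memcg_enabled, bool break_on_enoent)
{
	int failures = 0, attempts = 0;
	bool at_root = false;

	for (int iter = 1; iter <= ITER_CAP; iter++) {
		bool got_null = false;

		if (memcg_enabled) {
			at_root = !at_root;
			got_null = !at_root;	/* end of a full tree walk */
		}

		if (got_null) {
			/* Count a failure only if the walk found no candidate. */
			if (!attempts && ++failures == MAX_RECLAIM_RETRIES)
				return iter;
			attempts = 0;
			continue;
		}

		/* -ENOENT from the root LRU: skipped, no failure counted. */
		if (!memcg_enabled && break_on_enoent)
			return iter;
	}
	return ITER_CAP;
}
```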


* Re: [PATCH v5 4/6] mm/zswap: Implement proactive writeback
  2026-06-29 11:20 ` [PATCH v5 4/6] mm/zswap: Implement proactive writeback Hao Jia
@ 2026-06-30  0:15   ` Yosry Ahmed
  2026-06-30  1:49     ` Hao Jia
  0 siblings, 1 reply; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-30  0:15 UTC
  To: Hao Jia
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia

On Mon, Jun 29, 2026 at 07:20:30PM +0800, Hao Jia wrote:
> From: Hao Jia <jiahao1@lixiang.com>
> 
> Zswap currently writes back pages to backing swap reactively, triggered
> either by the shrinker or when the pool reaches its size limit. There is
> no mechanism to control the amount of writeback for a specific memory
> cgroup. However, users may want to proactively write back zswap pages,
> e.g., to free up memory for other applications or to prepare for
> memory-intensive workloads.
> 
> Introduce a "source=" key to the memory.reclaim cgroup interface,
> currently accepting the single value "zswap". When set to "zswap", it
> bypasses standard memory reclaim and exclusively performs proactive
> zswap writeback up to the requested budget. If omitted, the default
> reclaim behavior remains unchanged.
> 
> Example usage:
>   # Write back 10MB of compressed data from zswap to the backing swap
>   echo "10M source=zswap" > memory.reclaim
> 
> Note that the actual amount of compressed data written back may be less
> than requested due to the zswap second-chance algorithm: referenced
> entries are rotated on the LRU on the first encounter and only written
> back on a second pass. If fewer bytes are written back than requested,
> -EAGAIN is returned, matching the existing memory.reclaim semantics.
> 
> Internally, extend user_proactive_reclaim() to parse the new "source="
> key and invoke the dedicated handler zswap_proactive_writeback() when it
> is set to "zswap". This handler walks the target memcg subtree in a
> round-robin fashion and drains each memcg's per-node zswap LRUs through
> shrink_memcg(), accumulating the compressed bytes written back until the
> requested budget is met.
> 
> Suggested-by: Yosry Ahmed <yosry@kernel.org>
> Suggested-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Hao Jia <jiahao1@lixiang.com>
> ---
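The short-write case in the note above can be reproduced with a toy second-chance LRU. The sketch assumes a flat array in place of the per-node list_lru, a referenced flag per entry, and a byte budget checked before each writeback. struct entry and second_chance_pass() are hypothetical names, and rotation to the LRU tail is approximated by leaving a skipped entry in place for the next pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical LRU entry: compressed length plus state flags. */
struct entry {
	unsigned int length;
	bool referenced;
	bool written;
};

/*
 * One pass over @n entries: a referenced entry loses its bit and is
 * skipped (its second chance); an unreferenced one is written back.
 * Stops once @budget bytes have been written. Returns bytes written.
 */
static unsigned long second_chance_pass(struct entry *lru, size_t n,
					unsigned long budget)
{
	unsigned long written = 0;

	for (size_t i = 0; i < n && written < budget; i++) {
		if (lru[i].written)
			continue;
		if (lru[i].referenced) {
			lru[i].referenced = false;
			continue;
		}
		lru[i].written = true;
		written += lru[i].length;
	}
	return written;
}
```

With four 1000-byte entries of which the first two are referenced, a 4000-byte request writes only 2000 bytes on the first pass; the two skipped entries are written on the next pass. Because the budget is checked before each entry, a pass can exceed the budget by up to one entry's length.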

Before going through more versions, we need to figure out if this will
pivot to be a proactive demotion interface for swap tiering.

> @@ -7869,9 +7872,12 @@ int user_proactive_reclaim(char *buf,
>  	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
>  	unsigned long nr_to_reclaim, nr_reclaimed = 0;
>  	int swappiness = -1;
> +	bool zswap_writeback_only = false;
>  	char *old_buf, *start;
> +	char source[16];
>  	substring_t args[MAX_OPT_ARGS];
>  	gfp_t gfp_mask = GFP_KERNEL;
> +	u64 nr_bytes;
>  
>  	if (!buf || (!memcg && !pgdat) || (memcg && pgdat))
>  		return -EINVAL;
> @@ -7879,7 +7885,8 @@ int user_proactive_reclaim(char *buf,
>  	buf = strstrip(buf);
>  
>  	old_buf = buf;
> -	nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
> +	nr_bytes = memparse(buf, &buf);
> +	nr_to_reclaim = nr_bytes / PAGE_SIZE;

Nit: if we keep this as part of memory.reclaim, we probably want to
choose clearer names (e.g. pages_to_reclaim and bytes_to_reclaim).

>  	if (buf == old_buf)
>  		return -EINVAL;
>  
> @@ -7899,11 +7906,26 @@ int user_proactive_reclaim(char *buf,
>  		case MEMORY_RECLAIM_SWAPPINESS_MAX:
>  			swappiness = SWAPPINESS_ANON_ONLY;
>  			break;
> +		case MEMORY_RECLAIM_SOURCE:
> +			if (match_strlcpy(source, &args[0], sizeof(source)) >= sizeof(source))
> +				return -EINVAL;
> +			/* Only zswap is supported as a reclaim source for now. */
> +			if (strcmp(source, "zswap"))
> +				return -EINVAL;
> +			zswap_writeback_only = true;
> +			break;
>  		default:
>  			return -EINVAL;
>  		}
>  	}
>  
> +	if (zswap_writeback_only) {
> +		/* source=zswap and swappiness are mutually exclusive. */
> +		if (swappiness != -1)
> +			return -EINVAL;
> +		return zswap_proactive_writeback(memcg, nr_bytes);
> +	}
> +
>  	while (nr_reclaimed < nr_to_reclaim) {
>  		/* Will converge on zero, but reclaim enforces a minimum */
>  		unsigned long batch_size = (nr_to_reclaim - nr_reclaimed) / 4;
> diff --git a/mm/zswap.c b/mm/zswap.c
> index ba01bf0e44e9..9cda96f05508 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1713,6 +1713,56 @@ int zswap_load(struct folio *folio)
>  	return 0;
>  }
>  
> +int zswap_proactive_writeback(struct mem_cgroup *memcg, u64 bytes_to_writeback)
> +{
> +	struct zswap_shrink_state s = {};
> +	struct mem_cgroup *iter = NULL;
> +	u64 bytes_written = 0;
> +	int ret = 0;
> +
> +	if (!memcg)
> +		return -EINVAL;

Can this ever happen? It would be a bug in the caller.

> +	if (!mem_cgroup_zswap_writeback_enabled(memcg))
> +		return -EINVAL;
> +	if (!bytes_to_writeback)
> +		return 0;

Do we need this? I think the loop will just never enter and
mem_cgroup_iter_break() will do nothing.

> +
> +	while (bytes_written < bytes_to_writeback) {
> +		long shrunk;
> +
> +		cond_resched();
> +
> +		if (signal_pending(current)) {
> +			ret = -EINTR;
> +			break;
> +		}
> +
> +		/*
> +		 * Use a local iterator to walk the memcg and its online descendants
> +		 * in a round-robin manner. Upon exiting the loop, mem_cgroup_iter_break()
> +		 * must be called to drop the iterator reference.
> +		 */
> +		do {
> +			iter = mem_cgroup_iter(memcg, iter, NULL);
> +		} while (iter && !mem_cgroup_tryget_online(iter));
> +
> +		shrunk = zswap_shrink_one_memcg(iter, &s);
> +		if (shrunk > 0)
> +			bytes_written += shrunk;
> +
> +		/* drop the extra reference taken by mem_cgroup_tryget_online() */
> +		mem_cgroup_put(iter);


Can we just use mem_cgroup_online() instead, since mem_cgroup_iter()
already grabs a ref?

> +
> +		if (shrunk == -EBUSY) {
> +			ret = -EAGAIN;
> +			break;
> +		}
> +	}
> +
> +	mem_cgroup_iter_break(memcg, iter);
> +	return ret;
> +}
> +
>  void zswap_invalidate(swp_entry_t swp)
>  {
>  	pgoff_t offset = swp_offset(swp);
> -- 
> 2.34.1
> 


* Re: [PATCH v5 2/6] mm/zswap: Support batch writeback in shrink_memcg()
  2026-06-29 11:20 ` [PATCH v5 2/6] mm/zswap: Support batch writeback in shrink_memcg() Hao Jia
@ 2026-06-30  0:21   ` Yosry Ahmed
  2026-06-30  1:18     ` Hao Jia
  0 siblings, 1 reply; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-30  0:21 UTC
  To: Hao Jia
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia

On Mon, Jun 29, 2026 at 07:20:28PM +0800, Hao Jia wrote:
> From: Hao Jia <jiahao1@lixiang.com>
> 
> Currently, shrink_memcg() writes back at most one entry per-node during
> its traversal. This makes shrink_worker() inefficient, as it must
> repeatedly re-enter shrink_memcg() to make any substantial progress.
> 
> To address this, extend shrink_memcg() and rewrite its LRU iteration logic
> to support batch writeback. Introduce the nr_to_scan parameter to bound how
> many pages are scanned per call. This enables batch writeback in the
> shrink_worker() path, while maintaining a low scan budget in the
> zswap_store() path.
> 
> Additionally, to prepare for future proactive writeback, update the return
> value semantics of shrink_memcg(): a positive value now represents the
> actual number of compressed bytes written back, 0 indicates that candidates
> existed but no writeback succeeded, and a negative value represents an
> error code.
> 
> Suggested-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Hao Jia <jiahao1@lixiang.com>
> ---
>  mm/zswap.c | 89 ++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 69 insertions(+), 20 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 0f8f04f22888..e2c2a3f1e061 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -160,6 +160,11 @@ struct zswap_pool {
>  	char tfm_name[CRYPTO_MAX_ALG_NAME];
>  };
>  
> +struct zswap_shrink_walk_arg {
> +	unsigned long bytes_written;
> +	bool encountered_page_in_swapcache;
> +};
> +
>  /* Global LRU lists shared by all zswap pools. */
>  static struct list_lru zswap_list_lru;
>  
> @@ -1089,8 +1094,9 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
>  				       void *arg)
>  {
>  	struct zswap_entry *entry = container_of(item, struct zswap_entry, lru);
> -	bool *encountered_page_in_swapcache = (bool *)arg;
> +	struct zswap_shrink_walk_arg *walk_arg = arg;
>  	swp_entry_t swpentry;
> +	unsigned int length;
>  	enum lru_status ret = LRU_REMOVED_RETRY;
>  	int writeback_result;
>  
> @@ -1133,10 +1139,11 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
>  
>  	/*
>  	 * Once the lru lock is dropped, the entry might get freed. The
> -	 * swpentry is copied to the stack, and entry isn't deref'd again
> -	 * until the entry is verified to still be alive in the tree.
> +	 * needed fields are copied to the stack, and entry isn't deref'd
> +	 * again until it is verified to still be alive in the tree.
>  	 */
>  	swpentry = entry->swpentry;
> +	length = entry->length;
>  
>  	/*
>  	 * It's safe to drop the lock here because we return either
> @@ -1155,12 +1162,13 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
>  		 * into the warmer region. We should terminate shrinking (if we're in the dynamic
>  		 * shrinker context).
>  		 */
> -		if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
> +		if (writeback_result == -EEXIST) {
>  			ret = LRU_STOP;
> -			*encountered_page_in_swapcache = true;
> +			walk_arg->encountered_page_in_swapcache = true;
>  		}
>  	} else {
>  		zswap_written_back_pages++;
> +		walk_arg->bytes_written += length;
>  	}
>  
>  	return ret;
> @@ -1169,8 +1177,11 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
>  static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
>  		struct shrink_control *sc)
>  {
> +	struct zswap_shrink_walk_arg walk_arg = {
> +		.bytes_written = 0,
> +		.encountered_page_in_swapcache = false,
> +	};
>  	unsigned long shrink_ret;
> -	bool encountered_page_in_swapcache = false;
>  
>  	if (!zswap_shrinker_enabled ||
>  			!mem_cgroup_zswap_writeback_enabled(sc->memcg)) {
> @@ -1179,9 +1190,9 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
>  	}
>  
>  	shrink_ret = list_lru_shrink_walk(&zswap_list_lru, sc, &shrink_memcg_cb,
> -		&encountered_page_in_swapcache);
> +		&walk_arg);
>  
> -	if (encountered_page_in_swapcache)
> +	if (walk_arg.encountered_page_in_swapcache)
>  		return SHRINK_STOP;
>  
>  	return shrink_ret ? shrink_ret : SHRINK_STOP;
> @@ -1275,9 +1286,31 @@ static struct shrinker *zswap_alloc_shrinker(void)
>  	return shrinker;
>  }
>  
> -static int shrink_memcg(struct mem_cgroup *memcg)
> +#define NR_ZSWAP_WB_BATCH	64UL
> +
> +/*
> + * Scan up to @nr_to_scan pages across the per-node zswap LRUs of @memcg
> + * and write back the reclaimable ones.
> + *
> + * Since the second-chance algorithm rotates referenced entries to the
> + * LRU tail, the per-node scan is capped at the current LRU length so
> + * each entry is scanned at most once per call. It is up to the caller
> + * to handle retries, deciding whether to scan another memcg to complete
> + * the full iteration, or to rescan the current memcg to drain its zswap
> + * entries.
> + *
> + * Return: The number of compressed bytes written back (>= 0), or -ENOENT
> + * if @memcg has writeback disabled, is a zombie cgroup, or has empty
> + * zswap LRUs.
> + */
> +static long shrink_memcg(struct mem_cgroup *memcg, unsigned long nr_to_scan)
>  {
> -	int nid, shrunk = 0, scanned = 0;
> +	struct zswap_shrink_walk_arg walk_arg = {
> +		.bytes_written = 0,
> +		.encountered_page_in_swapcache = false,
> +	};
> +	unsigned long nr_remaining = nr_to_scan;
> +	int nid;
>  
>  	if (!mem_cgroup_zswap_writeback_enabled(memcg))
>  		return -ENOENT;
> @@ -1290,24 +1323,40 @@ static int shrink_memcg(struct mem_cgroup *memcg)
>  		return -ENOENT;
>  
>  	for_each_node_state(nid, N_NORMAL_MEMORY) {
> -		unsigned long nr_to_walk = 1;
> +		unsigned long nr_to_walk;
>  
> -		shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
> -					    &shrink_memcg_cb, NULL, &nr_to_walk);
> -		scanned += 1 - nr_to_walk;
> +		/*
> +		 * Cap the scan at per-node LRU length so each entry is scanned
> +		 * at most once per call.
> +		 */
> +		nr_to_walk = min(nr_remaining,
> +				 list_lru_count_one(&zswap_list_lru, nid, memcg));
> +		if (!nr_to_walk)
> +			continue;
> +
> +		nr_remaining -= nr_to_walk;
> +		list_lru_walk_one(&zswap_list_lru, nid, memcg, &shrink_memcg_cb,
> +				  &walk_arg, &nr_to_walk);
> +		/* Return the unused share of the budget to the pool. */
> +		nr_remaining += nr_to_walk;
> +
> +		if (!nr_remaining)
> +			break;
>  	}
>  
> -	if (!scanned)
> +	/* Nothing was scanned: every LRU under @memcg was empty. */
> +	if (nr_remaining == nr_to_scan)
>  		return -ENOENT;
>  
> -	return shrunk ? 0 : -EAGAIN;
> +	return walk_arg.bytes_written;
>  }
>  
>  static void shrink_worker(struct work_struct *w)
>  {
>  	struct mem_cgroup *memcg;
> -	int ret, failures = 0, attempts = 0;
> +	int failures = 0, attempts = 0;
>  	unsigned long thr;
> +	long ret;
>  
>  	/* Reclaim down to the accept threshold */
>  	thr = zswap_accept_thr_pages();
> @@ -1373,7 +1422,7 @@ static void shrink_worker(struct work_struct *w)
>  			goto resched;
>  		}
>  
> -		ret = shrink_memcg(memcg);
> +		ret = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
>  		/* drop the extra reference */
>  		mem_cgroup_put(memcg);
>  
> @@ -1394,7 +1443,7 @@ static void shrink_worker(struct work_struct *w)
>  		}
>  		++attempts;
>  
> -		if (ret && ++failures == MAX_RECLAIM_RETRIES)
> +		if (ret <= 0 && ++failures == MAX_RECLAIM_RETRIES)
>  			break;
>  resched:
>  		cond_resched();
> @@ -1504,7 +1553,7 @@ bool zswap_store(struct folio *folio)
>  	objcg = get_obj_cgroup_from_folio(folio);
>  	if (objcg && !obj_cgroup_may_zswap(objcg)) {
>  		memcg = get_mem_cgroup_from_objcg(objcg);
> -		if (shrink_memcg(memcg)) {
> +		if (shrink_memcg(memcg, num_node_state(N_NORMAL_MEMORY)) <= 0) {

Why not just 1? 

I guess the current behavior will try each node. But this doesn't really
match it, as we may reclaim everything from the first node. Right?

I think it's probably fine to just do 1 here; fairness is not the main
concern in this code path, we're really just trying to free up some
space for the incoming page. I doubt these limits are actually being
used extensively anyway, so we can revisit this later if needed.
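The budget split in shrink_memcg() can be checked with a small model. It assumes each list_lru_walk_one() call consumes its whole share (no LRU_STOP), so no unused budget is ever returned, and it counts scanned entries rather than written bytes; split_scan_budget() is a hypothetical name. With two nodes holding 5 and 3 entries, a budget of 2 (what num_node_state(N_NORMAL_MEMORY) gives on a two-node system) is consumed entirely by node 0, which is the mismatch with the old one-entry-per-node behavior pointed out above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Model of the per-node budget split: each node's share is capped at its
 * LRU length, and later nodes get whatever budget is left. Fills
 * scanned[] per node; returns the total scanned, or -ENOENT if every LRU
 * was empty (nothing scanned).
 */
static long split_scan_budget(const unsigned long *lru_len, size_t nr_nodes,
			      unsigned long nr_to_scan, unsigned long *scanned)
{
	unsigned long nr_remaining = nr_to_scan;

	for (size_t nid = 0; nid < nr_nodes; nid++)
		scanned[nid] = 0;

	for (size_t nid = 0; nid < nr_nodes && nr_remaining; nid++) {
		/* Cap the share at the node's LRU length. */
		unsigned long share = lru_len[nid] < nr_remaining ?
				      lru_len[nid] : nr_remaining;

		scanned[nid] = share;	/* assume the walk uses all of it */
		nr_remaining -= share;
	}

	if (nr_remaining == nr_to_scan)
		return -ENOENT;
	return (long)(nr_to_scan - nr_remaining);
}
```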

>  			mem_cgroup_put(memcg);
>  			goto put_objcg;
>  		}
> -- 
> 2.34.1
> 


* Re: [PATCH v5 2/6] mm/zswap: Support batch writeback in shrink_memcg()
  2026-06-30  0:21   ` Yosry Ahmed
@ 2026-06-30  1:18     ` Hao Jia
  0 siblings, 0 replies; 15+ messages in thread
From: Hao Jia @ 2026-06-30  1:18 UTC
  To: Yosry Ahmed
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia



On 2026/6/30 08:21, Yosry Ahmed wrote:
> On Mon, Jun 29, 2026 at 07:20:28PM +0800, Hao Jia wrote:

>>   
>>   static void shrink_worker(struct work_struct *w)
>>   {
>>   	struct mem_cgroup *memcg;
>> -	int ret, failures = 0, attempts = 0;
>> +	int failures = 0, attempts = 0;
>>   	unsigned long thr;
>> +	long ret;
>>   
>>   	/* Reclaim down to the accept threshold */
>>   	thr = zswap_accept_thr_pages();
>> @@ -1373,7 +1422,7 @@ static void shrink_worker(struct work_struct *w)
>>   			goto resched;
>>   		}
>>   
>> -		ret = shrink_memcg(memcg);
>> +		ret = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
>>   		/* drop the extra reference */
>>   		mem_cgroup_put(memcg);
>>   
>> @@ -1394,7 +1443,7 @@ static void shrink_worker(struct work_struct *w)
>>   		}
>>   		++attempts;
>>   
>> -		if (ret && ++failures == MAX_RECLAIM_RETRIES)
>> +		if (ret <= 0 && ++failures == MAX_RECLAIM_RETRIES)
>>   			break;
>>   resched:
>>   		cond_resched();
>> @@ -1504,7 +1553,7 @@ bool zswap_store(struct folio *folio)
>>   	objcg = get_obj_cgroup_from_folio(folio);
>>   	if (objcg && !obj_cgroup_may_zswap(objcg)) {
>>   		memcg = get_mem_cgroup_from_objcg(objcg);
>> -		if (shrink_memcg(memcg)) {
>> +		if (shrink_memcg(memcg, num_node_state(N_NORMAL_MEMORY)) <= 0) {
> 
> Why not just 1?
> 
> I guess the current behavior will try each node. But this doesn't really
> match it, as we may reclaim everything from the first node. Right?
> 

Yes, it just keeps the number the same as before, but the behavior is 
not exactly the same.

> I think it's probably fine to just do 1 here; fairness is not the main
> concern in this code path, we're really just trying to free up some
> space for the incoming page. I doubt these limits are actually being
> used extensively anyway, so we can revisit this later if needed.

Okay, I'll do this in the next version.

Thanks,
Hao
> 
>>   			mem_cgroup_put(memcg);
>>   			goto put_objcg;
>>   		}
>> -- 
>> 2.34.1
>>
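A toy userspace model of the budget point above. It assumes, hypothetically, that the batched shrink_memcg() walks the nodes in order and takes as much of the budget as it can from the first non-empty node. toy_shrink_memcg() and the per-node array are illustrative names, not code from the series.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model: lru[nid] is the number of writeback candidates on
 * node nid. Walk the nodes in order and write back up to @budget entries
 * in total. Returns the number written, or -ENOENT if every LRU is empty.
 */
static long toy_shrink_memcg(long *lru, int nr_nodes, long budget)
{
	long written = 0;
	int nid, all_empty = 1;

	assert(nr_nodes > 0 && budget > 0);
	for (nid = 0; nid < nr_nodes && written < budget; nid++) {
		long take = budget - written;

		if (lru[nid] > 0)
			all_empty = 0;
		if (take > lru[nid])
			take = lru[nid];
		lru[nid] -= take;	/* drain this node before moving on */
		written += take;
	}
	return all_empty ? -ENOENT : written;
}
```

With two nodes holding 5 entries each, a budget of 2 (standing in for num_node_state(N_NORMAL_MEMORY)) is satisfied entirely by node 0, so it does not reproduce the old one-entry-per-node walk. Asking for 1 is simpler, and fairness is not the goal on this path.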


* Re: [PATCH v5 4/6] mm/zswap: Implement proactive writeback
  2026-06-30  0:15   ` Yosry Ahmed
@ 2026-06-30  1:49     ` Hao Jia
  2026-06-30 16:10       ` Yosry Ahmed
  0 siblings, 1 reply; 15+ messages in thread
From: Hao Jia @ 2026-06-30  1:49 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia



On 2026/6/30 08:15, Yosry Ahmed wrote:
> On Mon, Jun 29, 2026 at 07:20:30PM +0800, Hao Jia wrote:
>> From: Hao Jia <jiahao1@lixiang.com>
>>
>> Zswap currently writes back pages to backing swap reactively, triggered
>> either by the shrinker or when the pool reaches its size limit. There is
>> no mechanism to control the amount of writeback for a specific memory
>> cgroup. However, users may want to proactively write back zswap pages,
>> e.g., to free up memory for other applications or to prepare for
>> memory-intensive workloads.
>>
>> Introduce a "source=" key to the memory.reclaim cgroup interface,
>> currently accepting the single value "zswap". When set to "zswap", it
>> bypasses standard memory reclaim and exclusively performs proactive
>> zswap writeback up to the requested budget. If omitted, the default
>> reclaim behavior remains unchanged.
>>
>> Example usage:
>>    # Write back 10MB of compressed data from zswap to the backing swap
>>    echo "10M source=zswap" > memory.reclaim
>>
>> Note that the actual amount of compressed data written back may be less
>> than requested due to the zswap second-chance algorithm: referenced
>> entries are rotated on the LRU on the first encounter and only written
>> back on a second pass. If fewer bytes are written back than requested,
>> -EAGAIN is returned, matching the existing memory.reclaim semantics.
>>
>> Internally, extend user_proactive_reclaim() to parse the new "source="
>> key and invoke the dedicated handler zswap_proactive_writeback() when it
>> is set to "zswap". This handler walks the target memcg subtree in a
>> round-robin fashion and drains each memcg's per-node zswap LRUs through
>> shrink_memcg(), accumulating the compressed bytes written back until the
>> requested budget is met.
>>
>> Suggested-by: Yosry Ahmed <yosry@kernel.org>
>> Suggested-by: Nhat Pham <nphamcs@gmail.com>
>> Signed-off-by: Hao Jia <jiahao1@lixiang.com>
>> ---
> 
> Before going through more versions, we need to figure out whether this
> will pivot to become a proactive demotion interface for swap tiering.
> 

Yes. Should I drop patches 4-6 in the next version and wait for swap 
tiering to be finalized?
We can try to get the non-memcg parts (patches 1-3) merged upstream 
first. This would also give them plenty of time to bake and catch any 
potential regressions. Thoughts?
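For reference, the second-chance behaviour described in the quoted commit message (why a request can write back less than asked and return -EAGAIN) can be sketched as below. It is a simplified model under stated assumptions: an array stands in for the per-memcg LRU, sizes are in bytes, and toy_writeback_pass() and toy_result() are hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical LRU entry: compressed size plus a referenced bit. */
struct toy_entry {
	long bytes;
	bool referenced;
	bool written;
};

/*
 * One pass over the LRU with a byte budget. A referenced entry gets a
 * second chance: its bit is cleared and it is skipped (zswap rotates it
 * on the LRU; here it just stays in place for the next pass). Unreferenced
 * entries are written back. Returns the compressed bytes written.
 */
static long toy_writeback_pass(struct toy_entry *lru, int n, long budget)
{
	long written = 0;
	int i;

	assert(n >= 0 && budget >= 0);
	for (i = 0; i < n && written < budget; i++) {
		if (lru[i].written)
			continue;
		if (lru[i].referenced) {
			lru[i].referenced = false;	/* second chance */
			continue;
		}
		lru[i].written = true;
		written += lru[i].bytes;
	}
	return written;
}

/* memory.reclaim-style result: 0 if the budget was met, else -EAGAIN. */
static int toy_result(long written, long budget)
{
	return written >= budget ? 0 : -EAGAIN;
}
```

A first pass with a 300-byte budget over three 100-byte entries, two of them referenced, writes back only 100 bytes (so -EAGAIN). A second pass then finds the other two entries unreferenced and writes them back.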


>> @@ -7869,9 +7872,12 @@ int user_proactive_reclaim(char *buf,
>>   	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
>>   	unsigned long nr_to_reclaim, nr_reclaimed = 0;
>>   	int swappiness = -1;
>> +	bool zswap_writeback_only = false;
>>   	char *old_buf, *start;
>> +	char source[16];
>>   	substring_t args[MAX_OPT_ARGS];
>>   	gfp_t gfp_mask = GFP_KERNEL;
>> +	u64 nr_bytes;
>>   
>>   	if (!buf || (!memcg && !pgdat) || (memcg && pgdat))
>>   		return -EINVAL;
>> @@ -7879,7 +7885,8 @@ int user_proactive_reclaim(char *buf,
>>   	buf = strstrip(buf);
>>   
>>   	old_buf = buf;
>> -	nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
>> +	nr_bytes = memparse(buf, &buf);
>> +	nr_to_reclaim = nr_bytes / PAGE_SIZE;
> 
> Nit: if we keep this as part of memory.reclaim, we probably want to
> choose clearer names (e.g. pages_to_reclaim and bytes_to_reclaim).

Will do.
> 
>>   	if (buf == old_buf)
>>   		return -EINVAL;
>>   
>> @@ -7899,11 +7906,26 @@ int user_proactive_reclaim(char *buf,
>>   		case MEMORY_RECLAIM_SWAPPINESS_MAX:
>>   			swappiness = SWAPPINESS_ANON_ONLY;
>>   			break;
>> +		case MEMORY_RECLAIM_SOURCE:
>> +			if (match_strlcpy(source, &args[0], sizeof(source)) >= sizeof(source))
>> +				return -EINVAL;
>> +			/* Only zswap is supported as a reclaim source for now. */
>> +			if (strcmp(source, "zswap"))
>> +				return -EINVAL;
>> +			zswap_writeback_only = true;
>> +			break;
>>   		default:
>>   			return -EINVAL;
>>   		}
>>   	}
>>   
>> +	if (zswap_writeback_only) {
>> +		/* source=zswap and swappiness are mutually exclusive. */
>> +		if (swappiness != -1)
>> +			return -EINVAL;
>> +		return zswap_proactive_writeback(memcg, nr_bytes);
>> +	}
>> +
>>   	while (nr_reclaimed < nr_to_reclaim) {
>>   		/* Will converge on zero, but reclaim enforces a minimum */
>>   		unsigned long batch_size = (nr_to_reclaim - nr_reclaimed) / 4;
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index ba01bf0e44e9..9cda96f05508 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -1713,6 +1713,56 @@ int zswap_load(struct folio *folio)
>>   	return 0;
>>   }
>>   
>> +int zswap_proactive_writeback(struct mem_cgroup *memcg, u64 bytes_to_writeback)
>> +{
>> +	struct zswap_shrink_state s = {};
>> +	struct mem_cgroup *iter = NULL;
>> +	u64 bytes_written = 0;
>> +	int ret = 0;
>> +
>> +	if (!memcg)
>> +		return -EINVAL;
> 
> Can this ever happen? It would be a bug in the caller.

IIRC, writing the following to the NUMA node sysfs entry triggers this
check:
echo "10M source=zswap" > /sys/devices/system/node/nodeN/reclaim
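A userspace sketch of the parsing flow in the quoted user_proactive_reclaim() hunk: a size, then optional key=value tokens, with source=zswap rejected when combined with swappiness or when there is no memcg (as with the per-node reclaim file above). toy_memparse() is a simplified stand-in for memparse(), the 0-200 swappiness range is an assumption, and swappiness=max is omitted; none of these helpers exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parsed request (hypothetical userspace mirror of the kernel locals). */
struct toy_req {
	unsigned long long bytes;	/* like nr_bytes in the patch */
	int swappiness;			/* -1 when not given */
	bool zswap_only;		/* like zswap_writeback_only */
};

/* Simplified memparse(): a number plus an optional K/M/G suffix. */
static unsigned long long toy_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	}
	return v;
}

static int toy_parse(char *buf, bool have_memcg, struct toy_req *req)
{
	char *p, *tok;

	req->swappiness = -1;
	req->zswap_only = false;
	req->bytes = toy_memparse(buf, &p);
	if (p == buf)
		return -EINVAL;		/* no size given */

	for (tok = strtok(p, " "); tok; tok = strtok(NULL, " ")) {
		if (!strncmp(tok, "source=", 7)) {
			/* only "zswap" is accepted as a source for now */
			if (strcmp(tok + 7, "zswap"))
				return -EINVAL;
			req->zswap_only = true;
		} else if (!strncmp(tok, "swappiness=", 11)) {
			char *end;
			long v = strtol(tok + 11, &end, 10);

			if (end == tok + 11 || *end || v < 0 || v > 200)
				return -EINVAL;
			req->swappiness = (int)v;
		} else {
			return -EINVAL;		/* unknown key */
		}
	}

	if (req->zswap_only) {
		if (req->swappiness != -1)
			return -EINVAL;	/* source=zswap excludes swappiness */
		if (!have_memcg)
			return -EINVAL;	/* e.g. the per-node reclaim file */
	}
	return 0;
}
```

Rejecting the memcg-less case at parse time, as here, is just one option; the quoted patch does it inside zswap_proactive_writeback() instead.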

> 
>> +	if (!mem_cgroup_zswap_writeback_enabled(memcg))
>> +		return -EINVAL;
>> +	if (!bytes_to_writeback)
>> +		return 0;
> 
> Do we need this? I think the loop will just never enter and
> mem_cgroup_iter_break() will do nothing.

Will do.
> 
>> +
>> +	while (bytes_written < bytes_to_writeback) {
>> +		long shrunk;
>> +
>> +		cond_resched();
>> +
>> +		if (signal_pending(current)) {
>> +			ret = -EINTR;
>> +			break;
>> +		}
>> +
>> +		/*
>> +		 * Use a local iterator to walk the memcg and its online descendants
>> +		 * in a round-robin manner. Upon exiting the loop, mem_cgroup_iter_break()
>> +		 * must be called to drop the iterator reference.
>> +		 */
>> +		do {
>> +			iter = mem_cgroup_iter(memcg, iter, NULL);
>> +		} while (iter && !mem_cgroup_tryget_online(iter));
>> +
>> +		shrunk = zswap_shrink_one_memcg(iter, &s);
>> +		if (shrunk > 0)
>> +			bytes_written += shrunk;
>> +
>> +		/* drop the extra reference taken by mem_cgroup_tryget_online() */
>> +		mem_cgroup_put(iter);
> 
> 
> Can we just use mem_cgroup_online() instead, since mem_cgroup_iter()
> already grabs a ref?
> 
Will do.

Thanks,
Hao
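To make the loop's contract concrete, here is a minimal model of a budget-driven, round-robin walk over a memcg subtree. Assumptions: the subtree is flattened into an array, offline memcgs are skipped with a plain flag check (in the spirit of the mem_cgroup_online() suggestion), signal handling is omitted, and the loop gives up after a pass with no progress, since the quoted hunk is cut off before its exit logic. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memcg in the target subtree. */
struct toy_memcg {
	bool online;
	long zswap_bytes;	/* compressed bytes still in zswap */
};

/* Write back up to @batch compressed bytes from one memcg. */
static long toy_shrink_one(struct toy_memcg *m, long batch)
{
	long n = m->zswap_bytes < batch ? m->zswap_bytes : batch;

	m->zswap_bytes -= n;
	return n;
}

/*
 * Walk the subtree round-robin until @budget bytes are written back.
 * ASSUMPTION: give up after a full pass that made no progress.
 * Returns 0 if the budget was met, else -EAGAIN.
 */
static int toy_proactive_writeback(struct toy_memcg *tree, int n,
				   long budget, long batch)
{
	long written = 0;
	bool progress = false;
	int i = 0;

	assert(n > 0 && batch > 0);
	while (written < budget) {
		if (tree[i].online) {
			long got = toy_shrink_one(&tree[i], batch);

			if (got > 0) {
				written += got;
				progress = true;
			}
		}
		if (++i == n) {		/* end of one round-robin pass */
			if (!progress)
				break;
			progress = false;
			i = 0;
		}
	}
	return written >= budget ? 0 : -EAGAIN;
}
```

A zero budget never enters the loop, which matches the earlier point that the explicit !bytes_to_writeback check is redundant.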


* Re: [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled
  2026-06-29 18:37   ` Nhat Pham
@ 2026-06-30 10:51     ` Hao Jia
  2026-06-30 16:02       ` Yosry Ahmed
  0 siblings, 1 reply; 15+ messages in thread
From: Hao Jia @ 2026-06-30 10:51 UTC (permalink / raw)
  To: Nhat Pham, yosry
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, chengming.zhou,
	muchun.song, roman.gushchin, linux-mm, linux-kernel, linux-doc,
	Hao Jia, stable



On 2026/6/30 02:37, Nhat Pham wrote:
> On Mon, Jun 29, 2026 at 4:20 AM Hao Jia <jiahao.kernel@gmail.com> wrote:
>>
>> From: Hao Jia <jiahao1@lixiang.com>
>>
>> When memory cgroup is disabled, mem_cgroup_iter() always returns NULL.
>> Therefore, the global shrinker shrink_worker() always takes the !memcg
>> branch. After MAX_RECLAIM_RETRIES empty walks, the worker simply gives up,
>> so it fails to write back anything.
>>
>> Therefore, when memory cgroup is disabled, fall through with the !memcg
>> branch and shrink the root memcg directly. Stop the loop once
>> shrink_memcg() reports -ENOENT, since the root LRU is the only target and
>> -ENOENT means it has been exhausted.
>>
>> Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
>> Cc: stable@vger.kernel.org
>> Reported-by: Yosry Ahmed <yosry@kernel.org>
>> Closes: https://lore.kernel.org/all/CAO9r8zPVzMKFbCixxD-qgtRrkFxWVrHiZZeLc=eyTPKPVQgX4g@mail.gmail.com
>> Signed-off-by: Hao Jia <jiahao1@lixiang.com>
> 
> Ah, good catch.
> 
> 
> 
>> ---
>>   mm/zswap.c | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index 761cd699e0a3..0f8f04f22888 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -1356,7 +1356,12 @@ static void shrink_worker(struct work_struct *w)
>>                  } while (memcg && !mem_cgroup_tryget_online(memcg));
>>                  spin_unlock(&zswap_shrink_lock);
>>
>> -               if (!memcg) {
>> +               /*
>> +                * Reaching a NULL memcg means a full hierarchy pass completed.
>> +                * Exclude the memcg-disabled case, where it is always NULL, and
>> +                * fall through to shrink the root LRU directly.
>> +                */
>> +               if (!memcg && !mem_cgroup_disabled()) {
>>                          /*
>>                           * Continue shrinking without incrementing failures if
>>                           * we found candidate memcgs in the last tree walk.
> 
> nit: I wonder if we can just merge this comment with the new comment
> you just added.

Updated. Please see below.

> 
>> @@ -1378,8 +1383,15 @@ static void shrink_worker(struct work_struct *w)
>>                   * with pages in zswap. Skip this without incrementing attempts
>>                   * and failures.
>>                   */
>> -               if (ret == -ENOENT)
>> +               if (ret == -ENOENT) {
>> +                       /*
>> +                        * With memcg disabled the root LRU is the only target, so
>> +                        * we should abort if it has no writeback-candidate pages.
>> +                        */
>> +                       if (mem_cgroup_disabled())
>> +                               break;
> 
> Hmm, do we need to do this? Consider a system with cgroup enabled but
> with just one cgroup (root?). The behavior would just be trying that
> cgroup for MAX_RECLAIM_RETRIES failure attempts, correct?
> 
> In that case, we don't need to do this check, and we would get the
> same behavior. The loop would terminate after MAX_RECLAIM_RETRIES :)
> 
> Could you fact-check me? :)

Exactly. When memcg is disabled, shrink_memcg() returns -ENOENT only if 
the root LRU is empty. An empty root LRU implies that the total pages 
have already dropped below the threshold (thr). At this point, the loop 
safely terminates because of the zswap_total_pages() <= thr check. In 
all other cases (where shrink_memcg() returns anything other than 
-ENOENT), the loop will eventually exit either by hitting the 
MAX_RECLAIM_RETRIES limit or when zswap_total_pages() <= thr.

How about something like this? If there are no objections, I'll fold 
this into the next version.

     mm/zswap: Fix global shrinker when memory cgroup is disabled

     When memory cgroup is disabled, mem_cgroup_iter() always returns NULL,
     so the global shrinker shrink_worker() always takes the !memcg branch.
     After MAX_RECLAIM_RETRIES empty walks, the worker simply gives up,
     so it fails to write back anything.

     Therefore, when memory cgroup is disabled, bypass the !memcg branch and
     shrink the root memcg directly.

     With memcg disabled, shrink_memcg() only returns -ENOENT when the root
     LRU is empty, which means the total pages are already at or below thr.
     The loop then safely bails out via the zswap_total_pages() <= thr check.
     For any other return value from shrink_memcg(), the loop is guaranteed
     to terminate, either after MAX_RECLAIM_RETRIES failures or once the
     threshold is met.

     Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
     Cc: stable@vger.kernel.org
     Reported-by: Yosry Ahmed <yosry@kernel.org>
     Closes: https://lore.kernel.org/all/CAO9r8zPVzMKFbCixxD-qgtRrkFxWVrHiZZeLc=eyTPKPVQgX4g@mail.gmail.com
     Signed-off-by: Hao Jia <jiahao1@lixiang.com>

diff --git a/mm/zswap.c b/mm/zswap.c
index 4b5149173b0e..9d4f19fc440e 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1361,11 +1361,12 @@ static void shrink_worker(struct work_struct *w)
                 } while (memcg && !mem_cgroup_tryget_online(memcg));
                 spin_unlock(&zswap_shrink_lock);

-               if (!memcg) {
-                       /*
-                        * Continue shrinking without incrementing failures if
-                        * we found candidate memcgs in the last tree walk.
-                        */
+               /*
+                * A NULL memcg ends a full hierarchy pass (except when memcg is
+                * disabled, where it is always NULL: fall through to the root LRU).
+                * Count a failure only if the pass found no candidates.
+                */
+               if (!memcg && !mem_cgroup_disabled()) {
                         if (!attempts && ++failures == MAX_RECLAIM_RETRIES)
                                 break;

Thanks,
Hao
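The termination argument above can be checked with a small model of the memcg-disabled case. It assumes that every zswap page sits on the root LRU (so the pool size equals the LRU length), a hypothetical batch of 8, and a retry limit of 16 mirroring MAX_RECLAIM_RETRIES; entries that can never be written back model shrink_memcg() making no progress. The toy_* names are illustrative only.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_RECLAIM_RETRIES 16	/* assumed to mirror MAX_RECLAIM_RETRIES */
#define TOY_BATCH 8			/* hypothetical writeback batch */

/* Hypothetical pool with memcg disabled: everything is on the root LRU. */
struct toy_pool {
	long total;	/* pages in the pool == entries on the root LRU */
	long writable;	/* entries that can actually be written back */
};

static long toy_shrink_root(struct toy_pool *p, long batch)
{
	long n;

	if (p->total == 0)
		return -ENOENT;		/* root LRU empty */
	n = p->writable < batch ? p->writable : batch;
	p->writable -= n;
	p->total -= n;
	return n;			/* 0 means no progress */
}

/* Returns the number of loop iterations until the worker stops. */
static int toy_shrink_worker(struct toy_pool *p, long thr)
{
	int failures = 0, iterations = 0;

	assert(thr >= 0);
	do {
		long ret = toy_shrink_root(p, TOY_BATCH);

		iterations++;
		if (ret == -ENOENT)
			continue;	/* no failure counted; loop condition decides */
		if (ret <= 0 && ++failures == TOY_MAX_RECLAIM_RETRIES)
			break;
	} while (p->total > thr);
	return iterations;
}
```

The three outcomes match the description: the threshold is reached, the retry limit is hit, or an empty root LRU exits via the loop condition after a single iteration.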


* Re: [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled
  2026-06-30 10:51     ` Hao Jia
@ 2026-06-30 16:02       ` Yosry Ahmed
  0 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-30 16:02 UTC (permalink / raw)
  To: Hao Jia
  Cc: Nhat Pham, akpm, tj, hannes, shakeel.butt, mhocko, mkoutny,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia, stable

> How about something like this? If there are no objections, I'll fold
> this into the next version.
>
>      mm/zswap: Fix global shrinker when memory cgroup is disabled
>
>      When memory cgroup is disabled, mem_cgroup_iter() always returns NULL,
>      so the global shrinker shrink_worker() always takes the !memcg branch.
>      After MAX_RECLAIM_RETRIES empty walks, the worker simply gives up,
>      so it fails to write back anything.
>
>      Therefore, when memory cgroup is disabled, bypass the !memcg branch and
>      shrink the root memcg directly.
>
>      With memcg disabled, shrink_memcg() only returns -ENOENT when the root
>      LRU is empty, which means the total pages are already at or below thr.
>      The loop then safely bails out via the zswap_total_pages() <= thr check.
>      For any other return value from shrink_memcg(), the loop is guaranteed
>      to terminate, either after MAX_RECLAIM_RETRIES failures or once the
>      threshold is met.
>
>      Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
>      Cc: stable@vger.kernel.org
>      Reported-by: Yosry Ahmed <yosry@kernel.org>
>      Closes: https://lore.kernel.org/all/CAO9r8zPVzMKFbCixxD-qgtRrkFxWVrHiZZeLc=eyTPKPVQgX4g@mail.gmail.com
>      Signed-off-by: Hao Jia <jiahao1@lixiang.com>

Feel free to add:

Acked-by: Yosry Ahmed <yosry@kernel.org>

A small nit below.

>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4b5149173b0e..9d4f19fc440e 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1361,11 +1361,12 @@ static void shrink_worker(struct work_struct *w)
>                  } while (memcg && !mem_cgroup_tryget_online(memcg));
>                  spin_unlock(&zswap_shrink_lock);
>
> -               if (!memcg) {
> -                       /*
> -                        * Continue shrinking without incrementing failures if
> -                        * we found candidate memcgs in the last tree walk.
> -                        */
> +               /*
> +                * A NULL memcg ends a full hierarchy pass (except when memcg is
> +                * disabled, where it is always NULL: fall through to the root LRU).
> +                * Count a failure only if the pass found no candidates.

I think "last pass" is clearer than just "pass" here?

> +                */
> +               if (!memcg && !mem_cgroup_disabled()) {
>                          if (!attempts && ++failures == MAX_RECLAIM_RETRIES)
>                                  break;
>
> Thanks,
> Hao
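
For the memcg-enabled branch whose comment is being reworded, here is a hypothetical model of the failure accounting: the cursor returning NULL (index n here) marks the end of a pass, -ENOENT memcgs are skipped without counting an attempt, and a failure is charged at the end of a pass only if that last pass made no attempts. The arrays and the iteration cap stand in for real memcgs and the threshold check.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RECLAIM_RETRIES 16	/* assumed to mirror MAX_RECLAIM_RETRIES */

/*
 * has_pages[i]: memcg i has zswap entries (otherwise -ENOENT).
 * can_write[i]: a shrink attempt on memcg i makes progress.
 * Returns the failure count when the loop stops.
 */
static int toy_count_failures(const bool *has_pages, const bool *can_write,
			      int n, int max_iters)
{
	int failures = 0, attempts = 0, cursor = 0, it;

	for (it = 0; it < max_iters; it++) {
		if (cursor == n) {		/* "NULL": end of a full pass */
			cursor = 0;
			/* charge a failure only if the last pass found no candidates */
			if (!attempts && ++failures == TOY_MAX_RECLAIM_RETRIES)
				break;
			attempts = 0;
			continue;
		}
		int i = cursor++;
		if (!has_pages[i])
			continue;		/* -ENOENT: skip, no attempt */
		attempts++;
		if (!can_write[i] && ++failures == TOY_MAX_RECLAIM_RETRIES)
			break;
	}
	return failures;
}
```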


* Re: [PATCH v5 4/6] mm/zswap: Implement proactive writeback
  2026-06-30  1:49     ` Hao Jia
@ 2026-06-30 16:10       ` Yosry Ahmed
  0 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-30 16:10 UTC (permalink / raw)
  To: Hao Jia
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia

> > Before going through more versions, we need to figure out whether this
> > will pivot to become a proactive demotion interface for swap tiering.
> >
>
> Yes. Should I drop patches 4-6 in the next version and wait for swap
> tiering to be finalized?
> We can try to get the non-memcg parts (patches 1-3) merged upstream
> first. This would also give them plenty of time to bake and catch any
> potential regressions. Thoughts?

Patches 1-2 can be sent and merged separately, yes. For patch 2,
please include some numbers for the writeback performance before and
after batching.

Patch 3 does refactoring in preparation for patch 4, so I don't think
it makes sense on its own.

> >> +int zswap_proactive_writeback(struct mem_cgroup *memcg, u64 bytes_to_writeback)
> >> +{
> >> +    struct zswap_shrink_state s = {};
> >> +    struct mem_cgroup *iter = NULL;
> >> +    u64 bytes_written = 0;
> >> +    int ret = 0;
> >> +
> >> +    if (!memcg)
> >> +            return -EINVAL;
> >
> > Can this ever happen? It would be a bug in the caller.
>
> IIRC, writing the following to the NUMA node sysfs entry triggers this
> check:
> echo "10M source=zswap" > /sys/devices/system/node/nodeN/reclaim

Oh yeah, I forgot about that one :)

If we keep this, probably combine the !memcg and writeback checks below.

>
> >
> >> +    if (!mem_cgroup_zswap_writeback_enabled(memcg))
> >> +            return -EINVAL;
> >> +    if (!bytes_to_writeback)
> >> +            return 0;
> >
> > Do we need this? I think the loop will just never enter and
> > mem_cgroup_iter_break() will do nothing.
>
> Will do.
> >
> >> +
> >> +    while (bytes_written < bytes_to_writeback) {
> >> +            long shrunk;
> >> +
> >> +            cond_resched();
> >> +
> >> +            if (signal_pending(current)) {
> >> +                    ret = -EINTR;
> >> +                    break;
> >> +            }
> >> +
> >> +            /*
> >> +             * Use a local iterator to walk the memcg and its online descendants
> >> +             * in a round-robin manner. Upon exiting the loop, mem_cgroup_iter_break()
> >> +             * must be called to drop the iterator reference.
> >> +             */
> >> +            do {
> >> +                    iter = mem_cgroup_iter(memcg, iter, NULL);
> >> +            } while (iter && !mem_cgroup_tryget_online(iter));
> >> +
> >> +            shrunk = zswap_shrink_one_memcg(iter, &s);
> >> +            if (shrunk > 0)
> >> +                    bytes_written += shrunk;
> >> +
> >> +            /* drop the extra reference taken by mem_cgroup_tryget_online() */
> >> +            mem_cgroup_put(iter);
> >
> >
> > Can we just use mem_cgroup_online() instead, since mem_cgroup_iter()
> > already grabs a ref?
> >
> Will do.

If you're looking for another cleanup to do, shrink_worker() should
probably also use mem_cgroup_online() and avoid taking/dropping an
extra ref :)
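
A toy refcount model of the cleanup suggested above: the iterator already pins the memcg it returns, so an extra get/put pair around each use is redundant and a plain online check is enough. It is single-threaded and non-atomic, so it says nothing about concurrent offlining; toy_iter() only mimics the pin-next/unpin-previous behaviour of mem_cgroup_iter(), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memcg: plain refcount and an online flag. */
struct toy_memcg {
	int refs;
	bool online;
};

static int toy_ref_ops;	/* number of get/put operations performed */

static void toy_get(struct toy_memcg *m) { m->refs++; toy_ref_ops++; }
static void toy_put(struct toy_memcg *m) { m->refs--; toy_ref_ops++; }

/* Mimics mem_cgroup_iter(): pin the next memcg, unpin the previous one. */
static struct toy_memcg *toy_iter(struct toy_memcg *v, int n, int *pos,
				  struct toy_memcg *prev)
{
	struct toy_memcg *next = *pos < n ? &v[(*pos)++] : NULL;

	if (next)
		toy_get(next);
	if (prev)
		toy_put(prev);
	return next;
}

/* Visit online memcgs, optionally taking a redundant extra pin on each. */
static int toy_walk(struct toy_memcg *v, int n, bool extra_pin)
{
	struct toy_memcg *it = NULL;
	int pos = 0, visited = 0;

	while ((it = toy_iter(v, n, &pos, it)) != NULL) {
		if (!it->online)
			continue;
		if (extra_pin)
			toy_get(it);	/* like mem_cgroup_tryget_online() */
		visited++;		/* ... shrink this memcg here ... */
		if (extra_pin)
			toy_put(it);
		assert(it->refs == 1);	/* the iterator's own pin remains */
	}
	return visited;
}
```

Both walks visit the same two online memcgs out of three and leave every refcount at zero, but the extra-pin variant does 10 get/put operations instead of 6.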


Thread overview: 15+ messages
2026-06-29 11:20 [PATCH v5 0/6] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
2026-06-29 11:20 ` [PATCH v5 1/6] mm/zswap: Fix global shrinker when memory cgroup is disabled Hao Jia
2026-06-29 18:37   ` Nhat Pham
2026-06-30 10:51     ` Hao Jia
2026-06-30 16:02       ` Yosry Ahmed
2026-06-29 11:20 ` [PATCH v5 2/6] mm/zswap: Support batch writeback in shrink_memcg() Hao Jia
2026-06-30  0:21   ` Yosry Ahmed
2026-06-30  1:18     ` Hao Jia
2026-06-29 11:20 ` [PATCH v5 3/6] mm/zswap: Extract a reusable writeback helper from shrink_worker() Hao Jia
2026-06-29 11:20 ` [PATCH v5 4/6] mm/zswap: Implement proactive writeback Hao Jia
2026-06-30  0:15   ` Yosry Ahmed
2026-06-30  1:49     ` Hao Jia
2026-06-30 16:10       ` Yosry Ahmed
2026-06-29 11:20 ` [PATCH v5 5/6] mm/zswap: Add per-memcg stat for " Hao Jia
2026-06-29 11:20 ` [PATCH v5 6/6] selftests/cgroup: Add tests for zswap " Hao Jia
