From: Yosry Ahmed <yosry@kernel.org>
To: Hao Jia <jiahao.kernel@gmail.com>
Cc: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org,
shakeel.butt@linux.dev, mhocko@kernel.org, mkoutny@suse.com,
nphamcs@gmail.com, chengming.zhou@linux.dev,
muchun.song@linux.dev, roman.gushchin@linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, Hao Jia <jiahao1@lixiang.com>
Subject: Re: [PATCH v5 4/6] mm/zswap: Implement proactive writeback
Date: Tue, 30 Jun 2026 00:15:55 +0000
Message-ID: <akMJ8UfeZXrVe5LN@google.com>
In-Reply-To: <20260629112032.20423-5-jiahao.kernel@gmail.com>
On Mon, Jun 29, 2026 at 07:20:30PM +0800, Hao Jia wrote:
> From: Hao Jia <jiahao1@lixiang.com>
>
> Zswap currently writes back pages to backing swap reactively, triggered
> either by the shrinker or when the pool reaches its size limit. There is
> no mechanism to control the amount of writeback for a specific memory
> cgroup. However, users may want to proactively write back zswap pages,
> e.g., to free up memory for other applications or to prepare for
> memory-intensive workloads.
>
> Introduce a "source=" key to the memory.reclaim cgroup interface,
> currently accepting the single value "zswap". When set to "zswap", it
> bypasses standard memory reclaim and exclusively performs proactive
> zswap writeback up to the requested budget. If omitted, the default
> reclaim behavior remains unchanged.
>
> Example usage:
> # Write back 10MB of compressed data from zswap to the backing swap
> echo "10M source=zswap" > memory.reclaim
>
> Note that the actual amount of compressed data written back may be less
> than requested due to the zswap second-chance algorithm: referenced
> entries are rotated on the LRU on the first encounter and only written
> back on a second pass. If fewer bytes are written back than requested,
> -EAGAIN is returned, matching the existing memory.reclaim semantics.
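For anyone driving this from C rather than the shell, here is a minimal
user-space sketch of the request, assuming the "source=zswap" syntax lands
as described above. request_zswap_writeback() and its path argument are
hypothetical names for illustration, not part of the patch, and a short
write is not handled.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "<amount> source=zswap" to a memory.reclaim
 * file. Returns 0 on success or a negative errno. With the semantics
 * described above, -EAGAIN would mean that fewer bytes were written back
 * than requested.
 */
static int request_zswap_writeback(const char *reclaim_path, const char *amount)
{
	char req[64];
	int len, fd, ret = 0;

	len = snprintf(req, sizeof(req), "%s source=zswap", amount);
	if (len < 0 || (size_t)len >= sizeof(req))
		return -EINVAL;

	fd = open(reclaim_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	if (write(fd, req, (size_t)len) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

A caller would pass the cgroup's memory.reclaim path and a size such as
"10M", and would likely treat -EAGAIN as "partially done, retry later"
rather than as a hard failure.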
>
> Internally, extend user_proactive_reclaim() to parse the new "source="
> key and invoke the dedicated handler zswap_proactive_writeback() when it
> is set to "zswap". This handler walks the target memcg subtree in a
> round-robin fashion and drains each memcg's per-node zswap LRUs through
> shrink_memcg(), accumulating the compressed bytes written back until the
> requested budget is met.
>
> Suggested-by: Yosry Ahmed <yosry@kernel.org>
> Suggested-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Hao Jia <jiahao1@lixiang.com>
> ---
Before going through more versions, we need to figure out whether this
will pivot to being a proactive demotion interface for swap tiering.
> @@ -7869,9 +7872,12 @@ int user_proactive_reclaim(char *buf,
> unsigned int nr_retries = MAX_RECLAIM_RETRIES;
> unsigned long nr_to_reclaim, nr_reclaimed = 0;
> int swappiness = -1;
> + bool zswap_writeback_only = false;
> char *old_buf, *start;
> + char source[16];
> substring_t args[MAX_OPT_ARGS];
> gfp_t gfp_mask = GFP_KERNEL;
> + u64 nr_bytes;
>
> if (!buf || (!memcg && !pgdat) || (memcg && pgdat))
> return -EINVAL;
> @@ -7879,7 +7885,8 @@ int user_proactive_reclaim(char *buf,
> buf = strstrip(buf);
>
> old_buf = buf;
> - nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
> + nr_bytes = memparse(buf, &buf);
> + nr_to_reclaim = nr_bytes / PAGE_SIZE;
Nit: if we keep this as part of memory.reclaim, we probably want to
choose clearer names (e.g. pages_to_reclaim and bytes_to_reclaim).
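To make the naming concrete, here is a rough user-space sketch (untested
against the kernel tree). sketch_memparse() is only a stand-in for
memparse(), SKETCH_PAGE_SIZE assumes 4K pages, and struct reclaim_request
and parse_reclaim_size() are hypothetical names used for illustration:

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4K pages */

/* User-space stand-in for memparse(): a number plus optional K/M/G suffix. */
static uint64_t sketch_memparse(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

struct reclaim_request {
	uint64_t bytes_to_reclaim;	/* byte budget, used by source=zswap */
	unsigned long pages_to_reclaim;	/* page budget, used by regular reclaim */
};

/* Returns 0 on success, -1 if the buffer does not start with a number. */
static int parse_reclaim_size(const char *buf, struct reclaim_request *req)
{
	char *end;

	req->bytes_to_reclaim = sketch_memparse(buf, &end);
	if (end == buf)
		return -1;
	req->pages_to_reclaim = req->bytes_to_reclaim / SKETCH_PAGE_SIZE;
	return 0;
}
```

With both values named after their units, the zswap path can take
bytes_to_reclaim and the existing loop can keep using pages_to_reclaim
without the reader having to track which one nr_to_reclaim means.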
> if (buf == old_buf)
> return -EINVAL;
>
> @@ -7899,11 +7906,26 @@ int user_proactive_reclaim(char *buf,
> case MEMORY_RECLAIM_SWAPPINESS_MAX:
> swappiness = SWAPPINESS_ANON_ONLY;
> break;
> + case MEMORY_RECLAIM_SOURCE:
> + if (match_strlcpy(source, &args[0], sizeof(source)) >= sizeof(source))
> + return -EINVAL;
> + /* Only zswap is supported as a reclaim source for now. */
> + if (strcmp(source, "zswap"))
> + return -EINVAL;
> + zswap_writeback_only = true;
> + break;
> default:
> return -EINVAL;
> }
> }
>
> + if (zswap_writeback_only) {
> + /* source=zswap and swappiness are mutually exclusive. */
> + if (swappiness != -1)
> + return -EINVAL;
> + return zswap_proactive_writeback(memcg, nr_bytes);
> + }
> +
> while (nr_reclaimed < nr_to_reclaim) {
> /* Will converge on zero, but reclaim enforces a minimum */
> unsigned long batch_size = (nr_to_reclaim - nr_reclaimed) / 4;
> diff --git a/mm/zswap.c b/mm/zswap.c
> index ba01bf0e44e9..9cda96f05508 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1713,6 +1713,56 @@ int zswap_load(struct folio *folio)
> return 0;
> }
>
> +int zswap_proactive_writeback(struct mem_cgroup *memcg, u64 bytes_to_writeback)
> +{
> + struct zswap_shrink_state s = {};
> + struct mem_cgroup *iter = NULL;
> + u64 bytes_written = 0;
> + int ret = 0;
> +
> + if (!memcg)
> + return -EINVAL;
Can this ever happen? It would be a bug in the caller.
> + if (!mem_cgroup_zswap_writeback_enabled(memcg))
> + return -EINVAL;
> + if (!bytes_to_writeback)
> + return 0;
Do we need this? I think the loop will just never enter and
mem_cgroup_iter_break() will do nothing.
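A small user-space model of the loop shape shows what I mean. struct
fake_lru and fake_shrink() are hypothetical stand-ins for the memcg walk
and zswap_shrink_one_memcg(); only the loop condition and the return
value mirror the patch:

```c
#include <errno.h>
#include <stdint.h>

struct fake_lru {
	long per_call;		/* value every shrink attempt returns */
	unsigned int calls;	/* number of shrink attempts made */
};

/* Stand-in for one shrink attempt: count the call, return a fixed result. */
static long fake_shrink(struct fake_lru *lru)
{
	lru->calls++;
	return lru->per_call;
}

/*
 * Same loop shape as the patch, with no early return for a zero budget.
 * Note: a per_call of 0 would spin forever in this model; the real helper
 * is assumed to report -EBUSY eventually.
 */
static int sketch_writeback(struct fake_lru *lru, uint64_t bytes_to_writeback)
{
	uint64_t bytes_written = 0;
	int ret = 0;

	while (bytes_written < bytes_to_writeback) {
		long shrunk = fake_shrink(lru);

		if (shrunk > 0)
			bytes_written += (uint64_t)shrunk;

		if (shrunk == -EBUSY) {
			ret = -EAGAIN;
			break;
		}
	}
	return ret;
}
```

With a zero budget the condition is false on the first check, so no
shrink attempt is made and ret is still 0 on return, which is the same
result the explicit check produces.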
> +
> + while (bytes_written < bytes_to_writeback) {
> + long shrunk;
> +
> + cond_resched();
> +
> + if (signal_pending(current)) {
> + ret = -EINTR;
> + break;
> + }
> +
> + /*
> + * Use a local iterator to walk the memcg and its online descendants
> + * in a round-robin manner. Upon exiting the loop, mem_cgroup_iter_break()
> + * must be called to drop the iterator reference.
> + */
> + do {
> + iter = mem_cgroup_iter(memcg, iter, NULL);
> + } while (iter && !mem_cgroup_tryget_online(iter));
> +
> + shrunk = zswap_shrink_one_memcg(iter, &s);
> + if (shrunk > 0)
> + bytes_written += shrunk;
> +
> + /* drop the extra reference taken by mem_cgroup_tryget_online() */
> + mem_cgroup_put(iter);
Can we just use mem_cgroup_online() instead, since mem_cgroup_iter()
already grabs a ref?
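Here is a toy user-space model of the reference counting I have in mind.
struct fake_memcg, fake_iter() and next_online() are hypothetical; the
only assumption carried over from the kernel is that the iterator takes a
reference on the memcg it returns and drops the one held on prev:

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_memcg {
	int refs;	/* references currently held by the walker */
	bool online;
};

/*
 * Models the iterator contract: drop the reference on prev, take one on
 * the next node, and return NULL once the round trip is complete.
 */
static struct fake_memcg *fake_iter(struct fake_memcg *nodes, size_t n,
				    struct fake_memcg *prev)
{
	size_t next = prev ? (size_t)(prev - nodes) + 1 : 0;

	if (prev)
		prev->refs--;
	if (next >= n)
		return NULL;
	nodes[next].refs++;
	return &nodes[next];
}

/* Skip offline nodes with a plain online check; no extra get/put pair. */
static struct fake_memcg *next_online(struct fake_memcg *nodes, size_t n,
				      struct fake_memcg *prev)
{
	do {
		prev = fake_iter(nodes, n, prev);
	} while (prev && !prev->online);

	return prev;
}
```

In this model every node returned to the caller already has exactly one
reference held by the iterator, and all references are gone once the walk
returns NULL, so nothing is left for a separate put to balance.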
> +
> + if (shrunk == -EBUSY) {
> + ret = -EAGAIN;
> + break;
> + }
> + }
> +
> + mem_cgroup_iter_break(memcg, iter);
> + return ret;
> +}
> +
> void zswap_invalidate(swp_entry_t swp)
> {
> pgoff_t offset = swp_offset(swp);
> --
> 2.34.1
>