From: Yosry Ahmed <yosry@kernel.org>
To: Hao Jia <jiahao.kernel@gmail.com>
Cc: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org,
shakeel.butt@linux.dev, mhocko@kernel.org, mkoutny@suse.com,
nphamcs@gmail.com, chengming.zhou@linux.dev,
muchun.song@linux.dev, roman.gushchin@linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, Hao Jia <jiahao1@lixiang.com>
Subject: Re: [PATCH v4 3/5] mm/zswap: Implement proactive writeback
Date: Mon, 22 Jun 2026 23:40:42 +0000 [thread overview]
Message-ID: <ajnHJhvWxeR1DbaP@google.com> (raw)
In-Reply-To: <20260618044857.69439-4-jiahao.kernel@gmail.com>
On Thu, Jun 18, 2026 at 12:48:55PM +0800, Hao Jia wrote:
> From: Hao Jia <jiahao1@lixiang.com>
>
> Zswap currently writes back pages to backing swap reactively, triggered
> either by the shrinker or when the pool reaches its size limit. There is
> no mechanism to control the amount of writeback for a specific memory
> cgroup. However, users may want to proactively write back zswap pages,
> e.g., to free up memory for other applications or to prepare for
> memory-intensive workloads.
>
> Introduce a "zswap_writeback_only" key to the memory.reclaim cgroup
> interface. When specified, this key bypasses standard memory reclaim
> and exclusively performs proactive zswap writeback up to the requested
> budget. If omitted, the default reclaim behavior remains unchanged.
>
> Example usage:
> # Write back 10MB of compressed data from zswap to the backing swap
> echo "10M zswap_writeback_only" > memory.reclaim
>
> Note that the actual amount of compressed data written back may be less
> than requested due to the zswap second-chance algorithm: referenced
> entries are rotated on the LRU on the first encounter and only written
> back on a second pass. If fewer bytes are written back than requested,
> -EAGAIN is returned, matching the existing memory.reclaim semantics.
>
> Internally, extend user_proactive_reclaim() to parse the new
> "zswap_writeback_only" token and invoke the dedicated handler
> zswap_proactive_writeback(). This handler reuses
> zswap_try_to_writeback() to walk the target memcg subtree, draining
> per-node zswap LRUs through list_lru_walk_one() with the
> shrink_memcg_cb() callback.
I won't comment on the memcg interface as this is more-or-less a
placeholder until an interface is finalized.
>
> Suggested-by: Yosry Ahmed <yosry@kernel.org>
> Suggested-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Hao Jia <jiahao1@lixiang.com>
[..]
> diff --git a/mm/zswap.c b/mm/zswap.c
> index e29f8a61412d..28200552dde3 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1423,6 +1423,27 @@ static struct mem_cgroup *zswap_iter_global(void)
> return memcg;
> }
>
> +/*
> + * Local iteration uses a local cursor to select from online memcgs
> + * under @root in a round-robin fashion.
> + *
> + * Pass the previous return value as @prev to advance the round-robin
> + * iteration, or pass NULL to start a new walk. If exiting early before
> + * the iteration completes, the caller must call mem_cgroup_iter_break()
> + * to release the cursor reference.
> + */
> +static struct mem_cgroup *zswap_iter_local(struct mem_cgroup *root,
> + struct mem_cgroup *prev)
> +{
> + struct mem_cgroup *memcg;
> +
> + do {
> + memcg = mem_cgroup_iter(root, prev, NULL);
> + prev = memcg;
> + } while (memcg && !mem_cgroup_tryget_online(memcg));
> + return memcg;
> +}
> +
> /*
> * Walk the memcg tree and write back zswap pages until the
> * (lower_pages, upper_pages) window closes, or abort encounter
> @@ -1430,16 +1451,23 @@ static struct mem_cgroup *zswap_iter_global(void)
> * - No writeback-candidate memcgs found in a memcg tree walk.
> * - Shrinking a writeback-candidate memcg failed.
> *
> - * For shrink_worker(), it passes lower=thr and upper=zswap_total_pages().
> - * The @upper limit is refreshed in each iteration by re-evaluating
> - * zswap_total_pages(), and the window closes once the total falls
> - * below the threshold.
> + * For shrink_worker() (proactive=false), it passes lower=thr and
> + * upper=zswap_total_pages(). The @upper limit is refreshed in each
> + * iteration by re-evaluating zswap_total_pages(), and the window
> + * closes once the total falls below the threshold.
> + *
> + * For zswap_proactive_writeback() (proactive=true), it passes lower=0
> + * and upper=nr_to_writeback. The @lower limit is advanced by the
> + * compressed bytes written back via shrink_memcg(). The window closes
> + * once @nr_to_writeback pages of compressed data have been written back.
> */
> -static void zswap_try_to_writeback(unsigned long lower_pages,
> - unsigned long upper_pages)
> +static int zswap_try_to_writeback(struct mem_cgroup *memcg,
> + unsigned long lower_pages,
> + unsigned long upper_pages, bool proactive)
As I mentiond in the previous patch, this is the wrong abstraction. The
function is extremely tighyl-coupled to the callers, and needing to
pass in things like proactive makes it even worse.
It should be limited to reclaiming one batch of pages from a memcg, and
the retry logic. Everything else (memcg iteration logic, scan goal
checks) should be in the caller.
[..]
> static void shrink_worker(struct work_struct *w)
> @@ -1490,7 +1536,7 @@ static void shrink_worker(struct work_struct *w)
> /* Reclaim down to the accept threshold */
> thr = zswap_accept_thr_pages();
>
> - zswap_try_to_writeback(thr, zswap_total_pages());
> + zswap_try_to_writeback(NULL, thr, zswap_total_pages(), false);
> }
>
> /*********************************
> @@ -1736,6 +1782,19 @@ int zswap_load(struct folio *folio)
> return 0;
> }
>
> +int zswap_proactive_writeback(struct mem_cgroup *memcg,
> + unsigned long nr_to_writeback)
> +{
> + if (!memcg)
> + return -EINVAL;
> + if (!mem_cgroup_zswap_writeback_enabled(memcg))
> + return -EINVAL;
> + if (!nr_to_writeback)
> + return 0;
> +
> + return zswap_try_to_writeback(memcg, 0, nr_to_writeback, true);
The memcg loop should be here, together with a check on the written
bytes to check if the reclaim goal was achieved. I think nr_to_writeback
is also very confusing, it's really the reclaim target in bytes divided
by PAGE_SIZE. I think you need to pass in the number of bytes to
reclaim/writeback directly.
> +}
> +
> void zswap_invalidate(swp_entry_t swp)
> {
> pgoff_t offset = swp_offset(swp);
> --
> 2.34.1
>
next prev parent reply other threads:[~2026-06-22 23:40 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-18 4:48 [PATCH v4 0/5] mm/zswap: Implement per-cgroup proactive writeback Hao Jia
2026-06-18 4:48 ` [PATCH v4 1/5] mm/zswap: Extend shrink_memcg() writeback capability Hao Jia
2026-06-22 23:33 ` Yosry Ahmed
2026-06-18 4:48 ` [PATCH v4 2/5] mm/zswap: Factor writeback loop out of shrink_worker() Hao Jia
2026-06-22 23:36 ` Yosry Ahmed
2026-06-18 4:48 ` [PATCH v4 3/5] mm/zswap: Implement proactive writeback Hao Jia
2026-06-22 23:40 ` Yosry Ahmed [this message]
2026-06-18 4:48 ` [PATCH v4 4/5] mm/zswap: Add per-memcg stat for " Hao Jia
2026-06-22 23:42 ` Yosry Ahmed
2026-06-18 4:48 ` [PATCH v4 5/5] selftests/cgroup: Add tests for zswap " Hao Jia
2026-06-21 4:20 ` [PATCH v4 0/5] mm/zswap: Implement per-cgroup " Muchun Song
2026-06-22 6:08 ` Hao Jia
2026-06-22 10:04 ` Youngjun Park
2026-06-22 21:29 ` Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajnHJhvWxeR1DbaP@google.com \
--to=yosry@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=chengming.zhou@linux.dev \
--cc=hannes@cmpxchg.org \
--cc=jiahao.kernel@gmail.com \
--cc=jiahao1@lixiang.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=nphamcs@gmail.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox