From: Hao Jia <jiahao.kernel@gmail.com>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: akpm@linux-foundation.org, tj@kernel.org, hannes@cmpxchg.org,
shakeel.butt@linux.dev, mhocko@kernel.org, yosry@kernel.org,
nphamcs@gmail.com, chengming.zhou@linux.dev,
muchun.song@linux.dev, roman.gushchin@linux.dev,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
Hao Jia <jiahao1@lixiang.com>
Subject: Re: [PATCH 0/3] mm/zswap: Implement per-cgroup proactive writeback
Date: Tue, 12 May 2026 19:23:32 +0800
Message-ID: <5e6cf3fe-40eb-4a57-4bbb-eda2c31b3210@gmail.com>
In-Reply-To: <agG-gNEclOVf-9WA@localhost.localdomain>

On 2026/5/11 19:39, Michal Koutný wrote:
> On Mon, May 11, 2026 at 06:51:46PM +0800, Hao Jia <jiahao.kernel@gmail.com> wrote:
>> From: Hao Jia <jiahao1@lixiang.com>
>>
>> Zswap currently writes back pages to backing swap devices reactively,
>> triggered either by memory pressure via the shrinker or by the pool
>> reaching its size limit. However, this reactive approach makes writeback
>> timing indeterminate and can disrupt latency-sensitive workloads when
>> eviction happens to coincide with a critical execution window.
>>
>> Furthermore, in certain scenarios, it is desirable to trigger writeback
>> in advance to free up memory. For example, users may want to prepare for
>> an upcoming memory-intensive workload by flushing cold memory to the
>> backing storage when the system is relatively idle.
>
> I can imagine the zswap writeout can come at the least possible
> moment...
>
>> To address these issues, this patch series introduces a per-cgroup
>> interface that allows users to proactively write back cold compressed
>> pages from zswap to the backing swap device.
>
> ...but I see this series is not only per-cgroup proactive reclaim but
> it's also age-based reclaim.
>
> The per-cg consumption and limits (and regular memory reclaim) are all
> measured in sizes. This age-based invocations don't seem commensurable
> (e.g. how would users in practice determine what is the desired input to
> here).
>
Thanks Michal, you are right: the series is both per-memcg *and*
age-based.

The interface still carries a size budget, like memory.reclaim. The
two parameters play different roles:

    "write back up to <max> bytes, chosen from entries whose residency
    in zswap is at least <age>"

Size remains the unit of *amount*; age only describes *which* entries
are eligible.
> Could you explain more reasoning behind this design?
>
Context on the use case:

Our deployment runs a userspace proactive reclaimer driven by the
system's runtime state (memory/CPU/IO pressure, refault rate, ...)
and by workload-specific policy. It drives reclaim through
memory.reclaim, which, as the first stage, compresses cold anonymous
pages into zswap. For entries that then remain in zswap past a
policy-defined age threshold, the reclaimer wants to write them back
to the backing swap device at a moment of its own choosing, so that
the DRAM still held by the compressed data is reclaimed as well.

Why age is a reasonable selector at this stage:

Pages in zswap have already passed a first-stage coldness judgement
(otherwise they would not have been compressed). For second-stage
offloading, the question is which of them are cold *enough*, and
time-in-zswap is a natural proxy for that. A swap-in invalidates the
corresponding zswap entry and resets the clock, so by construction an
entry that has sat in zswap for N seconds has not been faulted in
during those N seconds. Long residency in zswap is therefore a strong
signal that the entry is not about to refault.

In our deployment, the userspace reclaimer starts from a conservative
threshold (the starting value depends on the workload) and adjusts it
through closed-loop feedback on two kinds of signals:
- the age distribution of zswap entries, to see whether there is a
  meaningful population past the threshold;
- the post-writeback refault rate and related signals, to confirm
  that the entries written back were in fact cold enough.

Both <age> and max=<bytes> are tuned against these signals until the
realized writeback volume matches the target. This is the same
control-loop style we already use to drive the first-stage
memory.reclaim budget.
Thanks,
Hao